 Coalescent theory

In genetics, coalescent theory is a retrospective model of population genetics. It attempts to trace all alleles of a gene shared by all members of a population to a single ancestral copy, known as the most recent common ancestor (MRCA; sometimes also termed the coancestor to emphasize the coalescent relationship^{[1]}). The inheritance relationships between alleles are typically represented as a gene genealogy, similar in form to a phylogenetic tree. This gene genealogy is also known as the coalescent; understanding the statistical properties of the coalescent under different assumptions forms the basis of coalescent theory.
The coalescent runs models of genetic drift backward in time to investigate the genealogy of antecedents.^{[2]} In the simplest case, coalescent theory assumes no recombination, no natural selection, and no gene flow or population structure. Advances in coalescent theory, however, allow extensions of the basic coalescent that can include recombination, selection, and virtually any arbitrarily complex evolutionary or demographic model in population genetic analysis. The mathematical theory of the coalescent was originally developed in the early 1980s by John Kingman^{[3]}.
Theory
Consider two distinct haploid organisms that differ at a single nucleotide. Tracing the ancestry of these two individuals backwards in time, there will be a point at which the MRCA is encountered and the two lineages have coalesced.
Time to coalescence
A useful analysis based on coalescent theory seeks to predict the amount of time elapsed between the introduction of a mutation and the appearance of a particular allele or gene distribution in a population. This time period is equal to how long ago the most recent common ancestor existed.
The probability that two lineages coalesce in the immediately preceding generation is the probability that they share a parent. In a diploid population with a constant effective population size with 2N_{e} copies of each locus, there are 2N_{e} "potential parents" in the previous generation, so the probability that two alleles share a parent is 1/(2N_{e}) and correspondingly, the probability that they do not coalesce is 1 − 1/(2N_{e}).
The time to coalescence is geometrically distributed: the probability that two lineages coalesce exactly t generations back is the probability of noncoalescence at the t − 1 preceding generations multiplied by the probability of coalescence at the generation of interest:
P_{c}(t) = \left(1 - \frac{1}{2N_{e}}\right)^{t-1} \left(\frac{1}{2N_{e}}\right)
For sufficiently large values of N_{e}, this distribution is well approximated by the continuously defined exponential distribution
P_{c}(t) = \frac{1}{2N_{e}} e^{-\frac{t-1}{2N_{e}}}
An exponential distribution has its expected value equal to its standard deviation; here both are 2N_{e}. Therefore, although the expected time to coalescence is 2N_{e} generations, actual coalescence times vary widely. Note that coalescent time is measured as the number of preceding generations in which the coalescence took place, not in calendar time, though the latter can be estimated by multiplying 2N_{e} by the average time between generations.
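The relationship between the geometric distribution, its exponential approximation, and the expected coalescence time of 2N_{e} generations can be checked numerically. The following is a minimal Python sketch, not a reference implementation; the function names and the parameter `n_e` (standing for N_{e}) are illustrative assumptions, and the simulation assumes the idealised setting described above, in which each lineage picks its parent uniformly from the 2N_{e} copies in the previous generation.

```python
import math
import random


def geometric_pmf(t, n_e):
    """Probability that two lineages first coalesce exactly t generations back."""
    p = 1.0 / (2 * n_e)  # chance the two lineages share a parent in one generation
    return (1.0 - p) ** (t - 1) * p


def exponential_approx(t, n_e):
    """Continuous approximation to geometric_pmf, accurate for large n_e."""
    return math.exp(-(t - 1) / (2 * n_e)) / (2 * n_e)


def simulate_coalescence_time(n_e, rng):
    """Step back one generation at a time until both lineages pick the same parent."""
    copies = 2 * n_e
    t = 1
    while rng.randrange(copies) != rng.randrange(copies):
        t += 1
    return t


def mean_simulated_time(n_e, replicates, seed=0):
    """Average coalescence time over many independent pairs of lineages."""
    rng = random.Random(seed)
    total = sum(simulate_coalescence_time(n_e, rng) for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    n_e = 50
    print("geometric  P(t=10):", geometric_pmf(10, n_e))
    print("exponential P(t=10):", exponential_approx(10, n_e))
    print("simulated mean time:", mean_simulated_time(n_e, 5000), "expected:", 2 * n_e)
```

With a larger `n_e` the two probabilities agree more closely, while the spread of individual simulated times stays comparable to the mean, which illustrates the wide variation noted above.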
Neutral variation
Coalescent theory can also be used to model the amount of variation in DNA sequences expected from genetic drift alone. This value is termed the mean heterozygosity, represented as \bar{H}. Mean heterozygosity is calculated as the probability of a mutation occurring at a given generation divided by the probability of any "event" at that generation (either a mutation or a coalescence). The probability that the event is a mutation is the probability of a mutation in either of the two lineages: 2μ, where μ is the mutation rate per lineage per generation. Thus the mean heterozygosity is equal to
\bar{H} = \frac{2\mu}{2\mu + \frac{1}{2N_{e}}} = \frac{4N_{e}\mu}{1 + 4N_{e}\mu} = \frac{\theta}{1 + \theta}
where \theta = 4N_{e}\mu. For 4N_{e}\mu \gg 1, the vast majority of allele pairs have at least one difference in nucleotide sequence.
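As a worked illustration, the mean heterozygosity can be computed directly from the ratio described above: the probability of a mutation in either lineage (2μ) divided by the probability of any event (2μ plus the coalescence probability 1/(2N_{e})). The Python sketch below uses hypothetical names (`mean_heterozygosity`, `n_e`, `mu`) and example parameter values chosen only for illustration.

```python
def mean_heterozygosity(n_e, mu):
    """Probability that the first event going back in time is a mutation."""
    p_mutation = 2 * mu               # mutation in either of the two lineages
    p_coalescence = 1 / (2 * n_e)     # the two lineages share a parent
    return p_mutation / (p_mutation + p_coalescence)


def mean_heterozygosity_from_theta(theta):
    """Equivalent form using theta = 4 * n_e * mu."""
    return theta / (1 + theta)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the article.
    for n_e, mu in [(10_000, 2.5e-5), (10_000, 2.5e-3), (100, 2.5e-5)]:
        theta = 4 * n_e * mu
        print(n_e, mu, theta, mean_heterozygosity(n_e, mu))
```

When 4N_{e}μ is much larger than 1 the result approaches 1, and when it is much smaller than 1 the result is close to 4N_{e}μ itself.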
Graphical representation
Coalescents can be visualised using dendrograms which show the relationship of branches of the population to each other. The point where two branches meet indicates a coalescent event.
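The merge points that such a dendrogram displays can be generated by tracing a small sample backward in time. The sketch below is an illustrative assumption rather than an established tool: it reuses the idealised setting from the section on time to coalescence (each lineage picks a parent uniformly from 2N_{e} copies per generation), and the function name `simulate_genealogy` and its return format are hypothetical.

```python
import random


def simulate_genealogy(sample_size, n_e, seed=0):
    """Trace sampled lineages backward and record each coalescent event.

    Returns a list of (generation, children) pairs, where children is the
    list of lineages that merged; each lineage is a tuple of the sampled
    leaves it is ancestral to.
    """
    rng = random.Random(seed)
    copies = 2 * n_e
    lineages = [(i,) for i in range(sample_size)]
    events = []
    generation = 0
    while len(lineages) > 1:
        generation += 1
        # Every surviving lineage picks a parent copy in the previous generation.
        by_parent = {}
        for lineage in lineages:
            by_parent.setdefault(rng.randrange(copies), []).append(lineage)
        next_lineages = []
        for children in by_parent.values():
            if len(children) > 1:
                # Lineages sharing a parent coalesce: a branch point in the dendrogram.
                events.append((generation, list(children)))
            merged = tuple(sorted(leaf for child in children for leaf in child))
            next_lineages.append(merged)
        lineages = next_lineages
    return events


if __name__ == "__main__":
    for generation, children in simulate_genealogy(sample_size=5, n_e=100, seed=1):
        print(f"generation {generation}: {children} coalesce")
```

Each printed event corresponds to one point where branches meet, and the final event is the most recent common ancestor of the whole sample.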
Applications
Disease gene mapping
The utility of coalescent theory in disease mapping is gradually gaining appreciation. Although the application of the theory is still in its infancy, a number of researchers are actively developing algorithms that use coalescent theory for the analysis of human genetic data^{[4]}^{[5]}^{[6]}.
History
Coalescent theory is a natural extension of the more classical population genetics concept of neutral evolution and is an approximation to the Fisher–Wright (or Wright–Fisher) model for large populations. It was ‘discovered’ independently by several researchers in the 1980s^{[7]}^{[8]}^{[9]}^{[10]}, but the definitive formalisation is attributed to Kingman^{[11]}. Major contributions to the development of coalescent theory have been made by Peter Donnelly^{[12]}, Robert Griffiths, Richard R Hudson^{[13]} and Simon Tavaré^{[14]}. These have included incorporating variations in population size^{[15]}, recombination and selection^{[16]}^{[17]}. In 1999 Jim Pitman^{[18]} and Serik Sagitov^{[19]} independently introduced coalescent processes with multiple collisions of ancestral lineages. Shortly afterwards the full class of exchangeable coalescent processes with simultaneous multiple mergers of ancestral lineages was discovered by Martin Möhle and Serik Sagitov^{[20]} and Jason Schweinsberg^{[21]}.
Software
A large body of software exists both for simulating data sets under the coalescent process and for inferring parameters such as population size and migration rates from genetic data.
 TreesimJ – Forward simulation software allowing sampling of genealogies and data sets under diverse selective and demographic models.
 BEAST – Bayesian MCMC inference package with a wide range of coalescent models including the use of temporally sampled sequences.
 CoaSim – software for simulating genetic data under the coalescent model.
 GeneRecon – software for the fine-scale linkage disequilibrium mapping of disease genes using coalescent theory, based on a Bayesian MCMC framework.
 genetree – software for estimation of population genetics parameters using coalescent theory and simulation (the R package popgen). See also Oxford Mathematical Genetics and Bioinformatics Group.
 GENOME – rapid coalescent-based whole-genome simulation^{[22]}.
 Migrate – maximum likelihood and Bayesian inference of migration rates under the n-coalescent. The inference is implemented using MCMC.
 Migraine – a program which implements coalescent algorithms for a maximum likelihood analysis (using importance sampling algorithms) of genetic data with a focus on spatially structured populations^{[23]}.
 Lamarc – software for estimation of rates of population growth, migration, and recombination.
 MS & MShot – Richard Hudson's original program for generating samples under neutral models^{[24]} and an extension which allows recombination hotspots^{[25]}.
 SARG – Structure Ancestral Recombination Graph by Magnus Nordborg.
 simcoal2 – software to simulate genetic data under the coalescent model with complex demography and recombination.
 Recodon and NetRecodon – software to simulate coding sequences with inter/intracodon recombination, migration, growth rate and longitudinal sampling^{[26]}^{[27]}.
 COAL – program for computing gene tree probabilities and simulating gene trees in species trees under the coalescent model^{[28]}.
 IBDSim – a computer package for the simulation of genotypic data under general isolation by distance models^{[29]}.
References and notes
Articles
 Arenas, M. and Posada, D. (2007) Recodon: Coalescent simulation of coding DNA sequences with recombination, migration and demography. BMC Bioinformatics 8:458
 Arenas, M. and Posada, D. (2010) Coalescent simulation of intracodon recombination. Genetics 184(2):429–437
 Browning, S.R. (2006) Multilocus association mapping using variable-length Markov chains. American Journal of Human Genetics 78:903–913
 Degnan, J.H. and Salter, L.A. (2005) Gene tree distributions under the coalescent process. Evolution 59(1):24–37. pdf from coaltree.net/
 Donnelly, P., Tavaré, S. (1995) Coalescents and genealogical structure under neutrality. Annual Review of Genetics 29:401–421
 Hellenthal, G., Stephens M. (2006) msHOT: modifying Hudson's ms simulator to incorporate crossover and gene conversion hotspots. Bioinformatics AOP
 Hudson RR (1983a) Testing the constant-rate neutral allele model with protein sequence data. Evolution 37:203–207
 Hudson RR (1983b) Properties of a neutral allele model with intragenic recombination. Theoretical Population Biology 23:183–201
 Hudson RR (1991) Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology 7:1–44
 Hudson RR (2002) Generating samples under a Wright–Fisher neutral model. Bioinformatics 18:337–338
 Hein, J., Schierup, M., Wiuf C. (2004) Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory. Oxford University Press ISBN 9780198529965
 Kaplan, N.L., Darden, T., Hudson, R.R. (1988) The coalescent process in models with selection. Genetics 120:819–829
 Kingman, J.F.C. (1982) On the Genealogy of Large Populations. Journal of Applied Probability 19A:27–43
 Kingman, J.F.C. (2000) Origins of the coalescent 1974–1982. Genetics 156:1461–1463
 Liang L., Zöllner S., Abecasis G.R. (2007) GENOME: a rapid coalescent-based whole genome simulator. Bioinformatics 23:1565–1567
 Mailund, T., Schierup, M.H., Pedersen, C.N.S., Mechlenborg, P.J.M., Madsen, J.N., Schauser, L. (2005) CoaSim: A Flexible Environment for Simulating Genetic Data under Coalescent Models. BMC Bioinformatics 6:252
 Möhle, M., Sagitov, S. (2001) A classification of coalescent processes for haploid exchangeable population models. The Annals of Probability 29:1547–1562
 Morris, A. P., Whittaker, J. C., Balding, D. J. (2002) Fine-scale mapping of disease loci via shattered coalescent modeling of genealogies. American Journal of Human Genetics 70:686–707
 Neuhauser, C., Krone, S.M. (1997) The genealogy of samples in models with selection. Genetics 145:519–534
 Pitman, J. (1999) Coalescents with multiple collisions. The Annals of Probability 27:1870–1902
 Harding, Rosalind M. (1998) New phylogenies: an introductory look at the coalescent. Pp. 15–22 in Harvey, P. H., Brown, A. J. L., Smith, J. M., Nee, S., New uses for new phylogenies. Oxford University Press ISBN 0198549849
 Rosenberg, N.A., Nordborg, M. (2002) Genealogical Trees, Coalescent Theory and the Analysis of Genetic Polymorphisms. Nature Reviews Genetics 3:380–390
 Sagitov, S. (1999) The general coalescent with asynchronous mergers of ancestral lines. Journal of Applied Probability 36:1116–1125
 Schweinsberg, J. (2000) Coalescents with simultaneous multiple collisions. Electronic Journal of Probability 5:1–50
 Slatkin, M. (2001) Simulating genealogies of selected alleles in populations of variable size. Genetical Research 78:49–57
 Tajima, F. (1983) Evolutionary Relationship of DNA Sequences in finite populations. Genetics 105:437–460
 Zöllner S. and Pritchard J.K. (2005) Coalescent-Based Association Mapping and Fine Mapping of Complex Trait Loci. Genetics 169:1071–1092
 Rousset F. and Leblois R. (2007) Likelihood and Approximate Likelihood Analyses of Genetic Structure in a Linear Habitat: Performance and Robustness to Model Mis-Specification. Molecular Biology and Evolution 24:2730–2745
 Leblois R., Estoup A. and Rousset F. (2009) IBDSim: a computer program to simulate genotypic data under isolation by distance. Molecular Ecology Resources 9:107–109
Books
 Hein, J; Schierup, M. H., and Wiuf, C. Gene Genealogies, Variation and Evolution – A Primer in Coalescent Theory. Oxford University Press, 2005. ISBN 0198529961.
 Nordborg, M. (2001) Introduction to Coalescent Theory. Chapter 7 in Balding, D., Bishop, M., Cannings, C., editors, Handbook of Statistical Genetics. Wiley ISBN 9780471860945
 Wakeley J. (2006) An Introduction to Coalescent Theory. Roberts & Co ISBN 0974707759. Accompanying website with sample chapters
 Rice SH. (2004) Evolutionary Theory: Mathematical and Conceptual Foundations. Sinauer Associates: Sunderland, MA. See esp. ch. 3 for detailed derivations.
 Berestycki N. "Recent progress in coalescent theory" 2009 ENSAIOS Matematicos vol. 16
 Bertoin J. "Random Fragmentation and Coagulation Processes", 2006. Cambridge Studies in Advanced Mathematics, 102. Cambridge University Press, Cambridge. ISBN 9780521867283
 Pitman J. "Combinatorial stochastic processes" Springer (2003)
External links
 EvoMath 3: Genetic Drift and Coalescence, Briefly — overview, with probability equations for genetic drift, and simulation graphs
Wikimedia Foundation. 2010.