# Population stratification

Population stratification is the presence of a systematic difference in allele frequencies between subpopulations in a population, possibly due to different ancestry. Population stratification is also referred to as population structure.

Causes of population stratification

The most obvious cause of population stratification is migration, where individuals from one population migrate into another. Over generations the stratification diminishes through admixture. Another form of population stratification is spurious relatedness, where non-random mating causes the members of a certain subpopulation to be more closely related to each other than to the rest of the population.

Population stratification and association studies

Population stratification can be a problem for association studies, such as case-control studies, where an association that is found could be due to the underlying structure of the population rather than to a disease-associated locus. Conversely, the real disease-causing locus might not be found if it is less prevalent in the population from which the case subjects are chosen. It is therefore often preferable to use family-based data, where the effect of population stratification can easily be controlled for. If the structure is known, or a putative structure is found, there are a number of ways to incorporate it into the association study and thus compensate for any population bias. Another possibility is to use unlinked markers to control the possible inflation of the number of false positives. This is known as genomic control.
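The confounding described above can be illustrated with a small numeric sketch. All numbers here are made up for illustration, and the helper names `chi2_2x2` and `allele_table` are hypothetical. Two subpopulations differ in both allele frequency and disease prevalence, while inside each subpopulation the allele is independent of the disease by construction.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom


def allele_table(n_people, prevalence, allele_freq):
    """Expected allele counts (case A, case a, control A, control a) when the
    allele is independent of disease status inside one subpopulation."""
    cases = n_people * prevalence
    controls = n_people - cases
    return (
        2 * cases * allele_freq,
        2 * cases * (1 - allele_freq),
        2 * controls * allele_freq,
        2 * controls * (1 - allele_freq),
    )


# Hypothetical subpopulations: equal size, different prevalence and allele frequency.
pop1 = allele_table(1000, prevalence=0.5, allele_freq=0.8)
pop2 = allele_table(1000, prevalence=0.1, allele_freq=0.2)

# Ignoring the structure means simply adding the two tables together.
pooled = tuple(x + y for x, y in zip(pop1, pop2))

chi2_pop1 = chi2_2x2(*pop1)      # no association within subpopulation 1
chi2_pop2 = chi2_2x2(*pop2)      # no association within subpopulation 2
chi2_pooled = chi2_2x2(*pooled)  # spurious association in the pooled sample
```

Within each subpopulation the statistic is zero, yet the pooled statistic lies far above 3.84, the 5% critical value of a $\chi^2$ distribution with one degree of freedom, although no locus in this toy setup influences the disease.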

Genomic Control

The assumption of population homogeneity in association studies, especially case-control studies, can easily be violated and can lead to both type I and type II errors. It is therefore important for the models used in the study to compensate for the population structure. The problem in case-control studies is that, if there is a genetic involvement in the disease, the individuals in the case population are more likely to be related than the individuals in the control population. This means that the assumption of independence of observations is violated. Often this will lead to an overestimation of the significance of an association, but it depends on the way the sample was chosen. As long as an allele has a higher frequency in a subpopulation, it will show an association with any trait that is more prevalent in the case population [Lander, E. S. and Schork, N. J. (1994). Genetic dissection of complex traits, Science 265(5181): 2037–2048.]. This kind of spurious association increases as the sample population grows, so the problem is of special concern in large-scale association studies where loci cause only relatively small effects on the trait.

A method that in some cases can compensate for the problems described above was developed by Devlin and Roeder (1999) [Devlin, B. and Roeder, K. (1999). Genomic control for association studies, Biometrics 55(4): 997–1004.]. It uses both a frequentist and a Bayesian approach, the latter being appropriate when dealing with a large number of candidate genes. Here is a short description of how the frequentist way of correcting for population stratification works. It works by using markers that are not linked with the trait in question to correct for any inflation of the statistic caused by population stratification. The method was first developed for binary traits but has since been generalized to quantitative ones [Bacanu, S.-A., Devlin, B. and Roeder, K. (2002). Association studies for quantitative traits in structured populations, Genet Epidemiol 22(1): 78–93.]. For the binary case, which applies to finding genetic differences between the case and control populations, Devlin and Roeder (1999) use Armitage's trend test

$Y^2=\frac{N\left(N\left(r_1+2r_2\right)-R\left(n_1+2n_2\right)\right)^2}{R\left(N-R\right)\left(N\left(n_1+4n_2\right)-\left(n_1+2n_2\right)^2\right)}$

and the $\chi^2$ test for allelic frequencies

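$\chi^2 \sim X_A^2 = \frac{2N\left(2N\left(r_1+2r_2\right)-2R\left(n_1+2n_2\right)\right)^2}{4R\left(N-R\right)\left(2N\left(n_1+2n_2\right)-\left(n_1+2n_2\right)^2\right)}$

Here $N$ is the total number of individuals, $R$ is the number of cases, and $r_i$ and $n_i$ are the numbers of cases and of all individuals, respectively, that carry $i$ copies of the allele in question.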
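Both statistics can be computed directly from genotype counts. The following is a minimal sketch, not the authors' code. It assumes that `r = (r0, r1, r2)` holds the numbers of cases with 0, 1 and 2 copies of the allele and `n = (n0, n1, n2)` the corresponding numbers for cases and controls together. The trend statistic is written in the standard form of the Armitage test with scores 0, 1, 2, and the allelic statistic is computed as the ordinary Pearson $\chi^2$ on the 2×2 table of allele counts by case status. The function names and the example counts are hypothetical.

```python
def armitage_trend(r, n):
    """Armitage trend statistic Y^2 with genotype scores 0, 1, 2."""
    r0, r1, r2 = r
    n0, n1, n2 = n
    R = r0 + r1 + r2  # number of cases
    N = n0 + n1 + n2  # total sample size
    num = N * (N * (r1 + 2 * r2) - R * (n1 + 2 * n2)) ** 2
    den = R * (N - R) * (N * (n1 + 4 * n2) - (n1 + 2 * n2) ** 2)
    return num / den


def allelic_chi2(r, n):
    """Pearson chi-square on the 2x2 table of allele counts by case status."""
    r0, r1, r2 = r
    n0, n1, n2 = n
    R = r0 + r1 + r2
    N = n0 + n1 + n2
    a = r1 + 2 * r2          # copies of the allele among cases
    b = 2 * R - a            # copies of the other allele among cases
    c = (n1 + 2 * n2) - a    # copies of the allele among controls
    d = 2 * (N - R) - c      # copies of the other allele among controls
    total = 2 * N
    return total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical sample of 400 individuals, 200 of them cases, with pooled
# genotype counts in exact Hardy-Weinberg proportions (allele frequency 0.5).
cases = (40, 100, 60)
everyone = (100, 200, 100)

y2 = armitage_trend(cases, everyone)
xa2 = allelic_chi2(cases, everyone)
```

With these counts both statistics come out as 8.0. If the pooled genotype counts deviate from Hardy-Weinberg proportions, for example `(150, 100, 150)`, the two values no longer agree.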

If the population is in Hardy-Weinberg equilibrium the two statistics are approximately equal. Under the null hypothesis of no population stratification the trend test is asymptotically $\chi^2$ distributed with one degree of freedom. The idea is that the statistic is inflated by a factor $\lambda$, so that $Y^2 \sim \lambda\chi_1^2$, where $\lambda$ depends on the effect of stratification.

The above method rests upon the assumption that the inflation factor $\lambda$ is constant, which means that the loci should have roughly equal mutation rates, should not be under different selection in the two populations, and the amount of Hardy-Weinberg disequilibrium, measured by Wright's coefficient of inbreeding $F$, should not differ between the loci. The last of these is of greatest concern. If the effect of the stratification is similar across the different loci, $\lambda$ can be estimated from the unlinked markers as $\hat{\lambda} = \operatorname{median}\left(Y_1^2, Y_2^2, \ldots, Y_L^2\right)/0.456$, where $L$ is the number of unlinked markers. The denominator, 0.456, is the approximate median of the $\chi_1^2$ distribution (a special case of the gamma distribution), which makes this a robust estimator of $\lambda$.

Other estimators have been suggested; for example, Reich and Goldstein (2001) [Reich, D. E. and Goldstein, D. B. (2001). Detecting association in a case-control study while correcting for population stratification, Genet Epidemiol 20(1): 4–16.] suggested using the mean of the statistics instead. The median is thus not the only way to estimate $\lambda$, but according to Bacanu et al. (2000) [Bacanu, S. A., Devlin, B. and Roeder, K. (2000). The power of genomic control, Am J Hum Genet 66(6): 1933–1944.] it is an appropriate estimate even if some of the unlinked markers are actually in disequilibrium with a disease-causing locus or are themselves associated with the disease. Under the null hypothesis, and when correcting for stratification using $L$ unlinked markers, the corrected statistic $Y^2/\hat{\lambda}$ is approximately $\chi^2_1$ distributed.
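The frequentist correction can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation. The statistics at the unlinked markers and at the candidate markers are assumed to be precomputed trend statistics, the function names are hypothetical, and the example values are made up. The p-value uses the identity $P(\chi_1^2 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math
from statistics import median

# Approximate median of the chi-square distribution with 1 df, as used in the text.
CHI2_1_MEDIAN = 0.456


def estimate_lambda(null_stats):
    """Median-based inflation factor from statistics at unlinked markers."""
    return median(null_stats) / CHI2_1_MEDIAN


def chi2_1_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


def genomic_control(candidate_stats, null_stats):
    """Divide each candidate statistic by the estimated inflation factor."""
    lam = estimate_lambda(null_stats)
    corrected = [y2 / lam for y2 in candidate_stats]
    return lam, corrected


# Hypothetical trend statistics at five unlinked markers and two candidates.
null_y2 = [0.2, 0.5, 0.912, 1.3, 3.1]
candidate_y2 = [8.0, 12.0]

lam_hat, corrected_y2 = genomic_control(candidate_y2, null_y2)
p_raw = [chi2_1_pvalue(y2) for y2 in candidate_y2]
p_corrected = [chi2_1_pvalue(y2) for y2 in corrected_y2]
```

In this made-up example the median of the null statistics is 0.912, so the estimated inflation factor is 2 and each candidate statistic is halved before its p-value is computed. A real analysis would use far more than five unlinked markers.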
With this correction the overall type I error rate should be approximately equal to $\alpha$ even when the population is stratified. Devlin and Roeder (1999) [Devlin, B. and Roeder, K. (1999). Genomic control for association studies, Biometrics 55(4): 997–1004.] mostly considered the situation where $\alpha=0.05$, corresponding to a 95% confidence level, and not smaller p-values. Marchini et al. (2004) [Marchini, J., Cardon, L. R., Phillips, M. S. and Donnelly, P. (2004). The effects of human population structure on large genetic association studies, Nat Genet 36(5): 512–517.] demonstrate by simulation that genomic control can lead to an anti-conservative p-value if this value is very small and the two populations (case and control) are extremely distinct. This was especially a problem if the number of unlinked markers was on the order of 50–100. This can result in false positives (at that significance level).

Notes & references

Wikimedia Foundation. 2010.
