- Fisher's noncentral hypergeometric distribution

Figure: Probability mass function for Fisher's noncentral hypergeometric distribution for different values of the odds ratio ω ($m_1 = 80$, $m_2 = 60$, $n = 100$, ω = 0.01, ..., 1000).

In probability theory and statistics, **Fisher's noncentral hypergeometric distribution** is a generalization of the hypergeometric distribution where sampling probabilities are modified by weight factors. Fisher's noncentral hypergeometric distribution can also be defined as the conditional distribution of two or more binomially distributed variables dependent upon their fixed sum.

The distribution may be illustrated by the following urn model. Assume, for example, that an urn contains $m_1$ red balls and $m_2$ white balls, totalling $N = m_1 + m_2$ balls. Each red ball has the weight $\omega_1$ and each white ball has the weight $\omega_2$. We will say that the odds ratio is $\omega = \omega_1 / \omega_2$. Now we are taking balls randomly in such a way that the probability of taking a particular ball is proportional to its weight, but independent of what happens to the other balls. The number of balls taken of a particular color follows the binomial distribution. If the total number $n$ of balls taken is known then the conditional distribution of the number of taken red balls for given $n$ is Fisher's noncentral hypergeometric distribution. To generate this distribution experimentally, we have to repeat the experiment until it happens to give $n$ balls.

If we want to fix the value of $n$ prior to the experiment then we have to take the balls one by one until we have $n$ balls. The balls are therefore no longer independent. This gives a slightly different distribution known as Wallenius' noncentral hypergeometric distribution. It is far from obvious why these two distributions are different. See the entry for noncentral hypergeometric distributions for an explanation of the difference between these two distributions and a discussion of which distribution to use in various situations.

The two distributions are both equal to the (central) hypergeometric distribution when the odds ratio is 1.

Unfortunately, both distributions are known in the literature as "the" noncentral hypergeometric distribution. It is important to be specific about which distribution is meant when using this name.
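The definition as a conditional distribution of binomial variables can be checked numerically. The sketch below assumes that every red ball is taken independently with probability `p1` and every white ball with probability `p2` (hypothetical values chosen for illustration), and that ω is the ratio of the corresponding odds p/(1 − p). Under these assumptions the conditional distribution of the red count given the total should agree with Fisher's probability function; all function names are illustrative.

```python
from math import comb


def binom_pmf(k, m, p):
    """P(K = k) for K ~ Binomial(m, p)."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)


def conditional_on_sum(m1, m2, n, p1, p2):
    """Distribution of the red count X1 given X1 + X2 = n (independent binomials)."""
    lo, hi = max(0, n - m2), min(n, m1)
    joint = {x: binom_pmf(x, m1, p1) * binom_pmf(n - x, m2, p2)
             for x in range(lo, hi + 1)}
    total = sum(joint.values())
    return {x: v / total for x, v in joint.items()}


def fisher_pmf(m1, m2, n, omega):
    """Fisher's noncentral hypergeometric probabilities over the support."""
    lo, hi = max(0, n - m2), min(n, m1)
    terms = {x: comb(m1, x) * comb(m2, n - x) * omega**x
             for x in range(lo, hi + 1)}
    p0 = sum(terms.values())
    return {x: t / p0 for x, t in terms.items()}


# Hypothetical inclusion probabilities for red and white balls (assumption).
p1, p2 = 0.6, 0.3
omega = (p1 / (1 - p1)) / (p2 / (1 - p2))  # odds ratio of the two colors

cond = conditional_on_sum(8, 6, 5, p1, p2)
fish = fisher_pmf(8, 6, 5, omega)
max_diff = max(abs(cond[x] - fish[x]) for x in cond)
print(f"omega = {omega:.3f}, largest difference = {max_diff:.2e}")
```

The factors $(1-p_1)^{m_1}$ and $(1-p_2)^{m_2}$ and the common power of the white-ball odds cancel in the normalization, which is why only the odds ratio remains.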

Fisher's noncentral hypergeometric distribution was first given the name **extended hypergeometric distribution** (Harkness, 1965), but this name is rarely used today.

**Univariate distribution**

| Property | Univariate Fisher's noncentral hypergeometric distribution |
|---|---|
| Type | mass |
| Parameters | $m_1, m_2 \in \mathbb{N}$; $N = m_1 + m_2$; $n \in [0, N)$; $\omega \in \mathbb{R}_+$ |
| Support | $x \in [x_{\min}, x_{\max}]$, with $x_{\min} = \max(0, n - m_2)$ and $x_{\max} = \min(n, m_1)$ |
| Probability mass function | $\frac{\binom{m_1}{x} \binom{m_2}{n-x} \omega^x}{P_0}$, where $P_0 = \sum_{y=x_{\min}}^{x_{\max}} \binom{m_1}{y} \binom{m_2}{n-y} \omega^y$ |
| Mean | $\frac{P_1}{P_0}$, where $P_k = \sum_{y=x_{\min}}^{x_{\max}} \binom{m_1}{y} \binom{m_2}{n-y} \omega^y \, y^k$ |
| Mode | $\left\lfloor \frac{-2C}{B - \sqrt{B^2 - 4AC}} \right\rfloor$, where $A = \omega - 1$, $B = m_1 + n - N - (m_1 + n + 2)\omega$, $C = (m_1 + 1)(n + 1)\omega$ |
| Variance | $\frac{P_2}{P_0} - \left( \frac{P_1}{P_0} \right)^2$, where $P_k$ is given above |

The probability function, mean and variance are given in the table above.
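As an illustration, the probability function, mean and variance can be evaluated directly from the sums $P_0$, $P_1$ and $P_2$. The following is a minimal sketch with illustrative function names; it uses plain floating-point sums, so it is only suitable for moderate parameter values and is not one of the optimized algorithms cited in this article.

```python
from math import comb


def support(m1, m2, n):
    """Return (x_min, x_max) for the given parameters."""
    return max(0, n - m2), min(n, m1)


def p_k(k, m1, m2, n, omega):
    """P_k = sum over the support of C(m1,y) C(m2,n-y) omega^y y^k."""
    lo, hi = support(m1, m2, n)
    return sum(comb(m1, y) * comb(m2, n - y) * omega**y * y**k
               for y in range(lo, hi + 1))


def pmf(x, m1, m2, n, omega):
    """Probability of drawing exactly x red balls."""
    lo, hi = support(m1, m2, n)
    if not lo <= x <= hi:
        return 0.0
    return comb(m1, x) * comb(m2, n - x) * omega**x / p_k(0, m1, m2, n, omega)


def mean(m1, m2, n, omega):
    return p_k(1, m1, m2, n, omega) / p_k(0, m1, m2, n, omega)


def variance(m1, m2, n, omega):
    mu = mean(m1, m2, n, omega)
    return p_k(2, m1, m2, n, omega) / p_k(0, m1, m2, n, omega) - mu**2


# Parameters of the figure, with one moderate odds ratio picked for illustration.
print(mean(80, 60, 100, 2.0), variance(80, 60, 100, 2.0))
```

For ω = 1 the results reduce to the mean and variance of the central hypergeometric distribution, which gives a simple sanity check.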

An alternative expression of the distribution has both the number of balls taken of each color and the number of balls not taken as random variables, whereby the expression for the probability becomes symmetric.

The calculation time for the probability function can be high when the sum in $P_0$ has many terms. The calculation time can be reduced by calculating the terms in the sum recursively relative to the term for $y = x$ and ignoring negligible terms in the tails (Liao and Rosen, 2001).

The mean can be approximated by:

:$\mu \approx \frac{-2c}{b - \sqrt{b^2 - 4ac}}$,

where $a = \omega - 1$, $b = m_1 + n - N - (m_1 + n)\omega$, $c = m_1 n \omega$.

The variance can be approximated by:

:$\sigma^2 \approx \frac{N}{N-1} \bigg/ \left( \frac{1}{\mu} + \frac{1}{m_1 - \mu} + \frac{1}{n - \mu} + \frac{1}{\mu + m_2 - n} \right)$.

Better approximations to the mean and variance are given by Levin (1984), Liao (1992), and McCullagh and Nelder (1989).
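The two simple approximations can be compared with the exact mean. The sketch below implements the approximation formulas as stated, with illustrative function names, and computes the exact mean by direct summation for reference. The parameter values are those of the figure with an arbitrarily chosen odds ratio of 2.

```python
from math import comb, sqrt


def exact_mean(m1, m2, n, omega):
    """Exact mean P_1 / P_0 by direct summation over the support."""
    lo, hi = max(0, n - m2), min(n, m1)
    terms = [(y, comb(m1, y) * comb(m2, n - y) * omega**y)
             for y in range(lo, hi + 1)]
    return sum(y * t for y, t in terms) / sum(t for _, t in terms)


def approx_mean(m1, m2, n, omega):
    """Approximate mean: root of a*mu^2 + b*mu + c = 0 in the stated form."""
    N = m1 + m2
    a = omega - 1
    b = m1 + n - N - (m1 + n) * omega
    c = m1 * n * omega
    # This form stays finite when omega = 1 (a = 0).
    return -2 * c / (b - sqrt(b * b - 4 * a * c))


def approx_variance(m1, m2, n, mu):
    """Approximate variance from an (approximate) mean mu."""
    N = m1 + m2
    return N / (N - 1) / (1 / mu + 1 / (m1 - mu) + 1 / (n - mu)
                          + 1 / (mu + m2 - n))


mu = approx_mean(80, 60, 100, 2.0)
print("approximate mean:", mu)
print("exact mean:      ", exact_mean(80, 60, 100, 2.0))
print("approximate variance:", approx_variance(80, 60, 100, mu))
```

Writing the root as $-2c / (b - \sqrt{b^2 - 4ac})$ rather than with $2a$ in the denominator avoids a division by zero at ω = 1, where both approximations coincide with the central hypergeometric mean and variance.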

**Properties**

The following symmetry relations apply:

:$\operatorname{fnchypg}(x;n,m_1,N,\omega) = \operatorname{fnchypg}(n-x;n,m_2,N,1/\omega)\,.$

:$\operatorname{fnchypg}(x;n,m_1,N,\omega) = \operatorname{fnchypg}(x;m_1,n,N,\omega)\,.$

:$\operatorname{fnchypg}(x;n,m_1,N,\omega) = \operatorname{fnchypg}(m_1-x;N-n,m_1,N,1/\omega)\,.$

Recurrence relation:

:$\operatorname{fnchypg}(x;n,m_1,N,\omega) = \operatorname{fnchypg}(x-1;n,m_1,N,\omega) \frac{(m_1-x+1)(n-x+1)}{x(m_2-n+x)}\omega\,.$
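The recurrence relation gives a way to build the whole probability function without evaluating binomial coefficients. The sketch below starts from an unnormalized weight of 1 at the mode expression given earlier for the univariate distribution (clamped to the support), extends it in both directions with the recurrence, and normalizes at the end. It is a simplified illustration in the spirit of the recursive approach, not the published algorithm of Liao and Rosen; it does not truncate the tails, and the function names are illustrative. The same function is then used to check the three symmetry relations numerically.

```python
from math import comb, floor, sqrt


def fnchypg_table(n, m1, N, omega):
    """Return {x: P(X = x)} computed with the recurrence relation."""
    m2 = N - m1
    lo, hi = max(0, n - m2), min(n, m1)

    # Starting point: the mode expression, clamped to the support.
    A = omega - 1
    B = m1 + n - N - (m1 + n + 2) * omega
    C = (m1 + 1) * (n + 1) * omega
    x0 = min(max(floor(-2 * C / (B - sqrt(B * B - 4 * A * C))), lo), hi)

    def ratio(x):
        """f(x) / f(x - 1) from the recurrence relation."""
        return (m1 - x + 1) * (n - x + 1) / (x * (m2 - n + x)) * omega

    weights = {x0: 1.0}  # unnormalized, relative to the term at x0
    for x in range(x0 + 1, hi + 1):
        weights[x] = weights[x - 1] * ratio(x)
    for x in range(x0, lo, -1):
        weights[x - 1] = weights[x] / ratio(x)
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def fnchypg(x, n, m1, N, omega):
    """Probability of x, zero outside the support."""
    return fnchypg_table(n, m1, N, omega).get(x, 0.0)


n, m1, N, omega = 5, 8, 14, 2.5
m2 = N - m1

# Compare with the direct formula.
direct = [comb(m1, x) * comb(m2, n - x) * omega**x for x in range(n + 1)]
direct_err = max(abs(fnchypg(x, n, m1, N, omega) - direct[x] / sum(direct))
                 for x in range(n + 1))

# Check the three symmetry relations over the whole support.
sym_err = 0.0
for x in range(n + 1):
    p = fnchypg(x, n, m1, N, omega)
    sym_err = max(sym_err,
                  abs(p - fnchypg(n - x, n, m2, N, 1 / omega)),
                  abs(p - fnchypg(x, m1, n, N, omega)),
                  abs(p - fnchypg(m1 - x, N - n, m1, N, 1 / omega)))
print(direct_err, sym_err)
```

Starting near the mode keeps the intermediate weights close to 1, which is what makes this kind of recursion numerically stable for large parameters.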

**Multivariate distribution**

| Property | Multivariate Fisher's noncentral hypergeometric distribution |
|---|---|
| Type | mass |
| Parameters | $c \in \mathbb{N}$; $\mathbf{m} = (m_1, \ldots, m_c) \in \mathbb{N}^c$; $N = \sum_{i=1}^c m_i$; $n \in [0, N)$; $\boldsymbol{\omega} = (\omega_1, \ldots, \omega_c) \in \mathbb{R}_+^c$ |
| Support | $\mathrm{S} = \left\{ \mathbf{x} \in \mathbb{Z}_{0+}^c \, : \, \sum_{i=1}^{c} x_i = n \right\}$ |
| Probability mass function | $\frac{1}{P_0} \prod_{i=1}^{c} \binom{m_i}{x_i} \omega_i^{x_i}$, where $P_0 = \sum_{(y_1, \ldots, y_c) \in \mathrm{S}} \prod_{i=1}^{c} \binom{m_i}{y_i} \omega_i^{y_i}$ |
| Mean | The mean $\mu_i$ of $x_i$ can be approximated by $\mu_i = \frac{m_i r \omega_i}{r \omega_i + 1}$, where $r$ is the unique positive solution to $\sum_{i=1}^{c} \mu_i = n$ |

The distribution can be expanded to any number of colors $c$ of balls in the urn. The multivariate distribution is used when there are more than two colors.

The probability function and a simple approximation to the mean are given in the table above. Better approximations to the mean and variance are given by McCullagh and Nelder (1989).
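For small problems the multivariate probability function can be evaluated by enumerating the support, and the approximate means can be found by solving for $r$ numerically. The sketch below is an illustration under stated assumptions: the function names are made up for this example, the enumeration is brute force (feasible only for a few colors and small counts), $r$ is located by bisection, which is one possible choice of root finder, and $0 < n < N$ is assumed so that a positive solution exists.

```python
from itertools import product
from math import comb, prod


def mfnchypg_table(n, m, omega):
    """Return {x: probability} for all x with 0 <= x_i <= m_i and sum(x) = n."""
    weights = {}
    for x in product(*(range(mi + 1) for mi in m)):
        if sum(x) == n:
            weights[x] = prod(comb(mi, xi) * wi**xi
                              for mi, xi, wi in zip(m, x, omega))
    p0 = sum(weights.values())
    return {x: w / p0 for x, w in weights.items()}


def exact_means(n, m, omega):
    """Exact mean of each x_i by summation over the enumerated support."""
    table = mfnchypg_table(n, m, omega)
    return [sum(p * x[i] for x, p in table.items()) for i in range(len(m))]


def approx_means(n, m, omega, iterations=200):
    """Approximate means mu_i = m_i r w_i / (r w_i + 1), with sum(mu) = n."""
    def total(r):
        return sum(mi * r * wi / (r * wi + 1) for mi, wi in zip(m, omega))

    lo, hi = 0.0, 1.0
    while total(hi) < n:          # total(r) increases from 0 towards N
        hi *= 2
    for _ in range(iterations):   # plain bisection on r
        mid = (lo + hi) / 2
        if total(mid) < n:
            lo = mid
        else:
            hi = mid
    r = (lo + hi) / 2
    return [mi * r * wi / (r * wi + 1) for mi, wi in zip(m, omega)]


m, omega, n = (5, 4, 3), (1.0, 2.0, 0.5), 6
print("exact: ", exact_means(n, m, omega))
print("approx:", approx_means(n, m, omega))
```

When all weights are equal, the approximation gives $\mu_i = m_i n / N$, which is the exact mean of the central multivariate hypergeometric distribution.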

**Properties**

The order of the colors is arbitrary so that any colors can be swapped.

The weights can be arbitrarily scaled:

:$\operatorname{mfnchypg}(\mathbf{x};n,\mathbf{m},\boldsymbol{\omega}) = \operatorname{mfnchypg}(\mathbf{x};n,\mathbf{m},r\boldsymbol{\omega})\,$ for all $r \in \mathbb{R}_+.$

Colors with zero number ($m_i = 0$) or zero weight ($\omega_i = 0$) can be omitted from the equations.

Colors with the same weight can be joined:

:$\begin{align} & \operatorname{mfnchypg}\left(\mathbf{x};n,\mathbf{m},(\omega_1,\ldots,\omega_{c-1},\omega_{c-1})\right) \\ & {} = \operatorname{mfnchypg}\left((x_1,\ldots,x_{c-1}+x_c); n,(m_1,\ldots,m_{c-1}+m_c),(\omega_1,\ldots,\omega_{c-1})\right) \, \cdot \\ & \qquad \operatorname{hypg}(x_c; x_{c-1}+x_c, m_c, m_{c-1}+m_c) \end{align}$

where $\operatorname{hypg}(x;n,m,N)$ is the (univariate, central) hypergeometric distribution probability.
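The scaling and joining properties can be verified numerically on a small example. The sketch below uses a brute-force evaluation of the multivariate probability function; the function names mirror the notation of the formulas and the numerical values are arbitrary choices for illustration.

```python
from itertools import product
from math import comb, prod


def mfnchypg(x, n, m, omega):
    """Multivariate Fisher probability of x, by brute-force normalization."""
    if sum(x) != n:
        return 0.0

    def weight(y):
        return prod(comb(mi, yi) * wi**yi for mi, yi, wi in zip(m, y, omega))

    p0 = sum(weight(y) for y in product(*(range(mi + 1) for mi in m))
             if sum(y) == n)
    return weight(x) / p0


def hypg(x, n, m, N):
    """Central hypergeometric probability of x successes in n draws."""
    return comb(m, x) * comb(N - m, n - x) / comb(N, n)


# Three colors; the last two share the same weight (values are arbitrary).
x, n, m, omega = (2, 2, 1), 5, (4, 3, 2), (2.0, 0.7, 0.7)
p = mfnchypg(x, n, m, omega)

# Scaling all weights by r = 3 should leave the probability unchanged.
p_scaled = mfnchypg(x, n, m, tuple(3 * w for w in omega))

# Joining the last two colors, then splitting them with a central hypergeometric.
p_joined = (mfnchypg((x[0], x[1] + x[2]), n, (m[0], m[1] + m[2]), omega[:2])
            * hypg(x[2], x[1] + x[2], m[2], m[1] + m[2]))

print(p, p_scaled, p_joined)
```

The joining identity works because, among balls of equal weight, the split between the two joined colors is an ordinary unbiased draw.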

**Applications**

Fisher's noncentral hypergeometric distribution is useful for models of biased sampling or biased selection where the individual items are sampled independently of each other with no competition. The bias or odds can be estimated from an experimental value of the mean. Use Wallenius' noncentral hypergeometric distribution instead if items are sampled one by one with competition.

Fisher's noncentral hypergeometric distribution is used mostly for tests in contingency tables where a conditional distribution for fixed margins is desired. This can be useful e.g. for testing or measuring the effect of a medicine. See McCullagh and Nelder (1989).

**Software available**

* An implementation for the R programming language is available as the package named [*http://cran.stat.ucla.edu/src/contrib/Descriptions/BiasedUrn.html BiasedUrn*]. It includes univariate and multivariate probability mass functions, distribution functions, quantiles, random variable generating functions, mean and variance.
* The R package [*http://mcmcpack.wustl.edu/wiki/index.php/Main_Page MCMCpack*] includes the univariate probability mass function and random variable generating function.
* SAS System includes univariate probability mass function and distribution function.
* Implementation in C++ is available from [*http://www.agner.org/random/ www.agner.org*].
* Calculation methods are described by Liao and Rosen (2001) and Fog (2008).

**See also**

* Noncentral hypergeometric distributions
* Wallenius' noncentral hypergeometric distribution
* Hypergeometric distribution
* Urn models
* Biased sample
* Bias
* Contingency table
* Fisher's exact test

**References**

* Johnson, N. L.; Kemp, A. W.; Kotz, S. (2005), Univariate Discrete Distributions, Hoboken, New Jersey: Wiley and Sons.
* McCullagh, P.; Nelder, J. A. (1989), Generalized Linear Models, 2. ed., London: Chapman and Hall.
* Breslow, N. E.; Day, N. E. (1980), Statistical Methods in Cancer Research, Lyon: International Agency for Research on Cancer.
* Fog, A. (2007), Random number theory, http://www.agner.org/random/theory/.
* Fog, A. (2008), "Sampling Methods for Wallenius' and Fisher's Noncentral Hypergeometric Distributions", Communications in Statistics, Simulation and Computation 37 (2): 241-257.
* Liao, J. G.; Rosen, O. (2001), "Fast and Stable Algorithms for Computing and Sampling from the Noncentral Hypergeometric Distribution", The American Statistician 55 (4): 366-369.
* Liao, J. (1992), "An Algorithm for the Mean and Variance of the Noncentral Hypergeometric Distribution", Biometrics 48 (3): 889-892.
* Levin, B. (1984), "Simple Improvements on Cornfield's approximation to the mean of a noncentral Hypergeometric random variable", Biometrika 71 (3): 630-632.

*Wikimedia Foundation.
2010.*
