Binomial distribution

Probability distribution
name = Binomial
type = mass
parameters = n \geq 0 number of trials (integer), 0 \leq p \leq 1 success probability (real)
support = k \in \{0, \dots, n\}
pdf = {n \choose k} p^k (1-p)^{n-k}
cdf = I_{1-p}(n - \lfloor k \rfloor, 1 + \lfloor k \rfloor)
mean = np
median = one of \{\lfloor np \rfloor, \lceil np \rceil\} [Hamza, K. (1995). The smallest uniform upper bound on the distance between the mean and the median of the binomial and Poisson distributions. Statist. Probab. Lett. 23 21–25.]
mode = \lfloor (n+1)p \rfloor
variance = np(1-p)
skewness = \frac{1-2p}{\sqrt{np(1-p)}}
kurtosis = \frac{1-6p(1-p)}{np(1-p)}
entropy = \frac{1}{2} \ln\left(2 \pi n e p (1-p)\right) + O\left(\frac{1}{n}\right)
mgf = (1-p + pe^t)^n
char = (1-p + pe^{it})^n

In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of "n" independent yes/no experiments, each of which yields success with probability "p". Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial. In fact, when "n" = 1, the binomial distribution "is" a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance. A binomial distribution should not be confused with a bimodal distribution.


An elementary example is this: Roll a standard die ten times and count the number of sixes. The distribution of this random number is a binomial distribution with "n" = 10 and "p" = 1/6.

As another example, assume 5% of a very large population to be green-eyed. You pick 100 people randomly. The number of green-eyed people you pick is a random variable "X" which follows a binomial distribution with "n" = 100 and "p" = 0.05.
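The die example above can be checked numerically. The following is a minimal sketch in Python (an illustrative choice; only the standard library is assumed, and the helper name `binom_pmf` is ours, not part of any particular package):

```python
import math
import random

def binom_pmf(k, n, p):
    """Pr(K = k) for K ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Die example: count sixes in 10 rolls, repeated many times.
random.seed(0)
trials = 100_000
counts = [sum(random.randint(1, 6) == 6 for _ in range(10))
          for _ in range(trials)]

# Empirical frequency of exactly two sixes vs. the exact B(10, 1/6) probability.
empirical = counts.count(2) / trials
exact = binom_pmf(2, 10, 1/6)
print(empirical, exact)  # the two agree to roughly two decimal places
```

With 100,000 repetitions the sampling error is on the order of 0.002, so the empirical frequency tracks the exact pmf value closely.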


Probability mass function

In general, if the random variable "K" follows the binomial distribution with parameters "n" and "p", we write "K" ~ B("n", "p"). The probability of getting exactly "k" successes in "n" trials is given by the probability mass function:

:\Pr(K = k) = f(k; n, p) = {n \choose k} p^k (1-p)^{n-k}

for "k" = 0, 1, 2, ..., "n" and where

:{n \choose k} = \frac{n!}{k!(n-k)!}

is the binomial coefficient (hence the name of the distribution) "n" choose "k" (also denoted "C"("n", "k") or "n"C"k"). The formula can be understood as follows: we want "k" successes ("p"^"k") and "n" − "k" failures ((1 − "p")^("n" − "k")). However, the "k" successes can occur anywhere among the "n" trials, and there are C("n", "k") different ways of distributing "k" successes in a sequence of "n" trials.
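Because the C("n", "k") arrangements exhaust all possible outcomes, the pmf sums to 1 over "k" = 0, …, "n". A quick check in Python (stdlib only; the helper name is illustrative):

```python
import math

def binom_pmf(k, n, p):
    # C(n, k) counts the ways to place k successes among n trials.
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# The C(n, k) arrangements cover every outcome, so the pmf sums to 1.
n, p = 100, 0.05
total = sum(binom_pmf(k, n, p) for k in range(n + 1))
```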

In creating reference tables for binomial distribution probability, usually the table is filled in up to "n"/2 values. This is because for "k" > "n"/2, the probability can be calculated by its complement as

:f(k; n, p) = f(n-k; n, 1-p).
So, one must look to a different "k" and a different "p" (the binomial is not symmetrical in general). However, its behavior is not arbitrary. There is always an integer "m" that satisfies

:(n+1)p - 1 < m \leq (n+1)p.

As a function of "k", the expression f("k"; "n", "p") is monotone increasing for "k" < "m" and monotone decreasing for "k" > "m", with the exception of the case where ("n" + 1)"p" is an integer. In that case, there are two maximum values, at "m" = ("n" + 1)"p" and "m" − 1. "m" is known as the "most probable" ("most likely") outcome of the Bernoulli trials. Note that the probability of it occurring can be fairly small.
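Both facts — the complement identity behind the half-tables and the location of the most probable outcome "m" — can be verified directly. A minimal Python sketch (stdlib only; helper names are ours):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3

# Complement identity: f(k; n, p) = f(n - k; n, 1 - p), so a table only
# needs entries up to k = n/2.
assert abs(binom_pmf(7, n, p) - binom_pmf(3, n, 1 - p)) < 1e-15

# The most probable outcome m satisfies (n + 1)p - 1 < m <= (n + 1)p;
# here (n + 1)p = 3.3, so m = 3, and the pmf indeed peaks there.
probs = [binom_pmf(k, n, p) for k in range(n + 1)]
m = probs.index(max(probs))
assert m == math.floor((n + 1) * p)
```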

Cumulative distribution function

The cumulative distribution function can be expressed as:

:F(x;n,p) = \Pr(X \le x) = \sum_{i=0}^{\lfloor x \rfloor} {n \choose i} p^i (1-p)^{n-i},

where \lfloor x \rfloor is the "floor" of "x", i.e. the greatest integer less than or equal to "x".
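Summing the pmf up to the floor of "x" is a one-liner. A sketch in Python (stdlib only; function names are illustrative):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(x, n, p):
    # F(x; n, p): sum the pmf up to floor(x).
    return sum(binom_pmf(i, n, p) for i in range(math.floor(x) + 1))

# F is a step function: any x in [3, 4) gives the same value, Pr(X <= 3).
assert binom_cdf(3.9, 10, 0.5) == binom_cdf(3, 10, 0.5)
assert abs(binom_cdf(10, 10, 0.5) - 1.0) < 1e-12
```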

It can also be represented in terms of the regularized incomplete beta function, as follows:

:F(k;n,p) = \Pr(X \le k) = I_{1-p}(n-k, k+1) = (n-k) {n \choose k} \int_0^{1-p} t^{n-k-1} (1-t)^k \, dt.
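The beta-integral form can be checked against the direct sum without any special-function library, by approximating the integral numerically. A sketch in Python (stdlib only; the midpoint-rule quadrature and all names are our illustrative choices, valid for "k" < "n"):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf_beta(k, n, p, steps=200_000):
    # I_{1-p}(n - k, k + 1) via the integral form above,
    # approximated with a midpoint Riemann sum (no SciPy needed).
    a = 1 - p
    h = a / steps
    integral = sum(((i + 0.5) * h) ** (n - k - 1) * (1 - (i + 0.5) * h) ** k
                   for i in range(steps)) * h
    return (n - k) * math.comb(n, k) * integral

n, p, k = 10, 0.3, 4
direct = sum(binom_pmf(i, n, p) for i in range(k + 1))
assert abs(binom_cdf_beta(k, n, p) - direct) < 1e-6
```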

For "k" ≤ "np", upper bounds for the lower tail of the distribution function can be derived. In particular, Hoeffding's inequality yields the bound

:F(k;n,p) \leq \exp\left(-2 \frac{(np-k)^2}{n}\right),

and Chernoff's inequality can be used to derive the bound

:F(k;n,p) \leq \exp\left(-\frac{1}{2p} \frac{(np-k)^2}{n}\right).
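Both tail bounds can be verified numerically for every "k" ≤ "np" in a concrete case. A sketch in Python (stdlib only; names are illustrative):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 100, 0.5
for k in range(int(n * p)):          # the bounds are stated for k <= np
    F = binom_cdf(k, n, p)
    hoeffding = math.exp(-2 * (n * p - k) ** 2 / n)
    chernoff = math.exp(-(n * p - k) ** 2 / (2 * p * n))
    assert F <= hoeffding and F <= chernoff
```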

Mean, variance, and mode

If "X" ~ B("n", "p") (that is, "X" is a binomially distributed random variable), then the expected value of "X" is


and the variance is


This fact is easily proven as follows. Suppose first that we have exactly one Bernoulli trial. We have two possible outcomes, 1 and 0, with the first having probability "p" and the second having probability 1 − "p"; the mean for this trial is given by μ = "p". Using the definition of variance, we have

:\sigma^2 = (1 - p)^2 p + (0 - p)^2 (1 - p) = p(1-p).

Now suppose that we want the variance for "n" such trials (i.e. for the general binomial distribution). Since the trials are independent, we may add the variances for each trial, giving

:\sigma^2_n = \sum_{k=1}^n \sigma^2 = np(1 - p).

The mode of "X" is the greatest integer less than or equal to ("n" + 1)"p"; if "m" = ("n" + 1)"p" is an integer, then "m" − 1 and "m" are both modes.

Algebraic derivations of mean and variance

We derive these quantities from first principles. Certain particular sums occur in both derivations. We rearrange the sums and terms so that sums over complete binomial probability mass functions (pmf) arise, which are always unity:

:\sum_{k=0}^n \operatorname{Pr}(X=k) = \sum_{k=0}^n {n \choose k} p^k (1-p)^{n-k} = 1.

We apply the definition of the expected value of a discrete random variable to the binomial distribution

:\operatorname{E}(X) = \sum_k x_k \cdot \operatorname{Pr}(X = x_k) = \sum_{k=0}^n k \cdot \operatorname{Pr}(X=k)

:= \sum_{k=0}^n k \cdot {n \choose k} p^k (1-p)^{n-k}.

The first term of the series (with index "k" = 0) has value 0, since its first factor, "k", is zero. It may thus be discarded, i.e. we can change the lower limit to "k" = 1:

:\operatorname{E}(X) = \sum_{k=1}^n k \cdot \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}

:= \sum_{k=1}^n k \cdot \frac{n \cdot (n-1)!}{k \cdot (k-1)!(n-k)!} \cdot p \cdot p^{k-1} (1-p)^{n-k}.

We've pulled factors of "n" and "k" out of the factorials, and one power of "p" has been split off. We are preparing to redefine the indices.

:\operatorname{E}(X) = np \cdot \sum_{k=1}^n \frac{(n-1)!}{(k-1)!(n-k)!} p^{k-1} (1-p)^{n-k}

We rename "m" = "n" − 1 and "s" = "k" − 1. The value of the sum is not changed by this, but it now becomes readily recognizable

:\operatorname{E}(X) = np \cdot \sum_{s=0}^m \frac{m!}{s!(m-s)!} p^s (1-p)^{m-s}

:= np \cdot \sum_{s=0}^m {m \choose s} p^s (1-p)^{m-s}.

The ensuing sum is a sum over a complete binomial pmf (of one order lower than the initial sum, as it happens). Thus

:\operatorname{E}(X) = np \cdot 1 = np.

[Morse, Philip M., "Thermal Physics", W. A. Benjamin, Inc., New York.]


It can be shown that the variance is equal to (see: computational formula for variance):

:\operatorname{Var}(X) = \operatorname{E}(X^2) - (\operatorname{E}(X))^2.

In using this formula we see that we now also need the expected value of "X"^2, which is

:\operatorname{E}(X^2) = \sum_{k=0}^n k^2 \cdot \operatorname{Pr}(X=k)

:= \sum_{k=0}^n k^2 \cdot {n \choose k} p^k (1-p)^{n-k}.

We can use our experience gained above in deriving the mean. We know how to process one factor of "k". This gets us as far as

:\operatorname{E}(X^2) = np \cdot \sum_{s=0}^m (s+1) \cdot {m \choose s} p^s (1-p)^{m-s}

(again, with "m" = "n" - 1 and "s" = "k" - 1). We split the sum into two separate sums and we recognize each one

:\operatorname{E}(X^2) = np \cdot \bigg( \sum_{s=0}^m s \cdot {m \choose s} p^s (1-p)^{m-s} + \sum_{s=0}^m 1 \cdot {m \choose s} p^s (1-p)^{m-s} \bigg).

The first sum is identical in form to the one we calculated for the mean (above); it sums to "mp". The second sum is unity.

:\operatorname{E}(X^2) = np \cdot (mp + 1) = np((n-1)p + 1) = np(np - p + 1).

Using this result in the expression for the variance, along with the Mean (E("X") = "np"), we get

:\operatorname{Var}(X) = \operatorname{E}(X^2) - (\operatorname{E}(X))^2 = np(np - p + 1) - (np)^2 = np(1-p).

Using falling factorials to find E("X"^2)

We have

:\operatorname{E}(X^2) = \sum_{k=0}^n k^2 \cdot \operatorname{Pr}(X=k) = \sum_{k=0}^n k^2 \cdot {n \choose k} p^k (1-p)^{n-k}.


:k^2 = k(k-1) + k.


:\begin{align}
\operatorname{E}(X^2) & = \sum_{k=0}^n (k(k-1) + k) \cdot {n \choose k} p^k (1-p)^{n-k} \\
& = \sum_{k=0}^n k(k-1) {n \choose k} p^k (1-p)^{n-k} + \sum_{k=0}^n k {n \choose k} p^k (1-p)^{n-k} \\
& = \sum_{k=2}^n k(k-1) {n \choose k} p^k (1-p)^{n-k} + \sum_{k=1}^n k {n \choose k} p^k (1-p)^{n-k} \\
& = \sum_{k=2}^n n(n-1) {n-2 \choose k-2} p^k (1-p)^{n-k} + \sum_{k=1}^n n {n-1 \choose k-1} p^k (1-p)^{n-k} \\
& = \sum_{k=0}^{n-2} n(n-1) {n-2 \choose k} p^{k+2} (1-p)^{(n-2)-k} + \sum_{k=0}^{n-1} n {n-1 \choose k} p^{k+1} (1-p)^{(n-1)-k} \\
& = n(n-1)p^2 \underbrace{\sum_{k=0}^{n-2} {n-2 \choose k} p^k (1-p)^{(n-2)-k}}_{=\,1} + np \underbrace{\sum_{k=0}^{n-1} {n-1 \choose k} p^k (1-p)^{(n-1)-k}}_{=\,1} \\
& = n(n-1)p^2 + np \\
& = n^2 p^2 - np^2 + np.
\end{align}


:\operatorname{Var}(X) = \operatorname{E}(X^2) - (\operatorname{E}(X))^2 = (n^2 p^2 - np^2 + np) - n^2 p^2 = np(1 - p).
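The falling-factorial identity E["X"("X" − 1)] = "n"("n" − 1)"p"^2, and the variance that follows from it, can be confirmed numerically. A sketch in Python (stdlib only; names are illustrative):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 15, 0.4
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

# E[X(X-1)] collapses to n(n-1)p^2, exactly as in the derivation above.
falling = sum(k * (k - 1) * pk for k, pk in enumerate(pmf))
assert abs(falling - n * (n - 1) * p ** 2) < 1e-10

# Hence E[X^2] = n(n-1)p^2 + np, and Var(X) = np(1-p).
second = sum(k ** 2 * pk for k, pk in enumerate(pmf))
assert abs(second - (falling + n * p)) < 1e-10
assert abs(second - (n * p) ** 2 - n * p * (1 - p)) < 1e-10
```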

Relationship to other distributions

Sums of binomials

If "X" ~ B("n", "p") and "Y" ~ B("m", "p") are independent binomial variables, then "X" + "Y" is again a binomial variable; its distribution is

:X + Y \sim B(n + m, p).
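The pmf of "X" + "Y" is the convolution of the two individual pmfs, and it matches B("n" + "m", "p") term by term (by Vandermonde's identity). A sketch in Python (stdlib only; names are illustrative):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, m, p = 6, 9, 0.35

# Convolve the pmfs of X ~ B(n, p) and Y ~ B(m, p); the result equals
# the pmf of B(n + m, p) at every point.
for s in range(n + m + 1):
    conv = sum(binom_pmf(j, n, p) * binom_pmf(s - j, m, p)
               for j in range(max(0, s - m), min(n, s) + 1))
    assert abs(conv - binom_pmf(s, n + m, p)) < 1e-12
```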

Normal approximation

If "n" is large enough, the skew of the distribution is not too great, and a suitable continuity correction is used, then an excellent approximation to B("n", "p") is given by the normal distribution

:\operatorname{N}(np,\, np(1-p)).

Various rules of thumb may be used to decide whether "n" is large enough. One rule is that both "np" and "n"(1 − "p") must be greater than 5. However, the specific number varies from source to source, and depends on how good an approximation one wants; some sources give 10. Another commonly used rule holds that the above normal approximation is appropriate only if

:\mu \pm 3\sigma = np \pm 3\sqrt{np(1-p)} \in [0, n].

The following is an example of applying a continuity correction: Suppose one wishes to calculate Pr("X" ≤ 8) for a binomial random variable "X". If "Y" has a distribution given by the normal approximation, then Pr("X" ≤ 8) is approximated by Pr("Y" ≤ 8.5). The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.
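The effect of the continuity correction is easy to measure. A sketch in Python (stdlib only, using `math.erf` for the normal CDF; names are illustrative):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def norm_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p = 20, 0.5                      # np = n(1 - p) = 10: rules of thumb hold
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact = sum(binom_pmf(k, n, p) for k in range(9))       # Pr(X <= 8)
corrected = norm_cdf(8.5, mu, sigma)                    # with continuity correction
uncorrected = norm_cdf(8.0, mu, sigma)                  # without

# The corrected approximation is markedly closer to the exact value.
assert abs(corrected - exact) < abs(uncorrected - exact)
```

For B(20, 0.5) the corrected approximation is off by well under 0.001, while the uncorrected one errs by several hundredths.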

This approximation is a huge time-saver (exact calculations with large "n" are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book "The Doctrine of Chances" in 1733. Nowadays, it can be seen as a consequence of the central limit theorem since B("n", "p") is a sum of "n" independent, identically distributed Bernoulli variables with parameter "p".

For example, suppose you randomly sample "n" people out of a large population and ask them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If you sampled groups of "n" people repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion "p" of agreement in the population and with standard deviation σ = ("p"(1 − "p")/"n")^(1/2). Large sample sizes "n" are good because the standard deviation, as a proportion of the expected value, gets smaller, which allows a more precise estimate of the unknown parameter "p".

Poisson approximation

The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product "np" remains fixed. Therefore the Poisson distribution with parameter λ = "np" can be used as an approximation to the binomial distribution B("n", "p") if "n" is sufficiently large and "p" is sufficiently small. According to two rules of thumb, this approximation is good if "n" ≥ 20 and "p" ≤ 0.05, or if "n" ≥ 100 and "np" ≤ 10. [NIST/SEMATECH, "Counts Control Charts", "e-Handbook of Statistical Methods", accessed 25 October 2006]
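How good the approximation is can be quantified by the total variation distance between the two distributions. A sketch in Python (stdlib only; names and the 0.02 threshold are illustrative):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

n, p = 100, 0.02                    # n >= 20 and p <= 0.05: rule of thumb holds
lam = n * p

# Total variation distance between B(n, p) and Poisson(np) is small here.
tv = 0.5 * sum(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(n + 1))
assert tv < 0.02
```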

Limits of binomial distributions

* As "n" approaches ∞ and "p" approaches 0 while "np" remains fixed at λ > 0 or at least "np" approaches λ > 0, then the Binomial("n", "p") distribution approaches the Poisson distribution with expected value λ.

* As "n" approaches ∞ while "p" remains fixed, the distribution of

::\frac{X - np}{\sqrt{np(1-p)}}

:approaches the normal distribution with expected value 0 and variance 1 (this is just a specific case of the central limit theorem).


See also

*Bean machine / Galton box
*Beta distribution
*Hypergeometric distribution
*Multinomial distribution
*Negative binomial distribution
*Poisson distribution
*Normal distribution
*Binomial proportion confidence interval


External links

* Web Based Binomial Probability Distribution Calculator (does not require Java)
* Binomial Probability Distribution Calculator (requires Java)
* Binomial Probabilities Simple Explanation
* SOCR Binomial Distribution Applet
* Many resources for teaching statistics, including the binomial distribution
* "Binomial Distribution" by Chris Boucher, The Wolfram Demonstrations Project, 2007.
* Binomial Distribution: properties and Java simulation, from cut-the-knot

Wikimedia Foundation. 2010.
