Chebyshev's inequality

Chebyshev's inequality

In probability theory, Chebyshev’s inequality (also spelled as Tchebysheff’s inequality) guarantees that in any data sample or probability distribution,"nearly all" values are close to the mean — the precise statement being that no more than 1/k2 of the distribution’s values can be more than k standard deviations away from the mean. The inequality has great utility because it can be applied to completely arbitrary distributions (unknown except for mean and variance), for example it can be used to prove the weak law of large numbers.

The theorem is named after Russian mathematician Pafnuty Chebyshev, although it was first formulated by his friend and colleague Irénée-Jules Bienaymé.[1] It can be stated quite generally using measure theory; the statement in the language of probability theory then follows as a particular case, for a space of measure 1.

The term Chebyshev’s inequality may also refer to the Markov's inequality, especially in the context of analysis.



Measure-theoretic statement

Let (X, Σ, μ) be a measure space, and let f be an extended real-valued measurable function defined on X. Then for any real number t > 0,

\mu(\{x\in X\,:\,\,|f(x)|\geq t\}) \leq {1\over t^2} \int_X |f|^2 \, d\mu.

More generally, if g is an extended real-valued measurable function, nonnegative and nondecreasing on the range of f, then

\mu(\{x\in X\,:\,\,f(x)\geq t\}) \leq {1\over g(t)} \int_X g\circ f\, d\mu.

The previous statement then follows by defining g(t) as

g(t)=\begin{cases}t^2&\text{if }t\geq0\\

and taking |f| instead of f.

Probabilistic statement

Let X be a random variable with finite expected value μ and non-zero variance σ2. Then for any real number k > 0,

    \Pr(|X-\mu|\geq k\sigma) \leq \frac{1}{k^2}.

Only the case k ≥ 1 provides useful information (when k < 1 the right-hand side is greater than one, so the inequality becomes vacuous, as the probability of any event cannot be greater than one). As an example, using k = 2 shows that at least half of the values lie in the interval (μ2σ, μ + 2σ).

Because it can be applied to completely arbitrary distributions (unknown except for mean and variance), the inequality generally gives a poor bound compared to what might be possible if something is known about the distribution involved.

For example, suppose we randomly select a journal article from a source with an average of 1000 words per article, with a standard deviation of 200 words. We can then infer that the probability that it has between 600 and 1400 words (i.e. within k = 2 SDs of the mean) must be more than 75%, because there is less than 1k2
=  1  4 
chance to be outside that range, by Chebyshev’s inequality. But if we additionally know that the distribution is normal, we can say that is a 75% chance the word count is between 770 and 1230 (which is an even tighter bound).

As demonstrated in the example above, the theorem will typically provide rather loose bounds. However, the bounds provided by Chebyshev’s inequality cannot, in general (remaining sound for variables of arbitrary distribution), be improved upon. For example, for any k ≥ 1, the following example meets the bounds exactly.

    X = \begin{cases}
        -1, & \text{with probability }\frac{1}{2k^2} \\
         0, & \text{with probability }1 - \frac{1}{k^2} \\
         1, & \text{with probability }\frac{1}{2k^2}

For this distribution mean μ = 0, standard deviation σ =  1  k 

    \Pr(|X-\mu| \ge k\sigma) = \Pr(|X|\ge1) = \frac{1}{k^2}.

Equality holds exactly for any distribution that is a linear transformation of this one. Inequality holds for any distribution that is not a linear transformation of this one.

Variant: One-sided Chebyshev inequality

A one-tailed variant with k > 0, is[2]

\Pr(X-\mu \geq k\sigma)\leq\frac{1}{1+k^2}.

The one-sided version of the Chebyshev inequality is called Cantelli's inequality, and is due to Francesco Paolo Cantelli.

An application: distance between the mean and the median

The one-sided variant can be used to prove the proposition that for probability distributions having an expected value and a median, the mean (i.e., the expected value) and the median can never differ from each other by more than one standard deviation. To express this in mathematical notation, let μ, m, and σ be respectively the mean, the median, and the standard deviation. Then

\left|\mu-m\right| \leq \sigma. \,

(There is no need to rely on an assumption that the variance exists, i.e., is finite. Unlike the situation with the expected value, saying the variance exists is equivalent to saying the variance is finite. But this inequality is trivially true if the variance is infinite.)

Proof using Chebyshev's inequality

Setting k = 1 in the statement for the one-sided inequality gives:

\Pr(X-\mu \geq \sigma)\leq\frac{1}{2}.

By changing the sign of X and so of μ, we get

\Pr(X \leq \mu - \sigma) \leq\frac{1}{2}.

Thus the median is within one standard deviation of the mean.

Proof using Jensen's inequality

This proof uses Jensen's inequality twice. We have

\left|\mu-m\right| = \left|\mathrm{E}(X-m)\right| & {} \leq \mathrm{E}\left(\left|X-m\right|\right) \\
& {} \leq \mathrm{E}\left(\left|X-\mu\right|\right) = \mathrm{E}\left(\sqrt{(X-\mu)^2}\right) \\
& {} \leq \sqrt{\mathrm{E}((X-\mu)^2)} = \sigma.

The first inequality comes from (the convex version of) Jensen's inequality applied to the absolute value function, which is convex. The second comes from the fact that the median minimizes the absolute deviation function

a \mapsto \mathrm{E}(\left|X-a\right|).\,

The third inequality comes from (the concave version of) Jensen's inequality applied to the square root function, which is concave. Q.E.D.

Proof (of the two-sided Chebyshev's inequality)

Measure-theoretic proof

Fix t and let At be defined as At  := {x ∈ X | ƒ(x) ≥ t}, and let 1At be the indicator function of the set At. Then, it is easy to check that, for any x

0\leq g(t) 1_{A_t}\leq g(f(x))\,1_{A_t},

since g is nondecreasing on the range of f, and therefore,

\begin{align}g(t)\mu(A_t)&=\int_X g(t)1_{A_t}\,d\mu\\ &\leq\int_{A_t} g\circ f\,d\mu\\ &\leq\int_X g\circ f\,d\mu.\end{align}

The desired inequality follows from dividing the above inequality by g(t).

Probabilistic proof

Markov's inequality states that for any real-valued random variable Y and any positive number a, we have Pr(|Y| > a) ≤ E(|Y|)/a. One way to prove Chebyshev's inequality is to apply Markov's inequality to the random variable Y = (X − μ)2 with a = (σk)2.

It can also be proved directly. For any event A, let IA be the indicator random variable of A, i.e. IA equals 1 if A occurs and 0 otherwise. Then

& {} \qquad \Pr(|X-\mu| \geq k\sigma) = \operatorname{E}(I_{|X-\mu| \geq k\sigma})
= \operatorname{E}(I_{[(X-\mu)/(k\sigma)]^2 \geq 1}) \\[6pt]
& \leq \operatorname{E}\left( \left( {X-\mu \over k\sigma} \right)^2 \right)
= {1 \over k^2} {\operatorname{E}((X-\mu)^2) \over \sigma^2} = {1 \over k^2}.

The direct proof shows why the bounds are quite loose in typical cases: the number 1 to the left of "≥" is replaced by [(X − μ)/(kσ)]2 to the right of "≥" whenever the latter exceeds 1. In some cases it exceeds 1 by a very wide margin.

See also


  1. ^ Donald Knuth, "The Art of Computer Programming", 3rd ed., vol. 1, 1997, p.98
  2. ^ Grimmett and Stirzaker, problem 7.11.9. Several proofs of this result can be found here.

Further reading

  • A. Papoulis (1991), Probability, Random Variables, and Stochastic Processes, 3rd ed. McGraw-Hill. ISBN 0-07-100870-5. pp. 113–114.
  • G. Grimmett and D. Stirzaker (2001), Probability and Random Processes, 3rd ed. Oxford. ISBN 0-19-857222-0. Section 7.3.

Wikimedia Foundation. 2010.

См. также в других словарях:

  • Chebyshev's inequality — ▪ mathematics also called  Bienaymé Chebyshev inequality        in probability theory, a theorem that characterizes the dispersion of data away from its mean (average). The general theorem is attributed to the 19th century Russian mathematician… …   Universalium

  • Chebyshev's inequality — noun The theorem that in any data sample with finite variance, the probability of any random variable X lying within an arbitrary real k number of standard deviations of the mean is 1 / k, i.e. assuming mean μ and standard deviation σ, the… …   Wiktionary

  • Multidimensional Chebyshev's inequality — In probability theory, the multidimensional Chebyshev s inequality is a generalization of Chebyshev s inequality, which puts a bound on the probability of the event that a random variable differs from its expected value by more than a specified… …   Wikipedia

  • Chebyshev's theorem — is a name given to several theorems proven by Russian mathematician Pafnuty Chebyshev Bertrand s postulate Chebyshev s inequality Chebyshev s sum inequality Chebyshev s equioscillation theorem The statement that if the function has a limit at… …   Wikipedia

  • Chebyshev, Pafnuty Lvovich — ▪ Russian mathematician born May 4 [May 16, New Style], 1821, Okatovo, Russia died November 26 [December 8], 1894, St. Petersburg  founder of the St. Petersburg mathematical school (sometimes called the Chebyshev school), who is remembered… …   Universalium

  • Chebyshev's sum inequality — For the similarly named inequality in probability theory, see Chebyshev s inequality. In mathematics, Chebyshev s sum inequality, named after Pafnuty Chebyshev, states that if and then Similarly, if …   Wikipedia

  • Inequality — In mathematics, an inequality is a statement about the relative size or order of two objects, or about whether they are the same or not (See also: equality) *The notation a < b means that a is less than b . *The notation a > b means that a is… …   Wikipedia

  • Inequality (mathematics) — Not to be confused with Inequation. Less than and Greater than redirect here. For the use of the < and > signs as punctuation, see Bracket. More than redirects here. For the UK insurance brand, see RSA Insurance Group. The feasible regions… …   Wikipedia

  • Pafnuty Chebyshev — Chebyshev redirects here. For other uses, see Chebyshev (disambiguation). Pafnuty Chebyshev Pafnuty Lvovich Chebyshev Born May 16, 1821 …   Wikipedia

  • Markov's inequality — gives an upper bound for the measure of the set (indicated in red) where f(x) exceeds a given level . The bound combines the level with the average value of f …   Wikipedia

Поделиться ссылкой на выделенное

Прямая ссылка:
Нажмите правой клавишей мыши и выберите «Копировать ссылку»