Conditional probability

The actual probability of an event A may in many circumstances differ from its original probability because new information has become available, in particular the information that another event B has occurred. Intuitively, the outcomes that are still possible are then restricted to B. Hence B plays the role of the new sample space, and A now occurs only if AB (that is, A ∩ B) occurs. The new probability of A, called the conditional probability of A given B, is therefore calculated as the quotient of the probability of AB and the probability of B.[1] This conditional probability is commonly written P(A | B). (The vertical line separates the two events; P(A | B) should not be read as "the probability of some event A | B", such as the event A OR B.)



Figure: Illustration of conditional probability. S is the sample space, and A, B1, B2 and B3 are events. Assuming probability is proportional to area, the unconditional probability is P(A) ≈ 0.33, whereas the conditional probabilities are P(A | B1) = 1, P(A | B2) ≈ 0.85 and P(A | B3) = 0.

Conditioning on an event

Given two events A and B in the same probability space with P(B) > 0, the conditional probability of A given B is defined as the joint probability of A and B divided by the unconditional probability of B:

P(A|B) = \frac{P(A \cap B)}{P(B)}

This may be interpreted using a Venn diagram. For example, in the diagram, if the probability distribution is uniform on S, P(A | B2) is the ratio of the probabilities represented by the areas AB2 and B2.
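
As a minimal sketch of the definition on a finite sample space, the Python below computes P(A | B) directly as the quotient P(A ∩ B)/P(B). The loaded-die probabilities and the helper names `prob` and `conditional_probability` are hypothetical, chosen only for illustration; exact fractions avoid rounding.

```python
from fractions import Fraction

# Hypothetical loaded die (illustrative numbers, not from the text):
# faces 1-5 have probability 1/10 each, face 6 has probability 1/2.
pmf = {face: Fraction(1, 10) for face in range(1, 6)}
pmf[6] = Fraction(1, 2)

def prob(event):
    """P(event), where an event is a set of outcomes."""
    return sum((pmf[w] for w in event), Fraction(0))

def conditional_probability(a, b):
    """P(A | B) = P(A ∩ B) / P(B); only defined when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / p_b

six = {6}          # event A: roll a six
even = {2, 4, 6}   # event B: roll an even number
print(prob(six))                           # 1/2
print(conditional_probability(six, even))  # 5/7
```

Conditioning on the even faces raises the probability of a six from 1/2 to 5/7, because the odd faces drop out of the new sample space and the remaining probabilities are rescaled by 1/P(B) = 10/7.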

Note: Although the same symbol P is used for both the (original) probability and the derived conditional probability, they are different functions.

Definition with σ-algebra

If P(B) = 0, the simple definition of P(A | B) cannot be applied, because it would require dividing by zero. However, it is possible to define a conditional probability with respect to a σ-algebra of such events (such as those arising from a continuous random variable).

For example, if X and Y are non-degenerate, jointly continuous random variables with density f_{X,Y}(x, y), then, if B has positive measure (more precisely, if P(Y \in B) > 0),

P(X \in A \mid Y \in B) =
\frac{\int_{y\in B}\int_{x\in A} f_{X,Y}(x,y)\,dx\,dy}{\int_{y\in B}\int_{x\in\Omega} f_{X,Y}(x,y)\,dx\,dy} .
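
The quotient of integrals can be checked numerically. The sketch below assumes, purely for illustration, the joint density f_{X,Y}(x, y) = x + y on the unit square (so the range of X, written Ω above, is [0, 1]) and approximates both double integrals with a simple midpoint rule; `f` and `double_integral` are hypothetical helper names.

```python
def f(x, y):
    """Illustrative joint density f_{X,Y}(x, y) = x + y on the unit square (assumption)."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def double_integral(g, x_lo, x_hi, y_lo, y_hi, n=200):
    """Midpoint-rule approximation of the integral of g over a rectangle."""
    hx = (x_hi - x_lo) / n
    hy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        for j in range(n):
            y = y_lo + (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy

# P(X in A | Y in B) with A = [0, 0.5] and B = [0, 0.5]:
numerator = double_integral(f, 0.0, 0.5, 0.0, 0.5)    # x over A, y over B
denominator = double_integral(f, 0.0, 1.0, 0.0, 0.5)  # x over Ω = [0, 1], y over B
p = numerator / denominator
print(round(p, 6))  # 0.333333
```

For this density the exact values are 0.125 and 0.375, giving 1/3; the midpoint rule is exact for a density that is linear in x and y, so only floating-point rounding remains.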

The case where B has zero measure can be dealt with directly only when B = {y_0}, a single point at which the marginal density of Y is positive, in which case

P(X \in A \mid Y = y_0) = \frac{\int_{x\in A} f_{X,Y}(x,y_0)\,dx}{\int_{x\in\Omega} f_{X,Y}(x,y_0)\,dx} .
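
For the same illustrative density f_{X,Y}(x, y) = x + y on the unit square (an assumption, not part of the text above), conditioning on the single point Y = 0.5 reduces to a ratio of one-dimensional integrals of the slice f_{X,Y}(x, y_0). A minimal sketch with a midpoint rule and hypothetical helper names:

```python
def f(x, y):
    """Illustrative joint density f_{X,Y}(x, y) = x + y on the unit square (assumption)."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def integral(g, lo, hi, n=1000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

y0 = 0.5
numerator = integral(lambda x: f(x, y0), 0.0, 0.5)    # x over A = [0, 0.5]
denominator = integral(lambda x: f(x, y0), 0.0, 1.0)  # x over Ω = [0, 1]: the marginal density at y0
p_point = numerator / denominator                     # requires denominator > 0
print(round(p_point, 6))  # 0.375
```

The denominator is the marginal density of Y at y_0, which is why the point must be one where that density is positive.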

If A has measure zero, the conditional probability is zero. To see why the more general case of a zero-measure B cannot be handled in the same way, note that the limit, as all δy_i approach zero, of

P(X \in A \mid Y \in \cup_i[y_i,y_i+\delta y_i]) \approxeq
\frac{\sum_{i} \int_{x\in A} f_{X,Y}(x,y_i)\,dx\,\delta y_i}{\sum_{i}\int_{x\in\Omega} f_{X,Y}(x,y_i) \,dx\, \delta y_i} ,

depends on how the δy_i are related to one another as they approach zero. See conditional expectation for more information.
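
The dependence on the shrinking intervals can be seen in a small experiment. Assuming again the illustrative density f_{X,Y}(x, y) = x + y on the unit square, A = [0, 0.5], and two points y_1 = 0.25 and y_2 = 0.75, the sketch evaluates the right-hand side of the approximation above with widths δy_1 = δ and δy_2 = cδ. The width ratio c and all helper names are hypothetical, introduced only for this illustration.

```python
def f(x, y):
    """Illustrative joint density f_{X,Y}(x, y) = x + y on the unit square (assumption)."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def integral(g, lo, hi, n=1000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def approx_conditional(points, widths):
    """Right-hand side of the approximation for A = [0, 0.5] and Ω = [0, 1]."""
    num = sum(integral(lambda x: f(x, y), 0.0, 0.5) * d for y, d in zip(points, widths))
    den = sum(integral(lambda x: f(x, y), 0.0, 1.0) * d for y, d in zip(points, widths))
    return num / den

points = [0.25, 0.75]
for c in (1.0, 3.0):                   # ratio of the two interval widths
    for delta in (1e-2, 1e-4, 1e-6):   # widths shrinking towards zero
        value = approx_conditional(points, [delta, c * delta])
        print(f"c={c}, delta={delta:g}: {value:.6f}")
# c=1 gives 0.375000 for every delta; c=3 gives 0.388889 (= 7/18).
```

Because δ cancels, each choice of c gives a value that does not change as δ shrinks, yet the limiting values differ between c = 1 and c = 3, so the limit is not determined by the points alone.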

Conditioning on a random variable

Conditioning on an event may be generalized to conditioning on a random variable. Let X be a discrete random variable taking values in {x_n}, and let A be an event. The probability of A given X is defined as

P(A|X) = \begin{cases}
P(A\mid X=x_0) & \text{if }X=x_0 \\
P(A\mid X=x_1) & \text{if }X=x_1 \\
\vdots & \vdots
\end{cases}

Note that P(A | X) and X are now both random variables. From the law of total probability, the expected value of P(A | X) is equal to the unconditional probability of A.
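
The identity E[P(A | X)] = P(A) can be checked by enumeration. The sketch below assumes two fair dice, takes X to be the value on die 1 and A to be the event that the sum is at most 5 (choices made for illustration), builds P(A | X) as a function of the outcome, and averages it over the 36 equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely outcomes w = (die1, die2).
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(omega))  # probability of each outcome

def in_A(w):
    """Event A (illustrative choice): the sum is at most 5."""
    return w[0] + w[1] <= 5

def X(w):
    """Random variable X (illustrative choice): the value on die 1."""
    return w[0]

def p_A_given_X_equals(x):
    """P(A | X = x), by counting within the outcomes where X = x."""
    given = [w for w in omega if X(w) == x]
    return Fraction(sum(1 for w in given if in_A(w)), len(given))

def p_A_given_X(w):
    """P(A | X) as a random variable: at outcome w it equals P(A | X = X(w))."""
    return p_A_given_X_equals(X(w))

expected = sum(p_A_given_X(w) * p for w in omega)  # E[P(A | X)]
p_A = sum(p for w in omega if in_A(w))             # P(A)
print(expected, p_A)  # 5/18 5/18
```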


Example

Consider the rolling of two fair six-sided dice.

  • Let A be the value rolled on die 1
  • Let B be the value rolled on die 2
  • Let A_n be the event that A = n
  • Let Σ_m be the event that A + B \leq m

First, suppose we roll the two dice many times. In what proportion of these rolls would A = 2? Table 1 shows the sample space: all 36 equally likely combinations of A and B, with each cell giving the sum A + B. A = 2 in 6 of these, so the answer is \frac{6}{36} = \frac{1}{6}. In more compact notation, P(A_2) = \frac{1}{6}.

Table 1 (cells show A + B)
 +    B=1  B=2  B=3  B=4  B=5  B=6
A=1    2    3    4    5    6    7
A=2    3    4    5    6    7    8
A=3    4    5    6    7    8    9
A=4    5    6    7    8    9   10
A=5    6    7    8    9   10   11
A=6    7    8    9   10   11   12

Suppose, however, that we roll the dice many times but ignore the rolls in which A + B > 5. In what proportion of the remaining rolls would A = 2? Table 2 shows that A + B \leq 5 in 10 of the combinations, and A = 2 in 3 of these, so the answer is \frac{3}{10} = 0.3. We say that the probability that A = 2, given that A + B \leq 5, is 0.3. This is a conditional probability, because the condition restricts the sample space. In more compact notation, P(A_2 | Σ_5) = 0.3.

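Table 2 (cells show A + B; outcomes with A + B \leq 5 are in brackets)
 +    B=1  B=2  B=3  B=4  B=5  B=6
A=1   [2]  [3]  [4]  [5]   6    7
A=2   [3]  [4]  [5]   6    7    8
A=3   [4]  [5]   6    7    8    9
A=4   [5]   6    7    8    9   10
A=5    6    7    8    9   10   11
A=6    7    8    9   10   11   12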
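
The two proportions in this example can be reproduced by enumerating the 36 equally likely outcomes, keeping only those that satisfy the condition, and counting. A minimal Python sketch; the function names are hypothetical:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (A, B) outcomes for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def probability(event, given=lambda a, b: True):
    """Share of outcomes satisfying `event` among those satisfying `given`."""
    kept = [(a, b) for a, b in outcomes if given(a, b)]
    hits = [(a, b) for a, b in kept if event(a, b)]
    return Fraction(len(hits), len(kept))

def a_equals_2(a, b):
    return a == 2        # event A_2

def sum_at_most_5(a, b):
    return a + b <= 5    # event Σ_5

print(probability(a_equals_2))                 # 1/6
print(probability(a_equals_2, sum_at_most_5))  # 3/10
```

Restricting the list of outcomes before counting is exactly the "new sample space" view of conditioning; dividing counts is equivalent to dividing P(A_2 ∩ Σ_5) = 3/36 by P(Σ_5) = 10/36.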

Statistical independence

If two events A and B are statistically independent, the occurrence of A does not affect the probability of B, and vice versa. That is,

P(A|B) \ = \ P(A)
P(B|A) \ = \ P(B).

Using the definition of conditional probability, it follows from either formula that

P(A \cap B) \ = \ P(A) P(B)

This product form is taken as the definition of statistical independence. It is preferred because it is symmetric in A and B and involves no undefined quantities when P(A) or P(B) is 0.
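
A small check of the product rule on the dice example, as a sketch: the event that die 1 shows 2 is independent of die 2 being even, but not of the sum being at most 5. The last line tests an impossible event, which the product form handles without any division by zero. The event and helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (die 1, die 2), equally likely

def prob(event):
    """P(event), by counting outcomes."""
    return Fraction(sum(1 for a, b in outcomes if event(a, b)), len(outcomes))

def independent(e, f):
    """Product-rule test: P(E ∩ F) == P(E) P(F)."""
    return prob(lambda a, b: e(a, b) and f(a, b)) == prob(e) * prob(f)

def a_is_2(a, b): return a == 2
def b_is_even(a, b): return b % 2 == 0
def sum_at_most_5(a, b): return a + b <= 5
def impossible(a, b): return False  # P = 0, so P(X | impossible) would be undefined

print(independent(a_is_2, b_is_even))          # True
print(independent(a_is_2, sum_at_most_5))      # False
print(independent(impossible, sum_at_most_5))  # True
```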

Common fallacies

Assuming conditional probability is of similar size to its inverse

In general, it cannot be assumed that P(A|B) \approx P(B|A). This can be an insidious error, even for those who are highly conversant with statistics.[2] The relationship between P(A | B) and P(B | A) is given by Bayes' theorem:

P(B|A) = P(A|B) \frac{P(B)}{P(A)}.

That is, P(A|B) \approx P(B|A) only if \frac{P(B)}{P(A)} \approx 1, or equivalently, P(A) \approx P(B).
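
Using the dice example, a short enumeration shows how far apart a conditional probability and its inverse can be, and that Bayes' theorem reconciles them. A sketch; `prob` and `cond` are hypothetical helper names:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (A, B) pairs for two fair dice

def prob(event):
    """P(event), by counting outcomes."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

def cond(a, b):
    """P(a | b) = P(a ∩ b) / P(b)."""
    return prob(lambda w: a(w) and b(w)) / prob(b)

def a2(w): return w[0] == 2         # A_2: die 1 shows 2
def s5(w): return w[0] + w[1] <= 5  # Σ_5: sum at most 5

forward = cond(a2, s5)                     # P(A_2 | Σ_5)
inverse = cond(s5, a2)                     # P(Σ_5 | A_2), counted directly
via_bayes = forward * prob(s5) / prob(a2)  # P(Σ_5 | A_2) from Bayes' theorem
print(forward, inverse, via_bayes)         # 3/10 1/2 1/2
```

The two directions differ because P(A_2) = 1/6 and P(Σ_5) = 5/18 are not equal.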

Assuming marginal and conditional probabilities are of similar size

In general, it cannot be assumed that P(A) \approx P(A|B). These probabilities are linked through the law of total probability, where the events B_n form a partition of the sample space:

P(A) \, = \, \sum_n P(A \cap B_n) \, = \, \sum_n P(A|B_n)P(B_n).
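
On the dice example, the partition {Σ_5, not Σ_5} gives two conditional probabilities of A_2 that both differ from the marginal P(A_2) = 1/6, while their weighted sum recovers it exactly. A sketch with hypothetical helper names:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (A, B) pairs for two fair dice

def prob(event):
    """P(event), by counting outcomes."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

def cond(a, b):
    """P(a | b) = P(a ∩ b) / P(b)."""
    return prob(lambda w: a(w) and b(w)) / prob(b)

def a2(w): return w[0] == 2          # A_2: die 1 shows 2
def s5(w): return w[0] + w[1] <= 5   # Σ_5
def not_s5(w): return not s5(w)      # complement of Σ_5

partition = [s5, not_s5]
total = sum(cond(a2, b) * prob(b) for b in partition)  # law of total probability
print([str(cond(a2, b)) for b in partition])  # ['3/10', '3/26']
print(prob(a2), total)                        # 1/6 1/6
```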

This fallacy may arise through selection bias.[3] For example, in the context of a medical claim, let S_C be the event that a sequela S occurs as a consequence of circumstance C, and let H be the event that an individual seeks medical help. Suppose that in most cases C does not cause S, so P(S_C) is low. Suppose also that medical attention is sought only if S has occurred. From experience with patients, a doctor may then erroneously conclude that P(S_C) is high, whereas the probability actually observed by the doctor is P(S_C | H).

Formal derivation

This section is based on the derivation given in Grinstead and Snell's Introduction to Probability.[4]

Let Ω be a countable sample space with elementary events {ω}. Suppose we are told that the event E \subseteq \Omega, with P(E) > 0, has occurred. A new probability distribution (denoted by the conditional notation) is to be assigned on {ω} to reflect this. For outcomes in E, it is reasonable to assume that the relative magnitudes of the probabilities are preserved. For some constant scale factor α, the new distribution will therefore satisfy:

\text{1. }\omega \in E : P(\omega| E) = \alpha P(\omega)
\text{2. }\omega \notin E : P(\omega| E) = 0
\text{3. }\sum_{\omega \in \Omega} {P(\omega|E)} = 1.

Substituting 1 and 2 into 3 to select α:

\begin{align*}
1 = \sum_{\omega \in \Omega} {P(\omega | E)} &= \sum_{\omega \in E} {\alpha P(\omega)} + \cancelto{0}{\sum_{\omega \notin E} 0} \\
&= \alpha \sum_{\omega \in E} {P(\omega)} \\
&= \alpha \cdot P(E) \\
\implies \alpha &= \frac{1}{P(E)}
\end{align*}

So the new probability distribution is

\text{1. }\omega \in E : P(\omega|E) = \frac{P(\omega)}{P(E)}
\text{2. }\omega \notin E : P(\omega| E) = 0

Now for a general event F,

\begin{align*}
P(F|E) &= \sum_{\omega \in F \cap E} {P(\omega | E)} + \cancelto{0}{\sum_{\omega \in F \cap E^c} P(\omega|E)} \\
&= \sum_{\omega \in F \cap E} {\frac{P(\omega)}{P(E)}} \\
&= \frac{P(F \cap E)}{P(E)}
\end{align*}
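
The derivation can be mirrored in code: build P(· | E) by scaling each outcome in E by α = 1/P(E) and setting every other outcome to zero, then check rule 3 and the final quotient formula. The distribution, the events E and F, and the name `condition_on` are illustrative assumptions, not taken from the source.

```python
from fractions import Fraction

# Hypothetical distribution on Ω = {1, ..., 6} (illustrative numbers only).
P = {1: Fraction(1, 12), 2: Fraction(1, 6), 3: Fraction(1, 12),
     4: Fraction(1, 6), 5: Fraction(1, 4), 6: Fraction(1, 4)}

def condition_on(P, E):
    """Rules 1 and 2: scale outcomes in E by alpha = 1/P(E), set the rest to 0."""
    p_E = sum(P[w] for w in E)
    alpha = 1 / p_E  # requires P(E) > 0
    return {w: (alpha * P[w] if w in E else Fraction(0)) for w in P}

E = {2, 4, 6}
F = {4, 5, 6}
P_given_E = condition_on(P, E)

assert sum(P_given_E.values()) == 1                     # rule 3 holds
lhs = sum(P_given_E[w] for w in F)                      # P(F | E) from the new distribution
rhs = sum(P[w] for w in F & E) / sum(P[w] for w in E)   # P(F ∩ E) / P(E)
print(lhs, rhs)  # 5/7 5/7
```

Because every outcome in E is multiplied by the same α, ratios such as P(6 | E)/P(2 | E) = P(6)/P(2) are preserved, which is the assumption the derivation starts from.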

References


  1. ^ Casella, George and Berger, Roger L. (1990). Statistical Inference. Duxbury Press. ISBN 0534119581 (p. 18 et seq.)
  2. ^ Paulos, J. A. (1988). Innumeracy: Mathematical Illiteracy and its Consequences. Hill and Wang. ISBN 0809074478 (p. 63 et seq.)
  3. ^ Bruss, F. Thomas (2007). "Der Wyatt-Earp-Effekt". Spektrum der Wissenschaft, March 2007.
  4. ^ Grinstead, C. M. and Snell, J. L. Introduction to Probability, p. 134.
  • Bruss, F. Thomas (2007). "Der Wyatt-Earp-Effekt oder die betörende Macht kleiner Wahrscheinlichkeiten" [The Wyatt Earp effect, or the beguiling power of small probabilities] (in German). Spektrum der Wissenschaft (German edition of Scientific American), Vol. 2, pp. 110–113.

Wikimedia Foundation. 2010.
