Conditional Probability and Bayes' Theorem

Conditional probability is the mathematics of updating. It asks how the probability of an event changes after learning that another event occurred. In statistics, this is the bridge between prior information and evidence; in everyday reasoning, it is where many probability mistakes happen because the direction of conditioning matters.

Figure: Probability trees make the conditioning structure in Bayes' theorem explicit. Image: Wikimedia Commons, Gnathan87, CC0 1.0.

The probability chapter in Lane et al. emphasizes conditional probability through cards, disease testing, base rates, and Bayes' theorem. This page develops the same ideas formally and connects them to independence, tree diagrams, and diagnostic reasoning.

Definitions

For events $A$ and $B$ with $P(B)>0$, the conditional probability of $A$ given $B$ is

$$P(A\mid B)=\frac{P(A\cap B)}{P(B)}.$$

The vertical bar is read as "given." The condition $B$ becomes the new reference universe. The numerator keeps only the outcomes where both $A$ and $B$ occur; the denominator normalizes by the probability of being inside $B$.

The multiplication rule follows by rearranging:

$$P(A\cap B)=P(A\mid B)P(B)=P(B\mid A)P(A).$$
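The definition and the multiplication rule can be checked by brute-force enumeration on a small sample space. The sketch below assumes two fair six-sided dice and two illustrative events (`sum_is_8`, `first_is_even`); neither the dice nor the helper names come from the text above.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: ordered outcomes of two fair six-sided dice.
omega = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event (a predicate on outcomes), all outcomes equally likely."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(lambda w: a(w) and b(w)) / p_b


def sum_is_8(w):
    return w[0] + w[1] == 8


def first_is_even(w):
    return w[0] % 2 == 0


p_a_given_b = cond_prob(sum_is_8, first_is_even)  # P(sum is 8 | first die even)
p_b_given_a = cond_prob(first_is_even, sum_is_8)  # P(first die even | sum is 8)
p_joint = prob(lambda w: sum_is_8(w) and first_is_even(w))

print(p_a_given_b, p_b_given_a)  # the two directions of conditioning differ
# Multiplication rule, in both orders.
print(p_joint == p_a_given_b * prob(first_is_even))
print(p_joint == p_b_given_a * prob(sum_is_8))
```

Using `Fraction` keeps every comparison exact, so the equality checks are not affected by floating-point rounding.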

Events $A$ and $B$ are independent if learning that one occurred does not change the probability of the other:

$$P(A\mid B)=P(A)$$

whenever $P(B)>0$. Equivalently,

$$P(A\cap B)=P(A)P(B).$$

A collection $A_1,\ldots,A_n$ is mutually independent if, for every subcollection of two or more of the events, the probability of the intersection equals the product of the individual probabilities. Pairwise independence alone is weaker and does not guarantee mutual independence.
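A standard small counterexample shows the gap between pairwise and mutual independence. The sketch below assumes two fair coin flips with events "first flip is heads," "second flip is heads," and "the two flips agree"; the event names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: two fair coin flips, four equally likely outcomes.
omega = list(product("HT", repeat=2))


def prob(event):
    """Probability of an event (a predicate on outcomes)."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def first_heads(w):
    return w[0] == "H"


def second_heads(w):
    return w[1] == "H"


def flips_agree(w):
    return w[0] == w[1]


events = [first_heads, second_heads, flips_agree]

# Check the product rule for every pair of events.
pairwise = all(
    prob(lambda w, x=x, y=y: x(w) and y(w)) == prob(x) * prob(y)
    for i, x in enumerate(events)
    for y in events[i + 1:]
)

# Check the product rule for the triple intersection.
p_triple = prob(lambda w: all(e(w) for e in events))
product_of_three = prob(first_heads) * prob(second_heads) * prob(flips_agree)
mutual = pairwise and p_triple == product_of_three

print(pairwise, mutual)
print(p_triple, product_of_three)
```

Each pair factors, but the triple intersection is just the outcome HH, whose probability does not match the product of the three individual probabilities.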

A set of events $H_1,\ldots,H_k$ is a partition of $\Omega$ if the events are pairwise disjoint and their union is $\Omega$. Partitions often represent competing hypotheses.

Key results

Law of total probability. If $H_1,\ldots,H_k$ partition $\Omega$ and $P(H_i)>0$ for each $i$, then

$$P(A)=\sum_{i=1}^k P(A\mid H_i)P(H_i).$$

This says that the probability of evidence $A$ can be computed by splitting the world into cases $H_i$.

Bayes' theorem. For a partition $H_1,\ldots,H_k$ and an event $A$ with $P(A)>0$,

$$P(H_j\mid A)=\frac{P(A\mid H_j)P(H_j)}{\sum_{i=1}^k P(A\mid H_i)P(H_i)}.$$
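The partition form translates directly into a few lines of code: multiply each likelihood by its prior, sum the products to get the evidence probability (the law of total probability), and normalize. The function name `bayes_partition` and the three-hypothesis numbers are hypothetical, chosen only to illustrate the formula.

```python
from fractions import Fraction


def bayes_partition(priors, likelihoods):
    """Return (posteriors, evidence) for hypotheses H_1..H_k that partition the sample space.

    priors[i] is P(H_i); likelihoods[i] is P(A | H_i).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per hypothesis")
    if abs(sum(priors) - 1) > 1e-12:
        raise ValueError("priors of a partition must sum to 1")
    # Joint probabilities P(A and H_i) = P(A | H_i) P(H_i).
    joint = [lik * p for lik, p in zip(likelihoods, priors)]
    # Law of total probability: P(A) is the sum of the joints.
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("evidence has probability zero; posterior undefined")
    return [j / evidence for j in joint], evidence


# Hypothetical numbers: three competing hypotheses and one piece of evidence.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(1, 50), Fraction(1, 20)]

posteriors, evidence = bayes_partition(priors, likelihoods)
print(evidence)
print(posteriors)
```

In this made-up example the hypothesis with the smallest prior ends up with the largest posterior, because its likelihood is five times that of the most probable prior hypothesis.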

For two events $A$ and $B$,

$$P(A\mid B)=\frac{P(B\mid A)P(A)}{P(B\mid A)P(A)+P(B\mid A^c)P(A^c)}.$$

The terms have standard interpretations:

| Term | Bayesian name | Diagnostic-test name |
| --- | --- | --- |
| $P(H)$ | prior probability | base rate |
| $P(E\mid H)$ | likelihood | sensitivity if $E$ is a positive test |
| $P(E)$ | evidence probability | positive-test rate |
| $P(H\mid E)$ | posterior probability | positive predictive value |

Independence and complements. If $A$ and $B$ are independent, then $A$ and $B^c$ are independent, $A^c$ and $B$ are independent, and $A^c$ and $B^c$ are independent. For example,

$$\begin{aligned} P(A\cap B^c) &=P(A)-P(A\cap B)\\ &=P(A)-P(A)P(B)\\ &=P(A)(1-P(B))\\ &=P(A)P(B^c). \end{aligned}$$

Conditional independence is different. Events may be independent unconditionally but dependent given a third event, or dependent unconditionally but independent given a third event. This is one reason causal reasoning requires care.
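A small enumeration makes the first case concrete. The sketch assumes two fair coin flips and conditions on the event "at least one flip is heads"; the setup and helper names are illustrative, not taken from the text.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: two fair coin flips.
omega = list(product("HT", repeat=2))


def prob(event, given=lambda w: True):
    """P(event | given) under equally likely outcomes; requires P(given) > 0."""
    kept = [w for w in omega if given(w)]
    return Fraction(sum(1 for w in kept if event(w)), len(kept))


def first_heads(w):
    return w[0] == "H"


def second_heads(w):
    return w[1] == "H"


def both_heads(w):
    return first_heads(w) and second_heads(w)


def at_least_one_heads(w):
    return "H" in w


# Unconditionally, the two flips are independent.
independent_unconditionally = (
    prob(both_heads) == prob(first_heads) * prob(second_heads)
)

# Given "at least one heads," the product rule fails.
c = at_least_one_heads
independent_given_c = (
    prob(both_heads, given=c)
    == prob(first_heads, given=c) * prob(second_heads, given=c)
)

print(independent_unconditionally, independent_given_c)
```

Conditioning removes the outcome TT, so learning that the first flip was tails forces the second to be heads: the selection itself creates the dependence.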

Bayes' theorem can also be written in odds form. If $H$ is a hypothesis and $E$ is evidence, then

$$\frac{P(H\mid E)}{P(H^c\mid E)} =\frac{P(H)}{P(H^c)} \cdot \frac{P(E\mid H)}{P(E\mid H^c)}.$$

On the right-hand side, the first factor is the prior odds and the second factor is the likelihood ratio; the left-hand side is the posterior odds. This form is useful because it shows exactly how evidence changes belief: evidence multiplies the prior odds by the likelihood ratio. A likelihood ratio greater than $1$ supports $H$ over $H^c$; a likelihood ratio less than $1$ supports $H^c$ over $H$.
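The odds form is a one-line update. The sketch below applies it to the numbers from the disease-testing worked example later on this page (prior $0.02$, sensitivity $0.99$, false positive rate $0.09$); the function names are illustrative.

```python
from fractions import Fraction


def posterior_odds(prior, p_e_given_h, p_e_given_not_h):
    """Posterior odds of H after evidence E: prior odds times likelihood ratio."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = p_e_given_h / p_e_given_not_h
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds o = p / (1 - p) back to the probability p."""
    return odds / (1 + odds)


odds = posterior_odds(
    prior=Fraction(2, 100),
    p_e_given_h=Fraction(99, 100),      # sensitivity
    p_e_given_not_h=Fraction(9, 100),   # false positive rate
)
posterior = odds_to_probability(odds)

print(odds)               # posterior odds of disease given a positive test
print(float(posterior))   # same posterior as the probability form of Bayes' theorem
```

The likelihood ratio here is $11$, so a positive test multiplies the odds of disease by eleven; the posterior stays modest only because the prior odds are $1$ to $49$.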

Conditional probability also depends on the information protocol. In a medical test, a positive result is generated by a known test procedure. In a card game, seeing another player's card depends on the rules of dealing and revealing. In a search problem, the fact that a match was found may depend on how many candidates were searched. These details change the conditioning event. Before applying a formula, state the event after the vertical bar in a way that includes how the information was obtained.

Another reliable habit is to draw a probability tree before writing Bayes' theorem. The first split usually represents the hidden condition or hypothesis, and the second split represents the observed evidence. Multiplying along branches gives joint probabilities such as $P(D\cap +)$ and $P(D^c\cap +)$. Adding the branches that end in the same observation gives the denominator. This tree method is algebraically the same as Bayes' theorem, but it makes the base rate visible and reduces the chance of reversing the conditional probabilities.
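The tree method can be sketched as a nested dictionary: the outer level is the first split (hypothesis), the inner level is the second split (observation). The representation and the numbers, which match the disease-testing worked example on this page, are one possible encoding rather than a prescribed one.

```python
from fractions import Fraction

# First split: hypothesis and its probability.
# Second split: observation probabilities given that hypothesis.
tree = {
    "D": (Fraction(2, 100), {"+": Fraction(99, 100), "-": Fraction(1, 100)}),
    "not D": (Fraction(98, 100), {"+": Fraction(9, 100), "-": Fraction(91, 100)}),
}

# Multiply along each branch to get the joint probabilities.
joint = {
    (hypothesis, obs): p_h * p_obs
    for hypothesis, (p_h, branches) in tree.items()
    for obs, p_obs in branches.items()
}


def posterior(hypothesis, observation):
    """P(hypothesis | observation): one leaf divided by all leaves with that observation."""
    denominator = sum(p for (_, obs), p in joint.items() if obs == observation)
    return joint[(hypothesis, observation)] / denominator


print(joint[("D", "+")], joint[("not D", "+")])
print(posterior("D", "+"))
```

Because every leaf is stored, the denominator is visibly the sum of all branches ending in the same observation, which is the step most often skipped when the formula is applied from memory.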

Visual

| Quantity | Symbol | In a medical test |
| --- | --- | --- |
| Sensitivity | $P(+\mid D)$ | positive if diseased |
| Specificity | $P(-\mid D^c)$ | negative if not diseased |
| False positive rate | $P(+\mid D^c)$ | positive if not diseased |
| False negative rate | $P(-\mid D)$ | negative if diseased |
| Positive predictive value | $P(D\mid +)$ | diseased if positive |
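Each quantity in the table is a ratio of counts from a two-by-two table of test results. The sketch below uses the natural-frequency counts from the disease-testing worked example on this page ($100000$ people); the function name `diagnostic_metrics` is illustrative.

```python
from fractions import Fraction


def diagnostic_metrics(tp, fn, fp, tn):
    """Compute the table's quantities from counts.

    tp: diseased and positive      fn: diseased and negative
    fp: not diseased and positive  tn: not diseased and negative
    """
    diseased = tp + fn
    healthy = fp + tn
    positives = tp + fp
    return {
        "sensitivity": Fraction(tp, diseased),            # P(+ | D)
        "specificity": Fraction(tn, healthy),             # P(- | D^c)
        "false_positive_rate": Fraction(fp, healthy),     # P(+ | D^c)
        "false_negative_rate": Fraction(fn, diseased),    # P(- | D)
        "positive_predictive_value": Fraction(tp, positives),  # P(D | +)
    }


# Counts implied by a 2% base rate, 99% sensitivity, 9% false positive rate.
metrics = diagnostic_metrics(tp=1980, fn=20, fp=8820, tn=89180)
for name, value in metrics.items():
    print(f"{name}: {float(value):.4f}")
```

The first four quantities condition on disease status (row totals), while the positive predictive value conditions on the test result (a column total), which is why it alone depends on the base rate.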

Worked example 1: two cards without replacement

Problem. Two cards are drawn from a standard $52$-card deck without replacement. Find the probability that both cards are aces. Then find the probability that the second card is an ace given that the first card is an ace.

Method.

  1. Let $A_1$ be "first card is an ace" and $A_2$ be "second card is an ace."

  2. The first draw has $4$ aces among $52$ cards:

     $$P(A_1)=\frac{4}{52}=\frac{1}{13}.$$

  3. If the first card is an ace, then $3$ aces remain among $51$ cards:

     $$P(A_2\mid A_1)=\frac{3}{51}=\frac{1}{17}.$$

  4. Apply the multiplication rule:

     $$\begin{aligned} P(A_1\cap A_2) &=P(A_2\mid A_1)P(A_1)\\ &=\frac{1}{17}\cdot \frac{1}{13}\\ &=\frac{1}{221}. \end{aligned}$$

  5. Check against counting. The number of unordered two-card hands is $\binom{52}{2}=1326$. The number with two aces is $\binom{4}{2}=6$. Thus

     $$\frac{\binom{4}{2}}{\binom{52}{2}}=\frac{6}{1326}=\frac{1}{221}.$$

Checked answer. $P(\text{two aces})=1/221$, and $P(A_2\mid A_1)=1/17$. The events are not independent because $P(A_2\mid A_1)=1/17$ differs from the unconditional probability $P(A_2)=1/13$.
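The same answers can be confirmed by enumerating every ordered pair of distinct cards. This is a brute-force sketch; the deck encoding (rank and suit strings) is an arbitrary choice.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = "SHDC"
deck = [(rank, suit) for rank in ranks for suit in suits]

# All ordered draws of two distinct cards (drawing without replacement).
draws = list(permutations(deck, 2))


def is_ace(card):
    return card[0] == "A"


n_first_ace = sum(1 for first, _ in draws if is_ace(first))
n_second_ace = sum(1 for _, second in draws if is_ace(second))
n_both_aces = sum(1 for first, second in draws if is_ace(first) and is_ace(second))

p_both = Fraction(n_both_aces, len(draws))                 # P(A1 and A2)
p_second_given_first = Fraction(n_both_aces, n_first_ace)  # P(A2 | A1)
p_second = Fraction(n_second_ace, len(draws))              # P(A2), unconditional

print(p_both, p_second_given_first, p_second)
```

The enumeration also shows that the unconditional probability of an ace on the second draw equals that of the first draw; only the conditional probability changes once the first card is known.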

Worked example 2: base rates and a positive test

Problem. A disease affects $2\%$ of a population. A test has sensitivity $99\%$ and false positive rate $9\%$. If a person tests positive, what is the probability that the person has the disease?

Method.

  1. Let $D$ be the event "has disease" and $+$ be the event "test positive."

  2. Translate the problem:

     $$\begin{gathered} P(D)=0.02,\quad P(D^c)=0.98,\\ P(+\mid D)=0.99,\quad P(+\mid D^c)=0.09. \end{gathered}$$

  3. Compute the total probability of a positive test:

     $$\begin{aligned} P(+) &=P(+\mid D)P(D)+P(+\mid D^c)P(D^c)\\ &=(0.99)(0.02)+(0.09)(0.98)\\ &=0.0198+0.0882\\ &=0.108. \end{aligned}$$

  4. Apply Bayes' theorem:

     $$\begin{aligned} P(D\mid +) &=\frac{P(+\mid D)P(D)}{P(+)}\\ &=\frac{0.0198}{0.108}\\ &=0.1833\ldots. \end{aligned}$$

  5. Check with natural frequencies. In $100000$ people, about $2000$ have the disease and $98000$ do not. True positives are $(0.99)(2000)=1980$. False positives are $(0.09)(98000)=8820$. Among positives, the diseased count is $1980$ out of $1980+8820=10800$, so

     $$\frac{1980}{10800}=0.1833\ldots.$$

Checked answer. The probability is about $18.3\%$, not $99\%$. Because the base rate is low, the healthy group is so large that its false positives far outnumber the true positives.

Code

def bayes_binary(prior, sensitivity, false_positive_rate):
    """Return (P(disease | positive), P(positive)) for a binary test."""
    # Law of total probability: positives come from diseased and healthy people.
    p_pos = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: true-positive probability over total positive probability.
    posterior = sensitivity * prior / p_pos
    return posterior, p_pos


posterior, positive_rate = bayes_binary(
    prior=0.02,
    sensitivity=0.99,
    false_positive_rate=0.09,
)

print(f"P(positive) = {positive_rate:.4f}")
print(f"P(disease | positive) = {posterior:.4f}")

# Compare with less rare conditions: same test, higher base rates.
for prior in [0.02, 0.10, 0.50]:
    post, _ = bayes_binary(prior, 0.99, 0.09)
    print(f"prior={prior:.2f}, posterior={post:.3f}")
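As an independent cross-check on the closed-form calculation, the posterior can also be estimated by simulation. This is a rough Monte Carlo sketch; the function name `simulate_ppv`, the sample size, and the fixed seed are assumptions made for reproducibility, not part of the method above.

```python
import random


def simulate_ppv(prior, sensitivity, false_positive_rate, n=200_000, seed=0):
    """Estimate P(disease | positive) by simulating n people."""
    rng = random.Random(seed)
    positives = 0
    true_positives = 0
    for _ in range(n):
        # First split of the tree: does this person have the disease?
        diseased = rng.random() < prior
        # Second split: does the test come back positive?
        p_positive = sensitivity if diseased else false_positive_rate
        if rng.random() < p_positive:
            positives += 1
            if diseased:
                true_positives += 1
    if positives == 0:
        raise ValueError("no positive tests simulated; increase n")
    return true_positives / positives


estimate = simulate_ppv(prior=0.02, sensitivity=0.99, false_positive_rate=0.09)
print(f"simulated P(disease | positive) = {estimate:.3f}")
```

The estimate should land close to the exact value of about $0.183$, with sampling noise that shrinks as `n` grows.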

Common pitfalls

  • Reversing $P(A\mid B)$ and $P(B\mid A)$. A test can be very likely positive among diseased people while a positive-testing person is not very likely diseased.
  • Ignoring the denominator in Bayes' theorem. The denominator includes all ways the evidence could occur.
  • Treating "independent" as meaning "disjoint." Disjoint events that each have positive probability are always dependent: if one occurs, the other cannot.
  • Assuming pairwise independence implies mutual independence. Three events can be pairwise independent while the triple intersection does not factor.
  • Conditioning on a collider or selected subgroup without noticing that the conditioning event can create dependence.
  • Saying "the test is $95\%$ accurate" without specifying sensitivity, specificity, and prevalence.
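The last pitfall can be made concrete with a short calculation. The sketch assumes a hypothetical test with sensitivity and specificity both equal to $95\%$, and reads "accuracy" as the overall fraction of correct results; the two prevalence values are illustrative.

```python
from fractions import Fraction


def accuracy_and_ppv(prevalence, sensitivity, specificity):
    """Return (overall accuracy, positive predictive value) for a binary test."""
    true_pos = sensitivity * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    accuracy = true_pos + true_neg          # fraction of all results that are correct
    ppv = true_pos / (true_pos + false_pos) # P(disease | positive)
    return accuracy, ppv


sens = spec = Fraction(95, 100)  # hypothetical "95% accurate" test

rare = accuracy_and_ppv(Fraction(1, 100), sens, spec)
common = accuracy_and_ppv(Fraction(1, 2), sens, spec)

for label, (acc, ppv) in [("prevalence 1%", rare), ("prevalence 50%", common)]:
    print(f"{label}: accuracy={float(acc):.3f}, PPV={float(ppv):.3f}")
```

Both settings report the same $95\%$ accuracy, yet the probability of disease after a positive result differs by a wide margin, so the single word "accurate" does not determine what a positive test means.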

Connections