Conditional Probability, Bayes, and Independence

Conditional probability is the mathematical operation of updating the sample space after learning information. If event $B$ has occurred, outcomes outside $B$ are no longer possible, and the probability measure is renormalized on $B$. This idea is simple in a finite picture, but it drives many subtle examples: witness reliability, medical testing, Monty Hall, repeated trials, and the difference between pairwise independence and full independence.

Tree diagrams organize conditional probabilities for Bayes' theorem.

Figure: Probability trees make the conditioning structure in Bayes' theorem explicit. Image: Wikimedia Commons, Gnathan87, CC0 1.0.

The MIT lectures treat Bayes' formula as the algebra of probability revision and independence as the case where conditioning makes no difference. These ideas are linked: $A$ and $B$ are independent exactly when $P(A\mid B)=P(A)$, provided $P(B)\gt 0$. When they are not independent, Bayes' formula gives the disciplined way to update.

Definitions

If $P(B)\gt 0$, the conditional probability of $A$ given $B$ is

$$P(A\mid B)=\frac{P(A\cap B)}{P(B)}.$$

Equivalently, when $P(A)\gt 0$ as well,

$$P(A\cap B)=P(A\mid B)P(B)=P(B\mid A)P(A).$$
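As a small illustration of the ratio definition, the sketch below enumerates a finite sample space and computes a conditional probability by counting. The two-dice example and the helper names `prob` and `cond_prob` are assumptions made for illustration; they do not come from the lectures.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two fair six-sided dice, 36 equally likely outcomes.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); only defined when P(b) > 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(b)

A = {w for w in omega if w[0] + w[1] == 8}  # the sum is 8
B = {w for w in omega if w[0] % 2 == 0}     # the first die is even

print(prob(A))          # 5/36
print(cond_prob(A, B))  # 1/6, after renormalizing on B

# Both factorizations of P(A and B) agree.
assert prob(A & B) == cond_prob(A, B) * prob(B) == cond_prob(B, A) * prob(A)
```

Exact fractions are used so that the two factorizations can be compared with `==` rather than a floating-point tolerance.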

The multiplication rule for events $E_1,\ldots,E_n$ is

$$P(E_1\cap\cdots\cap E_n) = P(E_1)P(E_2\mid E_1)P(E_3\mid E_1\cap E_2)\cdots P(E_n\mid E_1\cap\cdots\cap E_{n-1}),$$

as long as the conditioning events have positive probability.

Events $A$ and $B$ are independent if

$$P(A\cap B)=P(A)P(B).$$

Events $E_1,\ldots,E_n$ are mutually independent if for every nonempty subset $I\subseteq\{1,\ldots,n\}$,

$$P\left(\bigcap_{i\in I}E_i\right)=\prod_{i\in I}P(E_i).$$

Pairwise independence only checks subsets of size $2$; it is weaker than mutual independence.
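On a finite space with equally likely outcomes, both definitions can be checked mechanically. The following is a minimal sketch; the function names and the three-coin example are assumptions chosen for illustration. Subsets of size one are skipped because the product condition holds trivially for them.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

# Assumed example: three fair coin tosses, 8 equally likely outcomes.
omega = set(product("HT", repeat=3))

def prob(event):
    return Fraction(len(event), len(omega))

def factors(events):
    """True if P(intersection of events) equals the product of their probabilities."""
    inter = set(omega)
    for e in events:
        inter &= e
    return prob(inter) == prod(prob(e) for e in events)

def pairwise_independent(events):
    # Only subcollections of size 2.
    return all(factors(pair) for pair in combinations(events, 2))

def mutually_independent(events):
    # Every subcollection of size 2, 3, ..., n.
    return all(
        factors(sub)
        for k in range(2, len(events) + 1)
        for sub in combinations(events, k)
    )

# "Toss i is heads" for i = 0, 1, 2.
heads = [{w for w in omega if w[i] == "H"} for i in range(3)]
print(pairwise_independent(heads), mutually_independent(heads))  # True True
```

The number of subcollections grows as $2^n$, so a brute-force check of this kind is only practical for a handful of events.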

Key results

Bayes' formula follows by writing $P(A\cap B)$ in two ways:

$$P(A\mid B)=\frac{P(B\mid A)P(A)}{P(B)}.$$

If $A_1,\ldots,A_n$ partition the sample space and each $P(A_i)\gt 0$, then the denominator can be expanded by the law of total probability:

$$P(B)=\sum_{i=1}^{n}P(B\mid A_i)P(A_i).$$

Thus

$$P(A_j\mid B)= \frac{P(B\mid A_j)P(A_j)} {\sum_{i=1}^{n}P(B\mid A_i)P(A_i)}.$$

This is the version used in base-rate problems. The prior $P(A_j)$ matters as much as the likelihood $P(B\mid A_j)$.
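The partition form translates directly into a few lines of code. The sketch below is one possible way to write it; the function name `posterior` and the three-machine defect numbers are hypothetical and serve only to exercise the formula.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return the list of P(A_j | B), given P(A_i) and P(B | A_i) for a partition A_1..A_n."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per hypothesis")
    # Numerators P(B | A_i) P(A_i); their sum is P(B) by total probability.
    joint = [p * lik for p, lik in zip(priors, likelihoods)]
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("P(B) = 0, posterior undefined")
    return [j / evidence for j in joint]

# Hypothetical data: three machines make 50%, 30%, 20% of all items,
# with defect rates 1%, 2%, 3%. B is "the item is defective".
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

post = posterior(priors, likelihoods)
print([str(p) for p in post])  # ['5/17', '6/17', '6/17']
```

In this made-up example the machine with the largest share of production ends up with the smallest posterior, because its low defect rate outweighs its high prior.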

Conditional probability itself satisfies the probability axioms on the restricted space $B$: for fixed $B$ with $P(B)\gt 0$, the map $A\mapsto P(A\mid B)$ is a probability measure.

If $A$ and $B$ are independent and $0\lt P(B)\lt 1$, then $A$ is also independent of $B^c$:

$$P(A\cap B^c)=P(A)-P(A\cap B)=P(A)-P(A)P(B)=P(A)P(B^c).$$

This identity is often useful when independence is easier to state for the complementary event.
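The chain of equalities can be checked on a concrete case. The sketch below assumes a single roll of a fair die with $A$ = "even" and $B$ = "at most 4"; these events are an illustrative choice, not taken from the lectures.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
omega = set(range(1, 7))

def prob(event):
    return Fraction(len(event), len(omega))

A = {2, 4, 6}       # roll is even
B = {1, 2, 3, 4}    # roll is at most 4
B_c = omega - B     # complement of B

# A and B are independent: 1/3 == (1/2)(2/3).
assert prob(A & B) == prob(A) * prob(B)

# Each step of the identity, evaluated exactly.
step1 = prob(A & B_c)
step2 = prob(A) - prob(A & B)
step3 = prob(A) - prob(A) * prob(B)
step4 = prob(A) * prob(B_c)
print(step1, step2, step3, step4)  # 1/6 1/6 1/6 1/6
```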

Bayes' formula is best understood as a ratio update. The posterior odds of $A$ versus $A^c$ are the prior odds multiplied by the likelihood ratio:

$$\frac{P(A\mid B)}{P(A^c\mid B)} = \frac{P(A)}{P(A^c)} \cdot \frac{P(B\mid A)}{P(B\mid A^c)}.$$

This form makes the base-rate effect explicit. Strong evidence can still lead to a modest posterior probability when the prior probability is small. Conversely, weak evidence can matter a lot when the competing hypotheses had similar prior probabilities.
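A short numerical sketch of the odds form follows. The function names and the two sets of numbers are assumptions picked to show both effects: a strong likelihood ratio against a small prior, and a weak likelihood ratio against even prior odds.

```python
from fractions import Fraction

def posterior_odds(prior, p_b_given_a, p_b_given_not_a):
    """Posterior odds of A vs. A^c = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = p_b_given_a / p_b_given_not_a
    return prior_odds * likelihood_ratio

def odds_to_prob(odds):
    """Convert odds o = p / (1 - p) back to the probability p."""
    return odds / (1 + odds)

# Strong evidence (likelihood ratio 9), small prior (1/100).
strong = posterior_odds(Fraction(1, 100), Fraction(9, 10), Fraction(1, 10))
print(strong, odds_to_prob(strong))  # 1/11 1/12

# Weak evidence (likelihood ratio 2), even prior (1/2).
weak = posterior_odds(Fraction(1, 2), Fraction(6, 10), Fraction(3, 10))
print(weak, odds_to_prob(weak))      # 2 2/3
```

With these assumed numbers the ninefold likelihood ratio still leaves a posterior of only $1/12$, while the twofold ratio moves an even prior to $2/3$.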

The multiplication rule is the safest way to compute probabilities of sequential observations. For example, drawing cards without replacement should not be treated as independent trials. The probability of two aces in the first two cards is

$$\frac{4}{52}\cdot\frac{3}{51},$$

not $(4/52)^2$. The second factor is conditional on the first draw having already removed an ace. Independence would apply only if the card were replaced and the deck reshuffled between draws.
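The two-ace calculation can be confirmed by brute force. The sketch below compares the multiplication rule with a direct count over all ordered pairs of distinct cards; labeling cards `0` to `3` as the aces is an arbitrary convention assumed here.

```python
from fractions import Fraction
from itertools import permutations

# Multiplication rule, without and with replacement.
without_replacement = Fraction(4, 52) * Fraction(3, 51)
with_replacement = Fraction(4, 52) ** 2

# Brute-force count: ordered draws of two distinct cards from 52.
# Assumption: cards 0, 1, 2, 3 are the four aces.
draws = list(permutations(range(52), 2))  # 52 * 51 ordered pairs
both_aces = sum(1 for first, second in draws if first < 4 and second < 4)

print(without_replacement)               # 1/221
print(Fraction(both_aces, len(draws)))   # 1/221
print(with_replacement)                  # 1/169
```

The enumeration agrees with the conditional product and differs from the squared value, which would only be correct for draws with replacement.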

Independence is also a property of the probability model, not merely of the words in the story. Two events can sound unrelated but be dependent because of a hidden constraint, and two events can sound related but be independent. In two tosses of a fair coin, "first coin is heads" and "total number of heads is odd" are independent, and so are "first coin is heads" and "both coins have the same result", even though in each pair the second event involves the first coin. The definition, not intuition alone, decides.
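These claims about two fair coin tosses are easy to verify by applying the product definition to each pair. In the sketch below, the event "at least one head" is an extra example, added as an assumption, to show a dependent pair for contrast.

```python
from fractions import Fraction
from itertools import product

# Two fair coin tosses: HH, HT, TH, TT, each with probability 1/4.
omega = {"".join(t) for t in product("HT", repeat=2)}

def prob(event):
    return Fraction(len(event), len(omega))

def independent(a, b):
    """Product definition: P(a and b) == P(a) P(b)."""
    return prob(a & b) == prob(a) * prob(b)

first_heads = {w for w in omega if w[0] == "H"}
odd_heads = {w for w in omega if w.count("H") % 2 == 1}
same_result = {w for w in omega if w[0] == w[1]}
at_least_one = {w for w in omega if "H" in w}  # extra event, for contrast

print(independent(first_heads, odd_heads))     # True
print(independent(first_heads, same_result))   # True
print(independent(first_heads, at_least_one))  # False: 1/2 vs (1/2)(3/4)
```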

For more than two events, mutual independence requires checking all nonempty subcollections. Pairwise independence means every pair factors, but it says nothing about triple intersections. This distinction becomes important in later random-variable settings, where variables may have zero pairwise covariance or pairwise independence while still being collectively constrained.

Visual

| Concept | Formula | Interpretation |
| --- | --- | --- |
| Conditional probability | $P(A\mid B)=P(A\cap B)/P(B)$ | renormalize on $B$ |
| Multiplication rule | $P(A\cap B)=P(A)P(B\mid A)$ | sequential probability |
| Independence | $P(A\cap B)=P(A)P(B)$ | learning $B$ does not change $A$ |
| Bayes' formula | $P(A\mid B)=P(B\mid A)P(A)/P(B)$ | reverse the conditioning |
| Total probability | $P(B)=\sum_i P(B\mid A_i)P(A_i)$ | average over cases |

Worked example 1: the taxi witness problem

Problem: A town has two taxi companies. On the night of an accident, $85\%$ of taxis are green and $15\%$ are blue. A witness says the taxi was blue. Under similar conditions, the witness identifies taxi color correctly $80\%$ of the time. What is the probability the taxi was actually blue?

Method:

  1. Let $B$ be the event "taxi is blue" and $W$ be "witness says blue".
  2. Priors:
$$P(B)=0.15,\qquad P(B^c)=0.85.$$
  3. Likelihoods:
$$P(W\mid B)=0.80,\qquad P(W\mid B^c)=0.20.$$

The second value is $0.20$ because the witness is wrong $20\%$ of the time, and a wrong identification of a green taxi reports it as blue.

  4. Compute the denominator:
$$\begin{aligned} P(W) &=P(W\mid B)P(B)+P(W\mid B^c)P(B^c)\\ &=0.80(0.15)+0.20(0.85)\\ &=0.12+0.17\\ &=0.29. \end{aligned}$$
  5. Apply Bayes:
$$P(B\mid W)=\frac{0.80(0.15)}{0.29} =\frac{0.12}{0.29} \approx 0.4138.$$

Checked answer: even a fairly reliable witness does not overcome the base rate entirely. The posterior probability is about $41.4\%$, not $80\%$.
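A Monte Carlo simulation gives an independent check on the answer. The sketch below assumes the witness errs with the same probability whatever the true color, as the problem's likelihoods imply; the function name, trial count, and seed are arbitrary choices.

```python
import random

def simulate_taxi(trials=200_000, p_blue=0.15, accuracy=0.80, seed=0):
    """Estimate P(taxi is blue | witness says blue) by simulation."""
    rng = random.Random(seed)
    says_blue = 0
    blue_and_says_blue = 0
    for _ in range(trials):
        is_blue = rng.random() < p_blue
        correct = rng.random() < accuracy
        # A correct witness reports the true color; otherwise the other color.
        reports_blue = is_blue if correct else not is_blue
        if reports_blue:
            says_blue += 1
            if is_blue:
                blue_and_says_blue += 1
    # Relative frequency of blue taxis among "says blue" reports.
    return blue_and_says_blue / says_blue

estimate = simulate_taxi()
exact = 0.80 * 0.15 / (0.80 * 0.15 + 0.20 * 0.85)
print(round(estimate, 3), round(exact, 4))
```

The estimate is a ratio of counts restricted to the trials where the witness says blue, which is the renormalization on the conditioning event carried out empirically.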

Worked example 2: pairwise independent but not mutually independent

Problem: Toss two fair coins. Let $A$ be "first coin is heads", $B$ be "second coin is heads", and $C$ be "the number of heads is even". Are $A,B,C$ mutually independent?

Method:

  1. The sample space is
$$\{HH,HT,TH,TT\},$$

with each outcome having probability $1/4$.

  2. The events are
$$A=\{HH,HT\},\quad B=\{HH,TH\},\quad C=\{HH,TT\}.$$

Each has probability $1/2$.

  3. Pairwise intersections:
$$A\cap B=\{HH\},\quad A\cap C=\{HH\},\quad B\cap C=\{HH\}.$$

Each has probability $1/4$, which equals $(1/2)(1/2)$.

  4. Thus the events are pairwise independent.

  5. Check the triple intersection:

$$A\cap B\cap C=\{HH\},$$

so

$$P(A\cap B\cap C)=\frac14.$$

But

$$P(A)P(B)P(C)=\frac12\cdot\frac12\cdot\frac12=\frac18.$$

Checked answer: $A,B,C$ are pairwise independent but not mutually independent. Knowing any one of them alone gives no information about another one, but knowing two determines the third.

Code

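def bayes(prior, sensitivity, false_positive):
    """Posterior P(H | E) for a binary hypothesis H given evidence E."""
    p_b = prior
    p_not_b = 1 - prior
    # Law of total probability: P(E) = P(E | H) P(H) + P(E | H^c) P(H^c).
    evidence = sensitivity * p_b + false_positive * p_not_b
    return sensitivity * p_b / evidence

# Worked example 1: P(B) = 0.15, P(W | B) = 0.80, P(W | B^c) = 0.20.
print("Taxi posterior:", bayes(0.15, 0.80, 0.20))

# Worked example 2: two fair coins, four equally likely outcomes.
outcomes = ["HH", "HT", "TH", "TT"]
events = {
    "A": {o for o in outcomes if o[0] == "H"},
    "B": {o for o in outcomes if o[1] == "H"},
    "C": {o for o in outcomes if o.count("H") % 2 == 0},
}

def prob(event):
    # Uniform measure: favorable outcomes over total outcomes.
    return len(event) / len(outcomes)

# For each pair, print P(X and Y) next to P(X) P(Y); the two values match.
for x, y in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(x, y, prob(events[x] & events[y]), prob(events[x]) * prob(events[y]))

# For the triple, P(A and B and C) = 0.25 but P(A) P(B) P(C) = 0.125.
triple = events["A"] & events["B"] & events["C"]
print("triple:", prob(triple), prob(events["A"]) * prob(events["B"]) * prob(events["C"]))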

Common pitfalls

  • Ignoring the base rate in Bayes problems. The posterior $P(A\mid B)$ is not determined by test accuracy alone; it also depends on the prior $P(A)$.
  • Reversing conditional probabilities. In general $P(A\mid B)\ne P(B\mid A)$.
  • Saying events are independent because they are disjoint. Disjoint events that both have positive probability are always dependent, because occurrence of one rules out the other.
  • Checking only pairwise independence when a problem asks for mutual independence.
  • Conditioning on an event of probability zero. For continuous random variables this requires conditional densities or a limiting interpretation, not the elementary ratio formula.
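Two of these pitfalls can be demonstrated on a single fair die. The sketch below uses events chosen for illustration (they are assumptions, not from the lectures): a disjoint pair that fails the product test, and a pair whose two conditional probabilities differ.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
omega = set(range(1, 7))

def prob(event):
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    """P(a | b), defined only when P(b) > 0."""
    return prob(a & b) / prob(b)

# Disjoint events with positive probability are dependent.
low, high = {1, 2}, {5, 6}
print(prob(low & high), prob(low) * prob(high))  # 0 1/9

# Reversing the conditioning changes the answer.
six, even = {6}, {2, 4, 6}
print(cond_prob(six, even), cond_prob(even, six))  # 1/3 1
```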

Connections