Conditional Probability, Bayes, and Independence
Conditional probability is the mathematical operation of updating the sample space after learning information. If event $B$ has occurred, outcomes outside $B$ are no longer possible, and the probability measure is renormalized on $B$. This idea is simple in a finite picture, but it drives many subtle examples: witness reliability, medical testing, Monty Hall, repeated trials, and the difference between pairwise independence and full independence.
Figure: Probability trees make the conditioning structure in Bayes' theorem explicit. Image: Wikimedia Commons, Gnathan87, CC0 1.0.
The MIT lectures treat Bayes' formula as the algebra of probability revision and independence as the case where conditioning makes no difference. These ideas are linked: $A$ and $B$ are independent exactly when $P(A \mid B) = P(A)$, provided $P(B) > 0$. When they are not independent, Bayes' formula gives the disciplined way to update.
Definitions
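If $P(B) > 0$, the conditional probability of $A$ given $B$ is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Equivalently,
$$P(A \cap B) = P(B)\,P(A \mid B).$$
The multiplication rule for events $A_1, \dots, A_n$ is
$$P(A_1 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1}),$$
as long as the conditioning events have positive probability.
Events $A$ and $B$ are independent if
$$P(A \cap B) = P(A)\,P(B).$$
Events $A_1, \dots, A_n$ are mutually independent if for every nonempty subset $I \subseteq \{1, \dots, n\}$,
$$P\Big(\bigcap_{i \in I} A_i\Big) = \prod_{i \in I} P(A_i).$$
Pairwise independence only checks subsets of size $2$; it is weaker than mutual independence.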
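As a minimal sketch of these definitions, assuming a finite sample space with equally likely outcomes, the ratio formula can be evaluated by counting. The two-dice example and the helper names `prob` and `cond_prob` are illustrative choices, not taken from the lectures.

```python
from fractions import Fraction
from itertools import product

# Assumed model: two fair six-sided dice, all 36 ordered outcomes equally likely.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); undefined when P(b) = 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(b)

sum_is_8 = {w for w in omega if w[0] + w[1] == 8}
first_is_even = {w for w in omega if w[0] % 2 == 0}

print(prob(sum_is_8))                      # 5/36
print(cond_prob(sum_is_8, first_is_even))  # 1/6
```

Exact fractions are used so that equalities such as $P(A \cap B) = P(B)\,P(A \mid B)$ can be checked without floating-point rounding.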
Key results
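Bayes' formula follows by writing $P(A \cap B)$ in two ways:
$$P(A \mid B)\,P(B) = P(A \cap B) = P(B \mid A)\,P(A), \qquad\text{so}\qquad P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.$$
If $A_1, \dots, A_n$ partition the sample space and each $P(A_i) > 0$, then the denominator can be expanded by the law of total probability:
$$P(B) = \sum_{j=1}^{n} P(B \mid A_j)\,P(A_j).$$
Thus
$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)\,P(A_j)}.$$
This is the version used in base-rate problems. The prior $P(A_i)$ matters as much as the likelihood $P(B \mid A_i)$.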
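A minimal sketch of the partition form, assuming the priors and likelihoods are supplied as parallel lists. The function name `posterior` and the three-machine numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return [P(A_i | B)] given priors [P(A_i)] and likelihoods [P(B | A_i)]."""
    if abs(sum(priors) - 1) > 1e-12:
        raise ValueError("priors must sum to 1 over a partition")
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # law of total probability: P(B)
    if evidence == 0:
        raise ValueError("P(B) = 0, so the posterior is undefined")
    return [j / evidence for j in joint]

# Hypothetical example: three machines produce 50%, 30%, 20% of items,
# with defect rates 1%, 2%, 5%. B = "item is defective".
priors = [Fraction(50, 100), Fraction(30, 100), Fraction(20, 100)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(5, 100)]
result = posterior(priors, likelihoods)
print([str(p) for p in result])  # ['5/21', '2/7', '10/21']
```

In this made-up example the machine with the smallest prior becomes the most likely source once a defect is observed, because its likelihood is the largest. Neither factor alone determines the posterior.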
Conditional probability itself satisfies the probability axioms on the restricted space $B$: for fixed $B$ with $P(B) > 0$, the map $A \mapsto P(A \mid B)$ is a probability measure.
If $A$ and $B$ are independent and $P(B) > 0$, then $A^c$ is also independent of $B$:
$$P(A^c \mid B) = 1 - P(A \mid B) = 1 - P(A) = P(A^c).$$
This identity is often useful when independence is easier to state for the complementary event.
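Bayes' formula is best understood as a ratio update. The posterior odds of $A$ versus $A^c$ are the prior odds multiplied by the likelihood ratio:
$$\frac{P(A \mid B)}{P(A^c \mid B)} = \frac{P(A)}{P(A^c)} \cdot \frac{P(B \mid A)}{P(B \mid A^c)}.$$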
This form makes the base-rate effect explicit. Strong evidence can still lead to a modest posterior probability when the prior probability is small. Conversely, weak evidence can matter a lot when the competing hypotheses had similar prior probabilities.
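The odds form can be sketched in a few lines. The function name `posterior_from_odds` and the numbers below are hypothetical. They apply the same likelihood ratio to a rare hypothesis and to an evenly balanced one.

```python
def posterior_from_odds(prior, likelihood_ratio):
    """Update P(A) to P(A | B) via odds; likelihood_ratio = P(B|A) / P(B|A^c)."""
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back to a probability.
    return posterior_odds / (1 + posterior_odds)

# Hypothetical numbers: identical evidence (likelihood ratio 20),
# two different priors.
print(round(posterior_from_odds(0.01, 20), 3))  # 0.168
print(round(posterior_from_odds(0.50, 20), 3))  # 0.952
```

With a prior of $0.01$, evidence that is twenty times more likely under $A$ than under $A^c$ still leaves the posterior below $0.2$.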
The multiplication rule is the safest way to compute probabilities of sequential observations. For example, drawing cards without replacement should not be treated as independent trials. The probability of two aces in the first two cards is
$$\frac{4}{52} \cdot \frac{3}{51} = \frac{1}{221},$$
not $\left(\frac{4}{52}\right)^2 = \frac{1}{169}$. The second factor is conditional on the first draw having already removed an ace. Independence would apply only if the card were replaced and the deck reshuffled between draws.
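The card computation can be checked by brute force. This sketch assumes a standard 52-card deck with four aces. Labeling the cards `0..51` and treating `0..3` as the aces is an arbitrary convention for the enumeration.

```python
from fractions import Fraction
from itertools import permutations

# Multiplication rule: P(ace first) * P(ace second | ace first).
without_replacement = Fraction(4, 52) * Fraction(3, 51)

# "Independent draws" value, valid only with replacement and reshuffling.
with_replacement = Fraction(4, 52) ** 2

# Brute-force check over all ordered pairs of distinct cards.
deck = range(52)
ordered_pairs = list(permutations(deck, 2))
both_aces = sum(1 for first, second in ordered_pairs if first < 4 and second < 4)
enumerated = Fraction(both_aces, len(ordered_pairs))

print(without_replacement, with_replacement, enumerated)  # 1/221 1/169 1/221
```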
Independence is also a property of the probability model, not merely of the words in the story. Two events can sound unrelated but be dependent because of a hidden constraint, and events that sound related can turn out to be independent. In two fair coin tosses, "first coin is heads" and "total number of heads is odd" are independent, since their intersection $\{HT\}$ has probability $\tfrac14 = \tfrac12 \cdot \tfrac12$. Likewise, "first coin is heads" and "both coins have the same result" are independent, even though the second event mentions the first coin indirectly. The definition, not intuition alone, decides.
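Both coin claims can be verified directly from the definition. The sketch below assumes two fair coins. The third event, "at least one head", is an added illustration of a pair that does fail the product test.

```python
from fractions import Fraction
from itertools import product

omega = set(product("HT", repeat=2))  # four equally likely outcomes

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event), len(omega))

def independent(a, b):
    """True when P(a and b) = P(a) P(b)."""
    return prob(a & b) == prob(a) * prob(b)

first_heads = {w for w in omega if w[0] == "H"}
odd_heads = {w for w in omega if w.count("H") == 1}
same_result = {w for w in omega if w[0] == w[1]}
at_least_one_head = {w for w in omega if "H" in w}  # added for contrast

print(independent(first_heads, odd_heads))          # True
print(independent(first_heads, same_result))        # True
print(independent(first_heads, at_least_one_head))  # False
```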
For more than two events, mutual independence requires checking all nonempty subcollections. Pairwise independence means every pair factors, but it says nothing about triple intersections. This distinction becomes important in later random-variable settings, where variables may have zero pairwise covariance or pairwise independence while still being collectively constrained.
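One way to make the "all subcollections" requirement concrete is to loop over every subset of size at least two. This is a sketch for small finite models only, since the number of subsets grows exponentially. The three-coin model and the function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

omega = set(product("HT", repeat=3))  # assumed model: three fair coins

def prob(event):
    return Fraction(len(event), len(omega))

def factors(events):
    """True when P(intersection) equals the product of the probabilities."""
    return prob(set.intersection(*events)) == prod(prob(e) for e in events)

def pairwise_independent(events):
    return all(factors(pair) for pair in combinations(events, 2))

def mutually_independent(events):
    # Subsets of size 1 factor trivially, so start at size 2.
    return all(
        factors(sub)
        for k in range(2, len(events) + 1)
        for sub in combinations(events, k)
    )

heads = [{w for w in omega if w[i] == "H"} for i in range(3)]
first_two_agree = {w for w in omega if w[0] == w[1]}
constrained = [heads[0], heads[1], first_two_agree]

print(mutually_independent(heads))        # True
print(pairwise_independent(constrained))  # True
print(mutually_independent(constrained))  # False
```

The `constrained` family passes every pairwise check but fails at the triple intersection, which is the gap between the two notions.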
Visual
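| Concept | Formula | Interpretation |
|---|---|---|
| Conditional probability | $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ | renormalize on $B$ |
| Multiplication rule | $P(A \cap B) = P(A)\,P(B \mid A)$ | sequential probability |
| Independence | $P(A \mid B) = P(A)$ | learning $B$ does not change $P(A)$ |
| Bayes' formula | $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$ | reverse the conditioning |
| Total probability | $P(B) = \sum_i P(B \mid A_i)\,P(A_i)$ | average over cases |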
Worked example 1: the taxi witness problem
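Problem: A town has two taxi companies. On the night of an accident, $85\%$ of taxis are green and $15\%$ are blue. A witness says the taxi was blue. Under similar conditions, the witness identifies taxi color correctly $80\%$ of the time. What is the probability the taxi was actually blue?
Method:
- Let $B$ be the event "taxi is blue" and $W$ be "witness says blue".
- Priors: $P(B) = 0.15$ and $P(B^c) = 0.85$.
- Likelihoods: $P(W \mid B) = 0.80$ and $P(W \mid B^c) = 0.20$. The second value is $1 - 0.80 = 0.20$ because a green taxi is mistakenly called blue when the witness is wrong.
- Compute the denominator:
$$P(W) = P(W \mid B)\,P(B) + P(W \mid B^c)\,P(B^c) = 0.80 \cdot 0.15 + 0.20 \cdot 0.85 = 0.12 + 0.17 = 0.29.$$
- Apply Bayes:
$$P(B \mid W) = \frac{P(W \mid B)\,P(B)}{P(W)} = \frac{0.12}{0.29} \approx 0.414.$$
Checked answer: even a fairly reliable witness does not overcome the base rate entirely. The posterior probability is about $41\%$, not $80\%$.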
Worked example 2: pairwise independent but not mutually independent
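Problem: Toss two fair coins. Let $A$ be "first coin is heads", $B$ be "second coin is heads", and $C$ be "the number of heads is even". Are $A, B, C$ mutually independent?
Method:
- The sample space is
$$\Omega = \{HH, HT, TH, TT\},$$
with each outcome having probability $\tfrac14$.
- The events are
$$A = \{HH, HT\}, \qquad B = \{HH, TH\}, \qquad C = \{HH, TT\}.$$
Each has probability $\tfrac12$.
- Pairwise intersections:
$$A \cap B = A \cap C = B \cap C = \{HH\}.$$
Each has probability $\tfrac14$, which equals $\tfrac12 \cdot \tfrac12$. Thus the events are pairwise independent.
- Check the triple intersection:
$$A \cap B \cap C = \{HH\},$$
so
$$P(A \cap B \cap C) = \tfrac14.$$
But
$$P(A)\,P(B)\,P(C) = \tfrac12 \cdot \tfrac12 \cdot \tfrac12 = \tfrac18.$$
Checked answer: $A, B, C$ are pairwise independent but not mutually independent. Knowing any one of them alone gives no information about another one, but knowing two determines the third.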
Code
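```python
def bayes(prior, sensitivity, false_positive):
    """Posterior P(B | W) from the prior P(B), P(W | B), and P(W | B^c)."""
    p_b = prior
    p_not_b = 1 - prior
    # Law of total probability: P(W) = P(W|B) P(B) + P(W|B^c) P(B^c)
    evidence = sensitivity * p_b + false_positive * p_not_b
    return sensitivity * p_b / evidence


# Worked example 1: prior 0.15, witness accuracy 0.80, error rate 0.20.
print("Taxi posterior:", bayes(0.15, 0.80, 0.20))

# Worked example 2: two fair coins, four equally likely outcomes.
outcomes = ["HH", "HT", "TH", "TT"]
events = {
    "A": {o for o in outcomes if o[0] == "H"},            # first coin is heads
    "B": {o for o in outcomes if o[1] == "H"},            # second coin is heads
    "C": {o for o in outcomes if o.count("H") % 2 == 0},  # even number of heads
}


def prob(event):
    """Probability of an event under the uniform measure on outcomes."""
    return len(event) / len(outcomes)


# Pairwise checks: P(X and Y) next to P(X) P(Y).
for x, y in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(x, y, prob(events[x] & events[y]), prob(events[x]) * prob(events[y]))

# Triple check: P(A and B and C) next to P(A) P(B) P(C).
triple = events["A"] & events["B"] & events["C"]
print("triple:", prob(triple), prob(events["A"]) * prob(events["B"]) * prob(events["C"]))
```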
Common pitfalls
- Ignoring the base rate in Bayes problems. $P(\text{disease} \mid \text{positive})$ is not determined by test accuracy alone.
- Reversing conditional probabilities. In general $P(A \mid B) \neq P(B \mid A)$.
- Saying events are independent because they are disjoint. Nontrivial disjoint events are usually dependent because occurrence of one rules out the other.
- Checking only pairwise independence when a problem asks for mutual independence.
- Conditioning on an event of probability zero. For continuous random variables this requires conditional densities or a limiting interpretation, not the elementary ratio formula.
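Two of these pitfalls can be seen in a tiny enumeration. The sketch assumes one fair six-sided die. The specific events are illustrative choices.

```python
from fractions import Fraction

omega = set(range(1, 7))  # assumed model: one fair six-sided die

def prob(event):
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    return prob(a & b) / prob(b)  # requires P(b) > 0

# Reversed conditionals: P(six | even) differs from P(even | six).
six, even = {6}, {2, 4, 6}
print(cond_prob(six, even), cond_prob(even, six))  # 1/3 1

# Disjoint events with positive probability are dependent.
low, mid = {1, 2}, {3, 4}
print(prob(low & mid), prob(low) * prob(mid))      # 0 1/9
```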