Probability Axioms and Inclusion-Exclusion
After counting comes formal probability. A sample space records the possible outcomes of an experiment, events are subsets of that space, and a probability measure assigns numbers to events in a way that behaves like normalized size. The MIT lectures emphasize that the axioms are not merely formal rules: they encode consistency requirements for frequency interpretations, market prices, and personal degrees of belief.
Figure: A Venn diagram connects set operations with the same logical connectives used in proofs. Image: Wikimedia Commons, Watchduck, public domain.
Finite equally likely models are the easiest place to see the connection between counting and probability. If every outcome in a finite sample space has the same probability, then . The axioms then explain how probabilities of complements, unions, and intersections must behave. Inclusion-exclusion is the main tool for turning overlapping event counts into exact probabilities.
Definitions
A sample space is the set of possible outcomes. An event is a subset . The complement of is . The union means " or "; the intersection means " and ".
A probability measure assigns a number to each event and satisfies:
- Nonnegativity: .
- Normalization: .
- Countable additivity: if are pairwise disjoint, then
In a finite equally likely model,
De Morgan's laws describe complements of unions and intersections:
The laws extend to any finite or countable collection of events.
Key results
Several basic facts follow from the axioms:
because and are disjoint and .
If , then
because is a disjoint union.
For two events,
The subtraction corrects for the overlap, which was counted twice.
The general inclusion-exclusion formula is
Proof sketch: fix an outcome . Suppose lies in exactly of the events. Its total coefficient on the right side is
when , by the binomial theorem applied to . If , its coefficient is . Thus the right side counts exactly the union.
The axioms also provide consistency checks for proposed answers. If a proposed probability of is larger than , it cannot be correct. If a proposed value of is smaller than either or , it violates monotonicity. These checks are simple, but they catch many errors in word problems where several overlapping conditions are being counted at once.
In finite equally likely models, probability and counting are interchangeable only after the sample space is chosen. Rolling two dice should usually be modeled by the ordered pairs , not by the possible sums, because the sums are not equally likely. A sum of has six ordered pairs, while a sum of has only one. The axioms allow nonuniform probabilities, but the shortcut does not.
Inclusion-exclusion is most useful when intersections are easier than unions. The hat problem is the lecture's central example: "person gets their own hat" is easy to intersect because fixing several people leaves a smaller permutation problem. Directly counting "nobody gets their own hat" is less transparent until inclusion-exclusion converts it into those fixed-point intersections.
Complements are often the simplest special case of the same philosophy. Events such as "at least one match", "at least one repeated birthday", or "at least one six" may overlap in many ways. Their complements, such as "no matches", "all birthdays distinct", or "no sixes", often have a clean sequential count. The answer is then minus the complement probability.
De Morgan's laws help prevent logical errors when taking complements. The complement of "rain or snow" is "not rain and not snow", not "not rain or not snow". In symbols, complements turn unions into intersections and intersections into unions. This is why probability computations involving "none", "all", "at least one", and "not both" should usually be translated into set notation before manipulating formulas.
Visual
| Identity | Meaning | Common use |
|---|---|---|
| complement rule | "at least one" via "none" | |
| two-event inclusion-exclusion | overlapping conditions | |
| if | monotonicity | checking impossible answers |
| null event | eliminating disjoint leftovers | |
| finite uniform law | dice, cards, shuffled hats |
The flowchart emphasizes a practical habit: do not begin with inclusion-exclusion just because several events are mentioned. First ask whether the events are disjoint, whether a complement is simpler, and whether a finite uniform model has actually been justified. Inclusion-exclusion is exact but can be algebraically heavy. For many "at least one" problems, the complement is a one-line count. For problems with many overlapping events, such as derangements, inclusion-exclusion is the natural systematic method.
Worked example 1: derangements in the hat problem
Problem: people put their hats in a bin, hats are randomly shuffled, and each person receives one hat. What is the probability that nobody receives their own hat?
Method:
- Let be the event that person gets their own hat.
- We want
- For any fixed set of people, the probability that all get their own hats is
There are permutations fixing those hats, out of total permutations.
- Inclusion-exclusion gives
- Therefore the probability of no fixed points is
Checked answer: as grows, this approaches . For ,
Indeed, there are derangements of objects out of permutations.
Worked example 2: birthday collision by the complement
Problem: In a room of people, assuming birthdays are independent and uniformly distributed among days, what is the probability that at least two people share a birthday?
Method:
- Directly counting all possible collisions is awkward because many collision patterns overlap.
- Use the complement: no two people share a birthday.
- The total number of birthday assignments is .
- The number with all birthdays distinct is
- Thus
- Therefore
Checked numerical value:
The answer is slightly above one half, which is the well-known birthday surprise.
Code
from math import factorial, prod, e
def derangement_probability(n):
return sum(((-1) ** r) / factorial(r) for r in range(n + 1))
def birthday_collision_probability(people, days=365):
no_collision = prod((days - j) / days for j in range(people))
return 1 - no_collision
for n in [4, 6, 10]:
print(n, derangement_probability(n), "limit about", 1 / e)
print("23-person birthday collision:", birthday_collision_probability(23))
Common pitfalls
- Treating as when and overlap. Additivity only applies to disjoint events.
- Forgetting to define the sample space before declaring outcomes equally likely. Some descriptions hide unequal likelihoods.
- Counting a complement correctly and then forgetting to subtract from .
- Confusing "at least one" with "exactly one". Complements often help with "at least one", but exact counts usually require a separate condition.
- Assuming that a probability larger than a more detailed event is impossible. The axioms imply , so a conjunction cannot be more likely than one of its parts.