Skip to main content

Probability Axioms and Inclusion-Exclusion

After counting comes formal probability. A sample space records the possible outcomes of an experiment, events are subsets of that space, and a probability measure assigns numbers to events in a way that behaves like normalized size. The MIT lectures emphasize that the axioms are not merely formal rules: they encode consistency requirements for frequency interpretations, market prices, and personal degrees of belief.

Two overlapping circles show a highlighted Venn diagram region.

Figure: A Venn diagram connects set operations with the same logical connectives used in proofs. Image: Wikimedia Commons, Watchduck, public domain.

Finite equally likely models are the easiest place to see the connection between counting and probability. If every outcome in a finite sample space has the same probability, then P(A)=A/SP(A)=\vert A\vert /\vert S\vert . The axioms then explain how probabilities of complements, unions, and intersections must behave. Inclusion-exclusion is the main tool for turning overlapping event counts into exact probabilities.

Definitions

A sample space SS is the set of possible outcomes. An event is a subset ASA\subseteq S. The complement of AA is Ac=SAA^c=S\setminus A. The union ABA\cup B means "AA or BB"; the intersection ABA\cap B means "AA and BB".

A probability measure PP assigns a number P(A)P(A) to each event AA and satisfies:

  1. Nonnegativity: P(A)0P(A)\ge 0.
  2. Normalization: P(S)=1P(S)=1.
  3. Countable additivity: if A1,A2,A_1,A_2,\ldots are pairwise disjoint, then
P(j=1Aj)=j=1P(Aj).P\left(\bigcup_{j=1}^{\infty}A_j\right)=\sum_{j=1}^{\infty}P(A_j).

In a finite equally likely model,

P(A)=AS.P(A)=\frac{|A|}{|S|}.

De Morgan's laws describe complements of unions and intersections:

(AB)c=AcBc,(AB)c=AcBc.(A\cup B)^c=A^c\cap B^c, \qquad (A\cap B)^c=A^c\cup B^c.

The laws extend to any finite or countable collection of events.

Key results

Several basic facts follow from the axioms:

P(Ac)=1P(A),P(A^c)=1-P(A),

because AA and AcA^c are disjoint and AAc=SA\cup A^c=S.

If ABA\subseteq B, then

P(A)P(B),P(A)\le P(B),

because B=A(BA)B=A\cup(B\setminus A) is a disjoint union.

For two events,

P(AB)=P(A)+P(B)P(AB).P(A\cup B)=P(A)+P(B)-P(A\cap B).

The subtraction corrects for the overlap, which was counted twice.

The general inclusion-exclusion formula is

P(i=1nEi)=iP(Ei)i<jP(EiEj)+i<j<kP(EiEjEk)+(1)n+1P(E1En).\begin{aligned} P\left(\bigcup_{i=1}^n E_i\right) &= \sum_i P(E_i) - \sum_{i<j}P(E_i\cap E_j) +\sum_{i<j<k}P(E_i\cap E_j\cap E_k) -\cdots \\ &\quad +(-1)^{n+1}P(E_1\cap\cdots\cap E_n). \end{aligned}

Proof sketch: fix an outcome ω\omega. Suppose ω\omega lies in exactly mm of the events. Its total coefficient on the right side is

(m1)(m2)+(m3)+(1)m+1(mm)=1\binom{m}{1}-\binom{m}{2}+\binom{m}{3}-\cdots+(-1)^{m+1}\binom{m}{m}=1

when m>0m\gt 0, by the binomial theorem applied to 1(11)m1-(1-1)^m. If m=0m=0, its coefficient is 00. Thus the right side counts exactly the union.

The axioms also provide consistency checks for proposed answers. If a proposed probability of ABA\cap B is larger than P(A)P(A), it cannot be correct. If a proposed value of P(AB)P(A\cup B) is smaller than either P(A)P(A) or P(B)P(B), it violates monotonicity. These checks are simple, but they catch many errors in word problems where several overlapping conditions are being counted at once.

In finite equally likely models, probability and counting are interchangeable only after the sample space is chosen. Rolling two dice should usually be modeled by the 3636 ordered pairs (i,j)(i,j), not by the 1111 possible sums, because the sums are not equally likely. A sum of 77 has six ordered pairs, while a sum of 22 has only one. The axioms allow nonuniform probabilities, but the shortcut P(A)=A/SP(A)=\vert A\vert /\vert S\vert does not.

Inclusion-exclusion is most useful when intersections are easier than unions. The hat problem is the lecture's central example: "person ii gets their own hat" is easy to intersect because fixing several people leaves a smaller permutation problem. Directly counting "nobody gets their own hat" is less transparent until inclusion-exclusion converts it into those fixed-point intersections.

Complements are often the simplest special case of the same philosophy. Events such as "at least one match", "at least one repeated birthday", or "at least one six" may overlap in many ways. Their complements, such as "no matches", "all birthdays distinct", or "no sixes", often have a clean sequential count. The answer is then 11 minus the complement probability.

De Morgan's laws help prevent logical errors when taking complements. The complement of "rain or snow" is "not rain and not snow", not "not rain or not snow". In symbols, complements turn unions into intersections and intersections into unions. This is why probability computations involving "none", "all", "at least one", and "not both" should usually be translated into set notation before manipulating formulas.

Visual

IdentityMeaningCommon use
P(Ac)=1P(A)P(A^c)=1-P(A)complement rule"at least one" via "none"
P(AB)=P(A)+P(B)P(AB)P(A\cup B)=P(A)+P(B)-P(A\cap B)two-event inclusion-exclusionoverlapping conditions
P(A)P(B)P(A)\le P(B) if ABA\subseteq Bmonotonicitychecking impossible answers
P()=0P(\emptyset)=0null eventeliminating disjoint leftovers
P(A)=A/SP(A)=\vert A\vert /\vert S\vert finite uniform lawdice, cards, shuffled hats

The flowchart emphasizes a practical habit: do not begin with inclusion-exclusion just because several events are mentioned. First ask whether the events are disjoint, whether a complement is simpler, and whether a finite uniform model has actually been justified. Inclusion-exclusion is exact but can be algebraically heavy. For many "at least one" problems, the complement is a one-line count. For problems with many overlapping events, such as derangements, inclusion-exclusion is the natural systematic method.

Worked example 1: derangements in the hat problem

Problem: nn people put their hats in a bin, hats are randomly shuffled, and each person receives one hat. What is the probability that nobody receives their own hat?

Method:

  1. Let EiE_i be the event that person ii gets their own hat.
  2. We want
P(E1cEnc)=1P(E1En).P(E_1^c\cap\cdots\cap E_n^c) = 1-P(E_1\cup\cdots\cup E_n).
  1. For any fixed set of rr people, the probability that all rr get their own hats is
(nr)!n!.\frac{(n-r)!}{n!}.

There are (nr)!(n-r)! permutations fixing those rr hats, out of n!n! total permutations.

  1. Inclusion-exclusion gives
P(E1En)=r=1n(1)r+1(nr)(nr)!n!=r=1n(1)r+11r!.\begin{aligned} P(E_1\cup\cdots\cup E_n) &= \sum_{r=1}^{n}(-1)^{r+1}\binom{n}{r}\frac{(n-r)!}{n!} \\ &= \sum_{r=1}^{n}(-1)^{r+1}\frac{1}{r!}. \end{aligned}
  1. Therefore the probability of no fixed points is
1r=1n(1)r+11r!=r=0n(1)r1r!.1-\sum_{r=1}^{n}(-1)^{r+1}\frac{1}{r!} = \sum_{r=0}^{n}(-1)^r\frac{1}{r!}.

Checked answer: as nn grows, this approaches e1e^{-1}. For n=4n=4,

11+1216+124=924=38.1-1+\frac{1}{2}-\frac{1}{6}+\frac{1}{24}=\frac{9}{24}=\frac{3}{8}.

Indeed, there are 99 derangements of 44 objects out of 2424 permutations.

Worked example 2: birthday collision by the complement

Problem: In a room of 2323 people, assuming birthdays are independent and uniformly distributed among 365365 days, what is the probability that at least two people share a birthday?

Method:

  1. Directly counting all possible collisions is awkward because many collision patterns overlap.
  2. Use the complement: no two people share a birthday.
  3. The total number of birthday assignments is 36523365^{23}.
  4. The number with all birthdays distinct is
365364363(36522).365\cdot 364\cdot 363\cdots(365-22).
  1. Thus
P(no shared birthday)=36536434336523.P(\text{no shared birthday}) = \frac{365\cdot 364\cdots 343}{365^{23}}.
  1. Therefore
P(at least one shared birthday)=136536434336523.P(\text{at least one shared birthday}) = 1-\frac{365\cdot 364\cdots 343}{365^{23}}.

Checked numerical value:

1j=022(1j365)0.5073.1-\prod_{j=0}^{22}\left(1-\frac{j}{365}\right)\approx 0.5073.

The answer is slightly above one half, which is the well-known birthday surprise.

Code

from math import factorial, prod, e

def derangement_probability(n):
return sum(((-1) ** r) / factorial(r) for r in range(n + 1))

def birthday_collision_probability(people, days=365):
no_collision = prod((days - j) / days for j in range(people))
return 1 - no_collision

for n in [4, 6, 10]:
print(n, derangement_probability(n), "limit about", 1 / e)

print("23-person birthday collision:", birthday_collision_probability(23))

Common pitfalls

  • Treating P(AB)P(A\cup B) as P(A)+P(B)P(A)+P(B) when AA and BB overlap. Additivity only applies to disjoint events.
  • Forgetting to define the sample space before declaring outcomes equally likely. Some descriptions hide unequal likelihoods.
  • Counting a complement correctly and then forgetting to subtract from 11.
  • Confusing "at least one" with "exactly one". Complements often help with "at least one", but exact counts usually require a separate condition.
  • Assuming that a probability larger than a more detailed event is impossible. The axioms imply P(AB)P(A)P(A\cap B)\le P(A), so a conjunction cannot be more likely than one of its parts.

Connections