Sample Spaces, Events, and Axioms

Probability begins by separating two ideas that everyday language often mixes together: the outcomes that could happen and the numerical rules used to describe how likely those outcomes are. A sample space lists the possible outcomes of an experiment, while events are the subsets of outcomes we ask questions about. Once those objects are clear, probability is not a collection of gambling tricks; it is a measure on events.

Two overlapping circles show a highlighted Venn diagram region.

Figure: A Venn diagram connects set operations with the same logical connectives used in proofs. Image: Wikimedia Commons, Watchduck, public domain.

This page gives the formal starting point for the rest of probability theory. Lane et al.'s statistics text introduces probability through equally likely outcomes, relative frequencies, simple compound events, and base rates. A probability-theory course keeps those examples but places them inside Kolmogorov's axioms, which work equally well for finite dice rolls, countably infinite waiting times, and continuous measurements such as lifetimes.

Definitions

A random experiment is a process whose outcome is not known in advance but whose possible outcomes can be specified. The sample space is the set of all possible outcomes and is usually denoted by $\Omega$ or $S$.

An event is a subset of the sample space. If the observed outcome $\omega$ lies in event $A$, we say that event $A$ occurred. Common event operations are:

  • Complement: $A^c = \{\omega \in \Omega : \omega \notin A\}$.
  • Union: $A \cup B = \{\omega : \omega \in A \text{ or } \omega \in B\}$.
  • Intersection: $A \cap B = \{\omega : \omega \in A \text{ and } \omega \in B\}$.
  • Difference: $A \setminus B = A \cap B^c$.
  • Disjoint events: $A$ and $B$ are disjoint if $A \cap B = \varnothing$.
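The event operations above map directly onto Python's built-in set operators. The following is a minimal sketch; the single-die sample space and the events `A`, `B`, and `C` are illustrative choices, not part of the original text.

```python
# Sample space for one roll of a six-sided die (an illustrative assumption).
omega = frozenset(range(1, 7))
A = frozenset({2, 4, 6})   # "the roll is even"
B = frozenset({4, 5, 6})   # "the roll is at least 4"
C = frozenset({1, 3})      # a hypothetical event disjoint from A

complement_A = omega - A   # A^c: outcomes of omega not in A
union = A | B              # A ∪ B: in A or in B (inclusive)
intersection = A & B       # A ∩ B: in both A and B
difference = A - B         # A \ B: in A but not in B

# The difference identity A \ B = A ∩ B^c.
assert difference == A & (omega - B)

print("A^c   =", sorted(complement_A))
print("A ∪ B =", sorted(union))
print("A ∩ B =", sorted(intersection))
print("A \\ B =", sorted(difference))
print("A and C disjoint:", A.isdisjoint(C))
```

Using `frozenset` keeps events immutable and hashable, which is convenient when events themselves need to be collected into a set.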

A probability space is a triple $(\Omega, \mathcal{F}, P)$ where:

  1. $\Omega$ is the sample space.
  2. $\mathcal{F}$ is a collection of events, called a sigma-algebra, closed under complements and countable unions.
  3. $P$ assigns a number $P(A)$ to each event $A \in \mathcal{F}$.
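On a finite sample space the closure conditions can be checked by brute force. The sketch below uses a hypothetical helper, `is_sigma_algebra`, and assumes that the collection must contain $\Omega$ itself (the usual non-emptiness requirement). Because the sample space is finite, countable unions reduce to finite unions, and closure under pairwise unions is enough.

```python
from itertools import combinations

def is_sigma_algebra(omega, events):
    """Check the sigma-algebra conditions on a FINITE sample space.

    Hypothetical helper for illustration only: it tests that omega is in
    the collection, and that the collection is closed under complements
    and pairwise unions.
    """
    omega = frozenset(omega)
    events = {frozenset(e) for e in events}
    if omega not in events:
        return False
    # Closure under complements.
    if any(omega - e not in events for e in events):
        return False
    # Closure under pairwise unions (implies all finite unions by induction).
    return all(e1 | e2 in events for e1, e2 in combinations(events, 2))

omega = {1, 2, 3, 4}
coarse = [set(), {1, 2}, {3, 4}, omega]   # events built from two blocks
broken = [set(), {1}, omega]              # missing the complement {2, 3, 4}

print(is_sigma_algebra(omega, coarse))
print(is_sigma_algebra(omega, broken))
```

The `coarse` collection shows that a sigma-algebra need not contain every subset: it only has to contain the events the model intends to measure.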

In a finite or countable sample space, it is common to assign probabilities to individual outcomes and then add. If $\Omega = \{\omega_1,\omega_2,\ldots\}$ and $p_i = P(\{\omega_i\})$, then $p_i \ge 0$, $\sum_i p_i = 1$, and

$$P(A) = \sum_{\omega_i \in A} p_i.$$

When the sample space is finite and all outcomes are equally likely, the classical rule is

$$P(A)=\frac{\# A}{\# \Omega}.$$

This rule is useful but not a definition of probability in general. It applies only when the outcomes are equally likely and the sample space is finite.
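To see the difference between summing outcome probabilities and the classical counting rule, consider a loaded die. The weights below are a made-up assumption chosen for illustration: face 6 is three times as likely as each other face.

```python
from fractions import Fraction

# Hypothetical loaded die: faces 1-5 have probability 1/8, face 6 has 3/8.
pmf = {face: Fraction(1, 8) for face in range(1, 6)}
pmf[6] = Fraction(3, 8)

# The outcome probabilities must be nonnegative and sum to one.
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1

def prob(event, pmf):
    """P(A) as the sum of p_i over the outcomes in A."""
    return sum((pmf[w] for w in event), Fraction(0))

even = {2, 4, 6}
weighted = prob(even, pmf)                  # uses the actual weights
classical = Fraction(len(even), len(pmf))   # counting rule #A / #Omega

print("sum of outcome probabilities:", weighted)
print("classical counting rule:     ", classical)
```

The two numbers disagree because the outcomes are not equally likely, so the counting rule does not apply to this model.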

Key results

The Kolmogorov axioms are the foundation:

| Axiom | Statement | Meaning |
| --- | --- | --- |
| Nonnegativity | $P(A) \ge 0$ | An event cannot have negative probability. |
| Normalization | $P(\Omega)=1$ | Something in the sample space occurs. |
| Countable additivity | If $A_1,A_2,\ldots$ are pairwise disjoint, then $P\left(\bigcup_i A_i\right)=\sum_i P(A_i)$ | Probabilities of non-overlapping alternatives add. |
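For a small finite sample space, the axioms can be tested exhaustively against every event. The sketch below is illustrative only: `powerset` and `satisfies_axioms` are hypothetical helpers, every subset is treated as an event, and countable additivity is replaced by additivity over disjoint pairs, which is equivalent when the sample space is finite.

```python
from fractions import Fraction
from itertools import chain, combinations

def powerset(omega):
    """All subsets of a finite set, as frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

def satisfies_axioms(omega, P):
    """Check nonnegativity, normalization, and additivity on disjoint pairs."""
    events = powerset(omega)
    nonnegative = all(P(e) >= 0 for e in events)
    normalized = P(frozenset(omega)) == 1
    additive = all(
        P(a | b) == P(a) + P(b)
        for a in events for b in events if a.isdisjoint(b)
    )
    return nonnegative and normalized and additive

omega = {1, 2, 3}

def uniform(event):
    # Equally likely outcomes: a valid probability measure.
    return Fraction(len(event), len(omega))

def squared(event):
    # Nonnegative and normalized, but NOT additive.
    return Fraction(len(event), len(omega)) ** 2

print(satisfies_axioms(omega, uniform))
print(satisfies_axioms(omega, squared))
```

The `squared` assignment satisfies the first two axioms yet fails additivity, which shows that the third axiom carries real content.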

Several rules follow immediately.

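Complement rule. Since $A$ and $A^c$ are disjoint and $A \cup A^c = \Omega$, additivity and normalization give $P(A)+P(A^c)=P(\Omega)=1$, so

$$P(A^c)=1-P(A).$$

Empty event. Since $\Omega$ and $\varnothing$ are disjoint and $\Omega \cup \varnothing=\Omega$, additivity gives $P(\Omega)=P(\Omega)+P(\varnothing)$, so

$$P(\varnothing)=0.$$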

Monotonicity. If $A \subseteq B$, then

$$B = A \cup (B \setminus A)$$

with disjoint pieces, so

$$P(B)=P(A)+P(B\setminus A) \ge P(A).$$

Addition rule. For any two events,

$$P(A \cup B)=P(A)+P(B)-P(A \cap B).$$

The subtraction is necessary because outcomes in $A \cap B$ were counted once in $P(A)$ and once in $P(B)$.

Finite inclusion-exclusion. For three events,

$$\begin{aligned} P(A \cup B \cup C) &=P(A)+P(B)+P(C)\\ &\quad -P(A\cap B)-P(A\cap C)-P(B\cap C)\\ &\quad +P(A\cap B\cap C). \end{aligned}$$

The pattern alternates between adding single events, subtracting pairwise intersections, and adding the triple intersection.
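The three-event formula can be checked by exact counting on two fair dice. In this sketch the events `A` and `B` match the dice example on this page, while `C` ("the first die is even") is a hypothetical third event added only for illustration.

```python
from fractions import Fraction

# All 36 equally likely ordered outcomes of two fair dice.
omega = {(i, j) for i in range(1, 7) for j in range(1, 7)}

A = {w for w in omega if w[0] + w[1] == 7}   # the sum is 7
B = {w for w in omega if 6 in w}             # at least one die shows 6
C = {w for w in omega if w[0] % 2 == 0}      # first die is even (assumed event)

def prob(event):
    """Classical rule: favorable outcomes over total outcomes."""
    return Fraction(len(event), len(omega))

# Left side: probability of the union, counted directly.
lhs = prob(A | B | C)

# Right side: singles, minus pairwise intersections, plus the triple.
rhs = (prob(A) + prob(B) + prob(C)
       - prob(A & B) - prob(A & C) - prob(B & C)
       + prob(A & B & C))

print("direct count:       ", lhs)
print("inclusion-exclusion:", rhs)
print("formula holds:", lhs == rhs)
```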

The axioms also separate probability from the physical story that motivated it. A coin, a weather forecast, and a randomized algorithm can all be modeled by the same probability rules once the sample space and event class are chosen. The hard modeling work is deciding which outcomes belong in $\Omega$ and which probability assignment is appropriate. In finite examples, the phrase "equally likely" must be justified by symmetry or design. In continuous examples, the probability of a single point is usually zero, so events must be intervals, regions, or other measurable sets. This is why the sigma-algebra $\mathcal{F}$ appears in the formal definition: probability is assigned to events that the model is prepared to measure, not to arbitrary verbal descriptions.

Visual

| Event expression | Plain-language reading | Probability rule |
| --- | --- | --- |
| $A^c$ | $A$ does not occur | $1-P(A)$ |
| $A \cap B$ | both $A$ and $B$ occur | no general formula; depends on how $A$ and $B$ are related |
| $A \cup B$ | at least one of $A$, $B$ occurs | $P(A)+P(B)-P(A\cap B)$ |
| $A \setminus B$ | $A$ occurs but $B$ does not | $P(A)-P(A\cap B)$ |
| disjoint $A$, $B$ | no shared outcomes | $P(A\cup B)=P(A)+P(B)$ |

Worked example 1: two dice and event algebra

Problem. Roll two fair six-sided dice. Let $A$ be the event that the sum is $7$, and let $B$ be the event that at least one die shows $6$. Find $P(A)$, $P(B)$, $P(A\cap B)$, and $P(A\cup B)$.

Method.

  1. The sample space is ordered pairs:

$$\Omega=\{(i,j): i,j\in\{1,2,3,4,5,6\}\}.$$

There are $6\cdot 6=36$ equally likely outcomes.

  2. Event $A$ contains the pairs whose sum is $7$:

$$A=\{(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)\}.$$

Therefore $\#A=6$ and

$$P(A)=\frac{6}{36}=\frac{1}{6}.$$

  3. Event $B$ contains outcomes with a $6$ in the first coordinate or second coordinate. There are $6$ outcomes with first die $6$ and $6$ outcomes with second die $6$, but $(6,6)$ is counted twice:

$$\#B=6+6-1=11.$$

Hence

$$P(B)=\frac{11}{36}.$$

  4. For $A\cap B$, the sum must be $7$ and at least one die must be $6$. From the list for $A$, only $(1,6)$ and $(6,1)$ qualify:

$$P(A\cap B)=\frac{2}{36}=\frac{1}{18}.$$

  5. Use the addition rule:

$$\begin{aligned} P(A\cup B) &=P(A)+P(B)-P(A\cap B)\\ &=\frac{6}{36}+\frac{11}{36}-\frac{2}{36}\\ &=\frac{15}{36}\\ &=\frac{5}{12}. \end{aligned}$$

Checked answer. Direct counting confirms this: $A\cup B$ has the $11$ outcomes in $B$ plus the $4$ sum-seven outcomes without a $6$, for $15$ outcomes out of $36$.

Worked example 2: a countably infinite sample space

Problem. A coin is tossed until the first head appears. Let $T$ be the toss number on which the first head appears. For a fair coin, find $P(T\le 3)$ and $P(T > 3)$.

Method.

  1. The sample space can be written as

$$\Omega=\{H,TH,TTH,TTTH,\ldots\}.$$

Equivalently, the outcome is $T\in\{1,2,3,\ldots\}$.

  2. The event $T=k$ means the first $k-1$ tosses are tails and the $k$-th toss is heads:

$$P(T=k)=\left(\frac{1}{2}\right)^{k-1}\left(\frac{1}{2}\right)=\left(\frac{1}{2}\right)^k.$$

  3. Add the disjoint cases $T=1,2,3$:

$$\begin{aligned} P(T\le 3) &=P(T=1)+P(T=2)+P(T=3)\\ &=\frac{1}{2}+\frac{1}{4}+\frac{1}{8}\\ &=\frac{7}{8}. \end{aligned}$$

  4. Use the complement rule:

$$P(T>3)=1-P(T\le 3)=1-\frac{7}{8}=\frac{1}{8}.$$

  5. Check directly. The event $T > 3$ means the first three tosses are all tails:

$$P(T>3)=P(TTT)=\left(\frac{1}{2}\right)^3=\frac{1}{8}.$$

Checked answer. $P(T\le 3)=7/8$ and $P(T > 3)=1/8$. The probabilities over the infinite sample space still sum to one because

$$\sum_{k=1}^{\infty}\left(\frac{1}{2}\right)^k=1.$$
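The waiting-time calculation can be reproduced with exact fractions. This is a minimal sketch: the function name `p_first_head_at` and its parameter `p` (the probability of heads) are illustrative choices, and the partial sums only suggest, rather than prove, that the infinite series converges to one.

```python
from fractions import Fraction

def p_first_head_at(k, p=Fraction(1, 2)):
    """P(T = k): k - 1 tails followed by one head, with P(head) = p."""
    return (1 - p) ** (k - 1) * p

# Add the disjoint cases T = 1, 2, 3, then apply the complement rule.
p_le_3 = sum(p_first_head_at(k) for k in range(1, 4))
p_gt_3 = 1 - p_le_3

print("P(T <= 3) =", p_le_3)
print("P(T > 3)  =", p_gt_3)

# Partial sums of P(T = k) approach 1; the remainder after n terms
# is the probability that the first n tosses are all tails.
for n in (1, 5, 10, 20):
    partial = sum(p_first_head_at(k) for k in range(1, n + 1))
    print(f"n = {n:2d}: partial sum = {partial}, remainder = {1 - partial}")
```

Passing a different `p` models a biased coin under the same sample space; only the probability assignment changes.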

Code

from fractions import Fraction

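# Sample space: all ordered pairs (first die, second die).
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
# A: the sum is 7. B: at least one die shows 6.
A = {(i, j) for (i, j) in omega if i + j == 7}
B = {(i, j) for (i, j) in omega if i == 6 or j == 6}

def prob(event):
    # Classical rule: valid here because all 36 outcomes are equally likely.
    return Fraction(len(event), len(omega))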

print("P(A) =", prob(A))
print("P(B) =", prob(B))
print("P(A and B) =", prob(A & B))
print("P(A or B) =", prob(A | B))

# Verify the addition rule.
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print("addition rule holds:", lhs == rhs)

Common pitfalls

  • Treating a sample space as equally likely without checking the modeling assumption. The eleven possible sums of two dice (2 through 12), for example, are not equally likely, even though the 36 ordered pairs are.
  • Forgetting that $A\cup B$ is inclusive: it includes outcomes where both $A$ and $B$ occur.
  • Adding $P(A)+P(B)$ for overlapping events without subtracting $P(A\cap B)$.
  • Confusing the impossible event $\varnothing$ with events that have probability zero in continuous models. A point such as $\{0.5\}$ can have probability zero under a continuous distribution without being logically impossible.
  • Defining events vaguely. "A high value" is not an event until the cutoff is specified.
  • Assuming probability one means logically guaranteed. In continuous models, an event of probability one can still exclude some outcomes, as long as the excluded set has probability zero.
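The first pitfall is easy to confirm by counting. The sketch below tabulates the distribution of the sum of two fair dice from the 36 equally likely ordered pairs; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Count how many of the 36 ordered pairs produce each sum.
counts = Counter(i + j for i in range(1, 7) for j in range(1, 7))

# Convert counts to exact probabilities, ordered by sum.
sum_probs = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, p in sum_probs.items():
    print(f"sum {s:2d}: {counts[s]} of 36 outcomes, probability {p}")

# A naive "equally likely sums" model would assign 1/11 to every sum.
naive = Fraction(1, len(sum_probs))
print("naive value per sum:", naive)
```

Treating the sums as the sample space is still legitimate, but the probability assignment must then come from the counts, not from the classical rule.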

Connections