Probability and Random Variables

This section is a course map for a rigorous first course in probability, following MIT 18.440 Probability and Random Variables (Scott Sheffield, Spring 2014). It begins with counting and axioms, then builds random variables, expectation, variance, standard distributions, joint laws, transforms, limit theorems, Markov chains, entropy, martingales, and the risk-neutral probability viewpoint used in Black-Scholes.

Figure: Pierre-Simon de Laplace is a key figure in probability, transforms, and potential theory. Image: Wikimedia Commons, Louis Delaistre after Armand-Charles Guilleminot, public domain.

Figure: A Galton box turns repeated random left-right choices into an approximate bell-shaped distribution. Image: Wikimedia Commons, Marcin Floryan, CC BY-SA 3.0.

Figure: Probability trees make the conditioning structure in Bayes' theorem explicit. Image: Wikimedia Commons, Gnathan87, CC0 1.0.

The notes are meant to sit between a short applied probability introduction and a measure-theoretic graduate course. They use finite and countable models when possible, continuous densities when needed, and proof sketches for the structural results that students repeatedly use. The section also links outward to the shorter /math/probability/ pages, discrete mathematics probability, and statistics when the same ideas appear in a different style.

Definitions

The section treats probability as a mathematical measure $P$ on events in a sample space, random variables as real-valued functions on that space, and distributions as the induced laws of those variables. The early pages emphasize exact finite models; the middle pages emphasize densities and joint laws; the later pages emphasize asymptotic behavior and stochastic processes.

The generated pages, in lecture order, are:

Key results

The first organizing principle is that finite probability is counting plus normalization. If $S$ is a finite sample space with equally likely outcomes, then

P(A)=\frac{|A|}{|S|}.

The hard part is usually choosing $S$ so that this ratio is legitimate. Counting tools such as permutations, binomial coefficients, multinomial coefficients, complements, and inclusion-exclusion provide the numerator and denominator.
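As a minimal sketch of counting plus normalization, the snippet below enumerates the 36 ordered outcomes of two fair dice and computes the probability that the faces sum to 7. The dice example and the helper name `prob` are illustrative assumptions, not part of the course pages. Note that the ordered pairs are equally likely, while the 11 possible sums are not, which is why the sample space is built from pairs.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of faces from two fair dice (36 equally likely outcomes).
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """Return P(A) = |A| / |S| for an event given as a predicate on outcomes."""
    favorable = sum(1 for outcome in S if event(outcome))
    return Fraction(favorable, len(S))

# Event A: the two faces sum to 7.
p_sum_7 = prob(lambda o: o[0] + o[1] == 7)
print(p_sum_7)  # 1/6
```

Using `Fraction` keeps the answer exact, so it can be compared directly with a hand count.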

The second organizing principle is conditioning. For $P(B) > 0$,

P(A\mid B)=\frac{P(A\cap B)}{P(B)}.

Bayes' formula, independence, conditional distributions, conditional expectation, and martingales are all extensions of this idea. Conditioning is the formal way probability updates when information arrives.
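The following sketch applies the conditional probability definition to a hypothetical diagnostic test. The prevalence, sensitivity, and false-positive rate are made-up illustrative numbers, not values from the course.

```python
from fractions import Fraction

# Hypothetical inputs (assumptions for illustration only).
p_disease = Fraction(1, 100)              # P(D)
p_pos_given_disease = Fraction(95, 100)   # P(+ | D)
p_pos_given_healthy = Fraction(5, 100)    # P(+ | not D)

# P(+ and D) and P(+), the latter by splitting on D and its complement.
p_pos_and_disease = p_disease * p_pos_given_disease
p_pos = p_pos_and_disease + (1 - p_disease) * p_pos_given_healthy

# Conditional probability: P(D | +) = P(D and +) / P(+).
p_disease_given_pos = p_pos_and_disease / p_pos
print(p_disease_given_pos, float(p_disease_given_pos))  # 19/118, about 0.161
```

Under these assumed numbers, $P(+\mid D)=0.95$ while $P(D\mid +)\approx 0.16$, which shows concretely that the two conditional probabilities answer different questions.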

The third organizing principle is that random variables allow algebra. Means, variances, covariance, sums, transformations, moment generating functions, and characteristic functions all work because a random variable turns an outcome into a number. The two most used identities are

E[X+Y]=E[X]+E[Y]

and, whenever $X$ and $Y$ have finite variances,

\operatorname{Var}(X+Y) = \operatorname{Var}(X)+\operatorname{Var}(Y)+2\operatorname{Cov}(X,Y).
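A small exact check of the variance identity can make the covariance term concrete. The sketch below assumes a hypothetical urn with 2 red and 3 blue balls, drawn twice without replacement, with $X$ and $Y$ the indicators that the first and second draws are red. The urn and the helper name `expect` are illustrative choices.

```python
from fractions import Fraction

# Joint pmf of (X, Y) for two draws without replacement from 2 red + 3 blue balls.
joint = {
    (1, 1): Fraction(2, 5) * Fraction(1, 4),
    (1, 0): Fraction(2, 5) * Fraction(3, 4),
    (0, 1): Fraction(3, 5) * Fraction(2, 4),
    (0, 0): Fraction(3, 5) * Fraction(2, 4),
}

def expect(g):
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - ex) ** 2)
var_y = expect(lambda x, y: (y - ey) ** 2)
cov = expect(lambda x, y: (x - ex) * (y - ey))
var_sum = expect(lambda x, y: (x + y - ex - ey) ** 2)

print(var_sum, var_x + var_y + 2 * cov)  # both 9/25
```

The covariance here is negative ($-3/50$) because drawing red first makes red less likely on the second draw, so the variance of the sum is smaller than the sum of the variances.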

The fourth organizing principle is asymptotic regularity. For independent, identically distributed $X_1, X_2, \ldots$ with mean $\mu$ and finite variance $\sigma^2 > 0$, averages stabilize and normalized errors become approximately normal:

\frac{X_1+\cdots+X_n}{n}\to \mu, \qquad \frac{X_1+\cdots+X_n-n\mu}{\sigma\sqrt n}\Rightarrow N(0,1).

The law of large numbers and the central limit theorem explain why large random systems can be predictable even when individual outcomes remain random.
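Both limit statements can be observed in a short simulation. The sketch below assumes Uniform(0, 1) summands (so $\mu = 1/2$ and $\sigma^2 = 1/12$) and arbitrary sample sizes. It is a numerical illustration only, not a proof, and the helper name `sample_sum` is hypothetical.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

MU = 0.5               # mean of Uniform(0, 1)
SIGMA = sqrt(1 / 12)   # standard deviation of Uniform(0, 1)

def sample_sum(n):
    """Sum of n independent Uniform(0, 1) draws."""
    return sum(random.random() for _ in range(n))

# Law of large numbers: one long average should sit near MU.
big_n = 10_000
sample_mean = sample_sum(big_n) / big_n

# Central limit theorem: standardized sums should land in [-1, 1]
# about 68.3% of the time, as a standard normal does.
n, trials = 50, 4_000
inside = 0
for _ in range(trials):
    z = (sample_sum(n) - n * MU) / (SIGMA * sqrt(n))
    if -1 <= z <= 1:
        inside += 1
fraction_inside = inside / trials

print(sample_mean, fraction_inside)
```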

The pages are deliberately cumulative. A later page often reuses an earlier idea rather than re-proving it from scratch. Poisson processes rely on binomial rare-event limits and exponential waiting times. Conditional expectation relies on conditional probability and joint distributions. Martingales rely on conditional expectation. Risk-neutral pricing relies on martingales, expectation, and the normal distribution. When a computation feels mysterious, the right repair is usually to walk backward through this dependency chain until the sample space, conditioning event, or distributional mechanism is explicit.

The section also separates exact answers from approximations. Inclusion-exclusion, conditioning, convolution, and transform identities are exact when their assumptions hold. Poisson approximation, normal approximation, and large-number reasoning become accurate in limiting regimes. A rigorous solution should say which mode it is using. For example, a binomial formula may give an exact probability, a Poisson law may give a rare-event approximation, and a normal law may give a large-sample approximation to the same family of problems.
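To make the exact versus approximate distinction concrete, the sketch below compares an exact binomial probability with its Poisson rare-event approximation. The parameters $n = 1000$, $p = 0.002$, and $k = 3$ are assumed values chosen only so that $p$ is small and $np = 2$ is moderate.

```python
from math import comb, exp, factorial

# Assumed rare-event regime: many trials, small success probability.
n, p, k = 1000, 0.002, 3
lam = n * p  # Poisson rate matching the binomial mean

# Exact mode: binomial pmf at k.
binomial_exact = comb(n, k) * p**k * (1 - p) ** (n - k)

# Approximate mode: Poisson pmf at k with rate lam.
poisson_approx = exp(-lam) * lam**k / factorial(k)

print(binomial_exact, poisson_approx)
```

The two values agree to about three decimal places in this regime. The agreement degrades as $p$ grows with $n$ fixed, which is why a solution should state which mode it is using.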

Visual

Course block	Main question	Representative page
Counting and axioms	How are probabilities assigned consistently?	Probability axioms and inclusion-exclusion
Conditioning	How does information update probabilities?	Conditional probability, Bayes, and independence
Random variables	How do numerical outcomes behave?	Discrete random variables, expectation, and variance
Distributions	Which laws model common mechanisms?	Normal, exponential, gamma, beta, and Cauchy laws
Limit theory	What happens after many trials?	Weak law, concentration, and the central limit theorem
Processes and information	How does randomness evolve or encode uncertainty?	Markov chains

Worked example 1: choosing the right early-course tool

Problem: A probability problem says $n$ hats are shuffled randomly among $n$ people. It asks for the probability that nobody gets their own hat. Which pages should be used, and what is the solution path?

Method:

1. The phrase "shuffled randomly" suggests a finite equally likely model: all $n!$ permutations are equally likely. Start with counting and combinatorics.
2. The event "nobody gets their own hat" is a complement of a union. Let $E_i$ be the event that person $i$ gets their own hat.
3. The desired event is

E_1^c\cap\cdots\cap E_n^c = (E_1\cup\cdots\cup E_n)^c.

This points to probability axioms and inclusion-exclusion.
4. For any fixed set of $r$ people, the probability that all $r$ get their own hats is

\frac{(n-r)!}{n!}.

5. There are $\binom{n}{r}$ such sets, and $\binom{n}{r}\frac{(n-r)!}{n!}=\frac{1}{r!}$, so inclusion-exclusion applied to the union, followed by the complement rule, gives

P(\text{nobody gets own hat}) = \sum_{r=0}^{n}(-1)^r\frac1{r!}.

Checked answer: for large $n$, this is close to $e^{-1}\approx 0.368$. The navigation is counting first, axioms second, inclusion-exclusion third.
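The hat-matching formula can be checked by brute force for small $n$. The sketch below enumerates all permutations, counts those with no fixed point, and compares the result with the alternating sum. The function names are hypothetical, and the enumeration is only practical for small $n$ because it visits all $n!$ permutations.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial

def derangement_prob_bruteforce(n):
    """Fraction of the n! permutations in which no index maps to itself."""
    good = sum(
        1
        for perm in permutations(range(n))
        if all(perm[i] != i for i in range(n))
    )
    return Fraction(good, factorial(n))

def derangement_prob_formula(n):
    """Inclusion-exclusion result: sum over r of (-1)^r / r!."""
    return sum(Fraction((-1) ** r, factorial(r)) for r in range(n + 1))

for n in range(1, 8):
    assert derangement_prob_bruteforce(n) == derangement_prob_formula(n)

print(derangement_prob_formula(4))  # 3/8
print(float(derangement_prob_formula(7)), exp(-1))
```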

Worked example 2: choosing the right late-course tool

Problem: A fair coin is tossed $400$ times. We want to know why the fraction of heads should be close to $1/2$ and how to approximate the chance of seeing between $190$ and $210$ heads.

Method:

1. Let $X$ be the number of heads. The count is binomial $(400,1/2)$, so begin with Bernoulli, binomial, geometric, and negative binomial laws.
2. The expected count and variance are

E[X]=400\cdot\frac12=200, \qquad \operatorname{Var}(X)=400\cdot\frac12\cdot\frac12=100.

3. The fraction $X/400$ is close to $1/2$ by the law of large numbers, so use weak law, concentration, and the central limit theorem.
4. For an approximation of the interval probability, use the central limit theorem. The standard deviation is $\sqrt{100}=10$.
5. With continuity correction,

P(190\le X\le 210) \approx P(189.5\le N\le 210.5),

where $N$ is normal with mean $200$ and standard deviation $10$.
6. Standardizing gives

P(-1.05\le Z\le 1.05) =\Phi(1.05)-\Phi(-1.05) \approx 0.706.

Checked answer: the section path is binomial model, then expectation and variance, then CLT approximation.
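The normal approximation can be compared with the exact binomial sum, which is cheap to compute here. The sketch below builds $\Phi$ from `math.erf`; the helper name `phi` is an assumption for illustration.

```python
from math import comb, erf, sqrt

n = 400
mean, sd = 200.0, 10.0

# Exact mode: sum the binomial(400, 1/2) pmf over 190..210 using integer arithmetic.
exact = sum(comb(n, k) for k in range(190, 211)) / 2**n

def phi(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Approximate mode: normal law with continuity correction.
z_low = (189.5 - mean) / sd    # -1.05
z_high = (210.5 - mean) / sd   # +1.05
approx = phi(z_high) - phi(z_low)

print(exact, approx)
```

The continuity correction matters at this scale: dropping it would use $z=\pm 1$ and give roughly $0.683$ rather than $0.706$.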

Code

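# Section pages that the keyword rules below can recommend, as (title, link) pairs.
pages = [
    ("Counting", "/math/probability-and-random-variables/counting-and-combinatorics"),
    ("Axioms", "/math/probability-and-random-variables/probability-axioms-and-inclusion-exclusion"),
    ("Conditioning", "/math/probability-and-random-variables/conditional-probability-bayes-independence"),
    ("Random variables", "/math/probability-and-random-variables/discrete-random-variables-expectation-variance"),
    ("Limit theorems", "/math/probability-and-random-variables/weak-law-concentration-central-limit-theorem"),
]

def suggest_pages(problem_text):
    """Return the (title, link) pairs whose trigger keywords appear in problem_text.

    Matching is a case-insensitive substring test, so this is a rough
    navigation aid rather than a classifier.
    """
    text = problem_text.lower()
    suggestions = []
    # Finite equally likely models: shuffling, choosing, counting.
    if "shuffle" in text or "choose" in text or "count" in text:
        suggestions.append(pages[0])
    # Unions and complements point to the axioms and inclusion-exclusion.
    if "at least" in text or "none" in text or "union" in text:
        suggestions.append(pages[1])
    # Information arriving points to conditioning and Bayes.
    if "given" in text or "test" in text or "bayes" in text:
        suggestions.append(pages[2])
    # Numerical summaries point to expectation and variance.
    # (These two keywords are an added rule; the page previously had no trigger.)
    if "expected" in text or "variance" in text:
        suggestions.append(pages[3])
    # Many trials or averages point to the limit theorems.
    if "average" in text or "many" in text or "normal approximation" in text:
        suggestions.append(pages[4])
    return suggestions

# Example: this text matches only the limit-theorem rule.
for title, link in suggest_pages("many fair coin tosses need a normal approximation"):
    print(title, "->", link)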

Common pitfalls

Skipping the sample-space step and applying formulas to outcomes that are not equally likely.
Treating conditional probabilities as reversible; $P(A\mid B)$ and $P(B\mid A)$ answer different questions.
Memorizing distribution formulas without identifying the random mechanism: fixed number of trials, waiting time, rare-event count, memoryless wait, or accumulated small effects.
Using independence when the story describes sampling without replacement or a shared constraint.
Applying limit theorems without checking finite mean, finite variance, independence, and scaling.
Treating martingale and risk-neutral probability statements as ordinary real-world frequency claims rather than conditional-expectation and pricing statements.
