Probability and Random Variables
This section is a rigorous probability course map following MIT 18.440 Probability and Random Variables, Scott Sheffield, Spring 2014. It begins with counting and axioms, then builds random variables, expectation, variance, standard distributions, joint laws, transforms, limit theorems, Markov chains, entropy, martingales, and the risk-neutral probability viewpoint used in Black-Scholes.

Figure: Pierre-Simon de Laplace is a key figure in probability, transforms, and potential theory. Image: Wikimedia Commons, Louis Delaistre after Armand-Charles Guilleminot, public domain.
Figure: A Galton box turns repeated random left-right choices into an approximate bell-shaped distribution. Image: Wikimedia Commons, Marcin Floryan, CC BY-SA 3.0.
Figure: Probability trees make the conditioning structure in Bayes' theorem explicit. Image: Wikimedia Commons, Gnathan87, CC0 1.0.
The notes are meant to sit between a short applied probability introduction and a measure-theoretic graduate course. They use finite and countable models when possible, continuous densities when needed, and proof sketches for the structural results that students repeatedly use. The section also links outward to the shorter /math/probability/ pages, discrete mathematics probability, and statistics when the same ideas appear in a different style.
Definitions
The section treats probability as a mathematical measure on events in a sample space, random variables as real-valued functions on that space, and distributions as the induced laws of those variables. The early pages emphasize exact finite models; the middle pages emphasize densities and joint laws; the later pages emphasize asymptotic behavior and stochastic processes.
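To make the "function on a sample space" view concrete, here is a minimal Python sketch. The three-coin-flip sample space is an illustrative choice, not part of the course material. It enumerates Ω, defines a random variable as an ordinary function, and computes the induced law by adding up outcome probabilities.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Illustrative sample space: ordered outcomes of three fair coin flips, all equally likely.
omega = list(product("HT", repeat=3))
p_outcome = Fraction(1, len(omega))

# A random variable is just a function from outcomes to real numbers.
def num_heads(outcome):
    return sum(1 for flip in outcome if flip == "H")

# The law of X is the induced probability on its values:
# P(X = x) = P({outcome : X(outcome) = x}).
law = defaultdict(Fraction)  # Fraction() is 0
for outcome in omega:
    law[num_heads(outcome)] += p_outcome

for value in sorted(law):
    print(value, law[value])
```

The same pattern (enumerate outcomes, apply a function, aggregate probabilities) is what "distribution of a random variable" means in the finite case.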
The generated pages, in lecture order, are:
- Counting and combinatorics
- Probability axioms and inclusion-exclusion
- Conditional probability, Bayes, and independence
- Discrete random variables, expectation, and variance
- Bernoulli, binomial, geometric, and negative binomial laws
- Poisson random variables and Poisson processes
- Continuous random variables and uniform laws
- Normal, exponential, gamma, beta, and Cauchy laws
- Joint distributions, transformations, and independence
- Sums, convolutions, and order statistics
- Covariance, correlation, and conditional expectation
- Moment and characteristic functions
- Weak law, concentration, and the central limit theorem
- Strong law and Jensen's inequality
- Markov chains
- Entropy and coding
- Martingales, risk-neutral probability, and Black-Scholes
Key results
The first organizing principle is that finite probability is counting plus normalization. If $\Omega$ is a finite sample space with equally likely outcomes, then for every event $A \subseteq \Omega$,
$$P(A) = \frac{|A|}{|\Omega|}.$$
The hard part is usually choosing $\Omega$ so that this ratio is legitimate, that is, so that its outcomes really are equally likely. Counting tools such as permutations, binomial coefficients, multinomial coefficients, complements, and inclusion-exclusion provide the numerator and denominator.
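A small sketch of the counting ratio, assuming two fair dice as the illustrative experiment. The helper `equally_likely_probability` is a hypothetical name introduced here. It also shows why the choice of $\Omega$ matters: the 11 possible sums are not equally likely, so using them as the sample space gives the wrong answer.

```python
from fractions import Fraction
from itertools import product

def equally_likely_probability(sample_space, event):
    """P(A) = |A| / |Omega|, valid only if every outcome is equally likely."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

# Ordered pairs of fair dice: all 36 outcomes are equally likely.
dice = list(product(range(1, 7), repeat=2))
p_sum_seven = equally_likely_probability(dice, lambda o: o[0] + o[1] == 7)
print(p_sum_seven)  # 1/6

# Using the 11 sums {2, ..., 12} as Omega is illegitimate: the sums are not
# equally likely, so the ratio gives the wrong value.
wrong = equally_likely_probability(list(range(2, 13)), lambda s: s == 7)
print(wrong)  # 1/11
```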
The second organizing principle is conditioning. For events $A$ and $B$ with $P(B) > 0$,
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Bayes' formula, independence, conditional distributions, conditional expectation, and martingales are all extensions of this idea. Conditioning is the formal way probability updates when information arrives.
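Conditioning can be checked by brute force on a finite model. This sketch, again using two fair dice as an illustrative assumption, computes $P(A \mid B)$ by restricting to $B$, confirms that Bayes' formula $P(A \mid B) = P(B \mid A)P(A)/P(B)$ gives the same value, and shows that the reversed conditional is a different number.

```python
from fractions import Fraction
from itertools import product

dice = list(product(range(1, 7), repeat=2))  # 36 equally likely ordered pairs

def prob(event):
    return Fraction(sum(1 for o in dice if event(o)), len(dice))

def cond_prob(event, given):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) = 0."""
    p_given = prob(given)
    if p_given == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(lambda o: event(o) and given(o)) / p_given

def sum_is_eight(o):
    return o[0] + o[1] == 8

def first_is_even(o):
    return o[0] % 2 == 0

p_a_given_b = cond_prob(sum_is_eight, first_is_even)
# Bayes' formula recovers P(A | B) from the reversed conditional P(B | A).
bayes = cond_prob(first_is_even, sum_is_eight) * prob(sum_is_eight) / prob(first_is_even)
print(p_a_given_b, bayes, cond_prob(first_is_even, sum_is_eight))
```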
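The third organizing principle is that random variables allow algebra. Means, variances, covariance, sums, transformations, moment generating functions, and characteristic functions all work because a random variable turns an outcome into a number. The two most used identities are linearity of expectation,
$$E[aX + bY] = a\,E[X] + b\,E[Y],$$
which needs no independence assumption, and the variance of a sum,
$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y),$$
which reduces to $\operatorname{Var}(X) + \operatorname{Var}(Y)$ when $X$ and $Y$ are uncorrelated (in particular, when they are independent).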
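Both identities can be verified exactly on a small joint law. The table below is an illustrative assumption chosen so that $X$ and $Y$ are correlated; the sketch computes expectations as probability-weighted sums and checks linearity and the covariance correction term.

```python
from fractions import Fraction

# Illustrative joint law of (X, Y) as {(x, y): probability}; X and Y are dependent.
joint = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 1): Fraction(1, 2),
}

def expect(f):
    """E[f(X, Y)] as a probability-weighted sum over the joint law."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

def var(f):
    mean = expect(f)
    return expect(lambda x, y: (f(x, y) - mean) ** 2)

ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
cov = expect(lambda x, y: (x - ex) * (y - ey))

# Linearity of expectation holds even though X and Y are dependent.
assert expect(lambda x, y: 2 * x + 3 * y) == 2 * ex + 3 * ey

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
lhs = var(lambda x, y: x + y)
rhs = var(lambda x, y: x) + var(lambda x, y: y) + 2 * cov
print(lhs, rhs, cov)
```

Because the covariance here is nonzero, simply adding the two variances would understate $\operatorname{Var}(X + Y)$.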
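The fourth organizing principle is asymptotic regularity. Under suitable hypotheses (for example, i.i.d. $X_1, X_2, \dots$ with mean $\mu$ and finite variance $\sigma^2 > 0$), averages stabilize and normalized errors become normal: writing $\bar X_n = (X_1 + \cdots + X_n)/n$,
$$\bar X_n \to \mu \qquad\text{and}\qquad \frac{\sqrt{n}\,(\bar X_n - \mu)}{\sigma} \Rightarrow N(0, 1).$$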
The law of large numbers and the central limit theorem explain why large random systems can be predictable even when individual outcomes remain random.
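For fair coin tossing the stabilization of averages can be seen exactly, without simulation. This sketch assumes i.i.d. fair tosses and a hypothetical tolerance of $1/20$; it sums binomial probabilities to get $P(|S_n/n - 1/2| \le 1/20)$, which increases toward 1 as $n$ grows.

```python
from fractions import Fraction
from math import comb

def prob_fraction_near_half(n, eps):
    """Exact P(|S_n / n - 1/2| <= eps) when S_n ~ Binomial(n, 1/2)."""
    half = Fraction(1, 2)
    favorable = sum(
        comb(n, k) for k in range(n + 1) if abs(Fraction(k, n) - half) <= eps
    )
    return favorable / 2**n  # exact integer ratio, converted to float

eps = Fraction(1, 20)  # hypothetical tolerance
for n in (10, 100, 1000, 10000):
    print(n, round(prob_fraction_near_half(n, eps), 4))
```

The comparison uses `Fraction` rather than floats so that the symmetric boundary cases $k/n = 1/2 \pm \varepsilon$ are treated identically.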
The pages are deliberately cumulative. A later page often reuses an earlier idea rather than re-proving it from scratch. Poisson processes rely on binomial rare-event limits and exponential waiting times. Conditional expectation relies on conditional probability and joint distributions. Martingales rely on conditional expectation. Risk-neutral pricing relies on martingales, expectation, and the normal distribution. When a computation feels mysterious, the right repair is usually to walk backward through this dependency chain until the sample space, conditioning event, or distributional mechanism is explicit.
The section also separates exact answers from approximations. Inclusion-exclusion, conditioning, convolution, and transform identities are exact when their assumptions hold. Poisson approximation, normal approximation, and large-number reasoning become accurate in limiting regimes. A rigorous solution should say which mode it is using. For example, a binomial formula may give an exact probability, a Poisson law may give a rare-event approximation, and a normal law may give a large-sample approximation to the same family of problems.
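The distinction between exact and approximate modes can be made numerical. The sketch below uses illustrative rare-event parameters ($n = 1000$, $p = 0.002$, event $X \le 2$) and compares the exact binomial value with the Poisson and normal approximations; in this regime the Poisson law is the appropriate approximation and the normal law is noticeably worse.

```python
from math import comb, erf, exp, factorial, sqrt

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ Poisson(lam), the rare-event approximation."""
    return sum(exp(-lam) * lam**j / factorial(j) for j in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(Z <= x) for Z ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p, k = 1000, 0.002, 2  # illustrative rare-event parameters
exact = binom_cdf(k, n, p)
poisson = poisson_cdf(k, n * p)
# Normal approximation with continuity correction, same mean and variance.
normal = normal_cdf(k + 0.5, n * p, sqrt(n * p * (1 - p)))
print(f"exact={exact:.5f} poisson={poisson:.5f} normal={normal:.5f}")
```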
Visual
| Course block | Main question | Representative page |
|---|---|---|
| Counting and axioms | How are probabilities assigned consistently? | Probability axioms and inclusion-exclusion |
| Conditioning | How does information update probabilities? | Conditional probability, Bayes, and independence |
| Random variables | How do numerical outcomes behave? | Discrete random variables, expectation, and variance |
| Distributions | Which laws model common mechanisms? | Normal, exponential, gamma, beta, and Cauchy laws |
| Limit theory | What happens after many trials? | Weak law, concentration, and the central limit theorem |
| Processes and information | How does randomness evolve or encode uncertainty? | Markov chains |
Worked example 1: choosing the right early-course tool
Problem: A probability problem says $n$ hats are shuffled randomly among $n$ people. It asks for the probability that nobody gets their own hat. Which pages should be used, and what is the solution path?
Method:
- The phrase "shuffled randomly" suggests a finite equally likely model: all permutations are equally likely. Start with counting and combinatorics.
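- The event "nobody gets their own hat" is the complement of a union. Let $A_i$ be the event that person $i$ gets their own hat.
- The desired event is
$$\left(\bigcup_{i=1}^{n} A_i\right)^{c}.$$
- This points to probability axioms and inclusion-exclusion.
- For any fixed set of $k$ people, the probability that all of them get their own hats is
$$\frac{(n-k)!}{n!}.$$
- Inclusion-exclusion gives
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} \frac{(n-k)!}{n!} = \sum_{k=1}^{n} \frac{(-1)^{k+1}}{k!},$$
so
$$P(\text{nobody gets their own hat}) = 1 - \sum_{k=1}^{n} \frac{(-1)^{k+1}}{k!} = \sum_{k=0}^{n} \frac{(-1)^{k}}{k!}.$$
Checked answer: for large $n$, this is close to $e^{-1} \approx 0.3679$. The navigation is counting first, axioms second, inclusion-exclusion third.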
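The inclusion-exclusion answer can be cross-checked against brute force for small $n$. This sketch assumes all $n!$ permutations are equally likely, counts those with no fixed point, and compares the result with $\sum_{k=0}^{n} (-1)^k/k!$ and with $e^{-1}$.

```python
from fractions import Fraction
from itertools import permutations
from math import e, factorial

def no_match_brute_force(n):
    """Fraction of the n! equally likely permutations with no fixed point."""
    perms = list(permutations(range(n)))
    good = sum(1 for perm in perms if all(perm[i] != i for i in range(n)))
    return Fraction(good, len(perms))

def no_match_formula(n):
    """Inclusion-exclusion result: sum_{k=0}^{n} (-1)^k / k!."""
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))

for n in range(1, 8):
    assert no_match_brute_force(n) == no_match_formula(n)
    print(n, no_match_formula(n), float(no_match_formula(n)))
print("1/e =", 1 / e)
```

The alternating series converges quickly, so the probability is already within about $2.5 \times 10^{-8}$ of $e^{-1}$ at $n = 10$.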
Worked example 2: choosing the right late-course tool
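Problem: A fair coin is tossed $n$ times. We want to know why the fraction of heads should be close to $1/2$ and how to approximate the chance of seeing between $a$ and $b$ heads, for integers $a \le b$.
Method:
- Let $S_n$ be the number of heads. The count is binomial, $S_n \sim \mathrm{Binomial}(n, 1/2)$, so begin with Bernoulli, binomial, geometric, and negative binomial laws.
- The expected count and variance are
$$E[S_n] = \frac{n}{2}, \qquad \operatorname{Var}(S_n) = n \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{n}{4}.$$
- The fraction $S_n/n$ is close to $1/2$ by the law of large numbers, so use weak law, concentration, and the central limit theorem.
- For an approximation of the interval probability, use the central limit theorem. The standard deviation is $\sqrt{n}/2$.
- With continuity correction,
$$P(a \le S_n \le b) \approx P\left(a - \tfrac{1}{2} \le Y \le b + \tfrac{1}{2}\right),$$
where $Y$ is normal with mean $n/2$ and standard deviation $\sqrt{n}/2$.
- Standardizing gives
$$P(a \le S_n \le b) \approx \Phi\left(\frac{b + \frac{1}{2} - \frac{n}{2}}{\sqrt{n}/2}\right) - \Phi\left(\frac{a - \frac{1}{2} - \frac{n}{2}}{\sqrt{n}/2}\right),$$
where $\Phi$ is the standard normal CDF.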
Checked answer: the section path is binomial model, then expectation and variance, then CLT approximation.
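The approximation path can be checked numerically against the exact binomial sum. The values $n = 100$, $a = 45$, $b = 55$ below are hypothetical, chosen only for illustration; the sketch uses `math.erf` for $\Phi$ and applies the same continuity correction as the method above.

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def exact_interval(n, a, b):
    """Exact P(a <= S_n <= b) for S_n ~ Binomial(n, 1/2)."""
    return sum(comb(n, k) for k in range(a, b + 1)) / 2**n

def clt_interval(n, a, b):
    """CLT approximation with continuity correction: mean n/2, sd sqrt(n)/2."""
    mu, sd = n / 2, sqrt(n) / 2
    return phi((b + 0.5 - mu) / sd) - phi((a - 0.5 - mu) / sd)

n, a, b = 100, 45, 55  # hypothetical illustrative values
print(round(exact_interval(n, a, b), 4), round(clt_interval(n, a, b), 4))
```

With these numbers the standardized endpoints are $\pm 1.1$, and the two answers agree to about three decimal places.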
Code
```python
# Course topics and their page paths.
pages = [
    ("Counting", "/math/probability-and-random-variables/counting-and-combinatorics"),
    ("Axioms", "/math/probability-and-random-variables/probability-axioms-and-inclusion-exclusion"),
    ("Conditioning", "/math/probability-and-random-variables/conditional-probability-bayes-independence"),
    ("Random variables", "/math/probability-and-random-variables/discrete-random-variables-expectation-variance"),
    ("Limit theorems", "/math/probability-and-random-variables/weak-law-concentration-central-limit-theorem"),
]


def suggest_pages(problem_text):
    """Suggest pages from simple keyword cues in a problem statement."""
    text = problem_text.lower()
    suggestions = []
    # Equally likely finite models: counting and combinatorics.
    if "shuffle" in text or "choose" in text or "count" in text:
        suggestions.append(pages[0])
    # Complements and unions: axioms and inclusion-exclusion.
    if "at least" in text or "none" in text or "union" in text:
        suggestions.append(pages[1])
    # Information updates: conditioning and Bayes.
    if "given" in text or "test" in text or "bayes" in text:
        suggestions.append(pages[2])
    # Means and spread of numerical outcomes (otherwise pages[3] is never used).
    if "expect" in text or "variance" in text or "mean" in text:
        suggestions.append(pages[3])
    # Many repetitions: limit theorems.
    if "average" in text or "many" in text or "normal approximation" in text:
        suggestions.append(pages[4])
    return suggestions


for title, link in suggest_pages("many fair coin tosses need a normal approximation"):
    print(title, "->", link)
```
Common pitfalls
- Skipping the sample-space step and applying formulas to outcomes that are not equally likely.
- Treating conditional probabilities as reversible; $P(A \mid B)$ and $P(B \mid A)$ answer different questions.
- Memorizing distribution formulas without identifying the random mechanism: fixed number of trials, waiting time, rare-event count, memoryless wait, or accumulated small effects.
- Using independence when the story describes sampling without replacement or a shared constraint.
- Applying limit theorems without checking finite mean, finite variance, independence, and scaling.
- Treating martingale and risk-neutral probability statements as ordinary real-world frequency claims rather than conditional-expectation and pricing statements.
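The independence pitfall for sampling without replacement can be seen by exact enumeration. This sketch assumes a standard 52-card deck in which cards 0 to 3 are labeled as the aces (an arbitrary illustrative encoding). Each draw alone has ace probability $1/13$, yet the probability of two aces is not $(1/13)^2$.

```python
from fractions import Fraction
from itertools import permutations

# Two ordered draws without replacement from 52 cards; cards 0-3 are the aces.
draws = list(permutations(range(52), 2))  # 52 * 51 equally likely ordered pairs

def is_ace(card):
    return card < 4

def prob(event):
    return Fraction(sum(1 for d in draws if event(d)), len(draws))

p_first_ace = prob(lambda d: is_ace(d[0]))
p_second_ace = prob(lambda d: is_ace(d[1]))
p_both = prob(lambda d: is_ace(d[0]) and is_ace(d[1]))

# Marginals match, but P(both) = (4/52)(3/51), not (4/52)(4/52): the draws are dependent.
print(p_first_ace, p_second_ace, p_both, p_first_ace * p_second_ace)
```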