Discrete Random Variables, Expectation, and Variance
A random variable turns outcomes into numbers. This change of viewpoint is one of the main transitions in probability: instead of asking only which event occurred, we ask for numerical summaries such as the number of heads, the number of fixed points in a shuffled hat problem, or the total payoff from a gamble. Once outcomes have numerical values, expectation and variance become the central quantities.
Figure: A Galton box turns repeated random left-right choices into an approximate bell-shaped distribution. Image: Wikimedia Commons, Marcin Floryan, CC BY-SA 3.0.
The MIT lectures introduce random variables as functions on the sample space, then define probability mass functions, cumulative distribution functions, expectation, variance, and the decomposition trick of writing complicated variables as sums of simple indicator variables. Linearity of expectation is the most important early result: it does not require independence and is often the simplest way to compute an expected count.
Definitions
A random variable $X$ is a function $X : \Omega \to \mathbb{R}$ from the sample space $\Omega$ to the real numbers. It assigns a numerical value $X(\omega)$ to each outcome $\omega \in \Omega$.
A random variable $X$ is discrete if it takes values in a finite or countable set with probability one. Its probability mass function (PMF) is
$$p_X(x) = P(X = x).$$
The cumulative distribution function (CDF) is
$$F_X(x) = P(X \le x).$$
If the relevant sums converge absolutely, the expectation is
$$E[X] = \sum_x x \, p_X(x).$$
If $g$ is a function, then
$$E[g(X)] = \sum_x g(x) \, p_X(x),$$
which is often called the law of the unconscious statistician.
The variance of $X$ is
$$\operatorname{Var}(X) = E\big[(X - E[X])^2\big].$$
The standard deviation is $\sigma_X = \sqrt{\operatorname{Var}(X)}$.
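An indicator random variable for event $A$ is
$$I_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{otherwise.} \end{cases}$$
Its expectation is $E[I_A] = P(A)$.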
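As a concrete illustration of these definitions, the following minimal sketch builds the PMF, CDF, expectation, and variance for the number of heads in two fair coin tosses. The choice of sample space, the event "at least one head", and the helper names `cdf` and `expect` are assumptions made for this example only.

```python
from fractions import Fraction
from itertools import product

# Sample space: all sequences of two fair coin tosses, each with probability 1/4.
outcomes = list(product("HT", repeat=2))
prob = {w: Fraction(1, len(outcomes)) for w in outcomes}

def X(w):
    """Random variable: number of heads in outcome w."""
    return w.count("H")

# PMF: group outcome probabilities by the value X takes.
pmf = {}
for w, p in prob.items():
    pmf[X(w)] = pmf.get(X(w), Fraction(0)) + p

def cdf(x):
    """F_X(x) = P(X <= x)."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

def expect(g=lambda x: x):
    """E[g(X)] computed from the PMF (law of the unconscious statistician)."""
    return sum(g(v) * p for v, p in pmf.items())

mean = expect()
variance = expect(lambda x: (x - mean) ** 2)

# Indicator of the event A = "at least one head": its mean equals P(A).
indicator_mean = sum(p * (1 if X(w) >= 1 else 0) for w, p in prob.items())

print("pmf:", pmf)
print("F(1):", cdf(1))
print("E[X]:", mean, "Var(X):", variance)
print("E[I_A]:", indicator_mean)
```

Exact fractions are used so that the results can be compared directly with hand calculations.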
Key results
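Expectation is linear:
$$E[aX + bY] = a\,E[X] + b\,E[Y]$$
whenever the expectations exist. No independence assumption is needed. For a countable sample space this follows by summing over outcomes:
$$E[aX + bY] = \sum_{\omega} \big(aX(\omega) + bY(\omega)\big) P(\{\omega\}) = a \sum_{\omega} X(\omega) P(\{\omega\}) + b \sum_{\omega} Y(\omega) P(\{\omega\}) = a\,E[X] + b\,E[Y].$$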
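As a quick numerical check that linearity does not need independence, here is a small sketch on a single die roll. The variables `X` (the face value) and `Y = 7 - X`, and the coefficients 2 and 3, are hypothetical choices made for illustration; `Y` is completely determined by `X`, so the two are as dependent as possible.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
omega = range(1, 7)
p = Fraction(1, 6)

def X(w):
    return w        # the face value

def Y(w):
    return 7 - w    # fully determined by X, so X and Y are dependent

def E(f):
    """Expectation as a sum over outcomes of the sample space."""
    return sum(f(w) * p for w in omega)

lhs = E(lambda w: 2 * X(w) + 3 * Y(w))
rhs = 2 * E(X) + 3 * E(Y)
print("E[2X + 3Y] =", lhs, " 2E[X] + 3E[Y] =", rhs)

# Dependence shows up in products, not in sums.
e_xy = E(lambda w: X(w) * Y(w))
print("E[XY] =", e_xy, " E[X]E[Y] =", E(X) * E(Y))
```

The sum agrees on both sides, while the product expectation differs from the product of expectations, which is where dependence actually matters.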
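The computational variance formula is
$$\operatorname{Var}(X) = E[X^2] - (E[X])^2.$$
Proof: write $\mu = E[X]$. By linearity,
$$\operatorname{Var}(X) = E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu\,E[X] + \mu^2 = E[X^2] - \mu^2.$$
Scaling and shifting behave as follows:
$$E[aX + b] = a\,E[X] + b, \qquad \operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X).$$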
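The variance identities can be checked numerically. The sketch below, which assumes a fair die for the distribution and hypothetical constants `a = 3` and `b = -2`, compares the definition of variance with the computational formula and then checks the effect of scaling and shifting.

```python
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}  # fair six-sided die

def expect(g):
    """E[g(X)] from the PMF."""
    return sum(g(x) * p for x, p in pmf.items())

def var(g):
    """Variance of g(X) from the definition E[(g(X) - E[g(X)])^2]."""
    m = expect(g)
    return expect(lambda x: (g(x) - m) ** 2)

def identity(x):
    return x

a, b = 3, -2  # hypothetical scale and shift

var_definition = var(identity)
var_shortcut = expect(lambda x: x * x) - expect(identity) ** 2
mean_affine = expect(lambda x: a * x + b)
var_affine = var(lambda x: a * x + b)

print("Var(X):", var_definition, var_shortcut)
print("E[aX+b]:", mean_affine, " a E[X] + b:", a * expect(identity) + b)
print("Var(aX+b):", var_affine, " a^2 Var(X):", a * a * var_definition)
```

The shift `b` moves the mean but leaves the variance unchanged, and the scale `a` enters the variance squared.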
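The indicator decomposition trick writes a count as
$$X = I_{A_1} + I_{A_2} + \cdots + I_{A_n}.$$
Then
$$E[X] = \sum_{i=1}^{n} E[I_{A_i}] = \sum_{i=1}^{n} P(A_i),$$
even when the events are dependent. This explains why expected counts can be much easier than exact distributions.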
There are two common ways to compute an expectation. One sums over values of the random variable:
$$E[X] = \sum_x x \, P(X = x).$$
The other sums over the original sample space:
$$E[X] = \sum_{\omega \in \Omega} X(\omega) \, P(\{\omega\}).$$
These are the same calculation grouped differently. The value-based formula groups together all outcomes with the same value. The state-space formula is sometimes easier when the sample space is small; the value-based formula is usually easier when the distribution of $X$ is already known.
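To see the two groupings side by side, the following sketch computes the expected sum of two fair dice both ways. The two-dice setting and the names `S`, `state_space_mean`, and `value_based_mean` are assumptions chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: 36 equally likely ordered pairs of die faces.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))

def S(w):
    """Random variable: the sum of the two faces."""
    return w[0] + w[1]

# State-space formula: sum S(w) * P({w}) over all outcomes.
state_space_mean = sum(S(w) * p for w in outcomes)

# Value-based formula: group outcomes by value to get the PMF, then sum over values.
counts = Counter(S(w) for w in outcomes)
pmf = {s: Fraction(c, len(outcomes)) for s, c in counts.items()}
value_based_mean = sum(s * q for s, q in pmf.items())

print("state-space:", state_space_mean)
print("value-based:", value_based_mean)
```

The PMF is an intermediate product of the value-based route, so that route costs more here but yields the full distribution as well.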
Expectation can exist even when a most likely value is absent or misleading. For a fair die, the expectation is $3.5$, which is not an outcome. In a lottery, a very large rare payoff can dominate the expectation even though the typical outcome is zero. This is why the lectures introduce variance immediately after expectation: the mean alone does not describe risk, spread, or typical behavior.
Variance depends on squared deviations, so it is sensitive to rare large values. If $X$ is measured in dollars, then $\operatorname{Var}(X)$ is measured in squared dollars, which is one reason the standard deviation is often easier to interpret. Still, variance is algebraically convenient because squares expand cleanly and because the variances of independent random variables add.
Indicator variables turn probability questions into expectation questions. If $X$ counts the number of successes among many possibly dependent events, then $E[X]$ only needs the individual success probabilities. This is why the expected number of fixed points in a random permutation is easy even though the exact distribution of fixed points requires inclusion-exclusion. The method also prepares for binomial variables, where a sum of independent indicators gives both the expectation and variance.
One must be more careful with infinite sums. A discrete random variable with values $x_1, x_2, \dots$ may have probabilities summing to $1$ but still fail to have a finite expectation. The sum $\sum_i x_i \, P(X = x_i)$ must converge absolutely for the expectation to be safely manipulated by linearity and the variance formulas.
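A standard illustration, not taken from the lectures, is the distribution that takes the value $2^k$ with probability $2^{-k}$ for $k = 1, 2, \dots$ (the payoff in the St. Petersburg game). The sketch below computes partial sums under that assumption; the function name `partial_sums` is hypothetical.

```python
from fractions import Fraction

def partial_sums(K):
    """Return (total probability, partial expectation) over k = 1..K
    for the distribution P(X = 2**k) = 2**(-k)."""
    total_prob = sum(Fraction(1, 2**k) for k in range(1, K + 1))
    partial_mean = sum(2**k * Fraction(1, 2**k) for k in range(1, K + 1))
    return total_prob, partial_mean

for K in [10, 20, 40]:
    prob, mean = partial_sums(K)
    print(K, float(prob), mean)
```

The probabilities approach 1, but every term contributes exactly 1 to the expectation, so the partial sums grow without bound and the expectation is not finite.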
Visual
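| Quantity | Formula | What it measures |
|---|---|---|
| PMF | $p_X(x) = P(X = x)$ | probability at each value |
| CDF | $F_X(x) = P(X \le x)$ | accumulated probability |
| Mean | $E[X] = \sum_x x \, p_X(x)$ | center or long-run average |
| Second moment | $E[X^2] = \sum_x x^2 \, p_X(x)$ | raw squared size |
| Variance | $\operatorname{Var}(X) = E[X^2] - (E[X])^2$ | spread around the mean |
| Indicator mean | $E[I_A] = P(A)$ | probability as expectation |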
The table separates distributional information from summary information. A PMF or CDF can determine all probabilities involving $X$. The mean and variance compress that information into two numbers. Compression is useful, but it loses detail. Two random variables can have the same mean and variance while having very different shapes, tail behavior, or most likely values. Later limit theorems explain why mean and variance are often enough for averages, but individual distributions still require more information.
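A small example makes the loss of detail concrete. The two PMFs below, named `coin` and `spiky` purely for illustration, are hypothetical distributions constructed to share the same mean and variance while differing in shape and tails.

```python
from fractions import Fraction

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

def tail(pmf, t):
    """P(|X| >= t)."""
    return sum((p for x, p in pmf.items() if abs(x) >= t), Fraction(0))

# Hypothetical distributions with matching first two moments.
coin = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
spiky = {-2: Fraction(1, 8), 0: Fraction(3, 4), 2: Fraction(1, 8)}

for name, pmf in [("coin", coin), ("spiky", spiky)]:
    print(name, "mean:", mean(pmf), "var:", variance(pmf), "P(|X|>=2):", tail(pmf, 2))
```

Both have mean 0 and variance 1, yet `spiky` sits at 0 most of the time and reaches magnitude 2 a quarter of the time, while `coin` never takes either of those values.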
When a problem asks for an expected count, try indicators before trying to find the whole PMF. When a problem asks for the probability that the count equals a specific value, the PMF is unavoidable. This distinction explains why the expected number of fixed hats is much easier than the probability that exactly three people get their own hats.
Worked example 1: expectation and variance of a die roll
Problem: Let $X$ be the result of a fair six-sided die. Compute $E[X]$ and $\operatorname{Var}(X)$.
Method:
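- The PMF is $p_X(k) = \frac{1}{6}$ for $k = 1, 2, \dots, 6$.
- The expectation is
  $$E[X] = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = \frac{7}{2}.$$
- Compute the second moment:
  $$E[X^2] = \frac{1 + 4 + 9 + 16 + 25 + 36}{6} = \frac{91}{6}.$$
- Use the variance formula:
  $$\operatorname{Var}(X) = E[X^2] - (E[X])^2 = \frac{91}{6} - \frac{49}{4} = \frac{182 - 147}{12} = \frac{35}{12}.$$
Checked answer: the standard deviation is $\sqrt{35/12} \approx 1.71$, which is plausible because die values range only from $1$ to $6$.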
Worked example 2: expected fixed points in the hat shuffle
Problem: In a random shuffle of $n$ hats among $n$ people, let $X$ be the number of people who get their own hat. Compute $E[X]$.
Method:
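- Let $A_i$ be the event that person $i$ receives their own hat.
- Define indicators
  $$I_i = \begin{cases} 1 & \text{if } A_i \text{ occurs}, \\ 0 & \text{otherwise.} \end{cases}$$
- Then the total number of fixed points is
  $$X = I_1 + I_2 + \cdots + I_n.$$
- For each person, symmetry gives
  $$P(A_i) = \frac{1}{n}.$$
- Therefore
  $$E[I_i] = P(A_i) = \frac{1}{n}.$$
- By linearity,
  $$E[X] = \sum_{i=1}^{n} E[I_i] = n \cdot \frac{1}{n} = 1.$$
Checked answer: the expected number of people receiving their own hat is always $1$, no matter how large $n$ is. This does not mean exactly one person usually gets their hat; it means the average count over many random shuffles is $1$.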
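For small $n$ the hat-shuffle answer can be confirmed by brute force. The sketch below enumerates every permutation, which is only feasible for small $n$; the function name `fixed_point_pmf` is a hypothetical helper. It also produces the exact PMF, which the indicator argument never needed.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial

def fixed_point_pmf(n):
    """Exact PMF of the number of fixed points of a uniformly random
    permutation of n items, found by enumerating all n! permutations."""
    counts = Counter(
        sum(1 for person, hat in enumerate(perm) if person == hat)
        for perm in permutations(range(n))
    )
    return {k: Fraction(c, factorial(n)) for k, c in sorted(counts.items())}

for n in [3, 4, 5]:
    pmf = fixed_point_pmf(n)
    mean = sum(k * p for k, p in pmf.items())
    print(n, "mean:", mean, "pmf:", pmf)
```

The mean comes out the same for each $n$ even though the PMF itself changes, which is the contrast between expected counts and exact distributions.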
Code
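from fractions import Fraction

values = range(1, 7)                                     # faces of a fair die
mean = sum(Fraction(k, 6) for k in values)               # E[X] = 7/2
second_moment = sum(Fraction(k * k, 6) for k in values)  # E[X^2] = 91/6
variance = second_moment - mean * mean                   # Var(X) = 35/12
print("die mean:", mean)
print("die variance:", variance)

def expected_fixed_points(n):
    # Linearity of expectation: add E[I_i] = 1/n once for each of the n people.
    # Fraction keeps the sum exact; summing the float 1 / n can round to 0.999...
    return sum(Fraction(1, n) for _ in range(n))

for n in [3, 10, 100]:
    print(n, expected_fixed_points(n))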
Common pitfalls
- Thinking a random variable is an event. An event is a subset of outcomes; a random variable is a numerical function on outcomes.
- Forgetting that a PMF must sum to $1$.
- Assuming linearity of expectation requires independence. It does not.
- Assuming variance is linear. In general $\operatorname{Var}(X + Y)$ is not $\operatorname{Var}(X) + \operatorname{Var}(Y)$ unless the covariance is zero.
- Interpreting expectation as the most likely value. A random variable may have expectation $3.5$ without ever equaling $3.5$.
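The variance pitfall is easy to see numerically. In this sketch, which assumes fair dice and uses a hypothetical helper `var`, the variance of a sum is computed once when the second variable is a copy of the first and once when it is an independent second roll.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def var(values_with_probs):
    """Variance from an iterable of (value, probability) pairs, one per outcome."""
    pairs = list(values_with_probs)
    m = sum(v * p for v, p in pairs)
    return sum((v - m) ** 2 * p for v, p in pairs)

var_x = var((x, Fraction(1, 6)) for x in faces)

# Dependent case: Y = X, so X + Y = 2X.
var_dependent = var((x + x, Fraction(1, 6)) for x in faces)

# Independent case: Y is a second, separate die roll.
var_independent = var((x + y, Fraction(1, 36)) for x, y in product(faces, faces))

print("Var(X):", var_x)
print("Var(X + X):", var_dependent, " vs 2 Var(X):", 2 * var_x)
print("Var(X + Y), independent:", var_independent)
```

Only the independent case matches twice the single-die variance; the dependent case gives four times that variance.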