Expectation, Variance, and Moments
Expectation summarizes the long-run center of a random variable; variance summarizes its spread; moments summarize increasingly detailed features of its distribution. These quantities are not substitutes for the full distribution, but they are often the most useful first summaries. They also power limit theorems, regression, risk calculations, and error estimates.
Figure: A Galton box turns repeated random left-right choices into an approximate bell-shaped distribution. Image: Wikimedia Commons, Marcin Floryan, CC BY-SA 3.0.
The key idea is weighted averaging. In a discrete distribution, values are weighted by probability masses. In a continuous distribution, values are weighted by density and integrated. The same logic also applies to functions of random variables through the law of the unconscious statistician.
Definitions
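For a discrete random variable $X$ with PMF $p_X(x)$, the expected value is
$$E[X] = \sum_x x \, p_X(x),$$
provided the sum converges absolutely.
For a continuous random variable $X$ with PDF $f_X(x)$,
$$E[X] = \int_{-\infty}^{\infty} x \, f_X(x) \, dx,$$
provided the integral converges absolutely.
For a function $g$, the law of the unconscious statistician says
$$E[g(X)] = \sum_x g(x) \, p_X(x)$$
in the discrete case, and
$$E[g(X)] = \int_{-\infty}^{\infty} g(x) \, f_X(x) \, dx$$
in the continuous case.
The variance is
$$\operatorname{Var}(X) = E\left[(X - \mu)^2\right], \qquad \mu = E[X].$$
The standard deviation is
$$\sigma_X = \sqrt{\operatorname{Var}(X)}.$$
The $k$-th raw moment is $E[X^k]$. The $k$-th central moment is $E[(X - \mu)^k]$, where $\mu = E[X]$. The third central moment is related to skewness, and the fourth central moment is related to kurtosis.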
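The raw and central moments defined above are both weighted sums of a function of the values, so they can be sketched with the same few lines. The PMF below is a made-up right-skewed example, and the helper names `raw_moment` and `central_moment` are illustrative rather than part of any library.

```python
import numpy as np

def raw_moment(values, probs, k):
    """E[X^k] for a discrete PMF: sum of x^k weighted by probability."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return float(np.sum(values**k * probs))

def central_moment(values, probs, k):
    """E[(X - mu)^k] for a discrete PMF, where mu = E[X]."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    mu = np.sum(values * probs)
    return float(np.sum((values - mu) ** k * probs))

# Hypothetical right-skewed PMF: most mass near 0, a small mass far out at 10.
values = [0, 1, 2, 10]
probs = [0.4, 0.3, 0.2, 0.1]

mu = raw_moment(values, probs, 1)
var = central_moment(values, probs, 2)
third = central_moment(values, probs, 3)
print("mean:", mu)
print("variance:", var)
print("third central moment:", third)
```

The third central moment comes out positive here, which matches the long right tail of this PMF.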
Key results
Linearity of expectation. For constants $a$, $b$, and $c$,
$$E[aX + bY + c] = a\,E[X] + b\,E[Y] + c.$$
Linearity does not require independence.
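A small exact check can make this concrete. In the sketch below, $Y = X^2$ is fully determined by a fair die roll $X$, so the two variables are as dependent as possible, yet $E[aX + bY + c] = aE[X] + bE[Y] + c$ still holds. The constants are arbitrary illustrative choices.

```python
from fractions import Fraction

# One fair die; Y = X**2 is a function of X, so X and Y are dependent.
outcomes = range(1, 7)
p = Fraction(1, 6)

def expect(g):
    """E[g(X)] for a fair die, computed exactly with rational arithmetic."""
    return sum(g(x) * p for x in outcomes)

a, b, c = 2, -3, 5  # arbitrary constants chosen for illustration

# Left side: expectation of the combined variable aX + bY + c.
lhs = expect(lambda x: a * x + b * x**2 + c)
# Right side: combine the separate expectations.
rhs = a * expect(lambda x: x) + b * expect(lambda x: x**2) + c

print(lhs, rhs)
```

Using `Fraction` avoids floating-point rounding, so the two sides can be compared with exact equality.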
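Computational variance formula.
$$\operatorname{Var}(X) = E[X^2] - (E[X])^2.$$
Proof: write $\mu = E[X]$ and expand the square using linearity:
$$\operatorname{Var}(X) = E\left[(X - \mu)^2\right] = E[X^2] - 2\mu\,E[X] + \mu^2 = E[X^2] - \mu^2.$$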
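Scaling and shifting. For constants $a$ and $b$,
$$E[aX + b] = a\,E[X] + b, \qquad \operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X).$$
Adding a constant shifts the center but does not change spread. Multiplying by $a$ scales standard deviation by $|a|$ and variance by $a^2$.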
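The scaling and shifting rules can be checked numerically on a fair die. This is a minimal sketch; the constants `a` and `b` and the helper `mean_var` are illustrative choices, with a negative `a` picked to show that the standard deviation scales by the absolute value.

```python
import numpy as np

values = np.arange(1, 7)
probs = np.full(6, 1 / 6)

def mean_var(vals, p):
    """Return (mean, variance) of a discrete distribution."""
    m = np.sum(vals * p)
    v = np.sum((vals - m) ** 2 * p)
    return m, v

a, b = -3.0, 10.0  # illustrative constants

m_x, v_x = mean_var(values, probs)
# Transforming the values leaves the probabilities unchanged.
m_t, v_t = mean_var(a * values + b, probs)

print("mean:    ", m_t, "vs", a * m_x + b)
print("variance:", v_t, "vs", a**2 * v_x)
print("std dev: ", np.sqrt(v_t), "vs", abs(a) * np.sqrt(v_x))
```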
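Variance of sums.
$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y).$$
If $X$ and $Y$ are independent, then the covariance term is zero, so variances add.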
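To see the covariance term at work, the sketch below evaluates $\operatorname{Var}(X + Y)$ directly from a small joint PMF and compares it with $\operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y)$. The joint table is hypothetical; it places most of its mass where the two variables agree, so the covariance is positive.

```python
import numpy as np

# Hypothetical joint PMF of (X, Y) on {0, 1} x {0, 1}.
# Rows index the value of X, columns index the value of Y.
x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0])
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

# Grids holding the X and Y value at each cell of the joint table.
X, Y = np.meshgrid(x_vals, y_vals, indexing="ij")

def E(g):
    """Expectation of a function of (X, Y), given as an array on the grid."""
    return float(np.sum(g * joint))

var_x = E(X**2) - E(X) ** 2
var_y = E(Y**2) - E(Y) ** 2
cov_xy = E(X * Y) - E(X) * E(Y)
var_sum = E((X + Y) ** 2) - E(X + Y) ** 2

print("Var(X+Y) directly:     ", var_sum)
print("Var + Var + 2 Cov:     ", var_x + var_y + 2 * cov_xy)
print("Var + Var (no Cov term):", var_x + var_y)
```

Dropping the covariance term understates the variance in this example because the two variables tend to move together.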
Indicator trick. If $I_A$ is $1$ when event $A$ occurs and $0$ otherwise, then
$$E[I_A] = P(A).$$
This converts counting problems into expectation problems.
Expectation is linear even when variables are dependent, which makes it unusually powerful. If $X$ counts the number of students in a room who share a birthday with someone else, it may be hard to write down the full distribution of $X$. But if $X$ is written as a sum of indicator variables, then $E[X]$ can often be found by adding the probabilities that each indicator equals $1$. This technique appears in occupancy problems, randomized algorithms, and combinatorics.
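The birthday count can be worked through as a sketch. Assume 365 equally likely birthdays that are independent across students (a simplifying assumption, not a claim about real data). Student $i$ shares a birthday with someone else unless all $n - 1$ others miss that day, so each indicator has mean $1 - (364/365)^{n-1}$ and the expected count is $n$ times that. The function names below are illustrative, and the simulation is only a sanity check on the formula.

```python
import numpy as np

def expected_sharers(n, days=365):
    """Sum of n indicator means: n * P(a given student shares a birthday)."""
    return n * (1 - ((days - 1) / days) ** (n - 1))

def simulate_sharers(n, trials, rng, days=365):
    """Monte Carlo estimate of the mean number of students who share a birthday."""
    birthdays = rng.integers(0, days, size=(trials, n))
    total = 0
    for row in birthdays:
        counts = np.bincount(row, minlength=days)
        # A student shares a birthday if their day occurs more than once in the room.
        total += np.sum(counts[row] > 1)
    return total / trials

rng = np.random.default_rng(0)
n = 30
exact = expected_sharers(n)
approx = simulate_sharers(n, trials=20_000, rng=rng)
print("indicator formula:", exact)
print("simulation:       ", approx)
```

The indicators here are dependent (if one student shares a birthday, at least one other does too), but the expectation calculation never needs that dependence structure.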
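Variance is less forgiving. To find the variance of a sum, dependence must be handled through covariance terms. If $X_1, \dots, X_n$ are independent, then
$$\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \operatorname{Var}(X_i).$$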
Without independence, this formula can be badly wrong. Positive covariance increases the variance of a sum; negative covariance decreases it.
Moments may fail to exist. A distribution can have a median and many quantiles but no finite mean. It can have a finite mean but infinite variance. When a theorem assumes finite variance, that condition rules out some heavy-tailed models. Before manipulating $E[X]$ or $\operatorname{Var}(X)$, check that the relevant expectation is finite, especially for distributions with polynomial tails.
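One way to see a finite mean alongside an infinite variance is to truncate the moment integrals and watch what happens as the cutoff grows. The sketch below uses a Pareto-type density $f(x) = \alpha x^{-\alpha - 1}$ on $x \ge 1$ with $\alpha = 1.5$, chosen here purely as an illustration of a polynomial tail. The helper `truncated_moment` is a hypothetical name; it uses the closed form of $\int_1^T x^k \alpha x^{-\alpha-1}\,dx$ from the power rule.

```python
import math

def truncated_moment(k, alpha, upper):
    """Integral of x**k * alpha * x**(-alpha - 1) over [1, upper].

    This is the k-th raw moment integral of the density alpha * x**(-alpha - 1)
    on x >= 1, cut off at `upper`.
    """
    if k == alpha:
        # The integrand reduces to alpha / x, which integrates to a logarithm.
        return alpha * math.log(upper)
    return alpha * (upper ** (k - alpha) - 1) / (k - alpha)

alpha = 1.5  # tail index between 1 and 2: finite mean, infinite variance

for upper in (1e2, 1e4, 1e6, 1e8):
    m1 = truncated_moment(1, alpha, upper)
    m2 = truncated_moment(2, alpha, upper)
    print(f"cutoff={upper:.0e}  first moment={m1:.4f}  second moment={m2:.1f}")
```

The first-moment column settles near $\alpha / (\alpha - 1) = 3$, while the second-moment column keeps growing like $\sqrt{T}$, so no finite variance exists for this tail index.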
Expectation is not automatically a typical value. In skewed distributions, the mean can be pulled far into the tail. For waiting times, income, file sizes, and insurance losses, the median, quantiles, or tail probabilities may be more representative for user-facing summaries. Variance has a similar limitation: it summarizes spread around the mean, so it can be dominated by rare extreme values. Moments are powerful algebraic summaries, but they should be paired with distribution shape when interpretation matters.
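As a rough illustration of how far the mean can sit from a typical value, consider a lognormal model, which is an assumption made here for the sake of the example and not something the text prescribes. If $\log X \sim \text{Normal}(\mu, \sigma)$, the median is $e^{\mu}$ and the mean is $e^{\mu + \sigma^2/2}$. The parameter values below are hypothetical.

```python
import math
from statistics import NormalDist

mu, sigma = 0.0, 1.5  # hypothetical log-scale parameters

median = math.exp(mu)
mean = math.exp(mu + sigma**2 / 2)

# P(X <= mean) = P(log X <= log mean), and log X is Normal(mu, sigma).
frac_below_mean = NormalDist(mu, sigma).cdf(math.log(mean))

print("median:", median)
print("mean:", mean)
print("fraction of outcomes below the mean:", frac_below_mean)
```

With these parameters the mean is about three times the median, and roughly three quarters of outcomes fall below it, which is why a quantile is often the better headline number for skewed data.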
Conditional expectation is another central extension. The quantity $E[Y \mid X]$ is itself a random variable: after $X$ is observed, it gives the average value of $Y$ under that condition. The identity
$$E[Y] = E\left[E[Y \mid X]\right]$$
means that averaging conditional averages recovers the overall average. This is often the cleanest way to compute expectations in multi-stage experiments.
For example, if a random number of customers arrive and each customer independently spends a random amount, conditioning on the number of customers separates the count randomness from the spending randomness. The outer expectation then averages over possible customer counts. This pattern appears in compound distributions, insurance risk, and queueing models.
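The customer example can be sketched with small, made-up PMFs. Writing $N$ for the customer count, $Y$ for one customer's spend, and $S$ for the total (notation introduced here for the sketch), conditioning gives $E[S \mid N = n] = n\,E[Y]$, so $E[S] = \sum_n P(N = n)\, n\, E[Y]$. The code compares that with a brute-force enumeration of every outcome.

```python
from fractions import Fraction
from itertools import product

# Hypothetical PMFs: number of customers N and spend per customer Y.
count_pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
spend_pmf = {10: Fraction(1, 2), 30: Fraction(1, 2)}

mean_spend = sum(y * p for y, p in spend_pmf.items())

# Conditioning on N: E[S] = sum over n of P(N = n) * n * E[Y].
tower = sum(p_n * n * mean_spend for n, p_n in count_pmf.items())

# Brute force: enumerate every count and every combination of individual spends.
brute = Fraction(0)
for n, p_n in count_pmf.items():
    for combo in product(spend_pmf.items(), repeat=n):
        prob = p_n
        total = 0
        for y, p_y in combo:
            prob *= p_y   # customers spend independently of each other
            total += y
        brute += prob * total

print(tower, brute)
```

The conditioning route needs only the two means, while the enumeration grows quickly with the number of customers; that gap is the practical reason the identity is useful.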
Visual
| Quantity | Formula | Interpretation |
|---|---|---|
| Mean | $\mu = E[X]$ | balance point or long-run average |
| Raw second moment | $E[X^2]$ | average squared size |
| Variance | $E[(X - \mu)^2]$ | average squared deviation from mean |
| Standard deviation | $\sqrt{\operatorname{Var}(X)}$ | spread in original units |
| Skewness numerator | $E[(X - \mu)^3]$ | direction of asymmetry |
| Fourth central moment | $E[(X - \mu)^4]$ | tail weight and peakedness |
Worked example 1: expectation and variance of one die
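Problem. Let $X$ be the result of one fair six-sided die. Compute $E[X]$, $\operatorname{Var}(X)$, and $\sigma_X$.
Method.
- The possible values are $1, 2, 3, 4, 5, 6$, each with probability $1/6$.
- Compute the expectation: $E[X] = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = 3.5$.
- Compute the second raw moment: $E[X^2] = \frac{1 + 4 + 9 + 16 + 25 + 36}{6} = \frac{91}{6}$.
- Use the computational formula: $\operatorname{Var}(X) = E[X^2] - (E[X])^2 = \frac{91}{6} - \frac{49}{4} = \frac{35}{12} \approx 2.917$.
- Standard deviation is $\sigma_X = \sqrt{35/12} \approx 1.708$.
Checked answer. $E[X] = 3.5$, $\operatorname{Var}(X) = 35/12 \approx 2.917$, and $\sigma_X \approx 1.708$.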
Worked example 2: expectation of an exponential random variable
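Problem. Let $X \sim \text{Exponential}(\lambda)$ with density $f_X(x) = \lambda e^{-\lambda x}$ for $x \ge 0$. Show that $E[X] = 1/\lambda$ and $\operatorname{Var}(X) = 1/\lambda^2$.
Method.
- Compute the mean: $E[X] = \int_0^{\infty} x \, \lambda e^{-\lambda x} \, dx$.
- Use integration by parts with $u = x$ and $dv = \lambda e^{-\lambda x} \, dx$. Then $du = dx$ and $v = -e^{-\lambda x}$, so $E[X] = \left[-x e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} e^{-\lambda x} \, dx = 0 + \frac{1}{\lambda} = \frac{1}{\lambda}$.
- Compute the second moment: $E[X^2] = \int_0^{\infty} x^2 \, \lambda e^{-\lambda x} \, dx$.
- Integration by parts gives $E[X^2] = \left[-x^2 e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} 2x \, e^{-\lambda x} \, dx = \frac{2}{\lambda} E[X] = \frac{2}{\lambda^2}$.
- Now compute variance: $\operatorname{Var}(X) = E[X^2] - (E[X])^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$.
Checked answer. $E[X] = 1/\lambda$ and $\operatorname{Var}(X) = 1/\lambda^2$.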
Code
import numpy as np
from scipy.integrate import quad
# Die moments.
values = np.arange(1, 7)
probs = np.ones(6) / 6
mean = np.sum(values * probs)
second = np.sum(values**2 * probs)
variance = second - mean**2
print(mean, second, variance)
# Exponential moments by numerical integration.
lam = 2.5
density = lambda x: lam * np.exp(-lam * x)
mean_integral, _ = quad(lambda x: x * density(x), 0, np.inf)
second_integral, _ = quad(lambda x: x**2 * density(x), 0, np.inf)
var_integral = second_integral - mean_integral**2
print(mean_integral, var_integral)
print("theory:", 1 / lam, 1 / lam**2)
Common pitfalls
- Treating expectation as the most likely value. A fair die has expected value $3.5$, which is not a possible roll.
- Assuming expectation always exists. Heavy-tailed distributions can have undefined means or variances.
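- Forgetting to square the scaling constant in variance: $\operatorname{Var}(aX) = a^2 \operatorname{Var}(X)$, not $a \operatorname{Var}(X)$.
- Assuming $E[g(X)] = g(E[X])$. This is generally false except for linear $g$.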
- Adding standard deviations instead of variances for independent sums.
- Using variance formulas that require independence without checking independence.
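Two of these pitfalls are easy to check on a fair die. The sketch below compares $E[g(X)]$ with $g(E[X])$ for $g(x) = x^2$, then compares the correct standard deviation of the sum of two independent dice with the incorrect value obtained by adding standard deviations. The variable names are illustrative.

```python
from fractions import Fraction
import math

p = Fraction(1, 6)
mean = sum(x * p for x in range(1, 7))
second = sum(x**2 * p for x in range(1, 7))
var = second - mean**2

# Pitfall: E[g(X)] versus g(E[X]) with g(x) = x**2.
print("E[X^2] =", second, " (E[X])^2 =", mean**2)

# Pitfall: for two independent dice, variances add; standard deviations do not.
sd_correct = math.sqrt(2 * var)    # sqrt(Var(X1) + Var(X2))
sd_wrong = 2 * math.sqrt(var)      # adding standard deviations overstates spread
print("correct SD of sum:", sd_correct)
print("incorrect SD of sum:", sd_wrong)
```

The gap between $E[X^2]$ and $(E[X])^2$ is exactly the variance, which is one way to remember that the two quantities agree only for a constant random variable.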