Limit Theorems
Limit theorems explain why averages stabilize and why normal distributions appear so often. The law of large numbers says that sample averages converge toward the expected value. The central limit theorem says that, after centering and scaling, sums of many independent, well-behaved variables become approximately normal. These results are the mathematical reason repeated measurement can produce reliable estimates.
Lane et al.'s sampling-distributions chapter uses the central limit theorem to explain why sample means are often nearly normal even when the parent population is not. This page focuses on the probability-theory statements and calculations, while the statistics section handles inference applications.

Figure: Simulation illustrating convergence in the central limit theorem. Image: Wikimedia Commons, Daniel Resende, CC BY-SA 4.0.
Definitions
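Let $X_1, X_2, \dots$ be random variables with common mean $\mu$. The sample mean is
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
A sequence $Y_n$ converges in probability to $Y$ if, for every $\varepsilon > 0$,
$$P(|Y_n - Y| > \varepsilon) \to 0$$
as $n \to \infty$.
A sequence $Y_n$ converges almost surely to $Y$ if
$$P\left(\lim_{n \to \infty} Y_n = Y\right) = 1.$$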
Almost sure convergence is stronger than convergence in probability.
A sequence $Y_n$ converges in distribution to $Y$ if
$$F_{Y_n}(x) \to F_Y(x)$$
at every continuity point $x$ of $F_Y$.
The standard normal random variable is denoted by $Z$, with CDF $\Phi$.
Key results
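Weak law of large numbers. If $X_1, \dots, X_n$ are independent and identically distributed with $E[X_i] = \mu$ and finite variance $\sigma^2$, then
$$\bar{X}_n \xrightarrow{P} \mu.$$
A proof sketch uses Chebyshev's inequality. Since
$$E[\bar{X}_n] = \mu$$
and, by independence,
$$\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n},$$
Chebyshev gives, for every $\varepsilon > 0$,
$$P(|\bar{X}_n - \mu| \ge \varepsilon) \le \frac{\sigma^2}{n\varepsilon^2} \to 0 \quad \text{as } n \to \infty.$$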
Strong law of large numbers. Under standard conditions such as independent identically distributed variables with $E|X_i| < \infty$,
$$\bar{X}_n \to \mu$$
almost surely.
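Central limit theorem. If $X_1, \dots, X_n$ are independent identically distributed with mean $\mu$ and finite variance $\sigma^2 > 0$, then
$$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0, 1).$$
Equivalently,
$$\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1).$$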
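As a numerical check of convergence in distribution, the sketch below measures the largest gap between the CDF of a standardized fair-coin head count and the standard normal CDF. The fair-coin setup, the chosen values of `n`, and the helper name `max_cdf_gap` are illustrative assumptions rather than part of the theorem; only the Python standard library is used.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_cdf_gap(n, p=0.5):
    """Largest |F_n(z) - Phi(z)| for the standardized Binomial(n, p) count."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    phi = NormalDist().cdf
    gap, cdf_before = 0.0, 0.0
    for k in range(n + 1):
        pmf = comb(n, k) * p**k * (1 - p)**(n - k)
        cdf_after = cdf_before + pmf
        z = (k - mu) / sigma
        # The binomial CDF jumps at z, so compare Phi with both sides of the jump.
        gap = max(gap, abs(cdf_before - phi(z)), abs(cdf_after - phi(z)))
        cdf_before = cdf_after
    return gap

# Hypothetical sample sizes chosen only to show the trend.
gaps = {n: max_cdf_gap(n) for n in (10, 100, 1000)}
for n, gap in gaps.items():
    print(n, round(gap, 4))
```

The gap shrinks as `n` grows, which is what convergence of the CDFs at every continuity point suggests should happen here.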
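Normal approximation to binomial. If $X \sim \text{Binomial}(n, p)$, then for large $n$,
$$X \approx N\big(np,\; np(1-p)\big).$$
A continuity correction often improves the approximation:
$$P(a \le X \le b) \approx P(a - 0.5 \le Y \le b + 0.5),$$
where $Y \sim N\big(np,\; np(1-p)\big)$.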
The LLN and CLT answer different questions. The LLN says the sample mean becomes close to $\mu$; it does not describe the detailed shape of the error. The CLT describes that error after multiplying by $\sqrt{n}$. In practical terms, the typical size of $\bar{X}_n - \mu$ is on the order of $1/\sqrt{n}$, not $1/n$. Quadrupling the sample size halves the standard error.
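The square-root scaling can be checked with a few lines of arithmetic. This is a minimal sketch; the population standard deviation `sigma = 2.0` and the sample sizes are hypothetical values chosen for illustration.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard deviation of the mean of n independent draws with SD sigma."""
    return sigma / sqrt(n)

sigma = 2.0  # hypothetical population standard deviation
for n in (100, 400, 1600):
    # Each step quadruples n, so each printed value is half the previous one.
    print(n, standard_error(sigma, n))
```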
The assumptions matter. Independence can be weakened in some advanced versions, and identical distribution can also be relaxed, but some control over dependence and tail size is still needed. Heavy-tailed variables with infinite variance may converge to non-normal stable laws rather than to a normal distribution. Strong dependence can prevent averaging from reducing uncertainty at the usual rate.
The CLT is about distributions, not about individual observations becoming normal. If the original data are skewed, individual future observations remain skewed. What becomes approximately normal is the standardized sum or average. This distinction is central in statistics: a sample mean can be approximately normal even when the raw data are not.
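A small simulation can make this distinction concrete. The sketch below assumes an exponential parent distribution with rate 1 (a skewed choice made here for illustration), a fixed seed, and hypothetical sizes `n = 50` and `reps = 4000`; it compares the skewness of raw draws with the skewness of sample means.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Sample skewness: the average cubed z-score."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(0)  # fixed seed so the run is repeatable
n, reps = 50, 4000      # hypothetical sample size and number of repeated samples

# Individual observations from the skewed parent distribution.
raw = [rng.expovariate(1.0) for _ in range(20_000)]
# One sample mean per repeated sample of size n.
means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

raw_skew = skewness(raw)
mean_skew = skewness(means)
print("skewness of raw draws:", round(raw_skew, 2))
print("skewness of sample means:", round(mean_skew, 2))
```

The raw draws stay strongly skewed no matter how many are collected, while the sample means are much closer to symmetric.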
Continuity corrections illustrate another practical issue: a theorem may give the limiting shape, but finite-sample accuracy depends on details. A binomial count is discrete, while the normal approximation is continuous. Replacing $P(a \le X \le b)$ with an area from $a - 0.5$ to $b + 0.5$ aligns integer bars with continuous intervals. For highly skewed distributions or tail probabilities, simulation or exact computation may be better than a rough CLT approximation.
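The effect of the half-unit shift can be seen by comparing both versions of the normal approximation with an exact binomial sum. The values `n = 30`, `p = 0.3`, and the interval from 7 to 11 are hypothetical, and the helper names are introduced here only for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_interval(n, p, a, b):
    """Exact P(a <= X <= b) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, b + 1))

def normal_interval(n, p, a, b, correction=True):
    """Normal approximation to P(a <= X <= b), optionally continuity-corrected."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    shift = 0.5 if correction else 0.0
    return approx.cdf(b + shift) - approx.cdf(a - shift)

# Hypothetical example values chosen only for illustration.
n, p, a, b = 30, 0.3, 7, 11
exact = binom_interval(n, p, a, b)
corrected = normal_interval(n, p, a, b, correction=True)
uncorrected = normal_interval(n, p, a, b, correction=False)
print("exact:", round(exact, 4))
print("with correction:", round(corrected, 4))
print("without correction:", round(uncorrected, 4))
```

In this small example the corrected value lands much closer to the exact probability, because the uncorrected area drops half a bar at each end of the interval.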
Limit theorems justify many methods, but they do not remove the need for diagnostics. If data are dependent over time, clustered by group, or generated by a changing process, the effective sample size may be much smaller than the raw count.
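One simple way to see the loss of effective sample size is an assumed model in which all $n$ observations share the same variance $\sigma^2$ and the same pairwise correlation $\rho \ge 0$. Under that assumption, $\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}\big(1 + (n-1)\rho\big)$. The sketch below, with the hypothetical helper `effective_sample_size`, converts this into the number of independent observations that would give the same variance. Real dependence structures are usually more complicated than this.

```python
def effective_sample_size(n, rho):
    """n_eff with Var(mean) = sigma^2 / n_eff under equal pairwise correlation rho."""
    return n / (1 + (n - 1) * rho)

# Hypothetical correlations; rho = 0 recovers the independent case.
for rho in (0.0, 0.01, 0.1):
    print(rho, round(effective_sample_size(1000, rho), 1))
```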
The standard error $\sigma/\sqrt{n}$ is the practical bridge from the CLT to statistics. It quantifies the spread of sample means across repeated samples. If $\sigma$ is unknown, statistics replaces it with an estimate, which leads to $t$ procedures under normal-sample assumptions. That inferential step is covered in the statistics section, but the probability source is the variance of the sample mean.
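For proportions, the same logic gives standard error
$$\operatorname{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}.$$
This comes from treating each trial as Bernoulli with success probability $p$, so that each trial has variance $p(1-p)$.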
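A short sketch of the proportion standard error follows; the function name `proportion_se` and the example inputs are assumptions made for illustration.

```python
from math import sqrt

def proportion_se(p, n):
    """Standard error of a sample proportion from n independent Bernoulli(p) trials."""
    return sqrt(p * (1 - p) / n)

# For a fixed n, the Bernoulli variance p * (1 - p) is largest at p = 0.5.
for p in (0.1, 0.5, 0.9):
    print(p, round(proportion_se(p, 1000), 4))
```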
As sample size grows, the standard error shrinks, but model bias does not automatically disappear. Averaging many biased measurements converges to the wrong center. Limit theorems control random error under assumptions; they do not fix bad measurement, sampling bias, or a misspecified target.
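The bias floor can be made explicit with the mean squared error decomposition, $\text{MSE} = \text{bias}^2 + \sigma^2/n$, for an average of independent measurements that share a fixed bias. The bias and noise values below are hypothetical.

```python
from math import sqrt

def rmse_of_mean(bias, sigma, n):
    """Root mean squared error of a sample mean with fixed bias and noise SD sigma."""
    return sqrt(bias**2 + sigma**2 / n)

bias, sigma = 0.5, 2.0  # hypothetical measurement bias and noise level
for n in (10, 1000, 100_000):
    # The random part sigma^2 / n shrinks with n; the bias term does not.
    print(n, round(rmse_of_mean(bias, sigma, n), 4))
```

However large `n` becomes, the error never drops below the size of the bias.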
Visual
| Theorem | What converges? | Type of convergence | Main message |
|---|---|---|---|
| Weak LLN | $\bar{X}_n$ | probability | averages get close to $\mu$ |
| Strong LLN | $\bar{X}_n$ | almost surely | long-run average settles at $\mu$ |
| CLT | standardized sum or mean | distribution | shape becomes normal |
| Normal approximation | binomial counts | approximation | large count probabilities use $\Phi$ |
Worked example 1: LLN bound for coin flips
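Problem. A fair coin is flipped $n = 1000$ times. Let $\bar{X}_n$ be the sample proportion of heads, where $X_i = 1$ for heads and $X_i = 0$ for tails. Use Chebyshev's inequality to bound
$$P(|\bar{X}_n - 0.5| \ge 0.05).$$
Method.
- For one flip, $X_i \sim \text{Bernoulli}(0.5)$.
- Mean and variance: $\mu = 0.5$ and $\sigma^2 = 0.5(1 - 0.5) = 0.25$.
- The sample mean has variance $\operatorname{Var}(\bar{X}_n) = \sigma^2/n = 0.25/1000 = 0.00025$.
- Chebyshev's inequality says $P(|\bar{X}_n - \mu| \ge \varepsilon) \le \operatorname{Var}(\bar{X}_n)/\varepsilon^2$.
- Substitute $\varepsilon = 0.05$: $0.00025/0.05^2 = 0.00025/0.0025 = 0.1$.
- Interpretation. Chebyshev guarantees the probability is at most $0.1$. The exact probability is smaller, but Chebyshev works broadly without requiring a binomial table.
Checked answer. The Chebyshev upper bound is $0.1$.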
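To see how conservative the bound is, the sketch below compares it with an exact binomial tail sum. It assumes $n = 1000$ flips and a tolerance of $0.05$, the values used in this page's simulation code, and relies only on the Python standard library.

```python
from math import comb

n, eps = 1000, 0.05  # assumed values, matching the simulation code on this page
chebyshev_bound = (0.25 / n) / eps**2

# |heads/n - 0.5| >= 0.05 means the head count is at least 50 away from 500.
# The comparison is done in integer counts to avoid float rounding at the boundary.
deviation = 50
tail_count = sum(comb(n, k) for k in range(n + 1) if abs(k - n // 2) >= deviation)
exact = tail_count / 2**n  # each of the 2**n flip sequences is equally likely

print("Chebyshev bound:", chebyshev_bound)
print("exact probability:", exact)
```

The exact tail probability comes out far below the bound, which is the price Chebyshev pays for using nothing beyond the mean and variance.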
Worked example 2: CLT approximation for a binomial count
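Problem. Suppose $X \sim \text{Binomial}(200, 0.40)$. Approximate $P(70 \le X \le 90)$ using the normal approximation with continuity correction.
Method.
- Compute mean: $\mu = np = 200(0.40) = 80$.
- Compute variance and standard deviation: $\sigma^2 = np(1-p) = 200(0.40)(0.60) = 48$, so $\sigma = \sqrt{48} \approx 6.928$.
- Apply continuity correction:
$$P(70 \le X \le 90) \approx P(69.5 \le Y \le 90.5),$$
where $Y \sim N(80, 48)$.
- Standardize lower endpoint: $z_1 = (69.5 - 80)/6.928 \approx -1.516$.
- Standardize upper endpoint: $z_2 = (90.5 - 80)/6.928 \approx 1.516$.
- Use the standard normal CDF: $P(69.5 \le Y \le 90.5) = \Phi(1.516) - \Phi(-1.516)$.
- With $\Phi(1.516) \approx 0.9352$ and $\Phi(-1.516) \approx 0.0648$, the approximation is $0.9352 - 0.0648 \approx 0.870$.
Checked answer. The CLT approximation is about $0.87$.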
Code
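```python
import numpy as np
from scipy.stats import binom, norm

# LLN simulation: how often is the head proportion at least 0.05 away from 0.5?
rng = np.random.default_rng(3)
n = 1000
reps = 20_000
# Head counts for reps experiments of n fair flips each
# (same distribution as summing n Bernoulli(0.5) draws per experiment).
heads = rng.binomial(n, 0.5, size=reps)
# Compare integer counts: |heads/n - 0.5| >= 0.05 is |heads - 500| >= 50.
# This avoids float rounding dropping the boundary case of exactly 450 heads.
sim_prob = np.mean(np.abs(heads - n // 2) >= 50)
chebyshev_bound = 0.25 / (n * 0.05**2)
print("simulation:", sim_prob)
print("Chebyshev bound:", chebyshev_bound)

# CLT approximation for P(70 <= X <= 90) with X ~ Binomial(200, 0.40).
n, p = 200, 0.40
mu = n * p
sigma = np.sqrt(n * p * (1 - p))
# Continuity correction: widen the integer interval by 0.5 on each side.
approx = norm.cdf((90.5 - mu) / sigma) - norm.cdf((69.5 - mu) / sigma)
exact = binom.cdf(90, n, p) - binom.cdf(69, n, p)
print("normal approximation:", approx)
print("exact binomial:", exact)
```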
Common pitfalls
- Thinking the LLN says short-run outcomes must balance out. It says averages stabilize as $n$ grows, not that tails "owe" heads.
- Using the CLT for very small samples without checking the parent distribution or skew.
- Forgetting to scale by $\sqrt{n}$ in the CLT.
- Confusing convergence in probability with convergence in distribution.
- Applying the normal approximation to a binomial when $np$ or $n(1-p)$ is too small.
- Forgetting the continuity correction for discrete-to-continuous approximations when accuracy matters.