
Limit Theorems

Limit theorems explain why averages stabilize and why normal distributions appear so often. The law of large numbers says that sample averages converge toward the expected value. The central limit theorem says that, after centering and scaling, sums of many independent, well-behaved variables become approximately normal. These results are the mathematical reason repeated measurement can produce reliable estimates.

Lane et al.'s sampling-distributions chapter uses the central limit theorem to explain why sample means are often nearly normal even when the parent population is not. This page focuses on the probability-theory statements and calculations, while the statistics section handles inference applications.

Simulated sample-mean distributions become more concentrated and bell-shaped as sample size increases.

Figure: Simulation illustrating convergence in the central limit theorem. Image: Wikimedia Commons, Daniel Resende, CC BY-SA 4.0.

Definitions

Let $X_1, X_2, \ldots$ be random variables with common mean $\mu$. The sample mean is

$$\overline{X}_n=\frac{1}{n}\sum_{i=1}^n X_i.$$

A sequence $Y_n$ converges in probability to $Y$ if, for every $\epsilon>0$,

$$P(|Y_n-Y|>\epsilon)\to 0$$

as $n\to\infty$.

A sequence $Y_n$ converges almost surely to $Y$ if

$$P\left(\lim_{n\to\infty}Y_n=Y\right)=1.$$

Almost sure convergence is stronger than convergence in probability.

A sequence $Y_n$ converges in distribution to $Y$ if

$$F_{Y_n}(y)\to F_Y(y)$$

at every continuity point $y$ of $F_Y$.

The standard normal random variable is denoted by $Z\sim N(0,1)$, with CDF $\Phi$.

Key results

Weak law of large numbers. If $X_1, X_2, \ldots$ are independent and identically distributed with $E[X_i]=\mu$ and finite variance $\sigma^2$, then

$$\overline{X}_n \xrightarrow{P} \mu.$$

A proof sketch uses Chebyshev's inequality. Since

$$E[\overline{X}_n]=\mu$$

and, by independence,

$$\operatorname{Var}(\overline{X}_n)=\frac{\sigma^2}{n},$$

Chebyshev gives

$$P(|\overline{X}_n-\mu|\ge \epsilon) \le \frac{\sigma^2}{n\epsilon^2}\to 0.$$
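The Chebyshev argument can be checked numerically. The following is a minimal sketch, assuming fair six-sided die rolls (mean 3.5, variance 35/12) and a tolerance of 0.25; the distribution, tolerance, seed, and repetition count are illustrative choices, not part of the theorem.

```python
import random
from statistics import fmean

rng = random.Random(0)     # fixed seed so the run is repeatable
faces = range(1, 7)        # fair six-sided die (illustrative choice)
mu = 3.5                   # E[X_i]
sigma2 = 35 / 12           # Var(X_i) for a fair die
eps = 0.25                 # tolerance around the mean
reps = 400                 # simulated experiments per sample size

results = {}
for n in (100, 400, 1600):
    # Count experiments whose sample mean misses mu by at least eps.
    hits = sum(
        abs(fmean(rng.choices(faces, k=n)) - mu) >= eps
        for _ in range(reps)
    )
    bound = sigma2 / (n * eps**2)   # Chebyshev: sigma^2 / (n * eps^2)
    results[n] = (hits / reps, bound)
    print(f"n={n:5d}  simulated={hits / reps:.4f}  Chebyshev bound={bound:.4f}")
```

The simulated frequencies should sit below the bound at every sample size, and both shrink as the sample size grows. The bound is usually far from tight, which is the price of assuming only a finite variance.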

Strong law of large numbers. Under standard conditions such as independent identically distributed variables with $E[|X_1|]<\infty$,

$$\overline{X}_n \to \mu$$

almost surely.

Central limit theorem. If $X_1, X_2, \ldots$ are independent identically distributed with mean $\mu$ and finite variance $\sigma^2>0$, then as $n\to\infty$,

$$\frac{\sum_{i=1}^n X_i-n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0,1).$$

Equivalently,

$$\frac{\overline{X}_n-\mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1).$$
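A small simulation makes the statement concrete. This sketch assumes Exponential(1) summands (mean 1, standard deviation 1), chosen only because they are clearly skewed; the sample size, seed, and evaluation points are likewise illustrative.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(1)     # fixed seed so the run is repeatable
n = 100                    # terms in each sum
reps = 3000                # number of simulated sums
mu, sigma = 1.0, 1.0       # mean and standard deviation of Exponential(1)

# Standardize each sum: subtract n*mu, divide by sigma*sqrt(n).
standardized = []
for _ in range(reps):
    total = sum(rng.expovariate(1.0) for _ in range(n))
    standardized.append((total - n * mu) / (sigma * math.sqrt(n)))

phi = NormalDist().cdf     # standard normal CDF
comparison = {}
for z in (-1.0, 0.0, 1.0):
    empirical = sum(value <= z for value in standardized) / reps
    comparison[z] = (empirical, phi(z))
    print(f"z={z:+.1f}  empirical CDF={empirical:.3f}  Phi(z)={phi(z):.3f}")
```

The empirical CDF of the standardized sums should land close to the standard normal CDF even though each summand is far from normal. Some residual skew remains at this sample size, so small discrepancies are expected.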

Normal approximation to binomial. If $X\sim\operatorname{Binomial}(n,p)$, then for large $n$,

$$\frac{X-np}{\sqrt{np(1-p)}}\approx N(0,1).$$

A continuity correction often improves the approximation:

$$P(a\le X\le b)\approx P(a-0.5\le Y\le b+0.5)$$

where $Y\sim N(np,np(1-p))$.

The LLN and CLT answer different questions. The LLN says the sample mean becomes close to $\mu$; it does not describe the detailed shape of the error. The CLT describes that error after multiplying by $\sqrt{n}$. In practical terms, the typical size of $\overline{X}_n-\mu$ is on the order of $1/\sqrt{n}$, not $1/n$. Quadrupling the sample size roughly halves the standard error.
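The square-root rate can be seen directly by comparing the spread of sample means at two sample sizes. A minimal sketch, assuming Uniform(0, 1) data and an arbitrary seed and repetition count:

```python
import random
from statistics import fmean, stdev

rng = random.Random(2)     # fixed seed so the run is repeatable
reps = 1500                # simulated samples per sample size

def sd_of_sample_means(n):
    """Empirical standard deviation of the mean of n Uniform(0, 1) draws."""
    means = [fmean([rng.random() for _ in range(n)]) for _ in range(reps)]
    return stdev(means)

sd_100 = sd_of_sample_means(100)
sd_400 = sd_of_sample_means(400)
ratio = sd_100 / sd_400    # theory: sqrt(400 / 100) = 2

print(f"sd at n=100: {sd_100:.4f}")
print(f"sd at n=400: {sd_400:.4f}")
print(f"ratio: {ratio:.2f}")
```

The ratio should come out near 2, not 4: four times the data buys only twice the precision.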

The assumptions matter. Independence can be weakened in some advanced versions, and identical distribution can also be relaxed, but some control over dependence and tail size is still needed. Heavy-tailed variables with infinite variance may converge to non-normal stable laws rather than to a normal distribution. Strong dependence can prevent averaging from reducing uncertainty at the usual rate.
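The standard Cauchy distribution is an extreme illustration of the heavy-tail caveat: it has no finite mean or variance, and the average of independent standard Cauchy draws is again standard Cauchy. The sketch below, with an arbitrary seed and sample sizes, measures spread by the interquartile range because the variance does not exist.

```python
import math
import random
from statistics import fmean, quantiles

rng = random.Random(4)     # fixed seed so the run is repeatable
reps = 1000                # simulated samples per sample size

def cauchy():
    """Standard Cauchy draw via the inverse-CDF transform tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_sample_means(n):
    """Interquartile range of the sample mean across repeated samples of size n."""
    means = [fmean([cauchy() for _ in range(n)]) for _ in range(reps)]
    q1, _, q3 = quantiles(means, n=4)
    return q3 - q1

iqr_small = iqr_of_sample_means(10)
iqr_large = iqr_of_sample_means(500)
print(f"IQR of means, n=10:  {iqr_small:.2f}")
print(f"IQR of means, n=500: {iqr_large:.2f}")
```

Both interquartile ranges should stay near 2, the interquartile range of a single standard Cauchy draw. Averaging 500 observations is no more precise than averaging 10, so neither the LLN nor the CLT applies here.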

The CLT is about distributions, not about individual observations becoming normal. If the original data are skewed, individual future observations remain skewed. What becomes approximately normal is the standardized sum or average. This distinction is central in statistics: a sample mean can be approximately normal even when the raw data are not.

Continuity corrections illustrate another practical issue: a theorem may give the limiting shape, but finite-sample accuracy depends on details. A binomial count is discrete, while the normal approximation is continuous. Replacing $P(70\le X\le 90)$ with an area from $69.5$ to $90.5$ aligns integer bars with continuous intervals. For highly skewed distributions or tail probabilities, simulation or exact computation may be better than a rough CLT approximation.
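The effect of the correction can be measured against the exact binomial sum. This sketch uses only the standard library and the Binomial(200, 0.40) example from this page; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, p = 200, 0.40
a, b = 70, 90
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
normal = NormalDist(mu, sigma)

# Exact probability: sum the binomial pmf over the integers a..b.
exact = sum(
    math.comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(a, b + 1)
)
# Normal area over [a, b] versus the widened interval [a - 0.5, b + 0.5].
uncorrected = normal.cdf(b) - normal.cdf(a)
corrected = normal.cdf(b + 0.5) - normal.cdf(a - 0.5)

print(f"exact binomial:      {exact:.4f}")
print(f"normal, uncorrected: {uncorrected:.4f}")
print(f"normal, corrected:   {corrected:.4f}")
```

Without the half-unit widening, the normal area leaves out half of each endpoint bar, so the uncorrected value understates the probability.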

Limit theorems justify many methods, but they do not remove the need for diagnostics. If data are dependent over time, clustered by group, or generated by a changing process, the effective sample size may be much smaller than the raw count.
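One way to see the loss of effective sample size is to simulate positively correlated data. The sketch below assumes a stationary AR(1) process with lag-one correlation 0.8 and unit variance; the model, the correlation value, and the names are illustrative assumptions, not something the text above prescribes.

```python
import math
import random
from statistics import variance

rng = random.Random(5)                   # fixed seed so the run is repeatable
rho = 0.8                                # lag-one correlation (illustrative)
n = 400                                  # observations per series
reps = 1000                              # simulated series
innovation_sd = math.sqrt(1 - rho**2)    # keeps Var(X_t) = 1

def ar1_sample_mean():
    """Mean of one AR(1) series: X_t = rho * X_{t-1} + noise."""
    x = rng.gauss(0.0, 1.0)              # start in the stationary distribution
    total = x
    for _ in range(n - 1):
        x = rho * x + rng.gauss(0.0, innovation_sd)
        total += x
    return total / n

means = [ar1_sample_mean() for _ in range(reps)]
var_dependent = variance(means)          # spread of the mean under dependence
var_iid = 1.0 / n                        # sigma^2 / n if the data were independent
inflation = var_dependent / var_iid
effective_n = n / inflation

print(f"variance inflation: {inflation:.1f}")
print(f"effective sample size: {effective_n:.0f} of {n}")
```

For this process the variance of the mean is inflated by roughly (1 + rho) / (1 - rho), which is 9 here, so 400 correlated observations carry about as much information about the mean as 45 or so independent ones.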

The standard error $\sigma/\sqrt{n}$ is the practical bridge from the CLT to statistics. It quantifies the spread of sample means across repeated samples. If $\sigma$ is unknown, statistics replaces it with an estimate, which leads to $t$ procedures under normal-sample assumptions. That inferential step is covered in the statistics section, but the probability source is the variance of the sample mean.

For proportions, the same logic gives standard error

$$\sqrt{\frac{p(1-p)}{n}}.$$

This comes from treating each trial as Bernoulli.
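Each Bernoulli trial has variance p(1 - p), so the mean of n independent trials has variance p(1 - p) / n. The sketch below compares that formula with the spread of simulated sample proportions; the helper name `proportion_standard_error` and the values of p and n are illustrative.

```python
import math
import random
from statistics import stdev

def proportion_standard_error(p, n):
    """Standard error of a sample proportion from n independent Bernoulli(p) trials."""
    return math.sqrt(p * (1 - p) / n)

rng = random.Random(6)     # fixed seed so the run is repeatable
p, n, reps = 0.40, 200, 2000

# Each simulated proportion is (number of successes) / n.
proportions = [
    sum(rng.random() < p for _ in range(n)) / n
    for _ in range(reps)
]
formula_se = proportion_standard_error(p, n)
simulated_se = stdev(proportions)

print(f"formula:   {formula_se:.4f}")
print(f"simulated: {simulated_se:.4f}")
```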

As sample size grows, the standard error shrinks, but model bias does not automatically disappear. Averaging many biased measurements converges to the wrong center. Limit theorems control random error under assumptions; they do not fix bad measurement, sampling bias, or a misspecified target.
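A toy measurement model shows the difference between random error and bias. In this sketch the true value, the constant offset, and the noise level are all made-up numbers chosen for illustration.

```python
import random
from statistics import fmean

rng = random.Random(7)     # fixed seed so the run is repeatable
true_value = 10.0          # quantity we want to estimate (hypothetical)
bias = 0.5                 # constant offset in every measurement (hypothetical)
noise_sd = 2.0             # random measurement noise (hypothetical)

def average_measurement(n):
    """Average of n measurements that all share the same systematic offset."""
    return fmean([true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)])

# Error of the average relative to the true value, for two sample sizes.
errors = {n: average_measurement(n) - true_value for n in (100, 10_000)}
for n, err in errors.items():
    print(f"n={n:6d}  error of the average = {err:+.3f}")
```

With more data the error settles near the bias of 0.5 and not near zero: the random part averages out, the systematic part does not.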

Visual

| Theorem | What converges? | Type of convergence | Main message |
| --- | --- | --- | --- |
| Weak LLN | $\overline{X}_n$ | probability | averages get close to $\mu$ |
| Strong LLN | $\overline{X}_n$ | almost surely | long-run average settles at $\mu$ |
| CLT | standardized sum or mean | distribution | shape becomes normal |
| Normal approximation | binomial counts | approximation | large count probabilities use $Z$ |

Worked example 1: LLN bound for coin flips

Problem. A fair coin is flipped $n=1000$ times. Let $\overline{X}_n$ be the sample proportion of heads, where $X_i=1$ for heads and $0$ for tails. Use Chebyshev's inequality to bound

$$P(|\overline{X}_n-0.5|\ge 0.05).$$

Method.

  1. For one flip,
$$X_i\sim\operatorname{Bernoulli}(0.5).$$
  2. Mean and variance:
$$E[X_i]=0.5, \qquad \operatorname{Var}(X_i)=0.5(1-0.5)=0.25.$$
  3. The sample mean has variance
$$\operatorname{Var}(\overline{X}_n)=\frac{0.25}{1000}=0.00025.$$
  4. Chebyshev's inequality says
$$P(|\overline{X}_n-\mu|\ge \epsilon) \le \frac{\operatorname{Var}(\overline{X}_n)}{\epsilon^2}.$$
  5. Substitute $\epsilon=0.05$:
$$\begin{aligned} P(|\overline{X}_n-0.5|\ge 0.05) &\le \frac{0.00025}{(0.05)^2}\\ &=\frac{0.00025}{0.0025}\\ &=0.10. \end{aligned}$$
  6. Interpretation. Chebyshev guarantees the probability is at most $10\%$. The exact probability is smaller, but Chebyshev works broadly without requiring a binomial table.

Checked answer. The Chebyshev upper bound is $0.10$.
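To see how conservative the bound is, the exact probability can be computed from binomial counts. This sketch uses only the standard library and works with integer head counts (at most 450 or at least 550 heads out of 1000) to avoid floating-point comparisons at the boundary.

```python
import math

n = 1000
low, high = 450, 550       # head counts at least 50 away from 500

# Count the outcomes in both tails, then divide by the 2^n equally likely sequences.
tail_count = sum(math.comb(n, k) for k in range(0, low + 1))
tail_count += sum(math.comb(n, k) for k in range(high, n + 1))
exact = tail_count / 2**n

chebyshev_bound = 0.25 / (n * 0.05**2)

print(f"exact probability: {exact:.5f}")
print(f"Chebyshev bound:   {chebyshev_bound:.2f}")
```

The exact probability is a small fraction of one percent, far below the guaranteed 10%, which is typical of how loose Chebyshev can be when the distribution is actually well behaved.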

Worked example 2: CLT approximation for a binomial count

Problem. Suppose $X\sim\operatorname{Binomial}(200,0.40)$. Approximate $P(70\le X\le 90)$ using the normal approximation with continuity correction.

Method.

  1. Compute mean:
$$\mu=np=200(0.40)=80.$$
  2. Compute variance and standard deviation:
$$\sigma^2=np(1-p)=200(0.40)(0.60)=48, \qquad \sigma=\sqrt{48}\approx 6.9282.$$
  3. Apply continuity correction:
$$P(70\le X\le 90)\approx P(69.5\le Y\le 90.5),$$

where $Y\sim N(80,48)$.

  4. Standardize lower endpoint:
$$z_1=\frac{69.5-80}{6.9282}\approx -1.5155.$$
  5. Standardize upper endpoint:
$$z_2=\frac{90.5-80}{6.9282}\approx 1.5155.$$
  6. Use the standard normal CDF:
$$P(69.5\le Y\le 90.5)=\Phi(1.5155)-\Phi(-1.5155).$$
  7. With $\Phi(1.5155)\approx 0.9352$ and $\Phi(-1.5155)\approx 0.0648$,
$$P(70\le X\le 90)\approx 0.8704.$$

Checked answer. The CLT approximation is about $0.870$.

Code

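```python
import numpy as np
from scipy.stats import binom, norm

# LLN simulation: estimate P(|mean - 0.5| >= 0.05) for n = 1000 fair coin
# flips and compare it with the Chebyshev bound from worked example 1.
rng = np.random.default_rng(3)
n = 1000
reps = 20_000
samples = rng.binomial(1, 0.5, size=(reps, n))  # one row per simulated experiment
heads = samples.sum(axis=1)                     # number of heads in each row
# Compare integer counts instead of floating-point means: 0.45 - 0.5 rounds to
# just under 0.05 in binary, so a test on the means would miss exactly 450 heads.
sim_prob = np.mean(np.abs(heads - 500) >= 50)
chebyshev_bound = 0.25 / (n * 0.05**2)          # sigma^2 / (n * eps^2)
print("simulation:", sim_prob)
print("Chebyshev bound:", chebyshev_bound)

# CLT approximation for Binomial(200, 0.40), with continuity correction.
n, p = 200, 0.40
mu = n * p
sigma = np.sqrt(n * p * (1 - p))
approx = norm.cdf((90.5 - mu) / sigma) - norm.cdf((69.5 - mu) / sigma)
exact = binom.cdf(90, n, p) - binom.cdf(69, n, p)  # P(70 <= X <= 90)
print("normal approximation:", approx)
print("exact binomial:", exact)
```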

Common pitfalls

  • Thinking the LLN says short-run outcomes must balance out. It says averages stabilize as $n$ grows, not that tails "owe" heads.
  • Using the CLT for very small samples without checking the parent distribution or skew.
  • Forgetting to scale by $\sqrt{n}$ in the CLT.
  • Confusing convergence in probability with convergence in distribution.
  • Applying the normal approximation to a binomial when $np$ or $n(1-p)$ is too small.
  • Forgetting the continuity correction for discrete-to-continuous approximations when accuracy matters.

Connections