Weak Law, Concentration, and the Central Limit Theorem
Limit theorems explain why probability becomes predictable at large scale. The weak law of large numbers says sample averages concentrate near the mean. The central limit theorem says the remaining fluctuations, after scaling by $\sqrt{n}$, often look normal. These two statements answer different questions: the weak law gives convergence to a constant, while the central limit theorem describes the shape of the error.

Figure: A central limit theorem simulation shows why sample means often become approximately normal. Image: Wikimedia Commons, Daniel Resende, CC BY-SA 4.0.
MIT 18.440 proves the weak law first through Markov and Chebyshev inequalities, then revisits it with characteristic functions. The central limit theorem is then proved with transform methods. The lecture sequence makes clear why moment hypotheses matter: finite variance gives a short Chebyshev proof of the weak law, while characteristic functions allow more general convergence arguments.
Definitions
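For a nonnegative random variable $X$ and $a > 0$, Markov's inequality is
$$P(X \ge a) \le \frac{E[X]}{a}.$$
For a random variable $X$ with mean $\mu$ and variance $\sigma^2$, and any $k > 0$, Chebyshev's inequality is
$$P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}.$$
Let $X_1, X_2, \ldots$ be independent identically distributed random variables with mean $\mu$, and define the sample average
$$A_n = \frac{X_1 + \cdots + X_n}{n}.$$
The weak law of large numbers states that for every $\epsilon > 0$,
$$\lim_{n \to \infty} P(|A_n - \mu| > \epsilon) = 0.$$
If the $X_i$ have finite variance $\sigma^2 > 0$, the normalized sum is
$$B_n = \frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}.$$
The central limit theorem states that
$$B_n \xrightarrow{d} Z,$$
where $Z$ is standard normal and $\xrightarrow{d}$ denotes convergence in distribution.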
Key results
Markov's inequality proof: since $X \ge a\,\mathbf{1}_{\{X \ge a\}}$,
$$E[X] \ge a\,P(X \ge a).$$
Dividing by $a$ gives the result.
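Chebyshev follows by applying Markov to the nonnegative random variable $(X - \mu)^2$:
$$P(|X - \mu| \ge k) = P\big((X - \mu)^2 \ge k^2\big) \le \frac{E[(X - \mu)^2]}{k^2} = \frac{\sigma^2}{k^2}.$$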
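As a sanity check, both inequalities can be compared with exact tail probabilities for one concrete distribution. The sketch below assumes an Exponential(1) variable, a choice made here purely for illustration and not taken from the lectures; for it the mean and variance are both 1 and the upper tail is $P(X \ge a) = e^{-a}$. The function names are hypothetical.

```python
from math import exp

# Assumed example (not from the source): X ~ Exponential(1),
# so E[X] = 1, Var(X) = 1, and P(X >= a) = exp(-a) for a >= 0.
MEAN = 1.0
VAR = 1.0

def exact_tail(a):
    """P(X >= a) for X ~ Exponential(1), a >= 0."""
    return exp(-a)

def markov_bound(a):
    """Markov: P(X >= a) <= E[X] / a."""
    return MEAN / a

def exact_deviation(k):
    """P(|X - 1| >= k): the upper tail plus, when k <= 1, the piece P(X <= 1 - k)."""
    upper = exp(-(1 + k))
    lower = 1 - exp(-(1 - k)) if k <= 1 else 0.0
    return upper + lower

def chebyshev_bound(k):
    """Chebyshev: P(|X - mu| >= k) <= Var(X) / k^2."""
    return VAR / k ** 2

for a in (2, 4, 8):
    print(f"a={a}: exact tail {exact_tail(a):.4f} <= Markov bound {markov_bound(a):.4f}")
for k in (2, 4, 8):
    print(f"k={k}: exact deviation {exact_deviation(k):.4f} <= Chebyshev bound {chebyshev_bound(k):.4f}")
```

In this example both bounds hold but are far from tight: the exact tails decay exponentially, while the bounds decay only like $1/a$ and $1/k^2$.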
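Weak law with finite variance: if the $X_i$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, then the sample average $A_n$ satisfies
$$E[A_n] = \mu, \qquad \operatorname{Var}(A_n) = \frac{\sigma^2}{n}.$$
Chebyshev gives
$$P(|A_n - \mu| \ge \epsilon) \le \frac{\sigma^2}{n\epsilon^2} \to 0 \quad \text{as } n \to \infty.$$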
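The Chebyshev argument can be watched numerically. The following is a minimal simulation sketch, assuming Uniform(0, 1) summands (mean $1/2$, variance $1/12$) and a tolerance of $0.05$; the distribution, tolerance, seed, and function name are illustrative choices, not part of the lecture material. It estimates how often the sample average misses the mean by at least the tolerance and prints the bound $\sigma^2/(n\epsilon^2)$ next to it.

```python
import random

def deviation_frequency(n, eps, trials, rng):
    """Estimate P(|average - mu| >= eps) for averages of n Uniform(0, 1) draws."""
    mu = 0.5
    hits = 0
    for _ in range(trials):
        avg = sum(rng.random() for _ in range(n)) / n
        if abs(avg - mu) >= eps:
            hits += 1
    return hits / trials

rng = random.Random(0)      # fixed seed so the run is repeatable
sigma2 = 1 / 12             # variance of Uniform(0, 1)
eps = 0.05
results = {}
for n in (50, 200, 800):
    freq = deviation_frequency(n, eps, trials=1000, rng=rng)
    bound = sigma2 / (n * eps ** 2)
    results[n] = (freq, bound)
    print(f"n={n}: empirical frequency {freq:.4f}, Chebyshev bound {bound:.4f}")
```

Both columns shrink as $n$ grows, and the empirical frequency should sit well below the bound, which is the sense in which Chebyshev proves the weak law without describing the true error.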
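The transform proof of the central limit theorem begins by standardizing so $E[X_i] = 0$ and $\operatorname{Var}(X_i) = 1$. If $M(t) = E[e^{tX_i}]$ exists near zero, then
$$M(t) = 1 + \frac{t^2}{2} + o(t^2) \quad \text{as } t \to 0.$$
For
$$B_n = \frac{X_1 + \cdots + X_n}{\sqrt{n}},$$
independence gives
$$M_{B_n}(t) = \left[M\!\left(\frac{t}{\sqrt{n}}\right)\right]^n.$$
Using the expansion,
$$M_{B_n}(t) = \left[1 + \frac{t^2}{2n} + o\!\left(\frac{1}{n}\right)\right]^n \to e^{t^2/2},$$
the MGF of a standard normal variable. Characteristic functions give the same argument under the finite-variance assumption without requiring the MGF to exist.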
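The limit in the transform argument can be checked in closed form for one simple case. The sketch below assumes fair $\pm 1$ steps (mean 0, variance 1), whose MGF is $\cosh t$; this example distribution and the function name are assumptions added here. The MGF of the normalized sum is then $\cosh(t/\sqrt{n})^n$, which should approach $e^{t^2/2}$.

```python
from math import cosh, exp, sqrt

def mgf_normalized_sum(t, n):
    """MGF of (X_1 + ... + X_n) / sqrt(n) for fair +/-1 steps: cosh(t / sqrt(n)) ** n."""
    return cosh(t / sqrt(n)) ** n

t = 1.0
limit = exp(t ** 2 / 2)     # MGF of a standard normal evaluated at t
for n in (1, 10, 100, 10_000):
    value = mgf_normalized_sum(t, n)
    print(f"n={n}: {value:.6f}  (limit {limit:.6f})")
```

The gap to the limit shrinks roughly like $1/n$ here, which matches the size of the first correction term after $t^2/(2n)$ in the expansion of $\log\cosh$.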
The weak law and the central limit theorem should be compared through the sample average:
$$A_n - \mu = \frac{\sigma}{\sqrt{n}}\,B_n.$$
The CLT says $B_n$ has an approximately stable normal distribution for large $n$. Multiplying by $\sigma/\sqrt{n}$ then says the actual average error is typically of order $1/\sqrt{n}$. The weak law records only the consequence that this error goes to zero in probability, while the CLT describes the scale and shape of the error.
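The $1/\sqrt{n}$ error scale is easy to see in simulation. This is a rough sketch assuming Uniform(0, 1) summands, for which the standard deviation is $\sqrt{1/12} \approx 0.2887$; the distribution, seed, and trial count are arbitrary illustrative choices. If the error of the sample average really is of order $\sigma/\sqrt{n}$, then $\sqrt{n}$ times its observed spread should stay near $\sigma$ as $n$ changes.

```python
import random
from math import sqrt
from statistics import pstdev

def sample_average(n, rng):
    """Average of n independent Uniform(0, 1) draws."""
    return sum(rng.random() for _ in range(n)) / n

rng = random.Random(1)          # fixed seed for repeatability
sigma = sqrt(1 / 12)            # standard deviation of Uniform(0, 1)
trials = 2000
scaled_spreads = {}
for n in (25, 100, 400):
    averages = [sample_average(n, rng) for _ in range(trials)]
    spread = pstdev(averages)   # typical size of the error of the average
    scaled_spreads[n] = sqrt(n) * spread
    print(f"n={n}: sd of average {spread:.4f}, sqrt(n) * sd {scaled_spreads[n]:.4f}, sigma {sigma:.4f}")
```

Quadrupling $n$ should roughly halve the spread of the average while leaving the rescaled spread unchanged.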
Markov and Chebyshev inequalities are deliberately crude. They do not assume a particular distribution, so they cannot give sharp normal-tail estimates in general. Their strength is robustness: with only a mean or variance, they still produce valid bounds. In applications, a loose guaranteed bound can be more valuable than a sharper approximation that depends on unverified distributional assumptions.
The finite-variance proof of the weak law also shows why independence matters. If $X_1, \ldots, X_n$ are not independent and remain highly correlated, the variance of the average may fail to shrink like $1/n$. For example, if all $X_i$ are actually the same random variable $X$, then $A_n = X$ for every $n$ and averaging does not reduce uncertainty.
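One way to quantify this is to assume, for illustration only, that every pair of variables has the same correlation $\rho$ (an equicorrelation model that the source does not discuss). Expanding the variance of the sum then gives $\operatorname{Var}(A_n) = \sigma^2\,(1 + (n-1)\rho)/n$. The sketch below tabulates that formula; the function name is hypothetical.

```python
def variance_of_average(n, sigma2, rho):
    """Var of the average when each X_i has variance sigma2 and each pair has correlation rho.

    Var(sum) = n * sigma2 + n * (n - 1) * rho * sigma2, and the average divides the sum by n.
    """
    return sigma2 * (1 + (n - 1) * rho) / n

sigma2 = 4.0
for rho in (0.0, 0.5, 1.0):
    values = [variance_of_average(n, sigma2, rho) for n in (1, 10, 1000)]
    print(f"rho={rho}: Var of average for n=1, 10, 1000 ->", [round(v, 4) for v in values])
```

With $\rho = 0$ the variance falls like $1/n$; with $\rho = 1$ (all variables identical) it stays at $\sigma^2$; for intermediate $\rho$ it levels off near $\rho\sigma^2$ instead of vanishing.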
The CLT is robust but not universal. Heavy-tailed variables without finite variance can have different stable limits, and variables without finite mean may not satisfy the usual law of large numbers. The Cauchy distribution is the standard warning. Its sample average does not concentrate around a mean because no finite mean exists.
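The Cauchy warning can be illustrated by simulation. The sketch below draws standard Cauchy values with the inverse-CDF formula $\tan(\pi(U - 1/2))$ and measures the interquartile range of the sample mean for several sample sizes; the seed, sample sizes, and helper names are illustrative assumptions. Since the average of $n$ independent standard Cauchy variables is again standard Cauchy, the interquartile range should stay near 2 rather than shrink.

```python
import random
from math import pi, tan
from statistics import quantiles

def cauchy_sample_mean(n, rng):
    """Average of n standard Cauchy draws, generated by the inverse-CDF method."""
    return sum(tan(pi * (rng.random() - 0.5)) for _ in range(n)) / n

def interquartile_range(values):
    """Distance between the first and third sample quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

rng = random.Random(2)      # fixed seed for repeatability
trials = 2000
iqr_by_n = {}
for n in (1, 10, 200):
    means = [cauchy_sample_mean(n, rng) for _ in range(trials)]
    iqr_by_n[n] = interquartile_range(means)
    print(f"n={n}: interquartile range of sample mean {iqr_by_n[n]:.3f}")
```

The interquartile range is used instead of the standard deviation because the Cauchy distribution has no finite variance, so a sample standard deviation would be dominated by a few extreme draws.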
The De Moivre-Laplace theorem for coin tosses is the binomial special case of the CLT. If $S_n \sim \text{Binomial}(n, p)$, then
$$\frac{S_n - np}{\sqrt{np(1-p)}}$$
is approximately standard normal for large $n$, provided $p$ is not too close to $0$ or $1$. This is the origin of the familiar bell curve in repeated-trial counting problems.
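The caveat about success probabilities near 0 or 1 can be made concrete. The sketch below computes the largest gap between the exact binomial CDF and a continuity-corrected normal CDF; the function names and the particular values of the number of trials and the success probability are illustrative choices, not from the source.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def max_cdf_gap(n, p):
    """Largest gap between the Binomial(n, p) CDF and its continuity-corrected normal approximation."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    cdf = 0.0
    gap = 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p ** k * (1 - p) ** (n - k)   # exact P(S_n <= k), accumulated
        approx = normal_cdf((k + 0.5 - mean) / sd)
        gap = max(gap, abs(cdf - approx))
    return gap

for p in (0.5, 0.1, 0.01):
    gaps = [round(max_cdf_gap(n, p), 4) for n in (20, 100, 500)]
    print(f"p={p}: max CDF gap for n=20, 100, 500 ->", gaps)
```

For a fair coin the gap is already small at modest sample sizes, while for a very small success probability the binomial is strongly skewed and the normal approximation needs far more trials.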
Visual
| Result | Scale | Conclusion | Main tool |
|---|---|---|---|
| Markov inequality | one variable | large nonnegative values are limited by mean | expectation |
| Chebyshev inequality | one variable | deviations are limited by variance | Markov on square |
| Weak law | sum divided by $n$ | average converges in probability to $\mu$ | Chebyshev or characteristic functions |
| CLT | centered sum divided by $\sqrt{n}$ | normalized error tends to normal | MGFs or characteristic functions |
The table separates inequalities from asymptotic theorems. Markov and Chebyshev are finite-$n$ statements: they are true for a single random variable or a single sample average. The weak law and CLT are limiting statements about sequences. In practice, the finite inequalities are often used to prove limits, while the limits explain the behavior of large systems.
The CLT row also clarifies why the normal distribution appears even when the original variables are not normal. The theorem is about normalized sums, not the raw summands. Bernoulli, uniform, and many other finite-variance distributions produce approximately normal centered sums after enough independent addition.
The hypotheses should always be kept visible. Independence, identical distribution, finite mean, and finite variance each play a role in the standard statements. Changing those assumptions can lead to different limits or no useful limit at all. This is why heavy-tailed examples are not side issues: they mark the boundary of the theorems, prevent overgeneralization, and show exactly what each hypothesis buys.
Worked example 1: Chebyshev bound for a sample mean
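Problem: Suppose $X_1, \ldots, X_{100}$ are i.i.d. with mean $\mu$ and variance $9$. Bound the probability that their average differs from $\mu$ by at least $1$.
Method:
- Let
  $$A_{100} = \frac{X_1 + \cdots + X_{100}}{100}.$$
- The mean is
  $$E[A_{100}] = \mu.$$
- Since the variables are independent,
  $$\operatorname{Var}(A_{100}) = \frac{9}{100} = 0.09.$$
- Apply Chebyshev with $k = 1$:
  $$P(|A_{100} - \mu| \ge 1) \le \frac{0.09}{1^2} = 0.09.$$
Checked answer: Chebyshev guarantees the probability is at most $0.09$. The true probability may be much smaller, but this bound uses only the mean, variance, and independence.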
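To see how much smaller the true probability can be, the sketch below uses the values behind the bound $9/(100 \cdot 1^2)$ (sample size 100, variance 9, deviation 1) and adds one extra assumption that the problem does not make: that the individual variables are normal, so the average is exactly normal with standard deviation $0.3$. Under that labeled assumption the exact probability is $2(1 - \Phi(1/0.3))$.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n = 100
variance = 9.0
eps = 1.0

# Distribution-free guarantee: valid for any i.i.d. variables with this variance.
chebyshev = variance / (n * eps ** 2)
sd_of_average = sqrt(variance / n)          # 0.3

# Assumption for illustration only: if the X_i were normal, the average would be exactly normal.
exact_if_normal = 2 * (1 - normal_cdf(eps / sd_of_average))

print("Chebyshev bound:", chebyshev)
print("Exact probability under the normal assumption:", exact_if_normal)
```

The gap of roughly two orders of magnitude is the price of making no distributional assumption.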
Worked example 2: normal approximation for coin tosses
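Problem: Toss a fair coin $400$ times. Approximate the probability of seeing between $190$ and $210$ heads, inclusive.
Method:
- Let $S \sim \text{Binomial}(400, 1/2)$.
- The mean and variance are
  $$E[S] = 400 \cdot \tfrac{1}{2} = 200, \qquad \operatorname{Var}(S) = 400 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = 100,$$
  so $\sigma = 10$.
- Use the normal approximation with continuity correction:
  $$P(190 \le S \le 210) \approx P(189.5 \le Y \le 210.5),$$
  where $Y$ is normal with mean $200$ and standard deviation $10$.
- Standardize endpoints:
  $$\frac{189.5 - 200}{10} = -1.05, \qquad \frac{210.5 - 200}{10} = 1.05.$$
- Therefore
  $$P(190 \le S \le 210) \approx \Phi(1.05) - \Phi(-1.05).$$
- Using $\Phi(1.05) \approx 0.8531$ and $\Phi(-1.05) \approx 0.1469$,
  $$P(190 \le S \le 210) \approx 0.8531 - 0.1469 = 0.7062.$$
Checked answer: the event is within about $1.05$ standard deviations of the mean, so a probability around $0.7$ is plausible.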
Code
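```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Worked example 1: Chebyshev bound sigma^2 / (n * k^2) with sigma^2 = 9, n = 100, k = 1.
chebyshev_bound = 9 / (100 * 1 ** 2)
print("Chebyshev bound:", chebyshev_bound)

# Worked example 2: normal approximation with continuity correction (z = +/-1.05).
approx = normal_cdf(1.05) - normal_cdf(-1.05)
print("CLT approximation:", approx)

# Exact binomial probability for comparison.
n = 400
exact = sum(comb(n, k) for k in range(190, 211)) / (2 ** n)
print("Exact binomial probability:", exact)
```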
Common pitfalls
- Confusing convergence in probability with almost sure convergence. The weak law is weaker than the strong law.
- Forgetting the $\sqrt{n}$ scaling in the central limit theorem.
- Applying Chebyshev without a finite variance.
- Treating the CLT as saying the average itself has a nondegenerate normal limit. The average converges to a constant; the scaled error is asymptotically normal.
- Ignoring continuity correction when approximating a discrete binomial probability by a continuous normal distribution.
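The last pitfall can be checked directly on a fair coin tossed 400 times, for the event of 190 to 210 heads inclusive. The sketch below compares the exact binomial probability with the normal approximation computed with and without the half-unit continuity correction; the variable names are illustrative.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, lo, hi = 400, 190, 210
mean = n / 2
sd = sqrt(n) / 2            # sqrt(n * 1/2 * 1/2) = 10

# Exact probability from binomial coefficients (integer arithmetic until the final division).
exact = sum(comb(n, k) for k in range(lo, hi + 1)) / 2 ** n

# Normal approximation using the raw integer endpoints.
without_correction = normal_cdf((hi - mean) / sd) - normal_cdf((lo - mean) / sd)

# Normal approximation after widening each endpoint by half a unit.
with_correction = normal_cdf((hi + 0.5 - mean) / sd) - normal_cdf((lo - 0.5 - mean) / sd)

print(f"exact binomial:                {exact:.4f}")
print(f"normal, no correction:         {without_correction:.4f}")
print(f"normal, continuity correction: {with_correction:.4f}")
```

Skipping the correction treats the interval as running from exactly 190 to exactly 210 and so drops half of the probability mass attached to each endpoint, which is why the uncorrected value comes out too small.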