Strong Law and Jensen's Inequality
The strong law of large numbers upgrades the weak law from high-probability convergence to almost-sure convergence. Instead of saying that the probability of a large error goes to zero, it says that with probability one the sample averages eventually settle to the mean along the actual infinite sequence of trials. This is a stronger and more pathwise statement.

Figure: A central limit theorem simulation shows why sample means often become approximately normal. Image: Wikimedia Commons, Daniel Resende, CC BY-SA 4.0.
Jensen's inequality is a different kind of result: it compares the average value of a convex function to the function of an average. MIT 18.440 places Jensen after the strong law and uses economic examples to show why convexity and concavity matter. A risk with the same expected value can be preferable or worse depending on the shape of the utility or payoff function.
Definitions
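Let $X_1, X_2, \dots$ be i.i.d. random variables with mean $\mu$, and define
$$A_n = \frac{X_1 + \cdots + X_n}{n}.$$
The strong law of large numbers states that, under suitable hypotheses such as a finite mean ($E|X_1| < \infty$),
$$P\Big(\lim_{n \to \infty} A_n = \mu\Big) = 1.$$
This is also called almost sure convergence of $A_n$ to $\mu$.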
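A function $f$ on an interval is convex if, for all $x, y$ in the interval and all $0 \le \lambda \le 1$,
$$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y).$$
If $f$ is twice differentiable, convexity is implied by
$$f''(x) \ge 0$$
for all $x$ in the interval. A function $f$ is concave if $-f$ is convex.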
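The chord inequality and the second-derivative test can be probed on a finite grid. This is a numerical sanity check, not a proof. It is a minimal sketch: the helper `is_convex_on_grid`, the grid, the set of $\lambda$ values, and the tolerance are illustrative assumptions.

```python
import numpy as np


def is_convex_on_grid(f, xs, lams=np.linspace(0.0, 1.0, 11), tol=1e-12):
    """Check f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)
    for every pair (x, y) from the grid xs and every lam in lams."""
    for x in xs:
        for y in xs:
            left = f(lams * x + (1 - lams) * y)
            right = lams * f(x) + (1 - lams) * f(y)
            if np.any(left > right + tol):  # tol absorbs floating-point rounding
                return False
    return True


xs = np.linspace(-2.0, 2.0, 21)
exp_convex = is_convex_on_grid(np.exp, xs)                  # exp is convex
neg_square_convex = is_convex_on_grid(lambda t: -t**2, xs)  # -t^2 is concave, so this fails

# Second-derivative test for exp via central differences:
# f''(x) is approximated by (f(x + h) - 2 f(x) + f(x - h)) / h^2.
h = 1e-3
second_diff = (np.exp(xs + h) - 2 * np.exp(xs) + np.exp(xs - h)) / h**2
print(exp_convex, neg_square_convex, bool(np.all(second_diff >= 0)))
```

A grid check can only refute convexity. Passing it on finitely many points does not establish convexity.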
Jensen's inequality says that for convex $f$,
$$E[f(X)] \ge f(E[X])$$
whenever the expectations exist. For concave $f$, the inequality reverses:
$$E[f(X)] \le f(E[X]).$$
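The sketch below illustrates both directions of Jensen's inequality numerically. It assumes a hypothetical four-point distribution for $X$, with values and probabilities made up for illustration. It uses $f = \exp$ as the convex case and $f = \log$ as the concave case.

```python
import numpy as np

# Hypothetical discrete distribution: X takes these values with these probabilities.
values = np.array([1.0, 2.0, 4.0, 8.0])
probs = np.array([0.1, 0.4, 0.3, 0.2])

mean = np.dot(probs, values)  # E[X]

# Convex f = exp: E[f(X)] should be at least f(E[X]).
e_exp = np.dot(probs, np.exp(values))
print("E[exp X] =", e_exp, ">= exp(E X) =", np.exp(mean))

# Concave f = log: the inequality reverses.
e_log = np.dot(probs, np.log(values))
print("E[log X] =", e_log, "<= log(E X) =", np.log(mean))
```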
Key results
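The strong law implies the weak law. Suppose $A_n \to \mu$ almost surely, and fix $\epsilon > 0$. On almost every sample path, the event $|A_n - \mu| > \epsilon$ happens for only finitely many $n$, so there is a last time at which it happens. Define
$$N = \max\{n : |A_n - \mu| > \epsilon\}, \qquad N = 0 \text{ if no such } n \text{ exists},$$
with $N$ finite almost surely. Then
$$P(|A_n - \mu| > \epsilon) \le P(N \ge n) \to 0 \quad \text{as } n \to \infty.$$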
Thus almost-sure convergence forces convergence in probability.
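The argument can be illustrated by simulation. The sketch assumes fair-coin indicators, $\mu = 1/2$, and $\epsilon = 1/20$. Because the horizon is finite, it computes a truncated last-exceedance time $N_T = \max\{n \le T : |A_n - \mu| > \epsilon\}$ rather than the true $N$. The path count, horizon, and seed are arbitrary choices. Every path with $|A_n - \mu| > \epsilon$ also has $N_T \ge n$, so the first estimate can never exceed the second.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, horizon = 1000, 2000

# Each row is one simulated path of fair-coin indicators X_1, ..., X_horizon.
x = rng.integers(0, 2, size=(n_paths, horizon))
s = np.cumsum(x, axis=1)                # S_n = X_1 + ... + X_n along each path
times = np.arange(1, horizon + 1)       # n = 1, ..., horizon
# |A_n - 1/2| > 1/20  <=>  10 |2 S_n - n| > n  (exact integer test, no rounding)
bad = 10 * np.abs(2 * s - times) > times

# Truncated last bad time N_T per path, or 0 if the path never deviates.
last_bad = np.where(bad, times, 0).max(axis=1)

rows = []
for n in [10, 100, 1000]:
    p_bad = bad[:, n - 1].mean()      # estimate of P(|A_n - mu| > eps)
    p_late = (last_bad >= n).mean()   # estimate of P(N_T >= n)
    rows.append((n, p_bad, p_late))
    print(n, p_bad, p_late)
```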
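One proof route for the strong law assumes a finite fourth moment, $E[X_i^4] < \infty$. After centering so that $\mu = 0$, one studies
$$E[A_n^4] = \frac{1}{n^4}\, E\big[(X_1 + \cdots + X_n)^4\big].$$
Independence and centering kill every mixed term in which some $X_i$ appears to the first power, leaving
$$E\big[(X_1 + \cdots + X_n)^4\big] = n\,E[X_1^4] + 3n(n-1)\,\sigma^4, \qquad \sigma^2 = E[X_1^2],$$
so $E[A_n^4] \le K/n^2$ for some constant $K$. Because $\sum_n 1/n^2 < \infty$, this gives $E\big[\sum_n A_n^4\big] < \infty$. Hence $\sum_n A_n^4 < \infty$ almost surely, which forces $A_n \to 0$ almost surely. The full strong law under only a finite mean is deeper, but the lecture proof illustrates why higher moments can make pathwise convergence accessible.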
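The fourth-moment identity $E[S_n^4] = n\,E[X_1^4] + 3n(n-1)\sigma^4$, with $S_n = X_1 + \cdots + X_n$, can be checked exactly in a simple case. The sketch assumes Rademacher steps, $X_i = \pm 1$ with probability $1/2$ each. Then $\mu = 0$, $\sigma^2 = 1$, and $E[X_i^4] = 1$, so the identity predicts $E[S_n^4] = 3n^2 - 2n$ and $E[A_n^4] \le 3/n^2$. The helper name `fourth_moment_sum` is hypothetical.

```python
from fractions import Fraction
from math import comb


def fourth_moment_sum(n):
    """Exact E[S_n^4] for S_n = X_1 + ... + X_n, X_i = +1 or -1 with probability 1/2 each.

    With K plus-steps among n, S_n = 2K - n and K ~ Binomial(n, 1/2).
    """
    total = sum(comb(n, k) * (2 * k - n) ** 4 for k in range(n + 1))
    return Fraction(total, 2**n)


for n in range(1, 11):
    exact = fourth_moment_sum(n)
    formula = n + 3 * n * (n - 1)  # n E[X^4] + 3n(n-1) sigma^4 with E[X^4] = sigma^2 = 1
    a4 = exact / n**4              # E[A_n^4]
    print(n, exact, formula, float(a4 * n**2))  # last column is 3 - 2/n, so K = 3 works
```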
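Jensen's inequality can be proved geometrically. For a convex function $f$, at the point $m = E[X]$ there is a supporting line
$$\ell(x) = f(m) + c\,(x - m)$$
such that $f(x) \ge \ell(x)$ for all $x$. Taking expectations gives
$$E[f(X)] \ge E[\ell(X)] = f(m) + c\,(E[X] - m) = f(E[X]).$$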
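Here is a small numerical version of the supporting-line argument. It assumes $f = \exp$, so the tangent slope is $c = f'(m) = e^m$, and a hypothetical three-point distribution for $X$. A tiny tolerance guards the grid comparison against floating-point rounding near $x = m$.

```python
import numpy as np

f, fprime = np.exp, np.exp            # convex f and its derivative (exp is its own derivative)
values = np.array([-1.0, 0.5, 2.0])   # hypothetical support of X
probs = np.array([0.3, 0.5, 0.2])

m = np.dot(probs, values)             # m = E[X]
c = fprime(m)                         # slope of the tangent line at m


def ell(x):
    """Supporting (tangent) line at m: ell(x) = f(m) + c (x - m)."""
    return f(m) + c * (x - m)


grid = np.linspace(-3.0, 3.0, 601)
print(np.all(f(grid) >= ell(grid) - 1e-12))   # the line lies below the graph
print(np.dot(probs, ell(values)), f(m))       # E[ell(X)] equals f(E[X])
print(np.dot(probs, f(values)) >= f(m))       # hence E[f(X)] >= f(E[X])
```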
For $f(x) = x^2$, Jensen gives
$$E[X^2] \ge (E[X])^2,$$
which is the nonnegativity of variance, since $\operatorname{Var}(X) = E[X^2] - (E[X])^2$.
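A short computation on the two-point payoff from worked example 2 confirms that the Jensen gap for $f(x) = x^2$ is exactly the variance.

```python
import numpy as np

values = np.array([0.0, 100.0])  # two-point payoff, each value with probability 1/2
probs = np.array([0.5, 0.5])

mean = np.dot(probs, values)                     # E[X] = 50
gap = np.dot(probs, values**2) - mean**2         # E[X^2] - (E[X])^2
var = np.dot(probs, (values - mean) ** 2)        # E[(X - E[X])^2]
print(gap, var)  # both equal 2500.0
```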
Almost sure convergence is a statement about entire infinite sequences. It says that the set of sample paths for which convergence fails has probability zero. This does not mean failure is logically impossible; rather, it is negligible under the probability model. In repeated coin tossing, one exceptional path is all heads forever, but that single path has probability zero.
The strong law is the rigorous version of the long-run frequency idea. If $X_i$ is the indicator of heads on toss $i$ of a fair coin, then $A_n$ is the fraction of heads in the first $n$ tosses. The strong law says that, with probability one, this fraction tends to $1/2$. The weak law says only that for each fixed large $n$, the probability of a noticeable deviation is small.
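The weak-law side of this comparison can be computed exactly for a fair coin, because the number of heads is binomial. The function name `prob_deviation` and the tolerance $\epsilon = 1/20$ are illustrative choices. The sketch uses exact rational arithmetic so that boundary cases such as $A_n = 0.55$ are not misclassified by rounding. It shows the deviation probability shrinking as $n$ grows, which is a statement about distributions at fixed $n$, not about any single path.

```python
from fractions import Fraction
from math import comb


def prob_deviation(n, eps):
    """Exact P(|A_n - 1/2| > eps) for the fraction of heads A_n in n fair tosses."""
    half = Fraction(1, 2)
    count = sum(comb(n, k) for k in range(n + 1) if abs(Fraction(k, n) - half) > eps)
    return Fraction(count, 2**n)


eps = Fraction(1, 20)  # tolerance 0.05
results = {n: prob_deviation(n, eps) for n in [10, 100, 1000]}
for n, p in results.items():
    print(n, float(p))
```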
The lecture's fourth-moment proof strategy illustrates a common theme: stronger moment assumptions give stronger control over rare large deviations. Bounds on $E[A_n^4]$ can be combined with summability arguments to control infinitely many bad events at once. Chebyshev's inequality at one fixed $n$ works differently, and by itself does not rule out infinitely many deviations.
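To see why summability matters, the sketch compares the two tail bounds for Rademacher steps, with $\sigma^2 = 1$ and $E[A_n^4] = (3n^2 - 2n)/n^4$. The tolerance $\epsilon = 0.1$ is arbitrary. Each bound is summed over dyadic blocks $N \le n < 2N$. The Chebyshev block sums stay near $\ln 2/\epsilon^2 \approx 69.3$, so their total over all blocks diverges. The fourth-moment block sums shrink roughly in proportion to $1/N$. Only the summable bound feeds the first Borel-Cantelli lemma, which then says that, with probability one, only finitely many of the events $|A_n| > \epsilon$ occur.

```python
import numpy as np

eps = 0.1  # deviation tolerance (illustrative choice)


def cheb_bound(n):
    """Chebyshev: P(|A_n| > eps) <= Var(A_n) / eps^2 = 1 / (n eps^2) when sigma^2 = 1."""
    return 1.0 / (n * eps**2)


def fourth_bound(n):
    """Markov on A_n^4: P(|A_n| > eps) <= E[A_n^4] / eps^4, with E[A_n^4] = (3n^2 - 2n) / n^4."""
    return (3 * n**2 - 2 * n) / (n**4 * eps**4)


cheb_sums, fourth_sums = [], []
for N in [10**2, 10**3, 10**4, 10**5]:
    n = np.arange(N, 2 * N, dtype=float)   # the dyadic block N <= n < 2N
    cheb_sums.append(cheb_bound(n).sum())
    fourth_sums.append(fourth_bound(n).sum())
    print(N, cheb_sums[-1], fourth_sums[-1])
```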
Jensen's inequality is both geometric and probabilistic. A convex function rewards variability because the chord between two points lies above the graph. Thus randomizing around a fixed mean increases the expected value of a convex payoff. A concave utility function does the opposite: it penalizes variability, which is why risk-averse decision makers can prefer a certain payoff to a risky payoff with the same monetary expectation.
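The sketch below shows a mean-preserving spread. It assumes $X = m \pm s$ with equal probability, and the hypothetical choices $m = 4$, $f(x) = x^2$ (convex), and $\sqrt{x}$ (concave). As the spread $s$ grows, $E[X]$ stays at $m$, $E[X^2] = m^2 + s^2$ rises, and $E[\sqrt{X}]$ falls.

```python
import numpy as np

m = 4.0  # fixed mean; spreads s are kept <= m so sqrt stays defined
convex_vals, concave_vals = [], []
for s in [0.0, 1.0, 2.0, 3.0]:
    outcomes = np.array([m - s, m + s])              # equally likely, so E[X] = m for every s
    convex_vals.append(np.mean(outcomes**2))         # E[X^2] = m^2 + s^2 rises with s
    concave_vals.append(np.mean(np.sqrt(outcomes)))  # E[sqrt(X)] falls as s grows
    print(s, convex_vals[-1], concave_vals[-1])
```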
The hedge-fund-style example in the lecture uses a convex compensation function. If a manager receives a large upside share but limited downside penalty, the manager's expected compensation can increase with risk even when the investor's expected return does not. Jensen's inequality is the mathematical reason this principal-agent tension appears: convex payoffs make variability valuable to the payoff holder.
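Here is a toy version of a convex compensation scheme. It rests on hypothetical assumptions. The fund return is $\pm s$ with equal probability, so the investor's expected return is $0$ for every $s$. The manager receives a $20\%$ share of gains and pays no penalty on losses. This is an illustrative sketch, not the lecture's exact example.

```python
import numpy as np

share = 0.2  # hypothetical upside share; in this sketch losses cost the manager nothing


def compensation(r):
    """Convex payoff: `share` of any gain, zero on a loss."""
    return share * np.maximum(r, 0.0)


spreads = [0.05, 0.10, 0.20, 0.40]
investor_means, manager_means = [], []
for s in spreads:
    returns = np.array([-s, s])   # equally likely fund returns, so E[R] = 0 for every s
    investor_means.append(returns.mean())
    manager_means.append(compensation(returns).mean())  # equals share * s / 2
    print(s, investor_means[-1], manager_means[-1])
```

The investor's expected return never moves, while the manager's expected pay grows linearly in the spread $s$.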
Visual
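| Idea | Statement | Type of conclusion |
|---|---|---|
| Weak law | $P(\lvert A_n - \mu \rvert > \epsilon) \to 0$ for every $\epsilon > 0$ | high-probability convergence |
| Strong law | $P(A_n \to \mu) = 1$ | pathwise convergence |
| Convex Jensen | $E[f(X)] \ge f(E[X])$ | variability raises convex payoffs |
| Concave Jensen | $E[f(X)] \le f(E[X])$ | variability lowers concave utility |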
The table puts two different uses of averaging side by side. The strong law studies what happens when many independent observations are averaged. Jensen studies what happens when a function is applied before or after averaging. In one case the issue is convergence of data; in the other, it is the effect of nonlinear transformation. Both are central because probability repeatedly alternates between averaging random quantities and transforming them.
For convex $f$, the gap
$$E[f(X)] - f(E[X]) \ge 0$$
can be interpreted as the value of variability under that convex payoff. For concave $f$, the sign reverses and variability is costly. This gives a mathematical vocabulary for risk preference without leaving the probability framework.
Both results also warn against overinterpreting averages. A long-run average can stabilize almost surely, while a nonlinear payoff of each observation may still favor or penalize variability through Jensen's inequality. The order of averaging and transforming matters.
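The sketch below simulates why the order matters. It reuses the two-point payoff from worked example 2, with values $0$ and $100$ each taken with probability $1/2$. The seed and sample size are arbitrary. By the strong law (plus continuity of $\sqrt{\cdot}$ for the first quantity), $\sqrt{A_n} \to \sqrt{50} \approx 7.07$, while the average of the $\sqrt{X_i}$ tends to $E[\sqrt{X}] = 5$. Jensen's inequality explains why the first limit is the larger one.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.choice([0.0, 100.0], size=200_000)  # i.i.d. draws, each value with probability 1/2

avg_then_sqrt = np.sqrt(x.mean())   # f(A_n), which tends to f(mu) = sqrt(50)
sqrt_then_avg = np.sqrt(x).mean()   # (1/n) sum f(X_i), which tends to E[f(X)] = 5
print(avg_then_sqrt, sqrt_then_avg)
```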
Worked example 1: strong law implies weak law in a concrete event
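Problem: Suppose the strong law holds for sample averages $A_n$. Show that for every $\epsilon > 0$,
$$\lim_{n \to \infty} P(|A_n - \mu| > \epsilon) = 0.$$
Method:
- The strong law says that with probability one, $A_n \to \mu$.
- On any sample path where this convergence occurs, there is some random index $M$ such that $|A_n - \mu| \le \epsilon$ for all $n \ge M$.
- Let $N = \max\{n : |A_n - \mu| > \epsilon\}$, with $N = 0$ if no such $n$ exists. For convergent sample paths, $N$ is finite (indeed $N < M$).
- If $|A_n - \mu| > \epsilon$, then $N \ge n$.
- Therefore $P(|A_n - \mu| > \epsilon) \le P(N \ge n)$.
- Since $N$ is finite almost surely, the events $\{N \ge n\}$ decrease to an event of probability zero, so $P(N \ge n) \to 0$ as $n \to \infty$.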
Checked answer: the desired probability tends to zero, exactly the weak-law statement for this fixed tolerance.
Worked example 2: Jensen and a risky payoff
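Problem: An investment returns $0$ with probability $1/2$ and $100$ with probability $1/2$. Compare $E[\sqrt{X}]$ with $\sqrt{E[X]}$.
Method:
- The mean return is $E[X] = \tfrac12 \cdot 0 + \tfrac12 \cdot 100 = 50$.
- The square-root function is concave on $[0, \infty)$.
- Compute expected utility: $E[\sqrt{X}] = \tfrac12 \sqrt{0} + \tfrac12 \sqrt{100} = 5$.
- Compute utility of the mean: $\sqrt{E[X]} = \sqrt{50} \approx 7.071$.
- Jensen for concave functions predicts $E[\sqrt{X}] \le \sqrt{E[X]}$.
Checked answer: $5 < 7.071$, so a decision maker with square-root utility prefers the certain payoff of $50$ to the risky payoff with the same expected monetary value.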
Code
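```python
import numpy as np

# Strong law: running averages A_n along one simulated path of fair-coin flips.
rng = np.random.default_rng(1)
trials = rng.integers(0, 2, size=100_000)  # Bernoulli(1/2): each entry is 0 or 1
averages = np.cumsum(trials) / np.arange(1, len(trials) + 1)  # A_1, ..., A_100000
print("last sample average:", averages[-1])
print("max error for n > 1000:", np.max(np.abs(averages[1000:] - 0.5)))

# Jensen: the two-point payoff from worked example 2 under square-root utility.
values = np.array([0.0, 100.0])
probs = np.array([0.5, 0.5])
expected_x = np.dot(probs, values)              # E[X] = 50
expected_sqrt = np.dot(probs, np.sqrt(values))  # E[sqrt(X)] = 5
sqrt_expected = np.sqrt(expected_x)             # sqrt(E[X]), about 7.071
print("E[sqrt(X)]:", expected_sqrt)
print("sqrt(E[X]):", sqrt_expected)
```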
Common pitfalls
- Saying the weak law and strong law are the same because both involve averages. Almost-sure convergence is stronger than convergence in probability.
- Interpreting "with probability one" as "for every possible outcome". Probability-zero exceptional paths may exist.
- Applying Jensen in the wrong direction. Convex functions put the expectation above the function at the mean; concave functions reverse this.
- Forgetting integrability assumptions. Jensen requires the relevant expectations to be defined.
- Treating higher expected payoff as automatically better when utility is concave or when payoff functions are nonlinear.