Moment and Characteristic Functions
Moment generating functions and characteristic functions encode probability laws as functions. The central idea is that multiplying independent transforms is easier than convolving densities or mass functions. This is why transforms are powerful for sums of independent random variables and why MIT 18.440 uses them to prove the weak law of large numbers and the central limit theorem.

Figure: Pierre-Simon de Laplace is a key figure in probability, transforms, and potential theory. Image: Wikimedia Commons, Louis Delaistre after Armand-Charles Guilleminot, public domain.
Moment generating functions are intuitive because their derivatives generate moments, but they may fail to exist for heavy-tailed distributions. Characteristic functions insert the imaginary unit $i$ into the exponent and are always defined, making them more robust. They are Fourier transforms of probability distributions and are the standard tool behind convergence in distribution.
Definitions
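The moment generating function (MGF) of a random variable $X$ is
$$M_X(t) = E[e^{tX}]$$
for values of $t$ where the expectation is finite.
If $X$ is discrete with probability mass function $p_X$,
$$M_X(t) = \sum_x e^{tx} p_X(x).$$
If $X$ has density $f_X$,
$$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx.$$
The characteristic function of $X$ is
$$\phi_X(t) = E[e^{itX}]$$
where $i = \sqrt{-1}$. Since $|e^{itX}| = 1$, this expectation always exists.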
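As a minimal sketch of these two definitions, the Python snippet below evaluates both transforms for a finite probability mass function. The helper names `mgf_discrete` and `cf_discrete` and the fair-die example are illustrative assumptions, not part of the lecture material.

```python
import cmath
import math


def mgf_discrete(pmf, t):
    """E[exp(t X)] for a finite pmf given as {value: probability}."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())


def cf_discrete(pmf, t):
    """E[exp(i t X)] for the same pmf; the result is a complex number."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())


# Hypothetical example: a fair six-sided die.
die = {k: 1 / 6 for k in range(1, 7)}

print("MGF at t = 0:", mgf_discrete(die, 0.0))        # both transforms equal 1 at t = 0
print("MGF at t = 0.5:", mgf_discrete(die, 0.5))
print("|CF| at t = 2.5:", abs(cf_discrete(die, 2.5)))  # modulus never exceeds 1
```

The MGF is a real number that grows with $t$, while the characteristic function stays inside the unit disk for every real $t$.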
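We say $X_n$ converges in distribution to $X$ if
$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$$
at every continuity point $x$ of $F_X$, where $F_X$ denotes the cumulative distribution function of $X$.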
Key results
Moment generating functions generate moments when derivatives exist near zero:
$$M_X'(0) = E[X]$$
and more generally
$$M_X^{(k)}(0) = E[X^k].$$
Proof sketch: differentiate with respect to $t$:
$$\frac{d}{dt} E[e^{tX}] = E\left[\frac{d}{dt} e^{tX}\right] = E[X e^{tX}].$$
Then evaluate at $t = 0$. Justifying interchange of derivative and expectation requires regularity, which is why existence near zero matters.
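The derivative rule can be checked numerically. The sketch below assumes an exponential random variable with rate `lam = 2.0` (an arbitrary choice for illustration), uses its closed-form MGF $\lambda/(\lambda - t)$, and approximates the first two derivatives at zero with central differences. The function names are hypothetical.

```python
def exponential_mgf(lam, t):
    """Closed-form MGF lam / (lam - t) of an Exponential(lam) variable, valid for t < lam."""
    if t >= lam:
        raise ValueError("the exponential MGF is finite only for t < lam")
    return lam / (lam - t)


def first_derivative_at_zero(mgf, h=1e-3):
    # Central difference approximation of M'(0).
    return (mgf(h) - mgf(-h)) / (2 * h)


def second_derivative_at_zero(mgf, h=1e-3):
    # Central second difference approximation of M''(0).
    return (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / (h * h)


lam = 2.0  # assumed rate for the illustration


def mgf(t):
    return exponential_mgf(lam, t)


mean_estimate = first_derivative_at_zero(mgf)
second_moment_estimate = second_derivative_at_zero(mgf)
print("E[X]   ~", mean_estimate, "exact:", 1 / lam)
print("E[X^2] ~", second_moment_estimate, "exact:", 2 / lam**2)
```

The finite differences only sample the MGF at $\pm h$, which is why the MGF must be finite in a neighborhood of zero for this to make sense.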
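If $X$ and $Y$ are independent, then
$$M_{X+Y}(t) = M_X(t)\,M_Y(t)$$
because
$$E[e^{t(X+Y)}] = E[e^{tX} e^{tY}] = E[e^{tX}]\,E[e^{tY}].$$
The same identity holds for characteristic functions:
$$\phi_{X+Y}(t) = \phi_X(t)\,\phi_Y(t).$$
Scaling by a constant $a$ satisfies
$$M_{aX}(t) = E[e^{taX}] = M_X(at).$$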
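To see the product and scaling rules at work, the following sketch convolves two small probability mass functions directly and compares the MGF of the result with the product of the individual MGFs. The die and coin distributions, the value of `t`, and the helper names are assumptions chosen only for illustration.

```python
import math
from collections import defaultdict


def mgf(pmf, t):
    """E[exp(t X)] for a finite pmf given as {value: probability}."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())


def pmf_of_sum(pmf_x, pmf_y):
    """pmf of X + Y for independent X and Y (discrete convolution)."""
    out = defaultdict(float)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[x + y] += px * py
    return dict(out)


die = {k: 1 / 6 for k in range(1, 7)}
coin = {0: 0.5, 1: 0.5}
t = 0.3

# Product rule: the hard operation (convolution) versus the easy one (multiplication).
lhs = mgf(pmf_of_sum(die, coin), t)
rhs = mgf(die, t) * mgf(coin, t)
print("MGF of X + Y:", lhs, " product of MGFs:", rhs)

# Scaling rule: the MGF of aX at t equals the MGF of X at a * t.
a = 3
scaled = {a * x: p for x, p in die.items()}
print("MGF of aX at t:", mgf(scaled, t), " MGF of X at a*t:", mgf(die, a * t))
```

The convolution loop costs a double sum over outcomes, while the transform side is a single multiplication, which is the practical content of the identity.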
Important examples:
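| Distribution | MGF $M_X(t)$ |
|---|---|
| Bernoulli($p$) | $1 - p + p e^t$ |
| Binomial($n, p$) | $(1 - p + p e^t)^n$ |
| Poisson($\lambda$) | $e^{\lambda(e^t - 1)}$ |
| Normal($\mu, \sigma^2$) | $e^{\mu t + \sigma^2 t^2 / 2}$ |
| Exponential($\lambda$) | $\lambda / (\lambda - t)$ for $t < \lambda$ |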
Lévy's continuity theorem, in the form used in the lectures, says that if the characteristic functions $\phi_{X_n}(t)$ converge for every $t$ to the characteristic function $\phi_X(t)$ of a random variable $X$, then $X_n$ converges to $X$ in distribution. This makes characteristic functions a rigorous route to limit theorems.
Transforms are useful because they convert hard operations into easier ones. Convolution of densities becomes multiplication of transforms. Scaling a random variable becomes rescaling the transform argument. Moments become derivatives at zero. These rules allow one to prove distributional identities without performing difficult integrals directly.
The MGF may fail in two different ways. It may be infinite for all nonzero $t$, as with very heavy-tailed distributions, or it may be finite only on a restricted range of $t$, sometimes only on one side of zero. For an exponential random variable with rate $\lambda$, $M_X(t)$ is finite only for $t < \lambda$. That restricted domain is still enough for many calculations, but one must not plug in arbitrary values of $t$.
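A rough numerical illustration of the domain restriction: the sketch below approximates the defining integral of the exponential MGF over a truncated range $[0, B]$ with a midpoint rule. The rate `lam = 1.0`, the cutoffs, and the function name are assumptions made for this example. Inside the domain the truncated integral settles at the closed form, while outside it keeps growing as the cutoff increases.

```python
import math


def truncated_exponential_mgf(lam, t, upper, steps=100_000):
    """Midpoint-rule approximation of the integral over [0, upper] of
    exp(t * x) * lam * exp(-lam * x) dx."""
    dx = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * dx
        total += math.exp((t - lam) * x)
    return lam * total * dx


lam = 1.0  # assumed rate for the illustration

# t < lam: the integral converges to lam / (lam - t).
inside = truncated_exponential_mgf(lam, 0.5, upper=60.0)
print("t = 0.5:", inside, "closed form:", lam / (lam - 0.5))

# t > lam: the truncated integral grows without bound as the cutoff increases.
outside = {}
for upper in (20.0, 40.0):
    outside[upper] = truncated_exponential_mgf(lam, 1.5, upper=upper)
    print("t = 1.5, cutoff", upper, "->", outside[upper])
```

Plugging $t = 1.5$ into $\lambda/(\lambda - t)$ would return the finite number $-2$, which is meaningless here because the expectation itself is infinite.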
Characteristic functions avoid this integrability problem because $e^{itX}$ has absolute value $1$. They can be complex-valued, but their real and imaginary parts are simply expectations of cosine and sine:
$$\phi_X(t) = E[\cos(tX)] + i\,E[\sin(tX)].$$
This bounded oscillatory structure is why characteristic functions are always defined and why they are closely related to Fourier analysis.
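The cosine and sine decomposition also gives a simple way to estimate a characteristic function from samples, even for a law with no usable MGF. The sketch below is a Monte Carlo illustration under stated assumptions: it draws standard Cauchy samples by inverse-CDF sampling, fixes the seed and sample size arbitrarily, and compares the estimate with the known Cauchy characteristic function $e^{-|t|}$.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def sample_standard_cauchy():
    # Inverse-CDF sampling: tan(pi * (U - 1/2)) is standard Cauchy for U uniform on [0, 1).
    return math.tan(math.pi * (random.random() - 0.5))


def empirical_cf(xs, t):
    """Sample average of cos(t x) + i sin(t x)."""
    real = sum(math.cos(t * x) for x in xs) / len(xs)
    imag = sum(math.sin(t * x) for x in xs) / len(xs)
    return complex(real, imag)


samples = [sample_standard_cauchy() for _ in range(100_000)]
t = 1.0
estimate = empirical_cf(samples, t)
print("empirical CF at t = 1:", estimate)
print("known Cauchy CF exp(-|t|):", math.exp(-abs(t)))
```

Every term in the averages lies between $-1$ and $1$, so the estimate is stable even though the samples themselves include extreme values.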
For integer-valued random variables, characteristic functions contain periodic information. Since $e^{2\pi i k} = 1$ whenever $k$ is an integer, $\phi_X(t + 2\pi) = \phi_X(t)$, and in particular $\phi_X(2\pi) = 1$. More generally, the pattern of $|\phi_X(t)|$ reflects lattice structure in the distribution. This is one reason characteristic functions are more than a technical proof device.
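The periodicity is easy to observe numerically. This sketch assumes a Binomial(8, 0.3) variable as the integer-valued example and an arbitrary test point `t = 0.7`; the helper names are hypothetical.

```python
import cmath
import math


def binomial_pmf(n, p):
    """pmf of a Binomial(n, p) variable as {k: probability}."""
    return {k: math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def cf(pmf, t):
    """E[exp(i t X)] for a finite pmf."""
    return sum(q * cmath.exp(1j * t * x) for x, q in pmf.items())


pmf = binomial_pmf(8, 0.3)
t = 0.7

# Integer support: the characteristic function repeats with period 2 * pi.
shift_gap = abs(cf(pmf, t + 2 * math.pi) - cf(pmf, t))
print("|phi(t + 2 pi) - phi(t)|:", shift_gap)
print("|phi(2 pi) - 1|:", abs(cf(pmf, 2 * math.pi) - 1))

# Doubling X puts all mass on even integers, and the period shrinks to pi.
doubled = {2 * k: q for k, q in pmf.items()}
print("|phi_2X(pi) - 1|:", abs(cf(doubled, math.pi) - 1))
```

The second check shows how a coarser lattice shows up as a shorter period, which is the sense in which the transform records lattice structure.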
In limit theorem proofs, the logarithm of a transform often exposes the first few moments. If $E[X] = 0$ and $E[X^2] = 1$, then near zero the transform behaves like $1 + t^2/2$ for MGFs or $1 - t^2/2$ for characteristic functions. Raising this expression to the $n$th power at argument $t/\sqrt{n}$ produces an exponential limit, $e^{t^2/2}$ or $e^{-t^2/2}$ respectively, which is the standard normal transform.
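The limit can be watched directly in a case where the transform is known in closed form. The sketch below assumes i.i.d. signs $X_i = \pm 1$ with probability $1/2$ each (mean 0, variance 1), whose characteristic function is $\cos t$, so the normalized sum has characteristic function $\cos(t/\sqrt{n})^n$. The choice of distribution, the test point `t = 1.5`, and the function name are illustrative.

```python
import math


def sign_sum_cf(n, t):
    """CF of (X_1 + ... + X_n) / sqrt(n) for i.i.d. X_i = +1 or -1 with probability 1/2.

    Each X_i has CF cos(t), so independence and scaling give cos(t / sqrt(n)) ** n.
    """
    return math.cos(t / math.sqrt(n)) ** n


t = 1.5
target = math.exp(-t * t / 2)  # standard normal CF at t

errors = []
for n in (10, 100, 10_000, 1_000_000):
    value = sign_sum_cf(n, t)
    errors.append(abs(value - target))
    print(f"n = {n:>9}: CF = {value:.8f}, normal CF = {target:.8f}")
```

Since $\cos(s) \approx 1 - s^2/2$ for small $s$, each factor is approximately $1 - t^2/(2n)$, and the $n$th power approaches $e^{-t^2/2}$.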
Visual
| Feature | MGF | Characteristic function |
|---|---|---|
| Definition | $M_X(t) = E[e^{tX}]$ | $\phi_X(t) = E[e^{itX}]$ |
| Always exists | no | yes |
| Moments by derivatives | direct when finite | with powers of $i$ |
| Independent sums | products | products |
| Limit theorem use | useful when it exists near zero | more general |
| Heavy-tail behavior | may fail | still defined |
The contrast in the table is the practical reason for learning both transforms. MGFs are friendlier in elementary calculations because derivatives at zero have no complex constants, and common distributions have simple MGFs. Characteristic functions require complex notation, but they work for every probability distribution. The later lectures use this extra generality to prove limit theorems under hypotheses where MGFs might not exist.
Transform uniqueness is the background principle. Under the standard hypotheses used in probability, a distribution is determined by its characteristic function, and an MGF determines the distribution when it exists in a neighborhood of zero. Therefore, showing two random variables have the same transform is a legitimate way to show they have the same distribution. This is what happens when proving that independent Poisson sums remain Poisson.
One should still keep transforms connected to probability. A transform is not just an algebraic gadget; it is an expectation of a function of $X$. Its value depends on the whole distribution, weighting every possible outcome by an exponential or oscillatory factor.
When using transforms, always record the interval of $t$ values where the calculation is valid. Two expressions that agree only outside the domain of an MGF do not prove anything about the distribution. Characteristic functions avoid this particular domain issue, but they require tracking complex arithmetic carefully. In proofs, this bookkeeping is part of the argument, not a cosmetic detail. It is also where many otherwise plausible transform solutions fail. Always state the transform and its domain together before comparing formulas.
Worked example 1: binomial MGF from Bernoulli trials
Problem: Find the MGF of a binomial random variable and use it to compute the mean.
Method:
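- For a Bernoulli variable $X_i$ with success probability $p$,
$$M_{X_i}(t) = (1 - p) + p e^t.$$
- If $X \sim \text{Binomial}(n, p)$, write
$$X = X_1 + X_2 + \cdots + X_n$$
with independent Bernoulli variables.
- Therefore
$$M_X(t) = \left(1 - p + p e^t\right)^n.$$
- Differentiate:
$$M_X'(t) = n \left(1 - p + p e^t\right)^{n-1} p e^t.$$
- Evaluate at $t = 0$:
$$M_X'(0) = n \cdot 1^{n-1} \cdot p = np.$$
Checked answer: the transform method agrees with the indicator decomposition result $E[X] = np$.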
Worked example 2: sum of independent Poisson variables
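Problem: Use MGFs to show that if $X \sim \text{Poisson}(\lambda_1)$ and $Y \sim \text{Poisson}(\lambda_2)$ are independent, then $X + Y$ is Poisson with parameter $\lambda_1 + \lambda_2$.
Method:
- The Poisson MGF with parameter $\lambda$ is
$$M(t) = e^{\lambda(e^t - 1)}.$$
- For $Z = X + Y$, independence gives
$$M_Z(t) = M_X(t)\,M_Y(t).$$
- Multiply:
$$M_Z(t) = e^{\lambda_1(e^t - 1)}\,e^{\lambda_2(e^t - 1)} = e^{(\lambda_1 + \lambda_2)(e^t - 1)}.$$
- This is exactly the MGF of a Poisson random variable with parameter $\lambda_1 + \lambda_2$. Because this MGF is finite for every $t$, uniqueness of the transform gives $Z \sim \text{Poisson}(\lambda_1 + \lambda_2)$.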
Checked answer: the result matches the Poisson process interpretation that independent event streams add their rates.
Code
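from math import exp

def bernoulli_mgf(p, t):
    # MGF of a Bernoulli(p) variable: (1 - p) + p * e^t.
    return (1 - p) + p * exp(t)

def binomial_mgf(n, p, t):
    # Binomial(n, p) is a sum of n independent Bernoulli(p) variables,
    # so its MGF is the Bernoulli MGF raised to the n-th power.
    return bernoulli_mgf(p, t) ** n

def poisson_mgf(lam, t):
    # MGF of a Poisson(lam) variable: exp(lam * (e^t - 1)).
    return exp(lam * (exp(t) - 1))

# The mean is M'(0); approximate it with a central difference.
p = 0.3
n = 10
h = 1e-5
numerical_mean = (binomial_mgf(n, p, h) - binomial_mgf(n, p, -h)) / (2 * h)
print("numerical binomial mean:", numerical_mean)
print("exact binomial mean:", n * p)

# Independent Poisson variables: the product of the MGFs matches
# the MGF of a single Poisson variable with the summed parameter.
lam1, lam2, t = 2.0, 5.0, 0.4
product = poisson_mgf(lam1, t) * poisson_mgf(lam2, t)
combined = poisson_mgf(lam1 + lam2, t)
print("Poisson MGF product equals combined:", product, combined)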
Common pitfalls
- Assuming an MGF exists for every distribution. Heavy-tailed laws such as Cauchy do not have finite MGFs near zero.
- Forgetting independence when multiplying transforms of sums.
- Confusing $M_X(t)$ and $\phi_X(t)$. Characteristic functions use $e^{itX}$ in place of $e^{tX}$ and may be complex-valued.
- Thinking equality of a few moments determines a distribution. A transform, when valid in the required sense, carries much more information.
- Applying a continuity theorem without checking that the limiting function is the transform of a probability law.
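The pitfall about matching a few moments can be made concrete. The sketch below compares a random sign ($\pm 1$ with probability $1/2$ each) with a standard normal variable: both have first, second, and third moments equal to $0$, $1$, and $0$, yet their MGFs, $\cosh t$ and $e^{t^2/2}$, differ away from zero. The pairing of these two laws is an illustrative choice, not an example taken from the lectures.

```python
import math


def random_sign_mgf(t):
    # X = +1 or -1 with probability 1/2 each: E[exp(tX)] = cosh(t).
    return math.cosh(t)


def standard_normal_mgf(t):
    # Z ~ N(0, 1): E[exp(tZ)] = exp(t^2 / 2).
    return math.exp(t * t / 2)


# First, second, and third moments of the random sign, computed from its pmf.
sign_moments = [sum(0.5 * x**k for x in (-1, 1)) for k in (1, 2, 3)]
normal_moments = [0.0, 1.0, 0.0]  # standard facts about N(0, 1)
print("random sign moments:", sign_moments)
print("standard normal moments:", normal_moments)

# The MGFs agree at t = 0 but separate as |t| grows.
for t in (0.0, 0.5, 1.0, 2.0):
    print(t, random_sign_mgf(t), standard_normal_mgf(t))
```

The two transforms first disagree at order $t^4$, reflecting the fourth moments $1$ and $3$, so agreement of low-order moments says nothing about the rest of the distribution.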