Moment Generating and Characteristic Functions
Generating functions encode a distribution inside an expectation involving powers or exponentials. They are compact tools for finding moments, handling sums of independent random variables, and proving limit theorems. Moment generating functions are often the easiest to work with when they exist near zero. Characteristic functions always exist and are the more general theoretical tool.

Figure: Pierre-Simon de Laplace is a key figure in probability, transforms, and potential theory. Image: Wikimedia Commons, Louis Delaistre after Armand-Charles Guilleminot, public domain.
This page is intentionally brief relative to a full probability-theory course, but it includes the core definitions and computations students need before seeing generating functions in proofs of the central limit theorem or in distribution derivations.
Definitions
For a random variable $X$, the moment generating function (MGF) is
$$M_X(t) = E\left[e^{tX}\right]$$
for values of $t$ where the expectation exists. If $M_X(t)$ exists on an open interval around $t = 0$, it uniquely determines the distribution.
The characteristic function is
$$\varphi_X(u) = E\left[e^{iuX}\right]$$
where $i^2 = -1$. Characteristic functions exist for every random variable because $\left|e^{iuX}\right| = 1$.
For a nonnegative integer-valued random variable $N$, the probability generating function (PGF) is
$$G_N(s) = E\left[s^N\right] = \sum_{k=0}^{\infty} P(N = k)\, s^k.$$
The MGF generates raw moments by differentiation:
$$M_X^{(k)}(0) = E\left[X^k\right]$$
when the derivatives exist.
The characteristic function similarly generates moments through derivatives at zero when the corresponding moments exist:
$$\varphi_X^{(k)}(0) = i^k E\left[X^k\right].$$
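As a quick check of the differentiation rules, the following sketch recovers the first two raw moments of a Poisson variable from both transforms. The Poisson closed forms used here are the standard ones, and the symbol names (`lam`, `phi`, `m1_cf`, and so on) are illustrative choices rather than notation fixed by this page.

```python
import sympy as sp

t, u = sp.symbols("t u", real=True)
lam = sp.symbols("lam", positive=True)

# Poisson(lam) MGF and characteristic function (standard closed forms).
M = sp.exp(lam * (sp.exp(t) - 1))
phi = sp.exp(lam * (sp.exp(sp.I * u) - 1))

# Raw moments from MGF derivatives at t = 0.
m1 = sp.simplify(sp.diff(M, t, 1).subs(t, 0))
m2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))

# The k-th derivative of phi at 0 equals i^k E[X^k], so divide by i^k.
m1_cf = sp.simplify(sp.diff(phi, u, 1).subs(u, 0) / sp.I)
m2_cf = sp.simplify(sp.diff(phi, u, 2).subs(u, 0) / sp.I**2)

print("E[X]   from MGF:", m1, "| from characteristic function:", m1_cf)
print("E[X^2] from MGF:", sp.expand(m2), "| from characteristic function:", sp.expand(m2_cf))
```

Both routes should agree; the only difference is the factor $i^k$ that must be divided out on the characteristic-function side.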
Key results
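Sums of independent variables. If $X$ and $Y$ are independent, then
$$M_{X+Y}(t) = M_X(t)\, M_Y(t)$$
where the MGFs exist. Proof:
$$M_{X+Y}(t) = E\left[e^{t(X+Y)}\right] = E\left[e^{tX} e^{tY}\right] = E\left[e^{tX}\right] E\left[e^{tY}\right] = M_X(t)\, M_Y(t),$$
where the third equality uses independence.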
The same multiplication rule holds for characteristic functions.
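Recovering mean and variance. When the MGF exists near zero,
$$E[X] = M_X'(0), \qquad \operatorname{Var}(X) = M_X''(0) - \left(M_X'(0)\right)^2.$$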
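MGF of common distributions.
| Distribution | MGF $M(t)$ | Valid for |
|---|---|---|
| Bernoulli($p$) | $1 - p + pe^{t}$ | all real $t$ |
| Binomial($n, p$) | $(1 - p + pe^{t})^{n}$ | all real $t$ |
| Poisson($\lambda$) | $\exp\left(\lambda(e^{t} - 1)\right)$ | all real $t$ |
| Exponential($\lambda$), rate $\lambda$ | $\lambda / (\lambda - t)$ | $t < \lambda$ |
| Normal($\mu, \sigma^2$) | $\exp\left(\mu t + \sigma^2 t^2 / 2\right)$ | all real $t$ |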
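A Monte Carlo comparison is one way to sanity-check closed-form MGFs for these families. The sketch below estimates $E[e^{tX}]$ from samples and compares it with the standard formulas; the parameter values, sample size, and the evaluation point `t = 0.2` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
size = 200_000
t = 0.2  # must satisfy t < rate for the exponential case

# name -> (samples, closed-form MGF at t), for illustrative parameter choices.
cases = {
    "Bernoulli(0.3)": (rng.binomial(1, 0.3, size), 1 - 0.3 + 0.3 * np.exp(t)),
    "Binomial(10, 0.3)": (rng.binomial(10, 0.3, size), (1 - 0.3 + 0.3 * np.exp(t)) ** 10),
    "Poisson(3)": (rng.poisson(3.0, size), np.exp(3.0 * (np.exp(t) - 1))),
    # NumPy parameterizes the exponential by scale = 1 / rate.
    "Exponential(rate=2)": (rng.exponential(1 / 2.0, size), 2.0 / (2.0 - t)),
    "Normal(0.5, 1)": (rng.normal(0.5, 1.0, size), np.exp(0.5 * t + 1.0 * t**2 / 2)),
}

rel_errors = {}
for name, (x, exact) in cases.items():
    estimate = np.mean(np.exp(t * x))  # sample average of e^{tX}
    rel_errors[name] = abs(estimate - exact) / exact
    print(f"{name:>20}: estimate={estimate:.4f}  exact={exact:.4f}")
```

Note the parameterization comment on the exponential case: a rate-versus-scale mismatch is a common reason a simulated MGF fails to match a reference formula.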
Characteristic functions and convergence. A sequence of random variables converges in distribution if their characteristic functions converge pointwise to a function that is continuous at zero; by Lévy's continuity theorem, that limit is then itself a characteristic function, namely that of the limiting distribution. This is one route to proving the central limit theorem.
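The convergence statement can be seen numerically in a simple case. The sketch below assumes i.i.d. variables taking the values $\pm 1$ with probability $1/2$ each (mean 0, variance 1), whose characteristic function is $\cos(u)$; the standardized sum of $n$ of them then has characteristic function $\cos(u/\sqrt{n})^n$, which should approach the standard normal's $e^{-u^2/2}$. The grid and the values of `n` are illustrative choices.

```python
import numpy as np

u = np.linspace(-3.0, 3.0, 121)
target = np.exp(-u**2 / 2)  # characteristic function of N(0, 1)

errors = {}
for n in (1, 10, 100, 1000):
    # Characteristic function of (X_1 + ... + X_n) / sqrt(n) for X_i = +/-1.
    phi_n = np.cos(u / np.sqrt(n)) ** n
    errors[n] = float(np.max(np.abs(phi_n - target)))
    print(f"n={n:5d}  max |phi_n - target| on grid = {errors[n]:.5f}")
```

The maximum gap on the grid shrinks as `n` grows, which is the pointwise convergence the theorem asks for.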
MGFs are especially convenient for sums because multiplication is simpler than convolution. Instead of deriving the full distribution of the sum by repeatedly summing or integrating over all possible decompositions, one can multiply the MGFs and then recognize the result. This is how the binomial MGF follows immediately from independent Bernoulli variables, and how sums of independent normal variables can be shown to remain normal.
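To make the contrast with convolution concrete, the following sketch uses two independent fair dice (an example chosen here for illustration). It builds the distribution of the sum by explicit convolution, computes its MGF, and compares that with the square of the single-die MGF. The helper `mgf` is a hypothetical name introduced for this example.

```python
import numpy as np

values = np.arange(1, 7)      # faces of one fair die
pmf = np.full(6, 1 / 6)


def mgf(support, probs, t):
    """MGF of a finite discrete distribution evaluated at t."""
    return float(np.sum(probs * np.exp(t * support)))


# Distribution of the sum of two independent dice via convolution.
pmf_sum = np.convolve(pmf, pmf)
support_sum = np.arange(2, 13)

t = 0.3
via_convolution = mgf(support_sum, pmf_sum, t)
via_product = mgf(values, pmf, t) ** 2
print(via_convolution, via_product)
```

The two numbers should agree up to floating-point rounding; the product route never needs the 11-point distribution of the sum.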
Characteristic functions play the same role but with fewer existence problems. A Cauchy random variable, for example, has no finite mean and no MGF around zero, but it still has a characteristic function. This makes characteristic functions the standard tool in more advanced probability.
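The Cauchy case can be illustrated with a short simulation. The sketch below uses the standard Cauchy distribution, whose characteristic function is the known closed form $e^{-|u|}$, and compares it with the empirical average of $e^{iuX}$. The seed, sample size, and evaluation points are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(size=200_000)

u = np.array([0.5, 1.0, 2.0])
# |exp(i u x)| = 1, so this sample average is well behaved even though
# the Cauchy distribution has no finite mean.
phi_hat = np.array([np.mean(np.exp(1j * ui * x)) for ui in u])
phi_theory = np.exp(-np.abs(u))

for ui, est, exact in zip(u, phi_hat, phi_theory):
    print(f"u={ui}: empirical={est:.4f}  theory={exact:.4f}")
```

The empirical values settle near the theoretical ones because the integrand is bounded, whereas a sample estimate of $E[e^{tX}]$ for any $t \neq 0$ would not stabilize.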
There is also a related object called the cumulant generating function:
$$K_X(t) = \log M_X(t).$$
When it exists, derivatives of $K_X$ at zero give cumulants. The first cumulant is the mean, the second is the variance, and higher cumulants encode skewness and tail behavior. Cumulants are useful because independent sums add cumulant generating functions:
$$K_{X+Y}(t) = K_X(t) + K_Y(t).$$
This additivity is one reason generating functions appear naturally in asymptotic approximations.
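A small symbolic sketch of cumulants, using the exponential distribution with rate `lam` as an illustrative case. It computes the first three cumulants from $\log M$, the skewness as $\kappa_3 / \kappa_2^{3/2}$, and checks additivity for the sum of two independent copies. Variable names are choices made for this example.

```python
import sympy as sp

t = sp.symbols("t", real=True)
lam = sp.symbols("lam", positive=True)

M = lam / (lam - t)   # Exponential(rate=lam) MGF, valid for t < lam
K = sp.log(M)         # cumulant generating function

# First three cumulants: derivatives of K at t = 0.
kappa = [sp.simplify(sp.diff(K, t, k).subs(t, 0)) for k in (1, 2, 3)]
skewness = sp.simplify(kappa[2] / kappa[1] ** sp.Rational(3, 2))

# Sum of two independent copies: the MGF squares, so K doubles.
K_sum = sp.log(M * M)
kappa_sum = [sp.simplify(sp.diff(K_sum, t, k).subs(t, 0)) for k in (1, 2, 3)]

print("cumulants:", kappa)
print("skewness:", skewness)
print("cumulants of the sum:", kappa_sum)
```

Each cumulant of the sum is twice the corresponding single-variable cumulant, which is the additivity property in its simplest form.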
PGFs are particularly useful for count variables because probabilities can be recovered from coefficients. If
$$G_N(s) = \sum_{k=0}^{\infty} p_k s^k, \qquad p_k = P(N = k),$$
then $p_k$ is the coefficient of $s^k$. Derivatives at $s = 1$ give factorial moments:
$$G_N'(1) = E[N]$$
and
$$G_N''(1) = E[N(N-1)].$$
These are convenient for branching processes, occupancy counts, and sums of independent nonnegative integer-valued random variables. As with MGFs, independence turns sums into products:
$$G_{N_1 + N_2}(s) = G_{N_1}(s)\, G_{N_2}(s).$$
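The coefficient and factorial-moment properties can be checked symbolically. The sketch below assumes a Binomial(3, $p$) count, whose PGF is $(1 - p + ps)^3$, as an illustrative example; it reads the probabilities off the expanded polynomial and rebuilds the variance from $G'(1)$ and $G''(1)$.

```python
import sympy as sp

s = sp.symbols("s")
p = sp.symbols("p", positive=True)

G = (1 - p + p * s) ** 3   # PGF of Binomial(3, p)

# The coefficient of s**k in the expanded PGF is P(N = k).
expanded = sp.expand(G)
pmf = [sp.factor(expanded.coeff(s, k)) for k in range(4)]

# Factorial moments from derivatives at s = 1.
mean = sp.diff(G, s).subs(s, 1)            # E[N]
fact2 = sp.diff(G, s, 2).subs(s, 1)        # E[N(N - 1)]
variance = sp.factor(sp.expand(fact2 + mean - mean**2))

print("pmf:", pmf)
print("mean:", mean, "variance:", variance)
```

The variance line uses the identity $\operatorname{Var}(N) = E[N(N-1)] + E[N] - E[N]^2$, which is why factorial moments are enough to recover it.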
Generating functions also provide quick distribution checks. If a derived MGF matches the known MGF of a named family on an interval around zero, then the derived random variable has that distribution. For example, the product of two normal MGFs is another normal MGF, with means and variances added. This avoids doing a convolution integral.
Still, a generating function is not a substitute for assumptions. The multiplication rule encodes independence, and recognition of a named MGF depends on using the same parameterization as the reference formula.
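Both points can be illustrated with normal MGFs. The sketch below first confirms symbolically that the product of two normal MGFs is the normal MGF with means and variances added, then shows what goes wrong without independence by taking $Y = X$ with $X \sim N(0, 1)$, so that $X + Y = 2X \sim N(0, 4)$. The helper `normal_mgf` and the dependent example are assumptions introduced here for illustration.

```python
import sympy as sp

t = sp.symbols("t", real=True)
mu1, mu2 = sp.symbols("mu1 mu2", real=True)
sigma1, sigma2 = sp.symbols("sigma1 sigma2", positive=True)


def normal_mgf(mu, var):
    """MGF of a Normal(mu, var) variable, parameterized by the variance."""
    return sp.exp(mu * t + var * t**2 / 2)


# Independent case: the product matches Normal(mu1 + mu2, sigma1^2 + sigma2^2).
product = normal_mgf(mu1, sigma1**2) * normal_mgf(mu2, sigma2**2)
target = normal_mgf(mu1 + mu2, sigma1**2 + sigma2**2)
matches = sp.simplify(product / target) == 1

# Dependent case: Y = X, so X + Y = 2X is Normal(0, 4).
true_mgf = normal_mgf(0, 4)        # exp(2 t^2)
naive = normal_mgf(0, 1) ** 2      # exp(t^2), wrongly assumes independence
differs = sp.simplify(true_mgf - naive) != 0

print("independent product recognized as normal:", matches)
print("true MGF of 2X:", true_mgf, "| naive product:", naive)
```

The naive product corresponds to variance 2 rather than the correct 4, so the multiplication rule silently gives the wrong distribution when independence fails.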
In practice, generating functions are often used backward. One computes a transform, simplifies it algebraically, and then identifies it as the transform of a known distribution. The final step should always include the valid range of the transform variable, since that range is part of the MGF statement.
That range also helps detect algebra mistakes: an exponential MGF cannot be valid for all real $t$.
Visual
| Function | Best for | Main limitation |
|---|---|---|
| MGF | moments and named distributions | may not exist away from $t = 0$ |
| Characteristic function | theory and convergence | complex-valued |
| PGF | nonnegative integer counts | limited to count variables |
Worked example 1: Bernoulli and binomial MGFs
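Problem. Find the MGF of a Bernoulli($p$) random variable, then use it to get the MGF, mean, and variance of a Binomial($n, p$) random variable.
Method.
- Let $X \sim \text{Bernoulli}(p)$. Then $X = 1$ with probability $p$ and $X = 0$ with probability $1 - p$.
- Compute the MGF:
  $$M_X(t) = E\left[e^{tX}\right] = (1 - p)e^{0} + pe^{t} = 1 - p + pe^{t}.$$
- Let $S_n = X_1 + \cdots + X_n$, where the $X_i$ are independent Bernoulli($p$). Then $S_n \sim \text{Binomial}(n, p)$.
- Use the multiplication rule:
  $$M_{S_n}(t) = \prod_{i=1}^{n} M_{X_i}(t) = \left(1 - p + pe^{t}\right)^{n}.$$
- Find the mean:
  $$M_{S_n}'(t) = n\left(1 - p + pe^{t}\right)^{n-1} pe^{t}.$$
  At $t = 0$,
  $$E[S_n] = M_{S_n}'(0) = np.$$
- The variance is easier from the independent sum:
  $$\operatorname{Var}(S_n) = \sum_{i=1}^{n} \operatorname{Var}(X_i) = np(1 - p).$$
Checked answer. The binomial MGF is $\left(1 - p + pe^{t}\right)^{n}$, with mean $np$ and variance $np(1 - p)$.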
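The variance can also be recovered from the MGF alone, as a cross-check on the independent-sum shortcut. This sketch differentiates the binomial MGF twice with SymPy and forms $M''(0) - M'(0)^2$; the symbol assumptions (`n` a positive integer, `p` positive) are choices made for the example.

```python
import sympy as sp

t = sp.symbols("t", real=True)
p = sp.symbols("p", positive=True)
n = sp.symbols("n", positive=True, integer=True)

M = (1 - p + p * sp.exp(t)) ** n       # binomial MGF

m1 = sp.diff(M, t).subs(t, 0)          # E[S_n]
m2 = sp.diff(M, t, 2).subs(t, 0)       # E[S_n^2]
variance = sp.factor(sp.expand(m2 - m1**2))

print("mean:", sp.simplify(m1))
print("variance:", variance)
```

The result should be equivalent to $np(1 - p)$, possibly printed with the sign arranged differently.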
Worked example 2: MGF of an exponential distribution
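Problem. Let $X \sim \text{Exponential}(\lambda)$ with density $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$. Find the MGF and use it to compute $E[X]$ and $\operatorname{Var}(X)$.
Method.
- Start from the definition:
  $$M_X(t) = E\left[e^{tX}\right] = \int_0^{\infty} e^{tx}\, \lambda e^{-\lambda x}\, dx.$$
- Combine exponents:
  $$M_X(t) = \lambda \int_0^{\infty} e^{-(\lambda - t)x}\, dx.$$
- The integral converges only if $\lambda - t > 0$, so $t < \lambda$.
- Evaluate:
  $$M_X(t) = \frac{\lambda}{\lambda - t}.$$
- Differentiate:
  $$M_X'(t) = \frac{\lambda}{(\lambda - t)^2}.$$
  Thus
  $$E[X] = M_X'(0) = \frac{1}{\lambda}.$$
- Differentiate again:
  $$M_X''(t) = \frac{2\lambda}{(\lambda - t)^3}.$$
  Then
  $$E\left[X^2\right] = M_X''(0) = \frac{2}{\lambda^2}.$$
- Variance:
  $$\operatorname{Var}(X) = E\left[X^2\right] - \left(E[X]\right)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.$$
Checked answer. $M_X(t) = \lambda / (\lambda - t)$ for $t < \lambda$, $E[X] = 1/\lambda$, and $\operatorname{Var}(X) = 1/\lambda^2$.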
Code
```python
import numpy as np
import sympy as sp

# t may be any real number; p, n, and lam are positive parameters.
t = sp.symbols("t", real=True)
p, n, lam = sp.symbols("p n lam", positive=True)

# Bernoulli MGF and binomial MGF.
M_bern = 1 - p + p * sp.exp(t)
M_binom = M_bern ** n
print("Bernoulli MGF:", M_bern)
print("Binomial MGF:", M_binom)

# Exponential MGF (valid for t < lam) and moments.
M_exp = lam / (lam - t)
mean_exp = sp.diff(M_exp, t).subs(t, 0)
second_exp = sp.diff(M_exp, t, 2).subs(t, 0)
var_exp = sp.simplify(second_exp - mean_exp**2)
print("Exponential mean:", mean_exp)
print("Exponential variance:", var_exp)

# Numeric characteristic function estimate for a normal sample.
rng = np.random.default_rng(2)
mu, sigma = 1.0, 2.0
x = rng.normal(loc=mu, scale=sigma, size=100_000)
u = 0.3
phi_hat = np.mean(np.exp(1j * u * x))
phi_theory = np.exp(1j * mu * u - sigma**2 * u**2 / 2)
print("Empirical:", phi_hat, "Theory:", phi_theory)
```
Common pitfalls
- Assuming an MGF exists for every distribution. Characteristic functions always exist; MGFs may fail.
- Forgetting that the MGF must exist on an open interval around $t = 0$ to guarantee uniqueness by the usual theorem.
- Multiplying MGFs for sums without independence.
- Confusing raw moments $E\left[X^k\right]$ with central moments $E\left[(X - \mu)^k\right]$.
- Dropping the domain restriction, such as $t < \lambda$ for an exponential MGF.
- Treating complex characteristic functions as optional decoration. They are often the right tool for rigorous convergence results.