Functions of Random Variables
Many useful random variables are built from other random variables. If $X$ is a measurement, then quantities such as $aX + b$, $X^2$, $|X|$, $e^X$, and $\log X$ are all transformations. Probability theory gives systematic ways to derive the distribution of the transformed variable rather than guessing from simulation.
Figure: A Galton box turns repeated random left-right choices into an approximate bell-shaped distribution. Image: Wikimedia Commons, Marcin Floryan, CC BY-SA 3.0.
Transformations are also the engine behind standardization, change of variables in continuous distributions, sums of independent variables, and many sampling distributions. The main tools are the CDF method, one-to-one density transformations, Jacobians, and convolution.
Definitions
If $Y = g(X)$, then $Y$ is a function of a random variable. Its distribution is determined by

$$F_Y(y) = P(Y \le y) = P(g(X) \le y).$$

This is called the CDF method. It is often the safest method because it works even when $g$ is not one-to-one.
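If $X$ is continuous with density $f_X$ and $Y = g(X)$, where $g$ is differentiable and one-to-one with inverse $h = g^{-1}$, then for $y$ in the range of $g$

$$f_Y(y) = f_X(h(y)) \, |h'(y)|.$$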
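As a quick numerical check of the one-to-one density formula, the sketch below takes $X$ standard normal and $Y = e^X$, so the inverse is $h(y) = \log y$ with $|h'(y)| = 1/y$. The choice of example and the helper names `normal_pdf` and `lognormal_pdf_via_inverse` are illustrative assumptions, not part of the text above.

```python
import numpy as np

def normal_pdf(x):
    """Standard normal density."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def lognormal_pdf_via_inverse(y):
    """Density of Y = exp(X): f_X(h(y)) * |h'(y)| with h(y) = log(y)."""
    return normal_pdf(np.log(y)) / y

rng = np.random.default_rng(0)
y_samples = np.exp(rng.standard_normal(200_000))

# Integrate the derived density over (0, 2] with the trapezoid rule.
grid = np.linspace(1e-9, 2.0, 200_001)
vals = lognormal_pdf_via_inverse(grid)
prob_formula = float(np.sum(vals[:-1] + vals[1:]) * (grid[1] - grid[0]) / 2.0)

# Compare with the empirical frequency of the same event.
prob_sim = float(np.mean(y_samples <= 2.0))

print("P(Y <= 2) from derived density:", round(prob_formula, 4))
print("P(Y <= 2) from simulation:     ", round(prob_sim, 4))
```

The two numbers should agree to about two decimal places. Dropping the factor `1 / y` would give an integral that no longer matches the simulation.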
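For a vector transformation $\mathbf{Y} = g(\mathbf{X})$ with inverse $\mathbf{X} = h(\mathbf{Y})$, the joint density is

$$f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(h(\mathbf{y})) \, |J(\mathbf{y})|,$$

where $J(\mathbf{y}) = \det\left(\partial h / \partial \mathbf{y}\right)$ is the determinant of the Jacobian matrix of the inverse transformation.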
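The polar-coordinate change of variables gives a compact illustration of the Jacobian formula. In the sketch below, two independent standard normals are mapped to (radius, angle). The inverse map is $(r, \theta) \mapsto (r\cos\theta, r\sin\theta)$, whose Jacobian determinant is $r$, so the radius has density $r e^{-r^2/2}$ and $P(R \le 1) = 1 - e^{-1/2}$. The example and the function names `inverse_map` and `jacobian_det` are assumptions made for illustration.

```python
import numpy as np

def inverse_map(r, theta):
    """Inverse transformation h: (r, theta) -> (x, y)."""
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def jacobian_det(r, theta, eps=1e-6):
    """Determinant of the Jacobian of inverse_map, via central differences."""
    d_r = (inverse_map(r + eps, theta) - inverse_map(r - eps, theta)) / (2 * eps)
    d_theta = (inverse_map(r, theta + eps) - inverse_map(r, theta - eps)) / (2 * eps)
    return float(np.linalg.det(np.column_stack([d_r, d_theta])))

# The determinant should be close to r at any point.
det_at_point = jacobian_det(1.7, 0.4)
print("Jacobian determinant at r = 1.7:", round(det_at_point, 6))

# Simulation check of the radius distribution implied by the formula.
rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
y = rng.standard_normal(200_000)
radius = np.hypot(x, y)
prob_sim = float(np.mean(radius <= 1.0))
prob_theory = float(1.0 - np.exp(-0.5))
print("P(R <= 1) simulation:", round(prob_sim, 4))
print("P(R <= 1) theory:    ", round(prob_theory, 4))
```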
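If $X$ and $Y$ are independent continuous random variables, the density of their sum $Z = X + Y$ is the convolution

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x) \, f_Y(z - x) \, dx.$$

For discrete variables,

$$P(Z = z) = \sum_{x} P(X = x) \, P(Y = z - x).$$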
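The discrete convolution sum is exactly what `np.convolve` computes on two probability vectors. A minimal sketch, assuming two fair six-sided dice as the example:

```python
import numpy as np

# PMF of one fair die on the values 1..6.
die = np.full(6, 1.0 / 6.0)

# Convolving the two PMFs gives the PMF of the total.
# Index k of the result corresponds to the total k + 2.
pmf_total = np.convolve(die, die)
totals = np.arange(2, 13)

for total, prob in zip(totals, pmf_total):
    print(f"P(total = {total:2d}) = {prob:.4f}")
```

The most likely total is 7, with probability 6/36, because six of the 36 ordered pairs sum to 7.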
Key results
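Linear transformations. If $Y = aX + b$ and $a > 0$, then

$$F_Y(y) = F_X\!\left(\frac{y - b}{a}\right).$$

If $a < 0$, dividing by $a$ reverses the inequality, so $F_Y(y) = P\!\left(X \ge \frac{y - b}{a}\right)$.

For densities, with any $a \ne 0$,

$$f_Y(y) = \frac{1}{|a|} \, f_X\!\left(\frac{y - b}{a}\right).$$

Means and variances transform as

$$E[aX + b] = a\,E[X] + b, \qquad \operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X).$$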
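The linear rules are easy to check by simulation. The sketch below assumes $X \sim$ Exponential with rate 1 (mean 1, variance 1) and a negative slope, $Y = -3X + 2$, so that the reversed inequality in the CDF is visible. These specific values are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = -3.0, 2.0

x = rng.exponential(scale=1.0, size=200_000)  # E[X] = 1, Var(X) = 1
y = a * x + b

# Mean and variance rules.
mean_y_theory = a * 1.0 + b        # a E[X] + b = -1
var_y_theory = a**2 * 1.0          # a^2 Var(X) = 9
mean_y_sim = float(y.mean())
var_y_sim = float(y.var())

# Because a < 0, P(Y <= t) = P(X >= (t - b) / a) = exp(-(t - b) / a).
t = -1.0
cdf_sim = float(np.mean(y <= t))
cdf_theory = float(np.exp(-(t - b) / a))

print("mean:", round(mean_y_sim, 3), "theory:", mean_y_theory)
print("variance:", round(var_y_sim, 3), "theory:", var_y_theory)
print("P(Y <= -1):", round(cdf_sim, 4), "theory:", round(cdf_theory, 4))
```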
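Monotone transformations. If $g$ is strictly increasing, then

$$F_Y(y) = P(g(X) \le y) = P(X \le g^{-1}(y)) = F_X(g^{-1}(y)).$$

If $g$ is strictly decreasing, inequalities reverse: $F_Y(y) = P(X \ge g^{-1}(y))$, which equals $1 - F_X(g^{-1}(y))$ when $X$ is continuous.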
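Order statistics. If $X_1, \dots, X_n$ are independent with common CDF $F$, then the maximum $X_{(n)} = \max(X_1, \dots, X_n)$ has CDF

$$P(X_{(n)} \le x) = P(X_1 \le x, \dots, X_n \le x) = F(x)^n.$$

The minimum $X_{(1)} = \min(X_1, \dots, X_n)$ satisfies

$$P(X_{(1)} > x) = (1 - F(x))^n, \qquad \text{so} \qquad P(X_{(1)} \le x) = 1 - (1 - F(x))^n.$$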
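A short simulation can confirm both order-statistic formulas. The sketch assumes $n = 5$ independent Uniform$(0, 1)$ variables, for which $F(x) = x$. The sample size and thresholds are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 100_000

# Each row is one sample of n independent Uniform(0, 1) values.
samples = rng.uniform(0.0, 1.0, size=(reps, n))
maxima = samples.max(axis=1)
minima = samples.min(axis=1)

# Maximum: P(max <= 0.8) = F(0.8)^n = 0.8^5.
p_max_sim = float(np.mean(maxima <= 0.8))
p_max_theory = 0.8**n

# Minimum: P(min > 0.2) = (1 - F(0.2))^n = 0.8^5.
p_min_sim = float(np.mean(minima > 0.2))
p_min_theory = (1.0 - 0.2) ** n

print("P(max <= 0.8):", round(p_max_sim, 4), "theory:", round(p_max_theory, 4))
print("P(min >  0.2):", round(p_min_sim, 4), "theory:", round(p_min_theory, 4))
```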
A practical transformation workflow is:
- Find the support of the new variable before doing algebra.
- Decide whether the transformation is one-to-one on that support.
- If it is one-to-one, use the inverse and derivative formula.
- If it is not one-to-one, split the support into one-to-one branches or use the CDF method.
- Check that the resulting density integrates to $1$.
For example, $Y = X^2$ is not one-to-one on $(-\infty, \infty)$, but it is one-to-one on $(-\infty, 0)$ and on $(0, \infty)$. The transformed density receives contributions from both branches. The CDF method automatically accounts for both branches, which is why it is often safer.
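To see what goes wrong when a branch is missed, the sketch below takes $X$ standard normal and $Y = X^2$ (an assumed example). Each branch $x = \pm\sqrt{y}$ contributes $f_X(\pm\sqrt{y}) / (2\sqrt{y})$ to the density of $Y$. The helper names are hypothetical.

```python
import math
import numpy as np

def normal_pdf(x):
    """Standard normal density."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def density_one_branch(y):
    """Incorrect: keeps only the branch x = +sqrt(y)."""
    root = np.sqrt(y)
    return normal_pdf(root) / (2.0 * root)

def density_both_branches(y):
    """Correct: sums the contributions of x = +sqrt(y) and x = -sqrt(y)."""
    root = np.sqrt(y)
    return (normal_pdf(root) + normal_pdf(-root)) / (2.0 * root)

def trapezoid(f, lo, hi, n=20_001):
    """Trapezoid-rule integral of f over [lo, hi]."""
    grid = np.linspace(lo, hi, n)
    vals = f(grid)
    return float(np.sum(vals[:-1] + vals[1:]) * (grid[1] - grid[0]) / 2.0)

lo, hi = 0.5, 2.0
p_one = trapezoid(density_one_branch, lo, hi)
p_both = trapezoid(density_both_branches, lo, hi)

# Exact value: P(0.5 <= X^2 <= 2) = erf(1) - erf(0.5).
p_exact = math.erf(1.0) - math.erf(0.5)

rng = np.random.default_rng(0)
y_samples = rng.standard_normal(200_000) ** 2
p_sim = float(np.mean((y_samples >= lo) & (y_samples <= hi)))

print("one branch only:", round(p_one, 4))
print("both branches:  ", round(p_both, 4))
print("exact:          ", round(p_exact, 4))
print("simulation:     ", round(p_sim, 4))
```

The one-branch value is exactly half of the correct probability here, because the standard normal density is symmetric about zero.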
In multivariable transformations, the Jacobian factor measures local area distortion. A transformation may stretch a small rectangle in $\mathbf{x}$-space into a larger or smaller region in $\mathbf{y}$-space. The absolute determinant corrects the density so that probability mass is preserved. The sign of the determinant is irrelevant for probability, which is why the absolute value is used.
For sums, convolution is a distribution-level version of adding all possible ways to reach the same total. In the continuous case, the integral sweeps over every possible value $x$ of the first variable and pairs it with $z - x$ for the second variable, where $z$ is the total.
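The continuous convolution integral can be approximated on a grid by a Riemann sum. The sketch below assumes two independent Exponential variables with rate 1, whose sum is known to have density $z e^{-z}$ for $z \ge 0$. The grid spacing is an arbitrary choice.

```python
import numpy as np

# Grid on [0, 10]; both densities vanish for negative arguments.
grid = np.linspace(0.0, 10.0, 2001)
dx = grid[1] - grid[0]
f = np.exp(-grid)  # Exponential(1) density on the grid

# Riemann-sum approximation of the convolution integral at each grid point.
# Entry k sums f(x_i) * f(z_k - x_i) * dx over all grid points x_i <= z_k.
conv = np.convolve(f, f)[: grid.size] * dx

# Known density of the sum of two independent Exponential(1) variables.
exact = grid * np.exp(-grid)
max_err = float(np.max(np.abs(conv - exact)))

print("largest absolute error on the grid:", round(max_err, 5))
```

The error is of the order of the grid spacing and shrinks as the grid is refined.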
Transformations are also how simulation turns simple random numbers into useful samples. Many pseudorandom generators first produce values that are approximately Uniform$(0, 1)$. If $F$ is a continuous CDF and $U \sim \text{Uniform}(0, 1)$, then

$$X = F^{-1}(U)$$

has CDF $F$. This is the inverse-transform method. It works especially well when the inverse CDF is available in closed form or can be computed numerically. For example, if $U \sim \text{Uniform}(0, 1)$ and $\lambda > 0$, then $X = -\ln(1 - U)/\lambda$ is Exponential$(\lambda)$.
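A minimal sketch of inverse-transform sampling for an exponential distribution, assuming rate $\lambda = 2$ (an arbitrary choice) so that $F(x) = 1 - e^{-\lambda x}$ and $F^{-1}(u) = -\ln(1 - u)/\lambda$:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0

# rng.random() returns values in [0, 1), so 1 - u lies in (0, 1]
# and the logarithm is always finite.
u = rng.random(200_000)
x = -np.log(1.0 - u) / lam

mean_sim = float(x.mean())
mean_theory = 1.0 / lam
cdf_sim = float(np.mean(x <= 1.0))
cdf_theory = float(1.0 - np.exp(-lam * 1.0))

print("mean:", round(mean_sim, 4), "theory:", mean_theory)
print("P(X <= 1):", round(cdf_sim, 4), "theory:", round(cdf_theory, 4))
```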
Another common transformation is standardization. If $X$ has mean $\mu$ and standard deviation $\sigma > 0$, then

$$Z = \frac{X - \mu}{\sigma}$$

has mean $0$ and variance $1$. Standardization does not usually make a variable normal; it only changes location and scale. It becomes a standard normal variable only when the original was normal.
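The point that standardization changes only location and scale can be seen numerically. The sketch below standardizes Exponential samples with rate 1 (an assumed example with $\mu = 1$ and $\sigma = 1$) and shows that the shape stays skewed and bounded below.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200_000)

# Standardize with the true mean and standard deviation of Exponential(1).
mu, sigma = 1.0, 1.0
z = (x - mu) / sigma

z_mean = float(z.mean())
z_var = float(z.var())
z_skew = float(np.mean((z - z_mean) ** 3) / z_var**1.5)

# A standard normal would put about 15.9% of its mass below -1.
# The standardized exponential puts none there, since X >= 0 implies Z >= -1.
frac_below = float(np.mean(z < -1.0))

print("mean:", round(z_mean, 3), "variance:", round(z_var, 3))
print("skewness:", round(z_skew, 3), "(a normal variable has skewness 0)")
print("fraction below -1:", frac_below)
```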
Absolute values, squares, maxima, and ratios deserve special care because they can merge many original outcomes into the same transformed value. Whenever a transformation collapses information, expect multiple inverse branches or support boundaries. A quick sketch of the function often prevents algebraic mistakes.
For ratios, also check where the denominator can be zero or close to zero, since this often creates heavy tails and may destroy moments.
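The ratio of two independent standard normal variables is a standard example of this effect: it follows a standard Cauchy distribution, which has no mean. The sketch below, using that assumed example, compares the simulated tail probability $P(|R| > 10)$ with the Cauchy value $1 - (2/\pi)\arctan(10)$ and with the corresponding normal tail.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
numerator = rng.standard_normal(200_000)
denominator = rng.standard_normal(200_000)
ratio = numerator / denominator

# Tail probability from simulation and from the standard Cauchy CDF.
tail_sim = float(np.mean(np.abs(ratio) > 10.0))
tail_cauchy = 1.0 - (2.0 / math.pi) * math.atan(10.0)

# The same tail for a standard normal variable, for contrast.
tail_normal = math.erfc(10.0 / math.sqrt(2.0))

print("P(|R| > 10) simulation:", round(tail_sim, 4))
print("P(|R| > 10) Cauchy:    ", round(tail_cauchy, 4))
print("P(|Z| > 10) normal:    ", tail_normal)
```

Roughly 6% of the ratios exceed 10 in absolute value, while the normal tail beyond 10 is negligible. Denominators close to zero are what produce these extreme values.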
Visual
| Task | Tool | Warning |
|---|---|---|
| $Y = aX + b$ | linear density formula | divide by $\lvert a \rvert$ |
| monotone $Y = g(X)$ | inverse transformation | support changes |
| $Y = X^2$ | CDF method or split branches | two preimages for $y > 0$ |
| $Z = X + Y$, independent | convolution | independence required |
| $\mathbf{Y} = g(\mathbf{X})$ | Jacobian | use inverse Jacobian |
Worked example 1: squaring a uniform variable
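Problem. Let $X \sim \text{Uniform}(-1, 1)$ and let $Y = X^2$. Find the CDF and PDF of $Y$.
Method.
- The support of $Y$ is $[0, 1]$.
- For $0 \le y \le 1$, start from the definition: $F_Y(y) = P(Y \le y) = P(X^2 \le y)$.
- Rewrite the event: $P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y})$.
- Since $X$ is uniform on an interval of length $2$, $F_Y(y) = \dfrac{2\sqrt{y}}{2} = \sqrt{y}$.
- Therefore $F_Y(y) = 0$ for $y < 0$, $F_Y(y) = \sqrt{y}$ for $0 \le y \le 1$, and $F_Y(y) = 1$ for $y > 1$.
- Differentiate on $(0, 1)$: $f_Y(y) = \dfrac{1}{2\sqrt{y}}$.
Checked answer. $Y$ has density $f_Y(y) = \dfrac{1}{2\sqrt{y}}$ on $(0, 1)$, which integrates to $1$. The density is unbounded near zero because squaring compresses a whole neighborhood of $x = 0$ into a much shorter interval of $y$-values.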
Worked example 2: sum of two independent uniforms
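Problem. Let $X, Y$ be independent Uniform$(0, 1)$ random variables. Find the density of $S = X + Y$.
Method.
- Use convolution: $f_S(s) = \int_{-\infty}^{\infty} f_X(x) \, f_Y(s - x) \, dx$.
- Since both densities equal $1$ on $[0, 1]$, the integrand is $1$ when $0 \le x \le 1$ and $0 \le s - x \le 1$, and $0$ otherwise.
- The second inequality means $s - 1 \le x \le s$.
- Therefore $x$ must lie in the overlap $[\max(0, s - 1), \min(1, s)]$.
- For $0 \le s \le 1$, the overlap is $[0, s]$, length $s$, so $f_S(s) = s$.
- For $1 < s \le 2$, the overlap is $[s - 1, 1]$, length $2 - s$, so $f_S(s) = 2 - s$.
- Outside $[0, 2]$, there is no overlap, so $f_S(s) = 0$.
Checked answer.

$$f_S(s) = \begin{cases} s, & 0 \le s \le 1, \\ 2 - s, & 1 < s \le 2, \\ 0, & \text{otherwise.} \end{cases}$$

The density is triangular and integrates to $1$.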
Code
```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = 200_000

# Example 1: Y = X^2 for X uniform(-1, 1).
x = rng.uniform(-1, 1, size=n)
y = x**2
print("P(Y <= 0.25) simulation:", np.mean(y <= 0.25))
print("P(Y <= 0.25) theory:", np.sqrt(0.25))

# Example 2: sum of two uniforms.
u = rng.uniform(0, 1, size=n)
v = rng.uniform(0, 1, size=n)
s = u + v
print("mean of sum:", s.mean())
print("P(0.5 <= S <= 1.5):", np.mean((s >= 0.5) & (s <= 1.5)))

# Optional plot when running locally.
plt.hist(s, bins=80, density=True, alpha=0.5)
grid = np.linspace(0, 2, 200)
density = np.where(grid < 1, grid, 2 - grid)
plt.plot(grid, density, color="black")
plt.show()
```
Common pitfalls
- Forgetting the derivative factor in one-to-one transformations.
- Using the inverse transformation formula when the function is not one-to-one without splitting branches.
- Losing support restrictions after a transformation.
- Treating convolution as valid without independence.
- Confusing the Jacobian of the forward transformation with the Jacobian of the inverse transformation.
- Forgetting endpoint behavior for decreasing transformations and CDF calculations.