Joint Distributions, Transformations, and Independence
Many probability questions involve several random variables defined on the same sample space. A joint distribution records how they behave together, not merely how each behaves separately. This distinction matters: the marginal laws of $X$ and $Y$ do not determine whether they are independent, positively related, negatively related, or constrained by an equation such as $Y = 1 - X$.
Figure: Probability trees make the conditioning structure in Bayes' theorem explicit. Image: Wikimedia Commons, Gnathan87, CC0 1.0.
MIT 18.440 introduces joint mass functions, joint densities, marginal distributions, independent random variables, and distributions of functions of random variables. These ideas are the technical foundation for convolutions, conditional densities, covariance, order statistics, and the transform methods used later in the course.
Definitions
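For discrete random variables $X$ and $Y$, the joint probability mass function is
$$p_{X,Y}(x,y) = P(X = x,\ Y = y).$$
The marginal mass functions are obtained by summing:
$$p_X(x) = \sum_y p_{X,Y}(x,y), \qquad p_Y(y) = \sum_x p_{X,Y}(x,y).$$
For continuous random variables, a joint density $f_{X,Y}$ satisfies, for every suitable region $A$,
$$P\big((X,Y) \in A\big) = \iint_A f_{X,Y}(x,y)\,dx\,dy.$$
The marginal densities are
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx.$$
Random variables $X$ and $Y$ are independent if for all suitable sets $A$ and $B$,
$$P(X \in A,\ Y \in B) = P(X \in A)\,P(Y \in B).$$
In the discrete case, this is equivalent to
$$p_{X,Y}(x,y) = p_X(x)\,p_Y(y)$$
for all $x, y$. In the continuous case, it is equivalent to
$$f_{X,Y}(x,y) = f_X(x)\,f_Y(y)$$
where densities are defined.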
Key results
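If $Y = g(X)$ and $g$ is strictly increasing, then
$$F_Y(y) = P(g(X) \le y) = P\big(X \le g^{-1}(y)\big) = F_X\big(g^{-1}(y)\big).$$
If $g$ is differentiable with differentiable inverse, the density form is
$$f_Y(y) = f_X\big(g^{-1}(y)\big)\,\left|\frac{d}{dy}\,g^{-1}(y)\right|.$$
For a non-one-to-one transformation, sum the contributions from each preimage:
$$f_Y(y) = \sum_{x:\ g(x) = y} \frac{f_X(x)}{|g'(x)|}.$$
For a two-dimensional differentiable one-to-one transformation $(U,V) = T(X,Y)$ with inverse $(x,y) = \big(x(u,v),\, y(u,v)\big)$,
$$f_{U,V}(u,v) = f_{X,Y}\big(x(u,v),\, y(u,v)\big)\,\left|\det \frac{\partial(x,y)}{\partial(u,v)}\right|.$$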
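As a rough numerical illustration of the two-dimensional formula, the sketch below uses an assumed example that does not appear in the text: $X$ and $Y$ independent and uniform on $(0,1)$, mapped linearly to $(U,V) = (X+Y,\ X-Y)$. The names `A`, `A_inv`, and `jac` are illustrative. The joint density of $(X,Y)$ is 1 on the unit square, so the formula predicts a density equal to the absolute Jacobian determinant of the inverse map, which a Monte Carlo estimate on a small box should roughly reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example: X, Y independent Uniform(0, 1), so f_{X,Y} = 1 on the unit square.
# The map (U, V) = (X + Y, X - Y) is linear, written here as a matrix acting on (x, y).
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])
A_inv = np.linalg.inv(A)          # inverse map: (x, y) = A_inv @ (u, v)

# Absolute Jacobian determinant of the inverse map (constant for a linear map).
jac = abs(np.linalg.det(A_inv))   # equals 1/2 for this matrix

# Predicted density of (U, V) inside the image of the unit square.
predicted_density = 1.0 * jac

# Monte Carlo check on a small box that lies inside the image region.
n = 1_000_000
x = rng.random(n)
y = rng.random(n)
u, v = x + y, x - y
in_box = (np.abs(u - 1.0) < 0.1) & (np.abs(v) < 0.1)
box_area = 0.2 * 0.2
estimated_density = in_box.mean() / box_area

print("Jacobian factor:", jac)
print("predicted density:", predicted_density)
print("estimated density:", estimated_density)
```

The estimate carries sampling noise, so it should only be expected to agree with the prediction to about two decimal places.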
Even when two random variables have the same marginal distribution, their joint laws can be completely different. For example, if $U$ is uniform on $(0,1)$, then $1 - U$ has the same marginal as $U$ but is not independent of $U$. If $V$ is separately sampled uniform on $(0,1)$, then $U$ and $V$ can be independent.
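A small simulation can make this contrast concrete. The sketch below assumes the uniform distribution is on $(0,1)$ and takes $1 - U$ as the dependent copy; the variable names and the choice of the test rectangle $\{U < 0.5,\ \cdot < 0.5\}$ are illustrative. It compares the probability of that rectangle with the product of the side probabilities.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

u = rng.random(n)    # U ~ Uniform(0, 1)
w = 1.0 - u          # W = 1 - U: same marginal as U, but determined by U
v = rng.random(n)    # V ~ Uniform(0, 1), sampled separately from U

# Side probabilities: each is close to 1/2 because all three marginals agree.
p_u = np.mean(u < 0.5)
p_w = np.mean(w < 0.5)
p_v = np.mean(v < 0.5)

# Rectangle probabilities: independence would force these to equal the products.
p_dep = np.mean((u < 0.5) & (w < 0.5))   # U < 0.5 and 1 - U < 0.5 cannot both hold
p_ind = np.mean((u < 0.5) & (v < 0.5))

print("P(U<.5) P(W<.5) =", p_u * p_w, " but P(U<.5, W<.5) =", p_dep)
print("P(U<.5) P(V<.5) =", p_u * p_v, " and P(U<.5, V<.5) =", p_ind)
```

A single rectangle that fails to factor is enough to rule out independence. A rectangle that does factor is only consistent with independence and does not prove it.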
The marginalization formulas are probability versions of "ignore one coordinate". In a joint table, summing a row forgets which value of $Y$ occurred and keeps only the value of $X$. In a joint density, integrating over $y$ performs the same operation continuously. This operation generally loses information: many different joint laws can have the same marginals.
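To see how two joint laws can share their marginals, the sketch below starts from the marginals of the `joint` array in the Code section and builds two different tables. The shift size `eps` and the names `independent` and `dependent` are illustrative choices.

```python
import numpy as np

px = np.array([0.5, 0.5])      # marginal of X (row sums of the Code section's table)
py = np.array([0.45, 0.55])    # marginal of Y (column sums of the same table)

# Table 1: the independent coupling, the outer product of the marginals.
independent = np.outer(px, py)

# Table 2: move mass eps onto the diagonal and off the anti-diagonal.
# Every row and every column gains +eps and -eps, so the marginals are unchanged.
eps = 0.075
shift = np.array([[eps, -eps],
                  [-eps, eps]])
dependent = independent + shift

print("independent table:\n", independent)
print("dependent table:\n", dependent)
print("row sums:", independent.sum(axis=1), dependent.sum(axis=1))
print("column sums:", independent.sum(axis=0), dependent.sum(axis=0))
```

With `eps = 0.075` the second table coincides, up to floating-point rounding, with the `joint` array used in the Code section, so that table and the independent one are indistinguishable from their marginals alone.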
Independence is a factorization statement. In a finite table, every entry must equal row sum times column sum. Geometrically, the joint distribution has no interaction term; the probability assigned to a rectangle is the product of the two side probabilities. For continuous variables, this means the density surface separates into an $x$-part and a $y$-part.
Transformations require special care because density is tied to scale. If $Y = 2X$, intervals in $y$ correspond to intervals half as long in $x$, so the density height changes by a factor of $1/2$: $f_Y(y) = \tfrac{1}{2} f_X(y/2)$. The derivative factor in the change-of-variables formula is exactly this scale correction. In multiple dimensions, the absolute Jacobian determinant measures how areas or volumes are distorted.
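The scale correction can be checked empirically. The sketch below assumes $X$ is uniform on $(0,1)$, a choice made only for illustration, and compares normalized histogram heights for $X$ and for $Y = 2X$.

```python
import numpy as np

rng = np.random.default_rng(2)

x = rng.random(200_000)   # assumed X ~ Uniform(0, 1): density 1 on (0, 1)
y = 2.0 * x               # Y = 2X, supported on (0, 2)

# With density=True the bar heights integrate to 1, so they estimate the density.
heights_x, _ = np.histogram(x, bins=20, range=(0.0, 1.0), density=True)
heights_y, _ = np.histogram(y, bins=20, range=(0.0, 2.0), density=True)

print("typical density height of X:", heights_x.mean())
print("typical density height of Y:", heights_y.mean())
print("largest deviation of Y heights from 1/2:", np.abs(heights_y - 0.5).max())
```

Stretching the support to twice its length halves the height, which is the factor $1/2$ supplied by the derivative of the inverse map $x = y/2$.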
The transformation method has two complementary approaches. The CDF method asks for $F_Y(y) = P(g(X) \le y)$ and is often best when the event has a simple inequality description. The density method uses inverse branches and derivatives and is often faster when the transformation is monotone or piecewise monotone. Both methods should give the same answer when applied correctly.
Joint distributions are also the setting for conditional densities. If $X$ and $Y$ have joint density $f_{X,Y}$, then the conditional density of $X$ given $Y = y$ is formally
$$f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$$
when $f_Y(y) > 0$. This formula is the continuous analogue of dividing a joint probability table entry by a column total. It prepares for conditional expectation and total variance.
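The discrete analogue mentioned above can be written in a few lines. The sketch below reuses the `joint` array from the Code section, with rows indexing values of $X$ and columns indexing values of $Y$; the name `cond_x_given_y` is illustrative.

```python
import numpy as np

# Joint table from the Code section: rows are values of X, columns are values of Y.
joint = np.array([[0.30, 0.20],
                  [0.15, 0.35]])

py = joint.sum(axis=0)          # column totals = marginal of Y

# Broadcasting divides every entry by the total of its own column, giving
# the conditional pmf of X given Y = y in column y.
cond_x_given_y = joint / py

print("P(X = . | Y = first value): ", cond_x_given_y[:, 0])
print("P(X = . | Y = second value):", cond_x_given_y[:, 1])
print("each column sums to:", cond_x_given_y.sum(axis=0))
```

The two columns differ, so the conditional law of $X$ changes with the observed value of $Y$, which is another way of seeing that this table is not independent.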
Visual
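Discrete joint law as a matrix

| | $Y=1$ | $Y=2$ | row sum $p_X$ |
|---|---|---|---|
| $X=1$ | $p_{11}$ | $p_{12}$ | $p_{1\cdot}$ |
| $X=2$ | $p_{21}$ | $p_{22}$ | $p_{2\cdot}$ |
| column sum $p_Y$ | $p_{\cdot 1}$ | $p_{\cdot 2}$ | total 1 |
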
| Operation | Discrete version | Continuous version |
|---|---|---|
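| Joint law | $p_{X,Y}(x,y)$ | $f_{X,Y}(x,y)$ |
| Marginal of $X$ | $\sum_y p_{X,Y}(x,y)$ | $\int f_{X,Y}(x,y)\,dy$ |
| Independence | $p_{X,Y}(x,y) = p_X(x)\,p_Y(y)$ | $f_{X,Y}(x,y) = f_X(x)\,f_Y(y)$ |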
| Probability of region | sum over grid points | double integral over region |
| Transformation | collect masses with same value | density times Jacobian |
The matrix picture is a useful diagnostic for independence. Once row and column sums are known, an independent joint table is forced: it must be the outer product of the marginal vectors. If the actual table differs from that outer product, the variables are dependent. For continuous variables, the same idea is harder to see visually, but a product density has rectangular probabilities that factor exactly.
For transformations, the table reminds us that discrete and continuous cases use different bookkeeping. A discrete transformation moves point masses and combines masses that land on the same value. A continuous transformation moves density through a change of scale. Forgetting this distinction leads to common errors such as assigning positive probability to a point in a continuous model or omitting a Jacobian factor.
Worked example 1: checking independence from a joint table
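Problem: Let $(X,Y)$ take values in $\{1,2\} \times \{1,2\}$ with joint probabilities

| | $Y=1$ | $Y=2$ |
|---|---|---|
| $X=1$ | 0.30 | 0.20 |
| $X=2$ | 0.15 | 0.35 |

Find the marginal distributions and decide whether $X$ and $Y$ are independent.
Method:
- Sum rows for $X$: $p_X(1) = 0.30 + 0.20 = 0.50$ and $p_X(2) = 0.15 + 0.35 = 0.50$.
- Sum columns for $Y$: $p_Y(1) = 0.30 + 0.15 = 0.45$ and $p_Y(2) = 0.20 + 0.35 = 0.55$.
- If independent, we would need $p_{X,Y}(1,1) = p_X(1)\,p_Y(1) = 0.50 \times 0.45 = 0.225$.
- But the table gives $p_{X,Y}(1,1) = 0.30 \ne 0.225$.
Checked answer: $X$ and $Y$ are not independent. The marginals alone do not reveal this; the joint table does.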
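Worked example 2: transforming $X$ to $Y = X^2$
Problem: Let $X$ be uniform on $(-1,1)$ and define $Y = X^2$. Find the density of $Y$.
Method:
- The support of $Y$ is $[0,1)$.
- For $0 < y < 1$, the equation $x^2 = y$ has two solutions: $x = \sqrt{y}$ and $x = -\sqrt{y}$.
- The density of $X$ is $f_X(x) = \tfrac{1}{2}$ on $(-1,1)$.
- For the positive branch, $\left|\tfrac{dx}{dy}\right| = \tfrac{1}{2\sqrt{y}}$, so the contribution is $\tfrac{1}{2} \cdot \tfrac{1}{2\sqrt{y}} = \tfrac{1}{4\sqrt{y}}$.
- For the negative branch, $\left|\tfrac{dx}{dy}\right| = \tfrac{1}{2\sqrt{y}}$, so the contribution is again $\tfrac{1}{4\sqrt{y}}$.
- Add both contributions: $f_Y(y) = \tfrac{1}{2\sqrt{y}}$ for $0 < y < 1$, and $f_Y(y) = 0$ otherwise.
Checked answer: integrate the density:
$$\int_0^1 \frac{1}{2\sqrt{y}}\,dy = \Big[\sqrt{y}\Big]_0^1 = 1.$$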
So it is a valid density.
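The same answer can be cross-checked with the CDF method and a simulation. The sketch below assumes $X$ is uniform on $(-1,1)$, the case for which the `density_y_square` function in the Code section is written. For $0 < y < 1$ the CDF method gives $F_Y(y) = P(-\sqrt{y} \le X \le \sqrt{y}) = \sqrt{y}$, whose derivative is $1/(2\sqrt{y})$. The checkpoints in `ys` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed setup: X ~ Uniform(-1, 1) and Y = X^2.
x = rng.uniform(-1.0, 1.0, size=200_000)
y = x ** 2

# CDF method: F_Y(t) = P(-sqrt(t) <= X <= sqrt(t)) = sqrt(t) for 0 < t < 1.
ys = np.array([0.04, 0.25, 0.64])
cdf_method = np.sqrt(ys)

# Empirical CDF of the simulated Y at the same checkpoints.
empirical_cdf = np.array([(y <= t).mean() for t in ys])

print("checkpoints:  ", ys)
print("CDF method:   ", cdf_method)
print("empirical CDF:", empirical_cdf)
```

Agreement between the two rows, up to sampling noise, supports the density obtained by summing the two inverse branches.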
Code
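```python
import numpy as np

# Joint pmf as a matrix: rows index values of X, columns index values of Y.
joint = np.array([[0.30, 0.20],
                  [0.15, 0.35]])

px = joint.sum(axis=1)  # marginal of X: sum across each row
py = joint.sum(axis=0)  # marginal of Y: sum down each column

# Under independence the joint table must equal the outer product of the marginals.
independent_table = np.outer(px, py)

print("pX:", px)
print("pY:", py)
print("independent table would be:")
print(independent_table)
print("is independent?", np.allclose(joint, independent_table))


def density_y_square(y):
    """Density of Y = X^2 for X uniform on (-1, 1): 1 / (2 sqrt(y)) on (0, 1)."""
    y = np.asarray(y, dtype=float)
    out = np.zeros_like(y)
    mask = (y > 0) & (y < 1)
    out[mask] = 1 / (2 * np.sqrt(y[mask]))
    return out


# The grid stops short of 0 and 1, so the result is close to
# sqrt(0.999) - sqrt(0.001), roughly 0.97, rather than exactly 1.
grid = np.linspace(0.001, 0.999, 1000)
# NumPy 2.0 added np.trapezoid and deprecated np.trapz; use whichever exists.
trapezoid = getattr(np, "trapezoid", None) or np.trapz
approx_integral = trapezoid(density_y_square(grid), grid)
print("approx integral:", approx_integral)
```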
Common pitfalls
- Assuming marginals determine the joint distribution. They do not.
- Checking independence at only one point in a joint table. Independence requires factorization everywhere.
- Forgetting the absolute derivative factor when transforming a density.
- Using a one-to-one change-of-variables formula for a many-to-one map like $y = x^2$.
- Treating zero-probability conditioning events in continuous models as if the elementary discrete formula applies unchanged.