Joint, Marginal, and Conditional Distributions
Most probability models involve more than one random variable. A student's study time and exam score, two components in a system, or the two coordinates of a random point are not separate stories; their relationship is often the point of the model. Joint distributions describe variables together. Marginal distributions describe one variable after ignoring the others. Conditional distributions describe one variable after another has been observed.
Figure: Probability trees make the conditioning structure in Bayes' theorem explicit. Image: Wikimedia Commons, Gnathan87, CC0 1.0.
This page generalizes conditional probability from events to random variables. It prepares for covariance, independence, transformations, and Markov chains, all of which depend on understanding what information is contained in a joint distribution.
Definitions
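For discrete random variables $X$ and $Y$, the joint PMF is
$$p_{X,Y}(x,y) = P(X = x,\ Y = y).$$
The marginal PMFs are obtained by summing over the other variable:
$$p_X(x) = \sum_y p_{X,Y}(x,y), \qquad p_Y(y) = \sum_x p_{X,Y}(x,y).$$
If $p_Y(y) > 0$, the conditional PMF of $X$ given $Y = y$ is
$$p_{X\mid Y}(x \mid y) = \frac{p_{X,Y}(x,y)}{p_Y(y)}.$$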
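The three discrete definitions translate directly into code. The sketch below stores a joint PMF as a dictionary keyed by $(x, y)$ and uses exact fractions so the results can be compared with hand calculations. The table values and the helper names (`marginal_x`, `marginal_y`, `conditional_x_given_y`) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F

# Hypothetical joint PMF p_{X,Y}(x, y), stored as {(x, y): probability}.
joint = {
    (0, 0): F(1, 10), (0, 1): F(2, 10),
    (1, 0): F(3, 10), (1, 1): F(1, 10),
    (2, 0): F(1, 10), (2, 1): F(2, 10),
}

def marginal_x(joint):
    """p_X(x) = sum over y of p_{X,Y}(x, y)."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0) + p
    return px

def marginal_y(joint):
    """p_Y(y) = sum over x of p_{X,Y}(x, y)."""
    py = {}
    for (_, y), p in joint.items():
        py[y] = py.get(y, 0) + p
    return py

def conditional_x_given_y(joint, y):
    """p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y); defined only when p_Y(y) > 0."""
    py = marginal_y(joint).get(y, 0)
    if py == 0:
        raise ValueError("conditioning event has probability zero")
    return {x: p / py for (x, yy), p in joint.items() if yy == y}

px = marginal_x(joint)
py = marginal_y(joint)
cond = conditional_x_given_y(joint, 1)
print(px)
print(py)
print(cond)
```

The conditional PMF is column $Y = 1$ of the table rescaled by $p_Y(1)$, so it sums to 1 over $x$.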
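For jointly continuous random variables $X$ and $Y$, the joint PDF $f_{X,Y}$ satisfies
$$P\big((X,Y) \in A\big) = \iint_A f_{X,Y}(x,y)\,dx\,dy$$
for regions $A$ in the plane. The marginal PDFs are
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx.$$
If $f_Y(y) > 0$, the conditional density of $X$ given $Y = y$ is
$$f_{X\mid Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}.$$
In both cases, the joint CDF is
$$F_{X,Y}(x,y) = P(X \le x,\ Y \le y).$$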
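For a discrete pair, the joint CDF is a step function. It is obtained by adding the PMF over every support point $(x', y')$ with $x' \le x$ and $y' \le y$. The sketch below assumes a hypothetical $3 \times 2$ table, and `joint_cdf` is an illustrative helper name.

```python
from fractions import Fraction as F

# Hypothetical joint PMF on X in {0, 1, 2}, Y in {0, 1}; rows index x, columns index y.
xs = [0, 1, 2]
ys = [0, 1]
pmf = [
    [F(1, 10), F(2, 10)],
    [F(3, 10), F(1, 10)],
    [F(1, 10), F(2, 10)],
]

def joint_cdf(x, y):
    """F_{X,Y}(x, y) = P(X <= x, Y <= y): add every cell in the lower-left quadrant."""
    total = F(0)
    for i, xv in enumerate(xs):
        for j, yv in enumerate(ys):
            if xv <= x and yv <= y:
                total += pmf[i][j]
    return total

print(joint_cdf(1, 0))    # 2/5
print(joint_cdf(1.5, 0))  # 2/5 (the CDF is flat between support points)
print(joint_cdf(2, 1))    # 1
print(joint_cdf(-1, 5))   # 0
# Letting y grow past every support point recovers the marginal CDF F_X(x).
print(joint_cdf(1, float("inf")))  # 7/10
```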
Key results
Factorization by conditioning.
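For discrete variables,
$$p_{X,Y}(x,y) = p_{X\mid Y}(x \mid y)\,p_Y(y) = p_{Y\mid X}(y \mid x)\,p_X(x).$$
For continuous variables,
$$f_{X,Y}(x,y) = f_{X\mid Y}(x \mid y)\,f_Y(y) = f_{Y\mid X}(y \mid x)\,f_X(x).$$
This is the random-variable version of $P(A \cap B) = P(A \mid B)\,P(B)$.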
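Factorization also gives a way to sample from a joint distribution. First draw $Y$ from $p_Y$, then draw $X$ from $p_{X\mid Y}(\cdot \mid Y)$. The sketch below uses a hypothetical marginal and conditional, chosen so that their product reproduces the $2 \times 2$ table in the Code section. The empirical cell frequencies should approach the joint PMF. `sample_pair` is an illustrative name.

```python
import random
from collections import Counter

# Hypothetical marginal p_Y and conditional p_{X|Y}; their product defines the joint.
p_y = {0: 0.5, 1: 0.5}
p_x_given_y = {
    0: {0: 0.4, 1: 0.6},  # p_{X|Y}(x | 0)
    1: {0: 0.2, 1: 0.8},  # p_{X|Y}(x | 1)
}

# Factorization: p_{X,Y}(x, y) = p_{X|Y}(x | y) * p_Y(y).
joint = {(x, y): p_x_given_y[y][x] * p_y[y] for y in p_y for x in p_x_given_y[y]}

def sample_pair(rng):
    """Draw Y from p_Y, then X from p_{X|Y}(. | Y): one draw from the joint."""
    y = rng.choices(list(p_y), weights=list(p_y.values()))[0]
    cond = p_x_given_y[y]
    x = rng.choices(list(cond), weights=list(cond.values()))[0]
    return x, y

rng = random.Random(0)
n = 100_000
counts = Counter(sample_pair(rng) for _ in range(n))
for cell in sorted(joint):
    # (x, y), exact joint probability, empirical frequency
    print(cell, round(joint[cell], 3), counts[cell] / n)
```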
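Independence. Random variables $X$ and $Y$ are independent if, for all suitable sets $A$ and $B$,
$$P(X \in A,\ Y \in B) = P(X \in A)\,P(Y \in B).$$
In the discrete case, this is equivalent to
$$p_{X,Y}(x,y) = p_X(x)\,p_Y(y)$$
for all $x, y$. In the continuous case, it is equivalent to
$$f_{X,Y}(x,y) = f_X(x)\,f_Y(y)$$
where the densities exist.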
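The discrete criterion has to hold in every cell. The sketch below uses a hypothetical $2 \times 3$ table in which the $Y = 0$ column happens to factorize while the other columns do not. Checking only cell $(0, 0)$ would therefore wrongly suggest independence. The helper names are illustrative.

```python
from fractions import Fraction as F

def marginals(table):
    """Row sums give p_X, column sums give p_Y (rows index x, columns index y)."""
    px = [sum(row) for row in table]
    py = [sum(col) for col in zip(*table)]
    return px, py

def factorizing_cells(table):
    """Return the set of cells (i, j) where p_{X,Y} equals p_X * p_Y."""
    px, py = marginals(table)
    return {(i, j)
            for i, row in enumerate(table)
            for j, p in enumerate(row)
            if p == px[i] * py[j]}

def is_independent(table):
    """Independence requires factorization at every cell, not just one."""
    n_cells = sum(len(row) for row in table)
    return len(factorizing_cells(table)) == n_cells

# Hypothetical 2x3 table: the Y = 0 column factorizes, the other columns do not.
table = [
    [F(1, 5), F(1, 4), F(1, 20)],
    [F(1, 5), F(1, 20), F(1, 4)],
]
# An independent table built as an outer product of two marginals.
indep = [[a * b for b in (F(2, 5), F(3, 5))] for a in (F(1, 4), F(3, 4))]

print(sorted(factorizing_cells(table)))  # [(0, 0), (1, 0)]
print(is_independent(table))             # False
print(is_independent(indep))             # True
```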
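Expectation from a joint distribution. For discrete variables,
$$E[g(X,Y)] = \sum_x \sum_y g(x,y)\,p_{X,Y}(x,y).$$
For continuous variables,
$$E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,f_{X,Y}(x,y)\,dx\,dy.$$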
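Both formulas can be evaluated directly. The discrete part of this sketch uses the same $2 \times 2$ table as the Code section, with exact fractions. The continuous part approximates the double integral for the density $f_{X,Y} = 2$ on $0 < y < x < 1$ from Worked example 2, using a midpoint rule in each direction with the inner $y$-grid rescaled to $(0, x)$. The function names and the grid size `n=400` are illustrative choices, not a prescribed method.

```python
from fractions import Fraction as F

# Discrete: E[g(X, Y)] = sum_x sum_y g(x, y) p_{X,Y}(x, y).
pmf = {(0, 0): F(2, 10), (0, 1): F(1, 10), (1, 0): F(3, 10), (1, 1): F(4, 10)}

def expect_discrete(g, pmf):
    return sum(g(x, y) * p for (x, y), p in pmf.items())

e_xy = expect_discrete(lambda x, y: x * y, pmf)   # only the (1, 1) cell contributes
e_sum = expect_discrete(lambda x, y: x + y, pmf)  # equals E[X] + E[Y] = 0.7 + 0.5

# Continuous: iterated midpoint rule for f = 2 on 0 < y < x < 1.
def expect_triangle(g, n=400):
    total = 0.0
    hx = 1.0 / n
    for i in range(n):
        x = (i + 0.5) * hx
        hy = x / n  # the inner limits depend on x: 0 < y < x
        for j in range(n):
            y = (j + 0.5) * hy
            total += g(x, y) * 2.0 * hx * hy
    return total

print(e_xy, e_sum)                          # 2/5 6/5
print(expect_triangle(lambda x, y: x * y))  # close to the exact value 1/4
```

With $g = 1$ the same routine returns the total probability, which should be 1.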
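Law of total expectation.
$$E[X] = E\big[E[X \mid Y]\big].$$
Law of total variance.
$$\operatorname{Var}(X) = E\big[\operatorname{Var}(X \mid Y)\big] + \operatorname{Var}\big(E[X \mid Y]\big).$$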
These laws say that overall variation can be split into within-condition variation and between-condition variation.
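Both laws can be checked exactly on a small table. This sketch reuses the Code section's joint PMF. It computes the conditional mean and variance of $X$ for each value of $Y$, then confirms that the within-condition and between-condition terms add up to $\operatorname{Var}(X)$. `cond_moments` is an illustrative helper.

```python
from fractions import Fraction as F

# Joint PMF keyed by (x, y), same values as the Code section's table.
pmf = {(0, 0): F(2, 10), (0, 1): F(1, 10), (1, 0): F(3, 10), (1, 1): F(4, 10)}
ys = sorted({y for _, y in pmf})
p_y = {y: sum(p for (_, yy), p in pmf.items() if yy == y) for y in ys}

def cond_moments(y):
    """Mean and variance of X under p_{X|Y}(. | y)."""
    cond = {x: p / p_y[y] for (x, yy), p in pmf.items() if yy == y}
    m = sum(x * p for x, p in cond.items())
    v = sum((x - m) ** 2 * p for x, p in cond.items())
    return m, v

mean_x = sum(x * p for (x, _), p in pmf.items())
var_x = sum((x - mean_x) ** 2 * p for (x, _), p in pmf.items())

cm = {y: cond_moments(y) for y in ys}
tower = sum(cm[y][0] * p_y[y] for y in ys)                     # E[E[X | Y]]
within = sum(cm[y][1] * p_y[y] for y in ys)                    # E[Var(X | Y)]
between = sum((cm[y][0] - tower) ** 2 * p_y[y] for y in ys)    # Var(E[X | Y])

print(mean_x, tower)           # 7/10 7/10
print(var_x, within + between) # 21/100 21/100
```

Here the within-condition term, $1/5$, dominates the between-condition term, $1/100$.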
The support of a joint distribution is often the most important part of the problem. If the support is rectangular, such as $0 < x < 1$ and $0 < y < 2$, the integration limits usually do not depend on the other variable. If the support is triangular, circular, or constrained by inequalities such as $0 < y < x < 1$, the limits change with the variable being held fixed. Many wrong marginal densities come from integrating the right formula over the wrong region.
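To see the effect of wrong limits, take the density $f_{X,Y} = 2$ on $0 < y < x < 1$ from Worked example 2. Integrating the formula "2" over $0 < y < 1$ ignores the constraint $y < x$ and produces a "marginal" that does not integrate to 1. This sketch uses a simple midpoint rule, and the function names are illustrative.

```python
# f_{X,Y}(x, y) = 2 on the triangle 0 < y < x < 1, zero elsewhere.
def f(x, y):
    return 2.0 if 0.0 < y < x < 1.0 else 0.0

def midpoint(h, a, b, n=1000):
    """Midpoint-rule approximation of the integral of h over (a, b)."""
    w = (b - a) / n
    return sum(h(a + (k + 0.5) * w) for k in range(n)) * w

def marginal_x_wrong(x):
    # Integrates the formula "2" over 0 < y < 1, ignoring the constraint y < x.
    return midpoint(lambda y: 2.0, 0.0, 1.0)

def marginal_x_right(x):
    # Integrates only over the support slice 0 < y < x.
    return midpoint(lambda y: f(x, y), 0.0, x)

for x in (0.25, 0.5, 0.75):
    print(x, marginal_x_wrong(x), marginal_x_right(x), 2 * x)

# The wrong "marginal" integrates to about 2 over (0, 1), so it cannot be a density.
print(midpoint(marginal_x_wrong, 0.0, 1.0, n=200))
```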
Conditional distributions are ordinary distributions in their own right. After observing $Y = y$, the function $x \mapsto p_{X\mid Y}(x \mid y)$ or $x \mapsto f_{X\mid Y}(x \mid y)$ must still sum or integrate to $1$ over $x$. This gives a useful check: if the conditional distribution does not normalize, the denominator or the support is wrong.
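The normalization check catches a common slip: dividing by the wrong marginal. In this sketch, column $Y = 1$ of the Code section's table is divided by $p_Y(1)$, which is correct, and by $p_X(1)$, which is a mistake. Only the first result sums to 1. `conditional_x_given_y` is an illustrative helper that takes the denominator as an argument.

```python
from fractions import Fraction as F

# Rows index x, columns index y (same values as the Code section's joint table).
joint = [[F(2, 10), F(1, 10)],
         [F(3, 10), F(4, 10)]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]

def conditional_x_given_y(y, denom):
    """Column y of the joint table divided by a supplied denominator."""
    return [joint[x][y] / denom for x in range(len(joint))]

good = conditional_x_given_y(1, py[1])  # correct: divide by p_Y(1)
bad = conditional_x_given_y(1, px[1])   # mistake: divide by p_X(1)

print(sum(good))  # 1
print(sum(bad))   # 5/7, so the denominator must be wrong
```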
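Conditional expectation summarizes each conditional distribution by its mean:
$$E[Y \mid X = x] = \sum_y y\,p_{Y\mid X}(y \mid x) \quad \text{(discrete)}, \qquad E[Y \mid X = x] = \int_{-\infty}^{\infty} y\,f_{Y\mid X}(y \mid x)\,dy \quad \text{(continuous)}.$$
As $x$ varies, $E[Y \mid X = x]$ traces out a curve, the regression function of $Y$ on $X$. Linear regression is the statistical model that describes how this conditional mean of a response changes with the predictors.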
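For the triangular density of Worked example 2, $Y$ given $X = x$ is uniform on $(0, x)$, so the regression function is $E[Y \mid X = x] = x/2$. This sketch simulates from that density and averages $Y$ within narrow bins of $X$. Inside each bin, the average of $Y$ should be close to half the average of $X$. The seed and the bin count of 10 are arbitrary choices.

```python
import random

rng = random.Random(1)
n = 200_000
n_bins = 10
sum_x = [0.0] * n_bins
sum_y = [0.0] * n_bins
counts = [0] * n_bins

for _ in range(n):
    # X = sqrt(U) has density 2x; given X = x, Y is Uniform(0, x).
    x = rng.random() ** 0.5
    y = rng.random() * x
    b = min(int(x * n_bins), n_bins - 1)
    sum_x[b] += x
    sum_y[b] += y
    counts[b] += 1

for b in range(n_bins):
    mean_x = sum_x[b] / counts[b]
    mean_y = sum_y[b] / counts[b]
    print(f"bin {b}: mean X = {mean_x:.3f}, mean Y = {mean_y:.3f}, mean X / 2 = {mean_x / 2:.3f}")
```

Comparing with half the bin's mean of $X$, rather than half the bin centre, avoids a small bias. That bias arises because $X$ is not uniform within a bin.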
Marginalization can also hide structure. Two groups may have different conditional relationships between $X$ and $Y$, while the combined marginal relationship looks weaker, stronger, or even reversed. This is the probability mechanism behind Simpson's paradox. Whenever a joint distribution includes a meaningful grouping variable, compare the conditional distributions as well as the aggregate marginal distribution.
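A reversal needs only unequal group sizes. The counts below are hypothetical. Treatment A has the higher success rate within each severity group, but it is given mostly to severe cases. Pooling the groups therefore makes B look better overall. The helper names are illustrative.

```python
from fractions import Fraction as F

# Hypothetical (successes, trials) for two treatments within two severity groups.
data = {
    ("mild", "A"): (18, 20),
    ("mild", "B"): (160, 200),
    ("severe", "A"): (60, 200),
    ("severe", "B"): (5, 20),
}
groups = ("mild", "severe")

def rate(successes, trials):
    return F(successes, trials)

for g in groups:
    a = rate(*data[(g, "A")])
    b = rate(*data[(g, "B")])
    print(g, "A:", float(a), "B:", float(b))  # A beats B within each group

def pooled(treatment):
    """Success rate after marginalizing over the severity group."""
    s = sum(data[(g, treatment)][0] for g in groups)
    t = sum(data[(g, treatment)][1] for g in groups)
    return F(s, t)

print("pooled A:", float(pooled("A")), "B:", float(pooled("B")))  # B beats A overall
```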
For more than two variables, the same ideas scale by summing or integrating over the variables not currently of interest. A Bayesian network, for example, is a structured factorization of a large joint distribution into smaller conditional pieces.
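As a small illustration, this sketch builds a joint PMF for three binary variables from the factorization $p(a, b, c) = p(a)\,p(b \mid a)\,p(c \mid b)$, a chain $A \to B \to C$. It then marginalizes by summing out the unwanted variables and conditions by dividing by a marginal. All probabilities are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical chain A -> B -> C of binary variables:
# p(a, b, c) = p(a) * p(b | a) * p(c | b).
p_a = {0: F(3, 5), 1: F(2, 5)}
p_b_given_a = {0: {0: F(4, 5), 1: F(1, 5)}, 1: {0: F(1, 4), 1: F(3, 4)}}
p_c_given_b = {0: {0: F(9, 10), 1: F(1, 10)}, 1: {0: F(3, 10), 1: F(7, 10)}}

joint = {
    (a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
    for a, b, c in product((0, 1), repeat=3)
}

# Marginal of C: sum out A and B.
p_c = {c: sum(p for (_, _, cc), p in joint.items() if cc == c) for c in (0, 1)}
# Marginal of (A, C): sum out B only.
p_ac = {(a, c): sum(joint[(a, b, c)] for b in (0, 1))
        for a, c in product((0, 1), repeat=2)}
# Conditioning on C = 1: P(A = 1 | C = 1) = P(A = 1, C = 1) / P(C = 1).
p_a1_given_c1 = p_ac[(1, 1)] / p_c[1]

print(sum(joint.values()))  # 1
print(p_c[1])               # 44/125
print(p_a1_given_c1)        # 5/8
```

Storing three small conditional tables instead of the full eight-cell joint is the saving that Bayesian networks exploit at larger scale.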
Visual
| Operation | Discrete | Continuous |
|---|---|---|
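| joint probability/density | $p_{X,Y}(x,y) = P(X=x, Y=y)$ | $f_{X,Y}(x,y)$ |
| marginalize out $Y$ | $p_X(x) = \sum_y p_{X,Y}(x,y)$ | $f_X(x) = \int f_{X,Y}(x,y)\,dy$ |
| condition on $Y = y$ | $p_{X\mid Y}(x \mid y) = p_{X,Y}(x,y) / p_Y(y)$ | $f_{X\mid Y}(x \mid y) = f_{X,Y}(x,y) / f_Y(y)$ |
| compute probability in region $A$ | sum over cells in $A$ | double integral over $A$ |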
Worked example 1: joint table, marginals, and conditionals
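Problem. The joint PMF of $X$ and $Y$ is:

| | $Y = 0$ | $Y = 1$ |
|---|---|---|
| $X = 0$ | 0.20 | 0.10 |
| $X = 1$ | 0.30 | 0.40 |

Find the marginal distributions and $P(X = 1 \mid Y = 1)$, and determine whether $X$ and $Y$ are independent.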
Method.
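- Check total probability: $0.20 + 0.10 + 0.30 + 0.40 = 1$.
- Marginal distribution of $X$: $P(X = 0) = 0.20 + 0.10 = 0.30$ and $P(X = 1) = 0.30 + 0.40 = 0.70$.
- Marginal distribution of $Y$: $P(Y = 0) = 0.20 + 0.30 = 0.50$ and $P(Y = 1) = 0.10 + 0.40 = 0.50$.
- Conditional probability:
$$P(X = 1 \mid Y = 1) = \frac{P(X = 1, Y = 1)}{P(Y = 1)} = \frac{0.40}{0.50} = 0.80.$$
- Check independence. If $X$ and $Y$ were independent, then
$$P(X = 1, Y = 1) = P(X = 1)\,P(Y = 1) = 0.70 \times 0.50 = 0.35.$$
But the table gives $P(X = 1, Y = 1) = 0.40$.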
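Checked answer. The marginals are $P(X = 0) = 0.30$, $P(X = 1) = 0.70$, $P(Y = 0) = 0.50$, and $P(Y = 1) = 0.50$. Also $P(X = 1 \mid Y = 1) = 0.80$, and $X$ and $Y$ are not independent.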
Worked example 2: a continuous joint density
Problem. Let
$$f_{X,Y}(x,y) = 2$$
on the triangular region $0 < y < x < 1$, and $f_{X,Y}(x,y) = 0$ otherwise. Find $f_X(x)$, $f_Y(y)$, and $f_{Y\mid X}(y \mid x)$.
Method.
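- Verify normalization. The triangular region has area $1/2$, so the integral of the constant density $2$ over it is $2 \cdot \tfrac{1}{2} = 1$.
- For fixed $x \in (0,1)$, the possible $y$ values satisfy $0 < y < x$. Therefore
$$f_X(x) = \int_0^x 2\,dy = 2x, \qquad 0 < x < 1.$$
- For fixed $y \in (0,1)$, the possible $x$ values satisfy $y < x < 1$. Therefore
$$f_Y(y) = \int_y^1 2\,dx = 2(1 - y), \qquad 0 < y < 1.$$
- Conditional density of $Y$ given $X = x$, for $0 < x < 1$:
$$f_{Y\mid X}(y \mid x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{2}{2x} = \frac{1}{x}$$
for $0 < y < x$.
- Interpret. Given $X = x$, the conditional density of $Y$ is uniform on $(0, x)$.
- Check normalization:
$$\int_0^x \frac{1}{x}\,dy = 1.$$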
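Checked answer. $f_X(x) = 2x$ for $0 < x < 1$, $f_Y(y) = 2(1 - y)$ for $0 < y < 1$, and $f_{Y\mid X}(y \mid x) = 1/x$ for $0 < y < x$.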
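The marginal answers can be checked by simulation with a method independent of the derivation. This sketch uses rejection sampling: it draws points uniformly on the unit square and keeps those with $y < x$, which are uniform on the triangle and so have density 2 there. It then compares empirical CDFs with $F_X(t) = t^2$ and $F_Y(t) = 1 - (1 - t)^2$, which follow from the derived marginals. The seed and sample size are arbitrary.

```python
import random

rng = random.Random(2)
n = 200_000
xs, ys = [], []
# Rejection sampling: uniform points on the unit square, kept only when y < x.
while len(xs) < n:
    x, y = rng.random(), rng.random()
    if y < x:
        xs.append(x)
        ys.append(y)

# F_X(t) = t^2 integrates f_X(x) = 2x; F_Y(t) = 1 - (1 - t)^2 integrates f_Y(y) = 2(1 - y).
for t in (0.25, 0.5, 0.75):
    emp_x = sum(x <= t for x in xs) / n
    emp_y = sum(y <= t for y in ys) / n
    print(t, round(emp_x, 3), t ** 2, round(emp_y, 3), 1 - (1 - t) ** 2)
```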
Code
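```python
import numpy as np

# Discrete joint table example: rows index X = 0, 1; columns index Y = 0, 1.
joint = np.array([
    [0.20, 0.10],
    [0.30, 0.40],
])
px = joint.sum(axis=1)  # marginal of X: sum across each row
py = joint.sum(axis=0)  # marginal of Y: sum down each column
conditional_x1_given_y1 = joint[1, 1] / py[1]  # P(X=1, Y=1) / P(Y=1)
# Independence requires joint[i, j] == px[i] * py[j] in every cell.
independent = np.allclose(joint, np.outer(px, py))
print("P_X:", px)
print("P_Y:", py)
print("P(X=1 | Y=1):", conditional_x1_given_y1)
print("independent:", independent)

# Monte Carlo check for the triangular density f = 2 on 0 < y < x < 1.
rng = np.random.default_rng(0)
n = 100_000
# Sample X from its marginal density 2x using the inverse CDF: X = sqrt(U).
x = np.sqrt(rng.random(n))
# Given X = x, Y is uniform on (0, x).
y = rng.random(n) * x
print("sample mean X:", x.mean())  # theory: E[X] = 2/3
print("sample mean Y:", y.mean())  # theory: E[Y] = 1/3
```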
Common pitfalls
- Confusing joint and marginal probabilities. A joint table cell is not the same as a row total.
- Dividing by the wrong marginal when computing conditionals.
- Assuming a joint density value is a probability. Probabilities require integrating over a region.
- Checking independence at only one cell and declaring success. Factorization must hold over the whole support.
- Forgetting support constraints when integrating. In triangular or curved regions, limits depend on the other variable.
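- Treating zero covariance as independence. Independence implies zero covariance, but zero covariance does not imply independence in general. The converse holds only in special cases, such as when $(X, Y)$ is bivariate normal.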