Covariance, Correlation, and Independence
When two random variables are studied together, we often want to know whether they move together, move oppositely, or have no systematic linear relationship. Covariance measures joint variation in original units; correlation standardizes it to a number between $-1$ and $1$. Independence is stronger: it means the full distribution of one variable is unchanged by knowing the other.
Figure: Correlation examples show how association strength and visual pattern are related but not identical. Image: Wikimedia Commons, DenisBoigelot and Imagecreator, public domain.
This distinction matters throughout statistics. A zero correlation is informative, but it does not by itself mean that the variables are unrelated: many nonlinear relationships have zero covariance. Conversely, independence implies zero covariance when variances exist, but the reverse implication fails in general.
Definitions
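For random variables $X$ and $Y$ with finite means $\mu_X = E[X]$ and $\mu_Y = E[Y]$, the covariance is
$$\operatorname{Cov}(X,Y) = E\big[(X-\mu_X)(Y-\mu_Y)\big].$$
An equivalent computational formula is
$$\operatorname{Cov}(X,Y) = E[XY] - E[X]\,E[Y].$$
The correlation coefficient is
$$\rho_{X,Y} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\,\sigma_Y},$$
where $\sigma_X = \sqrt{\operatorname{Var}(X)}$ and $\sigma_Y = \sqrt{\operatorname{Var}(Y)}$. Correlation is defined only when both standard deviations are positive and finite.
Random variables $X$ and $Y$ are independent if events determined by $X$ are independent of events determined by $Y$. Equivalently,
$$P(X \in A,\ Y \in B) = P(X \in A)\,P(Y \in B)$$
for all suitable sets $A$ and $B$.
For discrete variables, independence is equivalent to
$$p_{X,Y}(x,y) = p_X(x)\,p_Y(y)$$
for all values $x, y$. For continuous variables with densities, it is equivalent to
$$f_{X,Y}(x,y) = f_X(x)\,f_Y(y)$$
on the support.
Variables are uncorrelated if $\operatorname{Cov}(X,Y) = 0$. This is weaker than independence.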
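For a discrete joint table, the factorization condition for independence can be checked numerically by comparing the table with the outer product of its marginals. The following is a minimal sketch in NumPy; the helper name `factorizes` and both tables are illustrative assumptions, not something defined in the text.

```python
import numpy as np

def factorizes(joint, tol=1e-12):
    """Return True if a joint PMF table equals the outer product of its marginals."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)  # marginal PMF of the row variable
    py = joint.sum(axis=0)  # marginal PMF of the column variable
    return bool(np.allclose(joint, np.outer(px, py), atol=tol))

# Illustrative table built as a product of marginals, so it factorizes by construction.
independent_table = np.outer([0.3, 0.7], [0.5, 0.5])
# Illustrative table whose cells differ from the product of its marginals.
dependent_table = np.array([[0.20, 0.10], [0.30, 0.40]])

print(factorizes(independent_table))  # True
print(factorizes(dependent_table))    # False
```

The tolerance is there only to absorb floating-point rounding; for exact probabilities one could use rational arithmetic instead.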
Key results
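Independence implies zero covariance. If $X$ and $Y$ are independent and $E[X]$ and $E[Y]$ are finite, then
$$E[XY] = E[X]\,E[Y],$$
so
$$\operatorname{Cov}(X,Y) = E[XY] - E[X]\,E[Y] = 0.$$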
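Variance of a sum.
$$\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X,Y).$$
For a sum of many variables,
$$\operatorname{Var}\Big(\sum_{i=1}^{n} X_i\Big) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + 2\sum_{1 \le i < j \le n} \operatorname{Cov}(X_i, X_j).$$
If the variables are pairwise uncorrelated, the covariance terms vanish and the variance of the sum is the sum of the variances.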
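The two-variable identity $\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X,Y)$ also holds for sample variances and covariances computed with the same normalization, which gives a quick numerical check. A sketch with simulated data follows; the seed, sample size, and coefficients are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)  # correlated with x by construction

# Use ddof=1 everywhere so variances match np.cov's default normalization.
lhs = np.var(x + y, ddof=1)
cov_xy = np.cov(x, y)[0, 1]
rhs = np.var(x, ddof=1) + np.var(y, ddof=1) + 2 * cov_xy

print(lhs, rhs)  # the two values agree up to floating-point rounding
```

Dropping the `2 * cov_xy` term would leave a visible gap here, because `x` and `y` are not uncorrelated.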
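Correlation bounds. The correlation $\rho_{X,Y}$ satisfies
$$-1 \le \rho_{X,Y} \le 1.$$
This follows from the Cauchy-Schwarz inequality:
$$|\operatorname{Cov}(X,Y)| \le \sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)} = \sigma_X\,\sigma_Y.$$
Perfect correlation. If $\rho_{X,Y} = 1$ or $\rho_{X,Y} = -1$, then one variable is an exact positive or negative linear transformation of the other, $Y = aX + b$ with $a > 0$ or $a < 0$ respectively, except possibly on probability-zero events.
Units. Covariance has units equal to the product of the units of $X$ and $Y$. Correlation is unitless, which makes it easier to compare across settings.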
Covariance can be read geometrically after centering. The product $(X - E[X])(Y - E[Y])$ is positive when both variables are above their means or both are below their means. It is negative when one is above its mean and the other is below. Averaging these products gives covariance. A positive covariance therefore means same-direction deviations dominate; a negative covariance means opposite-direction deviations dominate.
Correlation standardizes this average product by the two standard deviations. This standardization makes the correlation insensitive to changes of units such as inches to centimeters or dollars to cents. However, correlation still measures only linear association. A strong curved pattern can have correlation close to zero, and a high correlation does not imply that changing one variable causes the other to change.
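The effect of a unit change can be seen directly on data: rescaling one variable multiplies the covariance by the conversion factor and leaves the correlation unchanged. The sketch below uses simulated, hypothetical height and weight measurements; the variable names and distribution parameters are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
height_in = rng.normal(loc=68.0, scale=3.0, size=500)           # hypothetical heights, inches
weight_lb = 2.0 * height_in + rng.normal(scale=10.0, size=500)  # hypothetical weights, pounds

height_cm = 2.54 * height_in  # same measurements expressed in centimeters

cov_in = np.cov(height_in, weight_lb)[0, 1]
cov_cm = np.cov(height_cm, weight_lb)[0, 1]
corr_in = np.corrcoef(height_in, weight_lb)[0, 1]
corr_cm = np.corrcoef(height_cm, weight_lb)[0, 1]

print(cov_cm / cov_in)   # equals the unit factor 2.54 up to rounding
print(corr_in, corr_cm)  # equal up to rounding
```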
For independent variables, all information about $X$ is irrelevant for predicting any event involving $Y$. For uncorrelated variables, only the best linear prediction of $Y$ gains no slope from $X$. That is a much narrower statement. In introductory statistics, this is why scatterplots matter alongside correlation coefficients.
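To make the "no slope" statement concrete: the best linear predictor of $Y$ from $X$ has slope $\operatorname{Cov}(X,Y)/\operatorname{Var}(X)$, so zero covariance means a flat line even when $Y$ is completely determined by $X$. The sketch below uses a hypothetical distribution chosen for illustration ($X$ uniform on five symmetric points, $Y = X^2$).

```python
import numpy as np

# Hypothetical example: X uniform on {-2, -1, 0, 1, 2}, Y = X**2.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
p = np.full(5, 0.2)
y = x**2

mean_x = np.sum(p * x)
mean_y = np.sum(p * y)
cov_xy = np.sum(p * (x - mean_x) * (y - mean_y))
var_x = np.sum(p * (x - mean_x) ** 2)

slope = cov_xy / var_x               # slope of the best linear predictor of Y from X
intercept = mean_y - slope * mean_x  # the flat line sits at the mean of Y

print(slope, intercept)
print(y)  # the conditional mean of Y at each value of X, which clearly varies with X
```

The linear predictor ignores $X$ entirely, while the conditional mean changes with every value of $X$; that gap is the difference between uncorrelated and independent.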
Covariance matrices extend these ideas to many variables. The diagonal entries are variances, and the off-diagonal entries are covariances. Such matrices appear in multivariate normal distributions, principal component analysis, least squares, and error propagation. A valid covariance matrix must be symmetric and positive semidefinite, meaning no linear combination of the variables can have negative variance.
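These properties can be checked on a sample covariance matrix. The sketch below simulates three hypothetical variables, then verifies symmetry, nonnegative eigenvalues, and that the quadratic form $a^\top \Sigma a$ reproduces the variance of the linear combination $a^\top X$. The data, seed, and coefficient vector `a` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(200, 3))  # 200 observations of 3 hypothetical variables
data[:, 2] = data[:, 0] + 0.5 * data[:, 1] + 0.1 * rng.normal(size=200)

S = np.cov(data, rowvar=False)    # 3x3 sample covariance matrix; columns are variables

is_symmetric = np.allclose(S, S.T)
eigenvalues = np.linalg.eigvalsh(S)   # eigenvalues of a symmetric matrix, ascending
is_psd = eigenvalues.min() >= -1e-10  # small tolerance for rounding error

a = np.array([1.0, -2.0, 0.5])        # arbitrary coefficients of a linear combination
var_from_matrix = a @ S @ a
var_direct = np.var(data @ a, ddof=1)

print(is_symmetric, is_psd)
print(var_from_matrix, var_direct)    # agree up to rounding
```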
Correlation is also sensitive to mixtures. If data combine several subpopulations with different centers, the overall correlation can mainly reflect differences between groups rather than a relationship within each group. In probability terms, conditioning on a group variable changes the joint distribution. This is why covariance calculations should be interpreted together with the sampling process and any important conditioning variables.
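A small simulation illustrates the mixture effect. In this sketch, two hypothetical subpopulations each have a negative within-group relationship, but their centers are far apart, so the pooled correlation is positive. The helper `make_group`, the group centers, and the noise levels are all assumptions chosen to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

def make_group(center):
    """Simulate one hypothetical subpopulation with a negative within-group slope."""
    x = center + rng.normal(size=n)
    y = center - 0.5 * (x - center) + 0.5 * rng.normal(size=n)
    return x, y

x1, y1 = make_group(0.0)
x2, y2 = make_group(6.0)

corr_group1 = np.corrcoef(x1, y1)[0, 1]
corr_group2 = np.corrcoef(x2, y2)[0, 1]
corr_pooled = np.corrcoef(np.concatenate([x1, x2]),
                          np.concatenate([y1, y2]))[0, 1]

print(corr_group1, corr_group2)  # negative within each group
print(corr_pooled)               # positive, driven by the gap between group centers
```

Conditioning on the group label recovers the within-group relationship that the pooled coefficient hides.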
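When variables are binary, say $X = \mathbf{1}_A$ and $Y = \mathbf{1}_B$ for events $A$ and $B$, covariance has a direct event interpretation:
$$\operatorname{Cov}(\mathbf{1}_A, \mathbf{1}_B) = P(A \cap B) - P(A)\,P(B).$$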
It measures how much the joint occurrence exceeds what independence would predict. This binary form is a useful sanity check because it connects covariance directly back to event probability.
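The event form, $P(A \cap B) - P(A)\,P(B)$ for indicators of events $A$ and $B$, can be checked against the moment formula $E[XY] - E[X]\,E[Y]$ on a small table. The joint probabilities below are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical joint table for indicators X = 1_A (rows) and Y = 1_B (columns).
joint = np.array([[0.4, 0.1],
                  [0.2, 0.3]])
values = np.array([0.0, 1.0])

# Event form: P(A and B) - P(A) P(B).
p_a = joint[1, :].sum()
p_b = joint[:, 1].sum()
p_ab = joint[1, 1]
cov_events = p_ab - p_a * p_b

# Moment form: E[XY] - E[X] E[Y], computed from the table and the values 0 and 1.
EX = values @ joint.sum(axis=1)
EY = values @ joint.sum(axis=0)
EXY = values @ joint @ values
cov_moments = EXY - EX * EY

print(cov_events, cov_moments)  # both approximately 0.1
```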
Visual
| Relationship | Meaning | Implies zero covariance? | Implies independence? |
|---|---|---|---|
| independent | full joint factors | yes, if moments exist | yes |
| uncorrelated | no linear association | yes | no |
| positive covariance | above-mean values tend to pair | no | no |
| negative covariance | above-mean pairs with below-mean | no | no |
| zero correlation | standardized covariance is zero | yes | no |
Worked example 1: covariance from a joint table
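Problem. Let the joint PMF be:

| | $Y=0$ | $Y=1$ |
|---|---|---|
| $X=0$ | 0.20 | 0.10 |
| $X=1$ | 0.30 | 0.40 |

Compute $\operatorname{Cov}(X,Y)$ and the correlation $\rho_{X,Y}$.
Method.
- Marginals: $P(X=0)=0.30$, $P(X=1)=0.70$, $P(Y=0)=0.50$, $P(Y=1)=0.50$.
- Means: $E[X]=0.70$ and $E[Y]=0.50$.
- Since $XY=1$ only when $X=1$ and $Y=1$, $E[XY]=P(X=1,Y=1)=0.40$.
- Covariance: $\operatorname{Cov}(X,Y)=E[XY]-E[X]\,E[Y]=0.40-(0.70)(0.50)=0.05$.
- Variances: $X$ and $Y$ are Bernoulli with parameters $0.70$ and $0.50$, so $\operatorname{Var}(X)=(0.70)(0.30)=0.21$ and $\operatorname{Var}(Y)=(0.50)(0.50)=0.25$.
- Correlation: $\rho_{X,Y}=\dfrac{0.05}{\sqrt{(0.21)(0.25)}}\approx 0.218$.
Checked answer. The covariance is $0.05$ and the correlation is about $0.218$. The variables are positively associated but not strongly.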
Worked example 2: zero covariance without independence
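Problem. Let $X$ take values $-1, 0, 1$ with probabilities $1/3$ each, and let $Y = X^2$. Show that $X$ and $Y$ are uncorrelated but not independent.
Method.
- Compute $E[X]$: $E[X]=\tfrac{1}{3}(-1+0+1)=0$.
- Values of $Y$ are $1, 0, 1$, so $E[Y]=\tfrac{1}{3}(1+0+1)=\tfrac{2}{3}$.
- Compute $E[XY]=E[X^3]$. Values are $-1, 0, 1$, so $E[XY]=\tfrac{1}{3}(-1+0+1)=0$.
- Covariance: $\operatorname{Cov}(X,Y)=E[XY]-E[X]\,E[Y]=0-0\cdot\tfrac{2}{3}=0$.
- Check independence. If $X$ and $Y$ were independent, then $P(X=0,Y=0)=P(X=0)\,P(Y=0)$. But $Y=0$ happens exactly when $X=0$, so $P(X=0,Y=0)=P(X=0)=\tfrac{1}{3}$, while $P(X=0)\,P(Y=0)=\tfrac{1}{3}\cdot\tfrac{1}{3}=\tfrac{1}{9}$.
Checked answer. $\operatorname{Cov}(X,Y)=0$, but $X$ and $Y$ are not independent. The relationship is deterministic but nonlinear: $Y=X^2$.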
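The independence check in this example can be reproduced with exact rational arithmetic, which avoids any floating-point doubt about whether a value is "really" zero. This is a small sketch using Python's standard `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

# Distribution from the example: X uniform on {-1, 0, 1}, Y = X**2.
support = [-1, 0, 1]
p = Fraction(1, 3)

# Covariance via E[XY] - E[X] E[Y], with XY = X**3.
e_x = sum(p * x for x in support)
e_y = sum(p * x**2 for x in support)
e_xy = sum(p * x**3 for x in support)
cov = e_xy - e_x * e_y

# Joint probability of {X = 0, Y = 0} versus the product of the marginals.
p_x0 = sum(p for x in support if x == 0)
p_y0 = sum(p for x in support if x**2 == 0)
p_joint = sum(p for x in support if x == 0 and x**2 == 0)

print(cov)          # 0
print(p_joint)      # 1/3
print(p_x0 * p_y0)  # 1/9
```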
Code
```python
import numpy as np

# Joint table example: rows index X in {0, 1}, columns index Y in {0, 1}.
joint = np.array([[0.20, 0.10], [0.30, 0.40]])
x_values = np.array([0, 1])
y_values = np.array([0, 1])

# Marginal PMFs of X and Y.
px = joint.sum(axis=1)
py = joint.sum(axis=0)

# First moments and the mixed moment E[XY].
EX = np.sum(x_values * px)
EY = np.sum(y_values * py)
EXY = sum(x * y * joint[i, j]
          for i, x in enumerate(x_values)
          for j, y in enumerate(y_values))

# Covariance via E[XY] - E[X]E[Y], then correlation.
cov = EXY - EX * EY
var_x = np.sum((x_values - EX)**2 * px)
var_y = np.sum((y_values - EY)**2 * py)
corr = cov / np.sqrt(var_x * var_y)
print(cov, corr)

# Zero covariance but dependence: X uniform on {-1, 0, 1}, Y = X**2.
x = np.array([-1, 0, 1])
p = np.ones(3) / 3
y = x**2
cov_xy = np.sum(x * y * p) - np.sum(x * p) * np.sum(y * p)
print(cov_xy)
```
Common pitfalls
- Saying "independent" when only correlation has been checked.
- Assuming zero covariance means no relationship. It only rules out linear association.
- Forgetting that covariance changes under unit scaling, while correlation does not.
- Computing sample correlation from data and treating it as the population correlation without uncertainty.
- Using covariance formulas without checking that second moments exist.
- Ignoring nonlinear plots. A curved relationship can have near-zero correlation.
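On the pitfall about sample correlation: one common way to attach uncertainty to a sample correlation is a percentile bootstrap over resampled pairs. The sketch below is one option among several, shown for illustration only; the simulated data, sample size, number of resamples, and interval level are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)  # hypothetical data with a modest linear relationship

r_hat = np.corrcoef(x, y)[0, 1]   # sample correlation, a point estimate only

n_boot = 2000
boot = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)  # resample (x, y) pairs with replacement
    boot[b] = np.corrcoef(x[idx], y[idx])[0, 1]

lo, hi = np.percentile(boot, [2.5, 97.5])  # rough 95% percentile interval
print(r_hat, lo, hi)
```

With a sample this small the interval is typically wide, which is the point: the printed `r_hat` alone says little about the population correlation.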