
Covariance, Correlation, and Independence

When two random variables are studied together, we often want to know whether they move together, move oppositely, or have no systematic linear relationship. Covariance measures joint variation in original units; correlation standardizes it to a number between $-1$ and $1$. Independence is stronger: it means the full distribution of one variable is unchanged by knowing the other.

Figure: Scatterplots of several patterns with their Pearson correlation coefficients. The examples show that association strength and visual pattern are related but not identical. Image: Wikimedia Commons, DenisBoigelot and Imagecreator, public domain.

This distinction matters throughout statistics. A zero correlation is informative, but it does not by itself mean that the variables are unrelated: many nonlinear relationships have zero covariance. Conversely, independence implies zero covariance when variances exist, but the reverse implication fails in general.

Definitions

For random variables $X$ and $Y$ with finite means, the covariance is

$$\operatorname{Cov}(X,Y)=E[(X-E[X])(Y-E[Y])].$$

An equivalent computational formula is

$$\operatorname{Cov}(X,Y)=E[XY]-E[X]E[Y].$$

The correlation coefficient is

$$\rho_{X,Y}=\frac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y},$$

where $\sigma_X=\sqrt{\operatorname{Var}(X)}$ and $\sigma_Y=\sqrt{\operatorname{Var}(Y)}$. Correlation is defined only when both standard deviations are positive and finite.
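As a quick numerical illustration of these definitions, the sketch below treats five made-up $(x, y)$ pairs as an equally weighted discrete distribution and computes the covariance both ways, then the correlation. The data values and variable names are assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical data: each (x, y) pair is assigned probability 1/5.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

mean_x, mean_y = x.mean(), y.mean()

# Definition: E[(X - E[X])(Y - E[Y])].
cov_definition = np.mean((x - mean_x) * (y - mean_y))

# Computational formula: E[XY] - E[X]E[Y].
cov_shortcut = np.mean(x * y) - mean_x * mean_y

# Population standard deviations (ddof=0 is NumPy's default for std).
rho = cov_definition / (x.std() * y.std())

print(cov_definition, cov_shortcut, rho)
```

Up to floating-point rounding, both covariance computations give 1.6 and the correlation is 0.8 for this toy distribution.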

Random variables $X$ and $Y$ are independent if events determined by $X$ are independent of events determined by $Y$. Equivalently,

$$P(X\in A,Y\in B)=P(X\in A)P(Y\in B)$$

for all suitable sets $A$ and $B$.

For discrete variables, independence is equivalent to

$$p_{X,Y}(x,y)=p_X(x)p_Y(y)$$

for all values $x,y$. For continuous variables with densities, it is equivalent to

$$f_{X,Y}(x,y)=f_X(x)f_Y(y)$$

on the support.
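For a finite joint PMF table, the discrete factorization condition can be checked directly by comparing the table with the outer product of its marginals. The following is a minimal sketch; the helper name `is_independent` and the tolerance are assumptions, and the second table reuses the numbers from worked example 1.

```python
import numpy as np

def is_independent(joint, tol=1e-12):
    """Return True if a joint PMF table equals the outer product of its marginals."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)  # marginal PMF of X (rows)
    py = joint.sum(axis=0)  # marginal PMF of Y (columns)
    return bool(np.allclose(joint, np.outer(px, py), rtol=0.0, atol=tol))

# A table built as a product of marginals factors by construction.
product_table = np.outer([0.3, 0.7], [0.5, 0.5])

# Same marginals, but the mass is distributed differently across cells.
dependent_table = np.array([[0.20, 0.10], [0.30, 0.40]])

print(is_independent(product_table), is_independent(dependent_table))
```

Both tables have the same marginals, so the marginals alone cannot reveal dependence; only the cell-by-cell comparison does.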

Variables are uncorrelated if $\operatorname{Cov}(X,Y)=0$. This is weaker than independence.

Key results

Independence implies zero covariance. If $X$ and $Y$ are independent and $E[X^2]$ and $E[Y^2]$ are finite, then

$$E[XY]=E[X]E[Y],$$

so

$$\operatorname{Cov}(X,Y)=0.$$

Variance of a sum.

$$\operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y)+2\operatorname{Cov}(X,Y).$$

For a sum of many variables,

$$\operatorname{Var}\left(\sum_{i=1}^n X_i\right)=\sum_{i=1}^n \operatorname{Var}(X_i)+2\sum_{i<j}\operatorname{Cov}(X_i,X_j).$$

If the variables are pairwise uncorrelated, the covariance terms vanish.
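The two-variable variance identity can be checked numerically. The sketch below uses simulated data with an arbitrary seed and a made-up dependence between `x` and `y`; because every quantity is computed from the same empirical distribution (dividing by $n$), the identity holds up to floating-point rounding, not just approximately.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Hypothetical correlated samples: y shares a component with x.
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)

# Empirical variances and covariance, all with divisor n.
var_x = np.var(x)
var_y = np.var(y)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

lhs = np.var(x + y)                  # Var(X + Y)
rhs = var_x + var_y + 2.0 * cov_xy   # Var(X) + Var(Y) + 2 Cov(X, Y)

print(lhs, rhs)
```

Dropping the covariance term would understate the variance of the sum here, since the simulated variables are positively related.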

Correlation bounds.

$$-1\le \rho_{X,Y}\le 1.$$

This follows from the Cauchy-Schwarz inequality:

$$|\operatorname{Cov}(X,Y)|\le \sigma_X\sigma_Y.$$

Perfect correlation. If $\rho_{X,Y}=1$ or $\rho_{X,Y}=-1$, then one variable is an exact positive or negative linear transformation of the other, except possibly on probability-zero events.

Units. Covariance has units equal to the product of the units of $X$ and $Y$. Correlation is unitless, which makes it easier to compare across settings.

Covariance can be read geometrically after centering. The product $(X-\mu_X)(Y-\mu_Y)$ is positive when both variables are above their means or both are below their means. It is negative when one is above its mean and the other is below. Averaging these products gives covariance. A positive covariance therefore means same-direction deviations dominate; a negative covariance means opposite-direction deviations dominate.

Correlation standardizes this average product by the two standard deviations. This standardization makes $\rho$ insensitive to changes of units such as inches to centimeters or dollars to cents. However, correlation still measures only linear association. A strong curved pattern can have correlation close to zero, and a high correlation does not imply that changing one variable causes the other to change.
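The effect of a unit change can be seen in a few lines. This sketch uses made-up height and weight values (assumptions for illustration) and converts inches to centimeters: the covariance is multiplied by the conversion factor, while the correlation is unchanged.

```python
import numpy as np

# Hypothetical measurements: heights in inches, weights in pounds.
height_in = np.array([60.0, 62.0, 65.0, 68.0, 70.0, 72.0])
weight_lb = np.array([115.0, 120.0, 140.0, 155.0, 160.0, 180.0])

height_cm = height_in * 2.54  # inches to centimeters

def cov(a, b):
    """Covariance of the empirical distribution (divisor n)."""
    return np.mean((a - a.mean()) * (b - b.mean()))

def corr(a, b):
    """Correlation: covariance divided by both standard deviations."""
    return cov(a, b) / (a.std() * b.std())

cov_in, cov_cm = cov(height_in, weight_lb), cov(height_cm, weight_lb)
rho_in, rho_cm = corr(height_in, weight_lb), corr(height_cm, weight_lb)

print(cov_in, cov_cm)  # covariance scales by 2.54
print(rho_in, rho_cm)  # correlation is the same in both unit systems
```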

For independent variables, no information about $X$ helps predict any event involving $Y$. For uncorrelated variables, the only guarantee is that the best linear predictor of $Y$ from $X$ has zero slope. That is a much narrower statement, which is why scatterplots matter alongside correlation coefficients in introductory statistics.

Covariance matrices extend these ideas to many variables. The diagonal entries are variances, and the off-diagonal entries are covariances. Such matrices appear in multivariate normal distributions, principal component analysis, least squares, and error propagation. A valid covariance matrix must be symmetric and positive semidefinite, meaning no linear combination of the variables can have negative variance.
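A sample covariance matrix makes these properties concrete. The sketch below simulates three variables (the data-generating recipe and the weight vector `a` are assumptions), then checks symmetry, checks that the eigenvalues are nonnegative, and confirms that the variance of a linear combination equals the quadratic form $a^\top \Sigma a$.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily

# Hypothetical data: 200 observations of 3 variables, the third
# constructed mostly from the first two.
data = rng.normal(size=(200, 3))
data[:, 2] = data[:, 0] - 0.5 * data[:, 1] + 0.1 * rng.normal(size=200)

# rowvar=False: columns are variables. np.cov divides by n - 1 by default.
sigma = np.cov(data, rowvar=False)

is_symmetric = bool(np.allclose(sigma, sigma.T))
eigenvalues = np.linalg.eigvalsh(sigma)          # for symmetric matrices
is_psd = bool(np.all(eigenvalues >= -1e-12))     # small tolerance for rounding

# Variance of the linear combination a . X, computed two ways.
a = np.array([1.0, -2.0, 0.5])
var_from_matrix = a @ sigma @ a
var_direct = np.var(data @ a, ddof=1)            # ddof=1 to match np.cov

print(is_symmetric, is_psd, var_from_matrix, var_direct)
```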

Correlation is also sensitive to mixtures. If data combine several subpopulations with different centers, the overall correlation can mainly reflect differences between groups rather than a relationship within each group. In probability terms, conditioning on a group variable changes the joint distribution. This is why covariance calculations should be interpreted together with the sampling process and any important conditioning variables.
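A small constructed dataset shows the mixture effect. In this sketch (all values are made up for illustration), $y$ decreases with $x$ inside each group, yet the pooled correlation is strongly positive because the two groups have different centers.

```python
import numpy as np

# Hypothetical data: within each group, y falls as x rises.
group_a_x = np.array([1.0, 2.0, 3.0])
group_a_y = np.array([3.0, 2.0, 1.0])
group_b_x = np.array([11.0, 12.0, 13.0])
group_b_y = np.array([13.0, 12.0, 11.0])

def pearson(a, b):
    """Pearson correlation via NumPy's correlation matrix."""
    return np.corrcoef(a, b)[0, 1]

r_a = pearson(group_a_x, group_a_y)
r_b = pearson(group_b_x, group_b_y)

# Pool the groups and ignore the group label.
pooled_x = np.concatenate([group_a_x, group_b_x])
pooled_y = np.concatenate([group_a_y, group_b_y])
r_pooled = pearson(pooled_x, pooled_y)

print(r_a, r_b, r_pooled)
```

Within each group the correlation is $-1$, while the pooled value is $146/154 \approx 0.948$, driven entirely by the gap between the group centers.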

When variables are binary (taking values 0 and 1), covariance has a direct event interpretation:

$$\operatorname{Cov}(X,Y)=P(X=1,Y=1)-P(X=1)P(Y=1).$$

It measures how much the probability of joint occurrence exceeds what independence would predict. This binary form is a useful sanity check because it connects covariance directly back to event probabilities.
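The binary formula can be verified exactly with rational arithmetic. In this sketch, the events are assumptions chosen for illustration: on a fair six-sided die, $X$ indicates an even face and $Y$ indicates a face of at least 4.

```python
from fractions import Fraction

faces = range(1, 7)       # fair six-sided die
prob = Fraction(1, 6)     # probability of each face

def x_ind(face):
    """Indicator of an even face."""
    return 1 if face % 2 == 0 else 0

def y_ind(face):
    """Indicator of a face of at least 4."""
    return 1 if face >= 4 else 0

# Event form: P(X=1, Y=1) - P(X=1) P(Y=1).
p_x = sum(prob for f in faces if x_ind(f) == 1)
p_y = sum(prob for f in faces if y_ind(f) == 1)
p_joint = sum(prob for f in faces if x_ind(f) == 1 and y_ind(f) == 1)
cov_events = p_joint - p_x * p_y

# Moment form: E[XY] - E[X] E[Y].
e_x = sum(prob * x_ind(f) for f in faces)
e_y = sum(prob * y_ind(f) for f in faces)
e_xy = sum(prob * x_ind(f) * y_ind(f) for f in faces)
cov_moments = e_xy - e_x * e_y

print(cov_events, cov_moments)
```

Both forms give $1/3 - 1/4 = 1/12$: the faces 4 and 6 satisfy both events more often than independence would predict.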

Visual

| Relationship | Meaning | Implies zero covariance? | Implies independence? |
| --- | --- | --- | --- |
| independent | full joint factors | yes, if moments exist | yes |
| uncorrelated | no linear association | yes | no |
| positive covariance | above-mean values tend to pair | no | no |
| negative covariance | above-mean pairs with below-mean | no | no |
| zero correlation | standardized covariance is zero | yes | no |

Worked example 1: covariance from a joint table

Problem. Let the joint PMF be:

| $p_{X,Y}(x,y)$ | $y=0$ | $y=1$ |
| --- | --- | --- |
| $x=0$ | $0.20$ | $0.10$ |
| $x=1$ | $0.30$ | $0.40$ |

Compute $\operatorname{Cov}(X,Y)$ and $\rho_{X,Y}$.

Method.

  1. Marginals:
     $$P(X=0)=0.30,\quad P(X=1)=0.70,$$
     $$P(Y=0)=0.50,\quad P(Y=1)=0.50.$$
  2. Means:
     $$E[X]=0(0.30)+1(0.70)=0.70,$$
     $$E[Y]=0(0.50)+1(0.50)=0.50.$$
  3. Since $XY=1$ only when $X=1$ and $Y=1$,
     $$E[XY]=1\cdot P(X=1,Y=1)=0.40.$$
  4. Covariance:
     $$\begin{aligned} \operatorname{Cov}(X,Y) &=E[XY]-E[X]E[Y]\\ &=0.40-(0.70)(0.50)\\ &=0.05. \end{aligned}$$
  5. Variances: $X$ and $Y$ are Bernoulli with parameters $0.70$ and $0.50$:
     $$\operatorname{Var}(X)=0.70(0.30)=0.21,$$
     $$\operatorname{Var}(Y)=0.50(0.50)=0.25.$$
  6. Correlation:
     $$\rho_{X,Y}=\frac{0.05}{\sqrt{0.21}\sqrt{0.25}} \approx \frac{0.05}{0.2291}\approx 0.2182.$$

Checked answer. The covariance is $0.05$ and the correlation is about $0.218$. The variables are positively associated but not strongly.

Worked example 2: zero covariance without independence

Problem. Let $X$ take values $-1,0,1$ with probabilities $1/3$ each, and let $Y=X^2$. Show that $X$ and $Y$ are uncorrelated but not independent.

Method.

  1. Compute $E[X]$:
     $$E[X]=(-1)\frac{1}{3}+0\cdot\frac{1}{3}+1\cdot\frac{1}{3}=0.$$
  2. The values of $Y$ are $1,0,1$, so
     $$E[Y]=1\cdot\frac{1}{3}+0\cdot\frac{1}{3}+1\cdot\frac{1}{3}=\frac{2}{3}.$$
  3. Compute $XY=X^3$. Its values are $-1,0,1$, so
     $$E[XY]=E[X^3]=(-1)\frac{1}{3}+0\cdot\frac{1}{3}+1\cdot\frac{1}{3}=0.$$
  4. Covariance:
     $$\operatorname{Cov}(X,Y)=E[XY]-E[X]E[Y]=0-0\cdot\frac{2}{3}=0.$$
  5. Check independence. If $X$ and $Y$ were independent, then
     $$P(Y=0\mid X=0)=P(Y=0).$$
     But $Y=0$ happens exactly when $X=0$, so
     $$P(Y=0\mid X=0)=1,$$
     while
     $$P(Y=0)=\frac{1}{3}.$$
     Since $1\ne \frac{1}{3}$, the variables are not independent.

Checked answer. $\operatorname{Cov}(X,Y)=0$, but $X$ and $Y$ are not independent. The relationship is deterministic but nonlinear: $Y=X^2$.

Code

import numpy as np

# Joint table example.
joint = np.array([[0.20, 0.10], [0.30, 0.40]])
x_values = np.array([0, 1])
y_values = np.array([0, 1])

# Marginal PMFs: rows index x, columns index y.
px = joint.sum(axis=1)
py = joint.sum(axis=0)

# Means and E[XY] computed from the joint table.
EX = np.sum(x_values * px)
EY = np.sum(y_values * py)
EXY = sum(x * y * joint[i, j]
          for i, x in enumerate(x_values)
          for j, y in enumerate(y_values))

# Covariance, variances, and correlation.
cov = EXY - EX * EY
var_x = np.sum((x_values - EX)**2 * px)
var_y = np.sum((y_values - EY)**2 * py)
corr = cov / np.sqrt(var_x * var_y)
print(cov, corr)

# Zero covariance but dependence.
x = np.array([-1, 0, 1])
p = np.ones(3) / 3
y = x**2
cov_xy = np.sum(x * y * p) - np.sum(x * p) * np.sum(y * p)
print(cov_xy)

Common pitfalls

  • Saying "independent" when only correlation has been checked.
  • Assuming zero covariance means no relationship. It only rules out linear association.
  • Forgetting that covariance changes under unit scaling, while correlation does not.
  • Computing sample correlation from data and treating it as the population correlation without uncertainty.
  • Using covariance formulas without checking that second moments exist.
  • Ignoring nonlinear plots. A curved relationship can have near-zero correlation.
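One way to see the uncertainty in a sample correlation is to resample the data. The sketch below uses a percentile bootstrap, which is a technique brought in here for illustration and not one the text prescribes; the simulated data, seed, sample size, and number of resamples are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, chosen arbitrarily

# Hypothetical sample of 50 paired observations.
n = 50
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

# Point estimate of the correlation from this one sample.
r_hat = np.corrcoef(x, y)[0, 1]

# Percentile bootstrap: resample (x, y) pairs with replacement.
n_boot = 2000
boot_r = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)
    boot_r[b] = np.corrcoef(x[idx], y[idx])[0, 1]

lo, hi = np.percentile(boot_r, [2.5, 97.5])
print(f"r = {r_hat:.3f}, 95% bootstrap interval = ({lo:.3f}, {hi:.3f})")
```

With a sample of this size the interval is typically wide, which is the point of the pitfall: a single sample correlation is an estimate, not the population value.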

Connections