Correlation is a single number between \(-1\) and \(1\) that measures how strongly two variables move together in a straight-line way. In this post, we’ll see that it can be understood in three equivalent ways:
- Covariance of normalised variables – the expected product of their z-scores.
- Cosine of an angle – measuring alignment between two vectors.
- Regression view – the square root of the proportion of variation explained by a simple linear regression.
We’ll prove why correlation is always between \(-1\) and \(1\), when it reaches those extremes, and why shifting or scaling the variables doesn’t change it — all using just covariance and the Cauchy–Schwarz inequality. Along the way we’ll build both statistical and geometric intuition.
Before diving into formulas, it helps to picture what correlation means. Imagine a scatterplot of \(X\) versus \(Y\):
- Tight cloud along a positive slope → correlation near \(+1\).
- Tight cloud along a negative slope → correlation near \(-1\).
- Diffuse, round cloud with no slope → correlation near \(0\).
The tighter the points hug a straight line, the closer the correlation is to \(\pm 1\). The sign tells you whether the slope is up or down.
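To make the picture concrete, here is a minimal numpy sketch (the slopes and noise levels are made up for illustration) that generates the three kinds of clouds and prints their sample correlations:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x = rng.normal(size=n)

# Tight cloud along a positive slope: y is mostly x plus a little noise.
y_pos = 2.0 * x + 0.1 * rng.normal(size=n)
# Tight cloud along a negative slope.
y_neg = -2.0 * x + 0.1 * rng.normal(size=n)
# Diffuse round cloud: y is independent of x.
y_none = rng.normal(size=n)

for label, y in [("positive", y_pos), ("negative", y_neg), ("none", y_none)]:
    rho = np.corrcoef(x, y)[0, 1]
    print(f"{label:>8}: rho = {rho:+.3f}")
# Typical output: positive near +1, negative near -1, none near 0.
```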
Formal definition of Correlation
Mathematically, the Pearson correlation is defined as
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$
where
$$\mathrm{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big]$$
is the covariance and \(\sigma_X, \sigma_Y\) are the standard deviations.
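As a quick sanity check, the definition can be evaluated directly on simulated data by plugging in sample moments (a minimal numpy sketch; the data-generating process is made up):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(size=10_000)

# Plug sample moments into the definition rho = Cov(X, Y) / (sigma_X * sigma_Y).
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov_xy / (x.std() * y.std())

print(rho)                      # definition, computed by hand
print(np.corrcoef(x, y)[0, 1])  # numpy's built-in; the two agree
```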
Corr as Cov of normalised variables
If we normalise both variables by subtracting their mean and dividing by their standard deviation:
$$\tilde{X} = \frac{X - \mu_X}{\sigma_X}, \quad \tilde{Y} = \frac{Y - \mu_Y}{\sigma_Y}$$
then we see that the correlation is just
$$\rho = \mathrm{Cov}(\tilde{X}, \tilde{Y}) = E[\tilde{X} \cdot \tilde{Y}]$$
the expected product of their z-scores. From this, we get:
- Scale- and shift-invariance: adding a constant to \(X\) or multiplying it by a positive number leaves its correlation with \(Y\) unchanged, and the same holds for transformations of \(Y\).
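Both facts are easy to check numerically. The sketch below (numpy, with simulated data) computes the mean product of z-scores and then applies an affine map with positive scale to \(X\):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(size=10_000)

# Correlation as the expected product of z-scores.
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
print(np.mean(zx * zy))
print(np.corrcoef(x, y)[0, 1])          # same value

# Shift- and scale-invariance: apply x -> 3x + 7 (positive scale).
print(np.corrcoef(3 * x + 7, y)[0, 1])  # unchanged
```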
Corr as Cosine of an angle
Now for a different viewpoint. Let \(V\) be the vector space of all random variables with finite mean and variance. Note that “vector” is used here in the abstract sense: an element of a vector space. If \(X\) and \(Y\) are mean-zero elements of \(V\), define an inner product:
$$\langle X, Y \rangle = E[XY]$$
The induced norm is:
$$|X| = \sqrt{E[X^2]} = \sigma_X$$
This lets us write:
$$\rho = \frac{\langle X, Y \rangle}{|X| \cdot |Y|}$$
So correlation is the inner product of the unit vectors \(X/|X|\) and \(Y/|Y|\) in this space.
Why |Correlation| ≤ 1
In any inner-product space, the Cauchy–Schwarz inequality says:
$$|\langle X, Y \rangle| \le |X| \cdot |Y|$$
Dividing through by \(|X| \cdot |Y|\) gives:
$$ |\rho| = \frac{|\langle X, Y \rangle|}{|X| \cdot |Y|} \le 1 $$
Equality occurs if and only if \(X\) and \(Y\) are scalar multiples of each other. And so we have that:
- \(\rho = +1 \iff Y = aX\) for some \(a > 0\)
- \(\rho = -1 \iff Y = aX\) for some \(a < 0\)
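Numerically, the extremes are easy to see: when one variable is an exact positive (or negative) multiple of the other, the sample correlation is \(+1\) (or \(-1\)) up to floating-point error. A tiny numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1_000)

print(np.corrcoef(x,  2.5 * x)[0, 1])   # exactly aligned -> +1.0
print(np.corrcoef(x, -0.7 * x)[0, 1])   # exactly opposed -> -1.0
```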
From Cauchy–Schwarz to Cosine
Since \(-1 \le \rho \le 1\), there exists \(\theta\) such that
$$\cos \theta = \rho = \frac{\langle X, Y \rangle}{|X| \cdot |Y|}$$
Note the similarity of the formula above to the more familiar expression \( x \cdot y / (|x||y|) \) for vectors \(x, y\) in 2D or 3D. There, \(\theta\) is just the angle between the vectors; in higher dimensions we keep that picture in mind and read \(\theta\) as an abstract “alignment” measure:
- \(\theta = 0\) → perfectly aligned (\(\rho = 1\))
- \(\theta = \pi\) → perfectly opposed (\(\rho = -1\))
- \(\theta = \pi/2\) → orthogonal (\(\rho = 0\))
The projection analogy still works: correlation measures how much of one variable lies “in the direction” of the other after removing mean and scale.
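If it helps, the angle can be computed explicitly. The numpy sketch below (simulated data again) converts a sample correlation into an angle in degrees, and also shows the projection reading: with unit-length z-score vectors, the signed length of the projection of one onto the other is exactly \(\rho\).

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=5_000)
y = 0.8 * x + rng.normal(size=5_000)

rho = np.corrcoef(x, y)[0, 1]
theta = np.degrees(np.arccos(rho))
print(f"rho = {rho:.3f}, angle = {theta:.1f} degrees")

# Projection view: normalise the centered vectors to unit Euclidean length,
# then their dot product (the projection length) is the correlation itself.
zx = (x - x.mean()) / (x.std() * np.sqrt(len(x)))
zy = (y - y.mean()) / (y.std() * np.sqrt(len(y)))
print(zx @ zy)  # matches rho
```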
Regression Perspective
Another route: consider the simple linear regression of \(Y\) on \(X\):
$$\hat{Y} = a + \beta X$$
The proportion of variance explained is:
$$R^2 := \frac{\text{variance explained}}{\text{total variance}} = \frac{\mathrm{Var}(\hat{Y})}{\mathrm{Var}(Y)} = \frac{\beta^2 \, \mathrm{Var}(X)}{\mathrm{Var}(Y)}$$
But we know (simple linear regression formula) that:
$$\beta = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}$$
So we get
$$R^2 = \frac{\frac{\mathrm{Cov}(X, Y)^2}{\mathrm{Var}(X)^2} \cdot \mathrm{Var}(X)}{\mathrm{Var}(Y)} = \frac{\mathrm{Cov}(X, Y)^2}{\mathrm{Var}(X) \, \mathrm{Var}(Y)} = \rho^2$$
Thus \(R^2\), defined as the proportion of variance explained, is exactly the squared correlation, so \(|\rho| = \sqrt{R^2}\).
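Here is a small numerical check of the identity, using numpy's least-squares polynomial fit as the regression routine (any simple-regression implementation would do):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=10_000)
y = 1.5 * x + rng.normal(size=10_000)

# Fit Y-hat = a + beta * X by least squares.
beta, a = np.polyfit(x, y, 1)
y_hat = a + beta * x

r_squared = np.var(y_hat) / np.var(y)   # proportion of variance explained
rho = np.corrcoef(x, y)[0, 1]

print(r_squared, rho**2)                # the two agree
```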
Concrete Finite-Sample Version
If the vector space of all random variables with finite variance feels abstract, think in terms of samples:
Given data \((x_1, y_1), \dots, (x_n, y_n)\):
- First, mean-center the data: \(u = (x_1 - \bar{x}, \dots, x_n - \bar{x})\), \(v = (y_1 - \bar{y}, \dots, y_n - \bar{y})\). Note that \(u\) and \(v\) live in the subspace of \(\mathbb{R}^n\) of vectors whose entries sum to zero.
- Define an inner product: \(\langle u, v \rangle = \frac{1}{n} \sum_{i=1}^n u_i v_i\) (the sample covariance)
- This induces a norm: \(|u| = \sqrt{\frac{1}{n} \sum_{i=1}^n u_i^2}\) (the sample standard deviation)
Then the sample correlation is:
$$\hat{\rho} = \frac{\langle u, v \rangle}{|u| \cdot |v|}$$
This is exactly the cosine of the angle between \(u\) and \(v\) in \(n\)-dimensional space.
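Written out in code, the recipe above is just a dot product of mean-centered vectors divided by their lengths; the \(1/n\) factors cancel, so the plain Euclidean cosine gives the same answer (a numpy sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=500)
y = -0.4 * x + rng.normal(size=500)

# Mean-center the data.
u = x - x.mean()
v = y - y.mean()

# Cosine of the angle between u and v in R^n.
cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cos_theta)
print(np.corrcoef(x, y)[0, 1])  # identical up to floating point
```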
Summary of Views
| Perspective | Formula | Interpretation |
|---|---|---|
| Statistical | \(E[\tilde{X} \tilde{Y}]\) | Expected product of z-scores |
| Geometric | \(\cos \theta\) | Alignment between vectors |
| Regression | \(\sqrt{R^2}\) | Strength of best one-variable linear fit |