Covariance vs. Correlation

**Explain both covariance and correlation formulaically, and compare and contrast them.**

This is the same question as problem #4 in the Statistics Chapter of Ace the Data Science Interview!

For any two random variables X and Y, the covariance, a measure of their linear relationship, is defined as follows:

$Cov(X,Y) = E[(X-E[X])(Y-E[Y])] = E[XY] - E[X]E[Y]$
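
The second equality is worth being able to justify in an interview. A short derivation, using only linearity of expectation and the fact that E[X] and E[Y] are constants:

$E[(X-E[X])(Y-E[Y])] = E[XY - XE[Y] - YE[X] + E[X]E[Y]]$

$= E[XY] - E[X]E[Y] - E[X]E[Y] + E[X]E[Y] = E[XY] - E[X]E[Y]$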

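Specifically, the sign of the covariance indicates the direction of the linear relationship between X and Y: positive when they tend to increase together, negative when one tends to decrease as the other increases. Its magnitude is unbounded, so it can take any real value. Covariance is expressed in the product of the units of X and Y (which may differ), so its size depends on the scale of each variable and is hard to interpret on its own.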
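
To make the unit dependence concrete, here is a minimal sketch in plain Python. The data are made up for illustration, the helper names (`mean`, `covariance`, `covariance_shortcut`) are hypothetical, and expectations are approximated by simple sample averages (dividing by n rather than n - 1).

```python
def mean(values):
    """Arithmetic mean, standing in for the expectation E[.]."""
    return sum(values) / len(values)

def covariance(xs, ys):
    """E[(X - E[X])(Y - E[Y])], with expectations taken as sample averages."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def covariance_shortcut(xs, ys):
    """E[XY] - E[X]E[Y], the equivalent shortcut form."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

# Hypothetical measurements, invented for this example.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]    # metres
weights_kg = [50.0, 58.0, 65.0, 77.0, 85.0]   # kilograms

cov_m = covariance(heights_m, weights_kg)              # units: m * kg
cov_shortcut = covariance_shortcut(heights_m, weights_kg)

# Re-express the same heights in centimetres: the relationship is unchanged,
# but the covariance is multiplied by 100.
heights_cm = [100 * h for h in heights_m]
cov_cm = covariance(heights_cm, weights_kg)            # units: cm * kg

print(round(cov_m, 4), round(cov_shortcut, 4), round(cov_cm, 4))
```

Both formulas agree up to floating-point error, and changing the units of one variable rescales the covariance by the same factor, which is why the raw magnitude says little about the strength of the relationship.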

The correlation (Pearson correlation, not to be confused with Spearman rank correlation) between X and Y is the normalized version of covariance that takes into account the variances of X and Y:

$\rho(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)Var(Y)}}$

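Since correlation results from dividing covariance by the standard deviations of X and Y, it is dimensionless (unlike covariance) and always lies between -1 and 1 inclusive (also unlike covariance). The bound follows from the Cauchy-Schwarz inequality, and the extremes -1 and 1 correspond to a perfect linear relationship.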
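
The sketch below illustrates these properties in plain Python. As before, the data are invented, the helper names (`mean`, `covariance`, `correlation`) are hypothetical, and expectations are approximated by sample averages; it is an illustration under those assumptions rather than a reference implementation.

```python
import math

def mean(values):
    """Arithmetic mean, standing in for the expectation E[.]."""
    return sum(values) / len(values)

def covariance(xs, ys):
    """Cov(X, Y) = E[(X - E[X])(Y - E[Y])] using sample averages."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def correlation(xs, ys):
    """Pearson correlation: Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    # Var(X) is Cov(X, X), so the same helper gives both variances.
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical measurements, invented for this example.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]    # metres
weights_kg = [50.0, 58.0, 65.0, 77.0, 85.0]   # kilograms
heights_cm = [100 * h for h in heights_m]     # same heights, in centimetres

rho_m = correlation(heights_m, weights_kg)
rho_cm = correlation(heights_cm, weights_kg)  # unchanged by the unit change

# Flipping the sign of one variable flips the sign of the correlation.
rho_flipped = correlation(heights_m, [-w for w in weights_kg])

# An exact linear function of X gives a correlation of 1 (up to rounding).
rho_linear = correlation(heights_m, [2 * h + 1 for h in heights_m])

print(round(rho_m, 4), round(rho_cm, 4), round(rho_flipped, 4), round(rho_linear, 4))
```

Rescaling a variable changes the covariance but leaves the correlation untouched, because the same factor appears in both the numerator and the denominator.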