Covariance vs. Correlation
Explain both covariance and correlation formulaically, and compare and contrast them.
This is the same question as problem #4 in the Statistics Chapter of Ace the Data Science Interview!
For any two random variables X and Y, the covariance, a measure of their linear relationship, is defined as:

$$\mathrm{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]\,E[Y]$$
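Specifically, covariance indicates the direction of the linear relationship between X and Y: a positive value means the two tend to move together, and a negative value means they tend to move in opposite directions. It can take any real value, so its magnitude alone is hard to interpret. The units of covariance are the product of the units of X and Y, which may differ.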
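To make the dependence on units concrete, here is a minimal sketch that computes covariance as the mean of the products of deviations from each mean (the population form, dividing by n). The helper name `covariance` and the height/weight numbers are made up for illustration only.

```python
def covariance(xs, ys):
    """Population covariance: mean of (x - mean_x) * (y - mean_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Hypothetical data: heights in meters, weights in kilograms.
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]
weights_kg = [50.0, 58.0, 66.0, 77.0, 84.0]

# Positive: taller people in this toy sample tend to weigh more.
cov_m = covariance(heights_m, weights_kg)

# The same heights expressed in centimeters.
heights_cm = [100 * h for h in heights_m]
cov_cm = covariance(heights_cm, weights_kg)

print(cov_m, cov_cm)
```

Changing the height unit from meters to centimeters multiplies the covariance by 100 even though the underlying relationship is unchanged, which is why the raw magnitude of covariance says little on its own.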
The correlation (Pearson correlation, not to be confused with Spearman rank correlation) between X and Y is the normalized version of covariance, obtained by dividing by the standard deviations of X and Y:

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
Since correlation results from scaling covariance, it is dimensionless (unlike covariance) and always lies between -1 and 1, inclusive (also unlike covariance). It therefore conveys both the direction and the strength of the linear relationship.
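The sketch below illustrates both properties: the value stays within [-1, 1] and does not change when a variable is rescaled. It assumes the population form of covariance and variance and uses hypothetical helper names and toy data.

```python
import math

def covariance(xs, ys):
    """Population covariance: mean of (x - mean_x) * (y - mean_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    # Var(X) is Cov(X, X); both variances must be nonzero for this to be defined.
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: heights in meters, weights in kilograms.
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]
weights_kg = [50.0, 58.0, 66.0, 77.0, 84.0]

r_m = correlation(heights_m, weights_kg)

# Rescaling heights to centimeters leaves the correlation unchanged.
r_cm = correlation([100 * h for h in heights_m], weights_kg)

# A perfectly linear, decreasing relationship gives a correlation of -1
# (up to floating-point rounding).
r_neg = correlation(heights_m, [-2 * h for h in heights_m])

print(r_m, r_cm, r_neg)
```

Dividing by the standard deviations cancels the units, so the same toy sample yields the same correlation in meters or centimeters, whereas its covariance would differ by a factor of 100.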