Back to questions
Given two vectors, represented as lists X and Y, return the Pearson Correlation Coefficient.
p.s. this is the same as question 9.8 in Ace the Data Science Interview.
p.p.s. AQR is a competitive hedge fund so they test their Quants & Data Science on both coding & math/stats skills, so they expect you to know the Pearson correlation coefficient formula.
But, if you don't know the formula, don't worry – just use the hints below to learn more about Pearson correlation 👇
Correlation is a statistical measure used to assess the degree of similarity between two sets of vectors in a multi-dimensional space.
Here's the formula for the Pearson correlation coefficient:
The numerator is the covariance of X and Y, and the denominator is the product of the standard deviation of X and the standard deviation of Y.
To find the covariance of vector X and Y, use this formula:
In this formula:
We'll first start by creating two helper functions for calculating mean and standard deviation.
Using these helper functions, here's the full solution:
Calculating both the mean and standard deviation is O(N) runtime (since both take a single sum across all N elements).
Therefore, the correlation runtime is O(N), since we calculate a few means and standard deviations, as well as iterate over all N elements once through.
The space complexity is O(N) to keep track of N elements in the correlation to be processed.
Python