What are the assumptions underlying linear regression?

This is the same question as problem #1 in the Machine Learning Chapter of Ace the Data Science Interview!

The main assumptions underlying linear regression are the following:

a) Linearity: The relationship between the feature set $X$ and the target variable $Y$ is linear.

b) Homoscedasticity: The variance of the residuals is constant across all values of the features.

c) Independence: All observations are independent of one another.

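d) Normality: The residuals are assumed to be Normally distributed. Equivalently, $Y$ is Normal conditional on $X$; the marginal distribution of $Y$ itself does not need to be Normal.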
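One common way to write these four assumptions compactly is shown below. The symbols $\beta_0$, $\beta_j$, $\varepsilon_i$, and $\sigma^2$ are notation introduced here for illustration and do not appear in the original answer:

$$
Y_i = \beta_0 + \sum_{j=1}^{p} \beta_j X_{ij} + \varepsilon_i,
\qquad
\varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)
$$

Here the linear mean term corresponds to (a), the single shared variance $\sigma^2$ to (b), independence of the $\varepsilon_i$ to (c), and the Normal distribution of the errors to (d).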

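With respect to independence and normality, the term "i.i.d." (independent and identically distributed) is commonly used to describe the error terms. If any of these assumptions is violated, forecasts and confidence intervals based on the model can be misleading. A non-linear relationship biases the fitted coefficients and predictions, while non-constant variance or dependent observations mainly make the standard errors, and therefore the confidence intervals, unreliable. In either case the linear regression is likely to perform poorly out of sample.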
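In practice these assumptions are usually checked on the residuals of a fitted model. The following is a minimal sketch using only NumPy on simulated data; the helper names (`fit_ols`, `durbin_watson`, `spread_trend`, `excess_kurtosis`) and the simulated coefficients are assumptions made for illustration, and the checks are rough heuristics rather than formal tests.

```python
import numpy as np

def fit_ols(x, y):
    """Least-squares fit with an intercept; returns (coefficients, fitted, residuals)."""
    A = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    return beta, fitted, y - fitted

def durbin_watson(resid):
    """Independence heuristic: close to 2 when consecutive residuals are uncorrelated."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def spread_trend(fitted, resid):
    """Homoscedasticity heuristic: correlation of |residual| with the fitted value.
    Close to 0 when the residual spread does not change with the prediction."""
    return np.corrcoef(fitted, np.abs(resid))[0, 1]

def excess_kurtosis(resid):
    """Normality heuristic: close to 0 for Normally distributed residuals."""
    z = (resid - resid.mean()) / resid.std()
    return np.mean(z ** 4) - 3.0

# Simulated data (hypothetical coefficients: intercept 2, slope 3).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1.0, 10.0, size=n)

# Case 1: all four assumptions hold.
y_ok = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=n)
beta_ok, fitted_ok, resid_ok = fit_ols(x, y_ok)

# Case 2: heteroscedastic noise whose standard deviation grows with x.
y_het = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=n) * x
beta_het, fitted_het, resid_het = fit_ols(x, y_het)

for name, fitted, resid in [("ok", fitted_ok, resid_ok), ("het", fitted_het, resid_het)]:
    print(name,
          "DW:", round(durbin_watson(resid), 3),
          "spread trend:", round(spread_trend(fitted, resid), 3),
          "excess kurtosis:", round(excess_kurtosis(resid), 3))
```

In the second case the coefficient estimates stay close to the true values, but the spread of the residuals grows with the fitted value, which is the pattern that makes the usual standard errors unreliable. A plot of residuals against fitted values is the more common way to inspect the same thing.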