Econometric Theory/Assumptions of Classical Linear Regression Model

The estimators that we create through linear regression give us a relationship between the variables. However, performing a regression does not by itself guarantee that the estimated relationship is reliable. To establish reliability, we must know the properties of the estimators $$ \hat{\alpha}, \hat{\beta} $$ and show that some basic assumptions about the data hold. One must understand that having a good dataset is of enormous importance for applied economic research.

=Unbiasedness=

Under the following four assumptions, OLS is unbiased. This means that:

$$E(\hat \alpha) = \alpha$$

$$E(\hat \beta) = \beta$$

==Linearity==
The model must be linear in the parameters.

The parameters are the coefficients on the independent variables, like $$\alpha$$ and $$\beta$$. The model must be linear in these parameters, so a term like $$\beta^2$$ or $$e^\beta$$ would violate this assumption.

The relationship between Y and X requires that the dependent variable $$Y$$ is a linear combination of the explanatory variables and the error term. This assumption also requires that the model be complete (model specification): all relevant variables have to be included in the model. The model has to be linear in the parameters, but it does not have to be linear in the variables. Equations 1 and 2 depict a model which is linear in both parameters and variables; note that they show the same model in different notation:

$$Y_i = \alpha + \beta X_i + u_i \quad (1)$$

$$Y = \alpha + \beta X + u \quad (2)$$

In order for OLS to work, the specified model has to be linear in parameters. If the true relationship between $$Y$$ and the parameters is non-linear, it is not possible to estimate the coefficient $$\beta$$ in any meaningful way. Equation 3 shows a model in which the parameter $$\beta$$ enters quadratically:

$$Y_i = \alpha + \beta^2 X_i + u_i \quad (3)$$

Assumption 1 of the CLRM requires the model to be linear in parameters, so OLS is not able to estimate Equation 3 in any meaningful way. However, assumption 1 does not require the model to be linear in variables: OLS will produce a meaningful estimate of $$\beta$$ in Equation 4.

$$Y_i = \alpha + \beta X_i^2 + u_i \quad (4)$$

Using the method of ordinary least squares (OLS) therefore allows us to estimate models which are linear in parameters, even if they are non-linear in variables. On the contrary, it is not possible to estimate models which are non-linear in parameters, even if they are linear in variables. Finally, every model estimated with OLS should contain all relevant explanatory variables, and all included explanatory variables should be relevant.
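The point above can be sketched with a small simulation (an illustration of our own, with invented parameter values): a model that is nonlinear in the variable $$X$$ but linear in the parameters, such as Equation 4, can be estimated by OLS after simply defining the regressor as $$z = x^2$$.

```python
import random

# Illustrative sketch (parameter values are our own): Y_i = alpha + beta*X_i^2 + u_i
# is nonlinear in the variable X but linear in the parameters alpha and beta.
random.seed(0)
alpha_true, beta_true = 2.0, 0.5

x = [i / 10 for i in range(1, 101)]
y = [alpha_true + beta_true * xi ** 2 + random.gauss(0, 1) for xi in x]

# Define the regressor as z = x^2; the model is then linear in z,
# so the usual OLS slope/intercept formulas apply unchanged.
z = [xi ** 2 for xi in x]
z_bar = sum(z) / len(z)
y_bar = sum(y) / len(y)

beta_hat = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / \
           sum((zi - z_bar) ** 2 for zi in z)
alpha_hat = y_bar - beta_hat * z_bar

print(alpha_hat, beta_hat)  # estimates close to the true 2.0 and 0.5
```

A model with $$\beta^2$$ in it, by contrast, could not be rescued by any such transformation of the data, which is why linearity in parameters is the binding requirement.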

==Sample Variation==
The $$x_i$$s cannot all have the same value: if they did, the sample variance of $$x$$ would be zero and the OLS slope could not be computed. In a multiple regression this generalizes to the requirement that there be no perfect multicollinearity among the regressors.
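To see why this assumption is needed, recall that the OLS slope divides by the sum of squared deviations of $$x$$. A minimal sketch (our own illustration):

```python
# If every x_i takes the same value, the denominator of the OLS slope,
# sum((x_i - x_bar)^2), is exactly zero and beta_hat is undefined.
x = [3.0, 3.0, 3.0, 3.0]
x_bar = sum(x) / len(x)
ssx = sum((xi - x_bar) ** 2 for xi in x)
print(ssx)  # 0.0 -- dividing by this to form beta_hat would fail
```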

==Random Sampling==
The observations must be a random sample from the population, so that the draws are independent of one another. In particular, there is no correlation between the $$x$$ values of two different observations: $$Cov(x_i, x_j) = 0$$ for $$i \neq j$$.

==Zero Conditional Mean==
The mean of the error terms, given a specific value of the independent variable $$x_i$$, is zero: $$E(u_i | x_i) = 0$$.
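With the four assumptions above in place, unbiasedness can be illustrated by a small Monte Carlo experiment (a sketch of our own, with invented parameter values): averaged over many random samples, the OLS slope estimates center on the true $$\beta$$.

```python
import random

# Hedged sketch: simulate many samples satisfying the four assumptions
# (linear model, sample variation, random sampling, E(u|x) = 0) and
# check that the OLS slope is right on average.
random.seed(1)
alpha_true, beta_true = 1.0, 3.0
n, reps = 50, 2000

beta_hats = []
for _ in range(reps):
    x = [random.uniform(0, 10) for _ in range(n)]   # random sampling, with variation
    u = [random.gauss(0, 1) for _ in range(n)]      # errors with E(u|x) = 0
    y = [alpha_true + beta_true * xi + ui for xi, ui in zip(x, u)]
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
        sum((xi - x_bar) ** 2 for xi in x)
    beta_hats.append(b)

mean_beta_hat = sum(beta_hats) / reps
print(mean_beta_hat)  # close to the true beta of 3.0
```

Any single sample's $$\hat \beta$$ misses the true value, but the average across samples does not; that is exactly what $$E(\hat \beta) = \beta$$ asserts.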

=Efficiency of OLS (Ordinary Least Squares)=

Given the following two assumptions, OLS is the Best Linear Unbiased Estimator (BLUE). This means that out of all possible linear unbiased estimators, OLS gives the most precise estimates of $$\alpha$$ and $$\beta$$.

With the third assumption (normally distributed errors), OLS is the Best Unbiased Estimator (BUE), so it even beats non-linear unbiased estimators. Also, given this assumption, the standardized estimator of $$\hat \alpha$$ follows the Student's t-distribution about $$\alpha$$, and that of $$\hat \beta$$ follows the Student's t-distribution about $$\beta$$.

==No Heteroskedasticity==
The variance of the error terms is constant: $$ Var(u_i|x_i) = \sigma^2$$. This means that the variance of the error term $$u_i$$ does not depend on the value of $$x_i$$. If this is the case, the error terms are called homoskedastic. This is not always true of economic data. For example, the variation in a person's wage varies with their level of education: high-school dropouts show little variation in their wages, while people with Ph.D.s may see very different wages.
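A violation of this assumption can be sketched by letting the error's standard deviation grow with $$x$$ (an illustration of our own, with invented numbers):

```python
import random

# Sketch of heteroskedasticity: the error's standard deviation depends on x,
# so Var(u_i | x_i) is not a single constant sigma^2.
random.seed(2)
n = 4000
x = [random.uniform(0, 10) for _ in range(n)]
u = [random.gauss(0, 0.5 + 0.3 * xi) for xi in x]  # sd of u grows with x

def sample_var(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)

low = [ui for xi, ui in zip(x, u) if xi < 5]
high = [ui for xi, ui in zip(x, u) if xi >= 5]
print(sample_var(low), sample_var(high))  # variance is larger where x is larger
```

In the wage example, $$x$$ would play the role of education: the spread of the errors widens as education rises.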

==No Serial Correlation==
The error terms are independently distributed, so that their covariance is zero: $$Cov(u_i, u_j | x_i, x_j) = 0 \quad \forall\, i \neq j$$.
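A classic violation is a first-order autoregressive, AR(1), error process, common in time-series data. A brief sketch (our own illustration, with an invented $$\rho$$):

```python
import random

# Sketch of serial correlation: AR(1) errors u_t = rho*u_{t-1} + e_t,
# so adjacent error terms covary instead of being independent.
random.seed(3)
rho, n = 0.8, 5000
u = [random.gauss(0, 1)]
for _ in range(n - 1):
    u.append(rho * u[-1] + random.gauss(0, 1))

m = sum(u) / n
cov1 = sum((u[t] - m) * (u[t + 1] - m) for t in range(n - 1)) / (n - 1)
var = sum((ut - m) ** 2 for ut in u) / n
print(cov1 / var)  # first-order autocorrelation, near the rho of 0.8
```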

==Normally Distributed Errors==
The error terms are normally distributed: $$u_i \sim N(0,\sigma^2)$$