R Programming/Linear Models

Standard linear model
In this section we present estimation functions for the standard linear model estimated by ordinary least squares (OLS). Heteroskedasticity and endogeneity are treated below. The main estimation function is lm.

Fake data simulations
We first generate a fake dataset with no heteroskedasticity, no endogeneity and no correlation between the error terms. Therefore the ordinary least squares (OLS) estimator is unbiased and efficient. We choose a model with two variables and set all the coefficients equal to one.

$$ y_i = 1 + x_{1,i} + x_{2,i} + u_i $$
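A simulation along these lines might look as follows (the sample size, the seed and the use of standard normal draws are arbitrary choices, not fixed by the text):

```r
# Fake data for the model y = 1 + x1 + x2 + u: two regressors,
# homoskedastic and independent standard normal errors
set.seed(1)                  # for reproducibility
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
u  <- rnorm(n)
y  <- 1 + x1 + x2 + u        # all coefficients equal to one
```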

Least squares estimation

 * The standard function to estimate a simple linear model is <tt>lm</tt>.
 * <tt>lsfit</tt> also performs the least squares procedure but its output is not formatted in a convenient way.
 * <tt>ols</tt> in the Design package is another alternative.

We estimate the model using <tt>lm</tt>, store the results in <tt>fit</tt> and print them with <tt>summary</tt>, the standard function.
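A minimal sketch of this estimation, assuming fake data generated as described above:

```r
# Simulate the fake data, then estimate by OLS with lm
set.seed(1)
n  <- 1000
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)   # OLS estimation
summary(fit)             # coefficients, standard errors, R^2, F test
```

All three estimated coefficients should be close to their true value of one.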

There are some alternatives for displaying the results.
 * <tt>display</tt> in the arm package is one of them.
 * <tt>coefplot</tt> (arm) graphs the estimated coefficients with confidence intervals. This is a good way to present the results.
 * <tt>mtable</tt> in the memisc package can display the results of a set of regressions in the same table.

<tt>fit</tt> is a list of objects. You can see the list of these objects by typing <tt>names(fit)</tt>. We can also apply functions to <tt>fit</tt>.

We can get the estimated coefficients using <tt>fit$coeff</tt> or <tt>coef(fit)</tt>.

<tt>se.coef</tt> (arm) returns the standard error of the estimated coefficients.

The vector of fitted values can be returned via <tt>fit$fitted</tt>, <tt>fitted(fit)</tt> or the <tt>predict</tt> function. The <tt>predict</tt> function also returns standard errors and confidence intervals for predictions.

The vector of residuals:

The number of degrees of freedom:
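The extraction functions above can be illustrated on a small simulated dataset (the sample size and variable names are arbitrary):

```r
set.seed(1)
x1 <- rnorm(100); x2 <- rnorm(100)
y  <- 1 + x1 + x2 + rnorm(100)
fit <- lm(y ~ x1 + x2)
coef(fit)                                  # estimated coefficients (also fit$coefficients)
head(fitted(fit))                          # fitted values (also fit$fitted.values)
head(residuals(fit))                       # vector of residuals
df.residual(fit)                           # degrees of freedom: 100 - 3 = 97
new <- data.frame(x1 = 0, x2 = 0)
predict(fit, new, interval = "confidence") # prediction with a confidence interval
```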

Confidence intervals
We can get the confidence intervals using <tt>confint</tt> or <tt>conf.intervals</tt> in the alr3 package.
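For example, with <tt>confint</tt> (base R; the data below are simulated for illustration):

```r
set.seed(1)
x <- rnorm(100)
y <- 1 + x + rnorm(100)
fit <- lm(y ~ x)
confint(fit)               # 95% intervals by default
confint(fit, level = 0.9)  # 90% intervals (narrower)
```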

Tests
<tt>coeftest</tt> (lmtest) performs the Student t test and z test on coefficients.

<tt>linear.hypothesis</tt> (car) performs a finite sample F test on a linear hypothesis or an asymptotic Wald test using $$\chi^2$$ statistics. In recent versions of the car package the function is called <tt>linearHypothesis</tt>.

See also <tt>waldtest</tt> (lmtest) for nested models.

Analysis of variance
We can also perform an analysis of variance using <tt>anova</tt>.
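A sketch on simulated data: <tt>anova</tt> on a single fit gives the sequential ANOVA table, and on two nested fits it gives the F test of the restriction.

```r
set.seed(1)
x1 <- rnorm(200); x2 <- rnorm(200)
y  <- 1 + x1 + x2 + rnorm(200)
fit  <- lm(y ~ x1 + x2)
anova(fit)            # sequential ANOVA table for the fitted model
fit0 <- lm(y ~ x1)    # restricted model excluding x2
anova(fit0, fit)      # F test of the restriction
```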

Model Search and information criteria
The stats4 package includes the <tt>AIC</tt> and <tt>BIC</tt> functions:

The <tt>step</tt> function performs a model search using the Akaike Information Criterion (AIC).
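For instance, on simulated data with one irrelevant regressor (an arbitrary illustration):

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)   # x3 is irrelevant noise
y  <- 1 + x1 + x2 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3)
AIC(fit)
BIC(fit)                      # heavier penalty than AIC since log(n) > 2
best <- step(fit, trace = 0)  # backward search on AIC
formula(best)                 # the retained specification
```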

Zelig

 * The method is also supported in Zelig.

Bayesian estimation

 * <tt>MCMCregress</tt> (MCMCpack)
 * <tt>BLR</tt> (BLR)

Heteroskedasticity

 * See the lmtest and sandwich packages.
 * <tt>gls</tt> (nlme) computes the generalized least squares estimator.
 * See "Cluster-robust standard errors using R" (pdf) by Mahmood Arai. He suggests two functions for cluster-robust standard errors: <tt>clx</tt> allows for one-way clustering and <tt>mclx</tt> for two-way clustering. They can be loaded with the following command: <tt>source("http://people.su.se/~ma/clmclx.R")</tt>.
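Combining <tt>vcovHC</tt> (sandwich) with <tt>coeftest</tt> (lmtest) is the usual route. As an illustration of what these packages compute, HC1 heteroskedasticity-robust standard errors can be written by hand in base R (the data-generating process below is an arbitrary heteroskedastic example):

```r
# HC1 robust standard errors by hand; in practice use vcovHC (sandwich)
set.seed(1)
n <- 500
x <- rnorm(n)
u <- rnorm(n, sd = exp(x / 2))   # error variance depends on x
y <- 1 + x + u
fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- residuals(fit)
k <- ncol(X)
bread <- solve(t(X) %*% X)
meat  <- t(X) %*% (X * e^2)                # X' diag(e^2) X
vc    <- n / (n - k) * bread %*% meat %*% bread
robust_se <- sqrt(diag(vc))                # HC1 robust standard errors
cbind(classical = sqrt(diag(vcov(fit))), robust = robust_se)
```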

Robustness
Cook's distance

Influence plot:

Leverage plots:

Bonferroni's outlier test: See also <tt>outlier.t.test</tt> in the alr3 package.


 * <tt>inf.index</tt> in the alr3 package computes all the robustness statistics (Cook's distance, studentized residuals, outlier test, etc.)
 * <tt>rlm</tt> performs a robust estimation


 * See UCLA example
 * See also the robustbase package
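Several of these diagnostics are available in base R: <tt>cooks.distance</tt>, <tt>hatvalues</tt> and the <tt>plot</tt> method for <tt>lm</tt> objects. A sketch on simulated data with a planted outlier (the outlier value and sample size are arbitrary):

```r
set.seed(1)
x <- rnorm(50)
y <- 1 + x + rnorm(50)
y[1] <- 20                     # plant a gross outlier
fit <- lm(y ~ x)
cd <- cooks.distance(fit)      # Cook's distance for each observation
unname(which.max(cd))          # the planted outlier has the largest distance
hatvalues(fit)                 # leverage of each observation
plot(fit, which = 4)           # Cook's distance plot
plot(fit, which = 5)           # residuals vs leverage
```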

Instrumental Variables

 * <tt>ivreg</tt> in the AER package
 * <tt>tsls</tt> in the sem package.
 * It is also possible to use the <tt>gmm</tt> command in the gmm package. See Methods of moments for an example.

Fake data simulations
We first simulate a fake dataset in which x is correlated with u, z is independent of u, and x is correlated with z. Thus x is an endogenous explanatory variable of y and z is a valid instrument for x.

Two stage least squares
Then we estimate the model with OLS (<tt>lm</tt>) and IV using z as an instrument for x.
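A sketch of both the simulation and the two estimations, with the second stage of 2SLS written by hand for transparency. In practice one would use <tt>ivreg</tt> (AER) or <tt>tsls</tt> (sem), which also report correct standard errors; the manual second-stage standard errors below are not valid.

```r
set.seed(1)
n <- 1000
z <- rnorm(n)                  # instrument
v <- rnorm(n)                  # common shock creating endogeneity
u <- rnorm(n) + v              # error correlated with x through v
x <- z + v + rnorm(n)          # endogenous regressor, shifted by z
y <- 1 + x + u
coef(lm(y ~ x))                # OLS: slope biased upward (plim 4/3 here)
xhat <- fitted(lm(x ~ z))      # first stage: regress x on the instrument
coef(lm(y ~ xhat))             # second stage: slope close to the true 1
```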

We plot the results:

Panel Data
<tt>plm</tt> (plm) implements the standard random effect, fixed effect, first differences methods. It is similar to Stata's <tt>xtreg</tt> command.

Note that <tt>plm</tt> output is not compatible with <tt>xtable</tt> and <tt>mtable</tt> for publication quality output.


 * lme4 and gee implement random effects and multilevel models.
 * See also BayesPanel

Random effects model
To implement a random effects model we generate a fake data set with 1000 observations over 5 time periods.

We estimate the random effect model with the <tt>plm</tt> function and the <tt>model = "random"</tt> option.
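The data generation might look as follows (individual and time counts follow the text; the variable names are our choices). Since here the individual effect is uncorrelated with the covariate, pooled OLS is also consistent, which we use as a check; the commented <tt>plm</tt> call shows the random effects estimation itself.

```r
# Fake balanced panel: 1000 individuals observed over 5 periods,
# random individual effect uncorrelated with the covariate
set.seed(1)
n_id <- 1000; n_t <- 5
id    <- rep(1:n_id, each = n_t)
time  <- rep(1:n_t, times = n_id)
alpha <- rep(rnorm(n_id), each = n_t)   # individual random effect
x <- rnorm(n_id * n_t)
y <- 1 + x + alpha + rnorm(n_id * n_t)
panel <- data.frame(id, time, x, y)
# With plm loaded, the random effects estimator would be (not run here):
# plm(y ~ x, data = panel, index = c("id", "time"), model = "random")
coef(lm(y ~ x, data = panel))   # pooled OLS is also consistent in this design
```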

Fixed effects model
For a fixed effects model we generate a fake dataset and correlate the fixed effects f with the covariates:

We first transform the data into a plm data frame using <tt>plm.data</tt>. We estimate the fixed effects model using <tt>plm</tt> with <tt>model = "within"</tt> as an option. Then, we compare the estimates with the random effects model and perform a Hausman test. Finally, we plot the density of the fixed effects.
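The logic of the within estimator can be sketched in base R: with fixed effects correlated with x, pooled OLS is biased, while demeaning by individual (which <tt>plm</tt>'s <tt>model = "within"</tt> applies automatically) recovers the true slope. The panel dimensions below are arbitrary.

```r
set.seed(1)
n_id <- 500; n_t <- 5
id <- rep(1:n_id, each = n_t)
f  <- rep(rnorm(n_id), each = n_t)   # fixed effect
x  <- f + rnorm(n_id * n_t)          # covariate correlated with f
y  <- 1 + x + f + rnorm(n_id * n_t)
coef(lm(y ~ x))[2]                   # pooled OLS: biased (plim 1.5 here)
xd <- x - ave(x, id)                 # demean by individual
yd <- y - ave(y, id)
coef(lm(yd ~ xd - 1))[2]             # within estimator: close to the true 1
```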

Dynamic panel data

 * <tt>pgmm</tt> (plm) implements the Arellano-Bond estimation procedure. It is similar to <tt>xtabond2</tt> in Stata.

Simultaneous equations model
For a [[w:Simultaneous equations model|simultaneous equations model]] the following packages are needed:
 * sem package
 * systemfit package