Stata/Linear Models

Simple Linear Model
We generate a simple fake data set:

    clear
    set obs 1000
    gen u = invnorm(uniform())
    gen x = invnorm(uniform())
    gen y = 1 + x + u

    reg y x
    eret list          /* gives the list of all stored results */
    predict yhat       /* gives the predicted value of y */
    predict res, res   /* gives the residuals */
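The stored results listed by eret list can also be read one at a time. The lines below are a minimal sketch, assuming the regression of y on x above has just been run, so that the coefficient names x and _cons exist:

```stata
* Assumes "reg y x" was the last estimation command
display "slope     = " _b[x]
display "intercept = " _b[_cons]
display "N         = " e(N)
display "R-squared = " e(r2)
* The full coefficient vector is stored as a matrix
matrix list e(b)
```

Because the data are simulated with y = 1 + x + u, both the slope and the intercept should be close to 1.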

leanout is a prefix command that simplifies regression output. It leaves out ancillary statistics that are rarely needed and focuses on confidence intervals rather than null hypothesis testing.

    ssc install leanout
    leanout : reg y x

Performing multiple regressions on the same subsample
Sometimes you want to run several regressions on the same subsample. This does not happen automatically: whenever one of the variables in a model is missing, that observation is dropped, so two models with different variables may be estimated on different observations. One way to be sure that you use the same subsample is the 'e(sample)' function, which equals 1 for every observation used in the last estimation and 0 otherwise. In the example below we store the result of 'e(sample)' in the variables 'samp1' and 'samp2', and we then estimate both models conditional on 'samp1 & samp2'. Both estimations are therefore done on the same observations.

    clear
    set obs 1000
    gen u = invnorm(uniform())
    gen x = invnorm(uniform())
    gen y1 = 1 + x + u if uniform() < .8
    gen y2 = 1 + x + u if uniform() < .9
    qui reg y1 x
    gen samp1 = e(sample)
    ta samp1
    qui reg y2 x
    gen samp2 = e(sample)
    ta samp2
    eststo clear
    eststo : qui : reg y1 x if samp1 & samp2
    eststo : qui : reg y2 x if samp1 & samp2
    esttab, star(* 0.1 ** 0.05 *** 0.01) se
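An alternative that avoids running each model once just to recover its estimation sample is to flag complete cases directly. This is only a sketch, assuming the variables y1, y2 and x from the example above are in memory; the variable name common is introduced here for illustration:

```stata
* common = 1 when none of the variables used in either model is missing
gen byte common = !missing(y1, y2, x)
tab common
reg y1 x if common
reg y2 x if common
```

With several arguments, missing() returns 1 as soon as any of them is missing, so common marks exactly the observations that both regressions can use.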

Instrumental Variables
Here is a data generating process for an instrumental variable setting. u is correlated with x, which makes x endogenous. z is independent of u and correlated with x, which makes it a valid instrument for x.

    clear
    set obs 1000
    gen u = invnorm(uniform())
    gen z = invnorm(uniform())
    gen x = invnorm(uniform()) + z + u
    gen y = 1 + 2*x + u

It is easy to see that the ordinary least squares estimate of the coefficient on x is biased, whereas the IV estimate is close to the true value of 2.

    eststo clear
    eststo : reg y x
    eststo : ivreg y (x=z)
    esttab, se
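The size of the least squares bias follows from the data generating process. As a worked sketch, write x = e + z + u, where e is a label introduced here for the unnamed standard normal draw in the gen x line; e, z and u are independent with unit variance, so Var(x) = 3 and Cov(x, u) = 1:

$$\operatorname{plim}\hat\beta_{OLS} = 2 + \frac{\operatorname{Cov}(x,u)}{\operatorname{Var}(x)} = 2 + \frac{1}{3}$$

$$\operatorname{plim}\hat\beta_{IV} = \frac{\operatorname{Cov}(z,y)}{\operatorname{Cov}(z,x)} = \frac{2\operatorname{Cov}(z,x) + \operatorname{Cov}(z,u)}{\operatorname{Cov}(z,x)} = 2$$

The least squares coefficient should therefore be near 2.33 in this simulation, while the IV coefficient should be near 2.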

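When there are more instruments than endogenous regressors, you can perform an overidentification test using overid or ivreg2 (both are user-written commands available from SSC). Here x has two instruments, z1 and z2: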

    clear
    set obs 1000
    gen u = invnorm(uniform())
    gen z1 = invnorm(uniform())
    gen z2 = invnorm(uniform())
    gen x = invnorm(uniform()) + z1 - 2*z2 + u
    gen y = 2*x + u

    ivreg y (x=z1 z2)
    overid
    ivreg2 y (x=z1 z2)
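Validity also requires that the instruments be correlated with x. One informal way to check this is to look at the first-stage regression. The lines below are a sketch, assuming the simulated variables x, z1 and z2 above are in memory:

```stata
* First-stage regression of the endogenous variable on the instruments
reg x z1 z2
* Joint F test that both instrument coefficients are zero
test z1 z2
```

In this simulation the first-stage coefficients should be close to 1 and -2, the values used to generate x.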

Seemingly Unrelated Equations
The code below simulates three equations whose error terms eta1, eta2 and eta3 share the common shocks u1 and u2, and are therefore correlated across equations. Each equation is first estimated separately with reg, then all three are estimated jointly with sureg.

    clear
    set obs 1000
    local s11 = 1
    local s12 = .5
    local s22 = 1
    local s13 = .5
    local s23 = .5
    local s33 = 1
    forvalues k = 1/3 {
        tempvar u`k'
        gen `u`k'' = invnorm(uniform())
    }
    gen eta1 = `s11' * `u1'
    gen eta2 = `s12' * `u1' + `s22' * `u2'
    gen eta3 = `s13' * `u1' + `s23' * `u2' + `s33' * `u3'
    gen x = invnorm(uniform())
    forvalues k = 1/3 {
        gen z`k' = invnorm(uniform())
    }
    gen y1 = 1 + 2*x + z1 + eta1
    gen y2 = - 1 + x + z2 + eta2
    gen y3 = 4 + z3 + eta3
    global eq1 = "y1 x z1"
    global eq2 = "y2 x z2"
    global eq3 = "y3 x z3"
    reg $eq1
    reg $eq2
    reg $eq3
    sureg (toto1 : $eq1) (toto2 : $eq2) (toto3 : $eq3)
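The cross-equation error covariance implied by this simulation can be worked out from the loadings. As a sketch, with u1, u2 and u3 independent standard normal draws and the values s11 = 1, s12 = .5, s22 = 1, s13 = .5, s23 = .5, s33 = 1 used above:

$$\operatorname{Var}\begin{pmatrix}\eta_1\\ \eta_2\\ \eta_3\end{pmatrix} = \begin{pmatrix} s_{11}^2 & s_{11}s_{12} & s_{11}s_{13}\\ s_{11}s_{12} & s_{12}^2 + s_{22}^2 & s_{12}s_{13} + s_{22}s_{23}\\ s_{11}s_{13} & s_{12}s_{13} + s_{22}s_{23} & s_{13}^2 + s_{23}^2 + s_{33}^2 \end{pmatrix} = \begin{pmatrix} 1 & 0.5 & 0.5\\ 0.5 & 1.25 & 0.75\\ 0.5 & 0.75 & 1.5 \end{pmatrix}$$

The off-diagonal terms are non-zero, which is the situation in which joint estimation can be more efficient than equation-by-equation regression. Adding the corr option to sureg displays the estimated residual correlation matrix, which can be compared with this one after rescaling to correlations.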

Linear Panel Data

 * xtset
 * xtreg
 * xtabond
 * xtabond2
 * ivreg2
 * xtivreg2
 * ivendog
 * ivhettest
 * overid : overidentification test
 * xtoverid : overidentification test
 * xttest2
 * ivgmm0
 * xtarsim
 * xtdpd
 * xtdpdsys

Random effect estimator
We assume $$y_{it} = 1 + x_{it} + z_{i} + f_{i} + u_{it}$$, with f independent of x and z, and u independent of x and z.

    clear
    set obs 1000
    gen id = _n
    gen f = invnorm(uniform())
    gen z = uniform()
    expand 10
    gen u = invnorm(uniform())
    gen x = uniform()
    gen y = 1 + x + z + f + u
    eststo clear
    eststo : qui : reg y x z
    eststo : qui : reg y x z, robust
    eststo : qui : reg y x z, cluster(id)
    eststo : qui : xtreg y x z, i(id) re
    eststo : qui : xtreg y x z, i(id) mle
    eststo : qui : xtmixed y x z || id :, mle
    esttab *, se
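Whether an individual effect is present at all can be checked after the random effect estimation. The lines below are a sketch, assuming the simulated variables id, y, x and z above are in memory:

```stata
* Declare the panel identifier
xtset id
xtreg y x z, re
* Breusch-Pagan Lagrange multiplier test for the presence of random effects
xttest0
```

Since f and u both have unit variance in this simulation, the share of the error variance due to the individual effect, reported as rho in the xtreg output, should be close to .5.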

Dynamic Linear Panel Data
The Layard and Nickell unemployment dataset can be loaded directly:

    use http://fmwww.bc.edu/ec-p/data/macro/abdata.dta, clear
    (Layard & Nickell, Unemployment in Britain, Economica 53, 1986 from Ox dist)

You can also generate fake data. Here y follows an autoregressive process with coefficient .7 and an individual effect f:

    clear
    set obs 10000
    set seed 123456
    gen id = _n
    gen f = invnorm(uniform())
    forvalues t = 1/5 {
        gen u`t' = invnorm(uniform())
    }
    /* first period: centered on the long-run mean f/(1-.7) */
    gen y1 = f/.3 + u1
    forvalues t = 2/5 {
        local z = `t' - 1
        gen y`t' = .7 * y`z' + f + u`t'
    }
    save wide, replace
    reshape long y, i(id) j(year)
    drop u* f
    tsset id year
    save long, replace

It is easy to see that the standard random effect and fixed effect estimators are biased, whereas the instrumented random effect and first-difference estimators correct this bias:

    eststo clear
    eststo : qui : xtreg y l.y, re
    eststo : qui : xtreg y l.y, fe
    eststo : qui : xtivreg y (l.y= l2.d.y), re
    eststo : qui : xtivreg y (l.y= l2.y) , fd
    esttab, se

    eststo clear
    eststo : qui : xi : xtabond2 y l.y, gmmstyle(l.y, lag(2 .) equation(level)) nomata robust
    eststo : qui : xi : xtabond2 y l.y, gmmstyle(l.y, lag(2 .) equation(level)) ivstyle(, e(diff)) nomata robust
    eststo : qui : xi : xtabond2 y l.y, iv(l.y l2.y l3.y, equation(diff)) nomata robust
    esttab, se
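For comparison, the built-in xtabond command listed earlier fits the Arellano-Bond difference GMM estimator without any user-written package. This is only a sketch, assuming the simulated long-format data (variables id, year and y) are in memory:

```stata
* Declare the panel structure of the simulated data
tsset id year
* Arellano-Bond estimator with one lag of the dependent variable
xtabond y, lags(1) vce(robust)
```

The coefficient on the lagged dependent variable can then be compared with the value .7 used to generate the data.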