R Programming/Binomial Models

In this section, we look at the binomial model. We have one outcome which is binary and a set of explanatory variables.

This kind of model can be analyzed using a linear probability model. However, a drawback of using a linear model for the parameter of the Bernoulli distribution is that, unless restrictions are placed on $$ \beta $$, the estimated coefficients can imply probabilities outside the unit interval $$ [0,1] $$. For this reason, models such as the logit model or the probit model are more commonly used. If you want to estimate a linear probability model, have a look at the linear models page.
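The drawback described above can be illustrated with a minimal sketch on simulated data. The sample size, the seed and the coefficient 3 are arbitrary assumptions chosen for illustration; only base R functions (<tt>lm</tt>, <tt>fitted</tt>, <tt>plogis</tt>) are used.

```r
# Linear probability model fitted to simulated binary data.
# All numeric values below are illustrative assumptions.
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(3 * x))  # data generated from a logit model

lpm   <- lm(y ~ x)      # linear probability model
p_hat <- fitted(lpm)    # "probabilities" implied by the linear fit

range(p_hat)                  # inspect whether the range leaves [0, 1]
mean(p_hat < 0 | p_hat > 1)   # share of observations with impossible probabilities
```

With a strong effect of x, the fitted line is steep enough that observations in the tails of x receive fitted values below 0 or above 1.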

Logit model
The model takes the form $$y_i \sim \mathrm{Bernoulli}(\pi_i)$$ with the inverse link function $$\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}$$. It can be estimated by maximum likelihood or by Bayesian methods.

Maximum likelihood estimation

 * The standard way to estimate a logit model is the glm function with family = binomial and link = "logit".
 * lrm in the rms package (the successor to Design) is another implementation of the logistic regression model.
 * There is an implementation in the Zelig package.

In this example, we simulate a model with one continuous predictor and estimate this model using the glm function.
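A minimal sketch of such an example follows. The true coefficients (-0.5 and 1), the sample size and the seed are assumptions made for illustration; <tt>plogis</tt> is the base R implementation of the inverse logit.

```r
# Simulate a logit model with one continuous predictor.
set.seed(123)
n     <- 1000
x     <- rnorm(n)
beta0 <- -0.5                       # assumed true intercept
beta1 <- 1                          # assumed true slope
pi_i  <- plogis(beta0 + beta1 * x)  # exp(xb) / (1 + exp(xb))
y     <- rbinom(n, size = 1, prob = pi_i)

# Maximum likelihood estimation with glm.
fit <- glm(y ~ x, family = binomial(link = "logit"))
summary(fit)   # coefficients, standard errors, z tests
coef(fit)      # estimates should be close to beta0 and beta1
```

Since logit is the default link of the binomial family, <tt>family = binomial</tt> alone gives the same fit.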

Zelig
The Zelig package makes it easy to compute all the quantities of interest.

We develop a new example. First we simulate a new dataset with two continuous explanatory variables and we estimate the model using zelig with the model = "logit" option.
 * We then look at the predicted values of y at the mean of x1 and x2
 * Then we look at the predicted values when x1 = 0 and x2 = 0
 * We also look at what happens when x1 changes from the 3rd to the 1st quartile.
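The three quantities listed above can also be sketched without Zelig. The code below is not Zelig code: it swaps in base R's <tt>glm</tt> and <tt>predict</tt>, and it returns point predictions only, without the simulated uncertainty that Zelig reports. The coefficients, the seed, and the choice to hold x2 at its mean in the last step are assumptions.

```r
# Simulate a dataset with two continuous explanatory variables.
set.seed(42)
n  <- 1000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- rbinom(n, size = 1, prob = plogis(0.5 + df$x1 - df$x2))  # assumed coefficients

fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = df)

# Predicted probability at the mean of x1 and x2.
at_mean <- data.frame(x1 = mean(df$x1), x2 = mean(df$x2))
p_mean  <- predict(fit, newdata = at_mean, type = "response")

# Predicted probability when x1 = 0 and x2 = 0.
at_zero <- data.frame(x1 = 0, x2 = 0)
p_zero  <- predict(fit, newdata = at_zero, type = "response")

# Change in predicted probability when x1 moves from its 3rd to its
# 1st quartile, holding x2 at its mean (an assumption of this sketch).
q    <- unname(quantile(df$x1, probs = c(0.75, 0.25)))
at_q <- data.frame(x1 = q, x2 = mean(df$x2))
p_q  <- predict(fit, newdata = at_q, type = "response")
first_diff <- unname(p_q[2] - p_q[1])

c(at_mean = unname(p_mean), at_zero = unname(p_zero), first_diff = first_diff)
```

Because the assumed coefficient on x1 is positive, moving x1 down from the 3rd to the 1st quartile lowers the predicted probability, so the first difference is negative.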


 * ROC curves can be plotted with the verification package.
 * Zelig has a rocplot function.


 * See UCLA Statistical Computing example

Bayesian estimation

 * <tt>bayesglm</tt> in the arm package
 * <tt>MCMClogit</tt> in the MCMCpack package for Bayesian estimation of the logit model.

Probit model
The probit model is a binary model in which we assume that the inverse link function is the cumulative distribution function of the standard normal distribution: $$\pi_i = \Phi(x_i'\beta)$$.

We simulate fake data. First, we draw two random variables x1 and x2 from any distribution (the choice does not matter). Then we create the vector xbeta as a linear combination of x1 and x2. We apply the inverse link function (the standard normal cumulative distribution function) to that vector and we draw the binary variable y as a Bernoulli random variable.
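These steps can be sketched as follows. The distributions of x1 and x2, the coefficients and the seed are arbitrary assumptions; <tt>pnorm</tt> is the standard normal cumulative distribution function in base R.

```r
# Simulate fake data from a probit model.
set.seed(2024)
n  <- 1000
x1 <- rnorm(n)    # any distribution would do
x2 <- runif(n)

xbeta <- -1 + 0.5 * x1 + 2 * x2     # assumed coefficients
p     <- pnorm(xbeta)               # inverse probit link
y     <- rbinom(n, size = 1, prob = p)  # Bernoulli draws

table(y)   # counts of zeros and ones
```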

Maximum likelihood
We can use the <tt>glm</tt> function with the <tt>family = binomial(link = "probit")</tt> option, or the <tt>probit</tt> function in the sampleSelection package, which is a wrapper around <tt>glm</tt>.
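A minimal sketch of the <tt>glm</tt> route, on data simulated from a probit model with assumed coefficients (-1, 0.5 and 2); the <tt>probit</tt> wrapper from sampleSelection is not used here.

```r
# Simulate data from a probit model (illustrative values).
set.seed(2024)
n  <- 1000
x1 <- rnorm(n)
x2 <- runif(n)
y  <- rbinom(n, size = 1, prob = pnorm(-1 + 0.5 * x1 + 2 * x2))

# Maximum likelihood estimation of the probit model.
fit <- glm(y ~ x1 + x2, family = binomial(link = "probit"))
summary(fit)

# Estimates with Wald confidence intervals.
cbind(estimate = coef(fit), confint.default(fit))
```

The estimates should be close to the coefficients used in the simulation, up to sampling error.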

Bayesian estimation

 * <tt>MCMCprobit</tt> (MCMCpack)

Semi-Parametric models

 * The Klein and Spady estimator is implemented in the np package (see <tt>npindex</tt> with the <tt>method = "kleinspady"</tt> option).