R Programming/Nonparametric Methods

This page covers a set of nonparametric methods: estimating a cumulative distribution function (CDF), estimating a probability density function (PDF) with histograms and kernel methods, and fitting flexible regression models such as local regressions and generalized additive models.

For an introduction to nonparametric methods, see the following books and handouts:
 * Nonparametric Econometrics: A Primer by Jeffrey S. Racine.
 * Nonparametric Econometrics: Theory and Practice by Qi Li and Jeffrey S. Racine.
 * All of Nonparametric Statistics by Larry Wasserman.

Empirical distribution function

 * The simplest estimate of the empirical CDF divides the rank of each observation by the sample size, using the <tt>rank</tt> and <tt>length</tt> functions.
 * ecdf computes the empirical cumulative distribution function.
 * ecdf.ksCI (sfsmisc) plots the empirical distribution function with confidence intervals.
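
The first two approaches can be sketched as follows. The simulated sample and the object names (<tt>Fn_rank</tt>, <tt>Fn</tt>) are assumptions made for illustration only.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
x <- rnorm(100)             # simulated sample (illustrative assumption)

# Empirical CDF at each observation: share of points less than or equal to it
Fn_rank <- rank(x, ties.method = "max") / length(x)

# The same estimate returned as a step function
Fn <- ecdf(x)
Fn(0)                       # share of observations <= 0
all.equal(Fn_rank, Fn(x))   # the two approaches agree

plot(Fn, main = "Empirical CDF")
```

Using <tt>ties.method = "max"</tt> matters only when the data contain ties; it makes the rank count every observation less than or equal to the current one, which matches the definition used by <tt>ecdf</tt>.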

Histogram

 * <tt>hist</tt> is the standard function for drawing histograms. If you store the result as an object, it holds the computed break points, counts and densities.

You can also set the break points yourself with the <tt>breaks</tt> argument.
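
A minimal sketch of both uses, assuming a simulated sample; the half-unit grid of break points is an arbitrary choice for illustration.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
x <- rnorm(200)             # simulated sample (illustrative assumption)

h <- hist(x)                # draws the histogram and returns an object of class "histogram"
h$breaks                    # bin boundaries
h$counts                    # number of observations per bin
h$density                   # estimated density in each bin

# Hand-picked break points; they must cover the whole range of x
b <- seq(floor(min(x)), ceiling(max(x)), by = 0.5)
h2 <- hist(x, breaks = b, freq = FALSE)   # freq = FALSE plots densities, not counts
```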


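 * <tt>n.bins</tt> (car package) includes several methods to compute the number of bins for a histogram.
 * <tt>histogram</tt> (lattice) draws histograms with lattice graphics.
 * <tt>truehist</tt> (MASS) plots a histogram scaled as a density estimate.
 * <tt>hist.scott</tt> and <tt>hist.FD</tt> (MASS) plot a histogram with automatic bin width selection, using the Scott and Freedman–Diaconis formulae respectively.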
 * histogram package.
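
The MASS functions in this list can be tried as in the sketch below. The simulated data are an assumption, and <tt>pdf(NULL)</tt> is used only so that the script runs without opening a plot window; drop it to see the plots.

```r
library(MASS)               # recommended package shipped with R

set.seed(1)                 # arbitrary seed, for reproducibility only
x <- rnorm(200)             # simulated sample (illustrative assumption)

pdf(NULL)                   # discard graphics output; remove this line to view the plots
truehist(x, nbins = "Scott")   # density-scaled histogram, bin count from the Scott rule
hist.scott(x)               # histogram with Scott bin width selection
hist.FD(x)                  # histogram with Freedman-Diaconis bin width selection
dev.off()
```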

Kernel Density Estimation

 * <tt>density</tt> computes a kernel density estimate of a numeric vector.
 * Choose the bandwidth selection method with <tt>bw</tt>.
 * Check how sensitive the estimate is to the bandwidth using <tt>adjust</tt>, which multiplies the selected bandwidth. The default is 1. It is good practice to look at <tt>adjust=.5</tt> and <tt>adjust=2</tt>.
 * Choose the kernel function with <tt>kernel</tt>: "gaussian", "epanechnikov", "rectangular", "triangular", "biweight", "cosine" or "optcosine".
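
The options above can be combined as in this sketch. The simulated data and the choice of the "SJ" bandwidth selector are assumptions made for illustration.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
x <- rnorm(200)             # simulated sample (illustrative assumption)

d     <- density(x)                           # Gaussian kernel, default bandwidth rule
d_sj  <- density(x, bw = "SJ")                # Sheather-Jones bandwidth selector
d_epa <- density(x, kernel = "epanechnikov")  # same bandwidth rule, different kernel

# Sensitivity check: halve and double the selected bandwidth
d_half   <- density(x, adjust = 0.5)
d_double <- density(x, adjust = 2)

plot(d, main = "Bandwidth sensitivity")
lines(d_half, lty = 2)
lines(d_double, lty = 3)
legend("topright", legend = c("adjust = 1", "adjust = 0.5", "adjust = 2"), lty = 1:3)
```

If the three curves tell the same story, the conclusions do not hinge on the bandwidth; if <tt>adjust = 0.5</tt> reveals extra modes, it is worth checking whether they are real or noise.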


 * <tt>tkdensity</tt> (sfsmisc) lets you change the kernel and the bandwidth interactively through a graphical user interface. This is a good way to check how sensitive the density estimate is to the choice of bandwidth and kernel.
 * <tt>kde2d</tt> (MASS) estimates a bivariate kernel density.
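
A minimal bivariate sketch with <tt>kde2d</tt>, assuming two simulated, correlated variables and a 50 by 50 evaluation grid (both arbitrary choices for illustration):

```r
library(MASS)               # recommended package shipped with R

set.seed(1)                 # arbitrary seed, for reproducibility only
x <- rnorm(300)             # simulated data (illustrative assumption)
y <- 0.5 * x + rnorm(300)

f <- kde2d(x, y, n = 50)    # list with grid coordinates x, y and density matrix z
dim(f$z)                    # one density value per grid point

contour(f, xlab = "x", ylab = "y")   # contour lines of the estimated density
```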

Local Regression

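 * <tt>loess</tt> is the standard function for local polynomial regression. It fits local quadratics by default; set <tt>degree = 1</tt> for local linear regression.
 * <tt>lowess</tt> is the ancestor of <tt>loess</tt>. It is similar, but it has different defaults and takes <tt>x</tt> and <tt>y</tt> vectors instead of the usual regression formula.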
 * <tt>ksmooth</tt> (stats) computes the Nadaraya–Watson kernel regression estimate.
 * <tt>locpoly</tt> (KernSmooth package)
 * <tt>npreg</tt> (np package)
 * The locpol package computes local polynomial estimators.
 * The locfit package provides local regression, likelihood and density estimation.
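
The three base R smoothers in this list can be compared on the built-in <tt>cars</tt> data, as sketched below. The span and bandwidth values are arbitrary assumptions, not recommendations.

```r
# loess uses the standard formula interface
fit  <- loess(dist ~ speed, data = cars)                          # local quadratic (default)
fit1 <- loess(dist ~ speed, data = cars, degree = 1, span = 0.5)  # local linear, narrower span

# Evaluate the fits on a regular grid inside the observed range
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 100))
pred  <- predict(fit, grid)
pred1 <- predict(fit1, grid)

# lowess and ksmooth take x and y vectors and return a list with x and y
lw <- lowess(cars$speed, cars$dist)
ks <- ksmooth(cars$speed, cars$dist, kernel = "normal", bandwidth = 5)

plot(cars, main = "Local regression on cars")
lines(grid$speed, pred)
lines(grid$speed, pred1, lty = 2)
lines(lw, lty = 3)
lines(ks, lty = 4)
legend("topleft", lty = 1:4,
       legend = c("loess, degree 2", "loess, degree 1", "lowess", "ksmooth"))
```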

Generalized additive semiparametric models (GAM)

 * <tt>gam</tt> (gam)
 * <tt>gam</tt> (mgcv)
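
A minimal sketch using the mgcv version, assuming simulated data with a sine-shaped signal. Both packages export a function called <tt>gam</tt>, so loading both at once makes one mask the other; write <tt>mgcv::gam</tt> or <tt>gam::gam</tt> to be explicit.

```r
library(mgcv)               # recommended package shipped with R

set.seed(1)                 # arbitrary seed, for reproducibility only
n <- 200
x <- runif(n)                                # simulated covariate (illustrative assumption)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)    # smooth signal plus noise
d <- data.frame(x = x, y = y)

fit <- mgcv::gam(y ~ s(x), data = d)   # s(x) requests a smooth term in x
summary(fit)                           # effective degrees of freedom, significance of the smooth
plot(fit)                              # estimated smooth with a confidence band
```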