Stata/Descriptive Statistics

This section shows how to compute summary statistics in Stata. It has three subsections: the first covers commands that describe a whole dataset, the second covers commands that describe a single variable, and the third covers commands that describe a set of variables.

Describe a dataset

 * 'des' (describe) : gives the size of the file, the number of observations, the number of variables, and the name, label and storage type of each variable.
 * 'des, s' (describe, short) : gives only the size of the file, the number of observations and the number of variables.
 * 'des' stores its results in r(): the number of variables 'r(k)', the number of observations 'r(N)' and a flag 'r(changed)' that indicates whether the data have changed since they were last saved. 'ret list' (return list) displays them.

. sysuse cancer, clear
(Patient Survival in Drug Trial)
. describe
. des, s
. ret list
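The stored results can also be reused in later commands rather than only listed. The following is a minimal sketch in do-file form (without the ". " prompt), assuming the cancer dataset shipped with Stata; the display labels are illustrative choices.

```stata
* Sketch: reuse the results that describe leaves in r()
sysuse cancer, clear
quietly describe
display "observations: " r(N)
display "variables:    " r(k)
* r(changed) is 0 right after loading, because nothing has been modified yet
display "changed:      " r(changed)
```

The number of variables reported by r(k) can differ between Stata releases, since the bundled dataset has been updated over time.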


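 * codebook : describes the contents of each variable (type, range, number of unique values, number of missing values, and summary statistics or a frequency table).
 * inspect : gives a quick overview of a numeric variable (number of negative, zero and positive values, integers and nonintegers, missing values, and a small histogram).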
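As an illustration of these two commands, here is a short sketch in do-file form. It assumes the cancer dataset shipped with Stata; the choice of the age variable is arbitrary.

```stata
sysuse cancer, clear
* Full codebook entry for every variable in the dataset
codebook
* More compact overview, one line per variable
codebook, compact
* Sign, integer and missing-value counts for a single numeric variable
inspect age
```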

Describe a single variable

Continuous variables

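 * su (summarize) : gives the number of observations, the mean, the standard deviation, the minimum and the maximum.
 * su, d (summarize, detail) : also gives percentiles, the variance, the skewness and the kurtosis.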
 * robmean : robust mean
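A possible use of summarize on a continuous variable is sketched below, in do-file form. It assumes the cancer dataset shipped with Stata and uses the age variable purely as an example. The r() names are the results that summarize stores after it runs.

```stata
sysuse cancer, clear
* Basic summary: N, mean, sd, min, max
summarize age
display "mean = " r(mean) "   sd = " r(sd)
* Detailed summary: percentiles, variance, skewness, kurtosis
summarize age, detail
display "median = " r(p50) "   skewness = " r(skewness)
```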

Discrete variables

 * ta (tabulate) : gives the frequency table of a variable.
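For a discrete variable, a one-way frequency table might be produced as in the sketch below. It is written in do-file form and assumes the cancer dataset shipped with Stata, where drug is a categorical variable.

```stata
sysuse cancer, clear
* One-way frequency table of the drug variable
tabulate drug
* Treat missing values as a category of their own
tabulate drug, missing
* Same table, sorted by descending frequency
tabulate drug, sort
```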

Describe a set of variables

Continuous variables

 * corr (correlate) returns the matrix of linear (Pearson) correlations between a set of variables.
 * corr, cov returns the covariance matrix.

Here is an example. We first simulate x and y such that they are positively correlated, then plot the two variables and compute their correlation.

. clear
. set obs 1000
. gen x = invnorm(uniform())
. gen u = invnorm(uniform())
. gen y = x + u
. tw sc y x || lfit y x
. corr y x
(obs=1000)

             |        y        x
-------------+------------------
           y |   1.0000
           x |   0.7197   1.0000
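In this simulation x and u are independent standard normal draws, so Cov(y, x) = 1, Var(y) = 2, and the population correlation is 1/sqrt(2), about 0.71. This is close to the estimate shown above. The sketch below, in do-file form, reproduces the simulation and reads the stored results of correlate. The seed value and the use of rnormal() (available in more recent Stata versions) instead of invnorm(uniform()) are assumptions made for reproducibility.

```stata
clear
set seed 12345
set obs 1000
generate x = rnormal()
generate u = rnormal()
generate y = x + u
* Correlation matrix; r(rho) holds the correlation of the first two variables
correlate y x
scalar rho = r(rho)
display "estimated rho = " rho "   theoretical rho = " 1/sqrt(2)
* Covariance matrix; the variances and the covariance are stored in r()
correlate y x, covariance
display "cov(y,x) = " r(cov_12) "   var(y) = " r(Var_1) "   var(x) = " r(Var_2)
```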


 * wincorr returns the winsorized correlation : values in the tails are replaced by a limit value. This is useful when a few extreme values have a large influence on the correlation coefficient.


 * spearman and spearman2 give Spearman's rank correlation between two variables. This statistic is less sensitive to outliers than Pearson's linear correlation, so it is often useful as a robustness check.

. spearman y x

 Number of obs =    1000
Spearman's rho =       0.7090

Test of Ho: y and x are independent
    Prob > |t| =       0.0000
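To see why the rank correlation is a useful robustness check, the sketch below contaminates one observation with an extreme value and compares the two coefficients. It is written in do-file form. The seed, the outlier value of 1000 and the scalar names pearson and rank_rho are illustrative assumptions, and rnormal() requires a more recent Stata version.

```stata
clear
set seed 12345
set obs 1000
generate x = rnormal()
generate y = x + rnormal()
* Contaminate a single observation with an extreme value
replace y = 1000 in 1
* Pearson's correlation is strongly distorted by the outlier
correlate y x
scalar pearson = r(rho)
* Spearman's rho depends only on ranks, so it barely moves
spearman y x
scalar rank_rho = r(rho)
display "Pearson = " pearson "   Spearman = " rank_rho
```

With a single extreme value among 1000 observations, the Pearson coefficient should fall close to zero, while the Spearman coefficient should stay near its uncontaminated value.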

Discrete variables

 * ta (tabulate) with two variables gives a two-way table of frequencies (cross-tabulation).
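A two-way table can be sketched as follows, in do-file form. It assumes the cancer dataset shipped with Stata, with died and drug as the two discrete variables. The row and chi2 options are standard tabulate options.

```stata
sysuse cancer, clear
* Cross-tabulation of survival status by drug, with row percentages
tabulate died drug, row
* Add Pearson's chi-squared test of independence
tabulate died drug, chi2
display "chi2 = " r(chi2) "   p = " r(p)
```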

Continuous and discrete variables

 * catgraph : plotting means of a continuous variable by categories
 * table : tables of summary statistics by categories
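Means of a continuous variable by category can also be obtained with commands whose syntax has stayed stable across Stata versions. The sketch below, in do-file form, uses tabstat and tabulate with the summarize() option. It assumes the cancer dataset shipped with Stata. The syntax of table itself changed in Stata 17, so it is left out of this sketch.

```stata
sysuse cancer, clear
* Mean, standard deviation and count of age within each drug group
tabstat age, by(drug) statistics(mean sd n)
* Same idea with tabulate and its summarize() option
tabulate drug, summarize(age)
```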