Stata/Data Management

Read and import data
Usually, data are loaded into memory using the use command. The clear option ensures that the dataset currently in memory is removed first, discarding any unsaved changes.

use "W:\Data\…\table.dta", clear

The cd command sets the working directory, which makes it easier to load tables into memory:

cd "W:\Data\"
use table, clear

Stata 9 users can read Stata 10 datasets using the use10 command.

use10 table, clear

Some example datasets are stored in the Stata directory. They can be loaded into memory using the sysuse command.

. sysuse cancer, clear
. sysuse smoking, clear
. sysuse auto, clear
. sysuse jspmix, clear
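To see which example datasets are installed with your copy of Stata, sysuse dir lists them. A minimal sketch, assuming the bundled auto dataset is available (it ships with official Stata):

```stata
* List the example datasets installed with Stata
sysuse dir

* Load one of them, replacing whatever is in memory
sysuse auto, clear

* Quick check of the number of observations and variables
describe, short
```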

You can import a file in comma-separated values (CSV) format using insheet. The example below reads a file whose fields are separated by semicolons:

insheet using "W:\Data\…\table.csv", delim(";")
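A self-contained sketch of the same idea: it first writes a small semicolon-delimited file with the file command, then reads it back with insheet. The file name demo_table.csv and its contents are hypothetical. In Stata 13 and later, import delimited supersedes insheet; the commented last line shows the assumed equivalent call.

```stata
* Write a hypothetical semicolon-delimited file in the working directory
file open fh using "demo_table.csv", write text replace
file write fh "id;income" _n
file write fh "1;1200" _n
file write fh "2;950" _n
file close fh

* Read it back; names tells insheet the first row holds variable names
insheet using "demo_table.csv", delimiter(";") names clear
list

* Stata 13+ alternative (assumed equivalent):
* import delimited using "demo_table.csv", delimiters(";") clear
```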

Save and export data
save table, replace
 * save

If you use Stata 10, you can save in Stata 9 format using saveold:

saveold table, replace
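A minimal round trip, sketched with the bundled auto data and the hypothetical file names auto_copy.dta and auto_old.dta: save writes the format of the running release, saveold writes one that older releases can open (which older version depends on your release), and use reloads the older-format copy.

```stata
sysuse auto, clear

* Save in the format of the running Stata release
save "auto_copy.dta", replace

* Save a copy readable by older releases
saveold "auto_old.dta", replace

* Reload the older-format copy to check it
use "auto_old.dta", clear
describe, short
```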

outsheet using "W:\Data\…\table.csv", replace comma
 * outsheet: export to tab-delimited format (the default) or, with the comma option, to CSV format.
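One way to check an export is to write part of the auto data to CSV with outsheet and read it back with insheet. A sketch, assuming the hypothetical file name auto_export.csv in the working directory:

```stata
sysuse auto, clear

* Export three variables as comma-separated values (without comma, outsheet writes tabs)
outsheet make price mpg using "auto_export.csv", comma replace

* Read the file back to verify the export
insheet using "auto_export.csv", comma names clear
describe, short
```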

See also
 * outfile
 * xmlsave
 * fdasave

Append and merge
The standard Stata command is merge. However, user-written alternatives can be safer and give better output. User-written commands may be installed using the ssc command or located using findit.


 * dmerge
 * joinby: forms all possible pairs of observations between the two datasets within matching groups
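Since Stata 11, merge takes a match type (1:1, m:1 or 1:m) and the key variables. The sketch below builds two small hypothetical datasets, stores one in a tempfile, merges them on id, and then shows joinby forming all pairs within a household identifier. All variable names (id, x, y, hh, person, car) are made up for illustration.

```stata
* merge: one-to-one match on id (Stata 11+ syntax)
clear
input id x
1 10
2 20
3 30
end
tempfile left
save `left'

clear
input id y
1 100
3 300
4 400
end
merge 1:1 id using `left'
* _merge: 1 = only in memory, 2 = only in the using file, 3 = matched
tabulate _merge

* joinby: all pairs of observations within each hh
clear
input hh person
1 1
1 2
2 1
end
tempfile people
save `people'

clear
input hh car
1 1
1 2
end
* Household 1 has 2 cars and 2 people, giving 4 pairs;
* household 2 appears only in the using file and is dropped by default
joinby hh using `people'
sort hh car person
list
```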


 * append: if you have two datasets with the same variables but different observations, you can stack them into one dataset using the append command.

use data_1, clear
append using data_2
br
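A self-contained sketch of append, assuming two hypothetical parts built with input instead of the data_1 and data_2 files: the first part is stored in a tempfile and then stacked under the second.

```stata
* First hypothetical part, stored in a tempfile
clear
input id x
1 10
2 20
end
tempfile part1
save `part1'

* Second part with the same variables but different observations
clear
input id x
3 30
4 40
end

* Add the stored observations to the data in memory
append using `part1'
sort id
list
```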

Describe a dataset

 * des
 * des, s
 * codebook
 * codebook2
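A short sketch of the official commands from this list, applied to the bundled auto data (codebook2 is not used here):

```stata
sysuse auto, clear

* Variable names, storage types, display formats and labels
describe

* Only the number of observations and variables
describe, short

* Detailed summary of one variable (range, unique values, missing values)
codebook rep78
```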

Detect missing values

 * tabmiss
 * npresent
 * nmissing
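The commands listed above are user-written. As a sketch using only official commands (misstable requires Stata 11 or later), the example below summarizes missing values in the auto data, where only rep78 has missing values, and uses egen's rowmiss() to count missing values per observation. The variable name nmiss is hypothetical.

```stata
sysuse auto, clear

* Table of the variables that contain missing values
misstable summarize

* Number of observations with rep78 missing
count if missing(rep78)

* Per-observation count of missing values across several variables
egen nmiss = rowmiss(price mpg rep78)
tabulate nmiss
```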

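You can convert missing values to numeric values using the mvencode command. In the following example, the missing values of the listed variables become 0; the override option lets mvencode proceed even if 0 already occurs in some of those variables:

mvencode exg ga dvg verts eco dr dvd fn reg mnr div, mv(0) override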
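Because the variables in the example above come from an unspecified dataset, here is a sketch on the bundled auto data. It recodes the missing values of rep78 to 0 purely for illustration, then uses mvdecode to turn the 0 codes back into system missing values.

```stata
sysuse auto, clear

* Replace the missing values of rep78 by 0
mvencode rep78, mv(0)
tabulate rep78, missing

* mvdecode reverses the operation: 0 becomes system missing (.)
mvdecode rep78, mv(0)
tabulate rep78, missing
```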

Variables
Very often you have to convert a variable from string to numeric format. There are several ways to do it. If your string variable already contains numbers stored as text, use destring. Otherwise, use encode: it creates a new numeric variable and uses the original string values as its value labels.
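A minimal sketch of both cases, assuming a hypothetical dataset with a string variable holding numbers (price_s) and one holding category names (region_s). encode numbers the categories in alphabetical order, so "north" becomes 1 and "south" becomes 2.

```stata
* Hypothetical data with two string variables
clear
input str5 price_s str6 region_s
"12.5" "north"
"7"    "south"
"3.25" "north"
end

* price_s holds numbers stored as text: destring it
destring price_s, generate(price)

* region_s holds category names: encode it
encode region_s, generate(region)

* The value label created by encode takes the new variable's name
label list region
```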


 * gen
 * egen
 * replace
 * recode
 * drop
 * keep
 * rename
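A sketch that uses each command from the list once on the bundled auto data. The new variable names (price_k, mean_mpg, rep3, miles_per_gallon) are hypothetical.

```stata
sysuse auto, clear

* New variable from an expression
generate price_k = price / 1000

* Group mean of mpg within each value of foreign
bysort foreign: egen mean_mpg = mean(mpg)

* Overwrite existing values
replace price_k = round(price_k, 0.1)

* Collapse the five repair categories into three (missing stays missing)
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep3)

rename mpg miles_per_gallon

* Keep a subset of variables, then drop observations
keep make price_k mean_mpg rep3 miles_per_gallon foreign
drop if missing(rep3)
```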

The vallist command gives the list of all categories of a categorical variable:

vallist codep
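vallist is not part of official Stata. The official levelsof command gives a similar list; a sketch on the bundled auto data, where levelsof excludes missing values by default and stores the list in r(levels):

```stata
sysuse auto, clear

* Distinct non-missing values of rep78
levelsof rep78
display "`r(levels)'"
```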

Dealing with labels

 * lab var
 * lab list
 * lab define
 * lab value
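A sketch of the four label commands on the bundled auto data; the label name origin_lbl is hypothetical, and the replace option overwrites it if it already exists.

```stata
sysuse auto, clear

* Attach a descriptive label to a variable
label variable foreign "Car origin"

* Define a value label mapping codes to text
label define origin_lbl 0 "Domestic" 1 "Foreign", replace

* Attach the value label to the variable
label values foreign origin_lbl

* Show the codes and their labels
label list origin_lbl
```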

Expand
You can expand a dataset (i.e., replicate each observation a given number of times) using the expand command. This is useful for generating simulated panel data. In the first example, we draw 10 observations from a standard normal distribution and duplicate each observation once, giving 20 observations.

clear
set obs 10
gen u = rnormal()
expand 2
sort u
br

It is also possible to pass an integer variable to expand; each observation is then replicated as many times as that variable indicates (here between 1 and 10 times):

clear
set obs 10
gen u = runiform()
gen var = 1 + int(10 * runiform())
expand var
sort u
br

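The expandcl command duplicates whole clusters of observations. It requires the cluster() option, and generate() creates a variable that uniquely identifies each copy of a cluster:

clear
set obs 10
gen u = rnormal()
gen id = _n
expandcl 2, cluster(id) generate(cl)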
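As a sketch of how expand can generate a panel skeleton, assuming three hypothetical individuals observed over four years (the variable names id and year are illustrative): expand creates the repeated rows, and bysort with _n numbers the periods within each individual.

```stata
clear
* Three hypothetical individuals
set obs 3
generate id = _n

* Four time periods per individual
expand 4
bysort id: generate year = 2000 + _n

* Declare the panel structure
xtset id year
list, sepby(id)
```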

Data Storage types
All numeric types in Stata are ordinary signed quantities, except that the 27 highest values of each type are reserved for the missing values (., .a, .b, ..., .z). The storage size of each type is as follows: