R Programming/Data types

Data types
Vectors are the simplest R objects: an ordered collection of primitive values of a single type (e.g. real numbers, strings, logicals). Vectors are indexed by integers starting at 1. Factors are similar to vectors, but each element is categorical, i.e. one of a fixed number of possibilities (or levels). A matrix is a vector with an added layout, so that its elements are arranged in rows and columns and indexed by two integers, each starting at 1. Arrays are similar to matrices but can have more than 2 dimensions. A list is similar to a vector, but its elements need not all be of the same type. The elements of a list can be indexed either by integers or by names, so an R list can be used to implement what other languages call an "associative array", "hash table", "map" or "dictionary". A dataframe is like a matrix but does not require all columns to have the same type: it is a list of variables (vectors) of the same length.

Classes define what objects of a certain kind look like. A class is usually attached to an object as its <tt>class</tt> attribute. Every R object has a class (<tt>class</tt>) and a type (<tt>typeof</tt>); objects such as matrices, arrays and data frames also have dimensions (<tt>dim</tt>), while plain vectors do not.
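The short sketch below compares <tt>class</tt>, <tt>typeof</tt> and <tt>dim</tt> for a few of the objects described above. The object names and values are illustrative, and the matrix class shown assumes R 4.0 or later, where a matrix has class c("matrix", "array").

```r
v <- c(1.5, 2, 3)                          # numeric vector
f <- factor(c("a", "b", "a"))              # factor
m <- matrix(1:6, nrow = 2)                 # integer matrix
d <- data.frame(x = 1:2, y = c("u", "v"))  # data frame

class(v)    # "numeric"
typeof(v)   # "double"
dim(v)      # NULL: a plain vector has no dimensions

class(f)    # "factor"
typeof(f)   # "integer": factors store integer codes for their levels

class(m)    # "matrix" "array"
dim(m)      # 2 3

class(d)    # "data.frame"
typeof(d)   # "list": a data frame is a list of equal-length columns
dim(d)      # 2 2
```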

Vectors
You can create a vector using the <tt>c</tt> function, which concatenates its arguments. You can create a sequence using the <tt>:</tt> operator or the <tt>seq</tt> function. For instance, <tt>1:5</tt> gives all the integers from 1 to 5. The <tt>seq</tt> function lets you specify the interval between successive numbers. You can also repeat a pattern using the <tt>rep</tt> function. Finally, <tt>numeric(n)</tt> creates a numeric vector of n zeros, <tt>character(n)</tt> a character vector of n empty strings and <tt>logical(n)</tt> a logical vector of n <tt>FALSE</tt> values; these are default values, not missing values (<tt>NA</tt>).
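A minimal sketch of the vector constructors mentioned above. The values are arbitrary. The last line is an added illustration of how to get a vector of genuinely missing values.

```r
v  <- c(2, 4, 6)                      # concatenate elements: 2 4 6
s1 <- 1:5                             # integers from 1 to 5
s2 <- seq(0, 1, by = 0.25)            # 0.00 0.25 0.50 0.75 1.00
r1 <- rep(c("a", "b"), times = 3)     # "a" "b" "a" "b" "a" "b"
r2 <- rep(c("a", "b"), each = 2)      # "a" "a" "b" "b"

n0 <- numeric(3)                      # 0 0 0
c0 <- character(2)                    # "" ""
l0 <- logical(2)                      # FALSE FALSE
na <- rep(NA_real_, 3)                # NA NA NA: numeric missing values
```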

The <tt>length</tt> function returns the length of a vector. <tt>last</tt> (from the sfsmisc package) returns the last element of a vector, but the same result can be obtained in base R with <tt>x[length(x)]</tt>.
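For example, with an illustrative vector <tt>x</tt>, the last element can be taken without any extra package. <tt>tail</tt> from the utils package, which is attached by default, is shown as an alternative.

```r
x <- c(10, 20, 30, 40)
length(x)       # 4
x[length(x)]    # 40: index the last position directly
tail(x, 1)      # 40: same result with utils::tail
```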

Factors
<tt>factor</tt> transforms a vector into a factor. A factor can be made ordered with the option <tt>ordered=TRUE</tt> or with the function <tt>ordered</tt>. <tt>levels</tt> returns the levels of a factor. <tt>gl(n, k, length, labels)</tt> generates factors: <tt>n</tt> is the number of levels, <tt>k</tt> the number of repetitions of each level and <tt>length</tt> the total length of the factor (by default n*k). <tt>labels</tt> is optional and gives labels to each level.
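A small sketch of <tt>factor</tt>, <tt>ordered</tt>, <tt>levels</tt> and <tt>gl</tt>. The data and labels ("small", "control", ...) are hypothetical and chosen only for illustration.

```r
sizes <- c("small", "large", "medium", "small")

f <- factor(sizes)
levels(f)            # "large" "medium" "small" (sorted alphabetically)

# Ordered factor with an explicit level order
o <- ordered(sizes, levels = c("small", "medium", "large"))
o[1] < o[2]          # TRUE: "small" < "large"; comparisons work on ordered factors

# gl(n, k, ...): n levels, each repeated k times
g <- gl(n = 2, k = 3, labels = c("control", "treatment"))
g                    # control control control treatment treatment treatment

# With length = 6 and k = 1 the levels alternate: 1 2 1 2 1 2
h <- gl(2, 1, length = 6)
```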

Factors are most easily thought of as categorical variables. An important function for working with factors is <tt>table</tt>, which counts the observations in each level. Among the types of statistical data (nominal, ordinal, interval and ratio), factors can be nominal, ordinal or interval. Nominal factors are categorical names, for example country names paired with some other information. An example of an ordinal factor would be a set of race times for a particular athlete paired with the athlete's finishing place (first, second, ...). To summarize such a factor in a meaningful order, you may need to specify the order of its levels yourself. Finally, an example of interval-level factors would be age brackets such as "20 - 29", "30 - 39", etc. R sorts the levels of a factor automatically (numerically for numbers, alphabetically for strings), which is often appropriate, but a programmer can also specify the levels explicitly to order this type of data in the manner most appropriate to their application.
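As an illustration, assume some hypothetical age-bracket labels. Alphabetical sorting puts "5 - 9" after "10 - 19", so the sketch supplies the level order explicitly before summarizing with <tt>table</tt>.

```r
ages <- c("20 - 29", "5 - 9", "10 - 19", "20 - 29", "5 - 9")

levels(factor(ages))   # "10 - 19" "20 - 29" "5 - 9": alphabetical, not numeric

# Specify the natural order of the levels explicitly
brackets <- factor(ages, levels = c("5 - 9", "10 - 19", "20 - 29"), ordered = TRUE)
table(brackets)        # counts 2, 1, 2 in the order 5 - 9, 10 - 19, 20 - 29

# Numbers stored as factors are sorted numerically
levels(factor(c(10, 9, 100)))   # "9" "10" "100"
```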

See also <tt>is.factor</tt>, <tt>as.factor</tt>, <tt>is.ordered</tt> and <tt>as.ordered</tt>.
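A quick sketch of these four helpers on an illustrative character vector:

```r
x <- c("b", "a", "b")
is.factor(x)                 # FALSE: still a character vector
f <- as.factor(x)
is.factor(f)                 # TRUE
is.ordered(f)                # FALSE: as.factor creates an unordered factor
is.ordered(as.ordered(f))    # TRUE
```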

Matrices

 * One way to create a new matrix is the <tt>matrix</tt> function. You supply a vector of data and the number of rows and/or columns, and you can optionally specify whether R should fill the matrix by row (<tt>byrow=TRUE</tt>) or by column (the default).
 * Functions <tt>cbind</tt> and <tt>rbind</tt> combine vectors into matrices column by column or row by row.
 * The dimensions of a matrix can be obtained with the <tt>dim</tt> function. Alternatively, <tt>nrow</tt> and <tt>ncol</tt> return the number of rows and columns of a matrix.
 * Function <tt>t</tt> transposes a matrix.
 * Unlike data frames, matrices must contain elements of a single type (for example numeric, character or logical); if types are mixed, the elements are coerced to a common type such as character.
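The sketch below walks through the matrix operations listed above. The data vectors and object names are illustrative.

```r
# matrix(): data vector, dimensions, and fill order
m1 <- matrix(1:6, nrow = 2)                # filled column by column (default)
m2 <- matrix(1:6, nrow = 2, byrow = TRUE)  # filled row by row
# m1:  1 3 5      m2:  1 2 3
#      2 4 6           4 5 6

# cbind() / rbind(): combine vectors as columns or as rows
a <- c(1, 2, 3)
b <- c(4, 5, 6)
cb <- cbind(a, b)   # 3 x 2 matrix, columns named "a" and "b"
rb <- rbind(a, b)   # 2 x 3 matrix, rows named "a" and "b"

# Dimensions
dim(m1)             # 2 3
nrow(m1)            # 2
ncol(m1)            # 3

# Transpose: rows become columns
tm <- t(m1)         # 3 x 2 matrix with rows 1 2 / 3 4 / 5 6

# One type per matrix: mixing numbers and strings gives a character matrix
mixed <- cbind(a, c("x", "y", "z"))
typeof(mixed)       # "character"
```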

Arrays
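An array is a vector of R objects of the same type, arranged in n dimensions. Arrays are created with the <tt>array</tt> function, which takes a vector of data and a vector of dimensions (<tt>dim</tt>). The simplest case is an array of one dimension holding a single element.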
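A minimal sketch of such a one-element array, using the name x and the values c(T,F) that the text mentions. T and F are R's shorthands for TRUE and FALSE.

```r
x <- array(c(T, F), dim = c(1))   # room for one element, so only TRUE is used
x         # [1] TRUE
dim(x)    # 1
```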

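The array x is created with a single dimension (dim=c(1)) and filled from the vector of values c(T,F). Because x has room for only one element, only the first value, TRUE, is used. A similar array, y, can be created with a single dimension of length two, which then holds both values.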
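The two-value array y differs only in its <tt>dim</tt> argument:

```r
y <- array(c(T, F), dim = c(2))   # one dimension of length 2
y         # [1]  TRUE FALSE
```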

A three-dimensional array, 3 by 3 by 3, is created the same way with dim=c(3,3,3).
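Assuming the array is filled with the integers 1 to 27 (arbitrary illustrative data), the construction looks like this. R fills the array with the first index varying fastest.

```r
z <- array(1:27, dim = c(3, 3, 3))
dim(z)    # 3 3 3
```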

R arrays are accessed in a manner similar to arrays in other languages: by integer index, starting at 1 (not 0). Leaving the first two indices empty and setting the third to 3 selects the third slice along the third dimension of the 3 by 3 by 3 array. This slice is a 3 by 3 matrix.
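For example, with a 3 by 3 by 3 array filled with 1:27 (illustrative data), the third slice holds the values 19 to 27:

```r
z <- array(1:27, dim = c(3, 3, 3))   # 3 x 3 x 3 array, illustrative data
z[, , 3]
#      [,1] [,2] [,3]
# [1,]   19   22   25
# [2,]   20   23   26
# [3,]   21   24   27
```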

Specifying indices for two of the three dimensions returns a one-dimensional result (a vector).
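Using the same illustrative 3 by 3 by 3 array filled with 1:27, fixing the second and third indices leaves a vector of length 3:

```r
z <- array(1:27, dim = c(3, 3, 3))   # illustrative data
z[, 3, 3]    # 25 26 27
```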

Specifying all three indices returns a single element of the 3 by 3 by 3 array.
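Again assuming the array is filled with 1:27, each full index picks out one value:

```r
z <- array(1:27, dim = c(3, 3, 3))   # illustrative data
z[3, 3, 3]   # 27
z[1, 2, 3]   # 22
```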

More complex subsets can be selected by passing vectors of indices for one or more dimensions.
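As a sketch, selecting columns 2 and 3 from slices 2 and 3 of the illustrative 1:27 array gives a 3 by 2 by 2 sub-array:

```r
z <- array(1:27, dim = c(3, 3, 3))   # illustrative data
sub <- z[, c(2, 3), c(2, 3)]
dim(sub)         # 3 2 2
sub[1, 1, 1]     # 13, i.e. z[1, 2, 2]
sub[3, 2, 2]     # 27, i.e. z[3, 3, 3]
```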

The dimensions of an array need not all have the same length. For example, an array with dim=c(3,3,2) is a pair of 3 by 3 matrices.
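A minimal sketch of such a 3 by 3 by 2 array, filled with 1:18 as illustrative data:

```r
w <- array(1:18, dim = c(3, 3, 2))
dim(w)      # 3 3 2
w[, , 1]    # first 3 x 3 matrix: values 1 to 9
w[, , 2]    # second 3 x 3 matrix: values 10 to 18
```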

All elements of an array must be of the same type, but they need not be numbers: logical and character arrays are also possible.
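For instance, the values below are illustrative. The logical values are recycled to fill the array. In the character example, <tt>c()</tt> has already coerced the number 1 to the string "1" before <tt>array</tt> is called.

```r
u <- array(c(TRUE, FALSE), dim = c(3, 3, 2))   # 2 values recycled into 18 cells
typeof(u)     # "logical"

s <- array(c(1, "a"), dim = c(2, 2))           # c(1, "a") is c("1", "a")
typeof(s)     # "character"
s[1, 1]       # "1"
```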

Lists
A list is a collection of R objects. <tt>list</tt> creates a list. <tt>unlist</tt> transforms a list into a vector. The objects in a list do not have to be of the same type or length.
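A small sketch with illustrative contents. Note that <tt>unlist</tt> coerces all elements to a common type, here character.

```r
lst <- list(1:3, "text", c(TRUE, FALSE))   # elements of different types and lengths
length(lst)    # 3
unlist(lst)    # "1" "2" "3" "text" "TRUE" "FALSE"
```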

Elements of a list can be referenced in several flexible ways:
 * by index number;
 * by name;
 * recursively, combining index numbers and names.
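The sketch below shows each style of reference on an illustrative nested list. <tt>[[</tt> returns an element, while <tt>[</tt> returns a sub-list.

```r
lst <- list(a = 1, b = "two", c = list(d = 3, e = 4))

lst[[1]]           # 1: element by index number
lst[1]             # a list of length 1 containing element a
lst[["b"]]         # "two": element by name
lst$b              # "two": same, using $
lst$c$d            # 3: recursive, name then name
lst[["c"]][[2]]    # 4: name, then index number
lst[[c(3, 1)]]     # 3: recursive indexing with a vector of indices
```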

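List elements can also be named and created dynamically: assigning a value to <tt>lst[[name]]</tt>, where <tt>lst</tt> is a list and <tt>name</tt> a character string computed at run time, adds an element with that name if it does not already exist.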
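A minimal sketch, assuming the element names are built with <tt>paste0</tt> inside a loop. The names "item1" to "item3" and the squared values are hypothetical.

```r
lst <- list()
for (i in 1:3) {
  name <- paste0("item", i)   # element name computed at run time
  lst[[name]] <- i^2          # creates lst$item1, lst$item2, lst$item3
}
names(lst)    # "item1" "item2" "item3"
lst$item3     # 9
```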

Data Frames
A dataframe has been described as "a list of variables/vectors of the same length". Consider a dataframe built from two vectors of five elements each. The first vector, v1, is the sequence of integers 1 through 5. The second vector, v2, holds five logical values (T or F). The dataframe is then created from these two vectors with <tt>data.frame</tt>. Its columns can be accessed using integer subscripts or by column name with the <tt>$</tt> symbol.
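A sketch of the dataframe just described. The particular T/F pattern in v2 is hypothetical, since the text only says its values are logical.

```r
v1 <- 1:5
v2 <- c(T, F, T, F, T)      # illustrative logical values
df <- data.frame(v1, v2)    # columns are named after the vectors

df[, 1]     # 1 2 3 4 5: first column by integer subscript
df[[2]]     # TRUE FALSE TRUE FALSE TRUE: second column
df$v1       # 1 2 3 4 5: column by name with $
```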

The dataframe may also be created directly, naming each vector that composes it as part of the argument list of <tt>data.frame</tt>.
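For example, the same two illustrative columns can be defined inline as named arguments:

```r
df <- data.frame(v1 = 1:5, v2 = c(T, F, T, F, T))
str(df)     # 5 obs. of 2 variables: v1 (int) and v2 (logi)
```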