Statistics/Introduction/Need To Know

Statistics is a diverse subject, so the mathematics required depends on the kind of statistics being studied. A strong background in linear algebra is needed for most multivariate statistics but is not necessary for introductory statistics. A background in calculus is useful no matter which branch of statistics is being studied.

At a bare minimum, the student should have a grasp of the basic concepts taught in algebra and be comfortable with "moving things around" and solving for an unknown. Most of the statistics here derive from a few basic tools that the reader should become acquainted with.

Absolute Value
$$|x| \equiv \begin{cases} x, & x \ge 0 \\ -x, & x < 0 \end{cases}$$

If the number is zero or positive, then the absolute value of the number is simply the same number. If the number is negative, then take away the negative sign to get the absolute value.

Examples

 * |42| = 42
 * |-5| = 5
 * |2.21| = 2.21
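
The piecewise definition above translates almost directly into code. The following is a minimal sketch; the function name `absolute_value` is chosen here for illustration (Python's built-in `abs` does the same job).

```python
def absolute_value(x):
    """Return |x| using the piecewise definition (illustrative helper name)."""
    # Zero or positive: the number is its own absolute value.
    # Negative: negate it, which removes the minus sign.
    return x if x >= 0 else -x


for value in (42, -5, 2.21):
    # Compare against Python's built-in abs() as a sanity check.
    print(value, absolute_value(value), abs(value))
```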

Factorials
A factorial is a calculation that is used a lot in probability. It is defined only for integers greater than or equal to zero:

$$ n! \equiv \begin{cases} n \cdot (n-1)!, & n \ge 1 \\ 1, & n = 0 \end{cases} $$

Examples
In short, this means that:

 * 0! = 1
 * 1! = 1
 * 2! = 2 · 1 = 2
 * 3! = 3 · 2 · 1 = 6
 * 4! = 4 · 3 · 2 · 1 = 24
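
The recursive definition of the factorial can also be sketched in code. This is an illustrative version, with the hypothetical name `factorial` and an input check added as an assumption; in practice the standard library's `math.factorial` would normally be used.

```python
import math


def factorial(n):
    """Compute n! from the recursive definition (illustrative sketch)."""
    # The factorial is defined only for integers >= 0.
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")
    # Base case: 0! = 1. Otherwise: n! = n * (n - 1)!
    return 1 if n == 0 else n * factorial(n - 1)


for n in range(6):
    # The hand-written version should agree with the standard library.
    print(n, factorial(n), math.factorial(n))
```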

Summation
The summation (also known as a series) is used more than almost any other technique in statistics. It is a method of representing addition over lots of values without putting + after +. We represent summation using a big uppercase sigma: Σ.

Examples
Very often in statistics we will sum a list of related variables:

$$	\sum_{i=0}^n x_i = x_0 + x_1 + x_2 + \cdots + x_n $$

Here we are adding all the x variables (which will hopefully all have values by the time we calculate this). The expression below the Σ (i=0, in this case) gives the index variable and its starting value (i, starting at 0), while the number above the Σ gives the value at which the index stops (stepping by 1, so i = 0, 1, 2, and so on up to n). Another example:

$$ \sum_{i=1}^4 2i = 2(1) + 2(2) + 2(3) + 2(4) = 2 + 4 + 6 + 8 = 20 $$

Notice that we would get the same value by moving the 2 outside of the summation (perform the summation and then multiply by 2, rather than multiplying each component of the summation by 2).
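
A short sketch can confirm this for the example above. The variable names are chosen here for illustration only.

```python
# Indices i = 1, 2, 3, 4 (range excludes its upper bound, hence 5).
indices = range(1, 5)

# Multiply each term by 2, then add: 2(1) + 2(2) + 2(3) + 2(4).
sum_inside = sum(2 * i for i in indices)

# Add first, then multiply the total by 2: 2 * (1 + 2 + 3 + 4).
sum_outside = 2 * sum(indices)

print(sum_inside, sum_outside)  # both are 20
```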

Infinite series
There is no reason, of course, that a series has to stop at a predetermined, or even finite, number of terms; it can keep going without end. Such series are called "infinite series", and sometimes they converge to a finite value: the running total gets arbitrarily close to that value as the number of terms approaches infinity (∞).

Examples
$$ \sum_{k=0}^\infty r^k = \frac{1}{1-r}, \quad |r| < 1 $$

This example is the famous geometric series. Note both that the series goes to ∞ (infinity, meaning it does not stop) and that it is only valid for certain values of the variable r. If r is between -1 and 1 (-1 < r < 1), then the summation gets closer to (i.e., converges on) 1 / (1 - r) the further you take the series out.
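
Convergence can be seen numerically by adding up more and more terms. The sketch below assumes the series starts at the zeroth power of r (so the first term is 1), and uses r = 0.5 purely as an example value; the function name `geometric_partial_sum` is hypothetical.

```python
def geometric_partial_sum(r, n_terms):
    """Sum the first n_terms terms r**0 + r**1 + ... (illustrative helper)."""
    return sum(r**k for k in range(n_terms))


r = 0.5                 # example value with -1 < r < 1
limit = 1 / (1 - r)     # value the series converges on

for n_terms in (1, 5, 10, 20):
    partial = geometric_partial_sum(r, n_terms)
    # The gap between the partial sum and the limit shrinks as terms are added.
    print(n_terms, partial, limit - partial)
```

Trying a value outside the range, such as r = 2, shows the partial sums growing without bound instead of settling down.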

Linear Approximation
Let us say that you are looking at a table of values, such as the one above. You want to approximate (get a good estimate of) the value at 63, but that value is not in your table. A good solution here is to use a linear approximation to get a value that is probably close to the one you really want, without having to go through all of the trouble of calculating the extra step in the table.

$$ f\left(x_i\right) \approx \frac{f\left(x_{\lceil i \rceil}\right) - f\left(x_{\lfloor i \rfloor}\right)}{x_{\lceil i \rceil} - x_{\lfloor i \rfloor}} \cdot \left(x_i - x_{\lfloor i \rfloor}\right) + f\left(x_{\lfloor i \rfloor}\right) $$

This is just the equation for a line applied to the table of data. $$x_i$$ represents the data point you want to know about, $$x_{\lfloor i \rfloor}$$ is the known data point beneath the one you want to know about, and $$x_{\lceil i \rceil}$$ is the known data point above the one you want to know about.

Examples
Find the value at 63 for the 0.05 column, using the values on the table above.

First we confirm on the above table that we need to approximate the value. If we knew it exactly, there would be no need to approximate it. Here, 63 does not appear in the table; it falls between the entries for 60 and 70. Everything else we can get from the table:

$$ f(63) \approx \frac{f(70) - f(60)}{70 - 60} \cdot (63 - 60) + f(60) = \frac{1.66691  - 1.67065}{10} \cdot 3 + 1.67065 = 1.669528 $$

Using software, we calculate the actual value of f(63) to be 1.669402, a difference of around 0.00013. Close enough for our purposes.
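
The same calculation can be sketched in code. The function and parameter names below are hypothetical, and the inputs are the two table entries used in the worked example.

```python
def linear_approximation(x, x_low, f_low, x_high, f_high):
    """Estimate f(x) from the known points just below and just above x."""
    # Slope of the straight line through the two known points.
    slope = (f_high - f_low) / (x_high - x_low)
    # Move along that line from the lower point to x.
    return slope * (x - x_low) + f_low


# Table entries for 60 and 70 in the 0.05 column, as used above.
estimate = linear_approximation(63, 60, 1.67065, 70, 1.66691)
print(round(estimate, 6))  # 1.669528
```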