Statistics/Testing Data/t-tests

'''Note: Some of the statements in the text below are disputed. For small sample sizes, a non-parametric test such as the Mann-Whitney U test (also known as the Wilcoxon rank-sum test) may be preferable to a t-test.'''

Under the assumption of normally distributed data, the t-test is the most powerful parametric test for assessing the significance of a mean estimated from a small sample.

A one sample t-test has the following null hypothesis:

$$H_0: \quad \mu=c$$

where the Greek letter $$\mu$$ (mu) represents the population mean and c represents its hypothesized value. In statistics it is usual to employ Greek letters for population parameters and Roman letters for sample statistics. The t-test is the small-sample analog of the z-test, which is suitable for large samples. A small sample is generally regarded as one of size n<30.

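A t-test is necessary for small samples because the population standard deviation is usually unknown and must be estimated from the sample. The standardized sample mean then follows a Student's distribution rather than a normal distribution. If the sample is large (n>=30), statistical theory says that the sample mean is approximately normally distributed and a z-test for a single mean can be used. This is a result of a famous statistical theorem, the central limit theorem.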

A t-test, however, can still be applied to larger samples and as the sample size n grows larger and larger, the results of a t-test and z-test become closer and closer. In the limit, with infinite degrees of freedom, the results of t and z tests become identical.
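
This convergence can be written in terms of the density of Student's distribution. As a sketch, using the symbol $$\nu$$ for the degrees of freedom (a notation introduced only for this illustration), the density is

 * $$f_\nu(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$

and as the degrees of freedom grow without bound it tends to the standard normal density used by the z-test:

 * $$\lim_{\nu\to\infty} f_\nu(t) = \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}$$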

In order to perform a t-test, one first has to calculate the "degrees of freedom." This quantity takes into account the sample size and the number of parameters that are being estimated. Here, the population parameter $$\mu$$ is being estimated by the sample statistic $$\overline{X}$$, the mean of the sample data. For a t-test of a single mean, the degrees of freedom are n-1. This is because only one population parameter (the population mean) is being estimated by a sample statistic (the sample mean).

degrees of freedom (df)=n-1

''For example, for a sample of size n=15, df=14.''

Example
''A college professor wants to compare her students' scores with the national average. She chooses a simple random sample (SRS) of 20 students, who score an average of 50.2 on a standardized test. Their scores have a standard deviation of 2.5. The national average on the test is 60. She wants to know whether her students scored significantly lower than the national average.''

Significance tests follow a procedure in several steps.

Step 1
First, state the problem in terms of a distribution and identify the parameters of interest. Mention the sample. We will assume that the scores (X) of the students in the professor's class are approximately normally distributed with unknown parameters $$\mu$$ and $$\sigma$$.

Step 2
State the hypotheses in symbols and words.

$$H_0: \quad \mu=60$$

The null hypothesis is that her students scored on par with the national average.

$$H_A: \quad \mu<60$$

The alternative hypothesis is that her students scored lower than the national average.

Step 3
Next, identify the test to be used. Since we have an SRS of small size and do not know the standard deviation of the population, we will use a one-sample t-test.

The formula for the t-statistic T for a one-sample test is as follows:


 * $$T = \frac{\overline{X} - 60}{S/\sqrt{20}}$$

where $$\overline{X}$$ is the sample mean and S is the sample standard deviation.

A quite common mistake is to say that the formula for the t-test statistic is:


 * $$T = \frac{\overline{x} - \mu}{s/\sqrt{n}}$$

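This is not a statistic, because it depends on the unknown parameter $$\mu$$, and a statistic must be computable from the sample alone. That is the crucial point in such a problem, and it often goes unnoticed. Another problem with this formula is the use of the lowercase $$\overline{x}$$ and s. These denote observed values, whereas the formula should contain the sample statistics themselves, $$\overline{X}$$ and S.

The correct general formula is: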


 * $$T = \frac{\overline{X} - c}{S/\sqrt{n}}$$

in which c is the hypothetical value for $$\mu$$ specified by the null hypothesis.

(The standard deviation of the sample divided by the square root of the sample size is known as the "standard error" of the sample mean.)

Step 4
State the distribution of the test statistic under the null hypothesis. Under $$H_0$$ the statistic T will follow a Student's distribution with 19 degrees of freedom: $$T \sim t(20-1)$$.

Step 5
Compute the observed value t of the test statistic T by substituting the sample values, as follows:


 * $$t = \frac{\overline{x} - 60}{s/\sqrt{20}} = \frac{50.2 - 60.0}{2.5/\sqrt{20}} = \frac{-9.8}{2.5/4.47} = \frac{-9.8}{0.559} = -17.5$$
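
The same arithmetic can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the function name `one_sample_t` and the variable names are chosen here for illustration and do not come from any statistics package.

```python
import math

# Values taken from the example above.
sample_mean = 50.2         # observed sample mean
sample_sd = 2.5            # observed sample standard deviation
n = 20                     # sample size
hypothesized_mean = 60.0   # value of mu under the null hypothesis


def one_sample_t(mean, sd, size, mu0):
    """Return (standard error, observed t) for a one-sample t-test."""
    standard_error = sd / math.sqrt(size)
    return standard_error, (mean - mu0) / standard_error


standard_error, t_observed = one_sample_t(
    sample_mean, sample_sd, n, hypothesized_mean
)
degrees_of_freedom = n - 1

print(f"standard error = {standard_error:.3f}")
print(f"t = {t_observed:.2f} with {degrees_of_freedom} degrees of freedom")
```

Without intermediate rounding the statistic comes out at about -17.53, which agrees with the hand calculation above to the precision shown there.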

Step 6
Determine the p-value of the observed value t of the test statistic T. We will reject the null hypothesis for sufficiently small values of T, so we compute the left-tailed p-value:
 * p-value $$ = P(T \leq t ;H_0) = P(T(19) \leq -17.5) \approx 0 $$

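For reference, the 0.95 quantile of the Student's distribution with 19 degrees of freedom is 1.729, so a left-tailed test at the 5% level rejects the null hypothesis when t < -1.729. The observed value -17.5 lies far beyond that. The p-value is approximately 1.777e-13.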
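
A tail probability this small is hard to read from a printed table, but it can be approximated numerically. The sketch below is one possible approach, not the method used to produce the figure above. The helper `student_t_cdf` is a hypothetical function written for this illustration: it substitutes T = sqrt(df)·tan(θ) into the Student's density and integrates the result with Simpson's rule. In practice a statistics library's built-in distribution function would normally be used instead.

```python
import math


def student_t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for a Student's distribution with df degrees of freedom.

    Substituting T = sqrt(df) * tan(theta) turns the density into
    C * cos(theta) ** (df - 1) on (-pi/2, pi/2), which is integrated
    here with the composite Simpson's rule (steps must be even).
    """
    # Logarithm of the normalizing constant C = Gamma((df+1)/2) / (sqrt(pi) * Gamma(df/2)).
    log_c = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(math.pi)
    )
    lower = -math.pi / 2
    upper = math.atan(t / math.sqrt(df))
    h = (upper - lower) / steps

    def integrand(theta):
        return math.cos(theta) ** (df - 1)

    total = integrand(lower) + integrand(upper)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * integrand(lower + i * h)
    return math.exp(log_c) * total * h / 3


# Rounded observed value from Step 5 and the degrees of freedom from Step 4.
t_observed = -17.5
p_value = student_t_cdf(t_observed, df=19)
print(f"left-tailed p-value = {p_value:.3e}")

# Sanity check against the tabulated quantile: P(T <= -1.729) should be close to 0.05.
print(f"P(T(19) <= -1.729) = {student_t_cdf(-1.729, df=19):.4f}")
```

The printed p-value should be on the order of 1e-13, in line with the figure quoted above, and the second line gives a quick check that the helper reproduces the tabulated 5% point.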

Step 7
Lastly, interpret the results in the context of the problem. The p-value indicates that a sample mean this far below 60 would be extremely unlikely if the null hypothesis were true, so we have sufficient evidence to reject the null hypothesis. The professor's students did score significantly lower than the national average.