Statistics/Hypothesis Testing

Introduction
In previous chapters, we have discussed two methods for estimating unknown parameters, namely point estimation and interval estimation. Estimating unknown parameters is an important area in statistical inference, and in this chapter we will discuss another important area, namely hypothesis testing, which is related to interval estimation. Indeed, the concepts of hypothesis testing and confidence intervals are closely related, as we will demonstrate.

Basic concepts and terminologies
Before discussing how to conduct a hypothesis test, and how to evaluate the "goodness" of a hypothesis test, let us first introduce some basic concepts and terminology related to hypothesis testing.

There are two terms that classify hypotheses: simple and composite.

Sometimes, it is not immediately clear whether a hypothesis is simple or composite. To understand the classification of hypotheses more clearly, let us consider the following example.

In hypothesis tests, we consider two hypotheses: the null hypothesis $$H_0$$ and the alternative hypothesis $$H_1$$.

A general form of $$H_0$$ and $$H_1$$ is $$H_0:\theta\in\Theta_0$$ and $$H_1:\theta\in\Theta_1$$ where $$\Theta_1=\Theta_0^c$$, the complement of $$\Theta_0$$ (with respect to $$\Theta$$), i.e., $$\Theta_0^c=\Theta\setminus\Theta_0$$ ($$\Theta$$ is the parameter space, containing all possible values of $$\theta$$). The reason for choosing the complement of $$\Theta_0$$ in $$H_1$$ is that $$H_1$$ is the complementary hypothesis to $$H_0$$, as suggested in the above definition.

We have mentioned that exactly one of $$H_0$$ and $$H_1$$ is assumed to be true. To make a decision, we need to decide which hypothesis should be regarded as true. Of course, as one may expect, this decision is not perfect, and some errors will be involved in our decision. So, we cannot say we "prove that" a particular hypothesis is true (that is, we cannot be certain that a particular hypothesis is true). Despite this, we may "regard" (or "accept") a particular hypothesis as true (without proving it to be true) when we have sufficient evidence leading us to make this decision (ideally, with small errors).

Now, we are facing two questions. First, what evidence should we consider? Second, what is meant by "sufficient"? For the first question, a natural answer is that we should consider the observed samples. This is because we are making hypotheses about the population, and the samples are taken from, and thus closely related to, the population, which should help us make the decision.

To answer the second question, we need the concepts introduced below. In particular, in hypothesis testing, we will construct a so-called rejection region or critical region to help us determine when we should reject the null hypothesis (i.e., regard $$H_0$$ as false), and hence (naturally) regard $$H_1$$ as true ("accept" $$H_1$$). (We have assumed that exactly one of $$H_0$$ and $$H_1$$ is true, so when we regard one of them as false, we should regard the other as true.) On the other hand, when we do not reject $$H_0$$, we will act as if $$H_0$$ is true, i.e., accept $$H_0$$ (and thus also reject $$H_1$$, since exactly one of $$H_0$$ and $$H_1$$ is true).

Let us formally define the terms related to hypothesis testing in the following.

Typically, we use a test statistic (a statistic for conducting a hypothesis test) to specify the rejection region. For instance, if the random sample is $$X_1,\dotsc,X_n$$ and the test statistic is $$\overline X$$, the rejection region may be, say, $$R=\{\mathbf x:\overline x<2\}$$ (where $$\mathbf x=(x_1,\dotsc,x_n)$$ and $$\overline x$$ are the observed values of $$(X_1,\dotsc,X_n)$$ and $$\overline X$$ respectively). Through this, we can directly construct a hypothesis test: when $$\mathbf x\in R$$, we reject $$H_0$$ and accept $$H_1$$. Otherwise, if $$\mathbf x\in R^c$$, we accept $$H_0$$. So, in general, to specify the rule in a hypothesis test, we just need a rejection region. After that, we apply the test to testing $$H_0$$ against $$H_1$$. There are some terminologies related to hypothesis tests constructed in this way:

As we have mentioned, the decisions made by a hypothesis test are not perfect, and errors occur. Indeed, when we think carefully, there are actually two types of errors, as follows:

We can illustrate these two types of errors more clearly using the following table. We can express the hypotheses as $$H_0:\theta\in\Theta_0$$ and $$H_1:\theta\in\Theta_0^c$$. Also, assume the rejection region is $$R=R(\mathbf X)$$ (i.e., the rejection region with "$$\mathbf x$$" replaced by "$$\mathbf X$$"). In general, when "$$R$$" is put together with "$$\mathbf X$$", we assume $$R=R(\mathbf X)$$.

Then we have some notations and expressions for the probabilities of making type I and type II errors (let $$X_1,\dotsc,X_n$$ be a random sample and $$\mathbf X=(X_1,\dotsc,X_n)$$):
 * The probability of making a type I error, denoted by $$\alpha(\theta)$$, is $$\mathbb P_{\theta}(\mathbf X\in R)$$ if $$\theta\in\Theta_0$$.
 * The probability of making a type II error, denoted by $$\beta(\theta)$$, is $$\mathbb P_{\theta}(\mathbf X\in R^c)=1-\mathbb P_{\theta}(\mathbf X\in R)$$ if $$\theta\in\Theta_0^c$$.
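These two probabilities can be computed explicitly in simple settings. The following sketch uses an illustrative setting not taken from the text: $$X_1,\dotsc,X_9$$ i.i.d. $$N(\theta,1)$$, simple hypotheses $$H_0:\theta=0$$ vs. $$H_1:\theta=1$$, and rejection region $$\{\mathbf x:\overline x>0.5\}$$.

```python
# A sketch computing α(θ) and β(θ) for a hypothetical test.
# Assumptions (not from the text): X_1,...,X_9 i.i.d. N(θ, 1),
# H0: θ = 0 vs. H1: θ = 1, rejection region {x : x̄ > 0.5}.
from math import sqrt
from statistics import NormalDist

n, sigma, cutoff = 9, 1.0, 0.5

def prob_reject(theta):
    """P_θ(X̄ > cutoff); under θ, X̄ ~ N(θ, σ²/n)."""
    xbar_dist = NormalDist(mu=theta, sigma=sigma / sqrt(n))
    return 1 - xbar_dist.cdf(cutoff)

alpha = prob_reject(0.0)       # type I error prob: reject although θ ∈ Θ0
beta = 1 - prob_reject(1.0)    # type II error prob: accept although θ ∈ Θ0^c
print(f"α(0) ≈ {alpha:.4f}, β(1) ≈ {beta:.4f}")  # both ≈ 0.0668 here
```

Here $$\alpha(0)=\beta(1)$$ only because the cutoff $$0.5$$ sits midway between the two hypothesized means; moving the cutoff trades one error probability for the other.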

Notice that a common expression appears in both $$\alpha(\theta)$$ and $$\beta(\theta)$$, namely "$$\mathbb P_{\theta}(\mathbf X\in R)$$". Indeed, we can write this expression as $$\mathbb P_{\theta}(\mathbf X\in R)=\begin{cases}\alpha(\theta),&\theta\in\Theta_0;\\1-\beta(\theta),&\theta\in\Theta_0^c.\end{cases}$$ Through this, we can observe that this expression contains all information about the probabilities of making errors, given a hypothesis test with rejection region $$R$$. Hence, we will give it a special name:

Ideally, we want to make both $$\alpha(\theta)$$ and $$\beta(\theta)$$ arbitrarily small. But this is generally impossible. To understand this, we can consider the following extreme examples:
 * Set the rejection region $$R$$ to be the entire sample space $$S$$, i.e., the set of all possible observed samples $$\mathbf x$$. Then, $$\pi(\theta)=1$$ for each $$\theta\in\Theta$$. From this, of course we have $$\beta(\theta)=0$$, which is nice. But the serious problem is that $$\alpha(\theta)=1$$, due to the mindless rejection.
 * Another extreme is setting the rejection region $$R$$ to be the empty set $$\varnothing$$. Then, $$\pi(\theta)=0$$ for each $$\theta\in\Theta$$. From this, we have $$\alpha(\theta)=0$$, which is nice. But, again, the serious problem is that $$\beta(\theta)=1$$, due to the mindless acceptance.

We can observe that to make $$\alpha(\theta)$$ (respectively $$\beta(\theta)$$) very small, it is inevitable that $$\beta(\theta)$$ (respectively $$\alpha(\theta)$$) will increase, due to rejecting (accepting) "too much". As a result, we can only try to minimize the probability of making one type of error while holding the probability of making the other type at a fixed level.
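The trade-off can be seen numerically. The sketch below uses an illustrative setting (not from the text): $$\overline X\sim N(\theta,1/9)$$, i.e., $$n=9$$ i.i.d. $$N(\theta,1)$$ observations, with rejection region $$\{\mathbf x:\overline x>c\}$$ for a movable cutoff $$c$$.

```python
# A sketch of the power function π(θ) = P_θ(X ∈ R) and the α/β trade-off.
# Assumptions (illustrative only): X̄ ~ N(θ, 1/9), rejection region {x : x̄ > c}.
from math import sqrt
from statistics import NormalDist

def power(theta, c, n=9, sigma=1.0):
    """π(θ) for the test rejecting when x̄ > c."""
    return 1 - NormalDist(mu=theta, sigma=sigma / sqrt(n)).cdf(c)

# Shrinking the rejection region (raising c) lowers α(0) = π(0) ...
assert power(0.0, 1.0) < power(0.0, 0.5)
# ... but raises β(1) = 1 - π(1): the two error probabilities trade off.
assert 1 - power(1.0, 1.0) > 1 - power(1.0, 0.5)

# The two extremes in the text: R = S gives π ≡ 1, R = ∅ gives π ≡ 0
# (with this test statistic they arise as c → -∞ and c → +∞).
print(power(0.0, -1e9), power(0.0, 1e9))  # ≈ 1.0 and ≈ 0.0
```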

Now, we are interested in knowing which type of error should be controlled. To motivate the choice, we can again consider the analogy of the legal principle of presumption of innocence. Under this analogy, a type I error means convicting an innocent person, and a type II error means acquitting a guilty person. Then, as suggested by Blackstone's ratio, a type I error is more serious than a type II error. This motivates us to control the probability of a type I error, i.e., $$\alpha(\theta)$$, at a specified small value $$\alpha^*$$, so that we control the probability of making this more serious error. After that, we consider the tests that control the type I error probability at this level, and the one with the smallest $$\beta(\theta)$$ is the "best" one (in the sense of the probability of making errors).

To describe "control the type I error probability at this level" in a more precise way, let us define the following term.

So, using this definition, controlling the type I error probability at a particular level $$\alpha$$ means that the size of the test should not exceed $$\alpha$$, i.e., $$\sup_{\theta\in\Theta_0}\pi(\theta)\le\alpha$$ (in some other places, such a test is called a level-$$\alpha$$ test).
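The supremum in the definition of size can be checked numerically in a concrete case. The sketch below uses an illustrative setting (not from the text): $$\overline X\sim N(\theta,1/9)$$, $$H_0:\theta\le 0$$ vs. $$H_1:\theta>0$$, with the cutoff $$c\approx 0.5483$$ chosen so that the size is about $$0.05$$.

```python
# A sketch of the size of a test, sup_{θ∈Θ0} π(θ).
# Assumptions (illustrative): X̄ ~ N(θ, 1/9), H0: θ ≤ 0 vs. H1: θ > 0,
# rejection region {x : x̄ > 0.5483} (so that the size is ≈ 0.05).
from math import sqrt
from statistics import NormalDist

def power(theta, c=0.5483, n=9, sigma=1.0):
    return 1 - NormalDist(mu=theta, sigma=sigma / sqrt(n)).cdf(c)

# π(θ) is increasing in θ here, so the sup over Θ0 = (-∞, 0] is attained
# at the boundary θ = 0; a grid search over part of Θ0 illustrates this.
size = max(power(theta / 100) for theta in range(-300, 1))
print(f"size ≈ {size:.4f}")  # ≈ 0.05
```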

So far, we have focused on using rejection regions to conduct hypothesis tests. But this is not the only way. Alternatively, we can make use of the $$p$$-value.

The following theorem allows us to use $$p$$-value for hypothesis testing.
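As a concrete sketch of testing via $$p$$-value, consider an illustrative one-sided $$z$$-test (an assumed setting, not from the text): $$\overline X\sim N(\theta,1/9)$$, $$H_0:\theta\le 0$$ vs. $$H_1:\theta>0$$. Rejecting exactly when the $$p$$-value is at most $$\alpha$$ reproduces the rejection-region decision.

```python
# A sketch of hypothesis testing via p-value.
# Assumptions (illustrative): X̄ ~ N(θ, σ²/n), σ = 1, n = 9,
# H0: θ ≤ 0 vs. H1: θ > 0; reject H0 iff p-value ≤ α.
from math import sqrt
from statistics import NormalDist

def p_value(xbar, theta0=0.0, sigma=1.0, n=9):
    """P_{θ0}(X̄ ≥ observed x̄): the probability, at H0's boundary, of a
    test statistic at least as extreme as the one observed."""
    z = (xbar - theta0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)

alpha = 0.05
xbar_obs = 0.7                 # hypothetical observed sample mean
p = p_value(xbar_obs)          # z = 2.1, so p ≈ 0.018
print(f"p ≈ {p:.4f}; " + ("reject H0" if p <= alpha else "accept H0"))
```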

Evaluating a hypothesis test
After discussing some basic concepts and terminologies, let us now study some ways to evaluate the goodness of a hypothesis test. As we have previously mentioned, we want the probabilities of making type I and type II errors to be small, but it is generally impossible to make both arbitrarily small. Hence, we have suggested controlling the type I error, through the size of a test; the "best" test should then be the one with the smallest probability of making a type II error, after controlling the type I error.

These ideas lead us to the following definitions.

Using this definition, instead of saying the "best" test (the test with the smallest type II error probability), we can say "the test with the most power", or in other words, the "most powerful test".
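The idea of comparing tests by power can be made concrete. The sketch below uses an assumed setting (not from the text): $$X_1,\dotsc,X_9$$ i.i.d. $$N(\theta,1)$$, $$H_0:\theta=0$$ vs. $$H_1:\theta>0$$, and two hypothetical size-$$0.05$$ tests: test A uses the sample mean, while test B wastefully uses only $$X_1$$.

```python
# A sketch comparing the power of two size-0.05 tests (both illustrative).
# Test A rejects when x̄ > z_{0.05}/√n; test B rejects when x_1 > z_{0.05}.
# Both have size 0.05 under H0: θ = 0, but A has higher power for θ > 0.
from math import sqrt
from statistics import NormalDist

z05 = NormalDist().inv_cdf(0.95)          # z_{0.05} ≈ 1.6449

def power_A(theta, n=9):                  # test based on the sample mean
    return 1 - NormalDist(theta, 1 / sqrt(n)).cdf(z05 / sqrt(n))

def power_B(theta):                       # test based on X_1 alone
    return 1 - NormalDist(theta, 1).cdf(z05)

assert abs(power_A(0.0) - 0.05) < 1e-9    # same size...
assert abs(power_B(0.0) - 0.05) < 1e-9
print(f"power at θ=1: A ≈ {power_A(1.0):.3f}, B ≈ {power_B(1.0):.3f}")
```

Both tests control the type I error at the same level, so the comparison between them is purely a comparison of power; A is the better test here because it uses all the data.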

Constructing a hypothesis test
There are many ways of constructing a hypothesis test, but of course not all of them are good (i.e., "powerful"). In the following, we will provide some common approaches for constructing hypothesis tests. In particular, the following lemma is very useful for constructing an MP test with size $$\alpha$$.

Neyman-Pearson lemma
Even though the hypotheses involved in the Neyman-Pearson lemma are simple, under some conditions we can use the lemma to construct a UMP test for testing a composite null hypothesis against a composite alternative hypothesis. The details are as follows, for testing $$H_0:\theta\le\theta_0\quad\text{vs.}\quad H_1:\theta>\theta_0$$ (for testing $$H_0:\theta\ge\theta_0\quad\text{vs.}\quad H_1:\theta<\theta_0$$, the steps are similar). But in general, there is no UMP test for testing $$H_0:\theta=\theta_0\quad\text{vs.}\quad H_1:\theta\ne\theta_0$$.
 * 1) Find an MP test $$\varphi$$ with size $$\alpha$$ for testing $$H_0:\theta=\theta_0\quad\text{vs.}\quad H_1:\theta=\theta_1>\theta_0$$ using the Neyman-Pearson lemma, where $$\theta_1$$ is an arbitrary value such that $$\theta_1>\theta_0$$.
 * 2) If the test $$\varphi$$ does not depend on the particular value of $$\theta_1$$, then the test $$\varphi$$ has the greatest power for every $$\theta\in\Theta_1=\{\vartheta:\vartheta>\theta_0\}$$. So, the test $$\varphi$$ is a UMP test with size $$\alpha$$ for testing $$H_0:\theta=\theta_0\quad\text{vs.}\quad H_1:\theta>\theta_0$$.
 * 3) If $$\sup_{\theta\le\theta_0}\pi(\theta)=\alpha$$, then the size of the test $$\varphi$$ is still $$\alpha$$ even if the null hypothesis is changed to $$H_0:\theta\le\theta_0$$. So, after changing $$H_0:\theta=\theta_0$$ to $$H_0:\theta\le\theta_0$$ and not changing $$H_1$$ (also adjusting the parameter space) for the test $$\varphi$$, the test $$\varphi$$ still satisfies the "MP" requirement (because $$H_1$$ is unchanged, the result in step 2 still applies), and the test $$\varphi$$ also satisfies the "size" requirement (because of changing $$H_0$$ in this way). Hence, the test $$\varphi$$ is a UMP test with size $$\alpha$$ for testing $$H_0:\theta\le\theta_0\quad\text{vs.}\quad H_1:\theta>\theta_0$$.

Of course, when the condition in step 3 holds but that in step 2 does not, the test $$\varphi$$ in step 1 is a UMP test with size $$\alpha$$ for testing $$H_0:\theta\le\theta_0\quad\text{vs.}\quad H_1:\theta=\theta_1$$, where $$\theta_1$$ is a constant (larger than $$\theta_0$$, or else $$H_1$$ and $$H_0$$ would not be disjoint). However, the hypotheses are generally not in this form.
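The three steps can be sketched for a classical example, $$X_1,\dotsc,X_n$$ i.i.d. $$N(\theta,1)$$ (an illustrative setting, assumed here rather than taken from the text). The Neyman-Pearson MP test of $$H_0:\theta=\theta_0$$ vs. $$H_1:\theta=\theta_1>\theta_0$$ rejects for large $$\overline x$$, with a cutoff that does not involve $$\theta_1$$ (so step 2 holds), and $$\pi(\theta)$$ is increasing (so step 3 holds).

```python
# A sketch of the three steps for X_1,...,X_n i.i.d. N(θ, 1) (illustrative).
# Step 1: for H0: θ = θ0 vs. H1: θ = θ1 > θ0, the NP test rejects when
#         x̄ > θ0 + z_α/√n.
# Step 2: the cutoff does not depend on θ1, so the same test is MP
#         against every θ1 > θ0.
# Step 3: π(θ) is increasing in θ, so sup_{θ≤θ0} π(θ) = π(θ0) = α.
from math import sqrt
from statistics import NormalDist

def ump_cutoff(theta0, alpha, n, sigma=1.0):
    """Cutoff c with P_{θ0}(X̄ > c) = α."""
    return theta0 + NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)

def power(theta, c, n, sigma=1.0):
    return 1 - NormalDist(theta, sigma / sqrt(n)).cdf(c)

c = ump_cutoff(theta0=0.0, alpha=0.05, n=9)
# Step 3 check: the size over H0: θ ≤ 0 is still attained at θ = 0.
assert power(-0.5, c, 9) < power(0.0, c, 9)
print(f"reject H0 when x̄ > {c:.4f}; size ≈ {power(0.0, c, 9):.4f}")
```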

Now, let us consider another example where the underlying distribution is discrete.

Likelihood-ratio test
Previously, we suggested using the Neyman-Pearson lemma to construct MP tests for testing a simple null hypothesis against a simple alternative hypothesis. However, when the hypotheses are composite, we may not be able to use the Neyman-Pearson lemma. So, in the following, we will give a general method for constructing tests for any hypotheses, not limited to simple ones. But we should note that the tests constructed in this way are not necessarily UMP tests.
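A likelihood-ratio test rejects $$H_0$$ when $$\lambda(\mathbf x)=\sup_{\theta\in\Theta_0}L(\theta;\mathbf x)\,/\,\sup_{\theta\in\Theta}L(\theta;\mathbf x)$$ is small. The sketch below works this out in an assumed setting (not from the text): $$X_1,\dotsc,X_n$$ i.i.d. $$N(\theta,1)$$, $$H_0:\theta=\theta_0$$ vs. $$H_1:\theta\ne\theta_0$$, where the unrestricted MLE is $$\overline x$$ and $$\lambda$$ simplifies to $$\exp(-n(\overline x-\theta_0)^2/2)$$.

```python
# A sketch of a likelihood-ratio test (illustrative setting: N(θ, 1) data,
# H0: θ = θ0 vs. H1: θ ≠ θ0, where λ(x) = exp(-n (x̄ - θ0)² / 2)).
from math import exp
from statistics import NormalDist

def lam(xs, theta0):
    """Likelihood ratio λ(x) for this setting."""
    n = len(xs)
    xbar = sum(xs) / n
    return exp(-n * (xbar - theta0) ** 2 / 2)

def lrt_reject(xs, theta0, alpha=0.05):
    """Reject when λ(x) ≤ c. Since -2 log λ = n(x̄-θ0)² is the square of a
    standard normal under H0, taking c = exp(-z_{α/2}²/2) gives size α."""
    c = exp(-NormalDist().inv_cdf(1 - alpha / 2) ** 2 / 2)
    return lam(xs, theta0) <= c

print(lrt_reject([2.1, 1.7, 2.4, 1.9], theta0=0.0))    # x̄ far from 0: True
print(lrt_reject([0.2, -0.3, 0.1, -0.1], theta0=0.0))  # x̄ near 0: False
```

Note that this particular rejection rule is equivalent to the usual two-sided $$z$$-test, which illustrates that likelihood-ratio tests often reduce to familiar tests.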

Relationship between hypothesis testing and confidence intervals
We have mentioned that there are similarities between hypothesis testing and confidence intervals. In this section, we will introduce a theorem suggesting how to construct a hypothesis test from a confidence interval (or, in general, a confidence set), and vice versa.
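The duality can be previewed in a simple assumed setting (not from the text): $$\overline X\sim N(\theta,\sigma^2/n)$$ with known $$\sigma$$. Accepting $$H_0:\theta=\theta_0$$ in a size-$$\alpha$$ two-sided $$z$$-test happens exactly when $$\theta_0$$ lies in the usual $$100(1-\alpha)\%$$ confidence interval.

```python
# A sketch of the test/confidence-interval duality (illustrative setting:
# X̄ ~ N(θ, σ²/n) with known σ = 1; two-sided z-test and z-interval).
from math import sqrt
from statistics import NormalDist

def confidence_interval(xbar, n, alpha=0.05, sigma=1.0):
    """The usual 100(1-α)% z-interval x̄ ± z_{α/2} σ/√n."""
    half = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return (xbar - half, xbar + half)

def accept_h0(xbar, theta0, n, alpha=0.05, sigma=1.0):
    """Two-sided z-test: accept H0: θ = θ0 iff |x̄ - θ0| < z_{α/2} σ/√n."""
    z = abs(xbar - theta0) / (sigma / sqrt(n))
    return z < NormalDist().inv_cdf(1 - alpha / 2)

xbar, n = 1.2, 25
lo, hi = confidence_interval(xbar, n)
# The test accepts θ0 exactly when θ0 lies inside the confidence interval.
for theta0 in (0.0, 1.0, 1.2, 1.5):
    assert accept_h0(xbar, theta0, n) == (lo < theta0 < hi)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")
```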