Statistics/Interval Estimation

Introduction
Previously, we have discussed point estimation, which gives us an estimator $$\hat\theta$$ for the value of an unknown parameter $$\theta$$. Now, suppose we want to know the size of error of the point estimator $$\hat\theta$$, i.e. the difference between $$\hat\theta$$ and the unknown parameter $$\theta$$. Of course, we can make use of the value of the mean squared error of $$\hat\theta$$, $$\mathbb E[(\hat\theta-\theta)^2]$$, or other such measures.

However, what if we only know one specific point estimate? We cannot calculate the mean squared error of the corresponding point estimator from this single point estimate, right? So, how do we know the possible size of error of this point estimate? Indeed, it is impossible to tell: we are only given a particular estimated value of the parameter $$\theta$$, and of course we do not know the value of the unknown parameter $$\theta$$ itself, so the difference between this point estimate and $$\theta$$ is also unknown.

To illustrate this, consider the following example: suppose we take a random sample of 10 students from one particular course in university to estimate the mean score of the students in the final exam in that course, denoted by $$\mu$$, (assume the score is normally distributed), and the observed value of the sample mean is $$\overline x=60$$. Then, what is the difference between this point estimate and the true unknown parameter $$\mu$$? Can we be "confident" that this sample mean is close to $$\mu$$, say $$\mu\in[\overline x-5,\overline x+5]=[55,65]$$?

It is possible that $$\mu$$ is, say, 90, and somehow the students in the sample are the ones with very poor performance. On the other hand, it is also possible that $$\mu$$ is, say, 30, and somehow the students in the sample are the ones who perform well (relatively). Of course, it is also possible that $$\mu$$ is quite close to 60, say 59. From this example, we can see that a particular value $$\overline x=60$$ does not tell us the possible size of error: the error can be very large, and also can be very small.

In this chapter, we will introduce interval estimation, where we use an interval estimator that can describe the size of error by providing the probability for the random interval (i.e. an interval with at least one of its bounds being a random variable) given by the interval estimator to contain the unknown parameter $$\theta$$. This probability measures the "accuracy" of the interval estimator of $$\theta$$, and hence the size of error.

As suggested by the name, the interval estimator involves some sort of interval. Also, as one may expect, interval estimation is also based on random samples:

Of course, we would like the probability for the unknown parameter $$\theta$$ to lie in the interval to be close to 1, so that the interval estimator is very accurate. However, a very accurate interval estimator may have a very bad "precision", i.e. the interval covers "too many" plausible values of the unknown parameter, and therefore even if we know that $$\theta$$ is very likely to be one of those values, there are too many different possibilities. Hence, such an interval estimator is not very "useful". To illustrate this, suppose the interval concerned is $$\mathbb R$$, which is the parameter space of $$\theta$$. Then, of course $$\mathbb P(\theta\in \mathbb R)=1$$ (so the "confidence" is high) since $$\theta$$ must lie in its parameter space. However, such an interval has basically "zero precision", and is quite "useless", since the "plausible values" of $$\theta$$ in the interval are essentially all possible values of $$\theta$$.

From this, we can observe the need for "precision" of the interval; that is, we also want the length of the interval to be small, so that we can have some idea about the "location" of $$\theta$$. However, as the interval becomes smaller, it is more likely that the interval misses $$\theta$$, i.e. does not cover the actual value of $$\theta$$, and therefore the probability for $$\theta$$ to lie in that interval becomes smaller, i.e. the interval becomes less "accurate". To illustrate this, let us consider the extreme case: the interval is so small that it contains only a single point (the two end-points of the interval coincide). Then, the "interval estimator" basically becomes a "point estimator" in some sense, and we know that it is very unlikely that the true value of $$\theta$$ equals the value of the point estimator $$\hat\theta$$ ($$\theta$$ lying in that "interval" is equivalent to $$\theta=\hat\theta$$ in this case). Indeed, if the distribution of $$\hat\theta$$ is continuous, then $$\mathbb P(\hat\theta=\theta)=0$$.

As we can see from above, although we want the interval to have a very high "confidence" and also be "very precise" (i.e. the interval is very narrow), we cannot have both of them, since an increase in confidence causes a decrease in "precision", and an increase in "precision" causes a decrease in confidence. Therefore, we need to make some compromises between them, and pick an interval that gives a sufficiently high confidence and is also quite precise. In other words, we would like to have a narrow interval that will cover $$\theta$$ with a high probability.

Terminologies
Now, let us formally define some terminologies related to interval estimation.

Construction of confidence intervals
After understanding what a confidence interval is, we would like to know how to construct one naturally. A main way for such construction is using the pivotal quantity, which is defined below.

After having such a pivotal quantity $$Q(\mathbf X,\theta)$$, we can construct a $$1-\alpha$$ confidence interval for $$\theta$$ by the following steps:
 * 1) For that value of $$\alpha$$, find $$a,b$$ such that $$\mathbb P(a\le Q(\mathbf X,\theta)\le b)=1-\alpha$$ ($$a,b$$ do not involve $$\theta$$, since the distribution of $$Q(\mathbf X,\theta)$$ is free of $$\theta$$ by the definition of a pivotal quantity).
 * 2) After that, we can transform $$a\le Q(\mathbf X,\theta)\le b$$ to $$L(\mathbf X)\le \theta\le U(\mathbf X)$$, since the expression of $$Q(\mathbf X,\theta)$$ involves $$\theta$$, as we have assumed (the resulting inequalities should be equivalent to the original inequalities, that is, $$a\le Q(\mathbf X,\theta)\le b{\color{darkgreen}\iff} L(\mathbf X)\le \theta\le U(\mathbf X)$$, so that $$\mathbb P(L(\mathbf X)\le\theta\le U(\mathbf X)){\color{darkgreen}=}\mathbb P(a\le Q(\mathbf X,\theta)\le b)$$). A worked sketch of these two steps is given below.
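To make these two steps concrete, here is a minimal sketch in Python (our illustration, not part of the original text), using the classic pivotal quantity $$\frac{\overline X-\mu}{\sigma/\sqrt n}\sim\mathcal N(0,1)$$ for a random sample from $$\mathcal N(\mu,\sigma^2)$$ with known $$\sigma$$; the function name and the example numbers are hypothetical.

```python
# Sketch: constructing a 1 - alpha confidence interval via a pivotal quantity.
# Assumes a random sample from N(mu, sigma^2) with KNOWN sigma, so that
# Q(X, mu) = (Xbar - mu) / (sigma / sqrt(n)) ~ N(0, 1).
import numpy as np
from scipy.stats import norm

def mean_ci_known_var(x, sigma, alpha=0.05):
    n = len(x)
    xbar = np.mean(x)
    # Step 1: pick a, b with P(a <= Q <= b) = 1 - alpha; by symmetry we take
    # a = -z_{alpha/2} and b = z_{alpha/2}.
    z = norm.ppf(1 - alpha / 2)              # upper alpha/2 percentile z_{alpha/2}
    # Step 2: invert a <= (xbar - mu)/(sigma/sqrt(n)) <= b into L <= mu <= U.
    half_width = z * sigma / np.sqrt(n)
    return xbar - half_width, xbar + half_width

rng = np.random.default_rng(0)
scores = rng.normal(loc=60, scale=10, size=10)   # hypothetical exam scores
print(mean_ci_known_var(scores, sigma=10))
```

Note that the endpoints $$L(\mathbf X)=\overline X-z_{\alpha/2}\sigma/\sqrt n$$ and $$U(\mathbf X)=\overline X+z_{\alpha/2}\sigma/\sqrt n$$ are random variables, while the returned numbers form a fixed interval estimate.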

Confidence intervals for means of normal distributions
In the following, we will use the concept of pivotal quantity to construct confidence intervals for means and variances of normal distributions. After that, because of the central limit theorem, we can construct confidence intervals for means of other types of distributions that are not normal.

Mean of a normal distribution
Before discussing this confidence interval, let us first introduce a notation (the values of $$z_{\alpha}$$ for different $$\alpha$$ can be found in, or calculated from, the standard normal table):
 * $$z_{\alpha}$$ is the upper percentile of $$\mathcal N(0,1)$$ at level $$\alpha$$, i.e. it satisfies $$\mathbb P(Z\ge z_{\alpha})=\alpha$$ where $$Z\sim\mathcal N(0,1)$$.

(Figure: the standard normal density curve. By symmetry, the central area between $$-z_{\alpha/2}$$ and $$z_{\alpha/2}$$ is $$1-\alpha$$, with area $$\alpha/2$$ in each tail, illustrating $$\mathbb P(-z_{\alpha/2}\le Z\le z_{\alpha/2})=1-\alpha$$.)

We have previously discussed a way to construct a confidence interval for the mean when the variance is known. However, this is not always the case in practice. We may not know the variance, right? Then, we cannot use the $$\sigma$$ in the confidence interval from the previous theorem.

Intuitively, one may think that we can use the sample variance $$S^2$$ to "replace" $$\sigma^2$$, appealing to the weak law of large numbers. Then, we can simply replace the unknown $$\sigma$$ in the confidence interval by the known $$S$$ (or its realization $$s$$ for an interval estimate). However, the flaw in this argument is that the sample size may not be large enough for the weak law of large numbers to give a good approximation.

So, you may now ask: when the sample size is large enough, can we do such a "replacement" for approximation? The answer is yes, and we will discuss this in the last section, about approximated confidence intervals.

Before that section, the confidence intervals discussed are exact, in the sense that no approximation is used to construct them. Therefore, the confidence intervals constructed "work" for any sample size, no matter how large or small (they work even if the sample size is 1, although such a confidence interval may not be very "nice", in the sense that its width may be quite large).

Before discussing how to construct an exact confidence interval for the mean when the variance is unknown, we first give some results that are useful for deriving such a confidence interval.

Using this proposition, we can prove the following theorem. Again, before discussing this confidence interval, let us introduce a notation:
 * $$t_{\alpha,\nu}$$ is the upper percentile of $$t_{\nu}$$ at level $$\alpha$$, i.e. it satisfies $$\mathbb P(T\ge t_{\alpha,\nu})=\alpha$$ where $$T\sim t_{\nu}$$.
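Presumably the theorem in question gives the standard $$t$$-based interval; below is a minimal sketch (ours, not from the original text), assuming the unbiased sample variance $$S^2$$ with divisor $$n-1$$, for which $$\frac{\overline X-\mu}{S/\sqrt n}\sim t_{n-1}$$ (under the divisor-$$n$$ convention used elsewhere in this chapter, $$S/\sqrt n$$ is replaced by $$S/\sqrt{n-1}$$).

```python
# Sketch: t-based 1 - alpha confidence interval for mu when sigma^2 is unknown.
# Uses the unbiased sample variance (divisor n - 1), for which
# (Xbar - mu) / (S / sqrt(n)) ~ t_{n-1} for a normal random sample.
import numpy as np
from scipy.stats import t

def mean_ci_unknown_var(x, alpha=0.05):
    n = len(x)
    xbar = np.mean(x)
    s = np.std(x, ddof=1)                    # unbiased sample standard deviation
    tq = t.ppf(1 - alpha / 2, df=n - 1)      # upper percentile t_{alpha/2, n-1}
    half_width = tq * s / np.sqrt(n)
    return xbar - half_width, xbar + half_width
```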

Difference in means of two normal distributions
Sometimes, apart from estimating the mean of a normal distribution, we would like to estimate the difference in means of two normal distributions for making comparisons. For example, apart from estimating the mean amount of time (lifetime) for a bulb until it burns out, we are often interested in estimating the difference between the mean lifetimes of two different types of bulbs, so that we know which of the bulbs lasts longer on average, and hence which bulb has a higher "quality".

First, let us discuss the case where the two normal distributions are independent.

Now, the problem is how we should construct a confidence interval for the difference in the two means. It seems that we can just construct two $$1-\alpha$$ confidence intervals $$[L(\mathbf X),U(\mathbf X)],[L(\mathbf Y),U(\mathbf Y)]$$ for each of the two means $$\mu_X,\mu_Y$$ respectively. Then, the $$1-\alpha$$ confidence interval for $$\mu_X-\mu_Y$$ is $$[L(\mathbf X)-L(\mathbf Y),U(\mathbf X)-U(\mathbf Y)]$$. However, this is indeed incorrect, since when we have $$\mathbb P(L(\mathbf X)\le \mu_X\le U(\mathbf X))=1-\alpha$$ and $$\mathbb P(L(\mathbf Y)\le \mu_Y\le U(\mathbf Y))=1-\alpha$$, it does not mean that $$\mathbb P(L(\mathbf X)-L(\mathbf Y)\le \mu_X-\mu_Y\le U(\mathbf X)-U(\mathbf Y))=1-\alpha$$ (there are no results in probability that justify this).

On the other hand, it seems that since $$\{L(\mathbf X)\le \mu_X\le U(\mathbf X)\}$$ and $$\{L(\mathbf Y)\le\mu_Y\le U(\mathbf Y)\}$$ are independent (since the normal distributions we are considering are independent), then we have $$\mathbb P(L(\mathbf X)\le \mu_X\le U(\mathbf X)\text{ and }L(\mathbf Y)\le \mu_Y\le U(\mathbf Y))=(1-\alpha)^2.$$ Then, when $$L(\mathbf X)\le \mu_X\le U(\mathbf X)$$ and $$L(\mathbf Y)\le \mu_Y\le U(\mathbf Y)$$, we have $$L(\mathbf X)-U(\mathbf Y)\le\mu_X-\mu_Y\le U(\mathbf X)-L(\mathbf Y),$$ so $$\mathbb P(L(\mathbf X)-U(\mathbf Y)\le\mu_X-\mu_Y\le U(\mathbf X)-L(\mathbf Y))=(1-\alpha)^2,$$ which means $$[L(\mathbf X)-U(\mathbf Y),U(\mathbf X)-L(\mathbf Y)]$$ is a $$(1-\alpha)^2$$ confidence interval.

However, this is actually also incorrect. The flaw is that "when $$L(\mathbf X)\le \mu_X\le U(\mathbf X)$$ and $$L(\mathbf Y)\le \mu_Y\le U(\mathbf Y)$$, we have $$L(\mathbf X)-U(\mathbf Y)\le\mu_X-\mu_Y\le U(\mathbf X)-L(\mathbf Y)$$" only means $$\{L(\mathbf X)\le \mu_X\le U(\mathbf X)\text{ and }L(\mathbf Y)\le\mu_Y\le U(\mathbf Y)\}\subseteq \{L(\mathbf X)-U(\mathbf Y)\le \mu_X-\mu_Y\le U(\mathbf X)-L(\mathbf Y)\}$$ (we do not have the reverse subset inclusion in general). This in turn means $$(1-\alpha)^2=\mathbb P(L(\mathbf X)\le \mu_X\le U(\mathbf X)\text{ and }L(\mathbf Y)\le\mu_Y\le U(\mathbf Y)){\color{darkgreen}\le}\mathbb P(L(\mathbf X)-U(\mathbf Y)\le \mu_X-\mu_Y\le U(\mathbf X)-L(\mathbf Y)).$$ So, $$[L(\mathbf X)-U(\mathbf Y),U(\mathbf X)-L(\mathbf Y)]$$ is actually not a $$(1-\alpha)^2$$ confidence interval in general: its coverage probability is at least $$(1-\alpha)^2$$, so the interval is conservative.

So, the above two "methods" to construct confidence intervals for the difference in means of two independent normal distributions actually do not work. Indeed, we do not use the confidence intervals for each of the two means, constructed previously, to construct a confidence interval for the difference in the two means. Instead, we consider a pivotal quantity of the difference in the two means, which is a standard way for constructing confidence intervals.

Now, we will prove the above theorem based on the result shown in the previous exercise:

Now, we will consider the case where the variances are unknown. In this case, the construction of the confidence interval for the difference in means is more complicated, and even more so when $$\sigma_X^2\ne \sigma_Y^2$$. Thus, we will only discuss the case where $$\sigma_X^2=\sigma_Y^2$$ is unknown. As you may expect, we will also use some of the results mentioned previously for constructing a confidence interval for $$\mu$$ when $$\sigma^2$$ is unknown.
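A sketch of the resulting pooled-variance interval (our code, under the usual conventions: unbiased sample variances and the pooled estimator $$S_p^2=\frac{(n-1)S_X^2+(m-1)S_Y^2}{n+m-2}$$, for which $$\frac{(\overline X-\overline Y)-(\mu_X-\mu_Y)}{S_p\sqrt{1/n+1/m}}\sim t_{n+m-2}$$):

```python
# Sketch: 1 - alpha confidence interval for mu_X - mu_Y when the two
# independent normal samples share an unknown common variance.
import numpy as np
from scipy.stats import t

def diff_mean_ci_pooled(x, y, alpha=0.05):
    n, m = len(x), len(y)
    # Pooled estimate of the common variance (unbiased sample variances).
    sp2 = ((n - 1) * np.var(x, ddof=1) + (m - 1) * np.var(y, ddof=1)) / (n + m - 2)
    tq = t.ppf(1 - alpha / 2, df=n + m - 2)  # upper percentile t_{alpha/2, n+m-2}
    half_width = tq * np.sqrt(sp2 * (1 / n + 1 / m))
    centre = np.mean(x) - np.mean(y)
    return centre - half_width, centre + half_width
```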

Now, what if the two normal distributions concerned are dependent? Clearly, we cannot use the above results anymore, and we need to develop a new method to construct a confidence interval for the difference of means in this case. Here, we need to consider the notion of paired samples.
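Assuming the paired-sample approach (the summary table at the end of this chapter uses $$\overline D$$ and "paired samples"), the idea is to form the within-pair differences $$D_i=X_i-Y_i$$, which have mean $$\mu_X-\mu_Y$$, reducing the two dependent samples to a single sample; a sketch:

```python
# Sketch: paired-sample confidence interval for mu_X - mu_Y with dependent
# (paired) observations. Form D_i = X_i - Y_i, whose mean is mu_X - mu_Y,
# then apply the one-sample t interval to the differences.
import numpy as np
from scipy.stats import t

def paired_diff_mean_ci(x, y, alpha=0.05):
    d = np.asarray(x) - np.asarray(y)        # within-pair differences
    n = len(d)
    dbar = np.mean(d)
    sd = np.std(d, ddof=1)                   # sample standard deviation of D
    tq = t.ppf(1 - alpha / 2, df=n - 1)
    half_width = tq * sd / np.sqrt(n)
    return dbar - half_width, dbar + half_width
```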

Variance of a normal distribution
After discussing the confidence intervals for means of normal distributions, let us consider the confidence intervals for variances of normal distributions. Similarly, we need to consider a pivotal quantity of $$\sigma^2$$. Can you suggest a pivotal quantity of $$\sigma^2$$, based on a previous result discussed?

Recall that we have $$\frac{nS^2}{\sigma^2}\sim\chi^2_{n-1}$$ under some suitable assumptions, and the distribution $$\chi^2_{n-1}$$ does not involve $$\sigma^2$$. Thus, this result gives us a pivotal quantity of $$\sigma^2$$, namely $$\frac{nS^2}{\sigma^2}$$. Before discussing the theorem for constructing a confidence interval for $$\sigma^2$$, let us introduce a notation (some values of $$\chi^2_{\alpha,\nu}$$ can be found in the chi-squared table):
 * $$\chi^2_{\alpha,\nu}$$ is the upper percentile of $$\chi^2_{\nu}$$ at level $$\alpha$$, i.e. it satisfies $$\mathbb P(X\ge \chi^2_{\alpha,\nu})=\alpha$$ where $$X\sim\chi^2_{\nu}$$.
 * To find the value of $$\chi^2_{\alpha,\nu}$$, locate the row for $$\nu$$ degrees of freedom and the column for "probability content" $$\alpha$$.
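Inverting the pivotal quantity gives the usual interval for $$\sigma^2$$; here is a sketch (ours), written under the unbiased-variance convention $$\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}$$ (with the text's divisor-$$n$$ convention, replace $$(n-1)S^2$$ by $$nS^2$$):

```python
# Sketch: 1 - alpha confidence interval for sigma^2 of a normal distribution,
# from the pivot (n - 1) S^2 / sigma^2 ~ chi^2_{n-1} (unbiased S^2).
import numpy as np
from scipy.stats import chi2

def variance_ci(x, alpha=0.05):
    n = len(x)
    s2 = np.var(x, ddof=1)
    upper_q = chi2.ppf(1 - alpha / 2, df=n - 1)  # chi^2_{alpha/2, n-1}
    lower_q = chi2.ppf(alpha / 2, df=n - 1)      # chi^2_{1-alpha/2, n-1}
    # Inverting lower_q <= (n - 1) s2 / sigma^2 <= upper_q for sigma^2:
    return (n - 1) * s2 / upper_q, (n - 1) * s2 / lower_q
```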

Ratio of variances of two independent normal distributions
Similar to the case for means, we would also sometimes like to compare the variances of two normal distributions. One may naturally expect that we should construct a confidence interval for the difference in variances, similar to the case for means. However, there are no simple ways to do this, since we do not have results that help with this construction. Therefore, we need to consider an alternative way to compare the variances, without using the difference in variances. Can you suggest a way?

Recall the definition of relative efficiency in point estimation. It gives us a nice way to compare two variances without considering their difference: the ratio of the two variances is considered instead. Fortunately, we have some results that help us to construct a confidence interval for the ratio of two variances.

Recall the definition of the $$F$$-distribution: if $$U\sim\chi^2_{{\color{red}\nu_1}}$$ and $$V\sim\chi^2_{{\color{blue}\nu_2}}$$ are independent, then $$\frac{U/{\color{red}\nu_1}}{V/{\color{blue}\nu_2}}$$ follows the $$F$$-distribution with $${\color{red}\nu_1}$$ and $${\color{blue}\nu_2}$$ degrees of freedom, denoted by $$F_{{\color{red}\nu_1},{\color{blue}\nu_2}}$$. From the definition of the $$F$$-distribution, we can see that it involves a ratio of two independent chi-squared random variables. How can it be linked to the ratio of two variances?

Recall that we have $$\frac{nS^2}{\sigma^2}\sim\chi^2_{n-1}$$ under some suitable assumptions. This connects the variance with a chi-squared random variable, and thus we can use this property together with the definition of the $$F$$-distribution to construct a pivotal quantity, and hence a confidence interval.

Let us introduce a notation before discussing the construction of the confidence interval. Some values of $$F_{\alpha,\nu_1,\nu_2}$$ can be found in $$F$$-tables (there are different $$F$$-tables for different values of $$\alpha$$, and the rows and columns of each table indicate the first and second degrees of freedom respectively). Also, using the property that $$F_{\alpha,\nu_1,\nu_2}=\frac{1}{F_{1-\alpha,\nu_2,\nu_1}}$$, we can obtain some more values of $$F_{\alpha,\nu_1,\nu_2}$$ that are not included in the $$F$$-tables.
 * $$F_{\alpha,\nu_1,\nu_2}$$ is the upper percentile of $$F_{\nu_1,\nu_2}$$ at level $$\alpha$$, i.e. it satisfies $$\mathbb P(X\ge F_{\alpha,\nu_1,\nu_2})=\alpha$$ where $$X\sim F_{\nu_1,\nu_2}$$.
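Putting the pieces together, with unbiased sample variances we get the pivot $$\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2}\sim F_{n-1,m-1}$$, which can be inverted for $$\sigma_X^2/\sigma_Y^2$$; a sketch (our code, conventions assumed as stated):

```python
# Sketch: 1 - alpha confidence interval for sigma_X^2 / sigma_Y^2 from two
# independent normal samples, via (S_X^2/sigma_X^2)/(S_Y^2/sigma_Y^2) ~ F_{n-1, m-1}.
import numpy as np
from scipy.stats import f

def variance_ratio_ci(x, y, alpha=0.05):
    n, m = len(x), len(y)
    ratio = np.var(x, ddof=1) / np.var(y, ddof=1)
    upper_q = f.ppf(1 - alpha / 2, n - 1, m - 1)  # F_{alpha/2, n-1, m-1}
    lower_q = f.ppf(alpha / 2, n - 1, m - 1)      # F_{1-alpha/2, n-1, m-1}
    # Inverting lower_q <= ratio * (sigma_Y^2 / sigma_X^2) <= upper_q:
    return ratio / upper_q, ratio / lower_q
```

Here `lower_q` is computed directly, but by the property above it also equals `1 / f.ppf(1 - alpha / 2, m - 1, n - 1)`, which is how one would read it off printed $$F$$-tables.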

Apart from using this confidence interval to compare variances (or standard deviations), it can also be useful for checking some assumptions about variances. Let us illustrate these two usages in the following examples.

Approximated confidence intervals for means
Previously, the distributions for the population were assumed to be normal, but the distributions are often not normal in reality. So, does it mean our previous discussions are meaningless in reality? No. The discussions are indeed still quite meaningful, since we can use the central limit theorem to "connect" the distributions in reality (which are usually not normal) to the normal distribution. Through this, we can construct approximated confidence intervals, since we use the central limit theorem for approximation.

To be more precise, recall that the central limit theorem suggests that $$\frac{\overline X-\mu}{\sigma/\sqrt n}\;\overset{d}\to\;Z\sim\mathcal N(0,1)$$ under some suitable assumptions. Therefore, if the sample size $$n$$ is large enough (a rule of thumb: at least 30), then $$\frac{\overline X-\mu}{\sigma/\sqrt n}$$ approximately follows the standard normal distribution. Hence, $$\frac{\overline X-\mu}{\sigma/\sqrt n}$$ is (approximately) a pivotal quantity. Recall from the property of the normal distribution that if $$X_1,\dotsc,X_n$$ is a random sample from $$\mathcal N(\mu,\sigma^2)$$, then we have $$\frac{\overline X-\mu}{\sigma/\sqrt n}\sim\mathcal N(0,1)$$ exactly (not approximately), and we have used this as the pivotal quantity for the confidence interval for the mean when the variance is known, and also for the confidence interval for $$\mu_X-\mu_Y$$ when $$\sigma_D^2$$ is known. Therefore, we can just use basically the same confidence intervals in these cases, but we need to notice that such confidence intervals are approximated, not exact, since we have used the central limit theorem for constructing the pivotal quantity.

Now, how about the other confidence intervals, where the pivotal quantity is "not in this form"? In the confidence interval for the difference in means when the variances are known, the pivotal quantity is similar in some sense: $$\frac{(\overline X-\overline Y)-(\mu_X-\mu_Y)}{\sqrt{\sigma_X^2/n+\sigma^2_Y/m}}\sim\mathcal N(0,1)$$ (see the corresponding theorem for the meaning of the notations involved). Can we use the central limit theorem to conclude that when the distributions involved are not normal (but are still independent), and the sample sizes $$n$$ and $$m$$ are both large enough, then $$\frac{(\overline X-\overline Y)-(\mu_X-\mu_Y)}{\sqrt{\sigma_X^2/n+\sigma^2_Y/m}}$$ approximately follows $$\mathcal N(0,1)$$? The answer is yes. For the proof, see the following exercise.

As a result, we know that we can again just use basically the same confidence interval in this case, but of course such a confidence interval is approximated.

There are still some confidence intervals that we have not considered yet. Let us first consider the confidence interval for the mean when the variance is unknown.

Recall that we have mentioned that we can simply replace the "$$\sigma$$" by "$$S$$" according to the weak law of large numbers, which is quite intuitive. But why exactly can we do this? Consider the following theorem.

So far, we have not discussed how to construct an approximated confidence interval for $$\mu_X-\mu_Y$$ when $$\sigma_X^2=\sigma_Y^2=\sigma^2$$ is unknown, nor approximated confidence intervals for variances. Since the pivotal quantities used are constructed from results that are exclusive to normal distributions, they do not work when the distributions involved are not normal. Therefore, there are no simple ways to perform such constructions.

The following table summarizes the approximated $$1-\alpha$$ confidence intervals in different cases: $$ \begin{array}{c|c|c} \big(\text{approximated }(1-\alpha)\text{ confidence intervals}\big)&\text{mean}&\text{difference in means}\\ \hline \text{known variance}&\left[\overline X-z_{\alpha/2}\frac{\sigma}{\sqrt n},\overline X+z_{\alpha/2}\frac{\sigma}{\sqrt n}\right]&\left[(\overline X-\overline Y)-z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}},(\overline X-\overline Y)+z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}}\right]\text{ OR }\left[\overline D-z_{\alpha/2}\frac{\sigma_D}{\sqrt n},\overline D+z_{\alpha/2}\frac{\sigma_D}{\sqrt n}\right](\text{paired samples})\\ \hline \text{unknown variance}&\left[\overline X-z_{\alpha/2}\frac{S}{\sqrt n},\overline X+z_{\alpha/2}\frac{S}{\sqrt n}\right]&\left[\overline D-z_{\alpha/2}\frac{S_D}{\sqrt n},\overline D+z_{\alpha/2}\frac{S_D}{\sqrt n}\right](\text{paired samples})\\ \end{array} $$
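As a quick illustration of the unknown-variance row (a simulation we add here, with hypothetical numbers), the interval $$\overline X\pm z_{\alpha/2}S/\sqrt n$$ should cover $$\mu$$ roughly a fraction $$1-\alpha$$ of the time even for a non-normal population, once $$n$$ is reasonably large:

```python
# Simulation: empirical coverage of the approximated interval
# Xbar +- z_{alpha/2} * S / sqrt(n) for a non-normal (exponential) population.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, n, alpha, reps = 1.0, 50, 0.05, 10_000   # exponential population with mean 1
z = norm.ppf(1 - alpha / 2)
hits = 0
for _ in range(reps):
    x = rng.exponential(scale=mu, size=n)
    half_width = z * np.std(x, ddof=1) / np.sqrt(n)
    hits += (np.mean(x) - half_width <= mu <= np.mean(x) + half_width)
print(hits / reps)   # close to, but generally not exactly, 1 - alpha = 0.95
```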

Let us consider an application of the approximated confidence intervals.