Statistics/Preliminaries

This chapter discusses some preliminary knowledge (related to statistics) for the following chapters in the advanced part.

Empirical distribution
Since all these $$n$$ random variables follow the same cdf as $$X$$, we may expect that their distribution is somewhat similar to the distribution of $$X$$, and indeed this is true. Before showing how this is true, we need to define "the distribution of these $$n$$ random variables" more precisely, as follows:

We have mentioned how we can approximate the cdf, and now we would like to estimate the pmf/pdf. Let us first discuss how to estimate the pmf.

For the discrete random variable $$X$$, from the empirical cdf, we know that each of $$X_1,\dotsc,X_n$$ is "assigned" the probability $$1/n$$. Also, considering the previous example, the empirical pmf is $$f_n(x)=\frac{\sum_{k=1}^{n}\mathbf 1\{X_k=x\}}{n}$$.
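As a quick illustration, the empirical pmf $$f_n$$ can be computed directly by counting occurrences in the sample. This is a minimal sketch in Python; the sample values are made up for the example:

```python
from collections import Counter

def empirical_pmf(sample):
    """Empirical pmf: f_n(x) = (number of X_k equal to x) / n."""
    n = len(sample)
    return {x: count / n for x, count in Counter(sample).items()}

# An illustrative sample from a discrete distribution (e.g. die rolls)
sample = [1, 2, 2, 3, 3, 3, 6, 6]
f_n = empirical_pmf(sample)
print(f_n[3])  # 3 occurs 3 times out of 8, so f_n(3) = 0.375
```

Notice that the probabilities $$f_n(x)$$ sum to 1 over the observed values, as a pmf should.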

To discuss the estimation of the pdf of a continuous random variable, we need to define class intervals first.

For the continuous random variable $$X$$, construct class intervals for $$X$$ which form a non-overlapping partition of the interval $$[X_{\text{min}},X_{\text{max}}]$$, in which $$X_{\text{min}}$$ and $$X_{\text{max}}$$ are the minimum and maximum values in the sample. Then, the pdf $$f(x)\approx\frac{F(c_j)-F(c_{j-1})}{c_j-c_{j-1}},\quad x\in(c_{j-1},c_j]\text{ and }j=1,2,\dotsc,i,$$ when $$c_{j-1}$$ and $$c_j$$ are close, i.e. the length of each class interval is small. (Although the union of the above class intervals is $$(c_0,c_i]$$ and thus the value $$c_0$$ is not included in the interval, it does not matter since the value of the pdf at $$c_0$$ does not affect the calculation of probability.) Here, $$c_0$$ is $$X_\text{min}$$ and $$c_i$$ is $$X_\text{max}$$.

Since $$F(c_j)-F(c_{j-1})=\mathbb P(X\in(c_{j-1},c_j])\approx{\color{darkgreen}\frac{\sum_{k=1}^{n}\mathbf 1\{X_k\in(c_{j-1},c_j]\}}{n}}$$ is the relative frequency of occurrences of the event $$\{X_k\in (c_{j-1},c_j]\}$$, we can rewrite the above expression as $$f(x)\approx h_n(x)=\frac{{\color{darkgreen}\sum_{k=1}^{n}\mathbf 1\{X_k\in(c_{j-1},c_j]\}}}{{\color{darkgreen}n}(c_j-c_{j-1})},\quad x\in(c_{j-1},c_j]\text{ and }j=1,2,\dotsc,i$$ in which $$h_n(x)$$ is called the relative frequency histogram.

Since there are many possible ways to construct the class intervals, the value of $$h_n(x)$$ can differ even with the same $$n$$ and $$x$$. When $$n$$ is large and the length of each class interval is small, we will expect $$h_n(x)$$ to be a good estimate of $$f(x)$$ (the theoretical pdf).
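The construction of $$h_n(x)$$ above can be sketched in Python. This is a minimal illustration (the function name and the equal-width choice of class intervals are assumptions for the example, not the only possible construction):

```python
def rel_freq_histogram(sample, num_bins):
    """Return class-interval edges c_0,...,c_i and heights h_n on each (c_{j-1}, c_j]."""
    lo, hi = min(sample), max(sample)          # X_min and X_max
    width = (hi - lo) / num_bins               # equal-width class intervals
    edges = [lo + j * width for j in range(num_bins + 1)]
    n = len(sample)
    heights = []
    for j in range(1, num_bins + 1):
        # relative frequency of {X_k in (c_{j-1}, c_j]}, divided by the interval length;
        # X_min itself is counted in the first interval
        count = sum(1 for x in sample
                    if edges[j - 1] < x <= edges[j] or (j == 1 and x == lo))
        heights.append(count / (n * width))
    return edges, heights

sample = [0.0, 0.25, 0.5, 0.75, 1.0]
edges, heights = rel_freq_histogram(sample, num_bins=2)
```

As with a pdf, the total area under the histogram (height times interval length, summed over the class intervals) is 1.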

There are some properties related to the relative frequency histogram.

Expectation
In this section, we will discuss some results about expectation, which involve some sort of inequalities. Let $$a$$ and $$b$$ be constants. Also, let $$\Omega$$ be the sample space of $$X$$.

Convergence
Before discussing convergence, we will define some terms that will be used later.

In a realization of a random sample, say $$x_1,\dotsc,x_n$$, we observe values of the sample mean, $$\overline x=\frac{\sum_{i=1}^{n}x_i}{n}$$, and the sample variance, $$s^2=\frac{\sum_{i=1}^{n}(x_i-\overline x)^2}{n}$$. Each of these values is only a realization of the respective random variable $$\overline X$$ or $$S^2$$. We should notice the difference between these definite values (not random variables) and the statistics (random variables).

To explain the definitions of the sample mean $$\overline X$$ and sample variance $$S^2$$ more intuitively, consider the following.

Recall that the empirical cdf $$F_n(x)$$ assigns probability $$\frac{1}{n}$$ to each of the random sample $$X_1,\dotsc,X_n$$. Thus, by the definition of mean and variance, the mean of a random variable, say $$Y$$, with this cdf $$F_n(x)$$ (and hence with the corresponding pmf $$f_n(x)$$) is $$\sum_{i=1}^{n}\left(X_i\cdot\frac{1}{n}\right)=\overline X$$. Similarly, the variance of $$Y$$ is $$\sum_{i=1}^{n}\left((X_i-\overline X)^2\cdot\frac{1}{n}\right)=S^2$$. In other words, the mean and variance of the empirical distribution, which corresponds to the random sample, are the sample mean $$\overline X$$ and the sample variance $$S^2$$ respectively, which is quite natural, right?

Also, recall that the empirical cdf $$F_n(x)$$ can well approximate the cdf of $$X$$, $$F(x)$$, when $$n$$ is large. Since $$\overline X$$ and $$S^2$$ are the mean and variance of a random variable with cdf $$F_n(x)$$, it is natural to expect that $$\overline X$$ and $$S^2$$ can well approximate the mean and variance of $$X$$.
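We can check numerically that the mean and variance of the empirical distribution coincide with the sample mean and sample variance (with divisor $$n$$, as defined above). A minimal sketch, with a made-up sample:

```python
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(sample)

# Mean and variance of a random variable Y with pmf f_n (mass 1/n at each X_i)
mean_Y = sum(x * (1 / n) for x in sample)
var_Y = sum((x - mean_Y) ** 2 * (1 / n) for x in sample)

# Sample mean and sample variance (with divisor n, matching the definition above)
xbar = sum(sample) / n
s2 = sum((x - xbar) ** 2 for x in sample) / n

print(mean_Y, xbar)  # both 5.0
print(var_Y, s2)     # both 4.0
```

Note that the sample variance here uses divisor $$n$$; this matches the definition in this chapter (some texts divide by $$n-1$$ instead).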

Convergence in probability
The following theorem, namely the weak law of large numbers, is an important theorem which is related to convergence in probability.

There are also some properties of convergence in probability that help us determine what a complex expression converges to.

Convergence in distribution
A very important theorem in statistics which is related to convergence in distribution is the central limit theorem.

There are some properties of convergence in distribution, but they are a bit different from the properties of convergence in probability. These properties are given by Slutsky's theorem, and also the continuous mapping theorem.

Resampling
By resampling, we mean creating new samples based on an existing sample. Now, let us consider the following for a general overview of the procedure of resampling.

Suppose $$X_1,\dotsc,X_n$$ is a random sample from the distribution of a random variable $$X$$ with cdf $$F(x)$$. Let $$x_1,\dotsc,x_n$$ be a corresponding realization of the random sample $$X_1,\dotsc,X_n$$. Based on this realization, we also have a realization of the empirical cdf: $$\frac{1}{n}\sum_{k=1}^{n}\mathbf 1\{x_k\le x\}$$. Since this is a realization of the empirical cdf, by the Glivenko–Cantelli theorem, it is a good estimate of the cdf $$F(x)$$ when $$n$$ is large. In other words, if we denote by $$X^*$$ the random variable whose cdf is this realization of the empirical cdf, then $$X^*$$ and $$X$$ have similar distributions when $$n$$ is large.
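The realization of the empirical cdf is an ordinary function of $$x$$, and can be sketched in Python as follows (the function name and sample values are illustrative):

```python
def empirical_cdf(realization):
    """Return F_n(x) = (1/n) * #{k : x_k <= x} for a fixed realization x_1,...,x_n."""
    n = len(realization)
    data = sorted(realization)

    def F_n(x):
        return sum(1 for v in data if v <= x) / n

    return F_n

F_n = empirical_cdf([3.1, 1.2, 2.7, 0.5])
print(F_n(2.7))  # 3 of the 4 values are <= 2.7, so 0.75
```

As expected of a cdf, $$F_n$$ is 0 below the smallest observation and 1 at and above the largest.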

Notice that a realization of the empirical cdf is itself a cdf (since the support $$x_1,\dotsc,x_n$$ is countable). We now draw a random sample (called the bootstrap (or resampling) random sample) with sample size $$B$$ (called the bootstrap sample size) $$X_1^*,\dotsc,X_B^*$$ from the distribution of the random variable $$X^*$$ (since $$X^*$$ comes from sampling from $$X$$, the behaviour of sampling from $$X^*$$ is called resampling).

Then, the relative frequency histogram of $$X_1^*,\dotsc,X_B^*$$ should be close to the corresponding realization of the empirical pmf of $$X^*$$ (found from the realization of the empirical cdf of $$X^*$$), which is close to the pdf $$f(x)$$ of $$X$$. This means the relative frequency histogram of $$X_1^*,\dotsc,X_B^*$$ is close to the pdf $$f(x)$$ of $$X$$.

In particular, since the cdf of $$X^*$$, $$F_n(x)$$, assigns probability $$1/n$$ to each of $$x_1,\dotsc,x_n$$, the pmf of $$X^*$$ is $$\mathbb P(X^*=x_i)=\frac{1}{n},\quad i=1,2,\dotsc,n$$. Notice that this pmf is quite simple, which makes related calculations simpler. For example, in the following, we want to know the distribution of $$T^*=g(X_1^*,\dotsc,X_n^*)$$, and this simple pmf makes the resulting distribution also quite simple.
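Because the pmf of $$X^*$$ puts equal mass $$1/n$$ on the observed values $$x_1,\dotsc,x_n$$, drawing a bootstrap random sample from $$X^*$$ amounts to sampling from the realization with replacement. A minimal sketch (the observed values and the bootstrap sample size are made up):

```python
import random

random.seed(0)  # fixed seed so the example is reproducible
x = [0.5, 1.2, 2.7, 3.1]  # observed realization x_1, ..., x_n

# P(X* = x_i) = 1/n, so drawing from X* is uniform sampling with replacement:
boot = random.choices(x, k=6)  # a bootstrap sample of size B = 6
print(all(v in x for v in boot))  # True: every draw is one of the observed x_i
```

Note that, unlike the original sampling from $$X$$, values can repeat in the bootstrap sample, and no value outside $$\{x_1,\dotsc,x_n\}$$ can ever appear.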

In the following, we will discuss an application of the bootstrap method (or resampling) mentioned above, namely using the bootstrap method to approximate the distribution of a statistic $$T=g(X_1,X_2,\dotsc,X_n)$$ (the inputs of the function are random variables and $$g$$ is a function). The reason for approximating, rather than finding the distribution exactly, is that the latter is usually infeasible (or may be too complicated).

To do this, consider the "bootstrapped statistic" $$T^*=g(X_1^*,X_2^*,\dotsc,X_n^*)$$ and the statistic $$T=g(X_1,X_2,\dotsc,X_n)$$, where $$X_1^*,X_2^*,\dotsc,X_n^*$$ is the bootstrap random sample (with bootstrap sample size $$n$$) from the distribution of $$X^*$$ and $$X_1,X_2,\dotsc,X_n$$ is the random sample from the distribution of $$X$$. When $$n$$ is large, since the distribution of $$X^*$$ is similar to that of $$X$$, the bootstrap random sample $$X_1^*,X_2^*,\dotsc,X_n^*$$ and the random sample $$X_1,X_2,\dotsc,X_n$$ are also similar. It follows that $$T^*$$ and $$T$$ are similar as well, or to be more precise, the distributions of $$T^*$$ and $$T$$ are close. As a result, we can utilize the distribution of $$T^*$$ (which is easier to find and simpler, since the pmf of $$X^*$$ is simple as above) to approximate the distribution of $$T$$. A procedure to do this is as follows:
 * 1) Generate a realization $$x_1^*,x_2^*,\dotsc,x_n^*$$ of the bootstrap random sample $$X_1^*,X_2^*,\dotsc,X_n^*$$, which is from the distribution of $$X^*$$.
 * 2) Calculate a realization of the bootstrapped statistic $$T^*$$: $$t^*=g(x_1^*,x_2^*,\dotsc,x_n^*)$$.
 * 3) Repeat steps 1) and 2) $$j$$ times to get a sequence of $$j$$ realizations of $$T^*$$: $$t_1^*,t_2^*,\dotsc,t_j^*$$.
 * 4) Plot the relative frequency histogram of the $$j$$ realizations $$t^*_1,t^*_2,\dotsc,t^*_j$$. This histogram (of what is a realization of a random sample from $$T^*$$ with sample size $$j$$) is close to the pmf of $$T^*$$, and thus close to the pmf of $$T$$.
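The steps above can be sketched in Python. This is a minimal illustration, with made-up data and the sample mean as the statistic $$g$$; the function name and parameters are assumptions for the example:

```python
import random

def bootstrap_distribution(realization, statistic, j, seed=0):
    """Steps 1)-3): return j realizations t*_1, ..., t*_j of the bootstrapped statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(realization)
    t_stars = []
    for _ in range(j):
        resample = rng.choices(realization, k=n)  # step 1): sample with replacement
        t_stars.append(statistic(resample))       # step 2): one realization t*
    return t_stars                                # step 3); plot a histogram for step 4)

# Made-up realization x_1, ..., x_n, with g = sample mean
x = [2.1, 3.4, 1.8, 5.0, 4.2, 2.9, 3.7, 4.4]
t_stars = bootstrap_distribution(x, statistic=lambda s: sum(s) / len(s), j=1000)
# The histogram of t_stars approximates the distribution of T = X-bar
```

Step 4) would then plot the relative frequency histogram of `t_stars`, e.g. with a plotting library; the histogram is omitted here to keep the sketch self-contained.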