Probability/Important Distributions

Motivation
Consider $${\color{blue}n}$$ independent Bernoulli trials with the same success probability $${\color{darkgreen}p}$$. We would like to calculate the probability $$\mathbb P(\{{\color{darkgreen}r}\text{ successes in }{\color{blue}n}\text{ trials}\})$$.

Let $$S_i$$ be the event $$\{i\text{th Bernoulli trial is a success}\},\quad i=1,2,\dotsc$$, as in the previous section. Let's consider a particular sequence of outcomes such that there are $${\color{darkgreen}r}$$ successes in $${\color{blue}n}$$ trials: $${\color{darkgreen}\underbrace{S\cdots S}_{r\text{ successes}}}{\color{red}\overbrace{F\cdots F}^{{\color{blue}n}-{\color{darkgreen}r}\text{ failures}}}$$ Its probability is $$\mathbb P({\color{darkgreen}S_1\cap\dotsb\cap S_r}\cap {\color{red}S^c_{r+1}\cap\dotsb\cap S^c_{\color{blue}n}})\overset{\text{ indpt. }}{=} {\color{darkgreen}\mathbb P(S_1)\dotsb\mathbb P(S_r)}{\color{red}\mathbb P(S_{r+1}^c)\dotsb\mathbb P(S_{\color{blue}n}^c)} ={\color{darkgreen}p^r}{\color{red}(1-{\color{darkgreen}p})^{{\color{blue}n}-{\color{darkgreen}r}}}$$ Since the probability of any other sequence, with the $${\color{darkgreen}r}$$ successes occurring in other trials, is the same, and there are $$\binom{\color{blue}n}{\color{darkgreen}r}$$ distinct possible sequences, $$\mathbb P(\{{\color{darkgreen}r}\text{ successes in }{\color{blue}n}\text{ trials}\})=\binom{\color{blue}n}{\color{darkgreen}r}{\color{darkgreen}p}^{\color{darkgreen}r}{\color{red}(1-{\color{darkgreen}p})^{{\color{blue}n}-{\color{darkgreen}r}}}.$$ This is the pmf of a random variable following the binomial distribution.
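The pmf derived above is easy to check numerically. Below is a minimal sketch in Python (the helper name `binom_pmf` is an illustrative choice, not from the text), using only the standard library:

```python
from math import comb

def binom_pmf(r: int, n: int, p: float) -> float:
    """P(r successes in n independent Bernoulli(p) trials)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# By the binomial theorem, the pmf sums to 1 over r = 0, ..., n:
print(sum(binom_pmf(r, 10, 0.3) for r in range(11)))  # ≈ 1.0
```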

Bernoulli distribution
The Bernoulli distribution is simply a special case of the binomial distribution, namely $$\operatorname{Binom}(1,{\color{darkgreen}p})$$.

Motivation
The Poisson distribution can be viewed as the 'limit case' for the binomial distribution.

Consider $${\color{blue}n}$$ independent Bernoulli trials with success probability $${\color{darkgreen}p}=\lambda/{\color{blue}n}$$. By the binomial distribution, $$\mathbb P({\color{darkgreen}r}\text{ successes in }{\color{blue}n}\text{ trials})=\binom{\color{blue}n}{\color{darkgreen}r}{\color{darkgreen}(\lambda/{\color{blue}n})^r}{\color{red}(1-\lambda/{\color{blue}n})^{{\color{blue}n}-{\color{darkgreen}r}}}.$$

After that, consider a unit time interval in which a rare event occurs with (positive) mean rate $$\lambda$$ (i.e. the mean of the number of occurrences of the rare event is $$\lambda$$). We can divide the unit time interval into $${\color{blue}n}$$ time subintervals of time length $$1/{\color{blue}n}$$ each. If $${\color{blue}n}$$ is large and $${\color{darkgreen}p}$$ is small, such that the probability of two or more occurrences in a single time subinterval is negligible, then the probability of one occurrence in each time subinterval is $${\color{darkgreen}p}=\lambda/{\color{blue}n}$$ by definition of mean. Then, we can view the unit time interval as a sequence of $${\color{blue}n}$$ Bernoulli trials with success probability $${\color{darkgreen}p}=\lambda/{\color{blue}n}$$. After that, we can use $$\operatorname{Binom}{({\color{blue}n},\lambda/{\color{blue}n})}$$ to model the number of occurrences of the rare event. To be more precise, $$ \begin{align} \mathbb P(\underbrace{{\color{darkgreen}r}\text{ successes in }{\color{blue}n}\text{ trials}}_{{\color{darkgreen}r}\text{ rare events in the unit time}}) &=\binom{\color{blue}n}{\color{darkgreen}r}{\color{darkgreen}(\lambda/{\color{blue}n})^r}{\color{red}(1-\lambda/{\color{blue}n})^{{\color{blue}n}-{\color{darkgreen}r}}}\\ &=\frac{{\color{blue}n}({\color{blue}n}-1)\dotsb({\color{blue}n}-{\color{darkgreen}r}+1)}{{\color{darkgreen}r}!}(\lambda^{\color{darkgreen}r}/{\color{blue}n}^{\color{darkgreen}r})(1-\lambda/{\color{blue}n})^{{\color{blue}n}-{\color{darkgreen}r}}\\ &=(\lambda^{\color{darkgreen}r}/{\color{darkgreen}r}!)\overbrace{(1-\underbrace{1/{\color{blue}n}}_{\to 0\text{ as }n\to\infty})\dotsb\big(1-\underbrace{({\color{darkgreen}r-1})/{\color{blue}n}}_{\to 0\text{ as }n\to\infty}\big)}^{\to 1\text{ as }n\to\infty}\underbrace{(1-\lambda/{\color{blue}n})^{\overbrace{{\color{blue}n}-{\color{darkgreen}r}}^{\to n\text{ as }n\to\infty}}}_{\to e^{-\lambda}\text{ as }n\to\infty}\\ &\to e^{-\lambda}\lambda^{\color{darkgreen}r}/{\color{darkgreen}r}!\text{ as }n\to\infty. 
\end{align} $$ This is the pmf of a random variable following the Poisson distribution, and this result is known as the Poisson limit theorem (or law of rare events). We will introduce it formally later.
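The convergence above can be observed numerically. A minimal sketch (helper names `binom_pmf` and `poisson_pmf` are illustrative assumptions) comparing the $$\operatorname{Binom}({\color{blue}n},\lambda/{\color{blue}n})$$ pmf with its Poisson limit as $$n$$ grows:

```python
from math import comb, exp, factorial

def binom_pmf(r, n, p):
    """P(r successes in n independent Bernoulli(p) trials)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r, lam):
    """Limiting pmf: e^(-lambda) * lambda^r / r!."""
    return exp(-lam) * lam**r / factorial(r)

lam, r = 2.0, 3
for n in (10, 100, 10_000):
    print(n, binom_pmf(r, n, lam / n))  # approaches the Poisson value
print(poisson_pmf(r, lam))              # ≈ 0.180447
```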

Motivation
Consider a sequence of independent Bernoulli trials with success probability $${\color{darkgreen}p}$$. We would like to calculate the probability $$\mathbb P(\{{\color{red}x}\text{ failures before first success}\})$$. By considering this sequence of outcomes: $${\color{red}\underbrace{F\cdots F}_{{\color{red}x}\text{ failures}}}{\color{darkgreen}S},$$ we can calculate that $$\mathbb P(\{{\color{red}x}\text{ failures before first success}\})={\color{red}(1-{\color{darkgreen}p})^x}{\color{darkgreen}p},\quad{\color{red}x}\in\operatorname{supp}(X)=\{0,1,2,\dotsc\}.$$ This is the pmf of a random variable following the geometric distribution.
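The geometric pmf can be checked against a direct simulation of Bernoulli trials. A hedged sketch (helper names are hypothetical, not from the text):

```python
import random

def geom_pmf(x, p):
    """P(x failures before the first success)."""
    return (1 - p)**x * p

def failures_before_success(p):
    """Run Bernoulli(p) trials until the first success; count failures."""
    x = 0
    while random.random() >= p:  # random() < p counts as a success
        x += 1
    return x

random.seed(0)
p, trials = 0.4, 100_000
est = sum(failures_before_success(p) == 2 for _ in range(trials)) / trials
print(est, geom_pmf(2, p))  # both ≈ 0.144
```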

Motivation
Consider a sequence of independent Bernoulli trials with success probability $${\color{darkgreen}p}$$. We would like to calculate the probability $$\mathbb P(\{{\color{red}x}\text{ failures before }{\color{darkgreen}k}\text{th success}\})$$. By considering this sequence of outcomes: $$ \overbrace{ {\color{red}\underbrace{F\cdots F}_{x_1\text{ failures}}}{\color{darkgreen}S} {\color{red}\underbrace{F\cdots F}_{x_2\text{ failures}}}{\color{darkgreen}S} \cdots {\color{red}\underbrace{F\cdots F}_{x_k\text{ failures}}} }^{{\color{red}x}+{\color{darkgreen}k}-1\text{ trials}} {\color{darkgreen}\overbrace{S}^{k\text{th success}}}, \quad {\color{red}x_1}+{\color{red}x_2}+\dotsb+{\color{red}x_k}={\color{red}x}, $$ we can calculate that the probability of each such sequence is $${\color{red}(1-{\color{darkgreen}p})^x}{\color{darkgreen}p^k}.$$ Since the probability of any other sequence, with the $${\color{red}x}$$ failures occurring in other trials (and the $${\color{darkgreen}k}-1$$ successes, excluding the $${\color{darkgreen}k}$$th success, which must occur in the last trial, occurring in other trials), is the same, and there are $$\binom{{\color{red}x}+{\color{darkgreen}k}-1}{\color{red}x}$$ (or $$\binom{{\color{red}x}+{\color{darkgreen}k}-1}{{\color{darkgreen}k}-1}$$, which is the same numerically) distinct possible sequences, $$\mathbb P(\{{\color{red}x}\text{ failures before }{\color{darkgreen}k}\text{th success}\}) =\binom{{\color{red}x}+{\color{darkgreen}k}-1}{\color{red}x}{\color{red}(1-{\color{darkgreen}p})^x}{\color{darkgreen}p^k}, \quad{\color{red}x}\in\operatorname{supp}(X)=\{0,1,2,\dotsc\}.$$ This is the pmf of a random variable following the negative binomial distribution.
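As a quick sanity check on the formula, here is a minimal sketch (the helper name `nbinom_pmf` is an assumption for illustration): with $$k=1$$ it reduces to the geometric pmf, and it sums to 1 over its support.

```python
from math import comb

def nbinom_pmf(x, k, p):
    """P(x failures before the k-th success)."""
    return comb(x + k - 1, x) * (1 - p)**x * p**k

# k = 1 recovers the geometric pmf (1-p)^x p:
print(nbinom_pmf(5, 1, 0.3))                           # = 0.7**5 * 0.3
# The pmf sums to 1 over x = 0, 1, 2, ... (truncated tail is negligible):
print(sum(nbinom_pmf(x, 3, 0.5) for x in range(200)))  # ≈ 1.0
```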

Motivation
Suppose a sample of size $$n$$ is drawn without replacement from a population of size $$N$$, containing $$K$$ objects of type 1 and $$N-K$$ objects of another type. Then, the probability $$\mathbb P(\{k\text{ type 1 objects are found when }n\text{ objects are drawn from }N\text{ objects}\})=\underbrace{\binom{K}{k}}_{\text{type 1}}\overbrace{\binom{N-K}{n-k}}^{\text{another type}}\bigg/\underbrace{\binom{N}{n}}_{\text{all outcomes}},\quad k\in\big\{\max\{n-N+K,0\},\dotsc,\min{\{K,n\}}\big\}.$$ This is the pmf of a random variable following the hypergeometric distribution.
 * $$\binom{K}{k}$$: unordered selection of $$k$$ objects of type 1 from $$K$$ (distinguishable) objects of type 1 without replacement;
 * $$\binom{N-K}{n-k}$$: unordered selection of $$n-k$$ objects of another type from $$N-K$$ (distinguishable) objects of another type without replacement;
 * $$\binom{N}{n}$$: unordered selection of $$n$$ objects from $$N$$ (distinguishable) objects without replacement.
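The counting argument above translates directly into code. A minimal sketch (the helper name `hypergeom_pmf` is an illustrative assumption), verifying that the pmf sums to 1 over the stated support:

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(k type-1 objects in a sample of n drawn without replacement)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Support runs from max(n - N + K, 0) to min(K, n); the pmf sums to 1.
N, K, n = 20, 7, 12
lo, hi = max(n - N + K, 0), min(K, n)
print(sum(hypergeom_pmf(k, N, K, n) for k in range(lo, hi + 1)))  # ≈ 1.0
```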

Finite discrete distribution
This type of distribution is a generalization of all discrete distributions with finite support, e.g. the Bernoulli distribution and the hypergeometric distribution.

Another special case of this type of distribution is the discrete uniform distribution, which is similar to the continuous uniform distribution (to be discussed later).

Uniform distribution (continuous)
The continuous uniform distribution is a model for 'no preference', i.e. all intervals of the same length on its support are equally likely (this can be seen from the pdf of the continuous uniform distribution). There is also a discrete uniform distribution, but it is less important than the continuous uniform distribution. So, from now on, 'uniform distribution' simply refers to the continuous one, instead of the discrete one.

Exponential distribution
The exponential distribution with rate parameter $$\lambda$$ is often used to describe the interarrival time of rare events occurring with rate $$\lambda$$.

Comparing this with the Poisson distribution: the exponential distribution describes the interarrival time of rare events, while the Poisson distribution describes the number of occurrences of rare events within a fixed time interval.
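This relationship can be illustrated with a small simulation (a sketch with hypothetical names, not part of the formal development): generate events with exponential interarrival times and count how many fall in a unit interval; the average count should be close to $$\lambda$$, as the Poisson distribution predicts.

```python
import random

random.seed(1)
lam, runs = 3.0, 50_000

def events_in_unit_interval():
    """Count events with Exp(lam) interarrival times landing in [0, 1]."""
    t, count = 0.0, 0
    while True:
        t += random.expovariate(lam)  # exponential interarrival time
        if t > 1.0:
            return count
        count += 1

mean_count = sum(events_in_unit_interval() for _ in range(runs)) / runs
print(mean_count)  # ≈ lam = 3.0
```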

By definition of the rate, when $$\lambda$$ $$\uparrow$$, the mean interarrival time $$\downarrow$$ (i.e. the frequency of the rare event $$\uparrow$$).

So, we would like the pdf to be more concentrated at small values of $$x$$ when $$\lambda\uparrow$$ (i.e. the pdf has higher value for small $$x$$ when $$\lambda\uparrow$$), so that areas under the pdf for intervals involving small values of $$x$$ $$\uparrow$$ when $$\lambda\uparrow$$.

Also, with a fixed rate $$\lambda$$, higher values of the interarrival time should be less likely. So, intuitively, we would also like the pdf to be a strictly decreasing function, so that the probability involved (area under the pdf for some interval) $$\downarrow$$ when $$x\uparrow$$.

As we can see, the pdf of the exponential distribution satisfies both of these properties.

Gamma distribution
The Gamma distribution is a generalized exponential distribution, in the sense that we can also change the shape of the pdf of the exponential distribution via an extra shape parameter.

Beta distribution
The Beta distribution is a generalized $$\mathcal U[0,1]$$ distribution, in the sense that we can also change the shape of the pdf, using its two shape parameters.

Cauchy distribution
The Cauchy distribution is a heavy-tailed distribution. As a result, it is a 'pathological' distribution, in the sense that it has some counter-intuitive properties, e.g. an undefined mean and variance, even though its mean and variance appear to be well-defined when we look at its graph directly.

Normal distribution (very important)
The normal or Gaussian distribution is a thing of beauty, appearing in many places in nature. This is probably because sample means or sample sums often approximately follow normal distributions, by the central limit theorem. As a result, the normal distribution is important in statistics.
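The central limit theorem effect can be illustrated by a small simulation (names and parameters are hypothetical choices for illustration): even for a very non-normal population, sample means cluster around the population mean with roughly normal spread.

```python
import random
from statistics import mean, stdev

random.seed(2)

def sample_mean(n):
    """Mean of n draws from a two-point population {0, 10} (mean 5, sd 5)."""
    return mean(random.choice((0, 10)) for _ in range(n))

# For samples of size 100, the CLT predicts sample means roughly
# ~ Normal(5, 5 / sqrt(100)) = Normal(5, 0.5).
means = [sample_mean(100) for _ in range(5_000)]
print(round(mean(means), 2), round(stdev(means), 2))  # ≈ 5.0 and ≈ 0.5
```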

Important distributions for statistics especially
The following distributions are important in statistics especially, and they are all related to normal distribution. We will introduce them briefly.

Chi-squared distribution
The chi-squared distribution is a special case of the Gamma distribution, and is also related to the normal distribution (it is the distribution of a sum of squared independent standard normal random variables).

Student's t-distribution
The Student's $$t$$-distribution is related to the normal distribution and the chi-squared distribution.

F-distribution
The $$F$$-distribution is sort of a generalized Student's $$t$$-distribution, in the sense that it has one more adjustable parameter, for a second degrees of freedom.

If you are interested in knowing how the chi-squared, Student's $$t$$-, and $$F$$-distributions are useful in statistics, then you may briefly look at, for instance, Statistics/Interval Estimation (applications in confidence interval construction) and Statistics/Hypothesis Testing (applications in hypothesis testing).

Motivation
The multinomial distribution is a generalized binomial distribution, in the sense that each trial has more than two possible outcomes.

Suppose $$n$$ objects are to be allocated to $$k$$ cells independently, where each object is allocated to exactly one cell, with probability $$p_i$$ of being allocated to the $$i$$th cell ($$i=1,2,\dotsc,k$$). Let $$X_i$$ be the number of objects allocated to cell $$i$$. We would like to calculate the probability $$\mathbb P\big(\mathbf X\overset{\text{ def }}=(X_1,\dotsc,X_k)^T=\mathbf x\overset{\text{ def }}=(x_1,\dotsc,x_k)^T\big)$$, i.e. the probability that the $$i$$th cell has $$x_i$$ objects for each $$i$$.

We can regard each allocation as an independent trial with $$k$$ possible outcomes (since each object is allocated to one and only one of the $$k$$ cells). We can recognize that the allocation of $$n$$ objects is a partition of the $$n$$ objects into $$k$$ groups. There are hence $$\binom{n}{x_1,\dotsc,x_k}$$ ways of allocation.

So, $$\mathbb P(\mathbf X=\mathbf x)=\binom{n}{x_1,\dotsc,x_k}p_1^{x_1}\dotsb p_k^{x_k}.$$ In particular, the probability of allocating $$x_i$$ objects to the $$i$$th cell is $$p_i^{x_i}$$ by independence, so the probability of a particular allocation of the $$n$$ objects to the $$k$$ cells is $$p_1^{x_1}\dotsb p_k^{x_k}$$, again by independence.
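The pmf above can be sketched in a few lines (the helper name `multinomial_pmf` is an illustrative assumption); with $$k=2$$ it reduces to the binomial pmf, which gives a quick consistency check:

```python
from math import comb, factorial, prod

def multinomial_pmf(x, p):
    """P(X = x) for counts x = (x_1, ..., x_k) with probabilities p."""
    n = sum(x)
    coeff = factorial(n)  # multinomial coefficient n! / (x_1! ... x_k!)
    for xi in x:
        coeff //= factorial(xi)
    return coeff * prod(pi**xi for pi, xi in zip(p, x))

# With k = 2 cells this reduces to the binomial pmf:
print(multinomial_pmf((3, 7), (0.3, 0.7)))
print(comb(10, 3) * 0.3**3 * 0.7**7)  # same value
```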

Multivariate normal distribution
The multivariate normal distribution is, as suggested by its name, a multivariate (and also generalized) version of the (univariate) normal distribution.