Statistics/Multivariate Data Analysis

Multivariate Normal
The multivariate normal distribution is an extension of the normal distribution to random vectors. The simplest definition of the multivariate normal distribution can be given as follows:

At first glance, the definition seems rather abstract. After all, the univariate normal distribution has an explicit density and an explicit characteristic function, and either one fully characterises the distribution. However, a definition of this kind is needed to handle the case where $$\Sigma$$ is positive semi-definite but not strictly positive definite. When $$\Sigma$$ is positive definite, a change of variables (writing $$\mathbf{X}=\mu+\Sigma^{1/2}\mathbf{Z}$$ with $$\mathbf{Z}$$ a vector of independent standard normal variables) shows that $$\mathbf{X}$$ has the density $$f_{\mathbf{X}}(\mathbf{x})=\frac{1}{(2\pi)^{p/2}|\Sigma|^\frac{1}{2}}e^{-\frac{1}{2}(\mathbf{x}-\mu)^T\Sigma^{-1}(\mathbf{x}-\mu)}$$, where $$p$$ is the dimension of $$\mathbf{X}$$. This is no longer true when $$\Sigma$$ is singular: then $$|\Sigma|=0$$, $$\Sigma^{-1}$$ does not exist, and $$\mathbf{X}$$ is concentrated on a lower-dimensional affine subspace of $$\mathbb{R}^p$$, so it has no density on $$\mathbb{R}^p$$. A definition based on the characteristic function still works in this case. A density can still be written with respect to Lebesgue measure on that subspace, using a generalised inverse of $$\Sigma$$ and the product of its non-zero eigenvalues in place of $$|\Sigma|$$, but it is not a density on the whole of $$\mathbb{R}^p$$.
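The two cases above can be illustrated with a minimal NumPy sketch (not from the source). `mvn_density` evaluates the density formula for a positive definite $$\Sigma$$ through a Cholesky factor, which fails for a singular $$\Sigma$$; `mvn_sample` instead constructs $$\mathbf{X}=\mu+A\mathbf{Z}$$ with $$AA^T=\Sigma$$ from an eigendecomposition, which remains well defined when $$\Sigma$$ is singular. The function names and the eigenvalue tolerance `tol` are illustrative choices.

```python
import numpy as np

def mvn_density(x, mu, Sigma):
    """Density of N_p(mu, Sigma) at x; needs Sigma positive definite."""
    x, mu, Sigma = (np.asarray(v, dtype=float) for v in (x, mu, Sigma))
    p = mu.shape[0]
    # Raises LinAlgError when Sigma is not positive definite (e.g. singular)
    L = np.linalg.cholesky(Sigma)
    # Solve L z = x - mu, so that z^T z = (x - mu)^T Sigma^{-1} (x - mu)
    z = np.linalg.solve(L, x - mu)
    # log|Sigma| = 2 * sum(log(diag(L)))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return float(np.exp(-0.5 * (z @ z) - 0.5 * (p * np.log(2.0 * np.pi) + log_det)))

def mvn_sample(mu, Sigma, size, rng, tol=1e-12):
    """Draw `size` vectors X = mu + A Z with A A^T = Sigma; Sigma may be singular."""
    mu = np.asarray(mu, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(np.asarray(Sigma, dtype=float))
    # Treat eigenvalues below tol * largest as exactly zero (rounding noise)
    eigvals = np.where(eigvals > tol * eigvals.max(), eigvals, 0.0)
    A = eigvecs * np.sqrt(eigvals)  # column j scaled by sqrt(lambda_j)
    Z = rng.standard_normal((size, mu.shape[0]))
    return mu + Z @ A.T

rng = np.random.default_rng(0)
print(mvn_density([0.0, 0.0], [0.0, 0.0], np.eye(2)))  # about 0.159, i.e. 1/(2*pi)

Sigma_sing = np.array([[1.0, 1.0], [1.0, 1.0]])  # rank 1: X1 = X2 almost surely
X = mvn_sample([0.0, 0.0], Sigma_sing, 5, rng)
print(np.allclose(X[:, 0], X[:, 1]))  # True: every draw lies on the line x1 = x2
try:
    mvn_density([0.0, 0.0], [0.0, 0.0], Sigma_sing)
except np.linalg.LinAlgError:
    print("no density on R^2: Sigma is singular")
```

Sampling through $$A$$ rather than through the density is what lets the singular case go through: the construction only needs some $$A$$ with $$AA^T=\Sigma$$, never $$\Sigma^{-1}$$.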

Matrix-variate Normal
We will first need to develop some notation. Let $$X_{m\times n}$$ be a matrix with columns $$c_{(1)}, c_{(2)},\ldots,c_{(n)}$$. Then we define the column vector $$vec(X):=\begin{bmatrix} c_{(1)} \\ c_{(2)} \\ \vdots\\ c_{(n)} \end{bmatrix}$$, and we call it the vectorisation of $$X$$.
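In NumPy, stacking columns corresponds to flattening in column-major ("Fortran") order. The sketch below is illustrative; the helper name `vec` is ours. It also checks the standard Kronecker-product identity $$vec(AXB)=(B^T\otimes A)\,vec(X)$$, which is not stated in the text but is the usual way vectorisation interacts with linear maps.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into a single vector (column-major order)."""
    return np.asarray(X).reshape(-1, order="F")

X = np.array([[1, 2],
              [3, 4],
              [5, 6]])  # columns c_(1) = (1, 3, 5), c_(2) = (2, 4, 6)
print(vec(X))  # [1 3 5 2 4 6]

# Standard identity (not from the text): vec(A X B) = (B^T kron A) vec(X)
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((2, 5))
print(np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X)))  # True
```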

The reader should notice that this simply imposes a normal distribution on the vectorisation of $$X$$. Thus, many of the results that hold for a multivariate normal random vector also hold for the vectorisation of a matrix-variate normal random matrix.
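To make this concrete, the sketch below assumes one common parameterisation (the defining display itself is not reproduced in this section): an $$m\times n$$ matrix $$X$$ with mean $$M$$, row covariance $$U$$ and column covariance $$V$$ satisfies $$vec(X)\sim\mathcal{N}_{mn}(vec(M), V\otimes U)$$. Under that assumption a draw can be built as $$X=M+AZB^T$$ with $$AA^T=U$$, $$BB^T=V$$ and $$Z$$ a matrix of independent standard normals. The names `sample_matrix_normal`, `U` and `V` and all numerical values are illustrative.

```python
import numpy as np

def vec(X):
    """Column-stacking vectorisation."""
    return np.asarray(X).reshape(-1, order="F")

def sample_matrix_normal(M, U, V, size, rng):
    """Draw `size` matrices X = M + A Z B^T, A A^T = U, B B^T = V (assumed positive definite)."""
    A = np.linalg.cholesky(U)
    B = np.linalg.cholesky(V)
    Z = rng.standard_normal((size,) + M.shape)  # iid N(0, 1) entries
    return M + A @ Z @ B.T

m, n = 2, 3
M = np.zeros((m, n))
U = np.array([[2.0, 0.5],
              [0.5, 1.0]])
V = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.3],
              [0.0, 0.3, 1.0]])

# Exact check: vec(A Z B^T) = (B kron A) vec(Z), so Cov(vec X) = (B kron A)(B kron A)^T = V kron U
A, B = np.linalg.cholesky(U), np.linalg.cholesky(V)
K = np.kron(B, A)
print(np.allclose(K @ K.T, np.kron(V, U)))  # True

# Monte Carlo check on the vectorised draws
rng = np.random.default_rng(2)
draws = sample_matrix_normal(M, U, V, 20000, rng)  # shape (20000, m, n)
vecs = np.array([vec(Xk) for Xk in draws])          # shape (20000, m * n)
emp_cov = np.cov(vecs, rowvar=False)
print(np.abs(emp_cov - np.kron(V, U)).max())        # small; shrinks as size grows
```

Once the draws are vectorised, they are ordinary multivariate normal vectors, which is exactly why results for the multivariate normal carry over.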

Now that we have defined the multivariate and matrix-variate normal distributions, our next aim is to find analogues of the univariate $$\chi^2_{(p)}$$ distribution with $$p$$ degrees of freedom and of Student's $$t$$ distribution, both of which are closely related to the univariate normal distribution. We know that if $$X_1,\ldots,X_n$$ are independent with $$X_i\sim\mathcal{N}(\mu_i,\sigma^2_i)$$ for each $$i\in\{1, 2,\ldots, n\}$$, then $$\sum_{i=1}^n\frac{(X_i-\mu_i)^2}{\sigma^2_i}\sim\chi^2_{(n)}$$. What would be an analogue of this for the multivariate case?
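A quick Monte Carlo check of this univariate fact, as a NumPy sketch with arbitrary illustrative values of $$\mu_i$$ and $$\sigma_i$$: the standardised sum of squares should match the $$\chi^2_{(n)}$$ mean $$n$$ and variance $$2n$$.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0, 0.5, 3.0])    # illustrative means mu_i
sigma = np.array([0.5, 2.0, 1.0, 3.0])  # illustrative standard deviations sigma_i
n, reps = mu.size, 100_000

# Each row holds one draw of independent X_i ~ N(mu_i, sigma_i^2), i = 1, ..., n
X = rng.normal(loc=mu, scale=sigma, size=(reps, n))
Q = np.sum(((X - mu) / sigma) ** 2, axis=1)

# A chi^2_(n) variable has mean n and variance 2n
print(Q.mean(), Q.var())  # close to 4 and 8
```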

Wishart Distribution
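Although the Wishart distribution does have a density, it is not needed to prove most of the results we will require. An important property, however, is that if $$S$$ follows a Wishart distribution with $$n$$ degrees of freedom and scale matrix $$\Sigma$$, then for any fixed non-zero vector $$\mathbf{a}$$, $$\frac{\mathbf{a}^TS\mathbf{a}}{\mathbf{a}^T\Sigma \mathbf{a}}\sim\chi^2_{(n)}$$. To prove this, use the standard representation $$S=\sum_{i=1}^n\mathbf{X}_i\mathbf{X}_i^T$$ with $$\mathbf{X}_1,\ldots,\mathbf{X}_n$$ independent $$\mathcal{N}(\mathbf{0},\Sigma)$$ random vectors, and multiply $$S$$ on the left by $$\mathbf{a}^T$$ and on the right by $$\mathbf{a}$$ to get $$\mathbf{a}^TS\mathbf{a}=\sum_{i=1}^n(\mathbf{a}^T\mathbf{X}_i)^2$$. Since $$\mathbf{a}^T\mathbf{X}\sim \mathcal{N}(\mathbf{a}^T\mu, \mathbf{a}^T\Sigma \mathbf{a})$$ for a multivariate normal $$\mathbf{X}$$, the $$\mathbf{a}^T\mathbf{X}_i$$ are independent $$\mathcal{N}(0, \mathbf{a}^T\Sigma\mathbf{a})$$ variables, so dividing by $$\mathbf{a}^T\Sigma\mathbf{a}$$ leaves a sum of $$n$$ squared independent standard normal variables.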
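A NumPy sketch of this argument, assuming the representation of $$S$$ as a sum of $$n$$ outer products of independent $$\mathcal{N}(\mathbf{0},\Sigma)$$ vectors; the particular $$\Sigma$$, $$\mathbf{a}$$ and $$n$$ are illustrative. It checks the identity $$\mathbf{a}^TS\mathbf{a}=\sum_i(\mathbf{a}^T\mathbf{X}_i)^2$$ on every replicate and compares the mean and variance of the ratio with those of $$\chi^2_{(n)}$$, namely $$n$$ and $$2n$$.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 3, 5, 50_000
Sigma = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])
a = np.array([1.0, -2.0, 0.5])  # any fixed non-zero vector
L = np.linalg.cholesky(Sigma)

Z = rng.standard_normal((reps, n, p))
X = Z @ L.T                              # X[r, i] ~ N_p(0, Sigma), independent over r and i
S = np.einsum("rni,rnj->rij", X, X)      # S[r] = sum_i X_i X_i^T, one Wishart draw per r
quad = np.einsum("i,rij,j->r", a, S, a)  # a^T S a for each replicate
direct = np.sum((X @ a) ** 2, axis=1)    # sum_i (a^T X_i)^2
print(np.allclose(quad, direct))         # True: the two expressions agree

ratios = quad / (a @ Sigma @ a)
print(ratios.mean(), ratios.var())       # close to n = 5 and 2n = 10
```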

Methodology

 * 1) Principal Component Analysis
 * 2) Canonical Correlation Analysis