Data Mining Algorithms In R/Dimensionality Reduction/Singular Value Decomposition

In this chapter we take a look at Singular Value Decomposition (SVD), a matrix factorization method from Linear Algebra.

Singular Value Decomposition
In data mining, this algorithm can be used to better understand a database by showing the number of important dimensions, and also to simplify it by reducing the number of attributes used in a data mining process. This reduction removes data that are redundant because, from the point of view of Linear Algebra, they are linearly dependent on other attributes. For example, imagine a database which contains a field that stores the temperature of water in several samples and another that stores its state (solid, liquid or gas). It is easy to see that the second field depends on the first and, therefore, SVD could help show us that it is not important for the analysis.

Principal Component Analysis (PCA) is a specific case of SVD: it corresponds to the SVD of the data matrix after its columns have been centered.
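The relation between PCA and SVD can be checked numerically. The sketch below is only an illustration on synthetic data (the matrix and the seed are assumptions, not part of the chapter); it compares the SVD of a centered matrix with the output of R's prcomp.

```r
# Synthetic data: 50 observations of 3 attributes (illustrative values only)
set.seed(1)
X <- matrix(rnorm(50 * 3), ncol = 3)

# Center each column; PCA is defined on centered data
Xc <- scale(X, center = TRUE, scale = FALSE)

s <- svd(Xc)
p <- prcomp(X, center = TRUE, scale. = FALSE)

# Standard deviations of the principal components, from the singular values
sdev_from_svd <- s$d / sqrt(nrow(X) - 1)

# Both comparisons should return TRUE; loadings are compared up to sign
all.equal(sdev_from_svd, p$sdev)
all.equal(abs(s$v), abs(unname(p$rotation)))
```

The right singular vectors play the role of the principal directions, and the singular values give the spread of the data along each of them.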

Algorithm
SVD is the factorization of a matrix $$X$$ of real or complex numbers, that has $$n$$ rows and $$d$$ columns, into:

$$X = UAV^\top$$

where $$U$$ is a matrix whose dimensions are $$n \times n$$, $$V$$ is another matrix whose dimensions are $$d \times d$$, and $$A$$ is a matrix whose dimensions are $$n \times d$$, the same dimensions as $$X$$.

In addition:

$$U^\top U = I_n$$ and  $$V^\top V = I_d$$

where $$I_n$$ and $$I_d$$ are identity matrices of size $$n$$ and $$d$$, respectively.

The columns of $$U$$ are the left singular vectors of the matrix $$X$$, and the columns of $$V$$ (or the rows of $$V^\top$$) are the right singular vectors.

The matrix $$A$$ is a rectangular diagonal matrix, whose diagonal values are the singular values of the matrix $$X$$. The singular values appear in non-increasing order: the value in a row of $$A$$ is never less than the value in a row below. All singular values are greater than or equal to $$0$$.
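These properties can be verified on a small matrix. The following is a minimal sketch with an arbitrary 2 x 3 matrix chosen for illustration; it asks R's svd function for the full $$U$$ and $$V$$ and checks orthogonality and the ordering of the singular values.

```r
# Arbitrary 2 x 3 matrix (illustrative values only)
X <- matrix(c(2, 0, 1,
              1, 3, 0), nrow = 2, byrow = TRUE)

# Request the full n x n matrix U and d x d matrix V
s <- svd(X, nu = nrow(X), nv = ncol(X))

# U'U and V'V should be identity matrices, up to rounding error
all.equal(crossprod(s$u), diag(nrow(X)))
all.equal(crossprod(s$v), diag(ncol(X)))

# Singular values are non-negative and sorted in non-increasing order
all(s$d >= 0)
all(diff(s$d) <= 0)
```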

Computing the SVD amounts to finding the eigenvalues and the eigenvectors of $$XX^\top$$ and $$X^\top X$$. The eigenvectors of $$X^\top X$$ are the columns of $$V$$ and the eigenvectors of $$XX^\top$$ are the columns of $$U$$. The singular values of $$X$$, in the diagonal of matrix $$A$$, are the square roots of the common positive eigenvalues of $$XX^\top$$ and $$X^\top X$$.
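One way to see where these eigenvectors come from, assuming a real matrix $$X$$ and using only $$X = UAV^\top$$, $$U^\top U = I_n$$ and $$V^\top V = I_d$$, is the following short derivation:

$$X^\top X = (UAV^\top)^\top (UAV^\top) = V A^\top U^\top U A V^\top = V (A^\top A) V^\top$$

$$XX^\top = (UAV^\top)(UAV^\top)^\top = U A V^\top V A^\top U^\top = U (A A^\top) U^\top$$

Both $$A^\top A$$ and $$A A^\top$$ are diagonal matrices whose non-zero entries are the squared singular values, so the two right-hand sides are eigendecompositions of $$X^\top X$$ and $$XX^\top$$.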

If $$XX^\top$$ and $$X^\top X$$ have the same number of eigenvalues, $$X$$ is a square matrix; otherwise, the eigenvalues of the matrix that has fewer eigenvalues are also eigenvalues of the matrix that has more eigenvalues. Therefore, the singular values of $$X$$ are the square roots of the eigenvalues of whichever of $$XX^\top$$ and $$X^\top X$$ has fewer eigenvalues.

The number of positive singular values of a matrix is the rank of that matrix, that is, the number of linearly independent columns or rows of the matrix. The rank is not greater than $$\min(n, d)$$, because this is the number of elements of the diagonal of the matrix $$A$$, where the singular values are stored.
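As a small illustration of the link between rank and singular values, the sketch below builds a matrix whose third column is the sum of the first two. The matrix and the tolerance used to decide what counts as "positive" in floating point are assumptions made for this example.

```r
# 3 x 3 matrix whose third column is the sum of the first two, so its rank is 2
X <- cbind(c(1, 2, 3), c(0, 1, 4), c(1, 3, 7))
d <- svd(X)$d

# In floating point a "zero" singular value is only approximately zero,
# so a tolerance is needed (this particular choice is an assumption)
tol <- sqrt(.Machine$double.eps) * max(d)
numerical_rank <- sum(d > tol)

numerical_rank   # rank estimated from the singular values
qr(X)$rank       # cross-check with the rank reported by the QR decomposition
```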

Therefore, the algorithm is:


 * 1) Compute $$XX^\top$$.
 * 2) Compute the eigenvalues and the eigenvectors of $$XX^\top$$.
 * 3) Compute $$X^\top X$$.
 * 4) Compute the eigenvalues and the eigenvectors of $$X^\top X$$.
 * 5) Compute the square roots of the common positive eigenvalues of $$XX^\top$$ and $$X^\top X$$.
 * 6) Finally, assign the computed values to $$U$$, $$V$$ and $$A$$: the eigenvectors of $$XX^\top$$ become the columns of $$U$$, the eigenvectors of $$X^\top X$$ become the columns of $$V$$, and the square roots go on the diagonal of $$A$$.
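The steps above can be sketched in R with the built-in eigen function. This is a minimal illustration and not how R's own svd is implemented; the function name svd_via_eigen, the tolerance and the test matrix are hypothetical. Because each eigenvector is only defined up to its sign, the sketch adds a sign-alignment step that the list does not mention, and it assumes that the positive eigenvalues are all distinct.

```r
# Hypothetical helper: SVD through the eigendecompositions of XX' and X'X
svd_via_eigen <- function(X, tol = 1e-10) {
  eu <- eigen(X %*% t(X), symmetric = TRUE)   # steps 1 and 2
  ev <- eigen(t(X) %*% X, symmetric = TRUE)   # steps 3 and 4
  r  <- sum(eu$values > tol)                  # number of positive eigenvalues
  d  <- sqrt(eu$values[seq_len(r)])           # step 5
  U  <- eu$vectors[, seq_len(r), drop = FALSE]
  V  <- ev$vectors[, seq_len(r), drop = FALSE]
  # Extra step: flip v_i when needed so that X v_i = d_i u_i holds
  for (i in seq_len(r)) {
    if (sum((X %*% V[, i]) * U[, i]) < 0) V[, i] <- -V[, i]
  }
  list(d = d, u = U, v = V)                   # step 6
}

# Illustrative 2 x 3 matrix
X <- matrix(c( 3, 1, 1,
              -1, 3, 1), nrow = 2, byrow = TRUE)
res <- svd_via_eigen(X)

# The product should reproduce X, and d should match R's svd
all.equal(res$u %*% diag(res$d, nrow = length(res$d)) %*% t(res$v), X)
all.equal(res$d, svd(X)$d)
```

In practice the built-in svd is preferable, since forming $$XX^\top$$ and $$X^\top X$$ explicitly can lose numerical precision.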

Example
X = $$ \begin{bmatrix} 1 & 1 & 0 & 2\\ 1 & 1 & 2 & 0\\ 2 & 0 & 1 & 1 \end{bmatrix} $$

$$XX^\top$$ = $$ \begin{bmatrix} 6 & 2 & 4\\ 2 & 6 & 4\\ 4 & 4 & 6 \end{bmatrix} $$

The eigenvalues of $$XX^\top$$ are:
 * 12.744563, 4.000000, and 1.255437

The eigenvectors of $$XX^\top$$ are:
 * 1) 0.5417743,  0.5417743,  0.6426206
 * 2) 0.7071068, -0.7071068, $$-2.144010 \times 10^{-16}$$
 * 3) 0.4544013,  0.4544013, -0.7661846

$$X^\top X$$ = $$ \begin{bmatrix} 6 & 2 & 4 & 4\\ 2 & 2 & 2 & 2\\ 4 & 2 & 5 & 1\\ 4 & 2 & 1 & 5 \end{bmatrix} $$

The eigenvalues of $$X^\top X$$ are:
 * 12.744563, 4.000000, 1.255437, and $$5.940557 \times 10^{-18}$$ (numerically zero)

The eigenvectors of $$X^\top X$$ are:
 * 1) 0.6635353, 0.3035190,  0.4835272,  0.4835272
 * 2) 0.0000000, $$0.5181041 \times 10^{-16}$$, -0.7071068,  0.7071068
 * 3) -0.5565257, 0.8110957, 0.1272850,  0.1272850
 * 4) 0.5000000, 0.5000000, -0.5000000, -0.5000000

The singular values of X are the square roots of the common positive eigenvalues of $$X^\top X$$ and $$XX^\top$$:
 * $$ \sqrt{12.744563} = 3.569953$$
 * $$ \sqrt{4.000000} = 2.000000$$
 * $$ \sqrt{1.255437} = 1.120463$$

Therefore:

A = $$ \begin{bmatrix} 3.569953 & 0       & 0        & 0\\ 0        & 2.000000 & 0        & 0\\ 0        & 0        & 1.120463 & 0 \end{bmatrix} $$

Finally, X is decomposed. Note that the last factor is $$V^\top$$, so its rows (not its columns) are the eigenvectors of $$X^\top X$$:

$$X = UAV^\top$$ = $$ \begin{bmatrix} 0.5417743 & 0.7071068 &  0.4544013\\ 0.5417743 & -0.7071068 &  0.4544013\\ 0.6426206 & -2.144010\cdot10^{-16} & -0.7661846 \end{bmatrix} \begin{bmatrix} 3.569953 & 0 & 0 & 0\\ 0 & 2.000000 & 0 & 0\\ 0 & 0 & 1.120463 & 0 \end{bmatrix} \begin{bmatrix} 0.6635353 & 0.3035190 & 0.4835272 & 0.4835272\\ 0.000000 & 5.181041\cdot10^{-16} & -0.7071068 & 0.7071068\\ -0.5565257 & 0.8110957 & 0.1272850 & 0.1272850\\ 0.5 & 0.5 & -0.5 & -0.5 \end{bmatrix} $$
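The numbers of this example can be reproduced in R. The sketch below uses only base R functions; the signs of the singular vectors returned by svd may differ from the ones printed above, since each pair of vectors is only defined up to a common sign.

```r
X <- matrix(c(1, 1, 0, 2,
              1, 1, 2, 0,
              2, 0, 1, 1), nrow = 3, byrow = TRUE)

# Eigenvalues of XX', to compare with the values listed in the example
eigen(X %*% t(X), symmetric = TRUE)$values

# Full decomposition: U is 3 x 3 and V is 4 x 4
s <- svd(X, nu = nrow(X), nv = ncol(X))
round(s$d, 6)

# A has the same dimensions as X: the diagonal of singular values plus a zero column
A <- cbind(diag(s$d), 0)

# The product U A V' should give back X, up to rounding error
all.equal(s$u %*% A %*% t(s$v), X)
```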

Implementation
R has a built-in function that calculates the SVD, called 'svd'.

By default, it receives a native R matrix as its argument and returns a list that contains U, V and the diagonal of A.

Arguments
If it is not necessary to compute all singular vectors, one can tell svd exactly how many are needed.

This is done through the arguments nu and nv, which give the number of left and right singular vectors to compute, respectively.

For example, in order to calculate only half of these vectors, one could do:

svd(X, nu = min(nrow(X), ncol(X)) %/% 2, nv = min(nrow(X), ncol(X)) %/% 2)
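The effect of nu and nv on the result can be inspected through the dimensions of the returned matrices. The following sketch uses an illustrative 10 x 12 matrix; the variable k is introduced only for this example.

```r
X <- matrix(seq(1, 240, 2), ncol = 12)   # illustrative 10 x 12 matrix
k <- min(nrow(X), ncol(X)) %/% 2         # integer half of the available vectors

s <- svd(X, nu = k, nv = k)

length(s$d)   # all min(n, d) singular values are still returned
dim(s$u)      # n rows and k columns
dim(s$v)      # d rows and k columns
```

Only the singular vectors are limited by nu and nv; the vector of singular values keeps its full length.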

Returned object
s <- svd(X)

Considering:

$$X = UAV^\top$$

The returned object is a list that contains three components:

s$d is the vector that contains the singular values of X, taken from the diagonal of matrix A.

s$u is the matrix whose columns contain the left singular vectors of X. It has as many rows as X, and its number of columns is the value passed to the parameter nu. Note that if nu is 0, this matrix is not created.

s$v is the matrix whose columns contain the right singular vectors of X. It has as many rows as X has columns, and its number of columns is the value passed to the parameter nv. Note that if nv is 0, this matrix is not created.
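A short check of the structure of the returned object, on an arbitrary 2 x 2 matrix chosen for illustration, might look like this:

```r
X <- matrix(c(4, 0, 3, -5), nrow = 2)   # illustrative 2 x 2 matrix

s_full <- svd(X)
is.list(s_full)   # the result is a plain list
names(s_full)     # components d, u and v

# With nu = 0 and nv = 0 only the singular values are returned
s_d <- svd(X, nu = 0, nv = 0)
names(s_d)
is.null(s_d$u)
```

Skipping the singular vectors in this way is convenient when only the singular values are of interest, for instance to estimate the rank.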

Examples
Execute the following command sequences in the R terminal or as an R program and see the results:

Example 1:

dat <- seq(1,240,2)
X <- matrix(dat,ncol=12)
s <- svd(X)
A <- diag(s$d)
s$u %*% A %*% t(s$v) # X = U A V'

Example 2:

dat <- seq(1,240,2)
X <- matrix(dat,ncol=12)
s <- svd(X, nu = nrow(X), nv = ncol(X))
A <- diag(s$d)
A <- cbind(A, 0) # Add two columns of zeros, so that A has the same dimensions as X.
A <- cbind(A, 0)
s$u %*% A %*% t(s$v) # X = U A V'
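The matrix used in these examples is filled with an arithmetic sequence, so it has only two linearly independent columns. The sketch below, which introduces the names k and X2 for illustration, checks the reconstruction with all.equal and then rebuilds X from the first two singular values alone.

```r
dat <- seq(1, 240, 2)
X <- matrix(dat, ncol = 12)
s <- svd(X)

# Full reconstruction, compared up to rounding error
all.equal(s$u %*% diag(s$d) %*% t(s$v), X)

# Singular values: only the first two are clearly different from zero
signif(s$d, 3)

# Reconstruction that keeps only the first k singular values and vectors
k <- 2
X2 <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k) %*% t(s$v[, 1:k, drop = FALSE])
all.equal(X2, X)
```

This is the sense in which SVD shows the number of important dimensions: here two dimensions carry all the information of a 10 x 12 matrix.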

Case Study

Scenario
In order to better visualize the results of SVD, in this case study we will work with a matrix that represents an image, so that any change to the matrix can be easily observed.

Input Data
To work with an image in R, one should install the package rimage:

install.packages("rimage")

Let's then load the image into R, converting it to a greyscale one. This way, we will end up with a matrix of intensities between 0 and 1, where 1 means white and 0 means black.

library(rimage)
tux <- read.jpeg('tux.jpeg')
tux <- imagematrix(tux,type='grey')

We can see the result of this import:

plot(tux)



In order to help us with this dimension reduction, let's write a small helper function, which receives our tux and the number of dimensions we want and returns our new tux.

reduce <- function(A,dim) {
  #Calculates the SVD
  sing <- svd(A)

  #Approximate each result of SVD with the given dimension
  u<-as.matrix(sing$u[, 1:dim])
  v<-as.matrix(sing$v[, 1:dim])
  d<-as.matrix(diag(sing$d)[1:dim, 1:dim])

  #Create the new approximated matrix
  return(imagematrix(u%*%d%*%t(v),type='grey'))
}

Execution and output
Now that we have our matrix, let's see how many singular values it has:

tux_d <- svd(tux)
length(tux_d$d)
[1] 335

So we have 335 singular values in this matrix. Let's first try to reduce it to only one singular value:

plot(reduce(tux,1))



As we can see, this approximation removes a lot of useful information from our matrix. If it were a database, we would surely have lost important data.

Let's try with about 10% (35) of the singular values:

plot(reduce(tux,35))
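The quality of such an approximation can also be measured numerically, without plotting. The sketch below does not use the tux image or the rimage package: as a stand-in it takes volcano, an 87 x 61 numeric matrix that ships with R, and the helper names low_rank and rel_error are hypothetical.

```r
# 'volcano' is a built-in numeric matrix, used here in place of the image
M <- volcano
s <- svd(M)

# Rank-k approximation built from the first k singular values and vectors
low_rank <- function(s, k) {
  s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k) %*% t(s$v[, 1:k, drop = FALSE])
}

# Relative error of an approximation, in the Frobenius norm
rel_error <- function(M, Mk) norm(M - Mk, type = "F") / norm(M, type = "F")

for (k in c(1, 6, length(s$d))) {
  cat(k, "singular values: relative error", rel_error(M, low_rank(s, k)), "\n")
}
```

The error shrinks as more singular values are kept and vanishes, up to rounding, when all of them are used.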



Analysis
With only about 10% of the singular values we are able to create a very good approximation of the original data. Moreover, with this method we can remove noise and linearly dependent elements by using only the most important singular values. This is very useful in data mining, since it is hard to identify whether a database is clean and, if not, which elements are useful to our analysis.
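One possible way to decide how many singular values count as "most important" is to look at the share of the total squared singular values that the first k of them account for. The sketch below is an assumption about how this could be done, not a rule taken from this chapter; it again uses the built-in volcano matrix as stand-in data, and the 99.9% threshold is arbitrary.

```r
M <- volcano                        # built-in matrix, used only as stand-in data
d <- svd(M, nu = 0, nv = 0)$d       # singular values only

# Cumulative share of the sum of squared singular values
energy <- cumsum(d^2) / sum(d^2)

# Smallest k whose share reaches the chosen threshold (arbitrary value)
threshold <- 0.999
k <- which(energy >= threshold)[1]
k
```

A plot of energy against the number of singular values gives a quick visual impression of how fast the information is concentrated in the first dimensions.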