User:Gkhan/Matrix Multiplication DP version

Defining the problem
Say that you need to multiply a series of n matrices together and form a product P.


 * $$M_1 M_2 M_3 \cdots M_{n-1} M_n = P$$

What is the fastest way we can form this product? Even though matrix-multiplication is not commutative it is associative. I.e.


 * $$(M_1 M_2) M_3 = M_1 (M_2 M_3)$$
 * $$M_1 M_2 \ne M_2 M_1$$

This means we can decide in what way to parenthesize a matrix product but we cannot change the order which we multiply them. Since you can only multiply two matrices at a time the product $$M_1 M_2 M_3 M_4$$ can be paranthesized in these ways:


 * $$(((M_1 M_2) M_3) M_4)$$
 * $$((M_1 (M_2 M_3)) M_4)$$
 * $$(M_1 ((M_2 M_3) M_4))$$
 * $$((M_1 M_2) (M_3 M_4))$$
 * $$(M_1 (M_2 (M_3 M_4)))$$

Two matrixes $$M_1$$ and $$M_2$$ can be multiplied if the number of coulums in $$M_1$$ equals the number of rows in $$M_2$$. The number of rows in their product will equal the number rows in $$M_1$$ and the number of columns will equal the number of columns in $$M_2$$. I.e if the dimensions of $$M_1$$ is $$a \times b$$ and $$M_2$$ has dimensions $$b \times c$$ their product will have dimensions $$a \times c$$

Two multiply two matrices with eachother we will use a function called MatrixMultiply which takes two matrices and returns their product. We will leave implementation of this function alone for the moment as it is not the focus of this chapter (how to multiply two matrices in the fastest way has been under intensive study for several years and really merits it's own chapter). The time this function takes two multiply two matrices of size $$a \times b$$ and $$b \times a$$ is essentially proportional to the number of scalar multiplications which is proportional to $$a b c$$. This means that the paranthezation matters. Say that we have three matrices $$M_1$$, $$M_2$$ and $$M_3$$. $$M_1$$ has dimensions $$5 \times 100$$, $$M_2$$ has dimensions $$100 \times 100$$ and $$M_3$$ has dimensions $$100 \times 50$$. Let's parantezise them in the two possible ways and see which way requires the least amount of multiplications. The two ways are


 * $$((M_1 M_2) M_3)$$
 * $$(M_1 (M_2 M_3))$$

To form the product in the first way requires 75000 scalar multiplications (5*100*100=50000 to form product $$(M_1 M_2)$$ and another 5*100*50=25000 for the last multiplications.) This might seem like alot, but in comparison to the 525000 scalar multiplications required by the second parenthesization (50*100*100=500000 plus 5*50*100=25000) it is miniscule! One understands why such an algorithm is important, imagine what would happen if we needed to multiply 50 matrices!

Forming a recursive solution
Note that in this discussion we will concentrate on finding a how many scalar multiplications are needed instead of the actual order. This is because once we have found a working algorithm to find the amount it is trivial to create an algorithm for the actual paranthezation. It will, however, be discussed in the end.

So how would an algorithm for the optimum paranthezation look? By the chapter title you might expect that a dynamic programming method is in order (not to give the answer away or anything). So how would a dynamic programming method work? Since dynamic programming algorithms are based on optimal substructure, what would the optimal substructure in this problem be?

Suppose that the optimal way parenthesize
 * $$M_1 M_2 \dots M_n$$

splits the product at k, i.e
 * $$(M_1 M_2 \dots M_k)(M_{k+1} M_{k+2} \dots M_n)$$

Then the optimal solution contains the optimal solutions to the two subproblems
 * $$(M_1 \dots M_k)$$
 * $$(M_{k+1} \dots M_n)$$

I.e, just in accordance with the fundamental principle of dynamic programming, the solution to the problem depends on the solution of smaller sub-problems.

Let's say that it takes $$c(n)$$ number of scalar multiplications to multiply matrices $$M_n$$ and $$M_{n+1}$$ and $$f(m,n)$$ is number of scalar multiplications to be performed in an optimal paranthesation of the matrices $$M_m \dots M_n$$. The definition or $$f(m,n)$$ is the first step towards a solution.

When $$m-n=1$$, the formulation is trivial, it is just $$c(n)$$. But what is it when the distance is larger? Using the observation above we can derive a formulation. Suppose an optimal solution to the problem divides the matrices at matrices k and k+1 (i.e $$(M_m \dots M_k)(M_{k+1} \dots M_n)$$) then the number of scalar multiplications are.


 * $$f(m,k) + f(k+1,n) + c(k)$$

That is, the amount of time to form the first product, the amount of time it takes to form the second product and the amount of time it takes to multiply them together. But what is this optimal value k? The answer is, ofcourse, the value that makes the above formula assume the minimum value. We can thus form the complete definition for the function:


 * $$f(m,n) = \begin{cases} \min_{m < k < n}f(m,k) + f(k+1,n) + c(k) & \mbox{if } n-m>1 \\ c(m) & \mbox{if }n-m=1 \end{cases}$$

A straight-forward recursive solution to this would look something like this (the language is Wikicode):

function f(m, n) { if m-n = 1 return c(m) minCost = $$\infty$$ for k from m + 1 to n - 1 { v = f(m, k) + f(k + 1, n) + c(k) if v < minCost minCost = v    } }

This rather simple solution is, unfortunatly, not a very good one. It spends mountains of time recomputing data and its running time is exponential.

[TODO: write an analysis of the straight-forward-recursion method]

Using Memoization
It is straightforward to convert the recursive solution into one that uses memoization. You just have to modify the code so that results are stored so no values need to be recomputed.

int[,] x; function f(m, n) { if x[m][n] != null return x[m][n] if m == n        x[m][n] = 0 return 0 if m-n = 1 { x[m][n] = c(m) return c(m) }    minCost = $$\infty$$ for k from m to n - 1 { v = f(m, k) + f(k + 1, n) + c(k) if v < minCost minCost = v    } x[m][n] = minCost; return minCost }

As you can see, there is very little modification needed to the original code. This is a huge optimization, down to a polynomial-time algorithm.

''[What is the exact runtime? My qualified guess would be O(n^2*log n) but i haven't analyzed it properly.]''

Using Dynamic Programming
So how do we make it into a full-fledged bottom-up dynamic progragramming algorithm? To illustrate that point, lets use an example. Suppose we have 6 matrices with the following dimensions:


 * $$M_1 : 6 \times 15$$
 * $$M_2 : 15 \times 8$$
 * $$M_3 : 8 \times 1$$
 * $$M_4 : 1 \times 17$$
 * $$M_5 : 17 \times 7$$
 * $$M_6 : 7 \times 15$$

This will make the cost-function c(n) assume the following values:


 * $$c(n) = \begin{cases}

720 & n=1 \\ 120 & n=2 \\ 136 & n=3 \\ 119 & n=4 \\ 1785 & n=5 \end{cases} $$

The matrix which will hold the optimum values (called x in the memozation example) can be seen like this:



This is only half of the entire array, but since the values m>n are never used it would be redundant to display them. The reason that it is tilted to the side is that it conveinently shows the how the process "builds" the result up from the bottom. So, what can we do to this matrix. If we look at the definition of f(m, n) it says that for all the values which m=n the value in the array is zero. Let's fill in those values:



Allright! Let's boldly continue. The row just above the one we filled in contains all the values with adjacent indexes (i.e 1,2 2,3 3,4 4,5 and 5,6). Those values are just the values of the cost-function, so let's fill in those too.



Now we continue to next row and just use the definition to fill in the blanks. What, for instance should be in 1,3? Just use the definition: