Calculus/Derivatives of multivariate functions

The matrix of a linear transformation
A linear transformation $$L:\R^n\to\R^m$$ amounts to multiplication by a uniquely defined matrix; that is, there exists a unique matrix $$A\in\R^{m\times n}$$ such that
 * Theorem
 * $$\forall\vec v\in\R^n:L(\vec v)=A\vec v$$

We set the column vectors
 * Proof
 * $$\begin{pmatrix}a_{1,j}\\a_{2,j}\\\vdots\\a_{m,j}\end{pmatrix}:=L(\vec{e}_j)$$

where $$\{\vec{e}_1,\ldots,\vec{e}_n\}$$ is the standard basis of $$\R^n$$. Then we define from this
 * $$A:=\begin{pmatrix}a_{1,1}&\cdots &a_{1,n}\\\vdots&\ddots&\vdots\\a_{m,1}&\cdots &a_{m,n}\end{pmatrix}$$

and note that for any vector $$\vec v=(v_1,\ldots,v_n)^t$$ of $$\R^n$$ we obtain
 * $$A\vec v=A\left(\sum_{j=1}^n v_j\vec{e}_j\right)=\sum_{j=1}^n v_jA\vec{e}_j=\sum_{j=1}^n v_jL(\vec{e}_j)=L\left(\sum_{j=1}^n v_j\vec{e}_j\right)=L(\vec v)$$

Thus, we have shown existence. To prove uniqueness, suppose there were any other matrix $$B\in\R^{m\times n}$$ with the property that $$\forall\vec v\in\R^n:L(\vec v)=B\vec v$$. Then in particular,
 * $$B\vec e_j=L(\vec e_j)$$

which already implies that $$A=B$$ (since all the columns of both matrices are identical).
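This construction is easy to carry out numerically. Below is a minimal sketch using numpy (the map `rot90` is our own illustrative example, not part of the statement above): it assembles the matrix column by column from the images of the standard basis vectors and checks that $$A\vec v=L(\vec v)$$.

```python
import numpy as np

def matrix_of(L, n):
    """Assemble the matrix of a linear map L: R^n -> R^m;
    column j is L(e_j), exactly as in the proof above."""
    basis = np.eye(n)
    return np.column_stack([L(basis[:, j]) for j in range(n)])

# Illustrative linear map: rotation of R^2 by 90 degrees.
rot90 = lambda v: np.array([-v[1], v[0]])

A = matrix_of(rot90, 2)
v = np.array([3.0, 5.0])
assert np.allclose(A @ v, rot90(v))  # A v = L(v)
print(A)  # [[ 0. -1.]
          #  [ 1.  0.]]
```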

How to generalize the derivative
It is not immediately clear how one would generalize the derivative to higher dimensions. For if we take the definition of the derivative at a point $$x_0$$
 * $$\lim_{h\to0}\frac{f(x_0+h)-f(x_0)}{h}$$

and insert vectors for $$h$$ and $$x_0$$, we would have to divide by a vector, which is not defined.

Hence, we shall rephrase the definition of the derivative a bit and cast it into a form where it can be generalized to higher dimensions.

Let $$f:\R\to\R$$ be a one-dimensional function and let $$x_0\in\R$$. Then $$f$$ is differentiable at $$x_0$$ if and only if there exists a linear function $$l:\R\to\R$$ such that
 * Theorem
 * $$\lim_{h\to0}\frac{\Big|f(x_0+h)-\big(f(x_0)+l(h)\big)\Big|}{|h|}=0$$

We note that according to the above, linear functions $$l:\R\to\R$$ are given by multiplication by a $$1\times1$$-matrix, that is, a scalar.

First assume that $$f$$ is differentiable at $$x_0$$. We set $$l(h):=f'(x_0)\cdot h$$ and obtain
 * Proof
 * $$\frac{\Big|f(x_0+h)-\big(f(x_0)+l(h)\big)\Big|}{|h|}=\left|\frac{f(x_0+h)-f(x_0)}{h}-f'(x_0)\right|$$

which converges to 0 due to the definition of $$f'(x_0)$$.

Assume now, conversely, that we are given a linear function $$l:\R\to\R$$ such that
 * $$\lim_{h\to0}\frac{\Big|f(x_0+h)-\big(f(x_0)+l(h)\big)\Big|}{|h|}=0$$

Let $$c$$ be the scalar associated to $$l$$, so that $$l(h)=ch$$. Then the same computation, read in reverse, shows that the difference quotient converges to $$c$$; that is, $$f$$ is differentiable at $$x_0$$ with $$f'(x_0)=c$$.
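This reformulation also lends itself to a quick numerical sanity check. In the sketch below, $$f=\sin$$ and $$x_0=1$$ are chosen purely for illustration; the error quotient $$\bigl|f(x_0+h)-(f(x_0)+f'(x_0)h)\bigr|/|h|$$ visibly tends to $$0$$.

```python
import numpy as np

f, df = np.sin, np.cos  # example function and its known derivative
x0 = 1.0

for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    # error quotient from the theorem, with l(h) = f'(x0) * h
    err = abs(f(x0 + h) - (f(x0) + df(x0) * h)) / abs(h)
    print(f"h = {h:.0e}:  error quotient = {err:.2e}")
# The quotient shrinks (roughly linearly in h), as the theorem predicts.
```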

With the latter formulation of differentiability from the above theorem, we may readily generalize to higher dimensions, since division by the Euclidean norm of a vector is defined, and linear mappings are also defined in higher dimensions.

A function $$f:\R^m\to\R^n$$ is called differentiable or totally differentiable at a point $$x_0\in\R^m$$ if and only if there exists a linear function $$L:\R^m\to\R^n$$ such that
 * Definition
 * $$\lim_{\vec h\to0}\frac{\Big\|f(x_0+\vec h)-\big(f(x_0)+L(\vec h)\big)\Big\|}{\|\vec h\|}=0$$

We have already proven that this definition coincides with the usual one in the one-dimensional case (that is, $$m=n=1$$).
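The same check works in higher dimensions, with the Euclidean norm in place of the absolute value. A minimal sketch, where the function $$f(x,y)=(x^2,xy)$$, the point $$x_0=(1,2)$$ and the candidate matrix are our own illustrative choices (the matrix was computed by hand from the partial derivatives):

```python
import numpy as np

# f(x, y) = (x^2, x*y); candidate differential at x0 = (1, 2)
# is multiplication by L = [[2, 0], [2, 1]] (hand-computed).
f = lambda p: np.array([p[0]**2, p[0] * p[1]])
x0 = np.array([1.0, 2.0])
L = np.array([[2.0, 0.0], [2.0, 1.0]])

rng = np.random.default_rng(0)
for scale in [1e-1, 1e-2, 1e-3]:
    h = scale * rng.standard_normal(2)  # random direction, shrinking length
    quot = np.linalg.norm(f(x0 + h) - (f(x0) + L @ h)) / np.linalg.norm(h)
    print(f"|h| ~ {scale:.0e}:  quotient = {quot:.2e}")
# The quotient tends to 0, so f is totally differentiable at x0.
```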

We have the following theorem:

Let $$S\subseteq\R^m$$ be a set, let $$x_0\in\overset{\circ}{S}$$ be an interior point of $$S$$, and let $$f:S\to\R^n$$ be a function differentiable at $$x_0$$. Then the linear map $$L$$ such that
 * Theorem
 * $$\lim_{\vec h\to0}\frac{\Big\|f(x_0+\vec h)-\big(f(x_0)+L(\vec h)\big)\Big\|}{\|\vec h\|}=0$$

is unique; that is, there exists only one such map $$L$$.

Since $$x_0$$ is an interior point of $$S$$, we find $$r>0$$ such that $$B_r(x_0)\subseteq S$$. Let now $$K:\R^m\to\R^n$$ be any other linear mapping with the property that
 * Proof
 * $$\lim_{\vec h\to0}\frac{\Big\|f(x_0+\vec h)-\big(f(x_0)+K(\vec h)\big)\Big\|}{\|\vec h\|}=0$$

We note that for all vectors of the standard basis $$\{\vec{e}_1,\ldots,\vec{e}_m\}$$ of $$\R^m$$, the points $$x_0+\lambda\vec{e}_j$$ for $$0\le\lambda<r$$ are contained within $$S$$. Hence, we obtain by the triangle inequality
 * $$\Big\|L(\vec{e}_j)-K(\vec{e}_j)\Big\|=\frac{\bigl\|L(\lambda\vec{e}_j)-K(\lambda\vec{e}_j)\bigr\|}{\|\lambda\vec{e}_j\|}\le\frac{\Big\|f(x_0+\lambda\vec{e}_j)-\big(f(x_0)+L(\lambda\vec{e}_j)\big)\Big\|}{\|\lambda\vec{e}_j\|}+\frac{\Big\|f(x_0+\lambda\vec{e}_j)-\big(f(x_0)+K(\lambda\vec{e}_j)\big)\Big\|}{\|\lambda\vec{e}_j\|}$$

Taking $$\lambda\to0$$, the right-hand side tends to $$0$$, and hence $$L(\vec{e}_j)=K(\vec{e}_j)$$. Thus, $$L$$ and $$K$$ coincide on all basis vectors, and since every other vector can be expressed as a linear combination of those, by linearity of $$L$$ and $$K$$ we obtain $$L=K$$.

Thus, the following definition is justified:

Let $$f:S\to\R^n$$ be a function (where $$S\subseteq\R^m$$ is a subset of $$\R^m$$), and let $$x_0$$ be an interior point of $$S$$ such that $$f$$ is differentiable at $$x_0$$. Then the unique linear function $$L$$ such that
 * Definition
 * $$\lim_{\vec h\to0}\frac{\Big\|f(x_0+\vec h)-\big(f(x_0)+L(\vec h)\big)\Big\|}{\|\vec h\|}=0$$

is called the differential of $$f$$ at $$x_0$$ and is denoted $$f'(x_0):=L$$.

Directional and partial derivatives
We shall first define directional derivatives.

Let $$f:\R^m\to\R^n$$ be a function, let $$x_0\in\R^m$$, and let $$\vec v\in\R^m$$ be a vector. If the limit
 * Definition
 * $$\lim_{h\to0}\frac{f(x_0+h\vec v)-f(x_0)}{h}$$

exists, it is called the directional derivative of $$f$$ at $$x_0$$ in direction $$\vec v$$. We denote it by $$D_{\vec v}f(x_0)$$.
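Directional derivatives can be approximated straight from this difference quotient. A minimal sketch, with $$f(x,y)=x^2+y^2$$, $$x_0=(1,1)$$ and $$\vec v=(3,4)$$ chosen for illustration:

```python
import numpy as np

f = lambda p: p[0]**2 + p[1]**2
x0, v = np.array([1.0, 1.0]), np.array([3.0, 4.0])

for h in [1e-2, 1e-4, 1e-6]:
    # difference quotient from the definition above
    print((f(x0 + h * v) - f(x0)) / h)
# The values approach 2*1*3 + 2*1*4 = 14, which is D_v f(x0).
```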

The following theorem relates directional derivatives and the differential of a totally differentiable function:

Let $$f:\R^m\to\R^n$$ be a function that is totally differentiable at $$x_0$$, and let $$\vec v\in\R^m\setminus\{0\}$$ be a nonzero vector. Then $$D_{\vec v}f(x_0)$$ exists and is equal to $$f'(x_0)\vec v$$.
 * Theorem

According to the very definition of total differentiability, inserting $$\vec h=h\vec v$$ and using the linearity $$f'(x_0)(h\vec v)=h\,f'(x_0)\vec v$$, we have
 * Proof
 * $$\lim_{h\to0}\left\|\frac{f(x_0+h\vec v)-f(x_0)}{|h|\cdot\|\vec v\|}-\frac{h\,f'(x_0)\vec v}{|h|\cdot\|\vec v\|}\right\|=0$$

Hence,
 * $$\lim_{h\to0}\left\|\frac{f(x_0+h\vec v)-f(x_0)}{|h|}-\frac{h\,f'(x_0)\vec v}{|h|}\right\|=0$$

by multiplying the above equation by $$\|\vec v\|$$. Noting that
 * $$\left\|\frac{f(x_0+h\vec v)-f(x_0)}{|h|}-\frac{h\,f'(x_0)\vec v}{|h|}\right\|=\left\|\frac{f(x_0+h\vec v)-f(x_0)}{h}-f'(x_0)\vec v\right\|$$

the theorem follows.
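Numerically, the theorem says that a difference quotient along $$\vec v$$ should match the matrix-vector product $$f'(x_0)\vec v$$. A sketch with our own illustrative choices $$f(x,y)=(xy,\sin x)$$, $$x_0=(2,3)$$ and $$\vec v=(1,-1)$$:

```python
import numpy as np

f = lambda p: np.array([p[0] * p[1], np.sin(p[0])])
x0, v = np.array([2.0, 3.0]), np.array([1.0, -1.0])

# Jacobian of f at x0, computed by hand: [[y, x], [cos(x), 0]].
J = np.array([[3.0, 2.0], [np.cos(2.0), 0.0]])

h = 1e-6
fd = (f(x0 + h * v) - f(x0)) / h  # difference quotient for D_v f(x0)
print(fd)     # approximately [ 1.0, cos(2) ]
print(J @ v)  # exactly      [ 1.0, cos(2) ]
```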

A special case of directional derivatives are partial derivatives:

Let $$\{\vec{e}_1,\ldots,\vec{e}_m\}$$ be the standard basis of $$\R^m$$, let $$x_0\in\R^m$$ and let $$f:\R^m\to\R^n$$ be a function such that the directional derivatives $$D_{\vec{e}_j}f(x_0)$$ all exist. Then we set
 * Definition
 * $$\frac{\partial f}{\partial x_j}(x_0):=D_{\vec{e}_j}f(x_0)$$

and call it the partial derivative in the direction of $$x_j$$.

In fact, by writing down the definition of $$D_{\vec{e}_j} f(x_0)$$, we see that the partial derivative in the direction of $$x_j$$ is nothing other than the derivative of the one-variable function $$y\mapsto f(x_{0,1},\ldots,x_{0, j-1},y,x_{0,j+1},\ldots,x_{0,m})$$ at the point $$x_{0,j}$$. That is, for instance, if
 * $$f(x,y,z)=x^2+4z^3+3xy$$

then
 * $$\frac{\partial f}{\partial x}=2x+3y\ ,\ \frac{\partial f}{\partial y}=3x\ ,\ \frac{\partial f}{\partial z}=12z^2$$

that is, when forming a partial derivative, we regard the other variables as constant and differentiate only with respect to the variable we are considering.
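For the example above, the partial derivatives can be double-checked symbolically, for instance with sympy:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + 4*z**3 + 3*x*y

print(sp.diff(f, x))  # 2*x + 3*y
print(sp.diff(f, y))  # 3*x
print(sp.diff(f, z))  # 12*z**2
```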

The Jacobian matrix
From the above, we know that the differential $$f'(x_0)$$ of a function has an associated matrix representing the linear map thus defined. Under a suitable condition, we can determine this matrix from the partial derivatives of the component functions.

Let $$f:\R^m\to\R^n$$ be a function whose partial derivatives all exist and are continuous on $$B_r(x_0)$$ for some possibly very small, but positive $$r>0$$. Then $$f$$ is totally differentiable at $$x_0$$, and its differential at $$x_0$$ is given by left multiplication by the matrix
 * Theorem
 * $$J_f(x_0):=\begin{pmatrix}\dfrac{\partial f_1}{\partial x_1}&\cdots&\dfrac{\partial f_1}{\partial x_m}\\\vdots&\ddots&\vdots\\\dfrac{\partial f_n}{\partial x_1}&\cdots&\dfrac{\partial f_n}{\partial x_m}\end{pmatrix}$$ where $$f=(f_1,\ldots,f_n)$$.

The matrix $$J_f(x_0)$$ is called the Jacobian matrix of $$f$$ at $$x_0$$.


 * Proof

We estimate
 * $$\frac{\Big\|f(x_0+\vec h)-\big(f(x_0)+J_f(x_0)\vec h\big)\Big\|}{\|\vec h\|}$$
 * $$=\frac{\left\|\displaystyle\sum_{j=1}^n f_j(x_0+\vec h)\vec{e}_j-\sum_{j=1}^n\left(f_j(x_0)+\sum_{k=1}^m h_k\frac{\partial f_j}{\partial x_k}(x_0)\right)\vec{e}_j\right\|}{\|\vec h\|}$$
 * $$\le\sum_{j=1}^n\frac{\left|f_j(x_0+\vec h)-\left(f_j(x_0)+\displaystyle\sum_{k=1}^m h_k\frac{\partial f_j}{\partial x_k}(x_0)\right)\right|}{\|\vec h\|}$$

We shall now prove that all summands of the last sum go to 0.

Indeed, let $$j\in\{1,\ldots,n\}$$. Writing again $$\vec h=(h_1,\ldots,h_m)$$, we obtain by the one-dimensional mean value theorem, first applied in the first variable, then in the second and so on, the succession of equations
 * $$f_j(x_0+h_1\vec{e}_1)-f_j(x_0)=\overbrace{(x_{0,1}+h_1-x_{0,1})}^{=h_1}\frac{\partial f_j}{\partial x_1}(x_0+t_1\vec{e}_1)$$
 * $$f_j(x_0+h_1\vec{e}_1+h_2\vec{e}_2)-f_j(x_0+h_1\vec{e}_1)=\overbrace{(x_{0,2}+h_2-x_{0,2})}^{=h_2}\frac{\partial f_j}{\partial x_2}(x_0+h_1\vec{e}_1+t_2 \vec{e}_2)$$
 * $$\vdots$$
 * $$f_j(x_0+h_1\vec{e}_1+\cdots+h_m\vec{e}_m)-f_j(x_0+h_1\vec{e}_1+\cdots+h_{m-1}\vec{e}_{m-1})=\overbrace{(x_{0,m}+h_m-x_{0,m})}^{=h_m}\frac{\partial f_j}{\partial x_m}(x_0+h_1\vec{e}_1+\cdots+h_{m-1}\vec{e}_{m-1}+t_m\vec{e}_m)$$

for suitably chosen $$t_k$$ between $$0$$ and $$h_k$$. We can now sum all these equations (the left-hand sides telescope) to obtain
 * $$f_j(x_0+\vec h)-f_j(x_0)=\sum_{k=1}^m h_k\frac{\partial f_j}{\partial x_k}\left(x_0+\sum_{l=1}^{k-1}h_l\vec{e}_l+t_k\vec{e}_k\right)$$

Let now $$\epsilon>0$$. Using the continuity of the $$\frac{\partial f_j}{\partial x_k}$$ on $$B_r(x_0)$$, we may choose $$\delta_k>0$$ such that
 * $$\left|\frac{\partial f_j}{\partial x_k}\left(x_0+\sum_{l=1}^{k-1}h_l\vec{e}_l+t_k\vec{e}_k\right)-\frac{\partial f_j}{\partial x_k}(x_0)\right|<\frac{\epsilon}{m}$$

for $$|h_k|<\delta_k$$, given that $$\vec h\in B_r(0)$$ (which we may assume as $$\vec h\to\vec0$$). Hence, we obtain
 * $$\frac{\left|f_j(x_0+\vec h)-\left(f_j(x_0)+\displaystyle\sum_{k=1}^m h_k\frac{\partial f_j}{\partial x_k}(x_0)\right)\right|}{\|\vec h\|}=\frac{\left|\displaystyle\sum_{k=1}^m h_k\left(\frac{\partial f_j}{\partial x_k}\left(x_0+\sum_{l=1}^{k-1}h_l\vec{e}_l+t_k\vec{e}_k\right)-\frac{\partial f_j}{\partial x_k}(x_0)\right)\right|}{\|\vec h\|}\le\frac{\|\vec h\|\cdot m\cdot\frac{\epsilon}{m}}{\|\vec h\|}=\epsilon$$

and thus the theorem follows.
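In practice the Jacobian matrix is often approximated by replacing each partial derivative with a difference quotient. A minimal sketch (the helper `jacobian_fd` and the example function are our own illustrations, not part of the theorem):

```python
import numpy as np

def jacobian_fd(f, x0, h=1e-6):
    """Approximate J_f(x0) column by column: column j is the
    difference quotient of f in direction e_j."""
    x0 = np.asarray(x0, dtype=float)
    cols = []
    for j in range(x0.size):
        e = np.zeros_like(x0)
        e[j] = 1.0
        cols.append((f(x0 + h * e) - f(x0)) / h)
    return np.column_stack(cols)

# Example: f(x, y) = (x^2 * y, 5x + sin(y)); the exact Jacobian is
# [[2xy, x^2], [5, cos(y)]], i.e. [[4, 1], [5, cos(2)]] at (1, 2).
f = lambda p: np.array([p[0]**2 * p[1], 5 * p[0] + np.sin(p[1])])
print(jacobian_fd(f, [1.0, 2.0]))
```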

If $$f:\R^m\to\R^n$$ is continuously differentiable at $$x_0\in\R^m$$ and $$\vec v\in\R^m\setminus\{0\}$$, then
 * Corollary
 * $$D_{\vec v}f(x_0)=\sum_{j=1}^m v_j\frac{\partial f}{\partial x_j}(x_0)$$


 * Proof
 * $$D_{\vec v}f(x_0)=f'(x_0)(\vec v)=J_f(x_0)\vec v=\sum_{j=1}^m v_j\frac{\partial f}{\partial x_j}(x_0)$$
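For a scalar-valued $$f$$ this is simply the dot product of the vector of partial derivatives (the gradient) with $$\vec v$$. A quick worked check with the illustrative choices $$f(x,y)=xy^2$$, $$x_0=(2,1)$$, $$\vec v=(1,3)$$:

```python
import numpy as np

# Partial derivatives of f(x, y) = x*y^2 are (y^2, 2xy),
# so at x0 = (2, 1) the gradient is (1, 4).
grad = np.array([1.0, 4.0])
v = np.array([1.0, 3.0])
print(grad @ v)  # D_v f(x0) = 1*1 + 4*3 = 13
```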