Calculus/Inverse function theorem, implicit function theorem

In this chapter, we want to prove the inverse function theorem (which asserts that if a function has invertible differential at a point, then it is locally invertible itself) and the implicit function theorem (which asserts that certain sets are the graphs of functions).

Banach's fixed point theorem
Theorem:

Let $$(M, d)$$ be a non-empty complete metric space, and let $$f: M \to M$$ be a strict contraction; that is, there exists a constant $$0 \le \lambda < 1$$ such that
 * $$\forall m, n \in M: d(f(m), f(n)) \le \lambda d(m, n)$$.

Then $$f$$ has a unique fixed point, which means that there is a unique $$x \in M$$ such that $$f(x) = x$$. Furthermore, if we start with a completely arbitrary point $$y \in M$$, then the sequence
 * $$y, f(y), f(f(y)), f(f(f(y))), \ldots$$

converges to $$x$$.

Proof:

First, we prove uniqueness of the fixed point. Assume $$x, y$$ are both fixed points. Then
 * $$d(x, y) = d(f(x), f(y)) \le \lambda d(x, y) \Rightarrow (1 - \lambda) d(x, y) \le 0$$.

Since $$0 \le \lambda < 1$$ and $$d(x, y) \ge 0$$, this implies $$d(x, y) = 0$$ and hence $$x = y$$.

Now we prove existence and, simultaneously, the claim about the convergence of the sequence $$y, f(y), f(f(y)), f(f(f(y))), \ldots$$. To fix notation, we set $$z_0 := y$$ and, if $$z_n$$ is already defined, $$z_{n+1} := f(z_n)$$. Then the sequence $$(z_n)_{n \in \mathbb N}$$ is nothing else but the sequence $$y, f(y), f(f(y)), f(f(f(y))), \ldots$$.

Let $$n \ge 0$$. We claim that
 * $$d(z_{n+1}, z_n) \le \lambda^n d(z_1, z_0)$$.

Indeed, this follows by induction on $$n$$. The case $$n=0$$ is trivial, and if the claim is true for $$n$$, then $$d(z_{n+2}, z_{n+1}) = d(f(z_{n+1}), f(z_n)) \le \lambda d(z_{n+1}, z_n) \le \lambda \cdot \lambda^n d(z_1, z_0)$$.

Hence, by the triangle inequality,
 * $$\begin{align}
d(z_{n + m}, z_n) & \le \sum_{j=n+1}^{n+m} d(z_j, z_{j-1}) \\
& \le \sum_{j=n+1}^{n+m} \lambda^{j-1} d(z_1, z_0) \\
& \le \sum_{j=n+1}^\infty \lambda^{j-1} d(z_1, z_0) \\
& = d(z_1, z_0) \lambda^n \frac{1}{1 - \lambda}
\end{align}$$.

The latter expression goes to zero as $$n \to \infty$$ and hence we are dealing with a Cauchy sequence. As we are in a complete metric space, it converges to a limit $$x$$. This limit further is a fixed point, as the continuity of $$f$$ ($$f$$ is Lipschitz continuous with constant $$\lambda$$) implies
 * $$x = \lim_{n \to \infty} z_n = \lim_{n \to \infty} f(z_{n-1}) = f(\lim_{n \to \infty} z_{n-1}) = f(x)$$.
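
The iteration from the proof can be tried numerically. The following is a minimal Python sketch, not part of the original argument, under the assumption of a concrete example that does not appear in the text: $$M = [0, 1]$$ with the usual distance and $$f = \cos$$, which maps $$[0, 1]$$ into itself and has Lipschitz constant $$\lambda = \sin(1) < 1$$ there. The helper name `iterate` is hypothetical. The sketch checks the step estimate $$d(z_{n+1}, z_n) \le \lambda^n d(z_1, z_0)$$ and the bound $$d(z_n, x) \le \frac{\lambda^n}{1 - \lambda} d(z_1, z_0)$$, which results from letting $$m \to \infty$$ in the Cauchy estimate.

```python
import math


def iterate(f, y, steps):
    """Return [z_0, ..., z_steps] with z_0 = y and z_{n+1} = f(z_n)."""
    zs = [y]
    for _ in range(steps):
        zs.append(f(zs[-1]))
    return zs


# Assumed example (not from the text): f = cos on M = [0, 1].
# On [0, 1] we have |cos'| = |sin| <= sin(1) < 1, so cos is a strict contraction.
lam = math.sin(1.0)
zs = iterate(math.cos, 0.0, 100)
x = zs[-1]  # numerical stand-in for the fixed point

d10 = abs(zs[1] - zs[0])  # d(z_1, z_0)

# Step estimate from the induction: d(z_{n+1}, z_n) <= lam^n * d(z_1, z_0).
step_ok = all(
    abs(zs[n + 1] - zs[n]) <= lam**n * d10 + 1e-15 for n in range(50)
)

# Limit of the Cauchy estimate: d(z_n, x) <= lam^n / (1 - lam) * d(z_1, z_0).
bound_ok = all(
    abs(zs[n] - x) <= lam**n / (1 - lam) * d10 + 1e-12 for n in range(50)
)

print(x, abs(math.cos(x) - x), step_ok, bound_ok)
```

The small additive tolerances only absorb floating-point rounding; they are not part of the estimates themselves.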

A corollary to this important result is the following lemma, which shall be the main ingredient for the proof of the inverse function theorem:

Lemma:

Let $$g: \overline{B_r(0)} \to \overline{B_r(0)}$$ ($$\overline{B_r(0)} \subset \mathbb R^n$$ denoting the closed ball of radius $$r$$) be a function which is Lipschitz continuous with Lipschitz constant less than or equal to $$1/2$$ such that $$g(0) = 0$$. Then the function
 * $$f: \overline{B_r(0)} \to \mathbb R^n, f(x) := g(x) + x$$

is injective and $$B_{r/2}(0) \subseteq f(B_r(0))$$.

Proof:

First, we note that for $$y \in B_{r/2}(0)$$ the function
 * $$h: \overline{B_r(0)} \to \mathbb R^n, h(z) := y - g(z)$$

is a strict contraction; this is due to
 * $$\|y - g(z) - (y - g(z'))\| = \|g(z') - g(z)\| \le \frac{1}{2}\|z - z'\|$$.

Furthermore, it maps $$\overline{B_r(0)}$$ to itself, since for $$z \in \overline{B_r(0)}$$
 * $$\|y - g(z)\| \le \|y\| + \|g(z) - g(0)\| \le \frac{r}{2} + \frac{1}{2}\|z\| \le r$$.

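Hence, the Banach fixed-point theorem is applicable to $$h$$ (the closed ball $$\overline{B_r(0)}$$ is complete as a closed subset of $$\mathbb R^n$$). Now $$x$$ being a fixed point of $$h$$ is equivalent to
 * $$f(x) = y$$,

and thus $$B_{r/2}(0) \subseteq f(B_r(0))$$ follows from the existence of fixed points; here the fixed point $$x$$ indeed lies in the open ball $$B_r(0)$$, since $$\|x\| = \|y - g(x)\| \le \|y\| + \frac{1}{2}\|x\|$$ gives $$\|x\| \le 2\|y\| < r$$. Furthermore, if $$f(x) = f(x')$$, then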
 * $$\frac{1}{2} \|x - x'\| \ge \|g(x) - g(x')\| = \|f(x) - x - (f(x') - x')\| = \|x - x'\|$$

and hence $$\|x - x'\| = 0$$, that is, $$x = x'$$. This proves injectivity.
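
The proof is constructive: a preimage of $$y$$ under $$f$$ is obtained by iterating $$h(z) = y - g(z)$$. The following Python sketch illustrates this under assumptions that are not in the text: $$n = 2$$, $$r = 1$$, and the hypothetical map $$g(x_1, x_2) = \frac{1}{4}(\sin x_2, \sin x_1)$$, which satisfies $$g(0) = 0$$ and has Lipschitz constant at most $$1/4 \le 1/2$$. The names `solve` and `norm` are illustrative helpers.

```python
import math


def g(x):
    # Assumed example on R^2: g(0) = 0 and Lipschitz constant at most 1/4.
    return (0.25 * math.sin(x[1]), 0.25 * math.sin(x[0]))


def f(x):
    # f(x) = g(x) + x, as in the lemma.
    gx = g(x)
    return (gx[0] + x[0], gx[1] + x[1])


def norm(x):
    # Euclidean norm on R^2.
    return math.hypot(x[0], x[1])


def solve(y, steps=60):
    """Approximate the x with f(x) = y by iterating h(z) = y - g(z) from z = 0."""
    z = (0.0, 0.0)
    for _ in range(steps):
        gz = g(z)
        z = (y[0] - gz[0], y[1] - gz[1])
    return z


r = 1.0
y = (0.3, -0.35)  # norm(y) is about 0.461, which is below r / 2
x = solve(y)
fx = f(x)
residual = norm((fx[0] - y[0], fx[1] - y[1]))
print(x, norm(x) < r, residual)
```

Any starting point in the closed ball would do; the origin is chosen here only for convenience.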

The inverse function theorem
Theorem:

Let $$f: \mathbb R^n \to \mathbb R^n$$ be a function which is continuously differentiable in a neighbourhood of $$x_0 \in \mathbb R^n$$ such that $$f'(x_0)$$ is invertible. Then there exists an open set $$U \subseteq \mathbb R^n$$ with $$x_0 \in U$$ such that $$f|_U$$ is a bijective function with an inverse $$f^{-1} : f(U) \to U$$ which is differentiable at $$f(x_0)$$ and satisfies
 * $$(f^{-1})'(f(x_0)) = (f'(x_0))^{-1}$$.

Proof:

We first reduce to the case $$f(x_0) = 0$$, $$x_0 = 0$$ and $$f'(x_0) = \text{Id}$$. Indeed, suppose the theorem holds for all such functions, and let now $$h$$ be an arbitrary function satisfying the requirements of the theorem (where the differentiability is given at $$x_0$$). We set
 * $$\tilde h(x) := h'(x_0)^{-1}(h(x_0 + x) - h(x_0))$$

and obtain that $$\tilde h$$ is differentiable at $$0$$ with differential $$\text{Id}$$ and $$\tilde h(0) = 0$$; the first property follows since we multiply both the function and the affine-linear approximation by $$h'(x_0)^{-1}$$ and only shift the function, and the second one is seen from inserting $$x = 0$$. Hence, we obtain an inverse of $$\tilde h$$ with its differential at $$\tilde h(0) = 0$$, and if we now set
 * $$h^{-1}(y) := \tilde h^{-1} (h'(x_0)^{-1}(y - h(x_0))) + x_0$$,

it can be seen that $$h^{-1}$$ is an inverse of $$h$$ with all the required properties (which is a bit of a tedious exercise, but involves nothing more than the definitions).

Thus let $$f$$ be a function such that $$f(0) = 0$$, $$f$$ is continuously differentiable in a neighbourhood of $$0$$ and $$f'(0) = \text{Id}$$. We define
 * $$g(x) := f(x) - x$$.

The differential of this function at $$0$$ is zero (since taking the differential is linear and the differential of the function $$x \mapsto x$$ is the identity). Since the function $$g$$ is also continuously differentiable in a small neighbourhood of $$0$$, we find $$\delta > 0$$ such that
 * $$\left| \frac{\partial g_k}{\partial x_j}(x) \right| < \frac{1}{2n^2}$$

for all $$j, k \in \{1, \ldots, n\}$$ and $$x \in B_\delta(0)$$. The general mean-value theorem and Cauchy's inequality imply that for $$k \in \{1, \ldots, n\}$$ and $$x, x' \in B_\delta(0)$$,
 * $$|g_k(x) - g_k(x')| = |\langle x - x', \nabla g_k(x' + t_k (x - x')) \rangle| \le \|x - x'\| n \frac{1}{2n^2}$$

for suitable $$t_k \in [0, 1]$$ (the ball is convex, so the segment from $$x'$$ to $$x$$ stays inside it). Hence,
 * $$\|g(x) - g(x')\| \le |g_1(x) - g_1(x')| + \cdots + |g_n(x) - g_n(x')| \le \frac{1}{2} \|x - x'\|$$ (triangle inequality),

and in particular, since $$g(0) = f(0) - 0 = 0$$, we have $$\|g(x)\| \le \frac{1}{2} \|x\|$$. Thus our preparatory lemma is applicable on $$\overline{B_r(0)}$$ for any fixed $$0 < r < \delta$$: $$f$$ is injective on $$\overline{B_r(0)}$$, and the image $$f(B_r(0))$$ contains the open ball $$B_{r/2}(0)$$. We may therefore pick $$U := B_r(0) \cap f^{-1}(B_{r/2}(0))$$, which is open due to the continuity of $$f$$ and on which $$f$$ is a bijection onto $$B_{r/2}(0)$$.

Thus, the most important part of the theorem is already done. All that is left to do is to prove differentiability of $$f^{-1}$$ at $$0$$. Now we even prove the slightly stronger claim that the differential of $$f^{-1}$$ at $$0$$ is given by the identity, although this would also follow from the chain rule once differentiability is proven.

Note now that the estimate $$\|g(x)\| \le \frac{1}{2} \|x\|$$ implies the following bounds on $$f$$:
 * $$\frac{1}{2}\|x\| \le \|f(x)\| \le \frac{3}{2} \|x\|$$.

The second bound follows from
 * $$\|f(x)\| \le \|f(x) - x\| + \|x\| = \|g(x)\| + \|x\| \le \frac{3}{2} \|x\|$$,

and the first bound follows from
 * $$\|f(x)\| \ge \left| \|f(x) - x\| - \|x\| \right| = \left| \|g(x)\| - \|x\| \right| \ge \frac{1}{2} \|x\|$$.

Now for the differentiability at $$0$$. We have, by substitution of limits (as $$f$$ is continuous and $$f(0) = 0$$):
 * $$\begin{align}
\lim_{\mathbf h \to 0} \frac{\|f^{-1}(\mathbf h) - f^{-1}(0) - \operatorname{Id} (\mathbf h - 0)\|}{\|\mathbf h\|} & = \lim_{\mathbf h \to 0} \frac{\|f^{-1}(f(\mathbf h)) - f(\mathbf h)\|}{\|f(\mathbf h)\|} \\
& = \lim_{\mathbf h \to 0} \frac{\|\mathbf h - f(\mathbf h)\|}{\|f(\mathbf h)\|},
\end{align}$$

where the last expression converges to zero due to the differentiability of $$f$$ at $$0$$ with differential the identity, and the sandwich criterion applied to the expressions
 * $$\frac{\|\mathbf h - f(\mathbf h)\|}{\frac{3}{2} \|\mathbf h\|}$$

and
 * $$\frac{\|\mathbf h - f(\mathbf h)\|}{\frac{1}{2} \|\mathbf h\|}$$.
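
The formula $$(f^{-1})'(f(x_0)) = (f'(x_0))^{-1}$$ can be checked numerically in one dimension. The following Python sketch is only an illustration under assumptions that are not in the text: the example function $$f(x) = x^3 - x$$ (which is not globally injective, since $$f(-1) = f(0) = f(1) = 0$$) and the point $$x_0 = 1$$ with $$f'(x_0) = 2$$. The local inverse is computed with Newton's method started at $$x_0$$, which is a convenient stand-in and not the contraction argument used in the proof; the name `local_inverse` is hypothetical.

```python
def f(x):
    # Assumed example: not injective on all of R, but locally invertible near 1.
    return x**3 - x


def df(x):
    # Derivative of f.
    return 3 * x**2 - 1


def local_inverse(y, x0=1.0, steps=30):
    """Solve f(x) = y by Newton's method started at x0 (assumes y is near f(x0))."""
    x = x0
    for _ in range(steps):
        x -= (f(x) - y) / df(x)
    return x


x0 = 1.0
y0 = f(x0)
eps = 1e-5

# Central difference quotient of the local inverse at y0 = f(x0).
slope = (local_inverse(y0 + eps) - local_inverse(y0 - eps)) / (2 * eps)

# The theorem predicts that slope is close to 1 / f'(x0).
print(slope, 1 / df(x0))
```

The difference quotient only approximates the derivative, so agreement is expected up to a small discretisation and rounding error.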

The implicit function theorem
Theorem:

Let $$f: \mathbb R^n \to \mathbb R$$ be a continuously differentiable function, and consider the set
 * $$S := \{(x_1, \ldots, x_n) \in \mathbb R^n | f(x_1, \ldots, x_n) = 0\}$$.

If we are given some $$y \in S$$ such that $$\partial_n f(y) \neq 0$$, then we find $$U \subseteq \mathbb R^{n-1}$$ open with $$(y_1, \ldots, y_{n-1}) \in U$$ and $$g: U \to \mathbb R$$ such that
 * $$y_n = g(y_1, \ldots, y_{n-1})$$ and $$\{(z_1, \ldots, z_{n-1}, g(z_1, \ldots, z_{n-1})) | (z_1, \ldots, z_{n-1}) \in U\} \subseteq S$$,

where $$\{(z_1, \ldots, z_{n-1}, g(z_1, \ldots, z_{n-1})) | (z_1, \ldots, z_{n-1}) \in U\}$$ is open with respect to the subspace topology of $$S$$.

Furthermore, $$g$$ is a differentiable function.

Proof:

We define a new function
 * $$F: \mathbb R^n \to \mathbb R^n, F(x_1, \ldots, x_n) := (x_1, \ldots, x_{n-1}, f(x_1, \ldots, x_n))$$.

The differential of this function looks like this:
 * $$F'(x) = \begin{pmatrix}
1 & 0 & \cdots & & 0 \\
0 & 1 & & & \vdots \\
\vdots & & \ddots & & \\
0 & \cdots & 0 & 1 & 0 \\
\partial_1 f(x) & & \cdots & & \partial_n f(x)
\end{pmatrix}$$

Since we assumed that $$\partial_n f(y) \neq 0$$, $$F'(y)$$ is invertible, and hence the inverse function theorem implies the existence of a small open neighbourhood $$\tilde V \subseteq \mathbb R^n$$ containing $$y$$ such that restricted to that neighbourhood $$F$$ is itself invertible, with a differentiable inverse $$F^{-1}$$, which is itself defined on an open set $$\tilde U$$ containing $$F(y)$$. Now set first
 * $$U := \{(x_1, \ldots, x_{n-1}) | (x_1, \ldots, x_{n-1}, 0) \in \tilde U \}$$,

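which is open in $$\mathbb R^{n-1}$$ (being the preimage of the open set $$\tilde U$$ under the continuous map $$(x_1, \ldots, x_{n-1}) \mapsto (x_1, \ldots, x_{n-1}, 0)$$) and contains $$(y_1, \ldots, y_{n-1})$$ because $$F(y) = (y_1, \ldots, y_{n-1}, 0) \in \tilde U$$, and then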
 * $$g: U \to \mathbb R, g(x_1, \ldots, x_{n-1}) := \pi_n(F^{-1}(x_1, \ldots, x_{n-1}, 0))$$,

the $$n$$-th component of $$F^{-1}(x_1, \ldots, x_{n-1}, 0)$$. We claim that $$g$$ has the desired properties.

Indeed, we first note that $$F^{-1}(x_1, \ldots, x_{n-1}, 0) = (x_1, \ldots, x_{n-1}, g(x_1, \ldots, x_{n-1}))$$, since applying $$F$$ leaves the first $$n-1$$ components unchanged, and thus we get the identity by observing $$F(F^{-1}(x)) = x$$. Let thus $$(z_1, \ldots, z_{n-1}) \in U$$. Then
 * $$\begin{align}
f(z_1, \ldots, z_{n-1}, g(z_1, \ldots, z_{n-1})) & = (\pi_n \circ F)(F^{-1}(z_1, \ldots, z_{n-1}, 0)) \\
& = \pi_n((F \circ F^{-1})(z_1, \ldots, z_{n-1}, 0)) = 0
\end{align}$$.

Furthermore, the set
 * $$\{(z_1, \ldots, z_{n-1}, g(z_1, \ldots, z_{n-1})) | (z_1, \ldots, z_{n-1}) \in U\}$$

is open with respect to the subspace topology on $$S$$. Indeed, we show
 * $$\{(z_1, \ldots, z_{n-1}, g(z_1, \ldots, z_{n-1})) | (z_1, \ldots, z_{n-1}) \in U\} = S \cap \tilde V$$.

For $$\subseteq$$, we first note that the set on the left hand side is in $$S$$, since all points in it are mapped to zero by $$f$$. Further,
 * $$F(z_1, \ldots, z_{n-1}, g(z_1, \ldots, z_{n-1})) = (z_1, \ldots, z_{n-1}, 0) \in \tilde U$$

and applying $$F^{-1}$$ shows that the point $$(z_1, \ldots, z_{n-1}, g(z_1, \ldots, z_{n-1}))$$ lies in $$\tilde V$$, which completes $$\subseteq$$. For the other direction, let a point $$(x_1, \ldots, x_n)$$ in $$S \cap \tilde V$$ be given, and apply $$F$$ to get
 * $$F((x_1, \ldots, x_n)) = (x_1, \ldots, x_{n-1}, 0) \in \tilde U$$

and hence $$(x_1, \ldots, x_{n-1}) \in U$$; further
 * $$(x_1, \ldots, x_{n-1}, g(x_1, \ldots, x_{n-1})) = (x_1, \ldots, x_n)$$

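since both sides lie in $$\tilde V$$ and are mapped by $$F$$ to the same point $$(x_1, \ldots, x_{n-1}, 0)$$, while $$F$$ is injective on $$\tilde V$$.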

Now $$g$$ is automatically differentiable as the component of a differentiable function.

Informally, the above theorem states that given a set $$\{x \in \mathbb R^n | f(x) = 0\}$$, one can choose the first $$n-1$$ coordinates as a "base" for a function, whose graph is precisely a local bit of that set.
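
As an illustration of this informal statement, the following Python sketch works through an example that is an assumption and not taken from the text: the unit circle, given by $$f(x_1, x_2) = x_1^2 + x_2^2 - 1$$, near the point $$y = (0.6, 0.8)$$, where $$\partial_2 f(y) = 1.6 \neq 0$$. The function `g` below solves $$f(x_1, x_2) = 0$$ for the last coordinate by Newton's method started at $$y_2$$ (a numerical shortcut, not the construction via $$F^{-1}$$ from the proof) and is compared with the explicit local solution $$\sqrt{1 - x_1^2}$$.

```python
import math


def f(x1, x2):
    # Assumed example: the zero set S of f is the unit circle.
    return x1**2 + x2**2 - 1.0


def d2f(x1, x2):
    # Partial derivative of f with respect to the last coordinate x2.
    return 2.0 * x2


def g(x1, start=0.8, steps=30):
    """Solve f(x1, x2) = 0 for x2 by Newton's method in the last coordinate."""
    x2 = start
    for _ in range(steps):
        x2 -= f(x1, x2) / d2f(x1, x2)
    return x2


y = (0.6, 0.8)  # a point of S with d2f(y) = 1.6, which is nonzero

# Near y_1 = 0.6 the graph of g should trace out the upper arc of the circle.
for x1 in (0.5, 0.6, 0.7):
    print(x1, g(x1), math.sqrt(1.0 - x1**2), f(x1, g(x1)))
```

Near the points $$(\pm 1, 0)$$ the same approach breaks down, since there $$\partial_2 f = 0$$ and the circle is not locally the graph of a function of $$x_1$$.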