.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-inverse_function_theorem:

==========================================
Using Inverse & Implicit Function Theorems
==========================================

Until now we have considered methods for computing derivatives that
work directly on the function being differentiated. However, this is
not always possible. For example, the function may only be computable
via an iterative algorithm, or there may be no explicit definition of
the function available. In this section we will see how two basic
results from calculus can be used to get around these difficulties.


Inverse Function Theorem
========================

Suppose we wish to evaluate the derivative of a function :math:`f(x)`,
but evaluating :math:`f(x)` is not easy. Say it involves running an
iterative algorithm. You could try automatically differentiating the
iterative algorithm, but even if that is possible, it can become quite
expensive.

In some cases we get lucky, and computing the inverse of :math:`f(x)`
is an easy operation. In these cases, we can use the `Inverse Function
Theorem <http://en.wikipedia.org/wiki/Inverse_function_theorem>`_ to
compute the derivative exactly. Here is the key idea:

Assuming that :math:`y=f(x)` is continuously differentiable in a
neighborhood of a point :math:`x` and :math:`Df(x)` is the invertible
Jacobian of :math:`f` at :math:`x`, then by applying the chain rule to
the identity :math:`f^{-1}(f(x)) = x`, we have
:math:`Df^{-1}(f(x))Df(x) = I`, or :math:`Df^{-1}(y) = (Df(x))^{-1}`,
i.e., the Jacobian of :math:`f^{-1}` is the inverse of the Jacobian of
:math:`f`, or :math:`Df(x) = (Df^{-1}(y))^{-1}`.

For example, let :math:`f(x) = e^x`. Now of course we know that
:math:`Df(x) = e^x`, but let's try and compute it via the Inverse
Function Theorem. For :math:`y > 0`, we have :math:`f^{-1}(y) = \log
y`, so :math:`Df^{-1}(y) = \frac{1}{y}`, and therefore :math:`Df(x) =
(Df^{-1}(y))^{-1} = y = e^x`.
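
As a quick numeric sanity check, here is a minimal standalone C++
sketch (our own illustration, not part of Ceres) that recovers
:math:`Df(x)` from the derivative of the inverse:

.. code-block:: c++

   #include <cmath>
   #include <cstdio>

   int main() {
     const double x = 1.5;
     const double y = std::exp(x);   // y = f(x)
     const double df_inv = 1.0 / y;  // Df^{-1}(y) = 1/y
     // Inverse Function Theorem: Df(x) = (Df^{-1}(y))^{-1} = y.
     std::printf("via theorem: %.12f  exact e^x: %.12f\n",
                 1.0 / df_inv, std::exp(x));
     return 0;
   }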

You may be wondering why the above is true. A continuously
differentiable function is well approximated in a small neighborhood
by a linear function. Indeed, this is a good way to think about the
Jacobian: it is the matrix that best approximates the function
linearly. Once you adopt that view, it is straightforward to see that
*locally* :math:`f^{-1}(y)` is best approximated linearly by the
inverse of the Jacobian of :math:`f(x)`.

Let us now consider a more practical example.

Geodetic Coordinate System Conversion
-------------------------------------

When working with data related to the Earth, one of two coordinate
systems is typically used: the familiar Latitude-Longitude-Altitude
(LLA) coordinate system of (latitude, longitude, height) tuples, or
the `ECEF <http://en.wikipedia.org/wiki/ECEF>`_ (Earth-Centered,
Earth-Fixed) coordinate system. The former is familiar but not
terribly convenient analytically. The latter is a Cartesian system
but not particularly intuitive. So systems that process Earth-related
data have to go back and forth between these coordinate systems.

The conversion between the LLA and the ECEF coordinate system requires
a model of the Earth, the most commonly used one being `WGS84
<https://en.wikipedia.org/wiki/World_Geodetic_System#1984_version>`_.

Going from the geodetic coordinates :math:`(\phi,\lambda,h)` to the
ECEF coordinates :math:`(x,y,z)` is easy.

.. math::

   \chi &= \sqrt{1 - e^2 \sin^2 \phi}

   X &= \left( \frac{a}{\chi} + h \right) \cos \phi \cos \lambda

   Y &= \left( \frac{a}{\chi} + h \right) \cos \phi \sin \lambda

   Z &= \left(\frac{a(1-e^2)}{\chi}  +h \right) \sin \phi

Here :math:`a` and :math:`e^2` are constants defined by `WGS84
<https://en.wikipedia.org/wiki/World_Geodetic_System#1984_version>`_.
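
For concreteness, here is a minimal sketch of this forward transform
using Eigen (our own illustrative code, not part of Ceres), assuming
angles in radians and lengths in meters:

.. code-block:: c++

   #include <cmath>

   #include "Eigen/Dense"

   // WGS84 constants: semi-major axis a and first eccentricity
   // squared e^2.
   constexpr double kA = 6378137.0;
   constexpr double kE2 = 6.69437999014e-3;

   // Illustrative sketch: geodetic (phi, lambda, h) -> ECEF
   // (x, y, z), per the formulas above.
   Eigen::Vector3d LLAToECEF(const Eigen::Vector3d& lla) {
     const double phi = lla[0], lambda = lla[1], h = lla[2];
     const double chi =
         std::sqrt(1.0 - kE2 * std::sin(phi) * std::sin(phi));
     return Eigen::Vector3d(
         (kA / chi + h) * std::cos(phi) * std::cos(lambda),
         (kA / chi + h) * std::cos(phi) * std::sin(lambda),
         (kA * (1.0 - kE2) / chi + h) * std::sin(phi));
   }

Differentiating this closed form expression, by hand or with
automatic differentiation, yields the ``LLAToECEFJacobian`` used
below.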

Going from ECEF to LLA coordinates requires an iterative algorithm. So
to compute the derivative of this transformation we invoke the
Inverse Function Theorem as follows:

.. code-block:: c++

   Eigen::Vector3d ecef; // Fill some values
   // Iterative computation.
   Eigen::Vector3d lla = ECEFToLLA(ecef);
   // Analytic derivative of the forward (LLA -> ECEF) transform.
   Eigen::Matrix3d lla_to_ecef_jacobian = LLAToECEFJacobian(lla);
   bool invertible;
   Eigen::Matrix3d ecef_to_lla_jacobian;
   // By the Inverse Function Theorem, the ECEF -> LLA Jacobian is the
   // inverse of the LLA -> ECEF Jacobian, when the latter is invertible.
   lla_to_ecef_jacobian.computeInverseWithCheck(ecef_to_lla_jacobian, invertible);


Implicit Function Theorem
=========================

Consider now the problem where we have two variables :math:`x \in
\mathbb{R}^m` and :math:`y \in \mathbb{R}^n` and a function
:math:`F:\mathbb{R}^m \times \mathbb{R}^n \rightarrow \mathbb{R}^n`
such that :math:`F(x,y) = 0`, and we wish to calculate the Jacobian
of :math:`y` with respect to :math:`x`. How do we do this?

If for a given value of :math:`(x,y)` the partial Jacobian
:math:`D_2F(x,y)` is full rank, then the `Implicit Function Theorem
<https://en.wikipedia.org/wiki/Implicit_function_theorem>`_ tells us
that there exists a neighborhood of :math:`x` and a function :math:`G`
such that :math:`y = G(x)` in this neighborhood. Differentiating
:math:`F(x,G(x)) = 0` gives us

.. math::

   D_1F(x,y) + D_2F(x,y)DG(x) &= 0

                        DG(x) &= -(D_2F(x,y))^{-1} D_1 F(x,y)

                        D y(x) &= -(D_2F(x,y))^{-1} D_1 F(x,y)

This means that we can compute the derivative of :math:`y` with
respect to :math:`x` by multiplying the negative of the inverse of
the Jacobian of :math:`F` w.r.t :math:`y` with the Jacobian of
:math:`F` w.r.t :math:`x`.
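
As a minimal sanity check of this formula, take the unit circle
:math:`F(x, y) = x^2 + y^2 - 1 = 0`, which implicitly defines
:math:`y = G(x)` wherever :math:`y \neq 0`. A standalone C++ sketch
(our own illustration, not Ceres API):

.. code-block:: c++

   #include <cmath>
   #include <cstdio>

   int main() {
     const double x = 0.6;
     const double y = std::sqrt(1.0 - x * x);  // A point on the circle.
     const double d1f = 2.0 * x;               // D_1 F(x, y)
     const double d2f = 2.0 * y;               // D_2 F(x, y)
     const double dg = -d1f / d2f;             // DG(x) = -(D_2 F)^{-1} D_1 F
     // Compare with differentiating y = sqrt(1 - x^2) directly.
     std::printf("implicit: %.12f  direct: %.12f\n",
                 dg, -x / std::sqrt(1.0 - x * x));
     return 0;
   }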

Let's now consider two more substantial examples.

Roots of a Polynomial
---------------------

The first example we consider is a classic. Let :math:`p(x) = a_0 +
a_1 x + \dots + a_n x^n` be a degree :math:`n` polynomial, and suppose
we wish to compute the derivative of its roots with respect to its
coefficients. `Abel
<https://en.wikipedia.org/wiki/Niels_Henrik_Abel>`_ and `Galois
<https://en.wikipedia.org/wiki/%C3%89variste_Galois>`_ proved that
there is no closed form formula for the roots of a general polynomial
of degree five or higher. There are numerical algorithms, like
computing the eigenvalues of the `Companion Matrix
<https://nhigham.com/2021/03/23/what-is-a-companion-matrix/>`_, but
differentiating an eigenvalue solver does not seem like fun. The
Implicit Function Theorem, however, offers us a simple path.

If :math:`x` is a root of :math:`p(x)`, then :math:`F(\mathbf{a}, x) =
a_0 + a_1 x + \dots + a_n x^n = 0`. So,

.. math::

   D_1 F(\mathbf{a}, x) &= [1, x, x^2, \dots, x^n]

   D_2 F(\mathbf{a}, x) &= \sum_{k=1}^n k a_k x^{k-1} = Dp(x)

   Dx(\mathbf{a}) &= \frac{-1}{Dp(x)} [1, x, x^2, \dots, x^n]
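
In code, this gradient is cheap to evaluate. Below is a minimal
Eigen-based sketch (``RootJacobian`` is our own illustrative helper,
not a Ceres function), assuming :math:`x` is a simple root so that
:math:`Dp(x) \neq 0`:

.. code-block:: c++

   #include "Eigen/Dense"

   // Illustrative helper: gradient of a simple root x of p w.r.t. the
   // coefficients a_0, ..., a_n: Dx(a) = -[1, x, ..., x^n] / Dp(x).
   Eigen::VectorXd RootJacobian(const Eigen::VectorXd& a, double x) {
     const int n = static_cast<int>(a.size()) - 1;  // Degree of p.
     Eigen::VectorXd powers(n + 1);                 // [1, x, ..., x^n]
     powers[0] = 1.0;
     for (int k = 1; k <= n; ++k) powers[k] = powers[k - 1] * x;
     double dp = 0.0;                               // Dp(x)
     for (int k = 1; k <= n; ++k) dp += k * a[k] * powers[k - 1];
     return -powers / dp;
   }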

Differentiating the Solution to an Optimization Problem
-------------------------------------------------------

Sometimes we are required to solve optimization problems inside
optimization problems, and this requires computing the derivative of
the optimal solution (or a fixed point) of an optimization problem
w.r.t its parameters.

Let :math:`\theta \in \mathbb{R}^m` be a vector, let :math:`A(\theta)
\in \mathbb{R}^{k\times n}` be a matrix whose entries are a function
of :math:`\theta` with :math:`k \ge n`, and let :math:`b \in
\mathbb{R}^k` be a constant vector. Consider the linear least squares
problem:

.. math::

   x^* = \arg \min_x \|A(\theta) x - b\|_2^2

How do we compute :math:`D_\theta x^*(\theta)`?

One approach would be to observe that :math:`x^*(\theta) =
(A^\top(\theta)A(\theta))^{-1}A^\top(\theta)b` and then differentiate
this w.r.t :math:`\theta`. But this would require differentiating
through the inverse of the matrix
:math:`(A^\top(\theta)A(\theta))^{-1}`. Not exactly easy. Let's use
the Implicit Function Theorem instead.

The first step is to observe that :math:`x^*` satisfies the so-called
*normal equations*:

.. math::

   A^\top(\theta)A(\theta)x^* - A^\top(\theta)b = 0

We will compute :math:`D_\theta x^*` column-wise, treating
:math:`A(\theta)` as a function of one coordinate (:math:`\theta_i`)
of :math:`\theta` at a time. So using the normal equations, let's
define :math:`F(\theta_i, x^*) = A^\top(\theta_i)A(\theta_i)x^* -
A^\top(\theta_i)b = 0`, from which we can now compute:

.. math::

   D_1F(\theta_i, x^*) &= D_{\theta_i}A^\top Ax^* + A^\top
   D_{\theta_i}Ax^* - D_{\theta_i} A^\top b = g_i

   D_2F(\theta_i, x^*) &= A^\top A

   Dx^*(\theta_i) & = -(A^\top A)^{-1} g_i

   Dx^*(\theta) & = -(A^\top A )^{-1} \left[g_1, \dots, g_m\right]

Observe that to compute :math:`Dx^*(\theta)` we only need the inverse
of :math:`A^\top A`, which we had to compute anyway when computing
:math:`x^*`.
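
Putting the whole derivation into code, here is a minimal Eigen-based
sketch (``SolutionJacobian`` and its arguments are our own
illustrative names), which takes the per-coordinate derivatives
:math:`D_{\theta_i}A` as input:

.. code-block:: c++

   #include <vector>

   #include "Eigen/Dense"

   // Illustrative sketch: Dx*(theta) = -(A^T A)^{-1} [g_1, ..., g_m],
   // where g_i = D_i A^T (A x* - b) + A^T D_i A x*.
   Eigen::MatrixXd SolutionJacobian(const Eigen::MatrixXd& A,
                                    const Eigen::VectorXd& b,
                                    const Eigen::VectorXd& x_star,
                                    const std::vector<Eigen::MatrixXd>& dA) {
     const Eigen::Index n = A.cols();
     const Eigen::Index m = static_cast<Eigen::Index>(dA.size());
     Eigen::MatrixXd G(n, m);
     for (Eigen::Index i = 0; i < m; ++i) {
       G.col(i) = dA[i].transpose() * (A * x_star - b) +
                  A.transpose() * (dA[i] * x_star);
     }
     // Solve (A^T A) Dx* = -G with a Cholesky factorization instead
     // of forming the explicit inverse; the factorization is already
     // available if x* was computed via the normal equations.
     return (A.transpose() * A).ldlt().solve(-G);
   }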