File: automatic_derivatives.rst

.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-automatic_derivatives:

=====================
Automatic Derivatives
=====================

We will now consider automatic differentiation. It is a technique that
can compute exact derivatives, fast, while requiring about the same
effort from the user as is needed to use numerical differentiation.

Don't believe me? Well here goes. The following code fragment
implements an automatically differentiated ``CostFunction`` for `Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_.

.. code-block:: c++

  struct Rat43CostFunctor {
    Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

    template <typename T>
    bool operator()(const T* parameters, T* residuals) const {
      const T b1 = parameters[0];
      const T b2 = parameters[1];
      const T b3 = parameters[2];
      const T b4 = parameters[3];
      residuals[0] = b1 * pow(1.0 + exp(b2 -  b3 * x_), -1.0 / b4) - y_;
      return true;
    }

    private:
      const double x_;
      const double y_;
  };


  CostFunction* cost_function =
        new AutoDiffCostFunction<Rat43CostFunctor, 1, 4>(
          new Rat43CostFunctor(x, y));

Notice that compared to numeric differentiation, the only difference
when defining the functor for use with automatic differentiation is
the signature of the ``operator()``.

In the case of numeric differentiation it was

.. code-block:: c++

   bool operator()(const double* parameters, double* residuals) const;

and for automatic differentiation it is a templated function of the
form

.. code-block:: c++

   template <typename T> bool operator()(const T* parameters, T* residuals) const;


So what does this small change buy us? The following table compares
the time it takes to evaluate the residual and the Jacobian for
`Rat43` using various methods.

==========================   =========
CostFunction                 Time (ns)
==========================   =========
Rat43Analytic                      255
Rat43AnalyticOptimized              92
Rat43NumericDiffForward            262
Rat43NumericDiffCentral            517
Rat43NumericDiffRidders           3760
Rat43AutomaticDiff                 129
==========================   =========

We can get exact derivatives using automatic differentiation
(``Rat43AutomaticDiff``) with about the same effort as is required to
write the code for numeric differentiation, while being only about
:math:`40\%` slower than hand-optimized analytical derivatives.

So how does it work? For this we will have to learn about **Dual
Numbers** and **Jets**.


Dual Numbers & Jets
===================

.. NOTE::

   Reading this and the next section on implementing Jets is not
   necessary to use automatic differentiation in Ceres Solver. But
   knowing the basics of how Jets work is useful when debugging and
   reasoning about the performance of automatic differentiation.

Dual numbers are an extension of the real numbers analogous to complex
numbers: whereas complex numbers augment the reals by introducing an
imaginary unit :math:`\iota` such that :math:`\iota^2 = -1`, dual
numbers introduce an *infinitesimal* unit :math:`\epsilon` such that
:math:`\epsilon^2 = 0`. A dual number :math:`a + v\epsilon` has two
components, the *real* component :math:`a` and the *infinitesimal*
component :math:`v`.

Surprisingly, this simple change leads to a convenient method for
computing exact derivatives without needing to manipulate complicated
symbolic expressions.

For example, consider the function

.. math::

   f(x) = x^2.

Then,

.. math::

   \begin{align}
   f(10 + \epsilon) &= (10 + \epsilon)^2\\
            &= 100 + 20 \epsilon + \epsilon^2\\
            &= 100 + 20 \epsilon
   \end{align}

Observe that the coefficient of :math:`\epsilon` is :math:`Df(10) =
20`. Indeed this generalizes to functions which are not
polynomial. Consider an arbitrary differentiable function
:math:`f(x)`. Then we can evaluate :math:`f(x + \epsilon)` by
considering the Taylor expansion of :math:`f` near :math:`x`, which
gives us the infinite series

.. math::
   \begin{align}
   f(x + \epsilon) &= f(x) + Df(x) \epsilon + D^2f(x)
   \frac{\epsilon^2}{2} + D^3f(x) \frac{\epsilon^3}{6} + \cdots\\
   f(x + \epsilon) &= f(x) + Df(x) \epsilon
   \end{align}

Here we are using the fact that :math:`\epsilon^2 = 0`.
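
To make the idea concrete in code, here is a minimal one-dimensional
dual number; the ``Dual`` type, its members, and the ``Square``
function are illustrative names only and are not part of Ceres, which
uses the more general ``Jet`` type described next. Evaluating
:math:`f(x) = x^2` at :math:`10 + \epsilon` recovers both the value
and the derivative:

.. code-block:: c++

   #include <cstdio>

   // A dual number with a single infinitesimal part, for illustration only.
   struct Dual {
     double a;  // real part
     double v;  // infinitesimal part, i.e., the coefficient of epsilon
   };

   Dual operator*(const Dual& f, const Dual& g) {
     // (f.a + f.v eps) (g.a + g.v eps) = f.a g.a + (f.a g.v + f.v g.a) eps,
     // because eps^2 = 0.
     return {f.a * g.a, f.a * g.v + f.v * g.a};
   }

   Dual Square(const Dual& x) { return x * x; }

   int main() {
     Dual x{10.0, 1.0};   // evaluate at x = 10 + epsilon
     Dual y = Square(x);
     std::printf("f(10) = %g, Df(10) = %g\n", y.a, y.v);  // prints 100 and 20
     return 0;
   }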

A `Jet <https://en.wikipedia.org/wiki/Jet_(mathematics)>`_ is an
:math:`n`-dimensional dual number, where we augment the real numbers
with :math:`n` infinitesimal units :math:`\epsilon_i,\ i=1,...,n` with
the property that :math:`\forall i, j\ :\epsilon_i\epsilon_j = 0`. Then
a Jet consists of a *real* part :math:`a` and an :math:`n`-dimensional
*infinitesimal* part :math:`\mathbf{v}`, i.e.,

.. math::
   x = a + \sum_j v_{j} \epsilon_j

The summation notation gets tedious, so we will also just write

.. math::
   x = a + \mathbf{v},

where the :math:`\epsilon_i`'s are implicit. Then, using the same
Taylor series expansion used above, we can see that:

.. math::

  f(a + \mathbf{v}) = f(a) + Df(a) \mathbf{v}.

Similarly for a multivariate function
:math:`f:\mathbb{R}^{n}\rightarrow \mathbb{R}^m`, evaluated on
:math:`x_i = a_i + \mathbf{v}_i,\ \forall i = 1,...,n`:

.. math::
   f(x_1,..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \mathbf{v}_i

So if each :math:`\mathbf{v}_i = e_i` were the :math:`i^{\text{th}}`
standard basis vector, then the above expression would simplify to

.. math::
   f(x_1,..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \epsilon_i

and we can extract the coordinates of the Jacobian by inspecting the
coefficients of :math:`\epsilon_i`.
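
As a small worked example, take :math:`f(x_1, x_2) = x_1 x_2` and
evaluate it on the Jets :math:`x_1 = a_1 + \epsilon_1` and
:math:`x_2 = a_2 + \epsilon_2`:

.. math::
   f(x_1, x_2) = (a_1 + \epsilon_1)(a_2 + \epsilon_2)
               = a_1 a_2 + a_2 \epsilon_1 + a_1 \epsilon_2,

since the :math:`\epsilon_1 \epsilon_2` term vanishes. Reading off the
coefficients of :math:`\epsilon_1` and :math:`\epsilon_2` gives
:math:`D_1 f = a_2` and :math:`D_2 f = a_1`, which is exactly the
Jacobian of :math:`f`.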

Implementing Jets
-----------------

In order for the above to work in practice, we need the ability to
evaluate an arbitrary function :math:`f` not just on real numbers but
also on dual numbers. However, one does not usually evaluate functions
by evaluating their Taylor expansions.

This is where C++ templates and operator overloading come into
play. The following code fragment has a simple implementation of a
``Jet`` and some operators/functions that operate on them.

.. code-block:: c++

   template<int N> struct Jet {
     // Constructors so that a Jet can be built from a value and an
     // infinitesimal part, as the operators below do.
     Jet() : a(0.0) { v.setZero(); }
     Jet(double a, const Eigen::Matrix<double, 1, N>& v) : a(a), v(v) {}

     double a;
     Eigen::Matrix<double, 1, N> v;
   };

   template<int N> Jet<N> operator+(const Jet<N>& f, const Jet<N>& g) {
     return Jet<N>(f.a + g.a, f.v + g.v);
   }

   template<int N> Jet<N> operator-(const Jet<N>& f, const Jet<N>& g) {
     return Jet<N>(f.a - g.a, f.v - g.v);
   }

   template<int N> Jet<N> operator*(const Jet<N>& f, const Jet<N>& g) {
     return Jet<N>(f.a * g.a, f.a * g.v + f.v * g.a);
   }

   template<int N> Jet<N> operator/(const Jet<N>& f, const Jet<N>& g) {
     return Jet<N>(f.a / g.a, f.v / g.a - f.a * g.v / (g.a * g.a));
   }

   template <int N> Jet<N> exp(const Jet<N>& f) {
     return Jet<N>(exp(f.a), exp(f.a) * f.v);
   }

   // This is a simple implementation for illustration purposes, the
   // actual implementation of pow requires careful handling of a number
   // of corner cases.
   template <int N>  Jet<N> pow(const Jet<N>& f, const Jet<N>& g) {
     return Jet<N>(pow(f.a, g.a),
                   g.a * pow(f.a, g.a - 1.0) * f.v +
                   pow(f.a, g.a) * log(f.a) * g.v);
   }

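As a quick illustration of how these overloads propagate values and
derivatives together (this snippet assumes the simplified ``Jet``
defined above and is not part of the Ceres API):

.. code-block:: c++

   // x and y are the two independent variables, so their infinitesimal
   // parts are the standard basis vectors e_1 and e_2.
   Jet<2> x, y;
   x.a = 3.0;  x.v << 1.0, 0.0;
   y.a = 2.0;  y.v << 0.0, 1.0;

   Jet<2> f = x * y + exp(x);  // f(x, y) = x * y + e^x
   // f.a    = 3 * 2 + e^3               the value of f at (3, 2)
   // f.v[0] = 2 + e^3  = df/dx          the coefficient of epsilon_1
   // f.v[1] = 3        = df/dy          the coefficient of epsilon_2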

With these overloaded functions in hand, we can now call
``Rat43CostFunctor`` with an array of Jets instead of doubles. Putting
that together with appropriately initialized Jets allows us to compute
the Jacobian as follows:

.. code-block:: c++

  class Rat43Automatic : public ceres::SizedCostFunction<1,4> {
   public:
    Rat43Automatic(const Rat43CostFunctor* functor) : functor_(functor) {}
    virtual ~Rat43Automatic() {}
    virtual bool Evaluate(double const* const* parameters,
                          double* residuals,
                          double** jacobians) const {
      // Just evaluate the residuals if Jacobians are not required.
      if (!jacobians) return (*functor_)(parameters[0], residuals);

      // Initialize the Jets
      ceres::Jet<4> jets[4];
      for (int i = 0; i < 4; ++i) {
        jets[i].a = parameters[0][i];
        jets[i].v.setZero();
        jets[i].v[i] = 1.0;
      }

      ceres::Jet<4> result;
      (*functor_)(jets, &result);

      // Copy the values out of the Jet.
      residuals[0] = result.a;
      for (int i = 0; i < 4; ++i) {
        jacobians[0][i] = result.v[i];
      }
      return true;
    }

   private:
    std::unique_ptr<const Rat43CostFunctor> functor_;
  };

Indeed, this is essentially how :class:`AutoDiffCostFunction` works.
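
As a quick usage sketch (not from the Ceres documentation; the data
point and parameter values below are made up for illustration),
evaluating the residual and Jacobian with this cost function looks
like:

.. code-block:: c++

  // Made-up data point and parameter values, purely for illustration.
  const double x = 9.0;
  const double y = 16.08;
  Rat43Automatic cost_function(new Rat43CostFunctor(x, y));

  double b[4] = {700.0, 5.0, 0.75, 1.3};   // b1, b2, b3, b4
  double* parameter_blocks[1] = {b};
  double residual;
  double jacobian[4];                      // d residual / d b_i
  double* jacobian_blocks[1] = {jacobian};

  cost_function.Evaluate(parameter_blocks, &residual, jacobian_blocks);
  // residual holds the value of the Rat43 residual at b, and
  // jacobian[i] holds its partial derivative with respect to b[i],
  // both computed in a single pass through the functor using Jets.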

Pitfalls
========

Automatic differentiation frees the user from the burden of computing
and reasoning about the symbolic expressions for the Jacobians, but
this freedom comes at a cost. For example, consider the following
simple functor:

.. code-block:: c++

   struct Functor {
     template <typename T> bool operator()(const T* x, T* residual) const {
       residual[0] = 1.0 - sqrt(x[0] * x[0] + x[1] * x[1]);
       return true;
     }
   };

Looking at the code for the residual computation, one does not foresee
any problems. However, if we look at the analytical expressions for
the Jacobian:

.. math::

      y &= 1 - \sqrt{x_0^2 + x_1^2}\\
   D_1y &= -\frac{x_0}{\sqrt{x_0^2 + x_1^2}},\
   D_2y = -\frac{x_1}{\sqrt{x_0^2 + x_1^2}}

we find that both partial derivatives take the indeterminate form
:math:`0/0` at :math:`x_0 = 0, x_1 = 0`.

There is no single solution to this problem. In some cases one needs
to reason explicitly about the points where indeterminacy may occur
and use alternate expressions derived via `L'Hopital's rule
<https://en.wikipedia.org/wiki/L'H%C3%B4pital's_rule>`_ (see for
example some of the conversion routines in `rotation.h
<https://github.com/ceres-solver/ceres-solver/blob/master/include/ceres/rotation.h>`_). In
other cases, one may need to regularize the expressions to eliminate
these points.
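
As an illustration of the latter approach, here is one hypothetical way
to guard the functor above. Handling the origin explicitly keeps the
Jet's infinitesimal part finite instead of producing NaNs there; this
is a sketch of the idea, not a recommendation for any particular
problem:

.. code-block:: c++

   struct GuardedFunctor {
     template <typename T> bool operator()(const T* x, T* residual) const {
       const T squared_norm = x[0] * x[0] + x[1] * x[1];
       if (squared_norm > T(0.0)) {
         residual[0] = T(1.0) - sqrt(squared_norm);
       } else {
         // At the origin the residual is exactly 1. We define the
         // derivative to be zero here, which avoids the 0/0 that the
         // unguarded sqrt would produce in the Jacobian.
         residual[0] = T(1.0);
       }
       return true;
     }
   };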