File: conventions.org

package info (click to toggle)
mrcal 2.5.2-4
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 8,548 kB
  • sloc: python: 40,828; ansic: 15,809; cpp: 1,754; perl: 303; makefile: 163; sh: 99; lisp: 84
file content (235 lines) | stat: -rw-r--r-- 11,057 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
#+TITLE: mrcal conventions
#+OPTIONS: toc:t
* Terminology
:PROPERTIES:
:CUSTOM_ID: terminology
:END:

Some terms in the documentation and sources can have ambiguous meanings, so I
explicitly define them here

- *calibration*: the procedure used to compute the lens parameters and geometry
  of a set of cameras; or the result of this procedure. Usually this involves a
  stationary set of cameras observing a moving object.

- [[file:formulation.org::#calibration-object][*calibration object*]] or *chessboard* or *board*: these are used more or less
  interchangeably. They refer to the known-geometry object observed by the
  cameras, with those observations used as input during calibration

- *camera model*: this is used to refer to the intrinsics (lens parameters) and
  extrinsics (geometry) together

- *confidence*: a measure of dispersion of some estimator. Higher confidence
  implies less dispersion. Used to describe the [[file:uncertainty.org][calibration quality]]. Inverse of
  *uncertainty*

- *extrinsics*: the pose of a camera in respect to some fixed coordinate system

- *frames*: in the context of [[file:formulation.org][mrcal's optimization]] these refer to an array of
  poses of the observed chessboards

- *intrinsics*: the parameters describing the behavior of a lens. The pose of
  the lens is independent of the intrinsics

- *measurements* or *residuals*: used more or less interchangeably. This is the
  vector whose norm the [[file:formulation.org][optimization algorithm]] is trying to minimize. mrcal
  refers to this as $\vec x$, and it primarily contains differences between
  observed and predicted pixel observations

- *projection*: a mapping of a point in space in the camera coordinate system
  to a pixel coordinate where that point would be observed by a given camera

- *pose*: a 6 degree-of-freedom vector describing a position and an orientation

- *SFM*: structure from motion. This is the converse of "calibration": we
  observe a stationary scene from a moving camera to compute the geometry of the
  scene

- *state*: the vector of parameters that the [[file:formulation.org][mrcal optimization algorithm]] moves
  around as it searches for the optimum. mrcal refers to this as $\vec b$

- *uncertainty*: a measure of dispersion of some estimator. Higher uncertainty
  implies more dispersion. Used to describe the [[file:uncertainty.org][calibration quality]]. Inverse of
  *confidence*

- *unprojection*: a mapping of a pixel coordinate back to a point in space in
  the camera coordinate system that would produce an observation at that pixel.
  Unprojection is only unique up-to scale

* Symbols
** Geometry
- $\vec q$ is a 2-dimensional vector representing a pixel coordinate: $\left( x,y \right)$

- $\vec v$ is a 3-dimensional vector representing a /direction/ $\left( x,y,z
  \right)$ in space. $\vec v$ is unique only up-to-length. In a camera's
  coordinate system we have $\vec q = \mathrm{project}\left(\vec v \right)$

- $\vec p$ is a 3-dimensional vector representing a /point/ $\left( x,y,z
  \right)$ in space. Unlike $\vec v$, $\vec p$ has a defined range. Like $\vec
  v$ we have $\vec q = \mathrm{project}\left(\vec p \right)$

- $\vec u$ is a 2-dimensional vector representing a [[file:lensmodels.org::#lensmodel-stereographic][normalized stereographic projection]]

** Optimization
:PROPERTIES:
:CUSTOM_ID: symbols-optimization
:END:

The core of the [[file:formulation.org][mrcal calibration routine]] is a nonlinear least-squares
optimization

\[
\min_{\vec b} E = \min_{\vec b} \left \Vert \vec x \left( \vec b \right) \right \Vert ^2
\]

Here we have

- $\vec b$ is the vector of parameters being optimized. Earlier versions of
  mrcal used $\vec p$ for this, but it clashed with $\vec p$ referring to points
  in space, which wasn't always clear from context. Some older code or
  documentation may still use $\vec p$ to refer to optimization state

- $\vec x$ is the vector of /measurements/ describing the error of the solution
  at some hypothesis $\vec b$

- $E$ is the cost function being optimized. $E \equiv \left \Vert \vec x \right \Vert ^2$

- $\vec J$ is the /jacobian/ matrix. This is the matrix $\frac{ \partial \vec x
  }{ \partial \vec b }$. Usually this is large and sparse.

* Image coordinate system
:PROPERTIES:
:CUSTOM_ID: image-coordinate-system
:END:

As with most image-oriented tools, mrcal places the image (0,0) at the *center
of the top-left pixel*. Thus the top-left corner of the image is at (-0.5,-0.5).
Other possible choices /not/ used by mrcal are

- what e.g. [[https://boofcv.org][BoofCV]] does: (0,0) is at the top-left corner of the top-left pixel
- what e.g. [[https://en.wikipedia.org/wiki/OpenGL][OpenGL]] does: (0,0) is at the bottom-left corner of the bottom-left pixel

* Camera coordinate system
:PROPERTIES:
:CUSTOM_ID: camera-coordinate-system
:END:

mrcal uses right-handed coordinate systems. No convention is assumed for the
world coordinate system. The canonical /camera/ coordinate system has $x$ and
$y$ as with pixel coordinates in an image: $x$ is to the right and $y$ is down.
$z$ is then forward to complete the right-handed system of coordinates.

* Transformation naming
:PROPERTIES:
:CUSTOM_ID: transformation-naming
:END:

We describe transformations as mappings between a representation of a point in
one coordinate system to a representation of the /same/ point in another
coordinate system. =T_AB= is a transformation from coordinate system =B= to
coordinate system =A=. These chain together nicely, so if we know the
transformation between =A= and =B= and between =B= and =C=, we can transform a
point represented in =C= to =A=: =x_A = T_AB T_BC x_C = T_AC x_C=. And =T_AC =
T_AB T_BC=.

* Pose representation
:PROPERTIES:
:CUSTOM_ID: pose-representation
:END:

Various parts of the toolkit have preferred representations of pose, and mrcal
has functions to convert between them. Available representations are:

- =Rt=: a (4,3) numpy array with a (3,3) rotation matrix concatenated with a
  (1,3) translation vector:

  \[ \begin{bmatrix} R \\ \vec t^T \end{bmatrix} \]

  This form is easy to work with, but there are implied constraints: most (4,3)
  numpy arrays are /not/ valid =Rt= transformations.

- =rt=: a (6,) numpy array with a (3,) vector representing a [[https://en.wikipedia.org/wiki/Axis%E2%80%93angle_representation#Rotation_vector][Rodrigues rotation]]
  concatenated with another (3,) vector, representing a translation:

  \[ \left[ \vec r^T \quad \vec t^T \right] \]

  This form requires more computations to deal with, but has no implied
  constraints: /any/ (6,) numpy array is a valid =rt= transformation. Thus this
  is the form used inside the [[file:formulation.org][mrcal optimization routine]].

- =qt=: a (7,) numpy array with a (4,) vector representing a unit quaternion
  $\left(w,x,y,z\right)$ concatenated with another (3,) vector, representing a
  translation:

  \[ \left[ \vec q^T \quad \vec t^T \right] \]

  mrcal doesn't use quaternions anywhere, and this exists only for
  interoperability with other tools.

Each of these represents a transformation =rotate(x) + t=.

Since a pose represents a transformation between two coordinate systems, the
toolkit generally refers to a pose as something like =Rt_AB=, which is an
=Rt=-represented transformation to convert a point to a representation in the
coordinate system =A= from a representation in coordinate system =B=.

A Rodrigues rotation vector =r= represents a rotation of =length(r)= radians
around an axis in the direction =r=. Converting between =R= and =r= is done via
the [[https://en.wikipedia.org/wiki/Rodrigues%27_rotation_formula][Rodrigues rotation formula]]: using the [[file:mrcal-python-api-reference.html#-r_from_R][=mrcal.r_from_R()=]] and
[[file:mrcal-python-api-reference.html#-R_from_r][=mrcal.R_from_r()=]] functions. For translating /poses/, not just rotations, use
[[file:mrcal-python-api-reference.html#-rt_from_Rt][=mrcal.rt_from_Rt()=]] and [[file:mrcal-python-api-reference.html#-Rt_from_rt][=mrcal.Rt_from_rt()=]].

* Linear algebra
:PROPERTIES:
:CUSTOM_ID: linear-algebra
:END:

mrcal follows the usual linear algebra convention of /column/ vectors. So
rotating a vector using a rotation matrix is a matrix-vector multiplication
operation: $\vec b = R \vec a$ where both $\vec a$ and $\vec b$ are column
vectors.

However, numpy prints vectors (1-dimensional objects), as /row/ vectors, so the
code treats 1-dimensional objects as transposed vectors. In the code, the above
rotation would be implemented equivalently: $\vec b^T = \vec a^T R^T$. The
[[file:mrcal-python-api-reference.html#-rotate_point_R][=mrcal.rotate_point_R()=]] and [[file:mrcal-python-api-reference.html#-transform_point_Rt][=mrcal.transform_point_Rt()=]] functions handle this
transparently.

A similar issue is that numpy follows the linear algebra convention of indexing
arrays with =(index_row, index_column)= and representing array sizes with
=(height, width)=. This runs against the /other/ convention of referring to
pixel coordinates as =(x,y)= and image dimensions as =(width, height)=. Whenever
possible, mrcal places the =x= coordinate first (as in the latter), but when
interacting directly with numpy, it must place the =y= coordinate first.
/Usually/ =x= goes first. In any case, the choice being made is very clearly
documented, so when in doubt, pay attention to the docs.

When computing gradients mrcal places the dependent variables in the leading
dimensions, and the independent variables in the trailing dimensions. So if we
have $\vec b = R \vec a$, then

\[ R = \frac{ \partial \vec b }{ \partial \vec a } =
   \left[ \begin{aligned} \frac{ \partial b_0 }{ \partial \vec a } \\
                          \frac{ \partial b_1 }{ \partial \vec a } \\
                          \frac{ \partial b_2 }{ \partial \vec a } \end{aligned} \right] =
   \left[ \frac{ \partial \vec b }{ \partial a_0 } \quad
          \frac{ \partial \vec b }{ \partial a_1 } \quad
          \frac{ \partial \vec b }{ \partial a_2 } \right]
\]

$\frac{ \partial b_i }{ \partial \vec a }$ is a (1,3) row vector and $\frac{
\partial \vec b }{ \partial a_i }$ is a (3,1) column vector.
                          
* Implementation
:PROPERTIES:
:CUSTOM_ID: implementation
:END:

The core of mrcal is written in C, but most of the API is currently available in
Python only. The Python wrapping is done via the [[https://github.com/dkogan/numpysane/blob/master/README-pywrap.org][=numpysane_pywrap=]] library,
which makes it simple to make consistent Python interfaces /and/ provides
[[https://numpy.org/doc/stable/user/basics.broadcasting.html][broadcasting]] support.

The Python layer uses [[https://numpy.org/][=numpy=]] and [[https://github.com/dkogan/numpysane/][=numpysane=]] heavily. All the plotting is done
with [[https://github.com/dkogan/gnuplotlib][=gnuplotlib=]]. [[https://opencv.org/][OpenCV]] is used a bit, but /only/ in the Python layer, less
and less over time (their C APIs are gone, and the C++ APIs are unstable).