#+TITLE: Dense stereo processing
#+OPTIONS: toc:t
* Overview
The [[file:tour-stereo.org][tour of mrcal]] shows an example of dense stereo processing done with mrcal;
details about that computation appear here.
Given a pair of calibrated (both intrinsics and extrinsics) cameras, mrcal can
perform stereo processing to produce a dense stereo map. This is relatively
slow, and is often overkill for what is actually needed. But sometimes it is
useful, and the resulting depth images look /really/ nice.
On a high level, mrcal stereo processing is the usual [[https://en.wikipedia.org/wiki/Epipolar_geometry][epipolar geometry]]
technique:
1. Ingest
- Two camera models, each containing the intrinsics /and/ extrinsics (the
relative pose between the two cameras)
- A pair of images captured by these two cameras
2. Compute a "rectified" system: a pair of models where, in each corresponding
row of pixels in the two cameras, all the pixels represent observation rays that
lie in the same /epipolar/ plane
3. Reproject the images to these rectified models to produce /rectified/ images
4. Perform "stereo matching". For each pixel in the left rectified image we try
to find the corresponding pixel in the same row of the right rectified image.
The difference in columns is written to a /disparity/ image. This is the most
computationally-intensive part of the process.
5. Convert the /disparity/ image to a /range/ image using the geometry defined
by the rectified system
The epipolar constraint (all pixels in the same row in both rectified images
represent the same plane in space) allows for one-dimensional stereo matching,
which is a massive computational win over the two-dimensional matching that
would be required with another formulation.
The rectified coordinate system looks like this:
[[file:figures/rectification.svg]]
The code and documentation refer to two angles:
- $\theta$: the "azimuth"; the lateral angle inside the epipolar plane. Related
directly to the $x$ pixel coordinate in the rectified images
- $\phi$: the "elevation"; the tilt of the epipolar plane. Related directly to
the $y$ pixel coordinate in the rectified images
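
As an aside on step 5 above: a feature seen at azimuth $\theta_0$ by the first
rectified camera and at azimuth $\theta_1$ by the second (both recoverable from
the rectified pixel columns) forms a triangle together with the baseline $B$
between the two rectified camera origins. As a sketch of the geometry only
(assuming azimuths measured from the forward direction, increasing towards the
second camera), the law of sines gives the range from the first camera:

\[ r_0 = B \frac{\cos \theta_1}{\sin\left(\theta_0 - \theta_1\right)} \]

This is illustrative; [[file:mrcal-python-api-reference.html#-stereo_range][=mrcal.stereo_range()=]] implements the conversion actually
used by mrcal.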
* Rectification models
:PROPERTIES:
:CUSTOM_ID: stereo-rectification-models
:END:
A rectified system satisfies the epipolar constraint (see above). mrcal supports
two models that can have this property, selected with the =rectification_model=
argument to [[file:mrcal-python-api-reference.html#-rectified_system][=mrcal.rectified_system()=]] or with the =--rectification= commandline
argument to [[file:mrcal-stereo.html][=mrcal-stereo=]].
- [[file:lensmodels.org::#lensmodel-pinhole][=LENSMODEL_PINHOLE=]]: this is the traditional rectification model, used in most
existing tools. It works decently well for small fields of view (as with a
long lens), but fails with large fields of view (as with a wide lens). The
issues stem from the uneven angular resolution across the image, which shoots
out to $\infty \frac{\mathrm{pixels}}{\mathrm{deg}}$ as $\theta \rightarrow
\pm 90^\circ$. This produces highly distorted rectified images, which affects
stereo matching adversely, since areas of disparate resolution are being
compared. This is supported by mrcal purely for compatibility with other
tools; there's little reason to use this representation otherwise
- [[file:lensmodels.org::#lensmodel-latlon][=LENSMODEL_LATLON=]]: this is a "transverse equirectangular projection". It is
defined with even angle spacing in both directions, so $x - x_0 = k_x \theta$
and $y - y_0 = k_y \phi$ where $x$ and $y$ are pixel coordinates in the
rectified images, $x_0$ and $y_0$ are the centers of projection of the
rectified system and $k_x$ and $k_y$ are the angular resolution in the two
directions. This is the recommended rectification model, and is the default in
mrcal
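
To make the resolution difference concrete, here is a small standalone Python
sketch (illustrative only; the resolution value is arbitrary, and this is not
mrcal API code). With =LENSMODEL_LATLON= the column is linear in $\theta$, so
the angular resolution is constant; with =LENSMODEL_PINHOLE= the column is
proportional to $\tan\theta$, so the resolution grows as $1/\cos^2\theta$ and
diverges as $\theta \rightarrow \pm 90^\circ$:

#+begin_src python
import numpy as np

# Angular resolution dx/dtheta of the rectified image for the two models,
# assuming both are set up with the same resolution k at theta = 0. The value
# of k is arbitrary; only the relative behavior matters.
k     = 100.                              # pixels/rad at theta = 0
theta = np.radians(np.arange(0, 90, 10))  # azimuths to sample

res_latlon  = k * np.ones(theta.shape)    # x - x0 = k theta      -> dx/dtheta = k
res_pinhole = k / np.cos(theta)**2        # x - x0 = k tan(theta) -> dx/dtheta = k/cos^2(theta)

for th, rl, rp in zip(np.degrees(theta), res_latlon, res_pinhole):
    print(f"theta = {th:4.0f} deg: latlon = {rl:6.1f} pixels/rad, pinhole = {rp:8.1f} pixels/rad")
#+end_src

By $\theta = 80^\circ$ the pinhole rectification needs roughly 33 times the
pixel density of the image center to cover the same angular extent, which is the
expansion visible in the images below.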
Let's demonstrate the two rectification models. In the [[file:tour-stereo.org][tour of mrcal]] we showed a
dense stereo processing sequence. Let's re-rectify those same images and models
with =LENSMODEL_PINHOLE= and =LENSMODEL_LATLON=. This is a demo, so I ask for
the same pixels/angle resolution at the center of the image in both cases (the
negative =--pixels-per-deg= value requests a resolution relative to the input
image: here 0.05 of the original resolution at the center), and demonstrate how
this affects the resolution at other parts of the image.
#+begin_src sh
for model (LENSMODEL_PINHOLE LENSMODEL_LATLON) {
mrcal-stereo \
--az-fov-deg 160 \
--el-fov-deg 140 \
--pixels-per-deg -0.05 \
--rectification $model \
[01].cameramodel \
[01].jpg
}
#+end_src
#+begin_src sh :exports none :eval no-export
D=~/projects/mrcal-doc-external/2022-11-05--dtla-overpass--samyang--alpha7/stereo
Dout=~/projects/mrcal-doc-external/figures/stereo
mkdir -p $Dout
for model (LENSMODEL_PINHOLE LENSMODEL_LATLON) {
$PYTHONPATH/mrcal-stereo \
--az-fov-deg 160 \
--el-fov-deg 140 \
--pixels-per-deg -0.05 \
--rectification $model \
--outdir /tmp \
--force \
$D/[01].cameramodel \
$D/[01].jpg
mv /tmp/0-rectified.png $Dout/rectified-demo-lowres-${${model/LENSMODEL_/}:l}.png
}
#+end_src
The left image rectified with =LENSMODEL_LATLON= (resolution at the center is
1/20 of the resolution of the original image):
[[file:external/figures/stereo/rectified-demo-lowres-latlon.png]]
The left image rectified with =LENSMODEL_PINHOLE= (resolution at the center is
still 1/20 of the resolution of the original image):
[[file:external/figures/stereo/rectified-demo-lowres-pinhole.png]]
These are the actual, unscaled rectified images. Note the identical resolution
at the center of the image. And note how =LENSMODEL_PINHOLE= rectification
causes the image to expand dramatically as we approach the edges.
Using =LENSMODEL_PINHOLE= with wide lenses like this introduces an unwinnable
trade-off. If you choose the pixels/angle resolution to keep all your
information at the center, you'll get a huge image, with low-information pixels
at the edges. But if you want to reduce this expansion by lowering the
resolution, you'll lose data at the center. Or you can cut off data at the
edges. No matter what you do, you either lose information or you're stuck with a
huge image.
And even if this were all fine, with =LENSMODEL_PINHOLE= the stereo-matching
algorithm has to match image patches with widely different resolutions.
=LENSMODEL_LATLON= solves all these issues with no downsides, and is thus the
recommended rectification model.
* Interfaces
Currently dense stereo processing is available via the [[file:mrcal-stereo.html][=mrcal-stereo=]] tool,
which implements the usual processing sequence for a single frame.
More complex usages are available via the Python APIs and the C functions in
[[https://www.github.com/dkogan/mrcal/blob/master/stereo.h][=stereo.h=]]. A sequence of images captured with a stereo pair can be processed
like this:
1. [[file:mrcal-python-api-reference.html#-rectified_system][=mrcal.rectified_system()=]] to construct the rectified system defined by the
stereo pair
2. [[file:mrcal-python-api-reference.html#-rectification_maps][=mrcal.rectification_maps()=]] to construct the pixel mappings needed to
transform captured images into rectified images. This is relatively slow, but
it depends on the relative stereo geometry only, so this can be computed
once, and applied to /all/ the subsequent images captured by the stereo pair
3. For each pair of captured images
- [[file:mrcal-python-api-reference.html#-transform_image][=mrcal.transform_image()=]] to generate rectified images
- stereo matching to compute disparities. mrcal does not provide its own
method, and the [[file:mrcal-stereo.html][=mrcal-stereo=]] tool uses the [[https://docs.opencv.org/4.5.3/d2/d85/classcv_1_1StereoSGBM.html][OpenCV SGBM stereo matcher]].
Any stereo matcher can be used. The result is a /disparity/ image, where
each pixel in the first rectified image stores the column offset to the
location of the corresponding feature in the second rectified image
- [[file:mrcal-python-api-reference.html#-stereo_range][=mrcal.stereo_range()=]] to convert the disparities to ranges, which can then
be used to produce a point cloud
A demo of the process is shown in the [[file:tour-stereo.org][tour of mrcal]].
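
To tie these pieces together, here is a minimal Python sketch of this sequence.
The file names, fields of view and SGBM settings are illustrative placeholders,
not values from any particular dataset; the mrcal calls are the ones referenced
in the list above.

#+begin_src python
import cv2
import mrcal

# Calibrated models and captured images. The file names are placeholders.
models = [ mrcal.cameramodel(f) for f in ('0.cameramodel', '1.cameramodel') ]
images = [ cv2.imread(f, cv2.IMREAD_GRAYSCALE) for f in ('0.jpg', '1.jpg') ]

# Computed once per stereo pair: the rectified system and the pixel maps
models_rectified = mrcal.rectified_system(models,
                                          az_fov_deg = 160,
                                          el_fov_deg = 140)
rectification_maps = mrcal.rectification_maps(models, models_rectified)

# Any stereo matcher can be used; here the OpenCV SGBM matcher, with
# illustrative settings
matcher = cv2.StereoSGBM_create(minDisparity   = 0,
                                numDisparities = 256,
                                blockSize      = 5)

# Repeated for each pair of captured images
images_rectified = [ mrcal.transform_image(image, rectification_map)
                     for image, rectification_map in zip(images, rectification_maps) ]

# OpenCV reports fixed-point disparities, scaled by 16
disparity = matcher.compute(*images_rectified)

# Ranges from the first camera, in the same units as the models' geometry
ranges = mrcal.stereo_range(disparity,
                            models_rectified,
                            disparity_scale = 16)
#+end_src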