/**
\page tutorial-tracking-megapose Tutorial: Tracking with MegaPose
\tableofcontents
\section megapose_tracking_intro Introduction
In this tutorial, we will explore how to use MegaPose \cite Labbe2022Megapose, a deep learning method for 6D object pose estimation.
To know more about MegaPose see <https://megapose6d.github.io/>.
Given:
- An RGB or RGB-D image for which the intrinsics of the camera \f$c\f$ are known
- A coarse detection of the image region in which the object lies
- A 3D model of the object \f$o\f$
MegaPose can estimate the pose of the object relative to the camera frame \f$^{c}\mathbf{T}_{o}\f$.
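For reference, under the usual homogeneous-transformation convention (the symbols \f$^{c}\mathbf{R}_{o}\f$, \f$^{c}\mathbf{t}_{o}\f$, \f$^{o}\mathbf{X}\f$ and \f$^{c}\mathbf{X}\f$ are introduced here for illustration and are not MegaPose notation), this pose combines a rotation and a translation, and maps the coordinates of a 3D point from the object frame to the camera frame:

\f[
{}^{c}\mathbf{T}_{o} = \left[\begin{array}{cc} {}^{c}\mathbf{R}_{o} & {}^{c}\mathbf{t}_{o} \\ \mathbf{0}_{1 \times 3} & 1 \end{array}\right],
\qquad
\left[\begin{array}{c} {}^{c}\mathbf{X} \\ 1 \end{array}\right] = {}^{c}\mathbf{T}_{o} \left[\begin{array}{c} {}^{o}\mathbf{X} \\ 1 \end{array}\right]
\f]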
The method has several advantages:
- Robust estimation in the presence of occlusions and lighting artifacts
- Can work with a coarse model of the object
- Does not require retraining for novel objects
It has, however, several drawbacks:
- Running MegaPose requires a GPU. However, the integration in ViSP is based on a client-server model: MegaPose can thus run on a remote machine and its results can be retrieved on the local host (e.g., a CPU-only computer connected to a robot)
- It may be too slow for your requirements
- With the default parameters, on a 640 x 480 image, initial pose estimation takes around 2 seconds on an Nvidia Quadro RTX 6000
- On the same setup, a pose update (refinement) iteration takes around 60-70 milliseconds
- To perform the initial pose estimation, MegaPose requires an estimate of the image region containing the object (i.e., a bounding box detection).
You may thus require a way to detect the object, such as an object detection neural network (available in ViSP with the class vpDetectorDNNOpenCV, see \ref tutorial-detection-dnn).
For initial tests, the bounding box can also be provided by the user via click.
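The timings quoted above give a rough idea of the tracking rate that can be expected. The sketch below is a back-of-the-envelope estimate only: it assumes that the cost of a frame is dominated by the refiner and grows linearly with the number of refiner iterations, and it ignores network and rendering overheads. The function name `maxTrackingRateHz` is hypothetical and is not part of ViSP.

```cpp
#include <cassert>

// Rough upper bound on the tracking rate (in Hz), assuming that the per-frame
// cost is refinerIterationMs * refinerIterations and nothing else.
// Network transfers and rendering are deliberately ignored.
double maxTrackingRateHz(double refinerIterationMs, int refinerIterations)
{
  assert(refinerIterationMs > 0.0 && refinerIterations > 0);
  return 1000.0 / (refinerIterationMs * refinerIterations);
}
```

With the 60-70 ms per refinement iteration reported above, a single iteration per frame gives roughly 14 to 17 Hz at best, and each additional iteration divides that figure accordingly.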
To see some results, scroll to the end of this tutorial.
For the 3D model and detection inputs required by MegaPose, we provide tutorials to help you get set up. See \ref tutorial-megapose-model for the 3D model creation and \ref tutorial-synthetic-blenderproc to train a detection network.
With these tutorials and the tools presented therein, the work to use MegaPose can be almost fully automated, as summed up in the figure below:
\image html tutorial/tracking/megapose/megapose_pipeline.png
The MegaPose integration in ViSP is based on a client-server model:
- The client, that uses either vpMegaPose or vpMegaPoseTracker, is C++-based. It sends pose estimation requests to the server.
- The server is written in Python. It wraps around the MegaPose model. Each time a pose estimation is requested, the server reshapes the data and forwards it to MegaPose.
It then sends back the information to the client.
\note The computer running the server needs a GPU. The client can run on the same computer as the server. It can also run on another computer without a GPU.
To obtain a decent tracking speed, it is recommended to have both machines on the same network.
This tutorial will explain how to install and run MegaPose and
then demonstrate its usage with a simple object tracking application.
\section megapose_install Installation
\subsection megapose_cpp_install Installing the client
The MegaPose client, written in C++, is included directly in ViSP. It can be installed on any computer, even without a GPU. To be installed and compiled, it requires:
- That ViSP be compiled with the JSON third-party library, as JSON is used to pass messages. To install the 3rd party, see \ref soft_tool_json installation procedure for your system.
Don't forget to build ViSP again after installing JSON third-party.
- Once done, ViSP should be compiled with the `visp_dnn_tracker` module. When generating build files with CMake, it will be built by default if the JSON third-party is detected on your system
- To check that it is installed, you can check the `ViSP-third-party.txt` file that is generated by CMake:
\code{.sh}
$ cd $VISP_WS/visp-build
$ grep "To be built" ViSP-third-party.txt
To be built: core dnn_tracker gui imgproc io java_bindings_generator klt me sensor ar blob robot visual_features vs vision detection mbt tt tt_mi
\endcode
If `"dnn_tracker"` is in the list, then the client can be compiled and used.
- Otherwise it means that ViSP is not built with JSON third-party:
\code{.sh}
$ cd $VISP_WS/visp-build
$ grep "json" ViSP-third-party.txt
Use json (nlohmann): no
\endcode
As explained previously, see \ref soft_tool_json installation procedure and build ViSP again.
\subsection megapose_server_install Installing the server
\warning The MegaPose server cannot directly be installed and used on Windows. A workaround is to install it in a <a href="https://learn.microsoft.com/en-us/windows/wsl/install">WSL container</a>. A WSL container works as a Linux (Ubuntu) distribution. The client still works on Windows, and WSL allows for port forwarding, making its usage seamless from the perspective of the client.
MegaPose server should be installed on a computer equipped with a GPU. To install the MegaPose server, there are two dependencies:
- Conda: MegaPose will be installed in a new virtual environment in order to avoid potential conflicts with Python and other packages you have already installed
- To install conda on your system, we recommend `miniconda`, a minimal version of conda. To install, see <a href="https://docs.conda.io/en/latest/miniconda.html">the miniconda documentation</a>
- Once installed, make sure that conda is in your environment path variable. The conda installation procedure should do this by default.
- To check, simply enter `conda --version` in your terminal.
- You should obtain an output similar to:
\code
$ conda --version
conda 23.3.1
\endcode
- Git is also required in order to fetch the MegaPose sources.
If you built ViSP from sources, then it should already be installed.
The server sources are located in the `$VISP_WS/visp/script/megapose_server` folder of your ViSP <b>source</b> directory.
In this folder, you can find multiple files:
- `run.py`: the code for the server
- `install.py`: the installation script
- `megapose_variables.json`: configuration variables, used in the installation process.
To start the installation process, you should first set the variables in `megapose_variables.json` file:
- `environment`: name of the conda environment that will be created. By default, the environment name is set to `"megapose"`. The MegaPose server will be installed in this environment and it should thus be activated before trying to start the server.
For example, if you set this variable to "visp_megapose_server", then you can activate it with: \code{.sh} $ conda activate visp_megapose_server \endcode
- `megapose_dir`: the folder where MegaPose will be installed. By default, the installation folder is set to `"./megapose6d"`
- `megapose_data_dir`: the folder where the MegaPose deep learning models will be downloaded. By default, the data will be downloaded in `"megapose"` folder.
Once you have configured these variables, run the installation script with:
\code{.sh}
$ cd $VISP_WS/visp/script/megapose_server
$ python install.py
\endcode
The script may run for a few minutes, as it downloads all the dependencies as well as the deep learning models that MegaPose requires.
Once the script has finished, you can check the installation status with the following commands where `<name_of_your_environment>`
could be replaced by `megapose` if you didn't change the content of `megapose_variables.json` file:
\code{.sh}
$ conda activate <name_of_your_environment>
$ python -m megapose_server.run -h
\endcode
The `-h` argument should print some documentation on the arguments that can be passed to the server.
With MegaPose installed, you are now ready to run a basic, single object tracking example.
\section megapose_run Single object tracking with MegaPose
In this tutorial, we will track an object from a live camera feed. For MegaPose to work, we will need:
- The 3D model of the object
- A way to detect the object in the image
- A machine with a GPU, that hosts the server. If your machine has a GPU, then you can run the server and this client in parallel.
To get you started, we provide the full data to run tracking on a short video.
To go further, you should check \ref megapose_adaptation that will explain what you need to use your own objects and camera.
\subsection megapose_start_server Starting the server
To use MegaPose, we first need to start the inference server. As we have installed the server in \ref megapose_server_install, we can now use it from anywhere.
First, activate your conda environment:
\code{.sh}
$ conda activate megapose
\endcode
where `megapose` is the name of the conda environment that you have defined in `megapose_variables.json` file when installing the server.
We can now start the server, and examine its arguments with:
\code{.sh}
(megapose) $ python -m megapose_server.run -h
...
usage: run.py [-h] [--host HOST] [--port PORT]
[--model {RGB,RGBD,RGB-multi-hypothesis,RGBD-multi-hypothesis}]
[--meshes-directory MESHES_DIRECTORY] [--optimize]
[--num_workers NUM_WORKERS]
optional arguments:
-h, --help show this help message and exit
--host HOST IP or hostname to bind the server to. Set to 0.0.0.0 if
you wish to listen for incoming connections from any
source (dangerous)
--port PORT The port on which to listen for new connections
--model {RGB,RGBD,RGB-multi-hypothesis,RGBD-multi-hypothesis}
Which MegaPose model to use. Some models require the depth
map. Some models generate multiple hypotheses when
estimating the pose, at the cost of more computation.
Options: RGB, RGBD, RGB-multi-hypothesis, RGBD-multi-
hypothesis
--meshes-directory MESHES_DIRECTORY
Directory containing the 3D models. each 3D model must be
in its own subfolder
--optimize Experimental: Optimize network for inference speed.
This may incur a loss of accuracy.
--num-workers NUM_WORKERS
Number of workers for rendering
\endcode
From the multiple arguments described, the required ones are:
- `--host`: the IP address on which the server will listen. If you plan to run the tracking example and the MegaPose server on the same machine, use `127.0.0.1`.
If running on separate machines, you can find out the IP address of the server with:
- On Linux (with the `net-tools` package)
\code{.sh}
$ ifconfig
\endcode
and look for the `inet` field of the network interface that can be reached by the client.
- On Windows
\code{.sh}
C:\> ipconfig /all
\endcode
- `--port`: The port on which the server will listen for incoming connections. This port should not already be in use by another program
- `--model`: The model that is used to estimate the pose. The available options are:
- `RGB`: This model expects an RGB image as an input. From the coarse model estimates, the best pose hypothesis is given to the refiner, which performs 5 iterations by default.
- `RGBD`: Same as above, except that an RGBD image is expected in input. Using RGBD is not recommended for tracking applications, as the model is sensitive to depth noise.
- `RGB-multi-hypothesis`: Same as `RGB`, except that the coarse model selects the top-K hypotheses (here, K = 5), which are all forwarded to the refiner model.
This model takes far more time and is thus not recommended for tracking, but it may be useful for single-shot pose estimation if you have no speed requirements.
- `RGBD-multi-hypothesis`: Similar to `RGB-multi-hypothesis`, except that ICP is applied after the refiner model has run on the RGB images. This model thus requires an RGBD image.
- `--meshes-directory`: The directory containing the 3D models. The supported formats are `.obj`, `.gltf` and `.glb`. If your model is in another format, e.g., `.stl`, it can be converted through <a href="https://www.blender.org/">Blender</a>.
The directory containing the models should be structured as follows:
\code
models
|--cube
|--cube.obj
|--cube.mtl
|--texture.jpg
|--my_obj
|--object.glb
\endcode
In the example above, if we start the server with the `meshes-directory` set to `models`, two objects should be recognized: `cube` and `my_obj`. The name of an object is dictated by its folder name.
To run the basic version of the tutorial below, we provide the model of the cube that is to be tracked in the video. The 3D models directory is `data/models`, located in the tutorial folder.
To start the server, you should enter in your terminal:
\code{.sh}
(megapose) $ cd $VISP_WS/visp-build/tutorial/tracking/dnn
(megapose) $ python -m megapose_server.run --host 127.0.0.1 --port 5555 --model RGB --meshes-directory data/models
\endcode
Note that this assumes that your current directory is the tutorial folder.
\warning If you are running on Windows through WSL, you may encounter an error mentioning that a CUDA/CUDNN-related .so file is not found. To resolve this issue, enter \code export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH\endcode before starting the server.
Your server should now be started and waiting for incoming connections. You can now launch the tracking tutorial.
\subsection megapose_run_command Running the tracking example
Let us now run the tracker on a video, with the default provided cube model.
The video can be found in the data folder of the tutorial and the source code in tutorial-megapose-live-single-object-tracking.cpp located in `$VISP_WS/visp/tutorial/tracking/dnn`.
The program accepts many arguments, defined here through a vpJsonArgumentParser:
\snippet tutorial-megapose-live-single-object-tracking.cpp Arguments
Since there are many arguments, we provide a default configuration to run on the video of the cube. This configuration is found in the file `$VISP_WS/visp/tutorial/tracking/dnn/data/megapose_cube.json`:
\include megapose_cube.json
Among the arguments, the most interesting ones are:
- width, height: the dimensions of the image.
- video-device: The source of the images. Use 0, 1, 2, etc. for a real-time camera feed, or the name of a video file.
- camera: The intrinsics of the camera. Here, the video is captured on an Intel Realsense D435, and the intrinsics are obtained from the realsense SDK. The video is captured by using the tutorial \ref grabber-camera-realsense.
- reinitThreshold: a threshold between 0 and 1. If MegaPose's score is below this threshold, tracking should be reinitialised (requiring a 2D bounding box).
- detectionMethod: How to acquire a bounding box of the object in the image.
- object: name of the object to track. Should match an object that is in the mesh directory of the MegaPose server.
- megapose/address: The IP of the MegaPose server.
- megapose/refinerIterations: Number of iterations performed by the refiner model. This impacts both (re)initialization and tracking. Values above 1 may be too slow for tracking.
- megapose/initialisationNumSamples: Number of renders (random poses) used for the initialisation.
For the parameters of the detector (used if `detectionMethod == dnn`), see \ref tutorial-detection-dnn. Here, the parameters correspond to a YoloV7-tiny, trained only to detect the cube.
Note that to train this detector, we acquired ~400 images with \ref grabber-camera-realsense, then annotated them with <a href="https://github.com/heartexlabs/labelImg">labelImg</a>.
A more recent alternative seems to be <a href="https://github.com/heartexlabs/label-studio">LabelStudio</a>.
The detector should be trained (and exported) with images of the same size as provided to MegaPose.
To launch the tracking program, enter:
\code{.sh}
$ cd $VISP_WS/visp-build/tutorial/tracking/dnn
$ ./tutorial-megapose-live-single-object-tracking --config data/megapose_cube.json megapose/address 127.0.0.1 megapose/port 5555 video-device data/cube_video.mp4
\endcode
If the MegaPose server is running on another machine or uses another port, replace the arguments with your values.
If everything goes well, you should obtain results similar to those displayed below:
\htmlonly
<p align="center"><iframe width="560" height="315" src="https://youtube.com/embed/X5VdIjl5Lo0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></p>
\endhtmlonly
In this visualization, you can see the 3D model being displayed, as well as the object frame expressed in the camera.
The model display can be toggled by pressing T. Displaying can be helpful in two ways:
- Visually ensuring that tracking produces coherent results
- Verifying that the model is correctly interpreted by MegaPose
The bar at the bottom displays the score coming from MegaPose.
This score reflects whether the tracking has diverged and a reinitialization is required.
\subsection megapose_code Understanding the program
We will now go through the code to understand how MegaPose can be called.
The full code can be found in tutorial-megapose-live-single-object-tracking.cpp
After parsing the parameters given by the user (see above), we create a connection to the MegaPose server:
\snippet tutorial-megapose-live-single-object-tracking.cpp Instantiate megapose
We first create the raw vpMegaPose object, passing as parameters the IP address and the port, as well as the camera calibration and image resolution.
This class can directly be used to perform pose estimation, but here we prefer the vpMegaPoseTracker class,
which provides a simpler interface in the case of tracking. In addition, it allows MegaPose to be called asynchronously, so that the main thread can perform other operations, such as acquiring and displaying the latest frame.
To the tracker, we provide the name of the object we wish to track, as well as the number of iterations that MegaPose should perform. Run time will scale linearly in the number of iterations.
Once our tracker is initialized, we set the number of samples for coarse pose estimation (when we provide a bounding box detection, but no previous pose estimate).
We also check that the object's name is known to MegaPose. If it is not, then tracking will not be possible.
Finally, we initialize a reference to a <a href="https://en.cppreference.com/w/cpp/thread/future">future</a> object, which will store the latest pose estimation result.
We can now enter a loop which will start by acquiring the latest image from the camera:
\snippet tutorial-megapose-live-single-object-tracking.cpp Acquisition
Once we have acquired an image, we continue by checking whether MegaPose has returned a result. Of course, this will not be the case for the first iteration.
If there is indeed a new result, we can check the confidence score to decide if a reinitialization is required and request the rendering from MegaPose to display it afterwards.
In addition, we also request a new pose estimation, by setting the `callMegapose` boolean to true.
\snippet tutorial-megapose-live-single-object-tracking.cpp Check megapose
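The polling pattern described above can be illustrated with the C++ standard library alone. The sketch below is not the tutorial's code: `PoseResult`, `pollResult` and `needsReinit` are hypothetical names, and the real ViSP result type carries more information than a score. It only shows how a future can be checked without blocking the main loop, and how the score can then be compared to the reinitialization threshold.

```cpp
#include <chrono>
#include <future>
#include <optional>

// Hypothetical stand-in for the result returned by the server.
struct PoseResult
{
  double score; // confidence, between 0 and 1
};

// Non-blocking poll: returns the result if the asynchronous call has finished,
// and std::nullopt if it is still running or if no call was ever issued.
std::optional<PoseResult> pollResult(std::future<PoseResult> &pending)
{
  if (!pending.valid()) {
    return std::nullopt; // nothing was requested yet (e.g., first iteration)
  }
  if (pending.wait_for(std::chrono::seconds(0)) != std::future_status::ready) {
    return std::nullopt; // still computing: keep acquiring and displaying frames
  }
  return pending.get(); // get() consumes the result and invalidates the future
}

// Tracking is considered lost when the score drops below the threshold.
bool needsReinit(const PoseResult &result, double reinitThreshold)
{
  return result.score < reinitThreshold;
}
```

In a loop, `pollResult` would be called once per acquired frame. A returned value means that a new request can be issued, either as a plain tracking update or, if `needsReinit` returns true, after obtaining a new bounding box.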
When requesting a pose estimate, there are two states to handle:
- We are not already tracking, or tracking has failed. In this case, we require the 2D bounding box of the object to (re)initialize tracking. To perform detection, we provide two methods:
- The first, where a trained detection neural network (`detectionMethod` == "dnn") performs the bounding box regression.
- The second, ideal for initial tests, where the user provides the detection.
Note that in both cases, the methods (described after) return an optional value: the object may not always be visible in the image.
- In the second case, we are already tracking: We can simply feed the latest image to MegaPose as we already have an estimate of the object pose.
\snippet tutorial-megapose-live-single-object-tracking.cpp Call MegaPose
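The two states above, together with the fact that the detection is optional, boil down to a three-way decision. The following sketch makes that decision explicit. It is an illustration only: `BoundingBox`, `Request` and `chooseRequest` are hypothetical names that do not appear in the tutorial's source, which works with ViSP types instead.

```cpp
#include <optional>

// Hypothetical axis-aligned bounding box, in pixel coordinates.
struct BoundingBox
{
  double left, top, right, bottom;
};

enum class Request
{
  None,       // not tracking and object not detected: wait for the next frame
  Initialize, // (re)start from a 2D bounding box
  Track       // update the previous pose estimate with the new image
};

// Chooses which kind of request to send for the current frame.
Request chooseRequest(bool tracking, const std::optional<BoundingBox> &detection)
{
  if (tracking) {
    return Request::Track; // a previous pose exists, no detection needed
  }
  return detection.has_value() ? Request::Initialize : Request::None;
}
```

Returning `Request::None` when the object is not visible keeps the loop running without sending a request that could not succeed.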
To provide a bounding box to MegaPose, the code of the two methods can be found below and is fairly straightforward:
\snippet tutorial-megapose-live-single-object-tracking.cpp Detect
Once MegaPose has been called, we can display the results in the image. We plot:
- The object pose, expressed in the camera frame
- The 3D render as seen by MegaPose, overlaid on the actual image
- The confidence score of MegaPose
\snippet tutorial-megapose-live-single-object-tracking.cpp Display
We have walked through the code of a single object tracking application with MegaPose.
You may wish to save the results. You can do so by serializing to JSON, as explained in \ref tutorial-json.
\section megapose_adaptation Adapting this tutorial for your use case
This program can run with new 3D models, as MegaPose does not require retraining. To adapt this script to your problem, you will require multiple things:
- The intrinsic parameters of your camera. To calibrate your camera, see \ref tutorial-calibration-intrinsic
- The 3D model of your object. See \ref tutorial-megapose-model
- Optionally (but recommended), an automated way to detect the object. You can for instance train a deep neural network and use it in ViSP, as explained in \ref tutorial-detection-dnn. Since you already have your 3D model, you can use Blender to generate a synthetic dataset and train a detection network without manually annotating images. This process is explained in \ref tutorial-synthetic-blenderproc.
This tracking example has been used to illustrate some of MegaPose's properties.
First, combining it with a deep learning detection method provides an automatic tracking initialization/reinitialization method.
\htmlonly
<p align="center"><iframe width="560" height="315" src="https://www.youtube.com/embed/kD2U3n9S0ww" title="Object tracking with MegaPose: automatic reinitialisation with a YoloV7" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe></p>
\endhtmlonly
Second, it is able to track reconstructed meshes and is resistant to occlusions, as seen below.
\htmlonly
<p align="center"><iframe width="560" height="315" src="https://youtube.com/embed/MrLiDORtnwA" title="Pose estimation of a reconstructed 3D model with megapose" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></p>
\endhtmlonly
It is also resistant to lighting variations and can track textureless objects.
\htmlonly
<p align="center"><iframe width="560" height="315" src="https://youtube.com/embed/QwB3mTpNEo8" title="Pose Estimation of a textureless object" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></p>
\endhtmlonly
Finally, MegaPose is an ideal candidate for Pose-Based Visual Servoing. The video below shows an example of a PBVS experiment where MegaPose provides the pose estimation that is given as input to the PBVS control law. See \ref megapose_next_steps for more information.
\htmlonly
<p align="center"><iframe width="560" height="315" src="https://youtube.com/embed/jwUJySk9Kew" title="Pose Estimation of a textureless object" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></p>
\endhtmlonly
\subsection megapose_next_steps Next steps
To go further, you can look at an example of visual servoing using MegaPose, available at \ref servoAfma6MegaposePBVS.cpp
*/