/**

\page tutorial-tracking-megapose Tutorial: Tracking with MegaPose
\tableofcontents

\section megapose_tracking_intro Introduction

In this tutorial, we will explore how to use MegaPose \cite Labbe2022Megapose, a deep learning method for 6D object pose estimation.
To learn more about MegaPose, see <https://megapose6d.github.io/>.

Given:
  - An RGB or RGB-D image for which the intrinsics of the camera \f$c\f$ are known
  - A coarse detection of the image region that contains the object
  - A 3D model of the object \f$o\f$
MegaPose can estimate the pose of the object relative to the camera frame \f$^{c}\mathbf{T}_{o}\f$.

The method has several advantages:
  - Robust estimation in the presence of occlusions and lighting artifacts
  - Can work with a coarse model of the object
  - Does not require retraining for novel objects

It has, however, several drawbacks:
  - Running MegaPose requires a GPU. However, the integration in ViSP is based on a client-server model: MegaPose can thus run on a remote machine and its results can be retrieved on the local host (e.g., a CPU-only computer connected to a robot)
  - It may be too slow for your requirements
    - With the default parameters, on a 640 x 480 image, initial pose estimation takes around 2 seconds on an Nvidia Quadro RTX 6000
    - On the same setup, a pose update (refinement) iteration takes around 60-70 milliseconds
  - To perform the initial pose estimation, MegaPose requires an estimate of the image region containing the object (i.e., a bounding box detection).
    You may thus require a way to detect the object, such as an object detection neural network (available in ViSP with the class vpDetectorDNNOpenCV, see \ref tutorial-detection-dnn).
    For initial tests, the bounding box can also be provided by the user via click.
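
To relate these timings to a camera stream, the arithmetic can be sketched as follows. This is a back-of-the-envelope illustration in Python: the 30 fps camera rate is an assumption, the function names are hypothetical, and network transfer time between the client and the server is ignored.

```python
# Timings quoted above (Nvidia Quadro RTX 6000, 640 x 480 image, default parameters).
INIT_SECONDS = 2.0               # initial pose estimation
REFINE_SECONDS = (0.060, 0.070)  # one pose update (refinement) iteration


def max_update_rate_hz(seconds_per_update):
    """Upper bound on the number of pose updates per second for a given latency."""
    return 1.0 / seconds_per_update


def frames_elapsed(seconds, camera_fps):
    """Number of camera frames that arrive while one request is being processed."""
    return seconds * camera_fps


if __name__ == "__main__":
    slowest = max_update_rate_hz(max(REFINE_SECONDS))
    fastest = max_update_rate_hz(min(REFINE_SECONDS))
    print(f"tracking rate: {slowest:.1f} to {fastest:.1f} Hz")
    # Assumption: a 30 fps camera.
    print(f"frames received during initialisation: {frames_elapsed(INIT_SECONDS, 30.0):.0f}")
```

With the figures above, tracking is bounded at roughly 14 to 17 pose updates per second, and about 60 frames of a 30 fps stream go by during one initial pose estimation.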

To see some results, scroll to the end of this tutorial.

For the 3D model and detection inputs required by MegaPose, we provide tutorials to help you get set up. See \ref tutorial-megapose-model for the 3D model creation and \ref tutorial-synthetic-blenderproc to train a detection network.
With these tutorials and the tools presented therein, the work to use MegaPose can be almost fully automated, as summed up in the figure below:

\image html tutorial/tracking/megapose/megapose_pipeline.png

The MegaPose integration in ViSP is based on a client-server model:
- The client, that uses either vpMegaPose or vpMegaPoseTracker, is C++-based. It sends pose estimation requests to the server.
- The server is written in Python. It wraps around the MegaPose model. Each time a pose estimation is requested, the server reshapes the data and forwards it to MegaPose.
  It then sends back the information to the client.

\note The computer running the server needs a GPU. The client can run on the same computer as the server. It can also run on another computer without a GPU.
To obtain a decent tracking speed, it is recommended to have both machines on the same network.

This tutorial will explain how to install and run MegaPose and
then demonstrate its usage with a simple object tracking application.

\section megapose_install Installation

\subsection megapose_cpp_install Installing the client

The MegaPose client, written in C++, is included directly in ViSP. It can be installed on any computer, even without a GPU. To be installed and compiled, it requires:
- That ViSP be compiled with the JSON third-party library, as JSON is used to pass messages. To install the 3rd party, see \ref soft_tool_json installation procedure for your system.
  Don't forget to build ViSP again after installing JSON third-party.
- Once done, ViSP should be compiled with the `visp_dnn_tracker` module. When generating build files with CMake, it will be built by default if the JSON third-party is detected on your system
  - To check that it is installed, you can check the `ViSP-third-party.txt` file that is generated by CMake:
  \code{.sh}
    $ cd $VISP_WS/visp-build
    $ grep "To be built" ViSP-third-party.txt
    To be built: core dnn_tracker gui imgproc io java_bindings_generator klt me sensor ar blob robot visual_features vs vision detection mbt tt tt_mi
  \endcode
  If `"dnn_tracker"` is in the list, then the client can be compiled and used.
  - Otherwise it means that ViSP is not built with JSON third-party:
  \code{.sh}
    $ cd $VISP_WS/visp-build
    $ grep "json" ViSP-third-party.txt
        Use json (nlohmann):         no
  \endcode
  As explained previously, see \ref soft_tool_json installation procedure and build ViSP again.

\subsection megapose_server_install Installing the server

\warning The MegaPose server cannot be installed and used directly on Windows. A workaround is to install it in a <a href="https://learn.microsoft.com/en-us/windows/wsl/install">WSL container</a>. A WSL container works as a Linux (Ubuntu) distribution. The client still works on Windows, and WSL allows for port forwarding, making its usage seamless from the perspective of the client.

MegaPose server should be installed on a computer equipped with a GPU. To install the MegaPose server, there are two dependencies:
  - Conda: MegaPose will be installed in a new virtual environment in order to avoid potential conflicts with python and other packages you have already installed
    - To install conda on your system, we recommend `miniconda`, a minimal version of conda. To install, see <a href="https://docs.conda.io/en/latest/miniconda.html">the miniconda documentation</a>
    - Once installed, make sure that conda is in your environment path variable. The conda installation procedure should do this by default.
    - To check, simply enter `conda --version` in your terminal.
    - You should obtain an output similar to:
      \code
      $ conda --version
      conda 23.3.1
      \endcode
  - Git is also required in order to fetch the MegaPose sources.
  If you built ViSP from sources, then it should already be installed.

The server sources are located in the `$VISP_WS/visp/script/megapose_server` folder of your ViSP <b>source</b> directory.

In this folder, you can find multiple files:
- `run.py`: the code for the server
- `install.py`: the installation script
- `megapose_variables.json`: configuration variables, used in the installation process.

To start the installation process, you should first set the variables in `megapose_variables.json` file:
- `environment`: name of the conda environment that will be created. By default, the environment name is set to `"megapose"`. The MegaPose server will be installed in this environment and it should thus be activated before trying to start the server.
  For example, if you set this variable to "visp_megapose_server", then you can activate it with: \code{.sh} $ conda activate visp_megapose_server \endcode
- `megapose_dir`: the folder where MegaPose will be installed. By default, the installation folder is set to `"./megapose6d"`
- `megapose_data_dir`: the folder where the MegaPose deep learning models will be downloaded. By default, the data will be downloaded in `"megapose"` folder.
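
Before launching the installation, the three variables can be checked with a few lines of Python. This is only a sketch: it assumes that `megapose_variables.json` is a flat JSON object whose keys are the three names listed above, and the helper name `missing_variables` is hypothetical (it is not part of the installation script).

```python
import json
from pathlib import Path

# Keys described above; the flat layout of the file is an assumption.
EXPECTED_KEYS = ("environment", "megapose_dir", "megapose_data_dir")


def missing_variables(config_path):
    """Return the expected keys that are absent or empty in the JSON file."""
    with open(config_path, encoding="utf-8") as f:
        variables = json.load(f)
    return [key for key in EXPECTED_KEYS if not variables.get(key)]


if __name__ == "__main__":
    path = Path("megapose_variables.json")
    if path.exists():
        print("missing variables:", missing_variables(path) or "none")
```

Run from the `megapose_server` folder, the script prints the variables that still need a value.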


Once you have configured these variables, run the installation script with:
\code{.sh}
$ cd $VISP_WS/visp/script/megapose_server
$ python install.py
\endcode

The script may run for a few minutes, as it downloads all the dependencies as well as the deep learning models that MegaPose requires.

Once the script has finished, you can check the installation status with the following commands where `<name_of_your_environment>`
could be replaced by `megapose` if you didn't change the content of `megapose_variables.json` file:
\code{.sh}
$ conda activate <name_of_your_environment>
$ python -m megapose_server.run -h
\endcode

The `-h` argument should print some documentation on the arguments that can be passed to the server.

With MegaPose installed, you are now ready to run a basic, single object tracking example.

\section megapose_run Single object tracking with MegaPose

In this tutorial, we will track an object from a live camera feed. For MegaPose to work, we will need:
- The 3D model of the object
- A way to detect the object in the image
- A machine with a GPU, that hosts the server. If your machine has a GPU, then you can run the server and this client in parallel.

To get you started, we provide the full data to run tracking on a short video.
To go further, you should check \ref megapose_adaptation that will explain what you need to use your own objects and camera.

\subsection megapose_start_server Starting the server

To use MegaPose, we first need to start the inference server. As we have installed the server in \ref megapose_server_install, we can now use it from anywhere.
First, activate your conda environment:
\code{.sh}
$ conda activate megapose
\endcode
where `megapose` is the name of the conda environment that you have defined in `megapose_variables.json` file when installing the server.

We can now start the server, and examine its arguments with:
\code{.sh}
(megapose) $ python -m megapose_server.run -h
...
usage: run.py [-h] [--host HOST] [--port PORT]
              [--model {RGB,RGBD,RGB-multi-hypothesis,RGBD-multi-hypothesis}]
              [--meshes-directory MESHES_DIRECTORY] [--optimize]
              [--num-workers NUM_WORKERS]

optional arguments:
  -h, --help            show this help message and exit
  --host HOST           IP or hostname to bind the server to. Set to 0.0.0.0 if
                        you wish to listen for incoming connections from any
                        source (dangerous)
  --port PORT           The port on which to listen for new connections
  --model {RGB,RGBD,RGB-multi-hypothesis,RGBD-multi-hypothesis}
                        Which MegaPose model to use. Some models require the depth
                        map. Some models generate multiple hypotheses when
                        estimating the pose, at the cost of more computation.
                        Options: RGB, RGBD, RGB-multi-hypothesis, RGBD-multi-
                        hypothesis
  --meshes-directory MESHES_DIRECTORY
                        Directory containing the 3D models. each 3D model must be
                        in its own subfolder
  --optimize            Experimental: Optimize network for inference speed.
                        This may incur a loss of accuracy.
  --num-workers NUM_WORKERS
                        Number of workers for rendering
\endcode

From the multiple arguments described, the required ones are:
- `--host`: the IP address on which the server will listen. If you plan to run the tracking example and the MegaPose server on the same machine, use `127.0.0.1`.
If running on separate machines, you can find out the IP address of the server with:
  - On Linux (with the `net-tools` package)
  \code{.sh}
    $ ifconfig
  \endcode
  and look for the `inet` field of the network interface that can be reached by the client.
  - On Windows
  \code{.sh}
    C:\> ipconfig /all
  \endcode
- `--port`: The port on which the server will listen for incoming connections. This port should not already be in use by another program
- `--model`: The model that is used to estimate the pose. The available options are:
  - `RGB`: This model expects an RGB image as an input. From the coarse model estimates, the best pose hypothesis is given to the refiner, which performs 5 iterations by default.
  - `RGBD`: Same as above, except that an RGBD image is expected as input. Using RGBD is not recommended for tracking applications, as the model is sensitive to depth noise.
  - `RGB-multi-hypothesis`: Same as `RGB`, except that the coarse model selects the top-K hypotheses (Here, K = 5) which are all forwarded to the refiner model.
  This model will take far more time, and is thus not recommended for tracking, but may be useful for single shot pose estimation if you have no speed requirements.
  - `RGBD-multi-hypothesis`: Similar to `RGB-multi-hypothesis`, except that ICP is run after the refiner model has run on the RGB image. This model thus requires an RGBD image.
- `--meshes-directory`: The directory containing the 3D models. The supported formats are `.obj`, `.gltf` and `.glb`. If your model is in another format, e.g., `.stl`, it can be converted through <a href="https://www.blender.org/">Blender</a>.
The directory containing the models should be structured as follows:
\code
models
|--cube
   |--cube.obj
   |--cube.mtl
   |--texture.jpg
|--my_obj
   |--object.glb
\endcode
In the example above, if we start the server with `--meshes-directory` set to `models`, two objects should be recognized: `cube` and `my_obj`. The name of an object is dictated by its folder name.
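
To verify a models directory before starting the server, the naming rule above can be mirrored in a short Python sketch. It only reproduces the rule as described in this tutorial (one subfolder per object, mesh in one of the supported formats); it is not the server's own loading code, and the function name `list_objects` is hypothetical.

```python
from pathlib import Path

# Formats listed above as supported by the server.
MESH_EXTENSIONS = {".obj", ".gltf", ".glb"}


def list_objects(meshes_directory):
    """Map each subfolder name to the mesh files found directly inside it."""
    objects = {}
    for folder in sorted(Path(meshes_directory).iterdir()):
        if not folder.is_dir():
            continue  # files at the top level do not define an object
        meshes = sorted(
            p.name for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() in MESH_EXTENSIONS
        )
        if meshes:
            objects[folder.name] = meshes
    return objects


if __name__ == "__main__":
    # "models" is the example directory shown above.
    if Path("models").is_dir():
        for name, meshes in list_objects("models").items():
            print(name, "->", meshes)
```

For the example layout, the keys of the returned dictionary are `cube` and `my_obj`. Material and texture files such as `cube.mtl` and `texture.jpg` are not counted as meshes.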

To run the basic version of the tutorial below, we provide the model of the cube that is to be tracked in the video. The 3D models directory is `data/models`, located in the tutorial folder.
To start the server, you should enter in your terminal:
\code{.sh}
(megapose) $ cd $VISP_WS/visp-build/tutorial/tracking/dnn
(megapose) $ python -m megapose_server.run --host 127.0.0.1 --port 5555 --model RGB --meshes-directory data/models
\endcode
Note that this assumes that your current directory is the tutorial folder.
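
If the server refuses to start, one possible cause is that the chosen port is already in use. The following Python sketch tests whether a TCP socket can be bound to a given address and port. The function name `port_is_free` is hypothetical, and the check is only indicative: a port that is free now may be taken by another program a moment later.

```python
import socket


def port_is_free(host, port):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    # Host and port used in the command above.
    print("port 5555 is free:", port_is_free("127.0.0.1", 5555))
```

Run it before starting the server; once the server is listening, the same check is expected to report that the port is taken.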

\warning If you are running on Windows through WSL, you may encounter an error mentioning that a CUDA/CUDNN-related .so file is not found. To resolve this issue, enter \code export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH\endcode before starting the server.

Your server should now be started and waiting for incoming connections. You can now launch the tracking tutorial.

\subsection megapose_run_command Running the tracking example
Let us now run the tracker on a video, with the default provided cube model.
The video can be found in the data folder of the tutorial and the source code in tutorial-megapose-live-single-object-tracking.cpp located in `$VISP_WS/visp/tutorial/tracking/dnn`.

The program accepts many arguments, defined here through a vpJsonArgumentParser:
\snippet tutorial-megapose-live-single-object-tracking.cpp Arguments

Since there are many arguments, we provide a default configuration to run on the video of the cube. This configuration is found in the file `$VISP_WS/visp/tutorial/tracking/dnn/data/megapose_cube.json`:
\include megapose_cube.json

Among the arguments, the most interesting ones are:
- width, height: the dimensions of the image.
- video-device: The source of the images. Input 0,1,2,... etc for a realtime camera feed, or the name of a video file.
- camera: The intrinsics of the camera. Here, the video is captured on an Intel Realsense D435, and the intrinsics are obtained from the realsense SDK. The video is captured by using the tutorial \ref grabber-camera-realsense.
- reinitThreshold: a threshold between 0 and 1. If the MegaPose score is below this threshold, tracking should be reinitialised (requiring a 2D bounding box).
- detectionMethod: How to acquire a bounding box of the object in the image.
- object: name of the object to track. Should match an object that is in the mesh directory of the MegaPose server.
- megapose/address: The IP of the MegaPose server.
- megapose/refinerIterations: Number of iterations performed by the refiner model. This impacts both (re)initialization and tracking. Values above 1 may be too slow for tracking.
- megapose/initialisationNumSamples: Number of renders (random poses) used for the initialisation.

For the parameters of the detector (used if `detectionMethod == dnn`), see \ref tutorial-detection-dnn. Here, the parameters correspond to a YoloV7-tiny, trained only to detect the cube.
Note that to train this detector, we acquired ~400 images with \ref grabber-camera-realsense, then annotated them with <a href="https://github.com/heartexlabs/labelImg">labelImg</a>.
A more recent alternative seems to be <a href="https://github.com/heartexlabs/label-studio">LabelStudio</a>.
The detector should be trained (and exported) with images of the same size as provided to MegaPose.

To launch the tracking program, enter:
\code{.sh}
$ cd $VISP_WS/visp-build/tutorial/tracking/dnn
$ ./tutorial-megapose-live-single-object-tracking --config data/megapose_cube.json megapose/address 127.0.0.1 megapose/port 5555 video-device data/cube_video.mp4
\endcode

If the MegaPose server is running on another machine or uses another port, replace the arguments with your values.
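
The argument names used above, such as `megapose/address`, read as slash-separated paths into the JSON configuration. The Python sketch below illustrates this idea under that assumption. It is not the code of vpJsonArgumentParser, the function name `apply_override` is hypothetical, and the configuration values are made up for the example.

```python
def apply_override(config, name, value):
    """Set the field addressed by a slash-separated name such as 'megapose/address'."""
    *parents, leaf = name.split("/")
    node = config
    for key in parents:
        # Descend into the nested object, creating it if it does not exist yet.
        node = node.setdefault(key, {})
    node[leaf] = value
    return config


if __name__ == "__main__":
    # Hypothetical configuration content, for illustration only.
    config = {"megapose": {"address": "192.168.1.10", "port": 5555}}
    apply_override(config, "megapose/address", "127.0.0.1")
    apply_override(config, "video-device", "data/cube_video.mp4")
    print(config)
```

Under this reading, values given on the command line take precedence over those stored in the file passed with `--config`.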

If everything goes well, you should obtain results similar to those displayed below:

\htmlonly
<p align="center"><iframe width="560" height="315" src="https://youtube.com/embed/X5VdIjl5Lo0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></p>
\endhtmlonly

In this visualization, you can see the 3D model being displayed, as well as the object frame expressed in the camera.
The model display can be toggled by pressing T. Displaying can be helpful in two ways:
- Visually ensuring that tracking produces coherent results
- Verifying that the model is correctly interpreted by MegaPose

The bar at the bottom displays the score coming from MegaPose.
This score reflects whether the tracking has diverged and a reinitialization is required.

\subsection megapose_code Understanding the program

We will now go through the code to understand how MegaPose can be called.
The full code can be found in tutorial-megapose-live-single-object-tracking.cpp

After parsing the parameters given by the user (see above), we create a connection to the MegaPose server:
\snippet tutorial-megapose-live-single-object-tracking.cpp Instantiate megapose
We first create the raw vpMegaPose object, passing as parameters the IP address and the port, as well as the camera calibration and image resolution.
This class can directly be used to perform pose estimation, but here we prefer the vpMegaPoseTracker class,
which provides a simpler interface for tracking. In addition, it allows MegaPose to be called asynchronously, so that the main thread remains free to perform other operations, such as acquiring and displaying the latest frame.

To the tracker, we provide the name of the object we wish to track, as well as the number of iterations that MegaPose should perform. Run time scales linearly with the number of iterations.

Once our tracker is initialized, we set the number of samples for coarse pose estimation (when we provide a bounding box detection, but no previous pose estimate).

We also check that the object's name is known to MegaPose. If it is not, then tracking will not be possible.

Finally, we initialize a reference to a <a href="https://en.cppreference.com/w/cpp/thread/future">future</a> object, which will store the latest pose estimation result.

We can now enter a loop which will start by acquiring the latest image from the camera:
\snippet tutorial-megapose-live-single-object-tracking.cpp Acquisition

Once we have acquired an image, we continue by checking whether MegaPose has returned a result. Of course, this will not be the case for the first iteration.
If there is indeed a new result, we can check the confidence score to decide if a reinitialization is required and request the rendering from MegaPose to display it afterwards.
In addition, we also request a new pose estimation, by setting the `callMegapose` boolean to true.
\snippet tutorial-megapose-live-single-object-tracking.cpp Check megapose
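
The polling pattern used in this step can be illustrated outside of ViSP. The sketch below is a Python analogue built on `concurrent.futures`; the tutorial itself relies on a C++ future. The function `fake_pose_request`, the latency and the frame period are all hypothetical stand-ins, chosen only to show that frame acquisition keeps running while a request is pending.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_pose_request(frame_id, latency=0.05):
    """Stand-in for a remote pose estimation call (hypothetical)."""
    time.sleep(latency)
    return {"frame": frame_id, "score": 0.9}


def run_loop(num_frames=20, frame_period=0.01):
    """Poll a pending request once per frame without blocking acquisition."""
    results = []
    pending = None
    with ThreadPoolExecutor(max_workers=1) as pool:
        for frame_id in range(num_frames):
            time.sleep(frame_period)  # stand-in for image acquisition
            if pending is not None and pending.done():
                results.append(pending.result())  # a new estimate is available
                pending = None
            if pending is None:  # plays the role of the callMegapose flag
                pending = pool.submit(fake_pose_request, frame_id)
    return results


if __name__ == "__main__":
    for result in run_loop():
        print(result)
```

Since each request takes longer than one frame period, only some of the frames are sent for pose estimation, while every frame is still acquired and could be displayed.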

When requesting a pose estimate, there are two states to handle:
- We are not already tracking, or tracking has failed. In this case, we require the 2D bounding box of the object to (re)initialize tracking. To perform detection, we provide two methods:
 - The first, where a trained detection neural network (`detectionMethod` == "dnn") performs the bounding box regression.
 - The second, ideal for initial tests, where the user provides the detection.
 Note that in both cases, the methods (described below) return an optional value: the object may not always be visible in the image.
- We are already tracking. We can simply feed the latest image to MegaPose, as we already have an estimate of the object pose.

\snippet tutorial-megapose-live-single-object-tracking.cpp Call MegaPose
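
The decision described above can be condensed into a small, language-neutral sketch, written here in Python. The function name `next_request` and the returned strings are hypothetical; in the actual program the two branches correspond to the (re)initialization and tracking calls shown in the snippet.

```python
def next_request(tracking, score, reinit_threshold, detection):
    """Decide which request to send for the next frame.

    Returns "track", "init", or "wait" when a (re)initialization is
    needed but no bounding box is available for this frame.
    """
    lost = score is not None and score < reinit_threshold
    needs_init = (not tracking) or lost
    if not needs_init:
        return "track"  # a previous pose estimate exists and is trusted
    return "init" if detection is not None else "wait"


if __name__ == "__main__":
    # Hypothetical values: a threshold of 0.2 and a box given as (x, y, w, h).
    print(next_request(False, None, 0.2, None))
    print(next_request(False, None, 0.2, (120, 80, 200, 160)))
    print(next_request(True, 0.9, 0.2, None))
    print(next_request(True, 0.1, 0.2, (120, 80, 200, 160)))
```

The "wait" case reflects the optional value returned by the detection methods: when the object is not visible, no request can be sent and the loop simply moves on to the next frame.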

The code of the two methods that provide a bounding box to MegaPose can be found below and is fairly straightforward:
\snippet tutorial-megapose-live-single-object-tracking.cpp Detect

Once MegaPose has been called, we can display the results in the image. We plot:
- The object pose, expressed in the camera frame
- The 3D render as seen by MegaPose, overlaid on the actual image
- The confidence score of MegaPose
\snippet tutorial-megapose-live-single-object-tracking.cpp Display

We have walked through the code of a single object tracking with MegaPose.
You may wish to save the results. You can do so by serializing to JSON, as explained in \ref tutorial-json.

\section megapose_adaptation Adapting this tutorial for your use case

This program can run with new 3D models, as MegaPose does not require retraining. To adapt this script to your problem, you will need several things:
- The intrinsic parameters of your camera. To calibrate your camera, see \ref tutorial-calibration-intrinsic
- The 3D model of your object. See \ref tutorial-megapose-model
- Optionally (but recommended), an automated way to detect the object. You can for instance train a deep neural network and use it in ViSP, as explained in \ref tutorial-detection-dnn. Since you already have your 3D model, you can use Blender to generate a synthetic dataset and train a detection network without manually annotating images. This process is explained in \ref tutorial-synthetic-blenderproc.

This tracking example has been used to illustrate some of MegaPose's properties.
First, combining it with a deep learning detection method provides an automatic tracking initialization/reinitialization method.
\htmlonly
<p align="center"><iframe width="560" height="315" src="https://www.youtube.com/embed/kD2U3n9S0ww" title="Object tracking with MegaPose: automatic reinitialisation with a YoloV7" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe></p>
\endhtmlonly

Second, it is able to track reconstructed meshes and is resistant to occlusions, as seen below.
\htmlonly
<p align="center"><iframe width="560" height="315" src="https://youtube.com/embed/MrLiDORtnwA" title="Pose estimation of a reconstructed 3D model with megapose" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></p>
\endhtmlonly

It is also resistant to lighting variations and can track textureless objects.
\htmlonly
<p align="center"><iframe width="560" height="315" src="https://youtube.com/embed/QwB3mTpNEo8" title="Pose Estimation of a textureless object" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></p>
\endhtmlonly

Finally, MegaPose is an ideal candidate for Pose-Based Visual Servoing. The video below shows an example of a PBVS experiment where MegaPose provides the pose estimation that is given as input to the PBVS control law. See \ref megapose_next_steps for more information.
\htmlonly
<p align="center"><iframe width="560" height="315" src="https://youtube.com/embed/jwUJySk9Kew" title="Pose Estimation of a textureless object" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></p>
\endhtmlonly


\subsection megapose_next_steps Next steps

To go further, you can look at an example of visual servoing using MegaPose, available at \ref servoAfma6MegaposePBVS.cpp.

*/