/**
\page tutorial-tracking-rbt Tutorial: Tracking with the Render-Based Tracker
\tableofcontents
\section rbt_tracking_intro 1. Introduction
In ViSP 3.7, we introduce a new Render-Based Tracker (RBT) that leverages GPU rendering to extract
geometric information and perform online, continuous 6D pose estimation of a known 3D object.
It is an extension of the Model-Based Tracker (MBT) already present in ViSP (see \ref tutorial-tracking-mb-generic).
This implementation is derived from the work of \cite Petit14a.
A major advantage over the Model-Based Tracker is that the RBT uses a common mesh representation for the object to
track. This representation does not require additional preprocessing for feature selection and is widely supported in
software such as Blender.
It can also be obtained through photogrammetry processes, minimizing the work required before using the tracker.
Note that we do not consider deformable objects, only non-articulated rigid objects.
Similar to the MBT, tracking is framed as an optimization problem, where the difference between the features
(geometric or photometric) extracted from a 3D model and those observed in the image must be minimized.
The RBT is built as a modular pipeline, see \ref rbt_tracking_overview. In practice, this means:
- It can leverage both RGB and depth information (see \ref rbt_tracking_requirements)
- It features a complete configuration for every part of the tracking pipeline (see \ref rbt_tracking_config)
- Most components can be disabled or adapted to different use cases and scenarios
- Components can be extended: You can add new features or filtering methods (see \ref rbt_extension).
\section rbt_tracking_requirements 2. Requirements
\subsection rbt_tracking_install_requirements 2.1. Building the RBT module
To successfully build the RBT module, you will need:
- To compile with a C++ standard of at least C++11 (controlled by the `USE_CXX_STANDARD` CMake variable)
- The Panda3D 3rd-party, which is used for 3D rendering. See \ref tutorial-panda3d-install for more information on how
to install Panda3D on your system. Panda3D is also available through Conda if you are compiling in a virtual
environment.
Additionally, some optional dependencies are (strongly) recommended:
- `nlohmann::json` (see \ref install_ubuntu_3rdparty_other for installation instructions) to load configuration files
and save your tracking results. Without it, the tracker setup will have to be done through code
- OpenCV, if you wish to use the `vpRBKltTracker`, which uses KLT feature tracking for pose estimation.
\subsection rbt_tracking_requirements_general 2.2. General requirements
To use the RBT you will need several things:
- An OpenGL or DirectX enabled device, capable of performing 3D rendering.
You can use software acceleration, although having a GPU is preferable for performance reasons
- A camera with known intrinsic parameters. For calibration see \ref tutorial-calibration-intrinsic. \warning The camera
intrinsics should follow a model without distortion.
- If you are using an RGB-D camera, the depth image should be aligned with the RGB image. Some SDKs provide this
functionality. For instance, the wrapper around the RealSense SDK accepts an "align" parameter in the
vpRealSense2::acquire() function. When correctly set, the alignment will automatically be performed for you.
- A 3D model of the object, in a format that is supported by Panda3D (see \ref tutorial-panda3d-file-conversion).
If you have installed *libassimp-dev* or are using the conda package, Panda3D supports common formats such as
*.ply*, *.obj*. Otherwise, you will have to convert your mesh using the previously linked method.
\subsection rbt_tracking_requirements_3d_model 2.3. 3D model considerations
There are very few restrictions on the 3D model that can be used with the RBT.
First, your 3D model's size should be expressed in meters. Be aware that some CAD software export models in millimeters.
If you are using the initialization by click, you will have to be careful with the model orientation when exporting
(see below).
Note that while 3D meshes support textures, the presently available RBT features do not use the texture information.
To correctly process your model, here is a small overview of the steps to follow in Blender.
\subsubsection rbt_tracking_requirements_model_preparation 2.3.1. Preparing your model in Blender
To make sure your model is correct, you should start by setting the scale and orientation of your 3D model.
You should first import your model in Blender. In the top left corner of the Blender window, click on
**File > Import > (Your model type)**, then pick your object in the popup window.
Your model should have been imported and should be visible in the 3D viewer. You should then click on it and press
**N** to bring up the Transform panel in the top right corner. Then you should press **Ctrl+A** to open the **Apply**
menu and click on the **All transforms** item.
When you do this, you will not see any change applied to the model itself,
but this will ensure that what you see in Blender is what is observed by the tracker and that both use the same orientation and scale.
\image html rbt-blender-apply-transforms.png Apply transforms to clear any rotation and scale difference stored in Blender.
If you are using initialization by click as described in \ref mb_generic_init_user section, you can select the 3D
points to click directly in Blender. To do so, first click on your object and press *tab* to go into Edit mode. Then,
press *N* to display the **Transform** Panel (top right of the viewer). Choose a 3D point to use for initialization,
then click on it. In the **Transform** panel, you should then see the XYZ coordinates of the point, which you can copy
into the init file. Be sure to select the "Global" frame in the transform panel.
\image html rbt-click-init-blender-panel.png The transform panel with a selected 3D point in Edit mode.
Once you have selected your points make sure to export your model in the same frame as the Blender frame.
You can export your 3D model, using the **File > Export > (Supported 3D format such as .obj)** and save your model in
the desired location. In the export panel, set "Forward Axis" to **-Z** and "Up Axis" to **Y**.
With all the prerequisites met, you can now understand (if you wish) how the tracker works and how to customize it for
your needs.
\section rbt_tracking_overview 3. Algorithm overview
This section of the tutorial details how the tracker works.
The RBT is used by building a vpRBTracker object, configuring it and then feeding it RGB or RGB-D frames.
As the RBT is a tracking algorithm, an initial pose must be provided. Given an initial pose, the RBT will continuously
track an object in a sequence of frames. This also assumes that the motion between consecutive frames is small enough so
that the tracker can retrieve the features and correctly update the pose of the object in the camera frame.
Otherwise, there is a risk of divergence, leading to a need for reinitialization.
This can be done via click (see \ref mb_generic_init_user section and the vpRBTracker::initClick() method).
Otherwise, you can use a pose estimation algorithm (e.g. Megapose described in \ref tutorial-tracking-megapose and
vpMegaPose::estimatePoses() method) followed by a call to vpRBTracker::setPose().
Roughly, the algorithm works as follows (a minimal usage sketch is given after this list):
1. Given an initial pose of the object in the camera frame \f$^{c}\mathbf{T}_{o}\f$, generate a render of the object at
this pose.
2. Extract and process render data, and store it into a vpRBRenderData object.
3. From the renders, seek features to match with in the current RGB and depth images.
4. From the match between image and model features, formulate an error \f$\mathbf{e}\f$.
5. Iteratively minimize \f$\vert\mathbf{e}\vert_2\f$ by updating \f$^{c}\mathbf{T}_{o}\f$
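Before detailing each step, here is a minimal sketch of how the tracker is driven from user code. It is illustrative only: the exact method signatures (notably those of vpRBTracker::track() and vpRBTracker::getPose()) and the `grabFrame()` helper are assumptions, so refer to the vpRBTracker class documentation and to tutorial-rbt-realsense.cpp for working code.
\code{.cpp}
#include <visp3/core/vpHomogeneousMatrix.h>
#include <visp3/core/vpImage.h>
#include <visp3/core/vpRGBa.h>
#include <visp3/rbt/vpRBTracker.h>

// Hypothetical acquisition helper: replace with your own camera or video code.
bool grabFrame(vpImage<unsigned char> &I, vpImage<vpRGBa> &IRGB, vpImage<float> &depth);

void runTracking()
{
  vpRBTracker tracker;
  tracker.loadConfigurationFile("tracker.json"); // requires the JSON 3rd-party
  tracker.startTracking();                       // loads the model and sets up rendering

  vpImage<unsigned char> I; // grayscale image used by edge/KLT features
  vpImage<vpRGBa> IRGB;     // color image used by the CCD feature and the mask
  vpImage<float> depth;     // depth map in meters (may stay empty for RGB-only tracking)

  // Provide the initial pose, e.g. from initClick(), MegaPose or any other source
  vpHomogeneousMatrix cMo;
  tracker.setPose(cMo);

  while (grabFrame(I, IRGB, depth)) {
    tracker.track(I, IRGB, depth); // render, match features, optimize the pose
    tracker.getPose(cMo);          // updated pose of the object in the camera frame
  }
}
\endcode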
\subsection rbt_tracking_rendering 3.1. Rendering
Object rendering is done through Panda3D. The object is rendered at the last computed pose.
All the information derived from the render is stored into a vpRBRenderData, in the current frame data
(vpRBFeatureTrackerInput). This object contains information such as:
- The 3D surface normals, expressed as 3D unit vectors
- The depth map, in meters (Rendered using vpPanda3DGeometryRenderer)
- The object silhouette (extracted using vpPanda3DDepthCannyFilter and vpRBTracker::extractSilhouettePoints)
- The image area containing the object (represented by a vpRect)
- The pose at which the object was rendered: when implementing new features or algorithms,
it is preferable to use this pose when extracting data from the render, rather than the poses passed as
arguments to the methods you override.
\image html rbt-rendering.png Rendering is used to extract geometric information width=50%
For more information on how rendering is performed you can examine the following sources:
- vpRBTracker::updateRender
- vpRBRenderData
- vpObjectCentricRenderer
- vpPanda3DRendererSet
- vpPanda3DBaseRenderer
\subsection rbt_tracking_features 3.2. Trackable features
This section details the different features that can be used to track the object.
The RBT is flexible, and each feature can be added or removed, depending on your scenario and available sensor.
For each feature, a weighting function is computed at every iteration, and a user specified weight can be used to
consider certain features as more or less important than others.
All features inherit from a base class, vpRBFeatureTracker, that defines the different functionalities that should be
implemented for a feature to be trackable.
To add or interact with the features of the RBT, you can use vpRBTracker::addTracker() and
vpRBTracker::getFeatureTrackers().
\subsubsection rbt_tracking_available_features 3.2.1. Feature list
| Feature | Class | Required inputs | Note
| :----- | :----- | :----: | ----:
| KLT 2D points | vpRBKltTracker | Grayscale image | Requires OpenCV
| Silhouette moving edges | vpRBSilhouetteMeTracker | Grayscale image | -
| Silhouette color edges | vpRBSilhouetteCCDTracker | Color image | More expensive than silhouette moving edge based tracker
| Depth map (point-to-plane) | vpRBDenseDepthTracker | Depth image | -
| XFeat keypoints | RBXFeatFeatureTracker | Color image | Only available in Python
- **KLT points** Rely on the vpKltOpencv class to extract and track 2D KLT points from the current image, which are
then associated with points on the 3D model. The error to minimize is then the reprojection error between the 2D
points in the current image and the reprojection of their associated 3D points.
- **Silhouette moving edges** Rely on the rendering to first extract the object depth disparities, which are
hypothesized to create disparities in the luminance image as well. For each point of the depth contour, the most
likely edge in its neighborhood in the luminance map is detected using the moving edge framework.
- **Silhouette Color edges** Similarly to the silhouette moving edges, the depth disparity is first used to extract
the outer silhouette contours. Then, based on the Contracting Curve Density algorithm, color statistics are computed
inside and outside the silhouette and a per-pixel error is computed. This error is minimized when the difference
between the inner and outer color distributions is maximized (i.e., there is a clear difference between the object's
color and the environment). Unlike the moving edge approach, the error is not geometric in nature, but rather
photometric.
- **Depth** The depth information can be used and compared to the rendered depth. With this feature, the error to
minimize is the point-to-plane distance. The point is sampled from the current depth map, while the plane is computed
from the rendered 3D model, using its distance to the camera and surface normal. The plane is continuously updated
to minimize the distance to the sample depth point.
- **XFeat keypoints** Robust, deep-learning-based keypoints. The error criterion is the same as for the KLT points,
but unlike KLT, XFeat relies on descriptors that are stored in a map in association with 3D points.
These points can be matched across time, even in the presence of large motions.
To define your own features (advanced), see \ref rbt_extension_features.
While you can use all the different features together, here are some common rules of thumb:
- KLT features tend to work better when there is some texture information on the model itself. However, they are also worth a try even on models with little texture.
- Avoid using the KLT tracker by itself. Indeed, the 3D points associated with tracked 2D points are estimated from the model at a given timestep.
If this model is even slightly incorrect, this will result in drift over time. This drift can be compensated by the other features that do not have this 2D-3D correspondence step.
- Similarly, the depth features are ill-suited in two scenarios. First, when tracking an object where depth estimation is complex.
This can be the case when using an infrared based depth camera and tracking a black object that reflects very little light.
The second scenario is when tracking motions that are parallel to the image plane with very low depth variation. An example of this is a cube that is fully facing the camera and that is translated left to right.
In this case, depth information should be combined with other features to track.
- Contour-based features are important for correct pose estimation. Color edges tend to recover from larger motions than the moving edge-based features, but may result in more jitter.
In most cases, it is not interesting to combine the two. One case that can be of interest is to use the color silhouette for large motions, and more points for the moving edges to recover the finer motion.
See the configuration section \ref rbt_tracking_config_silhouette and \ref rbt_tracking_config_features_ccd for information on how to accomplish this.
- Note that the color silhouette features only rely on the outer object silhouette, while the moving edges can also capture and leverage the depth disparities inside the object.
If you have strong depth changes on your object, it can be interesting to use the moving edges with a correct depth threshold.
\subsection rbt_tracking_filtering 3.3. Feature filtering
When tracking, it is possible for occlusions or lighting artifacts to appear. In those cases, considering ill-defined
or ill-matched features may lead to tracking failure. To handle those cases and improve tracking reliability, the RBT
comes with two mechanisms:
- A masking/segmentation approach, that outputs a per-pixel probability map that a given pixel belongs to the object
- Robust error criteria. Each feature defined above uses an M-Estimator (through vpRobust) to reweigh the errors to be
minimized and reduce the influence of outliers.
While M-estimators are always used, the masking approach is entirely optional.
To enable masking, you should call vpRBTracker::setObjectSegmentationMethod() with the method that you wish to use.
To disable it, pass it *nullptr* as input argument.
As of now, the only method available is vpColorHistogramMask.
This method computes time-smoothed histograms of the object and background's color distributions,
and compares their output probabilities for a given pixel color to compute an object membership score.
If available, this method may use camera depth information to update the object histograms using only reliable pixels
in terms of depth (where object and render are closely aligned).
This method is ideal when there is a clear photometric distinction between object and environment.
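As an illustration, the snippet below enables this mask from code. It is only a sketch: the header location and the fact that vpRBTracker::setObjectSegmentationMethod() takes a shared pointer are assumptions, and the mask parameters themselves are typically set through the JSON configuration (see \ref rbt_tracking_config_mask).
\code{.cpp}
#include <memory>
#include <visp3/rbt/vpColorHistogramMask.h>
#include <visp3/rbt/vpRBTracker.h>

vpRBTracker tracker;
// Enable pixel-wise object segmentation with the color histogram method
tracker.setObjectSegmentationMethod(std::make_shared<vpColorHistogramMask>());
// ... later, pass nullptr to disable masking altogether
tracker.setObjectSegmentationMethod(nullptr);
\endcode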
You may define your own approach to object segmentation (by, e.g., using a neural network to compare feature maps or
using more depth information) by inheriting from vpObjectMask.
Note that the use of the probability map output by the segmentation method is implemented differently for each type of
feature. For instance, contour-based methods will search for strong probability differences along the contour normal,
while point-based features may only look in a small neighborhood to check whether a given area belongs to the object.
The diagram below sums up the combination of the masking and outlier rejection steps.
\image html rbt-filtering.jpg Masking is first used to remove features in cases such as strong occlusions. Remaining outliers are filtered through the use of robust estimators. width=50%
\subsection rbt_tracking_optimization 3.4. Minimization problem
To update the camera pose, the RBT implements tracking as a non-linear optimization problem.
The error to be minimized is the difference between features extracted from the 3D model that have been matched with
features in the current images. For each feature, we must compute the visual error as well as its relationship to the
pose (the Jacobian).
The tracking flow is as follows:
- For each tracked feature type
- Extract features from the renders. Potentially use segmentation mask to reject unreliable data
- Match/track features in the image acquired by the camera.
Then, optimization is an iterative process:
- While optimization has not converged or maximum number of iterations has not been reached:
- For each feature
- Update feature representation (extracted from the model) with the new pose \f$^{c}\mathbf{M}_{o}\f$
- Compute feature error
- Compute feature Jacobian
- Compute pose displacement, by performing a weighted combination of the feature errors and Jacobians
- Update \f$^{c}\mathbf{M}_{o}\f$
- Check for convergence.
To influence the importance of each feature type, the user can tweak its weight by calling
vpRBFeatureTracker::setTrackerWeight().
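For instance, a sketch of how the weights could be rebalanced from code is given below; the exact container returned by vpRBTracker::getFeatureTrackers() is an assumption (it is assumed here to be iterable and to hold pointers).
\code{.cpp}
// Assuming "tracker" is a configured vpRBTracker:
// give every enabled feature type the same, fixed importance
for (auto &featureTracker : tracker.getFeatureTrackers()) {
  featureTracker->setTrackerWeight(1.0);
}
\endcode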
There are several options to tweak the behavior of the optimization process, which is based on a Levenberg-Marquardt
approach. See \ref rbt_tracking_config for more information.
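Schematically, each optimization iteration computes a pose update of the usual weighted Levenberg-Marquardt form (a schematic expression, not the literal implementation):
\f[
\mathbf{v} = -\lambda \left(\mathbf{J}^\top \mathbf{W} \mathbf{J} + \mu \, \mathrm{diag}(\mathbf{J}^\top \mathbf{W} \mathbf{J})\right)^{-1} \mathbf{J}^\top \mathbf{W} \mathbf{e}
\f]
where \f$\mathbf{J}\f$ stacks the feature Jacobians, \f$\mathbf{W}\f$ combines the user-specified feature weights with the robust M-estimator weights, \f$\lambda\f$ is the gain, \f$\mu\f$ the regularization term, and the resulting velocity \f$\mathbf{v}\f$ is used to update \f$^{c}\mathbf{M}_{o}\f$.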
\subsection rbt_drift_detection 3.5. Detecting tracking failure
Tracking may of course fail. It is thus important to have some measure of confidence in the tracking process.
If this option is enabled, then the RBT will compute a confidence metric for the computed object pose.
It is then your responsibility to use this score to call for a tracker reinitialization.
To enable drift detection, you should call vpRBTracker::setDriftDetector() with the scoring method that you wish
to use.
The method that is provided by default, vpRBProbabilistic3DDriftDetector, is an online method that compares the
photometric information of the model in the current image with a representation that is learned online.
If available, it will also compare the depth in the render data with the depth from the camera.
This representation is a set of 3D points, for which some color statistics are stored and updated every frame.
This method will output a probability estimate that the model is correctly tracked. The confidence metric is thus between 0 and 1.
This method relies on framing color and depth information as Gaussian distributions. Their spread is controlled by the
user and a larger spread will result in a greater confidence and smoother changes in the confidence score.
In addition, since color estimation is performed online, the variance of a point's color distribution is used as a
factor to weigh its importance with respect to the other samples when computing the final score.
This means that more importance will be given to points that have a stable color (that are in a smooth color region of
the model and that were not too impacted by lighting changes in the previous frames).
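A minimal sketch of enabling this detector from code is shown below; the header path is an assumption, and the detector parameters are normally set through the JSON configuration (see \ref rbt_tracking_config_drift).
\code{.cpp}
#include <memory>
#include <visp3/rbt/vpRBProbabilistic3DDriftDetector.h>
#include <visp3/rbt/vpRBTracker.h>

vpRBTracker tracker;
// Enable confidence estimation with the probabilistic 3D drift detector
tracker.setDriftDetector(std::make_shared<vpRBProbabilistic3DDriftDetector>());
\endcode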
\section rbt_tracking_config 4. Configuration
To ease the use of the RBT, it is strongly recommended to have compiled ViSP with the 3rd-party that enables
JSON support (see \ref soft_tool_json installation instructions). This section describes how to configure the tracker.
\subsection rbt_tracking_config_in_code 4.1. In-code configuration
If you wish to (or must) configure the tracker directly from code, the main methods of interest are listed below, followed by a short example:
- vpRBTracker::setModelPath() to set the 3D model to use.
- vpRBTracker::addTracker() to add a new feature to track.
- vpRBTracker::getFeatureTrackers() to retrieve the currently tracked features. You can configure each feature separately.
- vpRBTracker::setCameraParameters() to set the camera intrinsics and image resolution.
- vpRBTracker::setOdometryMethod() to set the method used to perform visual odometry as a preprocessing step.
- vpRBTracker::setDriftDetector() to set the algorithm used to estimate the tracking confidence.
- vpRBTracker::setObjectSegmentationMethod() to set the pixel wise segmentation computation method.
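The sketch below chains these calls together. Constructor arguments, header locations and the exact argument order of setCameraParameters() are assumptions; check each class documentation before use.
\code{.cpp}
#include <memory>
#include <visp3/core/vpCameraParameters.h>
#include <visp3/rbt/vpRBDenseDepthTracker.h>
#include <visp3/rbt/vpRBSilhouetteMeTracker.h>
#include <visp3/rbt/vpRBTracker.h>

vpRBTracker tracker;
tracker.setModelPath("data/dragon/dragon.bam");

// Intrinsics without distortion (px, py, u0, v0) and the image resolution
vpCameraParameters cam(600.0, 600.0, 424.0, 240.0);
tracker.setCameraParameters(cam, 480, 848); // height and width assumed in this order

// Features to track: silhouette moving edges + dense depth
tracker.addTracker(std::make_shared<vpRBSilhouetteMeTracker>());
tracker.addTracker(std::make_shared<vpRBDenseDepthTracker>());

tracker.startTracking();
\endcode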
\subsection rbt_tracking_config_json 4.2. JSON configuration
To load a json configuration file, you can call vpRBTracker::loadConfigurationFile().
This subsection details the parameters that you can tweak through JSON.
\subsubsection rbt_tracking_config_json_general 4.2.1. General parameters
| Field | Type | Description
| :---- | :---- | :----
| model | string | Optional path to the 3D model of the object. If not present, it should be set using vpRBTracker::setModelPath() before calling vpRBTracker::startTracking()
| displaySilhouette | Boolean | Whether to display the object silhouette from the last rendered frame when calling vpRBTracker::display(). Note that since this is the last rendered frame, it may appear laggy.
| updateRenderThreshold | float | Optional. The motion threshold between the last render and the current object pose above which the object should be rerendered and the renders updated. By default, this metric is specified in meters. This threshold is also used when rerendering after odometry. Set this value to 0 to always perform rendering.
| camera | Dictionary | Optional camera intrinsics. See \ref rbt_tracking_config_camera.
| vvs | Dictionary | Parameters for optimization. See \ref rbt_tracking_config_optimization.
| silhouetteExtractionSettings | Dictionary | Parameters for the contour extraction from the renders. See \ref rbt_tracking_config_silhouette.
| mask | Dictionary | Optional segmentation method configuration. See \ref rbt_tracking_config_mask.
| drift | Dictionary | Optional drift detection method configuration. See \ref rbt_tracking_config_drift.
| features | List | List of features to track. See \ref rbt_tracking_config_features.
\subsubsection rbt_tracking_config_camera 4.2.2. Camera configuration
| Field | Type | Description
| :---- | :---- | :----
| width | int | Image width
| height | int | Image Height
| intrinsics | Dictionary | Camera intrinsics, as detailed below
Intrinsics definition:
| Field | Type | Description
| :---- | :---- | :----
| px | double | Focal length to pixel width ratio
| py | double | Focal length to pixel height ratio
| u0 | double | Principal point position along the horizontal image axis, in pixels
| v0 | double | Principal point position along the vertical image axis, in pixels
| model | string | Should be set to "perspectiveWithoutDistortion"
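For reference, the same intrinsics map directly to a vpCameraParameters object in C++; a short sketch with illustrative values is given below (image width and height are passed separately to the tracker, see \ref rbt_tracking_config_in_code).
\code{.cpp}
#include <visp3/core/vpCameraParameters.h>

// Values corresponding to the "intrinsics" dictionary above (illustrative numbers)
double px = 600.0, py = 600.0, u0 = 424.0, v0 = 240.0;
vpCameraParameters cam;
cam.initPersProjWithoutDistortion(px, py, u0, v0); // "perspectiveWithoutDistortion" model
\endcode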
\subsubsection rbt_tracking_config_optimization 4.2.3. Optimization parameters
Example configuration without defining a stopping criterion for optimization:
\code{.json}
"vvs": {
"gain": 1.0,
"maxIterations": 10,
"mu": 0.0,
"muIterFactor": 0.0
}
\endcode
And with a criterion (stop if the 3D motion between two iterations is below a tenth of a millimeter):
\code{.json}
"vvs": {
"gain": 1.0,
"maxIterations": 10,
"mu": 0.0,
"muIterFactor": 0.0,
"convergenceThreshold": 0.0001
}
\endcode
| Field | Type | Description
| :---- | :---- | :----
| gain | double | Optimization gain. A gain too small may lead to drift issues, while a gain too high may lead to instabilities in pose.
| maxIterations | int | Maximum number of optimization iterations.
| mu | double | Initial amount of regularization for the Levenberg-Marquardt optimizer.
| muIterFactor | double | Scaling factor to the mu parameter applied at every optimization iteration
| convergenceThreshold | double | Motion threshold (in meters) below which the tracking is considered as having converged. This conditions the number of iterations for optimization. Set to 0 or do not define to run optimization for the maximum number of iterations.
\subsubsection rbt_tracking_config_silhouette 4.2.4. Silhouette extraction settings
Example configuration:
\code{.json}
"silhouetteExtractionSettings": {
"threshold": {
"type": "relative",
"value": 0.1
},
"sampling": {
"samplingRate": 1,
"numPoints": 512,
"reusePreviousPoints": true
}
}
\endcode
To control the silhouette processing, there are two main settings: **threshold** and **sampling**.
The threshold settings determine whether a point is a silhouette point or not. Silhouette extraction is based on depth
disparity between neighboring render pixels.
If there is a strong depth disparity, then a point is considered as belonging to the silhouette.
The threshold can be either defined as an absolute value, or relatively to the object's "depth" size (the difference
between the farthest and closest object points in the camera).
| Field | Type | Description
| :---- | :---- | :----
| type | string | Either "relative" or "absolute". If relative, specified as a fraction of the difference between the clipping planes. If absolute, specified in meters.
| value | double | Value of the threshold.
The thresholding parameters are used to determine whether a point belongs to the silhouette, but processing the whole
silhouette may be too cumbersome as there can be thousands of contour pixels.
Thus, we can sample the silhouette to obtain a subset of points. Sampling has a strong impact on the runtime performance
of the tracker.
| Field | Type | Description
| :---- | :---- | :----
| samplingRate | int | This is the step (in pixels) that is used to sample the silhouette map. A value of 1 will examine every pixel. Higher values are susceptible to aliasing artifacts, but may provide more even sampling across the silhouette.
| numPoints | int | Maximum number of points to keep. If set to 0, then all points are kept.
| reusePreviousPoints | boolean | Whether to try and reuse the same silhouette points across tracking iterations. This may help improve stability and reduce jitter.
\subsubsection rbt_tracking_config_mask 4.2.5. Object segmentation method configuration
The use of a segmentation method is defined by the **mask** key in the main configuration dictionary.
Which type of method is used depends on the value of the "type" key of the mask dictionary.
\paragraph rbt_tracking_config_mask_proba 4.2.5.1. vpColorHistogramMask configuration
Example configuration:
\code{.json}
{
"type": "histogram",
"bins": 32,
"objectUpdateRate": 0.1,
"backgroundUpdateRate": 0.1,
"maxDepthError": 0.01,
"computeOnlyOnBoundingBox": false
}
\endcode
| Field | Type | Description
| :---- | :---- | :----
| type | string | Set to "histogram" to use this method.
| bins | int | Should be a power of 2 (e.g., 16 or 32). Determines the number of bins used per color component for the object and background histograms. Fewer bins lead to less noise in the pixel map, but may result in a less confident map.
| objectUpdateRate | float | Object histogram update rate. Should be between 0 and 1. A value of 1 means that only the last frame will be used in the histogram. Values below will keep a memory of the last seen frames.
| backgroundUpdateRate | float | Background histogram update rate. Same as objectUpdateRate.
| maxDepthError | float | When depth information is available, the rendered depth and the measured depth are compared. For a given pixel of the object, if the depth difference is too high, the pixel is used to compute the background histogram instead of the object histogram. This helps alleviate the influence of occlusions on the mask. This parameter controls the maximum tolerated depth difference, expressed in meters.
| computeOnlyOnBoundingBox | boolean | Whether to compute the mask only in the area containing the object. If true, this will save computation time, but may lead to preserving silhouette contours that should not be preserved.
\subsubsection rbt_tracking_config_drift 4.2.6. Drift detection method configuration
\paragraph rbt_tracking_config_drift_proba 4.2.6.1. vpRBProbabilistic3DDriftDetector configuration
Example configuration:
\code{.json}
{
"type": "probabilistic",
"colorUpdateRate": 0.25,
"initialColorSigma": 25.0,
"depthSigma": 0.025,
"filteringMaxDistance": 0.001,
"minDistanceNewPoints": 0.005
}
\endcode
| Field | Type | Description
| :---- | :---- | :----
| type | string | Set to "probabilistic" to use this method.
| colorUpdateRate | float | Update rate for the color statistics of each point on the object's surface.
| initialColorSigma | float | Standard deviation (R,G,B) used to compute the error function between the observed color and the estimated color distribution.
| depthSigma | float | Standard deviation used to compute the error function between the observed pixel depth and rendered depth.
| filteringMaxDistance | float | Maximum tolerated error between the rendered depth and the stored point's distance to the camera, used to check whether a 3D point is visible from the camera. This value is in meters, and should be left to a small value.
| minDistanceNewPoints | float | Each frame, this method tries to sample new 3D points on the object. This parameter controls the sampling density, and ensures that a new candidate point is far enough from all the other stored points from previous iterations.
\subsubsection rbt_tracking_config_features 4.2.7. Feature configuration
The type of feature that is used depends on the "type" value of each feature dictionary.
The "features" object of the main dictionary is a list, as multiple features can be used.
For each feature, these values can be defined:
| Field | Type | Description
| :---- | :---- | :----
| weight | float or vpTemporalWeighting | The importance given by the user to this feature type.
| display | boolean | Whether the features should be displayed when calling vpRBTracker::display().
For weighting, it can be of interest to give certain features more importance at the end of optimization, when the pose should be close to the correct solution.
Indeed, features such as silhouette contours or depth exhibit a limited convergence domain (depending on their settings) but, unlike keypoints, do not suffer from modelling inaccuracies.
As such, they are ideal to correct the small remaining error that keypoints cannot remove.
To do so, you can use a sigmoid-like weighting scheme for such features. This scheme modifies the weight during the optimization (which is run every frame).
The configuration is the following:
| Field | Type | Description
| :---- | :---- | :----
| minWeight | float | The minimum weight given to the feature (either at the start or end of optimization, depending on the increasing flag)
| maxWeight | float | The maximum weight given to the feature (either at the start or end of optimization, depending on the increasing flag)
| midpointLocation | float | The optimization point where the weight reaches the halfway point between minWeight and maxWeight. It is specified as a fraction of the number of iterations (between 0 and 1)
| slopePower | int | How fast the weighting transitions from minWeight to maxWeight
| increasing | boolean | Whether the weight should increase (go from minWeight to maxWeight) or decrease (maxWeight to minWeight)
\image html rbt-temporal-weighting.png Different values for temporal weighting.
Example configurations:
For a feature whose importance does not change during optimization:
\code{.json}
{
"weight": 1.0,
"display": false
}
\endcode
For a feature that is more important at the end of optimization (a feature that has a more local convergence domain):
\code{.json}
{
"weight": {
"minWeight": 0.0,
"maxWeight": 1.0,
"midpointLocation": 0.5,
"slopePower": 4,
"increasing": true
},
"display": false
}
\endcode
And accordingly, a feature whose importance decreases (such as features that have a large convergence domain but that may be inaccurate):
\code{.json}
{
"weight": {
"minWeight": 0.0,
"maxWeight": 1.0,
"midpointLocation": 0.5,
"slopePower": 4,
"increasing": false
},
"display": false
}
\endcode
\paragraph rbt_tracking_config_features_klt 4.2.7.1. vpRBKltTracker
\warning This feature requires that ViSP has been built with OpenCV as a 3rd party (see \ref soft_vision_opencv
installation instructions). Otherwise, loading this JSON configuration will raise an error about encountering an
unknown tracker.
Example configuration:
\code{.json}
{
"type": "klt",
"weight": 1,
"useMask": true,
"minMaskConfidence": 0.5,
"maxReprojectionErrorPixels": 5.0,
"newPointsMinPixelDistance": 4,
"minimumNumPoints": 20,
"blockSize": 5,
"useHarris": true,
"harris": 0.05,
"maxFeatures": 500,
"minDistance": 5.0,
"pyramidLevels": 3,
"quality": 0.01,
"windowSize": 5
}
\endcode
| Field | Type | Description
| :---- | :---- | :----
| useMask | boolean | Whether, if available, the segmentation mask should be used to filter wrong features.
| minMaskConfidence | float | If useMask is true, minimum confidence for a feature to be accepted.
| maxReprojectionErrorPixels | float | Maximum reprojection error, above which a tracked KLT point is rejected for future tracking iterations.
| newPointsMinPixelDistance | float | During KLT resampling, minimum Euclidean distance (in pixels) that a candidate point should have to all other points to be considered for tracking.
| minimumNumPoints | int | Minimum number of points that should be tracked at a given timestep. If the number of points falls below this value, KLT tracking is fully reinitialized.
\note For the other settings, see the vpKltOpencv documentation.
\paragraph rbt_tracking_config_features_me 4.2.7.2. vpRBSilhouetteMeTracker
Example configuration:
\code{.json}
{
"type": "silhouetteMe",
"weight": 1,
"numCandidates": 3,
"useMask": false,
"minMaskConfidence": 0.8,
"movingEdge": {
"maskSize": 7,
"minSampleStep": 1.0,
"mu": [
0.5,
0.5
],
"nMask": 180,
"range": 16,
"sampleStep": 1.0,
"strip": 2,
"thresholdType": "normalized",
"threshold": 20
}
}
\endcode
| Field | Type | Description
| :---- | :---- | :----
| numCandidates | int | Number of edge locations that are preserved as potential matches for a given silhouette point.
| useMask | boolean | Whether, if available, the segmentation mask should be used to filter wrong features. Here, the segmentation mask is examined along the silhouette normal to find a disparity in the probability mask (marking a sharp distinction between object and background).
| minMaskConfidence | float | If useMask is true, minimum confidence disparity for a feature to be accepted.
| movingEdge | dictionary | Moving edge tracker settings. See vpMe for more information.
\paragraph rbt_tracking_config_features_ccd 4.2.7.3. vpRBSilhouetteCCDTracker
Example configuration for a case with small motions:
\code{.json}
{
"type": "silhouetteColor",
"weight": 0.1,
"useMask": true,
"minMaskConfidence": 0.8,
"ccd": {
"h": 16,
"delta_h": 1,
"min_h": 16
}
}
\endcode
Example configuration to handle larger motions:
\code{.json}
{
"type": "silhouetteColor",
"weight": 0.1,
"useMask": true,
"minMaskConfidence": 0.8,
"ccd": {
"h": 64,
"delta_h": 4,
"min_h": 16
}
}
\endcode
| Field | Type | Description
| :---- | :---- | :----
| useMask | boolean | Whether, if available, the segmentation mask should be used to filter wrong features. Here, the segmentation mask is examined along the silhouette normal to find a disparity in the probability mask (marking a sharp distinction between object and background).
| minMaskConfidence | float | If useMask is true, minimum confidence disparity for a feature to be accepted.
| maxNumPoints | int | Maximum number of silhouette points to keep for this tracker after resampling. This allows using many points for the ME-based tracker while retaining fewer points for this, more expensive, tracker. Leave at 0 to not perform resampling.
| temporalSmoothing | float | Interpolation factor, between 0 and 1. This allows blending a contour point's current color statistics with those computed on the previous frame. This may improve stability, but may also dampen the estimated motion.
| ccd | dictionary | Contracting curve density algorithm settings. See below.
To configure the actual CCD parameters, you can modify:
| Field | Type | Description
| :---- | :---- | :----
| h | int | The initial contour search area. This size (in pixels) defines the extent of the contour normal along which the color statistics are computed, and it strongly impacts the convergence domain of the tracker.
| delta_h | int | Subsampling step when examining the contour normal. This helps diminish the tracking time when `h` is large. Using a high value (relative to `h`) can lead to instabilities.
| min_h | int | Parameter to set if you wish to use an adaptive scaling for the silhouette search area. If set, the initial search area will be set to `h`. Then, at each optimization iteration, if motion is smaller than a multiple of `h`, then `h` is halved, down to a minimum value of `min_h`. Note that if `delta_h` is set, then it is also halved in order to obtain more reliable statistics.
\paragraph rbt_tracking_config_features_depth 4.2.7.4. vpRBDenseDepthTracker
Example configuration:
\code{.json}
{
"type": "depth",
"weight": 1,
"useMask": true,
"minMaskConfidence": 0.5,
"step": 2,
"display": false
},
\endcode
| Field | Type | Description
| :---- | :---- | :----
| useMask | boolean | Whether, if available, the segmentation mask should be used to filter wrong features.
| minMaskConfidence | float | If useMask is true, minimum confidence for a feature to be accepted.
| step | int | Step in pixels to sample the render map. A low value for this parameter will lead to a larger number of depth features. As this step is used when iterating on rows and columns of the render images, the number of depth features will quickly decrease as the step grows.
\subsection rbt_python_options 4.3. Python-specific configuration
\warning The features defined below are only available when using the Python bindings. To install and use the bindings see \ref tutorial-install-python-bindings.
When using the Python bindings, more options are available. These are defined in `modules/python/bindings/visp/python/rbt`
You can load Python features from a JSON configuration.
To do so, you can place the Python configuration in a JSON object under the "python_ext" key.
Your config file should look like:
\code{.json}
{
"features": [ // CPP features
...
],
...,
"python_ext": {
"features": [
...
],
"odometry": {
...
}
}
}
\endcode
For more information see \ref modules/python/bindings/visp/python/rbt/PythonRBExtensions.py
An example of RBT using Python is available under modules/python/examples/realsense-rbt.py
\subsubsection rbt_python_features 4.3.1. XFeat-based options
We provide a wrapper around the XFeat keypoints. XFeat keypoints are robust and allow for reliable matching,
even in the presence of large motion blur and image differences. They are far more powerful than the KLT-based features.
XFeat keypoints are matched with 3D points that are stored in a map (see vpPointMap). This map contains either object points (expressed in the object frame)
or points expressed in the environment frame. When a new image arrives, keypoints are extracted and then matched with the points in the map.
The correspondences can then be used to minimize a visual error to estimate the object or camera motion. The visual error can be 2D (classical IBVS) or 3D (point minimization in metric space).
After minimization, incorrect matches are detected and removed from the 3D point map, and new points are added.
\warning As XFeat is based on deep learning, having a GPU is preferable. Running XFeat on the CPU is still possible.
We run XFeat on the whole image, and the keypoints can be used both for the object tracking and camera motion estimation via visual odometry.
To use XFeat-based features, the Python settings should contain the XFeat feature extraction settings:
\code{.json}
"xfeat": {
"numPoints": 8192,
"minCos": 0.95,
"useDense": false,
"onlyOnBB": false,
"minObjMaskValue": 0.5,
"minSilhouetteDistanceEnvPoint": 20.0,
"minSilhouetteDistanceObjectPoint": 1.0
},
\endcode
| Field | Type | Description
| :---- | :---- | :----
| numPoints | int | The maximum number of points that are extracted in the frame
| minCos | float | The minimum matching score for two keypoints (extracted from two different images) to be matched. Use a high value (e.g. 0.9) to minimize the number of matches.
| useDense | boolean | Whether to use dense keypoint extraction and matching (Xfeat*)
| onlyOnBB | boolean | Whether to run keypoint extraction only in the object region of the image. Do not use this option if you wish to use odometry.
| minObjMaskValue | float | Minimum mask value (if the mask is enabled) for a keypoint to be considered as belonging to the object. If the mask value is below this threshold, it is considered an environment keypoint (that may be used for odometry).
| minSilhouetteDistanceEnvPoint | float | Distance in pixels that an environment keypoint should have to the object's silhouette for it to be a valid keypoint that can be used for odometry. Useful to reject keypoints that could potentially belong to the object in scenarios where large motions happen.
| minSilhouetteDistanceObjectPoint | float | Distance in pixels that an object keypoint should have to the object's silhouette for it to be a valid keypoint that can be used to estimate object motion. Useful to only consider keypoints that lie in the textured part of the object and not on its contour, where keypoints may rely on the background and thus provide unreliable matches.
To enable XFeat feature for object tracking, you should add the following to the "python_ext" object configuration:
\code{.json}
"features": [
{
"type": "xfeat",
"weight": 1.0,
"use_3d": false,
"display": true,
"numPoints": 8192,
"reprojectionThreshold": 5,
"minDistNewPoint": 0.0,
"maxDepthErrorVisible": 0.01,
"maxDepthErrorCandidate": 0.005
}
]
\endcode
| Field | Type | Description
| :---- | :---- | :----
| use_3d | boolean | Whether to use 2D error criterion or 3D (requires RGBD camera)
| numPoints | int | The maximum number of 3D points to store in the object map. These points are those considered for matching with the current object keypoints.
| reprojectionThreshold | float | The maximum reprojection error for a 3D keypoint at the end of a tracking iteration before it is considered as an outlier and it is removed from the map.
| minDistNewPoint | float | Minimum distance (in meters) that a new point should have to all the points currently stored in the map for it to be considered as a valid candidate for addition.
| maxDepthErrorVisible | float | Maximum depth error (in meters) between the rendered depth and the stored 3D point depth before this point is considered as not visible from the current viewpoint. Mainly used to filter self-occlusion.
| maxDepthErrorCandidate | float | Maximum depth error (in meters) before a new keypoint addition is rejected. Used to filter points that do not belong to the object or for which the actual 3D error is too great.
To perform odometry using XFeat, you can add these settings:
\code{.json}
"odometry": {
"type": "xfeat",
"use3d": false,
"numPoints": 8192,
"reprojectionThreshold": 10.0,
"minDistNewPoint": 0.0,
"maxDepthErrorVisible": 0.02,
"maxDepthErrorCandidate": 0.0,
"gain": 0.5,
"maxNumIters": 20,
"muInit": 0.0,
"muIterFactor": 0.1,
"minImprovementFactor": 0.001
}
\endcode
| Field | Type | Description
| :---- | :---- | :----
| use_3d | boolean | Whether to use 2D error criterion or 3D
| numPoints | int | The maximum number of 3D points to store in the object map. These points are those considered for matching with the current object keypoints.
| reprojectionThreshold | float | The maximum reprojection error for a 3D keypoint at the end of a tracking iteration before it is considered as an outlier and it is removed from the map.
| minDistNewPoint | float | Minimum distance (in meters) that a new point should have to all the points currently stored in the map for it to be considered as a valid candidate for addition.
| maxDepthErrorVisible | float | Maximum depth error (in meters) between the actual camera depth and the stored 3D point depth before this point is considered as not visible from the current viewpoint. Used to filter occlusions.
| maxDepthErrorCandidate | float | Maximum depth error (in meters) before a new keypoint addition is rejected.
\warning XFeat odometry requires an RGBD camera.
\section rbt_tracking_usage 5. Tracker usage
\subsection rbt_tracking_usage_sequence 5.1. On prerecorded sequences
In order to test the tracker, we provide a prerecorded sequence of images named `dragon.mp4` and the dragon object,
whose CAD model file `dragon.bam` is provided in the ViSP source code.
The result of tracking the dragon can be obtained by going to the build folder:
\code{.sh}
~/visp-build $ cd tutorial/tracking/render-based
\endcode
then launching:
\code{.sh}
$ ./tutorial-rbt-sequence --color data/dragon/dragon.mp4 --tracker data/dragon/dragon.json
\endcode
Note that:
- The full code to track an offline, prerecorded sequence is in tutorial-rbt-sequence.cpp
- Parsing and display utils are given in render-based-tutorial-utils.h
More explanation is provided in the next section.
\subsection rbt_tracking_usage_realsense 5.2. With a realsense camera
We will now focus on another example usage of the tracker, using a RealSense camera.
Note that:
- The full code for this realsense-based tutorial is available in tutorial-rbt-realsense.cpp
- Parsing and display utils are given in render-based-tutorial-utils.h
To run the program, start by placing yourself in the ViSP build directory.
Assuming you have compiled and built the tutorials, navigate to the tutorial folder:
\verbatim
~/visp-build $ cd tutorial/tracking/render-based
\endverbatim
You can then examine the command line arguments with:
\code{.sh}
~/visp-build/tutorial/tracking/render-based $ ./tutorial-rbt-realsense -h
\endcode
Which outputs:
\verbatim
Program description: Tutorial showing the usage of the Render-Based tracker with a RealSense camera
Arguments:
--config Path to the JSON configuration file. Values in this files are loaded, and can be overridden by command line arguments.
Optional
--debug-display Enable additional displays from the renderer
Default: false
--fps Realsense requested framerate
Default: 60
Optional
--height Realsense requested image height
Default: 480
Optional
--init-file Path to the JSON file containing the 2D/3D correspondences for initialization by click
Default: ""
Optional
--max-depth-display Maximum depth value, used to scale the depth display
Default: 1.0
Optional
--no-display Disable display windows
Default: true
--object Name of the object to track. Used to potentially fetch the init file
Default: ""
Optional
--plot-cov Plot the pose covariance trace for each feature
Default: false
--plot-divergence Plot the metrics associated to the divergence threshold computation
Default: false
--plot-pose Plot the pose of the object in the camera frame
Default: false
--plot-position Plot the position of the object in a 3d figure
Default: false
--pose Initial pose of the object in the camera frame.
Default: []
Optional
--profile Enable the use of Pstats to profile rendering times
Default: false
--save Whether to save experiment data
Default: false
--save-path Where to save the experiment log. The folder should not exist.
Default: ""
Optional
--save-video Whether to save the video
Default: false
--tracker Path to the JSON file containing the tracker
Default: ""
Optional
--video-framerate Output video framerate
Default: 30
Optional
--width Realsense requested image width
Default: 848
Optional
Example JSON configuration file:
{
"--debug-display": false,
"--fps": 60,
"--height": 480,
"--init-file": "",
"--max-depth-display": 1.0,
"--no-display": true,
"--object": "",
"--plot-cov": false,
"--plot-divergence": false,
"--plot-pose": false,
"--plot-position": false,
"--pose": [],
"--profile": false,
"--save": false,
"--save-path": "",
"--save-video": false,
"--tracker": "",
"--video-framerate": 30,
"--width": 848
}
\endverbatim
You can see that among those arguments, the program has two main parameters.
The first, `--tracker`, is the path to the .json configuration file, written following the details given in the
\ref rbt_tracking_config section.
The second, `--object`, is the path to the 3D model (in a format readable by the Panda3D engine).
Alternatively, the model can be specified in the .json file.
Here, we will run the tutorial with the following arguments
\verbatim
~/visp-build/tutorial/tracking/render-based $ ./tutorial-rbt-realsense \
--tracker data/dragon/dragon_realsense.json \
--object data/dragon/dragon.bam \
--init-file data/dragon/dragon.init
\endverbatim
The file `dragon_realsense.json` contains the following configuration:
\include data/dragon/dragon_realsense.json
\warning Note that to perform your own tests, you will have to rebuild tutorial-rbt-realsense every time you make
a change to the .json file located in the source directory, so that the changes are reflected in the build directory.
Let's now have a look at the tutorial code that will perform the tracking. We recall that the full code is given
in tutorial-rbt-realsense.cpp.
The code first starts by parsing the command line arguments, detailed above, using structures defined in
render-based-tutorial-utils.h:
\snippet tutorial-rbt-realsense.cpp Command line parsing
We start by declaring an instance of the tracker and loading the given JSON configuration file.
\snippet tutorial-rbt-realsense.cpp Loading config
Depending on the configuration, the call to vpRBTracker::loadConfigurationFile() may raise an error.
For instance, the KLT tracker requires OpenCV: if you did not compile ViSP with OpenCV as a third party,
an error will be raised.
If successful, we then open a connection to the RealSense camera, requesting a specific configuration.
This configuration depends on the input arguments (depth, width and framerate).
\snippet tutorial-rbt-realsense.cpp Realsense opening
If the configuration is available, we can proceed and update the tracker with a new, more relevant configuration.
Indeed, while camera intrinsics and image resolution can be declared in the configuration file,
using the intrinsics provided by the RealSense is more accurate and ensures that there is no configuration mismatch.
We then call vpRBTracker::startTracking() to initialize the renderer, load the 3D model and perform other setup-related operations.
\snippet tutorial-rbt-realsense.cpp Tracker update
\warning It is at this point that some errors related to an incorrect intrinsics configuration may be raised.
Additionally, if you do not have an OpenGL-capable device or if Panda3D cannot open a window, an error may be raised.
If so, please see \ref rbt_troubleshooting.
We then proceed and declare the different images that will be needed by the tutorial.
Some images contain raw information and will be fed to the tracker.
Others will be used for display purposes.
\snippet tutorial-rbt-realsense.cpp Images
We then create the different display windows with
\snippet tutorial-rbt-realsense.cpp Create displays
Once the setup is finished, we wait for the user to set up their environment and position the object (here, the dragon) in
front of the camera.
\snippet tutorial-rbt-realsense.cpp Wait before init
When the user has clicked the display window, initialization can be performed.
\snippet tutorial-rbt-realsense.cpp Init
If the initial pose is given as a command line argument, tracking can directly proceed.
Otherwise, the user must click on different pixel positions that correspond to known 3D points, defined in an
initialization file located next to the configuration file or the mesh.
In this case, the initialization file is located at `data/dragon/dragon.init` and contains:
\include data/dragon/dragon.init
Once initialization has been performed and a starting pose given, we can enter the tracking loop.
At every iteration, the same simple steps are performed:
We start by acquiring the most recent color and depth frames.
\snippet tutorial-rbt-realsense.cpp Image acquisition
The alignment parameter ensures that the depth map is aligned with the color frame, as required by the RBT.
The call to updateDepth() converts the raw uint16_t representation into a depth map expressed in meters.
We also convert the RGB image to a grayscale image using vpImageConvert::convert().
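As an illustration of what such a conversion involves, here is a hedged sketch; the actual helper lives in
render-based-tutorial-utils.h and may differ in its details. The depth scale is queried from the sensor:
\code{.cpp}
// Convert the raw 16-bit depth values to meters, using the sensor-specific depth scale
float depthScale = rs.getDepthScale();
IDepth.resize(IDepthRaw.getHeight(), IDepthRaw.getWidth());
for (unsigned int i = 0; i < IDepthRaw.getHeight(); ++i) {
  for (unsigned int j = 0; j < IDepthRaw.getWidth(); ++j) {
    IDepth[i][j] = depthScale * static_cast<float>(IDepthRaw[i][j]);
  }
}
// Grayscale conversion of the color frame
vpImageConvert::convert(IColor, I);
\endcode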
We can then call the tracker with the different frames:
\snippet tutorial-rbt-realsense.cpp Call to tracker
As an output, we get a vpRBTrackingResult object.
This object contains information about what the tracker has done, the number of features that were used, etc.
Most importantly, it contains the reason why the tracking call returned.
In the following snippet:
\snippet tutorial-rbt-realsense.cpp Result parsing
We use this information to check whether:
- an exception was raised,
- the object is visible in the image (if not, no information can be extracted from the render),
- enough features are available to compute pose updates,
- the tracking has converged to a given pose (i.e., the motion during optimization has become small enough to be
considered as noise),
- or the tracking has used the maximum allocated number of optimization iterations.
In the first three cases, tracking should be reinitialized.
After that, we check the tracking confidence score (if it has been computed) and emit a warning if it is low.
In a more complete application, the pipeline would include an automatic reinitialization procedure.
We then display the current frame and overlay the tracking information:
\snippet tutorial-rbt-realsense.cpp Display
On the grayscale image, you can see:
- If enabled, the moving edge contours. Valid edges are seen in green, while suppressed ones are purple.
- If enabled, the KLT tracked points, represented as red dots on the object.
On the color image, you can see:
- The pose of the object in the camera frame, represented as a 3-axis basis
- If enabled, the contours used by the CCD tracker.
The display also contains the depth image:
- By default, depth display is disabled as it incurs a large performance penalty and may reduce framerate.
You can enable it by adding `"display": true` to the "depth" feature object of the tracker configuration.
Finally, you should also see a black and white image, which is the pixel-wise segmentation mask used to filter
out features. If no segmentation method is used, the image should be black.
\image html rbt-full-display.jpg Example displays when tracking with the realsense camera and all features enabled. width=75%
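If you want to reproduce part of this display in your own application, the pose overlay essentially boils down to a
vpDisplay::displayFrame() call. The sketch below is illustrative only; the getPose() accessor is an assumption about
the vpRBTracker interface, and the exact calls made by the tutorial are those in the Display snippet above:
\code{.cpp}
vpDisplay::display(IColor);
vpHomogeneousMatrix cMo;
tracker.getPose(cMo); // assumption: accessor returning the current object pose in the camera frame
// Draw the estimated pose as a 3-axis frame with 5 cm long axes
vpDisplay::displayFrame(IColor, cMo, cam, 0.05, vpColor::none, 2);
vpDisplay::flush(IColor);
\endcode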
We also log timing details in the console.
Running the tutorial with the **--verbose** flag will print timings for each part of the RBT pipeline.
\snippet tutorial-rbt-realsense.cpp Logging
When the user performs a left click, we exit the tracking loop and perform a small cleanup:
\snippet tutorial-rbt-realsense.cpp Cleanup
\section rbt_troubleshooting 6. Troubleshooting issues
\subsection rbt_slow 6.1. RBT performance is poor and CPU cores are not used
One possible reason for this is that we make heavy use of OpenMP to parallelize and split the workload.
An invalid OpenMP configuration may lead to a substantial loss in performance.
On Ubuntu 24.04, we have observed with the GNU OpenMP implementation (used when compiling with gcc/g++) that
the default settings lead to a slowdown by a factor of 4 to 5.
This is due to an active waiting policy. We recommend using the following OpenMP setting:
\code{.sh}
$ export GOMP_SPINCOUNT=0
\endcode
Note that this setting only persists for the lifetime of your terminal session; you may have to set it again the next
time you start the tracker, or add the export line to your shell startup file (e.g. `~/.bashrc`).
\subsection rbt_panda_window 6.2. Cannot open window (hidden or visible)
This error can appear when calling vpRBTracker::startTracking().
It may be useful to check the backend used by Panda3D, which should be displayed when you start the tutorial:
\verbatim
Known pipe types:
glxGraphicsPipe
\endverbatim
You can also get more information about your OpenGL support.
On Ubuntu, you can use:
\verbatim
$ glxinfo | grep version
server glx version string: 1.4
client glx version string: 1.4
GLX version: 1.4
OpenGL core profile version string: 4.6.0 NVIDIA 550.144.03
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL version string: 4.6.0 NVIDIA 550.144.03
OpenGL shading language version string: 4.60 NVIDIA
OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 550.144.03
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
GL_EXT_shader_group_vote, GL_EXT_shader_implicit_conversions,
\endverbatim
If you do not have something similar, it may be that your platform has no OpenGL support.
\warning It is possible that your hardware has OpenGL support, but does not support **headless rendering**, i.e.,
rendering without a screen plugged into the GPU.
In this case, you should either try to use the **p3headlessgl** backend or use a virtual screen emulator.
- To use **p3headlessgl**, you should first visit
[this Panda3D documentation link](https://docs.panda3d.org/1.10/python/programming/configuration/configuring-panda3d).
Then you should edit the **Config.prc** file to set "load-display" to "p3headlessgl" (see the example below). On Ubuntu,
Config.prc is located at `/etc/Config.prc`. With Conda, it will be in `$CONDA_PREFIX/etc/Config.prc`.
- On Linux distributions, you can use `xvfb` to emulate an X11 server to render to, as shown below.
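For reference, selecting the headless backend amounts to a single line in **Config.prc**:
\verbatim
load-display p3headlessgl
\endverbatim
If you go the virtual screen route instead, the tutorial can for instance be launched through `xvfb-run`:
\verbatim
$ xvfb-run -a ./tutorial-rbt-realsense --tracker data/dragon/dragon_realsense.json \
    --object data/dragon/dragon.bam --init-file data/dragon/dragon.init
\endverbatim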
\warning Note that the RBT uses shaders that are compiled for OpenGL core 3.3.
\section rbt_extension 7. Extending the RBT
\warning This section is under construction and will be improved in the future!
\subsection rbt_extension_features 7.1. Defining your own features
\subsection rbt_extension_factory_pattern 7.2. Registering your own component for JSON parsing
As seen in \ref rbt_tracking_config, the different components of the RBT pipeline can be parsed from a JSON
configuration file. This requires a ViSP version that is built with the nlohmann::json third party
(see \ref rbt_tracking_install_requirements).
This functionality is implemented using a factory pattern, as defined in vpDynamicFactory.
For your own class to be loadable, you should call vpDynamicFactory::registerType() with a lambda function that defines
how a JSON object is used to configure an object of your type.
For instance, in the vpRBDriftDetectorFactory, an object of type vpRBProbabilistic3DDriftDetector is registered as:
\code{.cpp}
registerType("probabilistic", [](const nlohmann::json &j) {
std::shared_ptr<vpRBProbabilistic3DDriftDetector> p(new vpRBProbabilistic3DDriftDetector());
p->loadJsonConfiguration(j);
return p;
});
\endcode
This means that when parsing the JSON object's drift field, if the **"type"** field is equal to "probabilistic",
then this function will be called and the JSON object will be used to build a vpRBProbabilistic3DDriftDetector.
This pattern holds for the different types of components:
- vpRBFeatureTrackerFactory to register subtypes of vpRBFeatureTracker
- vpRBDriftDetectorFactory to register subtypes of vpRBDriftDetector
- vpObjectMaskFactory to register subtypes of vpObjectMask
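As a hedged illustration of what registering your own component could look like, the sketch below defines a hypothetical
drift detector and registers it under a custom type name. The interface methods of vpRBDriftDetector are omitted for
brevity, and the getFactory() singleton accessor used to retrieve the factory instance is an assumption to adapt to your
ViSP version:
\code{.cpp}
class MyDriftDetector : public vpRBDriftDetector
{
public:
  void loadJsonConfiguration(const nlohmann::json &j)
  {
    m_threshold = j.value("threshold", 0.5); // hypothetical parameter read from the JSON object
  }
  // ... overrides of the vpRBDriftDetector interface go here ...
private:
  double m_threshold;
};

void registerMyDriftDetector()
{
  // Assumption: the factory instance is retrieved through a singleton accessor
  vpRBDriftDetectorFactory &factory = vpRBDriftDetectorFactory::getFactory();
  factory.registerType("myDriftDetector", [](const nlohmann::json &j) {
    std::shared_ptr<MyDriftDetector> p = std::make_shared<MyDriftDetector>();
    p->loadJsonConfiguration(j);
    return p;
  });
}
\endcode
Once such a registration has been performed (before the configuration file is loaded), a "drift" entry with
**"type": "myDriftDetector"** in the JSON file would be resolved to an instance of MyDriftDetector.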
*/