File: tutorial-tracking-rbt.dox

/**

\page tutorial-tracking-rbt Tutorial: Tracking with the Render-Based Tracker
\tableofcontents

\section rbt_tracking_intro 1. Introduction

In ViSP 3.7, we are introducing a new Render-Based Tracker (RBT), which leverages rendering on the GPU to extract
geometric information and perform online, continuous 6D pose estimation of a known 3D object.

It is an extension of the Model-Based Tracker (MBT) already present in ViSP (see \ref tutorial-tracking-mb-generic).
This implementation is derived from the work of \cite Petit14a.

A major advantage over the Model-Based Tracker is that the RBT uses a common mesh representation for the object to
track. This representation does not require additional preprocessing for feature selection and is widely supported in
software such as Blender.
It can also be obtained through photogrammetry processes, minimizing the work required before using the tracker.
Note that we do not consider deformable objects, only non-articulated rigid objects.

Similar to the MBT, tracking is framed as an optimization problem, where the difference between the features
(geometric or photometric) extracted from a 3D model and those observed in the image must be minimized.

The RBT is built as a modular pipeline, see \ref rbt_tracking_overview. In practice, this means:

- It can leverage both RGB and depth information (see \ref rbt_tracking_requirements)
- It features a complete configuration for every part of the tracking pipeline (see \ref rbt_tracking_config)
- Most components can be disabled or adapted to different use cases and scenarios
- Components can be extended: You can add new features or filtering methods (see \ref rbt_extension).

\section rbt_tracking_requirements 2. Requirements

\subsection rbt_tracking_install_requirements 2.1. Building the RBT module

To successfully build the RBT module, you will need:

- To compile with a C++ standard of at least C++11 (defined in the `USE_CXX_STANDARD` CMake variable)
- The Panda3D 3rd-party, which is used for 3D rendering. See \ref tutorial-panda3d-install for more information on how
 to install Panda3D on your system. Panda3D is also available through Conda if you are compiling in a virtual
 environment.

Additionally, some optional dependencies are strongly recommended:
- `nlohmann::json` (see \ref install_ubuntu_3rdparty_other for installation instructions) to load configuration files
 and save your tracking results. Without it, the tracker setup will have to be done through code.
- OpenCV, if you wish to use the `vpRBKltTracker`, which uses KLT feature tracking for pose estimation.

\subsection rbt_tracking_requirements_general 2.2. General requirements

To use the RBT you will need several things:

- An OpenGL or DirectX enabled device, capable of performing 3D rendering.
 You can use software acceleration, although having a GPU is preferable for performance reasons
- A camera with known intrinsic parameters. For calibration see \ref tutorial-calibration-intrinsic. \warning The camera
 intrinsics should follow a model without distortion.
- If you are using an RGB-D camera, the depth image should be aligned with the RGB image. Some SDKs provide this
 functionality. For instance, the wrapper around the RealSense SDK accepts an "align" parameter in the
 vpRealSense2::acquire() function. When correctly set, the alignment will automatically be performed for you.
- A 3D model of the object, in a format that is supported by Panda3D (see \ref tutorial-panda3d-file-conversion).
 If you have installed *libassimp-dev* or are using the conda package, Panda3D supports common formats such as
 *.ply*, *.obj*. Otherwise, you will have to convert your mesh using the previously linked method.

\subsection rbt_tracking_requirements_3d_model 2.3. 3D model considerations

There are very few restrictions on the 3D models that can be used with the RBT.

First, your 3D model's size should be expressed in meters. Be aware that some CAD software export models in millimeters.

If you are using the initialization by click, you will have to be careful with the model orientation when exporting
(see below).

Note that while 3D meshes support textures, the presently available RBT features do not use the texture information.

To correctly prepare your model, here is a small overview of the steps to follow in Blender.

\subsubsection rbt_tracking_requirements_model_preparation 2.3.1. Preparing your model in Blender

To make sure your model is correct, you should start by setting the scale and orientation of your 3D model.

You should first import your model in Blender. In the top left corner of the Blender window, click on
**File > Import > (Your model type)**, then pick your object in the popup window.

Your model should have been imported and should be visible in the 3D viewer. You should then click on it and press
**N** to bring up the Transform panel in the top right corner. Then you should press **Ctrl+A** to open the **Apply**
menu and click on the **All transforms** item.
When you do this, you will not see any change applied to the model itself, but it ensures that what you see in Blender
is what the tracker observes, and that both use the same orientation and scale.

\image html rbt-blender-apply-transforms.png "Apply transforms to clear any rotation and scale difference stored in Blender."

If you are using initialization by click as described in \ref mb_generic_init_user section, you can select the 3D
points to click directly in Blender. To do so, first click on your object and press *tab* to go into Edit mode. Then,
press *N* to display the **Transform** Panel (top right of the viewer). Choose a 3D point to use for initialization,
then click on it. In the **Transform** panel, you should then see the XYZ coordinates of the point, which you can copy
into the init file. Be sure to select the "Global" frame in the transform panel.

\image html rbt-click-init-blender-panel.png "The transform panel with a selected 3D point in Edit mode."

Once you have selected your points, make sure to export your model in the same frame as the Blender frame.
You can export your 3D model using **File > Export > (Supported 3D format such as .obj)** and save it in
the desired location. In the export panel, set "Forward Axis" to **-Z** and "Up Axis" to **Y**.
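
For reference, assuming four points were selected this way, the resulting initialization file (the `.init` format used for initialization by click, described in \ref mb_generic_init_user) could look as follows. The coordinates below are purely illustrative:

\code
# Number of 3D points used for initialization by click
4
# 3D point coordinates (X Y Z), in meters, in the object frame
 0.02  0.05  0.00
-0.02  0.05  0.00
-0.02 -0.05  0.00
 0.02 -0.05  0.00
\endcode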

With all the prerequisites met, you can now understand (if you wish) how the tracker works and how to customize it for
your needs.

\section rbt_tracking_overview 3. Algorithm overview

This section of the tutorial details how the tracker works.

The RBT is used by building a vpRBTracker object, configuring it and then feeding it RGB or RGB-D frames.

As the RBT is a tracking algorithm, an initial pose must be provided. Given an initial pose, the RBT will continuously
track an object in a sequence of frames. This also assumes that the motion between consecutive frames is small enough
that the tracker can retrieve the features and correctly update the pose of the object in the camera frame.
Otherwise, there is a risk of divergence, leading to a need for reinitialization.
This can be done via click (see \ref mb_generic_init_user section and the vpRBTracker::initClick() method).
Otherwise, you can use a pose estimation algorithm (e.g. Megapose described in \ref tutorial-tracking-megapose and
vpMegaPose::estimatePoses() method) followed by a call to vpRBTracker::setPose().


Roughly the algorithm works as follows:

1. Given an initial pose of the object in the camera frame \f$^{c}\mathbf{T}_{o}\f$, generate a render of the object at
 this pose.
2. Extract and process render data, and store it into a vpRBRenderData object.
3. From the renders, seek features to match with in the current RGB and depth images.
4. From the match between image and model features, formulate an error \f$\mathbf{e}\f$.
5. Iteratively minimize \f$\Vert\mathbf{e}\Vert_2\f$ by updating \f$^{c}\mathbf{T}_{o}\f$.


\subsection rbt_tracking_rendering 3.1. Rendering

Object rendering is done through Panda3D. The object is rendered at the last computed pose.

All the information derived from the render is stored into a vpRBRenderData, in the current frame data
(vpRBFeatureTrackerInput). This object contains information such as:

- The 3D surface normals, expressed as 3D unit vectors
- The depth map, in meters (Rendered using vpPanda3DGeometryRenderer)
- The object silhouette (extracted using vpPanda3DDepthCannyFilter and vpRBTracker::extractSilhouettePoints)
- The image area containing the object (represented by a vpRect)
- The pose at which the object was rendered: when implementing new features or algorithms, it is preferable to use
 this pose when extracting data from the render, rather than the poses passed as arguments to the methods you
 override.

\image html rbt-rendering.png "Rendering is used to extract geometric information" width=50%

For more information on how rendering is performed you can examine the following sources:
- vpRBTracker::updateRender
- vpRBRenderData
- vpObjectCentricRenderer
- vpPanda3DRendererSet
- vpPanda3DBaseRenderer

\subsection rbt_tracking_features 3.2. Trackable features

This section details the different features that can be used to track the object.
The RBT is flexible, and each feature can be added or removed, depending on your scenario and available sensor.

For each feature, a weighting function is computed at every iteration, and a user specified weight can be used to
consider certain features as more or less important than others.

All features inherit from a base class, vpRBFeatureTracker, that defines the different functionalities that should be
implemented for a feature to be trackable.

To add or interact with the features of the RBT, you can use vpRBTracker::addTracker() and
vpRBTracker::getFeatureTrackers().

\subsubsection rbt_tracking_available_features 3.2.1. Feature list

| Feature | Class | Required inputs | Note
| :----- | :----- | :----: | ----:
| KLT 2D points | vpRBKltTracker | Grayscale image | Requires OpenCV
| Silhouette moving edges | vpRBSilhouetteMeTracker | Grayscale image | -
| Silhouette color edges | vpRBSilhouetteCCDTracker | Color image | More expensive than silhouette moving edge based tracker
| Depth map (point-to-plane) | vpRBDenseDepthTracker | Depth image | -
| XFeat keypoints  | RBXFeatFeatureTracker | Color image | Only available in Python


- **KLT points** Rely on the vpKltOpencv class to extract and track 2D KLT points from the current image, which are
 then associated with points on the 3D model. The error to minimize is then the reprojection error between the 2D
 points in the current image and the reprojection of their associated 3D points.

- **Silhouette moving edges** Rely on the rendering to first extract the object depth disparities, which are
 hypothesized to create disparities in the luminance image as well. For each point of the depth contour, the most
 likely edge in its neighborhood in the luminance map is detected using the moving edge framework.

- **Silhouette Color edges** Similarly to the silhouette moving edges, the depth disparity is first used to extract
 the outer silhouette contours. Then, based on the Contracting Curve Density algorithm, color statistics are computed
 inside and outside the silhouette and a per pixel error is computed. This error is minimized when the difference
 between the inner and outer color distributions are maximized (i.e, there is a clear difference between the object's
 color and the environment). Unlike the moving edge approach, the error is not geometric in nature, but rather
 photometric.

- **Depth** The depth information can be used and compared to the rendered depth. With this feature, the error to
 minimize is the point-to-plane distance. The point is sampled from the current depth map, while the plane is computed
 from the rendered 3D model, using its distance to the camera and surface normal. The plane is continuously updated
 to minimize the distance to the sample depth point.

- **XFeat keypoints** Robust, deep-learning-based keypoints. The error criterion is the same as for the KLT points,
 but unlike KLT, XFeat relies on descriptors that are stored in a map in association with 3D points.
 These points can be matched even across large motions and over time.

To define your own features (advanced), see \ref rbt_extension_features.

While you can use all the different features together, here are some common rules of thumb:

- KLT features tend to work better when there is some texture information on the model itself. However, they are also worth a try even on models with little texture.
- Avoid using the KLT tracker by itself. Indeed, the 3D points associated with the tracked 2D points are estimated from the model at a given timestep.
 If this model is even slightly incorrect, this will result in drift over time. This drift can be compensated by the other features that do not have this 2D-3D correspondence step.
- Similarly, the depth features are ill-suited in two scenarios. First, when tracking an object where depth estimation is complex.
This can be the case when using an infrared based depth camera and tracking a black object that reflects very little light.
The second scenario is when tracking movements that are perpendicular to the camera plane with very low depth variation. An example of this is a cube that is fully facing the camera and that is translated left to right.
In this case, depth information should be combined with other features to track.
- Contour-based features are important for correct pose estimation. Color edges tend to recover from larger motions than the moving edge-based features, but may result in more jitter.
In most cases, it is not worthwhile to combine the two. One case that can be of interest is to use the color silhouette for large motions, and more points for the moving edges to recover the finer motion.
See the configuration sections \ref rbt_tracking_config_silhouette and \ref rbt_tracking_config_features_ccd for information on how to accomplish this.
- Note that the color silhouette features only rely on the outer object silhouette, while the moving edges can also capture and leverage the depth disparities inside the object.
If you have strong depth changes on your object, it can be interesting to use the moving edges with a correct depth threshold.


\subsection rbt_tracking_filtering 3.3. Feature filtering

When tracking, it is possible for occlusions or lighting artifacts to appear. In those cases, considering ill-defined
or ill-matched features may lead to tracking failure. To handle those cases and improve tracking reliability, the RBT
comes with two mechanisms:

- A masking/segmentation approach, that outputs a per-pixel probability map that a given pixel belongs to the object
- Robust error criteria. Each feature defined above uses an M-Estimator (through vpRobust) to reweigh the errors to be
 minimized and reduce the influence of outliers.

While M-estimators are always used, the masking approach is entirely optional.
To enable masking, you should call vpRBTracker::setObjectSegmentationMethod() with the method that you wish to use.
To disable it, pass it *nullptr* as input argument.
As of now, the only method available is vpColorHistogramMask.
This method computes time-smoothed histograms of the object and background's color distributions,
and compares their output probabilities for a given pixel color to compute an object membership score.
If available, this method may use camera depth information to update the object histograms using only pixels that are
reliable in terms of depth (where object and render are closely aligned).
This method is ideal when there is a clear photometric distinction between object and environment.

You may define your own approach to object segmentation (by, e.g., using a neural network to compare feature maps or
using more depth information) by inheriting from vpObjectMask.

Note that the use of the probability map output by the segmentation method is implemented differently for each type of
feature. For instance, contour-based methods will search for strong probability differences along the contour normal,
while point-based features may only look in a small neighborhood to check whether a given area belongs to the object.

The diagram below sums up the combination of the masking and outlier rejection steps.

\image html rbt-filtering.jpg "Masking is first used to remove features in cases such as strong occlusions. Remaining outliers are filtered through the use of robust estimators." width=50%
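
To illustrate the outlier rejection principle, here is a simplified Tukey-style weighting function. ViSP's vpRobust implements the actual M-estimators; this sketch only shows the idea:

```cpp
// Tukey biweight: residuals well inside the cutoff c keep a weight close
// to 1, while residuals beyond c get a weight of 0 and stop influencing
// the pose update entirely.
double tukeyWeight(double residual, double c)
{
  double u = residual / c;
  if (u > 1.0 || u < -1.0)
    return 0.0; // rejected as an outlier
  double t = 1.0 - u * u;
  return t * t;
}
```

Each feature error is multiplied by such a weight before entering the minimization, which is how ill-matched features lose their influence.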

\subsection rbt_tracking_optimization 3.4. Minimization problem

To update the camera pose, the RBT implements tracking as a non-linear optimization problem.
The error to be minimized is the difference between features extracted from the 3D model that have been matched with
features in the current images. For each feature, we must compute the visual error as well as its relationship to the
pose (the Jacobian).

The tracking flow is as follows:
- For each tracked feature type
 - Extract features from the renders. Potentially use segmentation mask to reject unreliable data
 - Match/track features in the image acquired by the camera.

Then, optimization is an iterative process:
- While optimization has not converged or the maximum number of iterations has not been reached:
 - For each feature:
  - Update the feature representation (extracted from the model) with the new pose \f$^{c}\mathbf{M}_{o}\f$
  - Compute the feature error
  - Compute the feature Jacobian
 - Compute the pose displacement, by performing a weighted combination of the feature errors and Jacobians
 - Update \f$^{c}\mathbf{M}_{o}\f$
 - Check for convergence.

To influence the importance of each feature type, the user can tweak its weight by calling
vpRBFeatureTracker::setTrackerWeight().

There are several options to tweak the behavior of the optimization process, which is based on a Levenberg-Marquardt
approach. See \ref rbt_tracking_config for more information.
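
The damped iteration described above can be illustrated on a scalar toy problem. The parameter names mirror the `vvs` configuration keys of \ref rbt_tracking_config_optimization; the real tracker solves for a 6-DoF pose displacement from stacked, weighted feature errors and Jacobians:

```cpp
#include <cmath>

// Toy Levenberg-Marquardt-style loop minimizing e(x) = x - target.
double lmSolveScalar(double x, double target, double gain, double mu,
                     double muIterFactor, int maxIterations,
                     double convergenceThreshold)
{
  for (int i = 0; i < maxIterations; ++i) {
    double e = x - target; // feature error
    double J = 1.0;        // Jacobian de/dx
    // Damped step: dx = -gain * (J^T J + mu)^-1 * J^T * e
    double dx = -gain * (J * e) / (J * J + mu);
    x += dx;
    mu *= muIterFactor;    // decrease regularization as the estimate improves
    if (std::fabs(dx) < convergenceThreshold)
      break;               // motion between iterations is tiny: converged
  }
  return x;
}
```

With gain = 1, mu = 0.01, muIterFactor = 0.5 and convergenceThreshold = 1e-4, the estimate reaches the target well before the iteration limit.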

\subsection rbt_drift_detection 3.5. Detecting tracking failure

Tracking may of course fail. It is thus important to have some measure of confidence in the tracking process.

If drift detection is enabled, the RBT will compute a confidence metric for the computed object pose.

It is then your responsibility to use this score to call for a tracker reinitialization.

To enable drift detection, you should call vpRBTracker::setDriftDetector() with the scoring method that you wish
to use.

The method that is provided by default, vpRBProbabilistic3DDriftDetector, is an online method that compares the
photometric information of the model in the current image with a representation that is learned online.
If available, it will also compare the depth in the render data with the depth from the camera.
This representation is a set of 3D points, for which some color statistics are stored and updated every frame.

This method will output a probability estimate that the model is correctly tracked. The confidence metric is thus between 0 and 1.

This method relies on framing color and depth information as Gaussian distributions. Their spread is controlled by the
user and a larger spread will result in a greater confidence and smoother changes in the confidence score.

In addition, since color estimation is performed online, the variance of a point's color distribution is used as a
factor to weigh its importance with respect to the other samples when computing the final score.
This means that more importance will be given to points that have a stable color (that are in a smooth color region of
the model and that were not too impacted by lighting changes in the previous frames).
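
The effect of the user-controlled spread can be seen in this toy sketch of a Gaussian match score. This only illustrates the principle; it is not the vpRBProbabilistic3DDriftDetector implementation:

```cpp
#include <cmath>

// Unnormalized Gaussian likelihood that an observed value (e.g. a color
// component or a depth measurement) matches its learned value. A larger
// sigma tolerates larger deviations, hence a higher and smoother
// confidence score.
double matchProbability(double observed, double learned, double sigma)
{
  double d = (observed - learned) / sigma;
  return std::exp(-0.5 * d * d); // 1 for a perfect match, tends to 0 as they diverge
}
```

For the same 5-unit color deviation, sigma = 25 yields a score near 0.98 while sigma = 5 yields about 0.61, which is why a larger spread smooths the confidence signal.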

\section rbt_tracking_config 4. Configuration

To ease the use of the RBT, it is strongly recommended to have compiled ViSP with the 3rd-party that enables
JSON support (see \ref soft_tool_json installation instructions). This section describes how to configure the tracker.

\subsection rbt_tracking_config_in_code 4.1. In-code configuration

If you wish or must configure the tracker directly from the code, these are the main methods of interest:

- vpRBTracker::setModelPath() to set the 3D model to use.
- vpRBTracker::addTracker() to add a new feature to track.
- vpRBTracker::getFeatureTrackers() to retrieve the currently tracked features. You can configure each feature separately.
- vpRBTracker::setCameraParameters() to set the camera intrinsics and image resolution.
- vpRBTracker::setOdometryMethod() to set the method used to perform visual odometry as a preprocessing step.
- vpRBTracker::setDriftDetector() to set the algorithm used to estimate the tracking confidence.
- vpRBTracker::setObjectSegmentationMethod() to set the pixel wise segmentation computation method.
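
Put together, an in-code setup could look like the following sketch. Note that the exact signatures and argument orders below are assumptions based on the method names above; refer to the vpRBTracker class documentation for the actual API:

\code{.cpp}
vpRBTracker tracker;
tracker.setModelPath("path/to/model.obj");
// Intrinsics without distortion, plus the image resolution
// (cam, height and width are assumed to be declared; argument order is an assumption)
tracker.setCameraParameters(cam, height, width);
// Track silhouette moving edges and dense depth
tracker.addTracker(std::make_shared<vpRBSilhouetteMeTracker>());
tracker.addTracker(std::make_shared<vpRBDenseDepthTracker>());
// Optional: color-histogram masking and drift detection
tracker.setObjectSegmentationMethod(std::make_shared<vpColorHistogramMask>());
tracker.setDriftDetector(std::make_shared<vpRBProbabilistic3DDriftDetector>());
\endcode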

\subsection rbt_tracking_config_json 4.2. JSON configuration

To load a json configuration file, you can call vpRBTracker::loadConfigurationFile().

This subsection details the parameters that you can tweak through JSON.

\subsubsection rbt_tracking_config_json_general 4.2.1. General parameters

| Field | Type | Description
| :---- | :---- | :----
| model | string | Optional path to the 3D model of the object. If not present, it should be set using vpRBTracker::setModelPath() before calling vpRBTracker::startTracking()
| displaySilhouette | Boolean | Whether to display the object silhouette from the last rendered frame when calling vpRBTracker::display(). Note that since this is the last rendered frame, it may appear laggy.
| updateRenderThreshold | float | Optional. The motion threshold between the last render and the current object pose above which the object should be rerendered and the renders updated. By default, this metric is specified in meters. This threshold is also used when rerendering after odometry. Set this value to 0 to always perform rendering.
| camera | Dictionary | Optional camera intrinsics. See \ref rbt_tracking_config_camera.
| vvs | Dictionary | Parameters for optimization. See \ref rbt_tracking_config_optimization.
| silhouetteExtractionSettings | Dictionary | Parameters for the contour extraction from the renders. See \ref rbt_tracking_config_silhouette.
| mask | Dictionary | Optional segmentation method configuration. See \ref rbt_tracking_config_mask.
| drift | Dictionary | Optional drift detection method configuration. See \ref rbt_tracking_config_drift.
| features | List | List of features to track. See \ref rbt_tracking_config_features.
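
Putting the keys above together, a minimal configuration file could look as follows. The values are illustrative, and the content of the *features* list is described in \ref rbt_tracking_config_features:

\code{.json}
{
  "model": "path/to/model.obj",
  "displaySilhouette": false,
  "updateRenderThreshold": 0.002,
  "camera": {
    "width": 640,
    "height": 480,
    "intrinsics": {
      "px": 600.0, "py": 600.0, "u0": 320.0, "v0": 240.0,
      "model": "perspectiveWithoutDistortion"
    }
  },
  "vvs": {
    "gain": 1.0,
    "maxIterations": 10,
    "mu": 0.0,
    "muIterFactor": 0.0
  },
  "silhouetteExtractionSettings": {
    "threshold": { "type": "relative", "value": 0.1 },
    "sampling": { "samplingRate": 1, "numPoints": 512, "reusePreviousPoints": true }
  },
  "features": []
}
\endcode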


\subsubsection rbt_tracking_config_camera 4.2.2. Camera configuration


| Field | Type | Description
| :---- | :---- | :----
| width | int | Image width
| height | int | Image Height
| intrinsics | Dictionary | Camera intrinsics, as detailed below

Intrinsics definition:

| Field | Type | Description
| :---- | :---- | :----
| px | double | Focal length to pixel width ratio
| py | double | Focal length to pixel height ratio
| u0 | double | Principal point pixel horizontal location
| v0 | double | Principal point pixel vertical location
| model | string | Should be set to "perspectiveWithoutDistortion"
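
Combining the two tables, a camera entry could look like this (values are illustrative):

\code{.json}
"camera": {
  "width": 640,
  "height": 480,
  "intrinsics": {
    "px": 600.0,
    "py": 600.0,
    "u0": 320.0,
    "v0": 240.0,
    "model": "perspectiveWithoutDistortion"
  }
}
\endcode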

\subsubsection rbt_tracking_config_optimization 4.2.3. Optimization parameters

Example configuration without defining a stopping criterion for optimization:
\code{.json}
"vvs": {
 "gain": 1.0,
 "maxIterations": 10,
 "mu": 0.0,
 "muIterFactor": 0.0
}
\endcode

And with a criterion (stop if the 3D motion is below a tenth of a millimeter between two iterations):
\code{.json}
"vvs": {
 "gain": 1.0,
 "maxIterations": 10,
 "mu": 0.0,
 "muIterFactor": 0.0,
 "convergenceThreshold": 0.0001
}
\endcode

| Field | Type | Description
| :---- | :---- | :----
| gain | double | Optimization gain. A gain too small may lead to drift issues, while a gain too high may lead to instabilities in pose.
| maxIterations | int | Maximum number of optimization iterations.
| mu | double | Initial amount of regularization for the Levenberg-Marquardt optimizer.
| muIterFactor | double | Scaling factor to the mu parameter applied at every optimization iteration
| convergenceThreshold | double | Motion threshold (in meters) below which tracking is considered to have converged, ending the optimization early. Set to 0 or leave undefined to always run the maximum number of iterations.

\subsubsection rbt_tracking_config_silhouette 4.2.4. Silhouette extraction settings

Example configuration:
\code{.json}
"silhouetteExtractionSettings": {
 "threshold": {
 "type": "relative",
 "value": 0.1
 },
 "sampling": {
 "samplingRate": 1,
 "numPoints": 512,
 "reusePreviousPoints": true
 }
}
\endcode

To control the silhouette processing, there are two main settings: **threshold** and **sampling**.

The threshold settings determine whether a point belongs to the silhouette. Silhouette extraction is based on the depth
disparity between neighboring render pixels:
if there is a strong depth disparity, then a point is considered as belonging to the silhouette.

The threshold can either be defined as an absolute value, or relative to the object's "depth" size (the difference
between the farthest and closest object points as seen from the camera).

| Field | Type | Description
| :---- | :---- | :----
| type | string | Either "relative" or "absolute". If relative, specified as a fraction of the difference between the clipping planes. If absolute, specified in meters.
| value | double | Value of the threshold.

The thresholding parameters are used to determine whether a point belongs to the silhouette, but processing the whole
silhouette may be too cumbersome, as there can be thousands of contour pixels.
Thus, we can sample the silhouette to obtain a subset of points. Sampling has a strong impact on the runtime performance
of the tracker.

| Field | Type | Description
| :---- | :---- | :----
| samplingRate | int | This is the step (in pixels) that is used to sample the silhouette map. A value of 1 will examine every pixel. Higher values are susceptible to aliasing artifacts, but may provide more even sampling across the silhouette.
| numPoints | int | Maximum number of points to keep. If set to 0, then all points are kept.
| reusePreviousPoints | boolean | Whether to try and reuse the same silhouette points across tracking iterations. This may help improve stability and reduce jitter.


\subsubsection rbt_tracking_config_mask 4.2.5. Object segmentation method configuration

The use of a segmentation method is defined by the **mask** key in the main configuration dictionary.
Which type of method is used depends on the value of the "type" key of the mask dictionary.

\paragraph rbt_tracking_config_mask_proba 4.2.5.1. vpColorHistogramMask configuration

Example configuration:
\code{.json}
{
 "type": "histogram",
 "bins": 32,
 "objectUpdateRate": 0.1,
 "backgroundUpdateRate": 0.1,
 "maxDepthError": 0.01,
 "computeOnlyOnBoundingBox": false
}
\endcode

| Field | Type | Description
| :---- | :---- | :----
| type | string | Set to "histogram" to use this method.
| bins | int | Should be a power of 2 (e.g., 16 or 32). Determines the number of bins used per color component for the object and background histograms. Fewer bins will lead to less noise in the pixel map, but may lead to a less confident map.
| objectUpdateRate | float | Object histogram update rate. Should be between 0 and 1. A value of 1 means that only the last frame will be used in the histogram. Values below will keep a memory of the last seen frames.
| backgroundUpdateRate | float | Background histogram update rate. Same as objectUpdateRate.
| maxDepthError | float | When depth information is available, the rendered depth and the actual depth are compared. For a given pixel of the object, if the depth difference is too high, the pixel will be used to compute the background histogram instead of the object histogram. This helps alleviate the influence of occlusions on the mask. This parameter controls the maximum tolerated depth difference, expressed in meters.
| computeOnlyOnBoundingBox | boolean | Whether to compute the mask only in the area containing the object. If true, this will save computation time, but may lead to preserving silhouette contours that should not be preserved.
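
The update rates behave like the blending factor of an exponential moving average over per-frame histograms. The
following Python sketch only illustrates this behaviour; the exact update rule used by vpColorHistogramMask may
differ:

```python
def update_histogram(previous, current, rate):
    """Blend the histogram computed on the current frame into the running one.

    rate = 1.0 keeps only the last frame, while lower rates keep a memory of
    previously seen frames (as described for objectUpdateRate above).
    """
    return [(1.0 - rate) * p + rate * c for p, c in zip(previous, current)]
```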

\subsubsection rbt_tracking_config_drift 4.2.6. Drift detection method configuration

\paragraph rbt_tracking_config_drift_proba 4.2.6.1. vpRBProbabilistic3DDriftDetector configuration
Example configuration:
\code{.json}
{
 "type": "probabilistic",
 "colorUpdateRate": 0.25,
 "initialColorSigma": 25.0,
 "depthSigma": 0.025,
 "filteringMaxDistance": 0.001,
 "minDistanceNewPoints": 0.005
}
\endcode

| Field | Type | Description
| :---- | :---- | :----
| type | string | Set to "probabilistic" to use this method.
| colorUpdateRate | float | Update rate for the color statistics of each point on the object's surface.
| initialColorSigma | float | Standard deviation (R,G,B) used to compute the error function between the observed color and the estimated color distribution.
| depthSigma | float | Standard deviation used to compute the error function between the observed pixel depth and rendered depth.
| filteringMaxDistance | float | Minimum error between the rendered depth and the stored point's distance to the camera. This is used to check whether a 3D point is visible from the camera. This value is in meters, and should be left to a small value.
| minDistanceNewPoints | float | Each frame, this method tries to sample new 3D points on the object. This parameter controls the sampling density, and ensures that a new candidate point is far enough from all the other stored points from previous iterations.

\subsubsection rbt_tracking_config_features 4.2.7. Feature configuration

The type of feature that is used depends on the "type" value of each feature dictionary.

The "features" object of the main dictionary is a list, as multiple features can be used.

For each feature, these values can be defined:

| Field | Type | Description
| :---- | :---- | :----
| weight | float or vpTemporalWeighting | The importance given by the user to this feature type.
| display | boolean | Whether the features should be displayed when calling vpRBTracker::display().


For weighting, it can be of interest to give certain features more importance at the end of the optimization, when the pose should be close to the correct solution.
Indeed, certain features such as silhouette contours or depth features exhibit a limited convergence domain (depending on their settings) but, unlike keypoints, do not suffer from modelling inaccuracies.
As such, they are ideal to correct the small remaining error that cannot be removed by keypoints.

To do so, you can use a sigmoid-like weighting scheme for such features. This scheme modifies the weight during the optimization (which is run every frame).
The configuration is the following:
| Field | Type | Description
| :---- | :---- | :----
| minWeight | float | The minimum weight given to the feature (either at the start or end of optimization, depending on the increasing flag)
| maxWeight | float | The maximum weight given to the feature (either at the start or end of optimization, depending on the increasing flag)
| midpointLocation | float | The optimization point where the weight reaches the midpoint between minWeight and maxWeight. It is specified as a fraction of the number of iterations (between 0 and 1)
| slopePower | int | How fast the weighting transitions from minWeight to maxWeight
| increasing | boolean | Whether the weight should increase (go from minWeight to maxWeight) or decrease (maxWeight to minWeight)

\image html rbt-temporal-weighting.png Different values for temporal weighting.
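
To make the role of each parameter concrete, here is an illustrative Python sketch of such a sigmoid-like schedule.
This is not ViSP's exact formula (in particular, the mapping from `slopePower` to the logistic steepness is an
assumption); it only shows how `midpointLocation`, `slopePower` and `increasing` shape the curve:

```python
import math

def temporal_weight(t, min_weight, max_weight, midpoint_location, slope_power, increasing):
    """Sigmoid-like weight over the optimization, with t in [0, 1] the
    fraction of optimization iterations already performed."""
    k = 4.0 * (2 ** slope_power)  # assumed steepness; grows with slope_power
    s = 1.0 / (1.0 + math.exp(-k * (t - midpoint_location)))
    if not increasing:
        s = 1.0 - s  # decreasing schedule: from max_weight down to min_weight
    return min_weight + (max_weight - min_weight) * s
```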

Example configurations:

For a feature whose importance does not change during optimization:
\code{.json}
{
  "weight": 1.0,
  "display": false
}
\endcode

For a feature that is more important at the end of optimization (a feature that has a more local convergence domain):
\code{.json}
{
  "weight": {
    "minWeight": 0.0,
    "maxWeight": 1.0,
    "midpointLocation": 0.5,
    "slopePower": 4,
    "increasing": true
  },
  "display": false
}
\endcode

And accordingly, a feature whose importance decreases (such as features that have a large convergence domain but that may be inaccurate):
\code{.json}
{
  "weight": {
    "minWeight": 0.0,
    "maxWeight": 1.0,
    "midpointLocation": 0.5,
    "slopePower": 4,
    "increasing": false
  },
  "display": false
}
\endcode

\paragraph rbt_tracking_config_features_klt 4.2.7.1. vpRBKltTracker

\warning This feature requires that ViSP has been built with OpenCV as a 3rd party (see \ref soft_vision_opencv
installation instructions). Otherwise, loading this JSON configuration will raise an error about encountering an
unknown tracker.

Example configuration:
\code{.json}
{
 "type": "klt",
 "weight": 1,

 "useMask": true,
 "minMaskConfidence": 0.5,

 "maxReprojectionErrorPixels": 5.0,
 "newPointsMinPixelDistance": 4,
 "minimumNumPoints": 20,

 "blockSize": 5,
 "useHarris": true,
 "harris": 0.05,
 "maxFeatures": 500,
 "minDistance": 5.0,
 "pyramidLevels": 3,
 "quality": 0.01,
 "windowSize": 5
}
\endcode

| Field | Type | Description
| :---- | :---- | :----
| useMask | boolean | Whether, if available, the segmentation mask should be used to filter wrong features.
| minMaskConfidence | float | If useMask is true, minimum confidence for a feature to be accepted.
| maxReprojectionErrorPixels | float | Maximum reprojection error, above which a tracked KLT point is rejected for future tracking iterations.
| newPointsMinPixelDistance | float | During KLT resampling, minimum Euclidean distance that a point should have to all other points to be considered for tracking.
| minimumNumPoints | int | Minimum number of points that should be tracked at a given timestep. If the number of points falls below this threshold, then KLT tracking is fully reinitialized.

\note For the other settings, see the vpKltOpenCV documentation.

\paragraph rbt_tracking_config_features_me 4.2.7.2. vpRBSilhouetteMeTracker

Example configuration:
\code{.json}
{
 "type": "silhouetteMe",
 "weight": 1,
 "numCandidates": 3,

 "useMask": false,
 "minMaskConfidence": 0.8,

 "movingEdge": {
 "maskSize": 7,
 "minSampleStep": 1.0,
 "mu": [
 0.5,
 0.5
 ],
 "nMask": 180,
 "range": 16,
 "sampleStep": 1.0,
 "strip": 2,
 "thresholdType": "normalized",
 "threshold": 20
 }
}
\endcode

| Field | Type | Description
| :---- | :---- | :----
| numCandidates | int | Number of edge locations that are preserved as potential matches for a given silhouette point.
| useMask | boolean | Whether, if available, the segmentation mask should be used to filter wrong features. Here, the segmentation mask is examined along the silhouette normal to find a disparity in the probability mask (marking a sharp distinction between object and background).
| minMaskConfidence | float | If useMask is true, minimum confidence disparity for a feature to be accepted.
| movingEdge | dictionary | Moving edge tracker settings. See vpMe for more information.

\paragraph rbt_tracking_config_features_ccd 4.2.7.3. vpRBSilhouetteCCDTracker

Example configuration for a case with small motions:
\code{.json}
{
 "type": "silhouetteColor",
 "weight": 0.1,

 "useMask": true,
 "minMaskConfidence": 0.8,

 "ccd": {
 "h": 16,
 "delta_h": 1,
 "min_h": 16
 }
}
\endcode

Example configuration to handle larger motions:
\code{.json}
{
 "type": "silhouetteColor",
 "weight": 0.1,

 "useMask": true,
 "minMaskConfidence": 0.8,

 "ccd": {
 "h": 64,
 "delta_h": 4,
 "min_h": 16
 }
}
\endcode

| Field | Type | Description
| :---- | :---- | :----
| useMask | boolean | Whether, if available, the segmentation mask should be used to filter wrong features. Here, the segmentation mask is examined along the silhouette normal to find a disparity in the probability mask (marking a sharp distinction between object and background).
| minMaskConfidence | float | If useMask is true, minimum confidence disparity for a feature to be accepted.
| maxNumPoints | int | Maximum number of points to resample for this tracker. This makes it possible to use many points for the ME-based tracker, while retaining fewer points for this (more expensive) tracker. Leave at 0 to disable resampling.
| temporalSmoothing | float | Interpolation factor, between 0 and 1. This blends a contour point's current color statistics with those computed on the previous frame. This may improve stability, but may also lead to smaller motion updates.
| ccd | dictionary | Contracting curve density algorithm settings. See below.

To configure the actual CCD parameters, you can modify:
| Field | Type | Description
| :---- | :---- | :----
| h | int | The initial contour search area. This size (in pixels) is used to define the normal size that is used to compute the color statistics and will strongly impact the convergence domain of the tracker.
| delta_h | int | Subsampling step when examining the contour normal. This helps diminish the tracking time when `h` is large. Using a high value (comparatively to that of `h`) can lead to instabilities.
| min_h | int | Parameter to set if you wish to use an adaptive scaling for the silhouette search area. If set, the initial search area will be set to `h`. Then, at each optimization iteration, if motion is smaller than a multiple of `h`, then `h` is halved, down to a minimum value of `min_h`. Note that if `delta_h` is set, then it is also halved in order to obtain more reliable statistics.
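
The adaptive scaling can be sketched as follows. Only the halving schedule comes from the description above; the
exact criterion comparing the motion to `h` is an assumption:

```python
def update_search_area(h, delta_h, min_h, motion_px):
    """One optimization iteration of the assumed adaptive scaling: if the
    estimated motion is small compared to h, halve h (never below min_h)
    and halve delta_h along with it to keep reliable statistics."""
    if motion_px < h:  # assumed "multiple of h" criterion, with factor 1
        h = max(h // 2, min_h)
        delta_h = max(delta_h // 2, 1)
    return h, delta_h
```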

\paragraph rbt_tracking_config_features_depth 4.2.7.4. vpRBDenseDepthTracker

Example configuration:
\code{.json}
{
 "type": "depth",
 "weight": 1,

 "useMask": true,
 "minMaskConfidence": 0.5,

 "step": 2,
 "display": false
},
\endcode

| Field | Type | Description
| :---- | :---- | :----
| useMask | boolean | Whether, if available, the segmentation mask should be used to filter wrong features.
| minMaskConfidence | float | If useMask is true, minimum confidence for a feature to be accepted.
| step | int | Step in pixels to sample the render map. A low value for this parameter will lead to a larger number of depth features. As this step is used when iterating on rows and columns of the render images, the number of depth features will quickly decrease as the step grows.
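
Since the step applies to both the rows and columns of the render, the number of candidate depth features decreases
quadratically with it, as this back-of-the-envelope helper shows (an upper bound: only pixels lying on the object are
actually kept):

```python
def approx_num_depth_features(width, height, step):
    # The render map is sampled every `step` pixels along rows and columns,
    # so doubling the step divides the candidate count by roughly four.
    return (width // step) * (height // step)
```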



\subsection rbt_python_options 4.3 Python specific configuration

\warning The features defined below are only available when using the Python bindings. To install and use the bindings see \ref tutorial-install-python-bindings.

When using the Python bindings, more options are available. These are defined in `modules/python/bindings/visp/python/rbt`.

You can load Python features from a JSON configuration.
To do so, you can place the Python configuration in a JSON object under the "python_ext" key.

Your config file should look like:
\code{.json}
{
 "features": [ // CPP features
 ...
 ],
 ...,

 "python_ext": {
 "features": [
 ...
 ],
 "odometry": {
 ...
 }
 }
}
\endcode

For more information see \ref modules/python/bindings/visp/python/rbt/PythonRBExtensions.py


An example of RBT using Python is available under modules/python/examples/realsense-rbt.py

\subsubsection rbt_python_features 4.3.1. XFeat-based options

We provide a wrapper around the XFeat keypoints. XFeat keypoints are robust, and allow for reliable matching,
even in the presence of large motion blur and image differences. They are far more powerful than the KLT-based features.

XFeat keypoints are matched with 3D points that are stored in a map (see vpPointMap). This map contains either object points (expressed in the object's frame)
or points expressed in the environment frame. When a new image arrives, keypoints are extracted, then matched with the points in the map.
The correspondences can then be used to minimize a visual error to estimate the object or camera motion. The visual error can be 2D (classical IBVS) or 3D (point minimization in metric space).
After minimization, incorrect matches are detected and removed from the 3D point map, and new points are added.
\warning As XFeat is based on deep learning, having a GPU is preferable. Running XFeat on the CPU is still possible.

We run XFeat on the whole image, and the keypoints can be used both for object tracking and camera motion estimation via visual odometry.
To use XFeat-backed features, the Python settings should contain the XFeat feature extraction settings:

\code{.json}
"xfeat": {
 "numPoints": 8192,
 "minCos": 0.95,
 "useDense": false,
 "onlyOnBB": false,
 "minObjMaskValue": 0.5,
 "minSilhouetteDistanceEnvPoint": 20.0,
 "minSilhouetteDistanceObjectPoint": 1.0
},
\endcode
| Field | Type | Description
| :---- | :---- | :----
| numPoints | int | The maximum number of points that are extracted in the frame
| minCos | float | The minimum matching score for two keypoints (extracted from two different images) to be matched. Use a high value (e.g. 0.9) to minimize the number of matches.
| useDense | boolean | Whether to use dense keypoint extraction and matching (XFeat*)
| onlyOnBB | boolean | Whether to run keypoint extraction only in the object region of the image. Do not use this option if you wish to use odometry.
| minObjMaskValue | float | Minimum mask value (if the mask is enabled) for a keypoint to be considered as belonging to the object. If the mask value is below, it is considered an environment keypoint (that may be used for odometry).
| minSilhouetteDistanceEnvPoint | float | Distance in pixels that an environment keypoint should have to the object's silhouette for it to be a valid keypoint that can be used for odometry. Useful to reject keypoints that could potentially belong to the object in scenarios where large motions happen.
| minSilhouetteDistanceObjectPoint | float | Distance in pixels that an object keypoint should have to the object's silhouette for it to be a valid keypoint that can be used to estimate object motion. Useful to only consider keypoints that lie on the textured part of the object and not on its contour, where keypoints may fall on the background and thus provide unreliable matches.

To enable XFeat feature for object tracking, you should add the following to the "python_ext" object configuration:
\code{.json}
"features": [
 {
 "type": "xfeat",
 "weight": 1.0,
 "use_3d": false,
 "display": true,
 "numPoints": 8192,
 "reprojectionThreshold": 5,
 "minDistNewPoint": 0.0,
 "maxDepthErrorVisible": 0.01,
 "maxDepthErrorCandidate": 0.005
 }
]
\endcode
| Field | Type | Description
| :---- | :---- | :----
| use_3d | boolean | Whether to use 2D error criterion or 3D (requires RGBD camera)
| numPoints | int | The maximum number of 3D points to store in the object map. These points are those considered for matching with the current object keypoints.
| reprojectionThreshold | float | The maximum reprojection error for a 3D keypoint at the end of a tracking iteration before it is considered as an outlier and it is removed from the map.
| minDistNewPoint | float | Minimum distance (in meters) that a new point should have to all the points currently stored in the map for it to be considered as a valid candidate for addition.
| maxDepthErrorVisible | float | Maximum depth error (in meters) between the rendered depth and the stored 3D point depth before this point is considered as not visible from the current viewpoint. Mainly used to filter self-occlusions.
| maxDepthErrorCandidate | float | Maximum depth error (in meters) before a new keypoint addition is rejected. Used to filter points that do not belong to the object or for which the actual 3D error is too great.


To perform odometry using XFeat, you can add these settings:
\code{.json}
"odometry": {
  "type": "xfeat",
  "use3d": false,
  "numPoints": 8192,
  "reprojectionThreshold": 10.0,
  "minDistNewPoint": 0.0,
  "maxDepthErrorVisible": 0.02,
  "maxDepthErrorCandidate": 0.0,

  "gain": 0.5,
  "maxNumIters": 20,
  "muInit": 0.0,
  "muIterFactor": 0.1,
  "minImprovementFactor": 0.001
}
\endcode
| Field | Type | Description
| :---- | :---- | :----
| use_3d | boolean | Whether to use 2D error criterion or 3D
| numPoints | int | The maximum number of 3D points to store in the object map. These points are those considered for matching with the current object keypoints.
| reprojectionThreshold | float | The maximum reprojection error for a 3D keypoint at the end of a tracking iteration before it is considered as an outlier and it is removed from the map.
| minDistNewPoint | float | Minimum distance (in meters) that a new point should have to all the points currently stored in the map for it to be considered as a valid candidate for addition.
| maxDepthErrorVisible | float | Maximum depth error (in meters) between the actual camera depth and the stored 3D point depth before this point is considered as not visible from the current viewpoint. Used to filter occlusions.
| maxDepthErrorCandidate | float | Maximum depth error (in meters) before a new keypoint addition is rejected.

\warning XFeat odometry requires an RGBD camera.


\section rbt_tracking_usage 5. Tracker usage

\subsection rbt_tracking_usage_sequence 5.1. On prerecorded sequences

In order to test the tracker, we provide a prerecorded sequence of images named `dragon.mp4` and the dragon object,
whose CAD model file `dragon.bam` is provided in the ViSP source code.

The result of tracking the dragon can be obtained by going to the build folder:

\code{.sh}
~/visp-build $ cd tutorial/tracking/render-based
\endcode

then launching:
\code{.sh}
$ ./tutorial-rbt-sequence --color data/dragon/dragon.mp4 --tracker data/dragon/dragon.json
\endcode

Note that:
- The full code to track an offline, prerecorded sequence is in tutorial-rbt-sequence.cpp
- Parsing and display utils are given in render-based-tutorial-utils.h

More explanation is provided in the next section.

\subsection rbt_tracking_usage_realsense 5.2. With a RealSense camera

We will now focus on another example usage of the tracker, using a RealSense camera.

Note that:
- The full code for this realsense-based tutorial is available in tutorial-rbt-realsense.cpp
- Parsing and display utils are given in render-based-tutorial-utils.h

To run the program, start by placing yourself in the ViSP build directory.
Assuming you have compiled and built the tutorials, navigate to the tutorial folder:
\verbatim
~/visp-build $ cd tutorial/tracking/render-based
\endverbatim

You can then examine the command line arguments with:

\code{.sh}
~/visp-build/tutorial/tracking/render-based $ ./tutorial-rbt-realsense -h
\endcode

Which outputs:
\verbatim
Program description: Tutorial showing the usage of the Render-Based tracker with a RealSense camera
Arguments:
 --config Path to the JSON configuration file. Values in this files are loaded, and can be overridden by command line arguments.
 Optional

 --debug-display Enable additional displays from the renderer
 Default: false

 --fps Realsense requested framerate
 Default: 60
 Optional

 --height Realsense requested image height
 Default: 480
 Optional

 --init-file Path to the JSON file containing the 2D/3D correspondences for initialization by click
 Default: ""
 Optional

 --max-depth-display Maximum depth value, used to scale the depth display
 Default: 1.0
 Optional

 --no-display Disable display windows
 Default: true

 --object Name of the object to track. Used to potentially fetch the init file
 Default: ""
 Optional

 --plot-cov Plot the pose covariance trace for each feature
 Default: false

 --plot-divergence Plot the metrics associated to the divergence threshold computation
 Default: false

 --plot-pose Plot the pose of the object in the camera frame
 Default: false

 --plot-position Plot the position of the object in a 3d figure
 Default: false

 --pose Initial pose of the object in the camera frame.
 Default: []
 Optional

 --profile Enable the use of Pstats to profile rendering times
 Default: false

 --save Whether to save experiment data
 Default: false

 --save-path Where to save the experiment log. The folder should not exist.
 Default: ""
 Optional

 --save-video Whether to save the video
 Default: false

 --tracker Path to the JSON file containing the tracker
 Default: ""
 Optional

 --video-framerate Output video framerate
 Default: 30
 Optional

 --width Realsense requested image width
 Default: 848
 Optional

Example JSON configuration file:

{
 "--debug-display": false,
 "--fps": 60,
 "--height": 480,
 "--init-file": "",
 "--max-depth-display": 1.0,
 "--no-display": true,
 "--object": "",
 "--plot-cov": false,
 "--plot-divergence": false,
 "--plot-pose": false,
 "--plot-position": false,
 "--pose": [],
 "--profile": false,
 "--save": false,
 "--save-path": "",
 "--save-video": false,
 "--tracker": "",
 "--video-framerate": 30,
 "--width": 848
}
\endverbatim

You can see that, among those arguments, the program has two main parameters.
The first, `--tracker`, is the path to the .json configuration file, written following the details given in the
\ref rbt_tracking_config section.
The second, `--object`, is the path to the 3D model (in a format readable by the Panda3D engine).
Alternatively, it can be specified in the .json file.

Here, we will run the tutorial with the following arguments
\verbatim
~/visp-build/tutorial/tracking/render-based $ ./tutorial-rbt-realsense \
 --tracker data/dragon/dragon_realsense.json \
 --object data/dragon/dragon.bam \
 --init-file data/dragon/dragon.init
\endverbatim

The file `dragon_realsense.json` contains the following configuration:
\include data/dragon/dragon_realsense.json

\warning Note that to perform your own tests, you will have to rebuild tutorial-rbt-realsense every time you make
a change to the .json file located in the source directory, so that the changes are reflected in the build directory.

Let's now have a look at the tutorial code that will perform the tracking. We recall that the full code is given
in tutorial-rbt-realsense.cpp.

The code first starts by parsing the command line arguments, detailed above, using structures defined in
render-based-tutorial-utils.h:
\snippet tutorial-rbt-realsense.cpp Command line parsing

We start by declaring an instance of the tracker and load the given JSON configuration file.
\snippet tutorial-rbt-realsense.cpp Loading config

Depending on the configuration, the call to vpRBTracker::loadConfigurationFile() may raise an error.
For instance, the KLT tracker requires OpenCV: if you did not compile ViSP with OpenCV as a third party,
an error will be raised.

If successful, we then open a connection to the realsense camera, requesting a specific configuration.
This configuration depends on the input arguments (depth, width and framerate)
\snippet tutorial-rbt-realsense.cpp Realsense opening

If the configuration is available, we can proceed and update the tracker with a new, more relevant configuration.
Indeed, while camera intrinsics and image resolution can be declared in the configuration file,
using the intrinsics provided by the RealSense is more accurate and ensures that there is no configuration mismatch.
We then call vpRBTracker::startTracking() to initialize the renderer, load the 3D model and other setup related operations.

\snippet tutorial-rbt-realsense.cpp Tracker update

\warning It is at this point that some errors related to an incorrect intrinsics configuration may be raised.
Additionally, if you do not have an OpenGL-capable device or if Panda3D cannot open a window, an error may be raised.
If so, please see \ref rbt_troubleshooting.

We then proceed and declare the different images that will be needed by the tutorial.
Some images contain raw information and will be fed to the tracker.
Others will be used for display purposes.
\snippet tutorial-rbt-realsense.cpp Images

We then create the different display windows with
\snippet tutorial-rbt-realsense.cpp Create displays

Once the setup is finished, we wait for the user to set up their environment and position the object (here, the
dragon) in front of the camera.
\snippet tutorial-rbt-realsense.cpp Wait before init

When the user has clicked the display window, initialization can be performed.
\snippet tutorial-rbt-realsense.cpp Init
If the initial pose is given as a command line argument, tracking can directly proceed.
Otherwise, the user must click on different pixel positions that correspond to known 3D points, defined in an
initialization file located next to the configuration file or the mesh.
In this case, the initialization file is located at `data/dragon/dragon.init` and contains:
\include data/dragon/dragon.init

Once initialization has been performed and a starting pose given, we can enter the tracking loop.
At every iteration, the same simple steps are performed:

We start by acquiring the most recent color and depth frames.
\snippet tutorial-rbt-realsense.cpp Image acquisition
The alignment parameter ensures that the depth map is aligned with the color frame, as required by the RBT.
The call to updateDepth() converts from a raw uint16_t representation to a depth map in meters.
We also convert the RGB image to a grayscale image using vpImageConvert::convert().

We can then call the tracker with the different frames:
\snippet tutorial-rbt-realsense.cpp Call to tracker
As an output, we get a vpRBTrackingResult object.
This object contains information about what the tracker has done, the number of features that were used, etc.
It contains one important piece of information: the reason why the tracking call has returned.
In the following snippet:
\snippet tutorial-rbt-realsense.cpp Result parsing
We use this information to check whether:
- an exception occurred;
- the object is absent from the image (in which case no information can be extracted from the render and
 reinitialization should be performed);
- too few features are available to compute pose updates;
- the tracking has converged to a given pose (i.e., motion during optimization has become small enough to be
 considered as noise);
- or the tracking has used the maximum allocated number of optimization iterations.

In the first three cases, tracking should be reinitialized.

After that, we check (if it has been computed) the tracking confidence score. If it is low, we emit a warning.
In a more complete application, the pipeline would contain an automatic reinitialization procedure.

We then display the current frame and overlay the tracking information
\snippet tutorial-rbt-realsense.cpp Display

On the grayscale image, you can see:
- If enabled, the moving edge contours. Valid edges are seen in green, while suppressed ones are purple.
- If enabled, the KLT tracked points, represented as red dots on the object.

On the color image, you can see:
- The pose of the object in the camera frame, represented as a 3-axis basis
- If enabled, the contours used by the CCD tracker.

The display also contains the depth image:
- By default, depth display is disabled as it incurs a large performance penalty and may reduce framerate.
 You can change the tracker configuration to enable it by adding `"display": true` to the "depth" feature object.

Finally, you should also see a black and white image, which is the pixel-wise segmentation mask that is used to filter
out the features. If no segmentation method is used, the image should be black.

\image html rbt-full-display.jpg Example displays when tracking with the realsense camera and all features enabled. width=75%


We also log timing details in the console.
Running the tutorial with the **--verbose** flag will print timings for each part of the RBT pipeline.

\snippet tutorial-rbt-realsense.cpp Logging

When the user presses the left mouse button, we exit the tracking loop and perform a small cleanup:
\snippet tutorial-rbt-realsense.cpp Cleanup

\section rbt_troubleshooting 6. Troubleshooting issues

\subsection rbt_slow 6.1. RBT performance is bad and cores are not used

One possible reason for this is that we make heavy use of OpenMP to parallelize and split the workload.
An invalid OpenMP configuration may lead to a substantial loss in performance.

On Ubuntu 24.04, we have observed with the GNU OpenMP implementation (which is used if you compile with gcc/g++) that
the default settings lead to a slowdown by a factor of 4 to 5.

This is due to an active waiting policy. We recommend using the following OpenMP settings:

\code{.sh}
$ export GOMP_SPINCOUNT=0
\endcode

Note that this configuration will only be kept for the lifetime of your terminal and you may have to restore it the
next time you start the tracker.

\subsection rbt_panda_window 6.2. Cannot open window (hidden or visible)

This error can appear when calling vpRBTracker::startTracking().

It may be useful to check the backend used by Panda3D, which should be displayed when you start the tutorial:
\verbatim
Known pipe types:
 glxGraphicsPipe
\endverbatim

You can additionally get more information about your OpenGL support.
On Ubuntu, you can use:
\verbatim
glxinfo | grep version
server glx version string: 1.4
client glx version string: 1.4
GLX version: 1.4
OpenGL core profile version string: 4.6.0 NVIDIA 550.144.03
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL version string: 4.6.0 NVIDIA 550.144.03
OpenGL shading language version string: 4.60 NVIDIA
OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 550.144.03
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
 GL_EXT_shader_group_vote, GL_EXT_shader_implicit_conversions,
\endverbatim

If you do not have something similar, it may be that your platform has no OpenGL support.

\warning It is possible that your hardware has OpenGL support, but does not support **headless rendering**, i.e.,
rendering without a screen plugged into the GPU.
In this case, you should either try to use the **p3headlessgl** backend or use a virtual screen emulator.
- To use **p3headlessgl**, you should first visit
 [this Panda3D documentation link](https://docs.panda3d.org/1.10/python/programming/configuration/configuring-panda3d).
 Then you should edit the **Config.prc** file to set "load-display" to "p3headlessgl". On Ubuntu, Config.prc is
 located at `/etc/Config.prc`. In Conda, it will be in `$CONDA_PREFIX/etc/Config.prc`.
- On Linux distributions, you can use `xvfb` to emulate an X11 server to render to.
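
For the first option, the relevant line of your **Config.prc** would read:
\verbatim
load-display p3headlessgl
\endverbatim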

\warning Note that the RBT uses shaders that are compiled for OpenGL core 3.3.

\section rbt_extension 7. Extending the RBT

\warning This section is under construction and will be improved in the future!

\subsection rbt_extension_features 7.1. Defining your own features

\subsection rbt_extension_factory_pattern 7.2. Registering your own component for JSON parsing

As seen in \ref rbt_tracking_config, the different components of the RBT pipeline can be parsed from a JSON
configuration file. This requires a ViSP version that is built with the nlohmann::json third party
(see \ref rbt_tracking_install_requirements).

This functionality is implemented using a factory pattern as defined in the vpDynamicFactory.

For your own class to be loadable, you should call vpDynamicFactory::registerType with a lambda function that defines
how a JSON object can be used to configure an object of your type.

For instance, in the vpRBDriftDetectorFactory, an object of type vpRBProbabilistic3DDriftDetector is registered as:
\code{.cpp}
registerType("probabilistic", [](const nlohmann::json &j) {
 std::shared_ptr<vpRBProbabilistic3DDriftDetector> p(new vpRBProbabilistic3DDriftDetector());
 p->loadJsonConfiguration(j);
 return p;
});
\endcode

This means that when parsing the JSON object's drift field, if the **"type"** field is equal to "probabilistic",
then this function will be called and the JSON object will be used to build a vpRBProbabilistic3DDriftDetector.

This pattern holds for the different types of components:

- vpRBFeatureTrackerFactory to register subtypes of vpRBFeatureTracker
- vpRBDriftDetectorFactory to register subtypes of vpRBDriftDetector
- vpObjectMaskFactory to register subtypes of vpObjectMask
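
To illustrate the mechanism outside of ViSP, here is a standalone Python sketch of the same pattern. The names are
toy stand-ins, not ViSP's actual API:

```python
class ComponentFactory:
    """Minimal stand-in for the registration pattern: a registry that maps a
    "type" string to a builder configuring an object from a JSON-like dict."""

    def __init__(self):
        self._builders = {}

    def register_type(self, type_name, builder):
        self._builders[type_name] = builder

    def build(self, json_obj):
        # Dispatch on the "type" field, as the RBT JSON parser does
        return self._builders[json_obj["type"]](json_obj)


factory = ComponentFactory()
factory.register_type(
    "probabilistic",
    lambda j: ("probabilistic-detector", j.get("depthSigma", 0.025)))
detector = factory.build({"type": "probabilistic", "depthSigma": 0.01})
```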

*/