1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221
|
/**
\page tutorial-detection-object Tutorial: Object detection and localization
\tableofcontents
\section object_detection_intro Introduction
This tutorial will show you how to use keypoints to detect and estimate the pose of a known object using his cad model. The first step consists in detecting and learning keypoints located on the faces of an object, while the second step makes the matching between the detected keypoints in the query image with those previously learned. The pair of matches are then used to estimate the pose of the object with the knowledge of the correspondences between the 2D and 3D coordinates.
The next section presents a basic example of the detection of a teabox with a detailed description of the different steps.
Note that all the material (source code and video) described in this tutorial is part of ViSP source code and could be downloaded using the following command:
\code
$ svn export https://github.com/lagadic/visp.git/trunk/tutorial/detection/object
\endcode
\section object_detection Object detection using keypoints
\subsection detection_object_preamble Preamble
You are advised to read the following tutorials \ref tutorial-tracking-mb-generic and \ref tutorial-matching if you are not aware of these concepts.
\subsection detection_object_principle Principle of object detection using keypoints
A quick overview of the principle is summed-up in the following diagrams.
\image html img-learning-step.jpeg Learning step.
The first part of the process consists in learning the characteristics of the considered object by extracting the keypoints detected on the different faces.
We use here the model-based tracker initialized given a known initial pose to have access to the cad model of the object. The cad model is then used to select only keypoints on faces that are visible and to calculate the 3D coordinates of keypoints.
\note The calculation of the 3D coordinates of a keypoint is based on a planar location hypothesis. We assume that the keypoint is located on a planar face and the Z-coordinate is retrieved according to the proportional relation between the plane equation expressed in the normalized camera frame (derived from the image coordinate) and the same plane equation expressed in the camera frame, thanks to the known pose of the object.
In this example the learned data (the list of 3D coordinates and the corresponding descriptors) are saved in a file and will be used later in the detection part.
\image html img-detection-step.jpeg Detection step.
In a query image where we want to detect the object, we find the matches between the keypoints detected in the current image with those previously learned.
The estimation of the pose of the object can then be computed with the 3D/2D information.
The next section presents an example of the detection and the pose estimation of a teabox.
\subsection detection_object_teabox_example Teabox detection and pose estimation
The following video shows the resulting detection and localization of a teabox that is learned on the first image of the video.
\htmlonly<iframe width="420" height="315" src="https://www.youtube.com/embed/3rCvq5_UJvw" frameborder="0" allowfullscreen></iframe>
\endhtmlonly
The corresponding code is available in tutorial-detection-object-mbt.cpp. It contains the different steps to learn the teabox object on one image (the first image of the video) and then detect and get the pose of the teabox in the rest of the video.
\include tutorial-detection-object-mbt.cpp
You may recognize with the following lines the code used in \ref tutorial-mb-edge-tracker.cpp to initialize the model-based tracker at a given pose and with the appropriate configuration.
\snippet tutorial-detection-object-mbt.cpp MBT code
The modifications made to the code start from now.
First, we have to choose about which type of keypoints will be used. SIFT keypoints are a widely type of keypoints used in computer vision, but depending of your version of OpenCV and due to some patents, certain types of keypoints will not be available.
Here, we will use SIFT if available, otherwise a combination of FAST keypoint detector and ORB descriptor extractor.
\snippet tutorial-detection-object-mbt.cpp Keypoint selection
The following line declares an instance of the vpKeyPoint class :
\snippet tutorial-detection-object-mbt.cpp Keypoint declaration
You can load the configuration (type of detector, extractor, matcher, ransac pose estimation parameters) directly with an xml configuration file :
\snippet tutorial-detection-object-mbt.cpp Keypoint xml config
Otherwise, the configuration must be made in the code.
\snippet tutorial-detection-object-mbt.cpp Keypoint code config
We then detect keypoints in the reference image with the object we want to learn :
\snippet tutorial-detection-object-mbt.cpp Keypoints reference detection
But we need to keep keypoints only on faces of the teabox. This is done by using the model-based tracker to first eliminate keypoints which do not belong to the teabox and secondly to have the plane equation for each faces (and so to be able to compute the 3D coordinate from the 2D information).
\snippet tutorial-detection-object-mbt.cpp Keypoints selection on faces
The next step is the building of the reference keypoints. The descriptors for each keypoints are also extracted and the reference data consist of the lists of keypoints / descriptors and the list of 3D points.
\snippet tutorial-detection-object-mbt.cpp Keypoints build reference
We save the learning data in a binary format (the other possibilitie is to save in an xml format but which takes more space) to be able to use it later.
\snippet tutorial-detection-object-mbt.cpp Save learning data
We then visualize the result of the learning process by displaying with a cross the location of the keypoints:
\snippet tutorial-detection-object-mbt.cpp Display reference keypoints
We declare now another instance of the vpKeyPoint class dedicated this time to the detection of the teabox. The configuration is directly loaded from an xml file, otherwise this is done directly in the code.
\snippet tutorial-detection-object-mbt.cpp Init keypoint detection
The previously saved binary file corresponding to the teabox learning data is loaded:
\snippet tutorial-detection-object-mbt.cpp Load teabox learning data
We are now ready to detect the teabox in a query image. The call to the function vpKeyPoint::matchPoint() returns true if the matching was successful and permits to get the estimated homogeneous matrix corresponding to the pose of the object. The reprojection error is also computed.
\snippet tutorial-detection-object-mbt.cpp Matching and pose estimation
In order to display the result, we use the tracker initialized at the estimated pose and we display also the location of the world frame:
\snippet tutorial-detection-object-mbt.cpp Tracker set pose
\snippet tutorial-detection-object-mbt.cpp Display
The pose of the detected object can then be used to initialize a tracker automatically rather then using a human initialization; see \ref tutorial-tracking-mb-generic and \ref tutorial-tracking-tt.
\subsection detection_object_quick_explanation Quick explanation about some parameters used in the example
The content of the configuration file named detection-config-SIFT.xml and provided with this example is described in the following lines :
\include detection-config-SIFT.xml
In this configuration file, SIFT keypoints are used.
Let us explain now the configuration of the matcher:
- a brute force matching will explore all the possible solutions to match a considered keypoints detected in the current image to the closest (in descriptor distance term) one in the reference set, contrary to the other type of matching using the library FLANN (Fast Library for Approximate Nearest Neighbors) which contains some optimizations to reduce the complexity of the solution set,
- to eliminate some possible false matching, one technique consists of keeping only the keypoints whose are sufficienly discriminated using a ratio test.
Now, for the Ransac pose estimation part :
- two methods are provided to estimate the pose in a robust way: one using OpenCV, the other method uses a virtual visual servoing approach using ViSP,
- basically, a Ransac method is composed of two steps repeated a certain number of iterations: first we pick randomly 4 points and estimate the pose, the second step is to keep all points which sufficienly "agree" (the reprojection error is below a threshold) with the pose determinated in the first step. These points are inliers and form the consensus set, the other are outliers.
If enough points are in the consensus set (here 20 % of all the points), the pose is refined and returned, otherwise another iteration is made (here 200 iterations maximum).
Below you will also find the content of detection-lconfig.xml configuration file, also provided in this example. It allows to use FAST detector and ORB extractor.
\include detection-config.xml
\section detection_object_additional_functionalities Additional functionalities
\subsection detection_object_multiple_learning How to learn keypoints from multiple images
The following video shows an extension of the previous example where here we learn a cube from 3 images and then detect an localize the cube in all the images of the video.
\htmlonly
<iframe width="420" height="315" src="https://www.youtube.com/embed/7akuCtViznE" frameborder="0" allowfullscreen></iframe>
\endhtmlonly
The corresponding source code is given in tutorial-detection-object-mbt2.cpp. If you have a look on this file you will find the following.
Before starting with the keypoints detection and learning part, we have to set the correct pose for the tracker using a predefined pose:
\snippet tutorial-detection-object-mbt2.cpp Set tracker pose
One good thing to do is to refine the pose by running one iteration of the model-based tracker:
\snippet tutorial-detection-object-mbt2.cpp Refine pose
The vpKeyPoint::buildReference() allows to append the current detected keypoints with those already present by setting the function parameter append to true.
But before that, the same learning procedure must be done in order to train on multiple images. We detect keypoints on the desired image:
\snippet tutorial-detection-object-mbt2.cpp Keypoints reference detection
Then, we keep only keypoints that are located on the object faces:
\snippet tutorial-detection-object-mbt2.cpp Keypoints selection on faces
And finally, we build the reference keypoints and we set the flag append to true to say that we want to keep the previously learned keypoints:
\snippet tutorial-detection-object-mbt2.cpp Keypoints build reference
\subsection detection_object_display_multiple_images How to display the matching when the learning is done on multiple images
In this section we will explain how to display the matching between keypoints detected in the current image and their correspondances in the reference images that are used during the learning stage, as given in the next video:
\htmlonly
<iframe width="420" height="315" src="https://www.youtube.com/embed/q16lXMIVDbM" frameborder="0" allowfullscreen></iframe>
\endhtmlonly
\warning If you want to load the learning data from a file, you have to use a learning file that contains training images (with the parameter saveTrainingImages vpKeyPoint::saveLearningData() set to true when saving the file, by default it is).
Before showing how to display the matching for all the training images, we have to attribute an unique identifier (a positive integer) for the set of keypoints learned for a particular image during the training process:
\snippet tutorial-detection-object-mbt2.cpp Keypoints build reference
It permits to link the training keypoints with the correct corresponding training image.
After that, the first thing to do is to create the image that will contain the keypoints matching with:
\snippet tutorial-detection-object-mbt2.cpp Create image matching
The previous line allows to allocate an image with the correct size according to the number of training images used.
Then, we have to update for each new image the matching image with the current image:
\snippet tutorial-detection-object-mbt2.cpp Insert image matching
\note The current image will be inserted preferentially at the center of the matching image if it possible.
And to display the matching we use:
\snippet tutorial-detection-object-mbt2.cpp Display image matching
We can also display the RANSAC inliers / outliers in the current image and in the matching image:
\snippet tutorial-detection-object-mbt2.cpp Display RANSAC inliers
\snippet tutorial-detection-object-mbt2.cpp Display RANSAC outliers
The following code shows how to retrieve the RANSAC inliers and outliers:
\snippet tutorial-detection-object-mbt2.cpp Get RANSAC inliers outliers
Finally, we can also display the model in the matching image. For that, we have to modify the principal point offset of the intrinsic parameter.
This is more or less an hack as you have to manually change the principal point coordinate to make it works.
\snippet tutorial-detection-object-mbt2.cpp Display model image matching
\note You can refer to the full code in the section \ref detection_object_multiple_learning to have an example of how to learn from multiple images and how to display all the matching.
*/
|