.. _benchmarking:

Benchmarking 3D
---------------
This document introduces benchmarking concepts for 3D algorithms. By
*benchmarking* we refer here to the ability to test different
computational pipelines in an **easy manner**. The goal is to test their
reproducibility with respect to a particular problem of general interest.
Benchmarking Object Recognition
-------------------------------
For the general problem of Object Recognition (identification, categorization,
detection, etc. -- all treated as one problem here), we identify the
following steps:
1. Training
^^^^^^^^^^^
Users should be able to acquire training data from different inputs (a minimal
loading sketch follows the list), including but not limited to:
* full triangle meshes (CAD models);
* 360-degree full point cloud models;
* partial point cloud views:

  * in clutter;
  * cleanly segmented.
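
As a minimal sketch of loading such inputs with PCL (the file names
``view.pcd`` and ``model.ply`` are hypothetical placeholders), both mesh and
point cloud data can be brought into a common point cloud representation:

.. code-block:: cpp

   #include <pcl/io/pcd_io.h>
   #include <pcl/io/ply_io.h>
   #include <pcl/point_types.h>
   #include <pcl/conversions.h>

   int main ()
   {
     // Partial point cloud view (e.g., a single frame from a depth sensor).
     pcl::PointCloud<pcl::PointXYZRGB>::Ptr view (new pcl::PointCloud<pcl::PointXYZRGB>);
     if (pcl::io::loadPCDFile ("view.pcd", *view) < 0)   // hypothetical file
       return (-1);

     // Full triangle mesh (CAD model); its vertices become a point cloud.
     pcl::PolygonMesh mesh;
     if (pcl::io::loadPLYFile ("model.ply", mesh) < 0)   // hypothetical file
       return (-1);
     pcl::PointCloud<pcl::PointXYZ> vertices;
     pcl::fromPCLPointCloud2 (mesh.cloud, vertices);

     return (0);
   }
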
2. Keypoints
^^^^^^^^^^^^
Computing a higher-level representation from the object's appearance (texture + depth) should be supported in two modes:
* **densely** - at every point/vertex in the input data;
* at certain **interest points** (i.e., keypoints).
A detected keypoint might also carry meta-information required by some descriptors, such as scale or orientation.
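
A minimal sketch of the interest-point mode, using PCL's ISS 3D keypoint
detector (the radii below are illustrative values, not tuned recommendations):

.. code-block:: cpp

   #include <pcl/io/pcd_io.h>
   #include <pcl/point_types.h>
   #include <pcl/keypoints/iss_3d.h>

   int main ()
   {
     pcl::PointCloud<pcl::PointXYZ>::Ptr cloud (new pcl::PointCloud<pcl::PointXYZ>);
     if (pcl::io::loadPCDFile ("view.pcd", *cloud) < 0)  // hypothetical file
       return (-1);

     pcl::ISSKeypoint3D<pcl::PointXYZ, pcl::PointXYZ> iss;
     iss.setInputCloud (cloud);
     iss.setSalientRadius (0.03);   // illustrative neighborhood radius [m]
     iss.setNonMaxRadius (0.02);    // illustrative non-maxima suppression radius [m]

     pcl::PointCloud<pcl::PointXYZ> keypoints;
     iss.compute (keypoints);
     return (0);
   }
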
3. Descriptors
^^^^^^^^^^^^^^
The higher-level representation mentioned above will herein be called a **feature descriptor**. Feature descriptors can be:
* 2D (two-dimensional) -- here we refer to those descriptors estimated solely from RGB texture data;
* 3D (three-dimensional) -- here we refer to those descriptors estimated solely from XYZ/depth data;
* a combination of the above.
In addition, feature descriptors can be:
* **local** - estimated only at a set of discrete keypoints, using the information from neighboring pixels/points;
* **global**, or meta-local - estimated on entire objects or the entire input dataset.
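
As a sketch of both flavors with PCL (FPFH as the local descriptor, VFH as the
global one, mirroring the PCL feature tutorials; all radii are illustrative):

.. code-block:: cpp

   #include <pcl/point_types.h>
   #include <pcl/features/normal_3d.h>
   #include <pcl/features/fpfh.h>
   #include <pcl/features/vfh.h>
   #include <pcl/search/kdtree.h>

   // Local: FPFH descriptors estimated only at keypoints, with neighborhoods
   // drawn from the full surface.
   pcl::PointCloud<pcl::FPFHSignature33>
   computeLocalDescriptors (const pcl::PointCloud<pcl::PointXYZ>::Ptr &surface,
                            const pcl::PointCloud<pcl::PointXYZ>::Ptr &keypoints)
   {
     pcl::PointCloud<pcl::Normal>::Ptr normals (new pcl::PointCloud<pcl::Normal>);
     pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
     ne.setInputCloud (surface);
     ne.setRadiusSearch (0.01);          // illustrative radius [m]
     ne.compute (*normals);

     pcl::FPFHEstimation<pcl::PointXYZ, pcl::Normal, pcl::FPFHSignature33> fpfh;
     fpfh.setInputCloud (keypoints);     // estimate only at keypoints...
     fpfh.setSearchSurface (surface);    // ...using neighbors from the full surface
     fpfh.setInputNormals (normals);
     fpfh.setRadiusSearch (0.05);        // illustrative radius [m]

     pcl::PointCloud<pcl::FPFHSignature33> descriptors;
     fpfh.compute (descriptors);
     return (descriptors);
   }

   // Global: a single VFH histogram describing an entire segmented object.
   pcl::PointCloud<pcl::VFHSignature308>
   computeGlobalDescriptor (const pcl::PointCloud<pcl::PointXYZ>::Ptr &object,
                            const pcl::PointCloud<pcl::Normal>::Ptr &normals)
   {
     pcl::VFHEstimation<pcl::PointXYZ, pcl::Normal, pcl::VFHSignature308> vfh;
     vfh.setInputCloud (object);
     vfh.setInputNormals (normals);
     vfh.setSearchMethod (pcl::search::KdTree<pcl::PointXYZ>::Ptr
                            (new pcl::search::KdTree<pcl::PointXYZ>));
     pcl::PointCloud<pcl::VFHSignature308> descriptor;  // one point = one histogram
     vfh.compute (descriptor);
     return (descriptor);
   }
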
4. Classification
^^^^^^^^^^^^^^^^^
The distribution of features should be classifiable into distinct, separable
classes. For local features, we identify two sets of techniques:
* **bag of words**;
* **voting**;
* **supervised voting** (regression from the descriptor to the relative 3D location, e.g., Hough forests).
For global features, any general-purpose classification technique should work (e.g., SVMs, nearest neighbors, etc.); a minimal nearest-neighbor sketch follows.
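
A deliberately brute-force sketch of nearest-neighbor classification over
global descriptor histograms (the ``Example`` type and its fields are
hypothetical placeholders; a real implementation would index the training set
with a kd-tree such as FLANN instead of a linear scan):

.. code-block:: cpp

   #include <cstddef>
   #include <limits>
   #include <string>
   #include <vector>

   // One training example: a global descriptor histogram and its object id.
   struct Example
   {
     std::vector<float> histogram;   // e.g., the 308 bins of a VFH signature
     std::string object_id;
   };

   // Classify a query histogram by its L2-nearest training example.
   // Assumes all histograms have the same length and training is non-empty.
   std::string
   classify (const std::vector<Example> &training, const std::vector<float> &query)
   {
     std::string best_id;
     float best_dist = std::numeric_limits<float>::max ();
     for (const auto &ex : training)
     {
       float d = 0.0f;
       for (std::size_t i = 0; i < query.size (); ++i)
       {
         const float diff = ex.histogram[i] - query[i];
         d += diff * diff;
       }
       if (d < best_dist)
       {
         best_dist = d;
         best_id = ex.object_id;
       }
     }
     return (best_id);
   }
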
**Registration** can be considered a substep of classification: the
classification results are refined using, for example, iterative closest
point (ICP) techniques.
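
A minimal sketch of such a refinement step with ``pcl::IterativeClosestPoint``,
aligning a recognized candidate model to the observed scene cluster (both
clouds are assumed to exist; the iteration budget is illustrative):

.. code-block:: cpp

   #include <pcl/point_types.h>
   #include <pcl/registration/icp.h>

   // Refine a recognition hypothesis: align the candidate model to the scene.
   Eigen::Matrix4f
   refinePose (const pcl::PointCloud<pcl::PointXYZ>::Ptr &model,
               const pcl::PointCloud<pcl::PointXYZ>::Ptr &scene)
   {
     pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
     icp.setInputSource (model);
     icp.setInputTarget (scene);
     icp.setMaximumIterations (50);      // illustrative iteration budget

     pcl::PointCloud<pcl::PointXYZ> aligned;
     icp.align (aligned);                // starts from the identity transform

     // In a full pipeline, icp.hasConverged () and icp.getFitnessScore ()
     // would decide whether to accept or reject the hypothesis.
     return (icp.getFinalTransformation ());
   }
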
5. Evaluation
^^^^^^^^^^^^^
This pipeline should be able to evaluate an algorithm's performance on
different tasks. Here are some requested tasks to support:
* object id and pose
* object id and segmentation
* object id and bounding box
* category and segmentation
* category and bounding box
5.1 Metrics
"""""""""""
This pipeline should provide several metrics, since different algorithms excel
in different areas. Here are some requested metrics (a computation sketch for
the rank-based ones follows the list):
* precision-recall
* time
* average rank of correct id
* area under curve of cumulative histogram of rank of correct id
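
For instance, the two rank-based metrics could be computed from per-query
ranks as in this sketch (the 1-based rank convention is an assumption):

.. code-block:: cpp

   #include <cstddef>
   #include <vector>

   // rank_of_correct[i] is the 1-based rank of the correct id for query i.
   // Assumes a non-empty vector.
   double
   averageRank (const std::vector<std::size_t> &rank_of_correct)
   {
     double sum = 0.0;
     for (const std::size_t r : rank_of_correct)
       sum += static_cast<double> (r);
     return (sum / static_cast<double> (rank_of_correct.size ()));
   }

   // Cumulative histogram: fraction of queries whose correct id ranks <= k,
   // for k = 1..max_rank; the area under this curve is its mean value.
   double
   areaUnderCumulativeRank (const std::vector<std::size_t> &rank_of_correct,
                            std::size_t max_rank)
   {
     double area = 0.0;
     for (std::size_t k = 1; k <= max_rank; ++k)
     {
       std::size_t hits = 0;
       for (const std::size_t r : rank_of_correct)
         if (r <= k)
           ++hits;
       area += static_cast<double> (hits) / static_cast<double> (rank_of_correct.size ());
     }
     return (area / static_cast<double> (max_rank));   // normalized to [0, 1]
   }
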
Object Recognition API
----------------------
Here we describe a proposed set of classes that could be easily extended and
used for the purpose of benchmarking object recognition tasks.
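
The subsections below mirror the pipeline steps. As a purely hypothetical
sketch (none of these class names exist in PCL), two of the stage interfaces
might look like the following, with one small base class per step so that
alternative algorithms can be swapped in and compared:

.. code-block:: cpp

   #include <string>
   #include <vector>
   #include <pcl/point_cloud.h>
   #include <pcl/point_types.h>

   using CloudPtr = pcl::PointCloud<pcl::PointXYZ>::ConstPtr;

   // Hypothetical interface for step 2: keypoint detection.
   class KeypointDetector
   {
     public:
       virtual ~KeypointDetector () = default;
       virtual pcl::PointCloud<pcl::PointXYZ>::Ptr
       detect (const CloudPtr &cloud) = 0;
   };

   // Hypothetical interface for steps 1 and 4: training and classification.
   class Classifier
   {
     public:
       virtual ~Classifier () = default;
       virtual void train (const std::vector<CloudPtr> &examples,
                           const std::vector<std::string> &labels) = 0;
       virtual std::string classify (const CloudPtr &cloud) = 0;
   };
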
1. Training
^^^^^^^^^^^
2. Keypoints
^^^^^^^^^^^^
3. Descriptors
^^^^^^^^^^^^^^
4. Classification
^^^^^^^^^^^^^^^^^
5. Evaluation
^^^^^^^^^^^^^
The evaluation output needs to be one of the following (a hypothetical container sketch follows the list):
* object id
* object pose
* object category
* object bounding box
* object mask
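
A hypothetical container for these outputs (the struct and its field names are
illustrative placeholders, not an existing PCL type):

.. code-block:: cpp

   #include <string>
   #include <vector>
   #include <Eigen/Core>

   // Hypothetical evaluation result; fields not produced by a given task are
   // left empty / at their defaults.
   struct RecognitionResult
   {
     EIGEN_MAKE_ALIGNED_OPERATOR_NEW           // fixed-size Eigen members

     std::string object_id;
     std::string category;
     Eigen::Matrix4f pose = Eigen::Matrix4f::Identity ();
     Eigen::Vector3f bbox_min = Eigen::Vector3f::Zero ();  // axis-aligned box
     Eigen::Vector3f bbox_max = Eigen::Vector3f::Zero ();
     std::vector<int> mask_indices;   // indices of object points in the scene
   };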