// Copyright (c) 2018, Apple Inc. All rights reserved.
//
// Use of this source code is governed by a BSD-3-clause license that can be
// found in LICENSE.txt or at https://opensource.org/licenses/BSD-3-Clause
syntax = "proto3";
option optimize_for = LITE_RUNTIME;
import public "DataStructures.proto";
package CoreML.Specification;
/*
* Non-maximum suppression of axis-aligned bounding boxes.
*
* This is used primarily for object detectors that tend to produce multiple
* boxes around a single object. This is a byproduct of the detector's
* robustness to spatial translation. If there are two or more bounding boxes
* that are very similar to one another, the algorithm should return only a
* single representative.
*
* Similarity between two bounding boxes is measured by intersection-over-union
* (IOU), the ratio of the area of their intersection to the area of their union.
* Here is an example where the areas can be calculated by hand by counting
* glyphs::
*
*     +-------+                            +-------+
*     |       |                            |       |
*     |    +------+          +--+          |       +---+
*     |    |  |   |          |  |          |           |
*     +-------+   |          +--+          +----+      |
*          |      |                             |      |
*          +------+                             +------+
*                        Intersection          Union
*     IOU: 0.16       =       12        /       73
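*
* The glyph counts above can be checked with a short Python sketch. Python is
* used only for illustration, and the corner convention (x_min, y_min, x_max,
* y_max), measured in glyph cells with exclusive maxima and borders counted,
* is an assumption of this example rather than part of the specification:
*
```python
# Boxes are (x_min, y_min, x_max, y_max) in glyph cells; max is exclusive.
def area(box):
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def iou(a, b):
    # The intersection is itself a box; an empty overlap has zero area.
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter, union, inter / union


# Left box: 9 glyphs wide, 5 rows tall.
# Right box: 8 wide, 5 tall, shifted 5 columns right and 2 rows down.
a = (0, 0, 9, 5)
b = (5, 2, 13, 7)
inter, union, score = iou(a, b)
print(inter, union, round(score, 2))  # 12 73 0.16
```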
*
* All IOU scores are fractions between 0.0 (fully disjoint) and 1.0 (perfect
* overlap). The standard algorithm (PickTop) is defined as follows:
*
* 1. Sort boxes in descending order of confidence
* 2. Take the top one and mark it as keep
* 3. Suppress (mark as discard) all remaining boxes within a fixed IOU radius
*    of the keep box
* 4. Go to 2 and repeat on the subset of boxes not already kept or discarded
* 5. When all boxes are processed, output only the ones marked as keep
*
* Before the algorithm runs, boxes whose confidence falls below the
* confidence threshold are discarded.
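*
* The steps above can be sketched in Python as follows. This is an
* illustrative sketch, not the Core ML implementation. It assumes one score
* per box (for example, the largest entry of the box's confidence row),
* center-format boxes as described for the coordinates input, that boxes
* scoring below the confidence threshold are dropped, and that a box is
* suppressed when its IOU with a kept box exceeds the IOU threshold. The
* function names iou and pick_top are hypothetical:
*
```python
from typing import List, Sequence, Tuple

# A box is (x_center, y_center, width, height), as in the coordinates input.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two center-format boxes."""
    ax0, ax1 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay0, ay1 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx0, bx1 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by0, by1 = b[1] - b[3] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def pick_top(boxes: Sequence[Box], scores: Sequence[float],
             iou_threshold: float, confidence_threshold: float) -> List[int]:
    """Greedy suppression; returns indices of kept boxes, best score first."""
    # Drop low-confidence boxes first to shrink N for the O(N^2) loop.
    remaining = [i for i, s in enumerate(scores) if s >= confidence_threshold]
    # Step 1: sort by descending confidence.
    remaining.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    # Step 4: repeat until every box is kept or discarded.
    while remaining:
        # Step 2: keep the top remaining box.
        top = remaining.pop(0)
        kept.append(top)
        # Step 3: discard boxes that overlap it by more than the threshold.
        remaining = [i for i in remaining
                     if iou(boxes[top], boxes[i]) <= iou_threshold]
    # Step 5: output only the kept boxes.
    return kept


boxes = [(10, 10, 4, 4), (10.5, 10, 4, 4), (30, 30, 4, 4), (50, 50, 2, 2)]
scores = [0.9, 0.8, 0.7, 0.1]
print(pick_top(boxes, scores, iou_threshold=0.5, confidence_threshold=0.2))
# [0, 2]: box 1 overlaps box 0 (IOU ~0.78), box 3 is below the threshold
```
*
* Because every kept box is compared with every remaining candidate, the loop
* is quadratic in the number of boxes, which is why filtering by confidence
* before the loop pays off.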
*/
message NonMaximumSuppression {
// Suppression methods:
/*
* Pick the bounding box of the top confidence, suppress all within a radius.
*/
message PickTop {
/*
* If true, suppression is only done among predictions with the same label
* (argmax of the confidence).
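*
* A hypothetical Python sketch of the per-class variant, assuming each box's
* label is the argmax of its confidence row and its score is the confidence
* at that label; the helper names are illustrative only. A box is kept unless
* a higher-scoring kept box with the same label overlaps it by more than the
* IOU threshold:
*
```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_center, y_center, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two center-format boxes."""
    inter_w = max(0.0, min(a[0] + a[2] / 2, b[0] + b[2] / 2)
                  - max(a[0] - a[2] / 2, b[0] - b[2] / 2))
    inter_h = max(0.0, min(a[1] + a[3] / 2, b[1] + b[3] / 2)
                  - max(a[1] - a[3] / 2, b[1] - b[3] / 2))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def pick_top_per_class(boxes: Sequence[Box],
                       confidences: Sequence[Sequence[float]],
                       iou_threshold: float) -> List[int]:
    """Per-class greedy suppression; returns kept indices, best score first."""
    # Label = argmax of the confidence row; score = confidence at that label.
    labels = [max(range(len(row)), key=row.__getitem__) for row in confidences]
    scores = [row[label] for row, label in zip(confidences, labels)]
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Only a kept box with the same label may suppress box i.
        if all(labels[k] != labels[i]
               or iou(boxes[k], boxes[i]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


boxes = [(10, 10, 4, 4), (10.5, 10, 4, 4), (10, 10.5, 4, 4)]
confidences = [[0.9, 0.05], [0.1, 0.8], [0.7, 0.2]]
print(pick_top_per_class(boxes, confidences, iou_threshold=0.5))
# [0, 1]: box 1 survives (different label), box 2 is suppressed by box 0
```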
*/
bool perClass = 1;
}
/*
* Choose which underlying suppression method to use
*/
oneof SuppressionMethod {
PickTop pickTop = 1;
}
/*
* Optional class label mapping.
*/
oneof ClassLabels {
StringVector stringClassLabels = 100;
Int64Vector int64ClassLabels = 101;
}
/*
* This defines the radius of suppression. A box is considered to be within
* the radius of another box if their IOU score is greater than this value.
*/
double iouThreshold = 110;
/*
* Remove bounding boxes below this threshold. The algorithm run-time is
* proportional to the square of the number of incoming bounding boxes
* (O(N^2)). This threshold is a way to reduce N to make the algorithm
* faster. The confidence threshold can be any nonnegative value. Negative
* confidences are not allowed: if the output shape is specified to be larger
* than the number of boxes remaining after suppression, the unused rows are
* filled with zero confidence, so zero must be the lowest possible
* confidence. If the prediction is handled by Core Vision, it is also
* important that confidences are defined with the following semantics:
*
* 1. Confidences should be between 0 and 1
* 2. The sum of the confidences for a prediction should not exceed 1, but is
*    allowed to be less than 1
* 3. The sum of the confidences will be interpreted as the confidence of
*    any object (e.g. if the confidences for two classes are 0.2 and 0.4,
*    it means there is a 60% (0.2 + 0.4) confidence that an object is
*    present)
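*
* A minimal Python sketch of these three rules, applied to a single
* confidence row. The function name check_vision_semantics and the small
* floating-point tolerance are assumptions of this example, not part of
* Core ML or Vision:
*
```python
from typing import Sequence


def check_vision_semantics(row: Sequence[float], tol: float = 1e-9) -> float:
    """Validate one confidence row against rules 1 and 2; return its sum."""
    # Rule 1: each confidence lies in [0, 1].
    if any(c < 0.0 or c > 1.0 for c in row):
        raise ValueError("confidences must be between 0 and 1")
    total = sum(row)
    # Rule 2: the row may sum to less than 1 but must not exceed it
    # (tol absorbs floating-point rounding; an assumption of this sketch).
    if total > 1.0 + tol:
        raise ValueError("confidences must not sum to more than 1")
    # Rule 3: the sum is read as the confidence that any object is present.
    return total


print(round(check_vision_semantics([0.2, 0.4]), 2))  # 0.6
```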
*/
double confidenceThreshold = 111;
/*
* Set the name of the confidence input.
*
* The input should be a multi-array of type double and shape N x C. N is
* the number of boxes and C the number of classes. Each row describes the
* confidences of each object category being present at that particular
* location. Confidences should be nonnegative, where 0.0 expresses the
* highest certainty that the object is not present.
*
* Specifying shape is optional.
*/
string confidenceInputFeatureName = 200;
/*
* Set the name of the coordinates input.
*
* The input should be a multi-array of type double and shape N x 4. The
* rows correspond to the rows of the confidence matrix. The four values
* describe (in order):
*
* - x (center location of the box along the horizontal axis)
* - y (center location of the box along the vertical axis)
* - width (size of the box along the horizontal axis)
* - height (size of the box along the vertical axis)
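*
* The conversion between this center format and corner coordinates can be
* sketched in Python as below. The helper names are hypothetical, and the
* corner convention (x_min, y_min, x_max, y_max) is an assumption used only
* for illustration:
*
```python
from typing import Tuple

Quad = Tuple[float, float, float, float]


def center_to_corners(box: Quad) -> Quad:
    """(x, y, width, height) -> (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def corners_to_center(box: Quad) -> Quad:
    """(x_min, y_min, x_max, y_max) -> (x, y, width, height)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)


corners = center_to_corners((4.0, 6.0, 2.0, 4.0))
print(corners)                     # (3.0, 4.0, 5.0, 8.0)
print(corners_to_center(corners))  # (4.0, 6.0, 2.0, 4.0)
```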
*
* Specifying shape is optional.
*/
string coordinatesInputFeatureName = 201;
/*
* The iouThreshold can be optionally overridden by specifying this string
* and providing a corresponding input of type double. This allows changing
* the value of the parameter during run-time.
*
* The input should be a scalar double between 0.0 and 1.0. Setting it to 1.0
* means there will be no suppression based on IOU.
*/
string iouThresholdInputFeatureName = 202;
/*
* The confidenceThreshold can be optionally overridden by specifying this
* string and providing a corresponding input. This allows changing the
* value of the parameter at run time, which can help tune it for a
* particular use case.
*
* The input should be a scalar double with nonnegative value.
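*
* A sketch, in Python, of how a caller might resolve the effective thresholds.
* It assumes that an empty feature name (the proto3 default for a string
* field) means no override, reads both overrides from a plain dictionary of
* hypothetical input values, and checks the ranges stated for
* iouThresholdInputFeatureName and for this field:
*
```python
from typing import Mapping, Tuple


def resolve_thresholds(iou_threshold: float, confidence_threshold: float,
                       inputs: Mapping[str, float],
                       iou_input_name: str = "",
                       confidence_input_name: str = "") -> Tuple[float, float]:
    """Return the (IOU, confidence) thresholds to use for one prediction."""
    # Assumption: an empty name (proto3 default) means no run-time override.
    if iou_input_name:
        iou_threshold = float(inputs[iou_input_name])
    if confidence_input_name:
        confidence_threshold = float(inputs[confidence_input_name])
    # Ranges stated in the comments: IOU in [0.0, 1.0], confidence >= 0.
    if not 0.0 <= iou_threshold <= 1.0:
        raise ValueError("IOU threshold must be between 0.0 and 1.0")
    if confidence_threshold < 0.0:
        raise ValueError("confidence threshold must be nonnegative")
    return iou_threshold, confidence_threshold


# "iouThresholdOverride" is a made-up input name for this example.
print(resolve_thresholds(0.5, 0.3, {"iouThresholdOverride": 0.7},
                         iou_input_name="iouThresholdOverride"))  # (0.7, 0.3)
```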
*/
string confidenceThresholdInputFeatureName = 203;
/*
* Set the name of the confidence output. The output will be the same type
* and shape as the corresponding input. The only difference is that the
* number of rows may have been reduced.
*
* Specifying shape is optional. One reason to specify shape is to limit
* the number of output boxes. This can be done in several ways:
*
* Fixed shape:
* The output can be pinned to a fixed set of boxes. If this number is larger
* than the number of boxes that would have been returned, the output is
* padded with zeros for both confidence and coordinates. Specifying a fixed
* shape can be done by setting either shape (deprecated) or allowedShapes
* to fixedsize.
*
* Min/max:
* It is also possible to set both a minimum and a maximum. The same
* zero-padding as for fixed shape is applied when necessary. Setting min/max
* is done by defining two allowedShapes, where the first dimension uses a
* rangeofsizes defining lowerbound and upperbound.
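*
* The truncation and zero-padding described above can be sketched in Python
* as follows. The helper fit_rows is hypothetical, and it assumes the rows
* arrive sorted by descending confidence so that truncating to the maximum
* keeps the strongest boxes; a fixed shape corresponds to min_rows equal to
* max_rows:
*
```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def fit_rows(confidences: Sequence[Row], coordinates: Sequence[Row],
             min_rows: int, max_rows: int,
             num_classes: int) -> Tuple[List[Row], List[Row]]:
    """Truncate both outputs to max_rows, then zero-pad them up to min_rows."""
    if not 0 <= min_rows <= max_rows:
        raise ValueError("expected 0 <= min_rows <= max_rows")
    # Assumes rows are already sorted by descending confidence.
    conf = list(confidences[:max_rows])
    coords = list(coordinates[:max_rows])
    # Padding rows carry zero confidence and zero coordinates.
    while len(conf) < min_rows:
        conf.append((0.0,) * num_classes)
        coords.append((0.0,) * 4)
    return conf, coords


# Fixed shape of 3 rows, but suppression left only one box.
conf, coords = fit_rows([(0.9, 0.05)], [(10.0, 10.0, 4.0, 4.0)],
                        min_rows=3, max_rows=3, num_classes=2)
print(conf)  # [(0.9, 0.05), (0.0, 0.0), (0.0, 0.0)]
```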
*/
string confidenceOutputFeatureName = 210;
/*
* Set the name of the coordinates output. The output will be the same type
* and shape as the corresponding input. The only difference is that the
* number of rows may have been reduced.
*
* Specifying shape is optional. See the confidence output for a more
* detailed description. Note that to achieve either a fixed-shape output or
* a constrained range of boxes, only one of confidence or coordinates needs
* to set a shape. Both shapes may be defined, but in that case they have to
* be consistent along dimension 0.
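*
* A hypothetical Python sketch of combining the two dimension-0 constraints,
* each given as a (lowerbound, upperbound) pair, or None when that output sets
* no shape. Reading "consistent" as "identical" is an assumption of this
* sketch:
*
```python
from typing import Optional, Tuple

Range = Tuple[int, int]  # (lowerbound, upperbound) for dimension 0


def output_row_range(confidence_dim0: Optional[Range],
                     coordinates_dim0: Optional[Range]) -> Optional[Range]:
    """Combine the dimension-0 constraints of the two outputs."""
    # Setting a shape on only one output is enough.
    if confidence_dim0 is None:
        return coordinates_dim0
    if coordinates_dim0 is None:
        return confidence_dim0
    # Assumption: "consistent" is read as "identical" in this sketch.
    if confidence_dim0 != coordinates_dim0:
        raise ValueError("output shapes disagree along dimension 0")
    return confidence_dim0


print(output_row_range((10, 10), None))  # (10, 10)
```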
*/
string coordinatesOutputFeatureName = 211;
}