# ONNX Concepts
ONNX can be compared to a programming language specialized
in mathematical functions. It defines all the operations
a machine learning model needs to implement its inference function
in this language. A linear regression could be represented
in the following way:
```
def onnx_linear_regressor(X):
    "ONNX code for a linear regression"
    return onnx.Add(onnx.MatMul(X, coefficients), bias)
```
```{index} ONNX graph
```
This example is very similar to an expression a developer could
write in Python. It can also be represented as a graph that shows
step-by-step how to transform the features to get a prediction.
That's why a machine-learning model implemented with ONNX is often
referred to as an **ONNX graph**.
```{image} images/linreg1.png
```
ONNX aims at providing a common language any machine learning framework
can use to describe its models. The first scenario is to make it easier
to deploy a machine learning model in production. An ONNX interpreter
(or **runtime**) can be specifically implemented and optimized for this task
in the environment where it is deployed. With ONNX, it is possible
to build a unique deployment process, independent
of the learning framework used to build the model.
*onnx* implements a Python runtime that can be used to evaluate
ONNX models and individual ONNX operators. It is intended to clarify the
semantics of ONNX and to help understand and debug ONNX tools
and converters. It is not intended for production use and
performance is not a goal (see {ref}`l-reference-implementation`).
## Input, Output, Node, Initializer, Attributes
Building an ONNX graph means implementing a function
with the ONNX language or more precisely the {ref}`l-onnx-operators`.
A linear regression would be written this way.
The following lines do not follow Python syntax;
they are just a kind of pseudo-code to illustrate the model.
```
Input: float[M,K] x, float[K,N] a, float[N] c
Output: float[M, N] y
r = onnx.MatMul(x, a)
y = onnx.Add(r, c)
```
This code implements a function `f(x, a, c) -> y = x @ a + c`.
And *x*, *a*, *c* are the **inputs**, *y* is the **output**.
*r* is an intermediate result.
*MatMul* and *Add* are the **nodes**. They also have inputs and outputs.
A node also has a type, one of the operators in
{ref}`l-onnx-operators`. This graph was built with the example
in Section {ref}`l-onnx-linear-regression-onnx-api`.
The graph could also have an **initializer**. When an input
never changes, such as the coefficients of the linear regression,
it is most efficient to turn it into a constant stored in the graph.
```
Input: float[M,K] x
Initializer: float[K,N] a, float[N] c
Output: float[M, N] xac
xa = onnx.MatMul(x, a)
xac = onnx.Add(xa, c)
```
Visually, this graph would look like the following image.
The right side describes operator *Add*, where the second input
is defined as an initializer. This graph was obtained with the
code in {ref}`l-onnx-linear-regression-onnx-api-init`.
```{image} images/linreg2.png
:alt: Snapshot of Netron
```
An **attribute** is a fixed parameter of an operator. Operator {ref}`l-onnx-doc-Gemm`
has four attributes: *alpha*, *beta*, *transA*, *transB*. Once the runtime
has loaded the ONNX graph, these values cannot be changed and remain
frozen for all predictions, unless the runtime allows modifying them
through its API.
## Serialization with protobuf
The deployment of a machine-learned model into production
usually requires replicating the entire ecosystem used to
train the model, most of the time with a *docker* image.
Once a model is converted into ONNX, the production environment
only needs a runtime to execute the graph defined with ONNX
operators. This runtime can be developed in any language
suitable for the production application: C, Java, Python, JavaScript,
C#, WebAssembly, ARM...
But to make that happen, the ONNX graph needs to be saved.
ONNX uses *protobuf* to serialize the graph into
one single block
(see [Parsing and Serialization](https://developers.google.com/protocol-buffers/docs/pythontutorial#parsing-and-serialization)). It aims at keeping the model size
as small as possible.
## Metadata
Machine-learned models are continuously refreshed. It is important
to keep track of the model version, the author of the model, and
how it was trained. ONNX offers the possibility to store such additional
data in the model itself.
- **doc_string**: Human-readable documentation for this model.
: Markdown is allowed.
- **domain**: A reverse-DNS name to indicate the model namespace or domain,
: for example, 'org.onnx'
- **metadata_props**: Named metadata as dictionary `map<string,string>`,
: keys should be distinct.
- **model_author**: A comma-separated list of names,
: the personal names of the author(s) of the model, and/or their organizations.
- **model_license**: The well-known name or URL of the license
: under which the model is made available.
- **model_version**: The version of the model itself, encoded as an integer.
- **producer_name**: The name of the tool used to generate the model.
- **producer_version**: The version of the generating tool.
- **training_info**: An optional extension that contains
: information for training (see {ref}`l-traininginfoproto`)
## List of available operators and domains
The main list is described here: {ref}`l-onnx-operators`.
It merges standard matrix operators (Add, Sub, MatMul, Transpose,
Greater, IsNaN, Shape, Reshape...),
reductions (ReduceSum, ReduceMin, ...),
image transformations (Conv, MaxPool, ...),
deep neural network layers (RNN, Dropout, ...),
and activation functions (Relu, Softmax, ...).
It covers most of the operations needed to implement
inference functions from standard and deep machine learning.
ONNX does not implement every existing machine learning operator;
the list of operators would be infinite.
The main list of operators is identified with a domain **ai.onnx**.
A **domain** can be defined as a set of operators.
A few operators in this list are dedicated to text, but they hardly cover
the needs. The main list is also missing tree-based models, very
popular in standard machine learning.
These are part of another domain, **ai.onnx.ml**.
It includes tree-based models (TreeEnsembleRegressor, ...),
preprocessing (OneHotEncoder, LabelEncoder, ...), SVM models
(SVMRegressor, ...), and an imputer (Imputer).
ONNX only defines these two domains, but the *onnx* library
supports any custom domain and operators
(see {ref}`l-onnx-extensibility`).
## Supported Types
ONNX specifications are optimized for numerical computation with
tensors. A *tensor* is a multidimensional array. It is defined
by:
- a type: the element type, the same for all elements in the tensor
- a shape: an array with all dimensions, this array can be empty,
a dimension can be null
- a contiguous array: it represents all the values
This definition does not include *strides* or the possibility to define
a view of a tensor based on an existing tensor. An ONNX tensor is a dense
full array with no stride.
### Element Type
ONNX was initially developed to help deploy deep learning models.
That's why the specifications were initially designed for floats (32 bits).
The current version supports all common types. The table in
{ref}`l-onnx-types-mapping` gives the correspondence between *ONNX*
and {mod}`numpy` types.
```{eval-rst}
.. exec_code::

    import re
    from onnx import TensorProto

    reg = re.compile('^[0-9A-Z_]+$')
    values = {}
    for att in sorted(dir(TensorProto)):
        if att in {'DESCRIPTOR'}:
            continue
        if reg.match(att):
            values[getattr(TensorProto, att)] = att
    for i, att in sorted(values.items()):
        si = str(i)
        if len(si) == 1:
            si = " " + si
        print("%s: onnx.TensorProto.%s" % (si, att))
```
ONNX is strongly typed and its definition does not support
implicit casts. It is impossible to add two tensors or matrices
with different types, even if other languages do. That's why an explicit
cast must be inserted in the graph.
### Sparse Tensor
Sparse tensors are useful to represent arrays having many null coefficients.
ONNX supports 2D sparse tensors. Class {ref}`l-onnx-sparsetensor-proto`
defines attributes `dims`, `indices` (int64) and `values`.
### Other types
In addition to tensors and sparse tensors, ONNX supports sequences of tensors,
maps of tensors, and sequences of maps of tensors through types
{ref}`l-onnx-sequence-proto` and {ref}`l-onnx-map-proto`. They are rarely used.
## What is an opset version?
The opset is mapped to the version of the *onnx* package.
It is incremented every time the minor version increases.
Every version brings updated or new operators.
```{eval-rst}
.. exec_code::

    import onnx
    print(onnx.__version__, " opset=", onnx.defs.onnx_opset_version())
```
An opset is also attached to every ONNX graph. It is a global
piece of information that defines the version of all operators inside
the graph. Operator *Add* was updated in versions 6, 7, 13 and 14. If the
graph opset is 15, operator *Add* follows the specifications of
version 14. If the graph opset is 12, operator *Add* follows the
specifications of version 7. An operator in a graph follows its most
recent definition below (or equal to) the global graph opset.
A graph may include operators from several domains, `ai.onnx` and
`ai.onnx.ml` for example. In that case, the graph must define a
global opset for every domain. The rule is applied to every
operator within the same domain.
## Subgraphs, tests and loops
ONNX implements tests and loops. They all take another ONNX
graph as an attribute. These structures are usually slow and complex;
it is better to avoid them if possible.
### If
Operator {ref}`l-onnx-doc-If` executes
one of the two graphs depending on the condition evaluation.
```
If(condition) then
execute this ONNX graph (`then_branch`)
else
execute this ONNX graph (`else_branch`)
```
Those two graphs can use any result already computed in the
graph and must produce the exact same number of outputs.
These outputs become the outputs of operator `If`.
```{image} images/dot_if.png
```
(l-operator-scan-onnx-tutorial)=
### Scan
Operator {ref}`l-onnx-doc-Scan` implements a loop with a fixed number of iterations.
It loops over the rows (or any other dimension) of the inputs and concatenates
the outputs along the same axis. Let's see an example which implements
pairwise distances: $M(i,j) = \lVert X_i - X_j \rVert^2$.
```{image} images/dot_scan.png
```
This loop is efficient even if it is still slower than a custom implementation
of pairwise distances. It assumes inputs and outputs are tensors and
automatically concatenates the outputs of every iteration into single
tensors. The previous example only has one output, but it could have several.
### Loop
Operator {ref}`l-onnx-doc-Loop` implements a for loop and a while loop. It can run a fixed
number of iterations and/or end when a condition is no longer met.
Outputs are processed in two different ways. The first one is similar to
loop {ref}`l-onnx-doc-Scan`: outputs are concatenated into tensors (along the first
dimension), which means these outputs must have compatible shapes.
The second mechanism concatenates tensors into a sequence of tensors.
(l-onnx-extensibility)=
## Extensibility
ONNX defines a list of operators as the standard: {ref}`l-onnx-operators`.
However, it is possible
to define your own operators under this domain or a new one.
*onnxruntime* defines custom operators to improve inference.
Every node has a type, a name,
named inputs and outputs, and attributes. As long as a node is described
under these constraints, it can be added to any ONNX graph.
Pairwise distances can be implemented with operator Scan.
However, a dedicated operator called CDist proved significantly
faster, significantly enough to make it worth implementing a dedicated runtime
for it.
## Functions
Functions are one way to extend the ONNX specifications. Some models require
the same combination of operators. This repetition can be avoided by creating
a function, itself defined with existing ONNX operators. Once defined, a function
behaves like any other operator: it has inputs, outputs and attributes.
There are two advantages to using functions. The first one is shorter
code that is easier to read. The second one is that a runtime
can leverage that information to run predictions faster. The runtime
could have a specific implementation for a function not relying on the
implementations of the operators it is made of.
## Shape (and Type) Inference
Knowing the shapes of results is not necessary to execute an ONNX graph,
but this information can be used to make execution faster. Consider the
following graph:
```
Add(x, y) -> z
Abs(z) -> w
```
If *x* and *y* have the same shape, then *z* and *w* also have that
shape. Knowing that, it is possible to reuse the buffer allocated for *z*
to compute the absolute value *w* in place. Shape inference helps the
runtime manage memory and therefore be more efficient.
The ONNX package can, in most cases, compute the output shape
from the input shapes for every standard operator. It obviously cannot
do that for a custom operator outside of the official list.
## Tools
[netron](https://netron.app/)
is very useful to visualize ONNX graphs.
It is the only tool listed here that requires no programming.
The first screenshot was made with it.
```{image} images/linreg1.png
```
[onnx2py.py](https://github.com/microsoft/onnxconverter-common/blob/master/onnxconverter_common/onnx2py.py)
creates a Python file from an ONNX graph. This script recreates
the same graph and may be modified by a user to change it.

[zetane](https://github.com/zetane/viewer)
can load an ONNX model and show intermediate results
while the model is executed.