File: OpDSL.md

package info (click to toggle)
llvm-toolchain-14 1%3A14.0.6-12
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 1,496,180 kB
  • sloc: cpp: 5,593,972; ansic: 986,872; asm: 585,869; python: 184,223; objc: 72,530; lisp: 31,119; f90: 27,793; javascript: 9,780; pascal: 9,762; sh: 9,482; perl: 7,468; ml: 5,432; awk: 3,523; makefile: 2,538; xml: 953; cs: 573; fortran: 567
file content (253 lines) | stat: -rw-r--r-- 10,386 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
# Linalg OpDSL

Python based DSL for authoring Linalg op definitions and generating
`linalg.generic` IR based on them for samples.

The Linalg OpDSL is a high level DSL for constructing structured op definitions
in a way that can be exported to built-in, named structured ops via
[YAML-based definitions](_index.md/#yaml-gen) or used interactively to emit
corresponding `linalg.generic` IR for the composition.

## Basic usage

The tool is bundled with the MLIR Python bindings. To use from the CMake build
tree, MLIR must be build with Python bindings enabled
(`-DMLIR_ENALBE_BINDINGS_PYTHON=ON`). Then add the `python` directory in the
build tree to your `PYTHONPATH` environment variable (i.e. `export
PYTHONPATH=$PWD/build/tools/mlir/python_packages/mlir_core`). Optionally, use an
installed MLIR package, if available, to avoid building.

```shell
# Dump the `core_named_ops.py` module as YAML.
python -m mlir.dialects.linalg.opdsl.dump_oplib .ops.core_named_ops
```

Alternatively, run the `$PWD/build/bin/update_core_linalg_named_ops.sh` script,
which is available after building the `mlir-linalg-ods-gen` target. The tool is
meant for use during both development and runtime, but not as a build tool of
the core compiler: in order to export static named op definitions to be built as
part of the compiler, the corresponding Linalg dialect YAML file must be updated
and reviewed. TODO: Develop a script to automate op updates to these files.

## Language Guide

The language presented here is loosely inspired from the
[Tensor Comprehensions](https://arxiv.org/pdf/1802.04730.pdf) work, adapted to
represent linalg structured ops.

This tool is new and rapidly evolving. For language examples, refer to the
built-in ops in the `mlir.tools.linalg_opdsl.ops` package
(`lib/Bindings/Python/mlir/tools/linalg_opdsl/ops` in the repository).

Using a matmul as an example, we will decompose the language:

```python
T1 = TV.T1
T2 = TV.T2

@linalg_structured_op
def matmul(A=TensorDef(T1, S.M, S.K),
           B=TensorDef(T2, S.K, S.N),
           C=TensorDef(U, S.M, S.N, output=True)):
  """Performs a matrix multiplication of two 2D inputs.

  Numeric casting is performed on the operands to the inner multiply, promoting
  them to the same data type as the accumulator/output.
  """
  domain(D.m, D.n, D.k)
  implements(ContractionOpInterface)
  C[D.m, D.n] += TypeFn.cast(U, A[D.m, D.k]) * TypeFn.cast(U, B[D.k, D.n])
```

Here we have a simple type polymorphic contraction that takes arguments `A` and
`B` and outputs `C`. Each is bound to a `TensorDef`, which specifies:

*   The symbolic element type (`T1`, `T2`, `U` above).
*   Symbolic shape expressions with symbols that are bound globally for the op (
    note that in this simple example, the shape expressions are just symbol
    references, but they are permitted to be a constrained set of affine
    expressions).
*   Usage (`output=True`).

The docstring will be transferred to the op definition verbatim.

An explicit iteration domain dimension order can be declared for the op via
`domain(D.d0[, D.d1...])`.

Special identifying op interfaces can be declared for the op via
`implements(interface1[, interface2...])`.

## Parameters

Structured operations take two types of runtime parameters namely scalars and
tensors. While scalars are inputs only, a tensor may be marked as an output.
Assignment expressions index the tensor parameters to access the individual
elements, while scalars can be accessed directly.

The following example demonstrates the use of the two parameter types:

```python
@linalg_structured_op
def copy_and_scale(val=ScalarDef(T),
                   I=TensorDef(T, S.M, S.K),
                   O=TensorDef(T, S.M, S.K, output=True)):
  """Scale the input by the scalar value and store the result"""
  O[D.m, D.n] = I[D.m, D.n] * val
```

The operation scales the input tensor `I` scales its elements by the value `val`
and writes the result to the output tensor `out`. The scalar `val` is bound to a
`ScalarDef`, which specifies the type of the scalar operand. The tensors are
bound to a `TensorDef` as demonstrated by the matmul example. All parameters
appear in the parameter list of the operation:

```python
fill(val, in_tensor, outs=[out_tensor])
```

## Attributes

Attributes are compile-time constant parameters only accessible in index
expressions. They can be used to parameterize the access pattern of a structured
operation, for example, by setting its strides. They cannot take part in the
actual computation.

The following example demonstrates the use of attributes:

```python
@linalg_structured_op
def strided_copy(I=TensorDef(T, S.IH, S.IW),
                 O=TensorDef(T, S.OH, S.OW, output=True),
                 strides=IndexAttrDef(S.SH, S.SW)):
  """Copy a subset of the input tensor elements to the output tensor"""
  O[D.oh, D.ow] = I[D.oh * S.SH, D.ow * S.SW]
```

The operation implements a strided copy from the input tensor `I` to the output
tensor `O`. The `strides` attribute is bound to an `IndexAttrDef`. It defines
the symbols `S.SH` and `S.SW`, which are used to index the input tensor `I`.
When instantiating the operation, the attribute is set using a named argument:

```python
strided_copy(in_tensor, outs=[out_tensor], strides=[1,2])
```

The `strides` vector elements substitute the symbols `S.SH` and `S.SW` in the
index expressions of the operation instance.

Attributes are currently limited to integer vectors and only accessible in index
expressions. An operation may have multiple attributes all of them placed at the
end of the parameter list after the output tensors.

## Shape-Only Tensors

Structured operations derive the iteration space given the sizes of the input
and output tensors. Certain operations need shape-only tensors that are not
accessed and exist purely for the sake of specifying the iteration domain. An
example is the pooling operation that takes a shape-only tensor to define the
iteration space of the reduction. As shape-only tensors have no uses, the
`TensorDef` takes an additional optional `index_dims` parameter to map the shape
to index dimensions.

The following example demonstrates the index dimension annotation:

```python
@linalg_structured_op
def pooling_poly(
    I=TensorDef(T1, S.N, S.H, S.W, S.C),
    K=TensorDef(T2, S.KH, S.KW, index_dims=[D.kh, D.kw]),
    O=TensorDef(U, S.N, S.OH, S.OW, S.C, output=True),
    strides=IndexAttrDef(S.SH, S.SW),
    dilations=IndexAttrDef(S.DH, S.DW)):
  O[D.n, D.oh, D.ow, D.c] += TypeFn.cast(U,
          I[D.n, D.oh * S.SH + D.kh * S.DH, D.ow * S.SW + D.kw * S.DW, D.c])
```

The pooling operation does not access the shape-only tensor `K`. Instead, the
shapes `S.KH` and `S.KW` specify the iteration domain for the reduction
dimensions `D.kh` and `D.kw`.

## Assignments

The bulk of language consists of assignment expressions of the form above. The
iteration dimension order is determined lexically based on the order encountered
in the expression (following operator precedence if math operators are used).
TODO: Introduce a directive to fix the dimension bindings.

Reduction dimensions are inferred to be any dimensions on the RHS that are not
on the LHS.

A number of arithmetic functions are supported:

*   `ArithFn.add(a, b)` (also via overloading the binary `+` operator)
*   `ArithFn.exp(a)`
*   `ArithFn.log(a)`
*   `ArithFn.mul(a, b)` (also via overloading the binary `*` operator)
*   `ArithFn.max(a, b)`
*   `ArithFn.min(a, b)`
*   `ArithFn.sub(a, b)` (also via overloading the binary `-` operator)
*   `ArithFn.max_unsigned(a, b)`
*   `ArithFn.min_unsigned(a, b)`

As the integer types are signless, signedness is implement by different
functions that treat integers as signed or unsigned values.

A subset of the arithmetic functions are supported in reductions. These
reduction functions can appear as the outermost function on the RHS:

*   `ReduceFn.add` (also overloading the inplace `+=` on a LHS)
*   `ReduceFn.mul`
*   `ReduceFn.max`
*   `ReduceFn.min`
*   `ReduceFn.max_unsigned`
*   `ReduceFn.min_unsigned`

As the integer types are signless, signedness is implement by different
functions that treat integers as signed or unsigned values.

Additionally, type conversion functions cast an operand to a target type:

*   `TypeFn.cast(TypeVar, operand)`
*   `TypeFn.cast_unsigned(TypeVar, operand)`

As the integer types are signless, signedness is implement by different
functions that treat integers as signed (`TypeFn.cast`) or unsigned
(`TypeFn.cast_unsigned`) values.

There are also special forms:

*   `const(value)` returns a constant value.
*   `index(dim)` returns the iteration index in the given dimension `dim`.

## Types

All types in assignment expressions are late bound based on actual input and
output types of constructed ops. An exception are predefined types such as
`I32`, `I64`, `F32`, and `F64`. These hardwired types enable intermediate
computations with a type that is independent of the input and output types. For
example, parts of floating point computation may require double precision
arithmetic despite all inputs and outputs being single precision values.
Assignment expressions with no `TypeFn.cast` calls will generally require
uniform types throughout and will fail to verify if violated. The presence of a
`TypeFn.cast` or `TypeFn.cast_unsigned` allows for a limited form of numeric
type conversion between element types that can be derived from inputs and
outputs (and in the future, attributes). `TypeFn.cast` calls with a `TypeVar`
first argument are emitted as `type_fn` primitives in the YAML definition.

Casting will perform `int<->float` and `index->int` type conversions and will
perform any necessary extension or truncation within the type family. The
integer types themselves are signless and signedness is implemented by
functions/operations. The `TypeFn.cast` function treats all integers as signed,
while `TypeFn.cast_unsigned` treats them as unsigned.

The following examples illustrate the lowering of signed and unsigned functions:

*   cast(I32 -> I64) -> `arith.ExtSIOp`
*   cast(F32 -> I32) -> `arith.FPToSIOp`
*   cast_unsigned(I32 -> I64) -> `arith.ExtUIOp`
*   cast_unsigned(F32 -> I32) -> `arith.FPToUIOp`
*   max -> `arith.MaxSIOp`
*   max_unsinged -> `arith.MaxUIOp`

Not all functions are applicable for all numeric types, and on mismatch, op
verification will fail.