File: Python.md

196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
# MLIR Python Bindings

**Current status**: Under development and not enabled by default

[TOC]

## Building

### Pre-requisites

*   A relatively recent Python3 installation
*   Installation of python dependencies as specified in
    `mlir/python/requirements.txt`

### CMake variables

*   **`MLIR_ENABLE_BINDINGS_PYTHON`**`:BOOL`

    Enables building the Python bindings. Defaults to `OFF`.

*   **`Python3_EXECUTABLE`**`:STRING`

    Specifies the `python` executable used for the LLVM build, including for
    determining header/link flags for the Python bindings. On systems with
    multiple Python implementations, setting this explicitly to the preferred
    `python3` executable is strongly recommended.

### Recommended development practices

It is recommended to use a Python virtual environment. There are many ways to
set one up; the following is one of the simplest:

```shell
# Make sure your 'python' is what you expect. Note that on multi-python
# systems, this may have a version suffix, and on many Linux distributions and
# macOS, where python2 and python3 co-exist, you may also want to use `python3`.
which python
python -m venv ~/.venv/mlirdev
source ~/.venv/mlirdev/bin/activate

# Note that many LTS distros will bundle a version of pip itself that is too
# old to download all of the latest binaries for certain platforms.
# The pip version can be obtained with `python -m pip --version`, and for
# Linux specifically, this should be cross checked with minimum versions
# here: https://github.com/pypa/manylinux
# It is recommended to upgrade pip:
python -m pip install --upgrade pip


# Now the `python` command will resolve to your virtual environment and
# packages will be installed there.
python -m pip install -r mlir/python/requirements.txt

# Now run `cmake`, `ninja`, et al.
```

For interactive use, it is sufficient to add the
`tools/mlir/python_packages/mlir_core/` directory in your `build/` directory to
the `PYTHONPATH`. Typically:

```shell
export PYTHONPATH=$(cd build && pwd)/tools/mlir/python_packages/mlir_core
```

Note that if you have installed (e.g. via `ninja install`), then Python
packages for all enabled projects will be in your install tree under
`python_packages/` (e.g. `python_packages/mlir_core`). Official distributions
are built with a more specialized setup.

## Design

### Use cases

There are likely two primary use cases for the MLIR python bindings:

1.  Support users who expect that an installed version of LLVM/MLIR will yield
    the ability to `import mlir` and use the API in a pure way out of the box.

1.  Downstream integrations will likely want to include parts of the API in
    their private namespace or specially built libraries, probably mixing it
    with other python native bits.

### Composable modules

In order to support use case \#2, the Python bindings are organized into
composable modules that downstream integrators can include and re-export into
their own namespace if desired. This forces several design points:

*   Separate the construction/populating of a `py::module` from the
    `PYBIND11_MODULE` global constructor.

*   Introduce headers for C++-only wrapper classes, as other related C++
    modules will need to interoperate with them.

*   Separate any initialization routines that depend on optional components into
    their own module/dependency (currently, things like `registerAllDialects`
    fall into this category).

There are many interrelated issues of shared library linkage, distribution
concerns, etc. that affect such things. Organizing the code into composable
modules (versus a monolithic `cpp` file) allows the flexibility to address many
of these as needed over time. Also, compilation time for all of the template
meta-programming in pybind scales with the number of things you define in a
translation unit. Breaking into multiple translation units can significantly aid
compile times for APIs with a large surface area.

### Submodules

Generally, the C++ codebase namespaces most things into the `mlir` namespace.
However, in order to modularize and make the Python bindings easier to
understand, sub-packages are defined that map roughly to the directory structure
of functional units in MLIR.

Examples:

*   `mlir.ir`
*   `mlir.passes` (`pass` is a reserved word :( )
*   `mlir.dialect`
*   `mlir.execution_engine` (aside from namespacing, it is important that
    "bulky"/optional parts like this are isolated)

In addition, initialization functions that imply optional dependencies should be
in underscored (notionally private) modules such as `_init` and linked
separately. This allows downstream integrators to completely customize what is
included "in the box" and covers things like dialect registration, pass
registration, etc.

### Loader

LLVM/MLIR is a non-trivial python-native project that is likely to co-exist with
other non-trivial native extensions. As such, the native extension (i.e. the
`.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol
(`_mlir`), while a small set of Python code is provided in
`mlir/_cext_loader.py` and siblings which loads and re-exports it. This split
provides a place to stage code that needs to prepare the environment *before*
the shared library is loaded into the Python runtime, and also provides a place
that one-time initialization code can be invoked apart from module constructors.

It is recommended to avoid using `__init__.py` files to the extent possible,
until reaching a leaf package that represents a discrete component. The rule to
keep in mind is that the presence of an `__init__.py` file prevents the ability
to split anything at that level or below in the namespace into different
directories, deployment packages, wheels, etc.

See the documentation for more information and advice:
https://packaging.python.org/guides/packaging-namespace-packages/

### Use the C-API

The Python APIs should seek to layer on top of the C-API to the degree possible.
Especially for the core, dialect-independent parts, such a binding enables
packaging decisions that would be difficult or impossible if spanning a C++ ABI
boundary. In addition, factoring in this way side-steps some very difficult
issues that arise when combining RTTI-based modules (which pybind derived things
are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM).

### Ownership in the Core IR

There are several top-level types in the core IR that are strongly owned by
their python-side reference:

*   `PyContext` (`mlir.ir.Context`)
*   `PyModule` (`mlir.ir.Module`)
*   `PyOperation` (`mlir.ir.Operation`) - but with caveats

All other objects are dependent. All objects maintain a back-reference
(keep-alive) to their closest containing top-level object. Further, dependent
objects fall into two categories: a) uniqued (which live for the life-time of
the context) and b) mutable. Mutable objects need additional machinery for
keeping track of when the C++ instance that backs their Python object is no
longer valid (typically due to some specific mutation of the IR, deletion, or
bulk operation).
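
The keep-alive scheme can be illustrated with a small pure-Python sketch. The
`Context` and `Dependent` classes below are illustrative stand-ins, not the
actual binding classes: the point is only that a dependent object holds a
strong back-reference to its closest top-level object, which keeps that object
alive for as long as any dependent object exists.

```python
import gc
import weakref


class Context:
    """Stands in for a top-level object such as mlir.ir.Context."""


class Dependent:
    """Stands in for a dependent IR object (e.g. a type or attribute)."""

    def __init__(self, context):
        self._context = context  # strong keep-alive back-reference

    @property
    def context(self):
        return self._context


ctx = Context()
probe = weakref.ref(ctx)
dep = Dependent(ctx)

# Dropping the user's direct reference does not collect the context,
# because the dependent object still holds a back-reference.
del ctx
gc.collect()
assert probe() is not None

# Once the last dependent object goes away, the context can be collected.
del dep
gc.collect()
assert probe() is None
```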

### Optionality and argument ordering in the Core IR

The following types support being bound to the current thread as a context
manager:

*   `PyLocation` (`loc: mlir.ir.Location = None`)
*   `PyInsertionPoint` (`ip: mlir.ir.InsertionPoint = None`)
*   `PyMlirContext` (`context: mlir.ir.Context = None`)

In order to support composability of function arguments, when these types appear
as arguments, they should always come last, in the above order, and with the
given names (this is generally the order in which they need to be expressed
explicitly in special cases). Each should
carry a default value of `py::none()` and use either a manual or automatic
conversion for resolving either with the explicit value or a value from the
thread context manager (i.e. `DefaultingPyMlirContext` or
`DefaultingPyLocation`).

The rationale for this is that in Python, trailing keyword arguments to the
*right* are the most composable, enabling a variety of strategies such as kwarg
passthrough, default values, etc. Keeping function signatures composable
increases the chances that interesting DSLs and higher level APIs can be
constructed without a lot of exotic boilerplate.

Used consistently, this enables a style of IR construction that rarely needs to
use explicit contexts, locations, or insertion points but is free to do so when
extra control is needed.
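
The resolution of a defaulted argument against the thread context manager can
be sketched in pure Python. This is an illustrative model of the behavior, not
the actual binding machinery (which lives in C++ helpers such as
`DefaultingPyMlirContext`); all names below are hypothetical.

```python
import threading

# A hypothetical per-thread stack of default contexts.
_tls = threading.local()


def _stack():
    if not hasattr(_tls, "contexts"):
        _tls.contexts = []
    return _tls.contexts


class Context:
    """Stand-in for mlir.ir.Context acting as a context manager."""

    def __enter__(self):
        _stack().append(self)
        return self

    def __exit__(self, *exc):
        _stack().pop()


def resolve_context(context=None):
    """Return the explicit context, or the innermost `with`-managed one."""
    if context is not None:
        return context
    if not _stack():
        raise RuntimeError("No context provided and no surrounding "
                           "context manager")
    return _stack()[-1]


explicit = Context()
with Context() as managed:
    assert resolve_context() is managed           # implicit: innermost manager
    assert resolve_context(explicit) is explicit  # explicit wins over the stack
```

Because the stack is thread-local, a context manager entered on one thread
never leaks defaults into binding calls made on another thread.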

#### Operation hierarchy

As mentioned above, `PyOperation` is special because it can exist in either a
top-level or dependent state. The life-cycle is unidirectional: operations can
be created detached (top-level) and once added to another operation, they are
then dependent for the remainder of their lifetime. The situation is more
complicated for construction scenarios where an operation is added to a
transitive parent that is still detached, which necessitates further accounting
at such transition points (i.e. all such added children are initially added to
the IR with a parent of their outer-most detached operation, but once that
operation is added to an attached operation, they need to be re-parented to the
containing module).

Due to the validity and parenting accounting needs, `PyOperation` is the owner
for regions and blocks and needs to be a top-level type that we can count on not
aliasing. This lets us do things like selectively invalidating instances when
mutations occur without worrying that there is some alias to the same operation
in the hierarchy. Operations are also the only entities that are allowed to be in
a detached state, and they are interned at the context level so that there is
never more than one Python `mlir.ir.Operation` object for a unique
`MlirOperation`, regardless of how it is obtained.

The C/C++ API allows for Region/Block to also be detached, but it simplifies the
ownership model a lot to eliminate that possibility in this API, allowing the
Region/Block to be completely dependent on its owning operation for accounting.
The aliasing of Python `Region`/`Block` instances to underlying
`MlirRegion`/`MlirBlock` is considered benign and these objects are not interned
in the context (unlike operations).

If we ever want to re-introduce detached regions/blocks, we could do so with new
"DetachedRegion" class or similar and also avoid the complexity of accounting.
With the way it is now, we can avoid having a global live list for regions and
blocks. We may end up needing an op-local one at some point TBD, depending on
how hard it is to guarantee how mutations interact with their Python peer
objects. We can cross that bridge easily when we get there.

Module, when used purely from the Python API, can't alias anyway, so we can use
it as a top-level ref type without a live-list for interning. If the API ever
changes such that this cannot be guaranteed (i.e. by letting you marshal a
native-defined Module in), then there would need to be a live table for it too.

## User-level API

### Context Management

The bindings rely on Python
[context managers](https://docs.python.org/3/reference/datamodel.html#context-managers)
(`with` statements) to simplify creation and handling of IR objects by omitting
repeated arguments such as MLIR contexts, operation insertion points and
locations. A context manager sets up the default object to be used by all
binding calls within the following context and in the same thread. This default
can be overridden by specific calls through the dedicated keyword arguments.

#### MLIR Context

An MLIR context is a top-level entity that owns attributes and types and is
referenced from virtually all IR constructs. Contexts also provide thread safety
at the C++ level. In the Python bindings, the MLIR context is also a Python
context manager, so one can write:

```python
from mlir.ir import Context, Module

with Context() as ctx:
  # IR construction using `ctx` as context.

  # For example, parsing an MLIR module from string requires the context.
  Module.parse("builtin.module {}")
```

IR objects referencing a context usually provide access to it through the
`.context` property. Most IR-constructing functions expect the context to be
provided in some form. In case of attributes and types, the context may be
extracted from the contained attribute or type. In case of operations, the
context is systematically extracted from Locations (see below). When the context
cannot be extracted from any argument, the bindings API expects the (keyword)
argument `context`. If it is not provided or set to `None` (default), it will be
looked up from an implicit stack of contexts maintained by the bindings in the
current thread and updated by context managers. If there is no surrounding
context, an error will be raised.

Note that it is possible to manually specify the MLIR context both inside and
outside of the `with` statement:

```python
from mlir.ir import Context, Module

standalone_ctx = Context()
with Context() as managed_ctx:
  # Parse a module in managed_ctx.
  Module.parse("...")

  # Parse a module in standalone_ctx (override the context manager).
  Module.parse("...", context=standalone_ctx)

# Parse a module without using context managers.
Module.parse("...", context=standalone_ctx)
```

The context object remains live as long as there are IR objects referencing it.

#### Insertion Points and Locations

When constructing an MLIR operation, two pieces of information are required:

-   an *insertion point* that indicates where the operation is to be created in
    the IR region/block/operation structure (usually before or after another
    operation, or at the end of some block); it may be missing, at which point
    the operation is created in the *detached* state;
-   a *location* that contains user-understandable information about the source
    of the operation (for example, file/line/column information), which must
    always be provided as it carries a reference to the MLIR context.

Both can be provided using context managers or explicitly as keyword arguments
in the operation constructor. They can be also provided as keyword arguments
`ip` and `loc` both within and outside of the context manager.

```python
from mlir.ir import Context, InsertionPoint, Location, Module, Operation

with Context() as ctx:
  module = Module.create()

  # Prepare for inserting operations into the body of the module and indicate
  # that these operations originate in the "f.mlir" file at the given line and
  # column.
  with InsertionPoint(module.body), Location.file("f.mlir", line=42, col=1):
    # This operation will be inserted at the end of the module body and will
    # have the location set up by the context manager.
    Operation(<...>)

    # This operation will be inserted at the end of the module (and after the
    # previously constructed operation) and will have the location provided as
    # the keyword argument.
    Operation(<...>, loc=Location.file("g.mlir", line=1, col=10))

    # This operation will be inserted at the *beginning* of the block rather
    # than at its end.
    Operation(<...>, ip=InsertionPoint.at_block_begin(module.body))
```

Note that `Location` needs an MLIR context to be constructed. It can take the
context set up in the current thread by some surrounding context manager, or
accept it as an explicit argument:

```python
from mlir.ir import Context, Location

# Create a context and a location in this context in the same `with` statement.
with Context() as ctx, Location.file("f.mlir", line=42, col=1, context=ctx):
  pass
```

Locations are owned by the context and live as long as they are (transitively)
referenced from somewhere in Python code.

Unlike locations, the insertion point may be left unspecified (or, equivalently,
set to `None` or `False`) during operation construction. In this case, the
operation is created in the *detached* state, that is, it is not added into the
region of another operation and is owned by the caller. This is usually the case
for top-level operations that contain the IR, such as modules. Regions, blocks
and values contained in an operation point back to it and maintain it live.

### Inspecting IR Objects

Inspecting the IR is one of the primary tasks the Python bindings are designed
for. One can traverse the IR operation/region/block structure and inspect their
aspects such as operation attributes and value types.

#### Operations, Regions and Blocks

Operations are represented as either:

-   the generic `Operation` class, useful in particular for generic processing
    of unregistered operations; or
-   a specific subclass of `OpView` that provides more semantically-loaded
    accessors to operation properties.

Given an `OpView` subclass, one can obtain an `Operation` using its `.operation`
property. Given an `Operation`, one can obtain the corresponding `OpView` using
its `.opview` property *as long as* the corresponding class has been set up.
This typically means that the Python module of its dialect has been loaded. By
default, the `OpView` version is produced when navigating the IR tree.
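
The fallback behavior can be modeled with a small hypothetical sketch: a
registry maps canonical operation names to `OpView`-like classes, and
navigation falls back to the generic form when no class has been registered
(in the real bindings, importing a dialect's Python module populates such a
mapping). All names below are illustrative, not the actual API.

```python
# Hypothetical registry of canonical op name -> OpView-like class.
_OPVIEW_REGISTRY = {}


class Operation:
    """Stand-in for the generic operation class."""

    def __init__(self, name):
        self.name = name

    @property
    def opview(self):
        # Fall back to the generic operation when no specific class is known.
        cls = _OPVIEW_REGISTRY.get(self.name)
        return cls(self) if cls is not None else self


class MyOpView:
    """Stands in for a dialect-specific OpView subclass."""

    def __init__(self, operation):
        self.operation = operation


# Before the "dialect module" is loaded, navigation yields the generic form.
op = Operation("mydialect.my_op")
assert op.opview is op

# Loading the dialect module registers the specific class, after which the
# same navigation produces the semantically richer view.
_OPVIEW_REGISTRY["mydialect.my_op"] = MyOpView
assert isinstance(op.opview, MyOpView)
```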

One can check if an operation has a specific type by means of Python's
`isinstance` function:

```python
operation = <...>
opview = <...>
if isinstance(operation.opview, mydialect.MyOp):
  pass
if isinstance(opview, mydialect.MyOp):
  pass
```

The components of an operation can be inspected using its properties.

-   `attributes` is a collection of operation attributes. It can be subscripted
    as both dictionary and sequence, e.g., both `operation.attributes["value"]`
    and `operation.attributes[0]` will work. There is no guarantee on the order
    in which the attributes are traversed when iterating over the `attributes`
    property as sequence.
-   `operands` is a sequence collection of operation operands.
-   `results` is a sequence collection of operation results.
-   `regions` is a sequence collection of regions attached to the operation.

The objects produced by `operands` and `results` have a `.types` property that
contains a sequence collection of types of the corresponding values.

```python
from mlir.ir import Operation

operation1 = <...>
operation2 = <...>
if operation1.results.types == operation2.operands.types:
  pass
```

`OpView` subclasses for specific operations may provide leaner accessors to
properties of an operation. For example, named attributes, operand and results
are usually accessible as properties of the `OpView` subclass with the same
name, such as `operation.const_value` instead of
`operation.attributes["const_value"]`. If this name is a reserved Python
keyword, it is suffixed with an underscore.
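
The renaming rule itself is simple and can be expressed with the standard
`keyword` module; `accessor_name` below is a hypothetical helper for
illustration, not a function exposed by the bindings:

```python
import keyword


def accessor_name(attr_name):
    """Reserved Python keywords get an underscore suffix when exposed as
    OpView properties; other names pass through unchanged."""
    return attr_name + "_" if keyword.iskeyword(attr_name) else attr_name


assert accessor_name("const_value") == "const_value"
assert accessor_name("in") == "in_"      # `in` is a Python keyword
assert accessor_name("else") == "else_"  # so is `else`
```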

The operation itself is iterable, which provides access to the attached regions
in order:

```python
from mlir.ir import Operation

operation = <...>
for region in operation:
  do_something_with_region(region)
```

A region is conceptually a sequence of blocks. Objects of the `Region` class are
thus iterable, which provides access to the blocks. One can also use the
`.blocks` property.

```python
# Regions are directly iterable and give access to blocks.
for block1, block2 in zip(operation.regions[0], operation.regions[0].blocks):
  assert block1 == block2
```

A block contains a sequence of operations, and has several additional
properties. Objects of the `Block` class are iterable and provide access to the
operations contained in the block. So does the `.operations` property. Blocks
also have a list of arguments available as a sequence collection using the
`.arguments` property.

Block and region belong to the parent operation in Python bindings and keep it
alive. This operation can be accessed using the `.owner` property.

#### Attributes and Types

Attributes and types are (mostly) immutable context-owned objects. They are
represented as either:

-   an opaque `Attribute` or `Type` object supporting printing and comparison;
    or
-   a concrete subclass thereof with access to properties of the attribute or
    type.

Given an `Attribute` or `Type` object, one can obtain a concrete subclass using
the constructor of the subclass. This may raise a `ValueError` if the attribute
or type is not of the expected subclass:

```python
from mlir.ir import Attribute, Type
from mlir.<dialect> import ConcreteAttr, ConcreteType

attribute = <...>
type = <...>
try:
  concrete_attr = ConcreteAttr(attribute)
  concrete_type = ConcreteType(type)
except ValueError as e:
  # Handle incorrect subclass.
```

In addition, concrete attribute and type classes provide a static `isinstance`
method to check whether an object of the opaque `Attribute` or `Type` type can
be downcasted:

```python
from mlir.ir import Attribute, Type
from mlir.<dialect> import ConcreteAttr, ConcreteType

attribute = <...>
type = <...>

# No need to handle errors here.
if ConcreteAttr.isinstance(attribute):
  concrete_attr = ConcreteAttr(attribute)
if ConcreteType.isinstance(type):
  concrete_type = ConcreteType(type)
```

By default, and unlike operations, attributes and types are returned from IR
traversals using the opaque `Attribute` or `Type` that needs to be downcasted.

Concrete attribute and type classes usually expose their properties as Python
readonly properties. For example, the elemental type of a tensor type can be
accessed using the `.element_type` property.

#### Values

MLIR has two kinds of values based on their defining object: block arguments and
operation results. Values are handled similarly to attributes and types. They
are represented as either:

-   a generic `Value` object; or
-   a concrete `BlockArgument` or `OpResult` object.

The former provides all the generic functionality such as comparison, type
access and printing. The latter provide access to the defining block or
operation and the position of the value within it. By default, the generic
`Value` objects are returned from IR traversals. Downcasting is implemented
through concrete subclass constructors, similarly to attributes and types:

```python
from mlir.ir import BlockArgument, OpResult, Value

value = ...

# Set `concrete` to the specific value subclass.
try:
  concrete = BlockArgument(value)
except ValueError:
  # This must not raise another ValueError as values are either block arguments
  # or op results.
  concrete = OpResult(value)
```

#### Interfaces

MLIR interfaces are a mechanism to interact with the IR without needing to know
specific types of operations but only some of their aspects. Operation
interfaces are available as Python classes with the same name as their C++
counterparts. Objects of these classes can be constructed from either:

-   an object of the `Operation` class or of any `OpView` subclass; in this
    case, all interface methods are available;
-   a subclass of `OpView` and a context; in this case, only the *static*
    interface methods are available as there is no associated operation.

In both cases, construction of the interface raises a `ValueError` if the
operation class does not implement the interface in the given context (or, for
operations, in the context that the operation is defined in). Similarly to
attributes and types, the MLIR context may be set up by a surrounding context
manager.

```python
from mlir.ir import Context, InferTypeOpInterface

with Context():
  op = <...>

  # Attempt to cast the operation into an interface.
  try:
    iface = InferTypeOpInterface(op)
  except ValueError:
    print("Operation does not implement InferTypeOpInterface.")
    raise

  # All methods are available on interface objects constructed from an Operation
  # or an OpView.
  iface.someInstanceMethod()

  # An interface object can also be constructed given an OpView subclass. It
  # also needs a context in which the interface will be looked up. The context
  # can be provided explicitly or set up by the surrounding context manager.
  try:
    iface = InferTypeOpInterface(some_dialect.SomeOp)
  except ValueError:
    print("SomeOp does not implement InferTypeOpInterface.")
    raise

  # Calling an instance method on an interface object constructed from a class
  # will raise TypeError.
  try:
    iface.someInstanceMethod()
  except TypeError:
    pass

  # One can still call static interface methods though.
  iface.inferOpReturnTypes(<...>)
```

If an interface object was constructed from an `Operation` or an `OpView`, they
are available as `.operation` and `.opview` properties of the interface object,
respectively.

Only a subset of operation interfaces are currently provided in Python bindings.
Attribute and type interfaces are not yet available in Python bindings.

### Creating IR Objects

Python bindings also support IR creation and manipulation.

#### Operations, Regions and Blocks

Operations can be created given a `Location` and an optional `InsertionPoint`.
It is often easier to use context managers to specify locations and insertion
points for several operations created in a row as described above.

Concrete operations can be created by using constructors of the corresponding
`OpView` subclasses. The generic, default form of the constructor accepts:

-   an optional sequence of types for operation results (`results`);
-   an optional sequence of values for operation operands, or another operation
    producing those values (`operands`);
-   an optional dictionary of operation attributes (`attributes`);
-   an optional sequence of successor blocks (`successors`);
-   the number of regions to attach to the operation (`regions`, default `0`);
-   the `loc` keyword argument containing the `Location` of this operation; if
    `None`, the location created by the closest context manager is used or an
    exception will be raised if there is no context manager;
-   the `ip` keyword argument indicating where the operation will be inserted in
    the IR; if `None`, the insertion point created by the closest context
    manager is used; if there is no surrounding context manager, the operation
    is created in the detached state.

Most operations will customize the constructor to accept a reduced list of
arguments that are relevant for the operation. For example, zero-result
operations may omit the `results` argument, as can operations where the
result types can be derived from operand types unambiguously. As a concrete
example, built-in function operations can be constructed by providing a function
name as string and its argument and result types as a tuple of sequences:

```python
from mlir.ir import Context, InsertionPoint, Location, Module
from mlir.dialects import func

with Context():
  module = Module.create()
  with InsertionPoint(module.body), Location.unknown():
    func = func.FuncOp("main", ([], []))
```

Also see below for constructors generated from ODS.

Operations can also be constructed in a generic way, based on the canonical
string name of the operation, using `Operation.create`. It accepts the
operation name as a string, which must exactly match the canonical name of the
operation in C++ or ODS, followed by the same argument list as the default
constructor for `OpView`. *This form is discouraged* and is intended for
generic operation processing.

```python
from mlir.ir import (Context, FunctionType, InsertionPoint, Location, Module,
                     Operation, TypeAttr)
from mlir.dialects import func

with Context():
  module = Module.create()
  with InsertionPoint(module.body), Location.unknown():
    # Operations can be created in a generic way.
    op = Operation.create(
        "func.func", results=[], operands=[],
        attributes={"function_type": TypeAttr.get(FunctionType.get([], []))},
        successors=None, regions=1)
    # The result will be downcasted to the concrete `OpView` subclass if
    # available.
    assert isinstance(op, func.FuncOp)
```

Regions are created for an operation when constructing it on the C++ side. They
are not constructible in Python and are not expected to exist outside of
operations (unlike in C++, which supports detached regions).

Blocks can be created within a given region and inserted before or after
another block of the same region using the `create_before()` and
`create_after()` methods of the `Block` class, or the `create_at_start()`
static method of the same class. Blocks are not expected to exist outside of
regions (unlike in C++, which supports detached blocks).

```python
from mlir.ir import Block, Context, Location, Operation

with Context() as ctx, Location.unknown():
  ctx.allow_unregistered_dialects = True
  op = Operation.create("generic.op", regions=1)

  # Create the first block in the region.
  entry_block = Block.create_at_start(op.regions[0])

  # Create further blocks.
  other_block = entry_block.create_after()
```

Blocks can be used to create `InsertionPoint`s, which can point to the
beginning or the end of the block, or just before its terminator. It is common
for `OpView` subclasses to provide a `.body` property that can be used to
construct an `InsertionPoint`. For example, the builtin `Module` and `FuncOp`
provide a `.body` property and an `add_entry_block()` method, respectively.

#### Attributes and Types

Attributes and types can be created given a `Context` or another attribute or
type object that already references the context. To indicate that they are
owned by the context, they are obtained by calling the static `get` method on
the concrete attribute or type class. These methods take as arguments the data
necessary to construct the attribute or type, and a keyword `context` argument
for when the context cannot be derived from the other arguments.

```python
from mlir.ir import Context, F32Type, FloatAttr

# Attribute and types require access to an MLIR context, either directly or
# through another context-owned object.
ctx = Context()
f32 = F32Type.get(context=ctx)
pi = FloatAttr.get(f32, 3.14)

# They may use the context defined by the surrounding context manager.
with Context():
  f32 = F32Type.get()
  pi = FloatAttr.get(f32, 3.14)
```

Some attributes provide additional construction methods for clarity.

```python
from mlir.ir import Context, IntegerAttr, IntegerType

with Context():
  i8 = IntegerType.get_signless(8)
  IntegerAttr.get(i8, 42)
```

Builtin attributes can often be constructed from Python objects with a similar
structure. For example, `ArrayAttr` can be constructed from a sequence of
attributes, and a `DictAttr` can be constructed from a dictionary:

```python
from mlir.ir import ArrayAttr, Context, DictAttr, UnitAttr

with Context():
  array = ArrayAttr.get([UnitAttr.get(), UnitAttr.get()])
  dictionary = DictAttr.get({"array": array, "unit": UnitAttr.get()})
```

Custom builders for attributes to be used during operation creation can be
registered by way of `register_attribute_builder`. In particular, the following
is how a custom builder is registered for `I32Attr`:

```python
from mlir.ir import (Context, IntegerAttr, IntegerType,
                     register_attribute_builder)

@register_attribute_builder("I32Attr")
def _i32Attr(x: int, context: Context):
  return IntegerAttr.get(
      IntegerType.get_signless(32, context=context), x)
```

This allows the creation of an op with an `I32Attr` to be invoked as

```python
foo.Op(30)
```

instead of

```python
foo.Op(IntegerAttr.get(IntegerType.get_signless(32, context=context), 30))
```

The registration is based on the ODS name, but the registry itself is a pure
Python mechanism. Only a single custom builder may be registered per ODS
attribute type (e.g., `I32Attr` can have only one, which can correspond to
multiple instances of the underlying `IntegerAttr` type).

## Style

In general, for the core parts of MLIR, the Python bindings should be largely
isomorphic with the underlying C++ structures. However, concessions are made
either for practicality or to give the resulting library an appropriately
"Pythonic" flavor.

### Properties vs get\*() methods

Generally favor converting trivial methods like `getContext()`, `getName()`,
`isEntryBlock()`, etc to read-only Python properties (i.e. `context`). It is
primarily a matter of calling `def_property_readonly` vs `def` in binding code,
and makes things feel much nicer to the Python side.

For example, prefer:

```c++
m.def_property_readonly("context", ...)
```

Over:

```c++
m.def("getContext", ...)
```

### **repr** methods

Things that have nice printed representations are really great :) If there is a
reasonable printed form, it can be a significant productivity boost to wire that
to the `__repr__` method (and verify it with a [doctest](#sample-doctest)).

### CamelCase vs snake\_case

Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As
a mechanical concession to Python style, this can go a long way to making the
API feel like it fits in with its peers in the Python landscape.

If in doubt, choose names that will flow properly with other
[PEP 8 style names](https://pep8.org/#descriptive-naming-styles).

### Prefer pseudo-containers

Many core IR constructs provide methods directly on the instance to query count
and begin/end iterators. Prefer hoisting these to dedicated pseudo containers.

For example, a direct mapping of blocks within regions could be done this way:

```python
region = ...

for block in region:

  pass
```

However, this way is preferred:

```python
region = ...

for block in region.blocks:

  pass

print(len(region.blocks))
print(region.blocks[0])
print(region.blocks[-1])
```

Instead of leaking STL-derived identifiers (`front`, `back`, etc), translate
them to appropriate `__dunder__` methods and iterator wrappers in the bindings.

Note that this can be taken too far, so use good judgment. For example, block
arguments may appear container-like but have defined methods for lookup and
mutation that would be hard to model properly without making semantics
complicated. If running into these, just mirror the C/C++ API.

### Provide one stop helpers for common things

One stop helpers that aggregate over multiple low level entities can be
incredibly helpful and are encouraged within reason. For example, making
`Context` have a `parse_asm` or equivalent that avoids needing to explicitly
construct a SourceMgr can be quite nice. One stop helpers do not have to be
mutually exclusive with a more complete mapping of the backing constructs.

## Testing

Tests should be added in the `test/Bindings/Python` directory and should
typically be `.py` files that have a lit run line.

We use `lit` and `FileCheck` based tests:

*   For generative tests (those that produce IR), define a Python module that
    constructs/prints the IR and pipe it through `FileCheck`.
*   Parsing should be kept self-contained within the module under test by use of
    raw constants and an appropriate `parse_asm` call.
*   Any file I/O code should be staged through a tempfile vs relying on file
    artifacts/paths outside of the test module.
*   For convenience, we also test non-generative API interactions with the same
    mechanisms, printing and `CHECK`ing as needed.

### Sample FileCheck test

```python
# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck %s

# TODO: Move to a test utility class once any of this actually exists.
def print_module(f):
  m = f()
  print("// -----")
  print("// TEST_FUNCTION:", f.__name__)
  print(m.to_asm())
  return f

# CHECK-LABEL: TEST_FUNCTION: create_my_op
@print_module
def create_my_op():
  m = mlir.ir.Module()
  builder = m.new_op_builder()
  # CHECK: mydialect.my_operation ...
  builder.my_op()
  return m
```

## Integration with ODS

The MLIR Python bindings integrate with the tablegen-based ODS system for
providing user-friendly wrappers around MLIR dialects and operations. There are
multiple parts to this integration, outlined below. Most details have been
elided: refer to the build rules and python sources under `mlir.dialects` for
the canonical way to use this facility.

Users are responsible for providing a `{DIALECT_NAMESPACE}.py` (or an equivalent
directory with `__init__.py` file) as the entrypoint.

### Generating `_{DIALECT_NAMESPACE}_ops_gen.py` wrapper modules

Each dialect with a mapping to python requires that an appropriate
`_{DIALECT_NAMESPACE}_ops_gen.py` wrapper module is created. This is done by
invoking `mlir-tblgen` on a python-bindings specific tablegen wrapper that
includes the boilerplate and the actual dialect-specific `td` file. An example
for the `Func` dialect (which is assigned the namespace `func` as a special
case):

```tablegen
#ifndef PYTHON_BINDINGS_FUNC_OPS
#define PYTHON_BINDINGS_FUNC_OPS

include "mlir/Dialect/Func/IR/FuncOps.td"

#endif // PYTHON_BINDINGS_FUNC_OPS
```

In the main repository, building the wrapper is done via the CMake function
`declare_mlir_dialect_python_bindings`, which invokes:

```
mlir-tblgen -gen-python-op-bindings -bind-dialect={DIALECT_NAMESPACE} \
    {PYTHON_BINDING_TD_FILE}
```

The generated op classes must be included in the `{DIALECT_NAMESPACE}.py` file
in a similar way that generated headers are included for C++ generated code:

```python
from ._my_dialect_ops_gen import *
```

### Extending the search path for wrapper modules

When the python bindings need to locate a wrapper module, they consult the
`dialect_search_path` and use it to find an appropriately named module. For the
main repository, this search path is hard-coded to include the `mlir.dialects`
module, which is where wrappers are emitted by the above build rule. Out-of-tree
dialects can add their modules to the search path by calling:

```python
mlir._cext.append_dialect_search_prefix("myproject.mlir.dialects")
```

### Wrapper module code organization

The wrapper module tablegen emitter outputs:

*   A `_Dialect` class (extending `mlir.ir.Dialect`) with a `DIALECT_NAMESPACE`
    attribute.
*   An `{OpName}` class for each operation (extending `mlir.ir.OpView`).
*   Decorators for each of the above to register with the system.

Note: In order to avoid naming conflicts, all internal names used by the wrapper
module are prefixed by `_ods_`.

Each concrete `OpView` subclass further defines several public-intended
attributes:

*   `OPERATION_NAME` attribute with the `str` fully qualified operation name
    (i.e. `math.absf`).
*   An `__init__` method for the *default builder* if one is defined or inferred
    for the operation.
*   `@property` getter for each operand or result (using an auto-generated name
    for unnamed ones).
*   `@property` getter, setter and deleter for each declared attribute.

It further emits additional private-intended attributes meant for subclassing
and customization (default cases omit these attributes in favor of the defaults
on `OpView`):

*   `_ODS_REGIONS`: A specification on the number and types of regions.
    Currently a tuple of (min_region_count, has_no_variadic_regions). Note that
    the API does some light validation on this but the primary purpose is to
    capture sufficient information to perform other default building and region
    accessor generation.
*   `_ODS_OPERAND_SEGMENTS` and `_ODS_RESULT_SEGMENTS`: Black-box value which
    indicates the structure of either the operand or results with respect to
    variadics. Used by `OpView._ods_build_default` to decode operand and result
    lists that contain lists.

#### Default Builder

Presently, only a single, default builder is mapped to the `__init__` method.
The intent is that this `__init__` method represents the *most specific* of the
builders typically generated for C++; however, it is currently just the generic
form below.

*   One argument for each declared result:
    *   For single-valued results: Each will accept an `mlir.ir.Type`.
    *   For variadic results: Each will accept a `List[mlir.ir.Type]`.
*   One argument for each declared operand or attribute:
    *   For single-valued operands: Each will accept an `mlir.ir.Value`.
    *   For variadic operands: Each will accept a `List[mlir.ir.Value]`.
    *   For attributes, it will accept an `mlir.ir.Attribute`.
*   Trailing usage-specific, optional keyword arguments:
    *   `loc`: An explicit `mlir.ir.Location` to use. Defaults to the location
        bound to the thread (i.e. `with Location.unknown():`) or an error if
        none is bound nor specified.
    *   `ip`: An explicit `mlir.ir.InsertionPoint` to use. Defaults to the
        insertion point bound to the thread (i.e. `with InsertionPoint(...):`).

In addition, each `OpView` inherits a `build_generic` method which allows
construction via a (nested in the case of variadic) sequence of `results` and
`operands`. This can be used to get some default construction semantics for
operations that are otherwise unsupported in Python, at the expense of having a
very generic signature.

#### Extending Generated Op Classes

Note that this is a rather complex mechanism and this section errs on the side
of explicitness. Users are encouraged to find an example and duplicate it if
they don't feel the need to understand the subtlety. The `builtin` dialect
provides some relatively simple examples.

As mentioned above, the build system generates Python sources like
`_{DIALECT_NAMESPACE}_ops_gen.py` for each dialect with Python bindings. It is
often desirable to use these generated classes as a starting point for further
customization, so an extension mechanism is provided to make this easy (you are
always free to do ad-hoc patching in your `{DIALECT_NAMESPACE}.py` file, but we
prefer a more standard mechanism that is applied uniformly).

To provide extensions, add a `_{DIALECT_NAMESPACE}_ops_ext.py` file to the
`dialects` module (i.e. adjacent to your `{DIALECT_NAMESPACE}.py` top-level and
the `*_ops_gen.py` file). Using the `builtin` dialect and `FuncOp` as an
example, the generated code will include an import like this:

```python
try:
  from . import _builtin_ops_ext as _ods_ext_module
except ImportError:
  _ods_ext_module = None
```

Then for each generated concrete `OpView` subclass, it will apply a decorator
like:

```python
@_ods_cext.register_operation(_Dialect)
@_ods_extend_opview_class(_ods_ext_module)
class FuncOp(_ods_ir.OpView):
```

See the `_ods_common.py` `extend_opview_class` function for details of the
mechanism. At a high level:

*   If the extension module exists, locate an extension class for the op (in
    this example, `FuncOp`):
    *   First by looking for an attribute with the exact name in the extension
        module.
    *   Falling back to calling a `select_opview_mixin(parent_opview_cls)`
        function defined in the extension module.
*   If a mixin class is found, a new subclass is dynamically created that
    multiply inherits from `(_builtin_ops_ext.FuncOp,
    _builtin_ops_gen.FuncOp)`.

The mixin class should not inherit from anything (i.e. it directly extends
`object` only). The facility is typically used to define custom `__init__`
methods, properties, instance methods, and static methods. Due to the
inheritance ordering, the mixin class can act as though it extends the
generated `OpView` subclass in most contexts (i.e.
`issubclass(_builtin_ops_ext.FuncOp, OpView)` will return `False`, but usage
generally allows you to treat it as duck-typed as an `OpView`).

There are a couple of recommendations, given how the class hierarchy is defined:

*   For static methods that need to instantiate the actual "leaf" op (which is
    dynamically generated and would result in circular dependencies to try to
    reference by name), prefer to use `@classmethod` and the concrete subclass
    will be provided as your first `cls` argument. See
    `_builtin_ops_ext.FuncOp.from_py_func` as an example.
*   If seeking to replace the generated `__init__` method entirely, you may
    actually want to invoke the super-super-class `mlir.ir.OpView` constructor
    directly, as it takes an `mlir.ir.Operation`, which is likely what you are
    constructing (i.e. the generated `__init__` method likely adds more API
    constraints than you want to expose in a custom builder).

A pattern that comes up frequently is wanting to provide a sugared `__init__`
method which has optional or type-polymorphism/implicit conversions but to
otherwise want to invoke the default op building logic. For such cases, it is
recommended to use an idiom such as:

```python
  def __init__(self, sugar, spice, *, loc=None, ip=None):
    ... massage into result_type, operands, attributes ...
    OpView.__init__(self, self.build_generic(
        results=[result_type],
        operands=operands,
        attributes=attributes,
        loc=loc,
        ip=ip))
```

Refer to the documentation for `build_generic` for more information.

## Providing Python bindings for a dialect

Python bindings are designed to support MLIR’s open dialect ecosystem. A dialect
can be exposed to Python as a submodule of `mlir.dialects` and interoperate with
the rest of the bindings. For dialects containing only operations, it is
sufficient to provide Python APIs for those operations. Note that the majority
of boilerplate APIs can be generated from ODS. For dialects containing
attributes and types, it is necessary to thread those through the C API since
there is no generic mechanism to create attributes and types. Passes need to be
registered with the context in order to be usable in a text-specified pass
manager, which may be done at Python module load time. Other functionality can
be provided, similar to attributes and types, by exposing the relevant C API and
building Python API on top.


### Operations

Dialect operations are provided in Python by wrapping the generic
`mlir.ir.Operation` class with operation-specific builder functions and
properties. Therefore, there is no need to implement a separate C API for them.
For operations defined in ODS, `mlir-tblgen -gen-python-op-bindings
-bind-dialect=<dialect-namespace>` generates the Python API from the declarative
description.
It is sufficient to create a new `.td` file that includes the original ODS
definition and use it as source for the `mlir-tblgen` call.
Such `.td` files reside in
[`python/mlir/dialects/`](https://github.com/llvm/llvm-project/tree/main/mlir/python/mlir/dialects).
The `mlir-tblgen` invocation is expected to produce a file named
`_<dialect-namespace>_ops_gen.py` by convention. The generated operation classes
can be extended as described above. MLIR provides [CMake
functions](https://github.com/llvm/llvm-project/blob/main/mlir/cmake/modules/AddMLIRPython.cmake)
to automate the production of such files. Finally, a
`python/mlir/dialects/<dialect-namespace>.py` or a
`python/mlir/dialects/<dialect-namespace>/__init__.py` file must be created and
filled with `import`s from the generated files to enable `import
mlir.dialects.<dialect-namespace>` in Python.


### Attributes and Types

Dialect attributes and types are provided in Python as subclasses of the
`mlir.ir.Attribute` and `mlir.ir.Type` classes, respectively. Python APIs for
attributes and types must connect to the relevant C APIs for building and
inspection, which must be provided first. Bindings for `Attribute` and `Type`
subclasses can be defined using
[`include/mlir/Bindings/Python/PybindAdaptors.h`](https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Bindings/Python/PybindAdaptors.h)
utilities that mimic pybind11 API for defining functions and properties. These
bindings are to be included in a separate pybind11 module. The utilities also
provide automatic casting between C API handles `MlirAttribute` and `MlirType`
and their Python counterparts so that the C API handles can be used directly in
binding implementations. The methods and properties provided by the bindings
should follow the principles discussed above.

The attribute and type bindings for a dialect can be located in
`lib/Bindings/Python/Dialect<Name>.cpp` and should be compiled into a separate
“Python extension” library placed in `python/mlir/_mlir_libs` that will be
loaded by Python at runtime. MLIR provides [CMake
functions](https://github.com/llvm/llvm-project/blob/main/mlir/cmake/modules/AddMLIRPython.cmake)
to automate the production of such libraries. This library should be `import`ed
from the main dialect file, i.e. `python/mlir/dialects/<dialect-namespace>.py`
or `python/mlir/dialects/<dialect-namespace>/__init__.py`, to ensure the types
are available when the dialect is loaded from Python.


### Passes

Dialect-specific passes can be made available to the pass manager in Python by
registering them with the context and relying on the API for pass pipeline
parsing from string descriptions. This can be achieved by creating a new
pybind11 module, defined in `lib/Bindings/Python/<Dialect>Passes.cpp`, that
calls the registration C API, which must be provided first. For passes defined
declaratively using Tablegen, `mlir-tblgen -gen-pass-capi-header` and
`mlir-tblgen -gen-pass-capi-impl` automate the generation of the C API. The
pybind11 module must be compiled into a separate “Python extension” library,
which can be `import`ed from the main dialect file, i.e.
`python/mlir/dialects/<dialect-namespace>.py` or
`python/mlir/dialects/<dialect-namespace>/__init__.py`, or from a separate
`passes` submodule to be put in
`python/mlir/dialects/<dialect-namespace>/passes.py` if it is undesirable to
make the passes available along with the dialect.


### Other functionality

Dialect functionality other than IR objects or passes, such as helper functions,
can be exposed to Python similarly to attributes and types. A C API is expected
to exist for this functionality, which can then be wrapped using pybind11 and
the
[`include/mlir/Bindings/Python/PybindAdaptors.h`](https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Bindings/Python/PybindAdaptors.h)
utilities to connect to the rest of the Python API. The bindings can be located
in a
separate pybind11 module or in the same module as attributes and types, and
loaded along with the dialect.