File: programming_model.md

package info (click to toggle)
pytorch 2.9.1%2Bdfsg-1~exp2
  • links: PTS, VCS
  • area: main
  • in suites: experimental
  • size: 180,096 kB
  • sloc: python: 1,473,255; cpp: 942,030; ansic: 79,796; asm: 7,754; javascript: 2,502; java: 1,962; sh: 1,809; makefile: 628; xml: 8
file content (529 lines) | stat: -rw-r--r-- 21,570 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
(export.programming_model)=

# torch.export Programming Model

This document aims to explain the behaviors and capabilities of
{func}`torch.export.export`. It is intended to help build your intuition
for how {func}`torch.export.export` handles code.

## Basics of Tracing

{func}`torch.export.export` captures a graph representing your model by
tracing its execution on "example" inputs and recording the PyTorch operations
and conditions observed along the traced path. This graph can then be run
on different inputs as long as they satisfy the same conditions.

The basic output of {func}`torch.export.export` is a single graph of PyTorch
operations, with associated metadata. The exact format of this output is
covered in the {ref}`export IR spec <export.ir_spec>`.

(non-strict-export)=

### Strict vs. Non-Strict Tracing

{func}`torch.export.export` provides two modes of tracing.

In *non-strict mode*, we trace through the program using the normal Python
interpreter. Your code executes exactly as it would in eager mode; the only
difference is that all Tensors are replaced by
[fake Tensors](https://pytorch.org/docs/main/torch.compiler_fake_tensor.html),
**which have shapes and other forms of metadata but no data**, wrapped in
[Proxy objects](https://pytorch.org/docs/main/fx.html) that record all
operations on them into a graph. We also capture
[conditions on Tensor shapes](https://pytorch.org/docs/main/torch.compiler_dynamic_shapes.html#the-guard-model)
**that guard the correctness of the generated code**.

In *strict mode*, we first trace through the program using
{ref}`TorchDynamo <torch.compiler_dynamo_deepdive>`, a Python bytecode
analysis engine. TorchDynamo does not actually execute your Python code.
Instead, it symbolically analyzes it and builds a graph based on the results.
On the one hand, this analysis allows {func}`torch.export.export` to provide
additional guarantees on Python-level safety (beyond capturing conditions on
Tensor shapes, as in non-strict mode). On the other hand, not all Python
features are supported by this analysis.

Although currently the default mode of tracing is strict, **we strongly
recommend using non-strict**, which will soon become the default.
For most models, conditions on Tensor shapes are enough for soundness, and
the additional guarantees on Python-level safety have no impact; at the same
time, the possibility of hitting unsupported Python features in TorchDynamo
presents an unnecessary risk.

In the rest of this document we assume we are tracing in
[non-strict mode](https://pytorch.org/docs/main/export.html#non-strict-export);
in particular, we assume that **all Python features are supported**.

## Values: Static vs. Dynamic

A key concept in understanding the behavior of {func}`torch.export.export` is
the difference between *static* and *dynamic* values.

### Static Values

A *static* value is a value that is **fixed at export time and cannot change
between executions of the exported program**. When the value is encountered
during tracing, we treat it as a constant and hard-code it into the graph.

When an operation is performed (e.g. `x + y`) and all inputs are static,
the output of the operation is directly hard-coded into the graph and the
operation does not show up (i.e. it gets "constant-folded").

When a value has been hard-coded into the graph, we say that the graph has
been *specialized* to that value. For example:

```python
import torch

class MyMod(torch.nn.Module):
    def forward(self, x, y):
        z = y + 7
        return x + z

m = torch.export.export(MyMod(), (torch.randn(1), 3))
print(m.graph_module.code)

"""
def forward(self, arg0_1, arg1_1):
    add = torch.ops.aten.add.Tensor(arg0_1, 10);  arg0_1 = None
    return (add,)

"""
```

Here, we provide `3` as the traced value for `y`; it is treated as a static
value and added to `7`, burning in the static value `10` in the graph.

### Dynamic Values

A *dynamic* value is one that **can change from run to run**. It behaves just
like a "normal" function argument: you can pass different inputs and expect
your function to do the right thing.

### Which values are static vs. dynamic?

Whether a value is static or dynamic depends on its type:

- For Tensor:

  - Tensor *data* is treated as dynamic.

  - Tensor *shapes* can be treated by the system as static or dynamic.

    - By default, shapes of all input Tensors are considered static.
      The user can override this behavior for any input Tensor by specifying
      a [dynamic shape](https://pytorch.org/docs/main/export.html#expressing-dynamism)
      for it.
    - Tensors that are part of module state, i.e., parameters and buffers,
      always have static shapes.

  - Other forms of Tensor *metadata* (e.g. `device`, `dtype`) are static.

- Python *primitives* (`int`, `float`, `bool`, `str`, `None`) are static.

  - There are dynamic variants for some primitive types (`SymInt`,
    `SymFloat`, `SymBool`). Typically users do not have to deal with them.
  - Users can specify integer inputs as dynamic by specifying
    a [dynamic shape](https://pytorch.org/docs/main/export.html#expressing-dynamism)
    for it.

- For Python *standard containers* (`list`, `tuple`, `dict`, `namedtuple`):

  - The structure (i.e., length for `list` and `tuple` values, and key
    sequence for `dict` and `namedtuple` values) is static.
  - The contained elements have these rules applied to them recursively
    (basically the
    [PyTree](https://jax.readthedocs.io/en/latest/pytrees.html) scheme)
    with leaves that are either Tensor or primitive types.

- Other *classes* (including data classes) can be registered with PyTree
  (see below), and follow the same rules as the standard containers.

## Input types

Inputs will be treated as either static or dynamic, based on their type
(as explained above).

- A static input will get hard-coded into the graph, and passing a different
  value at run time will result in an error. Recall that these are mostly
  values of primitive types.
- A dynamic input behaves like a "normal" function input. Recall that these
  are mostly values of Tensor types.

By default, the types of inputs you can use for your program are:

- Tensor
- Python primitives (`int`, `float`, `bool`, `str`, `None`)
- Python standard containers (`list`, `tuple`, `dict`, `namedtuple`)

### Custom Input Types (PyTree)

In addition, you can also define your own (custom) class and use it as an
input type, but you will need to register such a class as a PyTree.

Here's an example of using an utility to register a dataclass that is used as
an input type.

```python
@dataclass
class Input:
    f: torch.Tensor
    p: torch.Tensor

import torch.utils._pytree as pytree
pytree.register_dataclass(Input)

class M(torch.nn.Module):
    def forward(self, x: Input):
        return x.f + 1

torch.export.export(M(), (Input(f=torch.ones(10, 4), p=torch.zeros(10, 4)),))
```

### Optional input types

For optional inputs to the program that are not passed in,
{func}`torch.export.export` will specialize to their default values. As a
result, the exported program will require users to explicitly pass in all
arguments, and will lose the defaulting behavior. For example:

```python
class M(torch.nn.Module):
    def forward(self, x, y=None):
        if y is not None:
            return y * x
        return x + x

# Optional input is passed in
ep = torch.export.export(M(), (torch.randn(3, 3), torch.randn(3, 3)))
print(ep)
"""
ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, x: "f32[3, 3]", y: "f32[3, 3]"):
            # File: /data/users/angelayi/pytorch/moo.py:15 in forward, code: return y * x
            mul: "f32[3, 3]" = torch.ops.aten.mul.Tensor(y, x);  y = x = None
            return (mul,)
"""

# Optional input is not passed in
ep = torch.export.export(M(), (torch.randn(3, 3),))
print(ep)
"""
ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, x: "f32[3, 3]", y):
            # File: /data/users/angelayi/pytorch/moo.py:16 in forward, code: return x + x
            add: "f32[3, 3]" = torch.ops.aten.add.Tensor(x, x);  x = None
            return (add,)
"""
```

## Control Flow: Static vs. Dynamic

Control flow is supported by {func}`torch.export.export`. The behavior of
control flow depends on whether the value you are branching on is static or
dynamic.

### Static Control Flow

**Python control flow over static values is supported transparently**. (Recall
that static values include static shapes, so control flow over static shapes
is also covered by this case.)

As mentioned above, we "burn in" static values, so the exported graph will
never see any control flow over static values.

In the case of an `if` statement, we will continue tracing the branch taken
at export time. In the case of a `for` or `while` statement, we will continue
tracing by unrolling the loop.

### Dynamic Control Flow: Shape-Dependent vs. Data-Dependent

When the value involved in a control flow is dynamic, it could depend on
dynamic shapes or dynamic data. Given that the compiler traces with
information on shapes rather than data, the implications on the programming
model are different in these cases.

#### Dynamic Shape-Dependent Control Flow

When the value involved in a control flow is a
[dynamic shape](https://pytorch.org/docs/main/torch.compiler_dynamic_shapes.html),
in most cases **we will also know the concrete value of the dynamic shape
during tracing**: see the following section for more details on how the
compiler tracks this information.

In these cases we say that the control flow is shape-dependent. **We use the
concrete value of the dynamic shape to evaluate the condition** to either
`True` or `False` and continue tracing (as discussed above), additionally
emitting a guard corresponding to the condition just evaluated.

Otherwise the control flow is considered data-dependent. We cannot evaluate
the condition to either `True` or `False`, so cannot continue tracing and have to
raise an error at export time. See next section.

#### Dynamic Data-Dependent Control Flow

**Data-dependent control flow over dynamic values is supported, but you must
use one of PyTorch's explicit operators** to continue tracing. Using Python
control flow statements over dynamic values is not permitted, because the
compiler cannot evaluate the conditions necessary to continue tracing and
thus an error must be raised at export time.

We provide **operators to express general conditionals and loops over dynamic
values**, e.g., `torch.cond`, `torch.map`. Note that you only need to use these
if you truly want *data-dependent control flow*.

Here's an example of an `if` statement on a data-dependent condition,
`x.sum() > 0`, where `x` is an input Tensor, rewritten using `torch.cond`.
Instead of having to decide which branch to trace, now both branches are
traced.

```python
class M_old(torch.nn.Module):
    def forward(self, x):
        if x.sum() > 0:
            return x.sin()
        else:
            return x.cos()

class M_new(torch.nn.Module):
    def forward(self, x):
        return torch.cond(
            pred=x.sum() > 0,
            true_fn=lambda x: x.sin(),
            false_fn=lambda x: x.cos(),
            operands=(x,),
        )
```

A special case of data-dependent control flow is where it involves a
[data-dependent dynamic shape](https://pytorch.org/docs/main/torch.compiler_dynamic_shapes.html#unbacked-symints):
typically, the shape of some intermediate Tensor that depends on input data
rather than on input shapes (thus not shape-dependent). Instead of using a
control flow operator, in this case you can provide an assertion that decides
whether the condition is `True` or `False`. Given such an assertion, we can
continue tracing, emitting a guard as above.

We provide **operators to express assertions on dynamic shapes**, e.g.,
`torch._check`. Note that you only need to use this when there is control
flow on data-dependent dynamic shapes.

Here's an example of an `if` statement on a condition involving a
data-dependent dynamic shape, `nz.shape[0] > 0`, where `nz` is the result of
calling {func}`torch.nonzero`, an operator whose output shape depends on input
data. Instead of rewriting it, you can add an assertion using `torch._check`
to effectively decide which branch to trace.

```python
class M_old(torch.nn.Module):
    def forward(self, x):
        nz = x.nonzero()
        if nz.shape[0] > 0:
            return x.sin()
        else:
            return x.cos()

class M_new(torch.nn.Module):
    def forward(self, x):
        nz = x.nonzero()
        torch._check(nz.shape[0] > 0)
        if nz.shape[0] > 0:
            return x.sin()
        else:
            return x.cos()
```

## Basics of Symbolic Shapes

During tracing, dynamic Tensor shapes and conditions over them are encoded as
"symbolic expressions." (In contrast, static Tensor shapes and conditions
over them are simply `int` and `bool` values.)

A *symbol* is like a variable; it describes a dynamic Tensor shape.

As tracing proceeds, shapes of intermediate Tensors may be described by more
general expressions, typically involving integer arithmetic operators. This
is because **for most PyTorch operators, shapes of output Tensors can be
described as functions of shapes of input Tensors**. For example, the shape of
the output of {func}`torch.cat` is the sum of the shapes of its inputs.

Moreover, as we encounter control flow in the program, we create boolean
expressions, typically involving relational operators, describing conditions
along the traced path. These **expressions are evaluated to decide which path
to trace through the program**, and recorded in a
[shape environment](https://pytorch.org/docs/main/torch.compiler_dynamic_shapes.html#overall-architecture)
to guard the correctness of the traced path and to evaluate subsequently
created expressions.

We briefly introduce these subsystems next.

### Fake Implementations of PyTorch Operators

Recall that during tracing, we are executing the program with
[fake Tensors](https://pytorch.org/docs/main/torch.compiler_fake_tensor.html),
which have no data. In general we cannot call the actual implementations of
PyTorch operators with fake Tensors. Thus each operator needs to have an
additional fake (a.k.a. "meta") implementation, which inputs and outputs fake
Tensors, that matches the behavior of the actual implementation in terms of
shapes and other forms of metadata carried by fake Tensors.

For example, note how the fake implementation of {func}`torch.index_select`
computes the shape of the output using the shape of the input (while ignoring
input data and returning empty output data).

```python
def meta_index_select(self, dim, index):
    result_size = list(self.size())
    if self.dim() > 0:
        result_size[dim] = index.numel()
    return self.new_empty(result_size)
```

#### Shape Propagation: Backed vs. Unbacked Dynamic Shapes

Shapes are propagated using fake implementations of PyTorch operators.

A key concept to understand the propagation of dynamic shapes in particular
is the difference between *backed* and *unbacked* dynamic shapes: we know the
concrete values of the former but not the latter.

Propagation of shapes, including tracking backed and unbacked dynamic shapes,
proceeds as follows:

- The shapes of Tensors representing inputs can be static or dynamic. When
  dynamic, they are described by symbols; moreover, **such symbols are backed
  since we also know their concrete values given the "real" example inputs
  provided by the user at export time**.

- The output shape of an operator is computed by its fake implementation, and
  is either static or dynamic. When dynamic, in general it is described by a
  symbolic expression. Moreover:

  - If the output shape depends only on input shapes, it is either static or
    backed dynamic whenever the input shapes are all static or backed dynamic.
  - On the other hand, **if the output shape depends on input data**, it is
    necessarily dynamic, and moreover, **because we cannot know its concrete
    value it is unbacked**.

### Control Flow: Guards and Assertions

When a condition on shapes is encountered, it either involves only static
shapes, in which case it is a `bool`, or it involves dynamic shapes, in which
case it is a symbolic boolean expression. For the latter:

- When the condition involves only backed dynamic shapes, we can use the
  concrete values of those dynamic shapes to evaluate the condition to `True`
  or `False`. We can then add a guard to the shape environment that states
  that the corresponding symbolic boolean expression is `True` or `False`,
  and continue tracing.
- Otherwise the condition involves unbacked dynamic shapes. In general we
  cannot evaluate such a condition without additional information; thus we
  cannot continue tracing, and we must raise an error at export time. The
  user is expected to use an explicit PyTorch operator for tracing to
  continue. This information is added as a guard in the shape environment,
  and can also possibly help evaluate other subsequently encountered
  conditions to `True` or `False`.

Once the model is exported, **any guards on backed dynamic shapes can be
understood as conditions on input dynamic shapes**. These are verified against
a dynamic shape specification that must have been provided to export,
describing conditions on dynamic shapes that not only example inputs but also
all future inputs are expected to satisfy for the generated code to be
correct. More precisely, the dynamic shape specification must logically imply
the generated guards, otherwise an error is raised at export time (along with
suggested fixes to the dynamic shape specification). On the other hand, when
there are no generated guards on backed dynamic shapes (in particular, when
all shapes are static) no dynamic shape specification needs to be provided to
export. In general, the dynamic shape specification is converted to runtime
assertions on the inputs of the generated code.

Finally, **any guards on unbacked dynamic shapes are converted to "inline"
runtime assertions**. These are added in the generated code at the locations
where those unbacked dynamic shapes were created: typically, right after
data-dependent operator calls.

## Allowed PyTorch operators

All PyTorch operators are permitted.

### Custom operators

In addition, you can define and use
[custom operators](https://pytorch.org/tutorials/advanced/python_custom_ops#python-custom-ops-tutorial).
Defining a custom operator includes defining a fake implementation for it,
just like any other PyTorch operator (see previous section).

Here's an example of a custom `sin` operator that wraps NumPy, and its
registered (trivial) fake implementation.

```python
@torch.library.custom_op("mylib::sin", mutates_args=())
def sin(x: Tensor) -> Tensor:
    x_np = x.numpy()
    y_np = np.sin(x_np)
    return torch.from_numpy(y_np)

@torch.library.register_fake("mylib::sin")
def _(x: Tensor) -> Tensor:
    return torch.empty_like(x)
```

**Sometimes your custom operator's fake implementation will involve
data-dependent shapes**. Here's how a fake implementation for a custom
`nonzero` might look like.

```python
...

@torch.library.register_fake("mylib::custom_nonzero")
def _(x):
    nnz = torch.library.get_ctx().new_dynamic_size()
    shape = [nnz, x.dim()]
    return x.new_empty(shape, dtype=torch.int64)
```

## Module State: Reads vs. Updates

Module states include parameters, buffers, and regular attributes.

- A regular attribute can be of any type.
- On the other hand, parameters and buffers are always Tensors.

Module states can be dynamic or static, based on their types as outlined
above. For example, `self.training` is a `bool`, which means it is static; on
the other hand, any parameter or buffer is dynamic.

The *shapes* of any Tensors contained in module states cannot be dynamic, i.e.,
those shapes are fixed at export time, and cannot change between executions
of the exported program.

### Access rules

**All module states must be initialized**. Accessing a module state that is
not already initialized causes an error to be raised at export time.

**Reading module states is always permitted**.

Updating module states is possible, but must follow the rules below:

- **A static regular attribute** (e.g., of primitive type) **can be updated**.
  Reads and updates can be freely interleaved, and as expected, any reads
  will always see the values of the latest updates. Because these attributes
  are static, we will also burn the values in, so the generated code will not
  have any instructions to actually "get" or "set" such attributes.
- **A dynamic regular attribute** (e.g., of Tensor type) **cannot be updated**.
  To do so, it must be registered as a buffer during module initialization.
- **A buffer can be updated**, where the updating can be in-place (e.g.,
  `self.buffer[:] = ...`) or not (e.g., `self.buffer = ...`).
- **A parameter cannot be updated**. Typically parameters are updated only
  during training, not during inference. We recommend exporting with
  {func}`torch.no_grad` to avoid parameter updates at export time.

### Effects of functionalization

Any dynamic module state that is read and/or updated is "lifted"
(respectively) as an input and/or output of the generated code.

The exported program stores, along with the generated code, the initial
values of parameters and buffers and the constant values of other Tensor
attributes.