File: torch.compiler_troubleshooting.rst

.. _torch.compiler_troubleshooting:

torch.compile Troubleshooting
=================================

You're trying to use ``torch.compile`` on your PyTorch model to enhance its performance
but it's not working as expected. Perhaps performance isn't improving, crashes are happening, or compilation time is too long. This article provides tips, workarounds, and debugging tools to help you overcome these challenges.

**Contents**

.. contents::
    :local:

Setting Expectations
~~~~~~~~~~~~~~~~~~~~

``torch.compile`` is designed as a general-purpose PyTorch compiler.
Unlike the previous compiler solution, TorchScript, ``torch.compile``
requires fewer code changes, meaning models typically don't need to be rewritten from scratch.
It also manages unsupported code more gracefully - unsupported code results in a lost optimization opportunity rather than a crash.

In the ideal world, one can simply apply ``torch.compile`` to any PyTorch model and enjoy automatic speedups.
However, in reality, code complexities can lead to one of three scenarios:

1. ``torch.compile`` works seamlessly, providing speedups.
2. Some code modifications are necessary. ``torch.compile`` doesn't crash or take too long,
   but you might not be seeing significant performance gains.
3. Extensive changes to your code are required.

We anticipate most code will fall under scenarios (1) and (2).
This document provides tips, arranged by level of involvement, to help address code issues in scenario (2).

Compile times
-------------

``torch.compile`` functions as a just-in-time compiler, so the initial one or two runs
of the compiled function are expected to be significantly slower. Recompilations, which can occur under certain conditions (detailed below),
will also make runs slower. Various ``torch.compile`` components cache results to
reduce compilation time for future invocations, even in different processes.
Cold-start (uncached) compilation time typically ranges from seconds to minutes for common or benchmarked models.
Larger models may take upwards of 30 minutes to a few hours.

Terminology
~~~~~~~~~~~

The following terms are relevant to troubleshooting ``torch.compile`` problems.

Graph break
-----------

``torch.compile`` traces your code and attempts to capture your PyTorch code into a
single computation graph of PyTorch operators (FX graph). However, this is not always possible.
When encountering code that can't be traced, a "graph break" occurs.
A graph break involves compiling the FX graph that has been determined so far, running the unsupported code,
and then resuming tracing after the unsupported code with a new FX graph.
Because the computation graph is broken up, we lose optimization opportunities,
so model code should avoid graph breaks whenever possible.
Graph breaks occur on things like:

- Data-dependent if-statements
- Many Python built-in functions
- C functions

Below is an example of a graph break caused by calling the Python built-in function ``open``
(exact output may differ).

.. code-block:: py

    import torch

    @torch.compile
    def fn(x):
        x = x + 1
        with open("test.txt", "r") as f:
            return x + len(f.read())

    fn(torch.ones(3, 3))

::

    $ TORCH_LOGS="graph_breaks" python playground.py
    Graph break in user code at /data/users/williamwen/pytorch/playground.py:7
    Reason: Unsupported: builtin: open [<class 'torch._dynamo.variables.constant.ConstantVariable'>, <class 'torch._dynamo.variables.constant.ConstantVariable'>] False
    User code traceback:
    File "/data/users/williamwen/pytorch/playground.py", line 7, in fn
        with open("test.txt", "r") as f:
    Traceback (most recent call last):
    File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 635, in wrapper
        return inner_fn(self, inst)
            ^^^^^^^^^^^^^^^^^^^^
    File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 2414, in CALL
        self._call(inst)
    File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 2408, in _call
        self.call_function(fn, args, kwargs)
    File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 962, in call_function
        self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/data/users/williamwen/pytorch/torch/_dynamo/variables/builtin.py", line 997, in call_function
        return handler(tx, args, kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/data/users/williamwen/pytorch/torch/_dynamo/variables/builtin.py", line 831, in <lambda>
        return lambda *args: unimplemented(error_msg)
                            ^^^^^^^^^^^^^^^^^^^^^^^^
    File "/data/users/williamwen/pytorch/torch/_dynamo/exc.py", line 313, in unimplemented
        raise Unsupported(msg, case_name=case_name)
    torch._dynamo.exc.Unsupported: builtin: open [<class 'torch._dynamo.variables.constant.ConstantVariable'>, <class 'torch._dynamo.variables.constant.ConstantVariable'>] False

Guards
------

``torch.compile`` makes some assumptions about runtime values as we trace through code.
During tracing, we generate "guards", which are runtime checks for these assumptions.
Guards are run in future calls to the compiled function to determine if we can reuse previously compiled code.
Examples of runtime checks are constant values, types, and object IDs.

Below is an example of generated guards. The ``TENSOR_MATCH`` guard checks for the input's type, device, dtype, shape, etc.

.. code-block:: py

    import torch

    @torch.compile
    def fn(x):
        return x + 1

    fn(torch.ones(3, 3))

::

    $ TORCH_LOGS="guards" python playground.py
    GUARDS:

    TREE_GUARD_MANAGER:
    +- RootGuardManager
    | +- DEFAULT_DEVICE: utils_device.CURRENT_DEVICE == None                           # _dynamo/output_graph.py:471 in init_ambient_guards
    | +- GLOBAL_STATE: ___check_global_state()
    | +- TORCH_FUNCTION_MODE_STACK: ___check_torch_function_mode_stack()
    | +- GuardManager: source=L['x'], accessed_by=DictGetItemGuardAccessor(x)
    | | +- TENSOR_MATCH: check_tensor(L['x'], Tensor, DispatchKeySet(CPU, BackendSelect, ADInplaceOrView, AutogradCPU), torch.float32, device=None, requires_grad=False, size=[3, 3], stride=[3, 1])  # return x + 1  # playground.py:6 in fn
    | | +- NO_HASATTR: hasattr(L['x'], '_dynamo_dynamic_indices') == False           # return x + 1  # playground.py:6 in fn

Recompilation
-------------

If the guards fail for every instance of previously compiled code,
then ``torch.compile`` must "recompile" the function, requiring the original code to be traced again.

In the example below, recompilation is necessary because the guard checking the tensor argument's shape failed.

.. code-block:: py

    import torch

    @torch.compile
    def fn(x):
        return x + 1

    fn(torch.ones(3, 3))
    fn(torch.ones(4, 4))

::

    $ TORCH_LOGS="recompiles" python playground.py
    Recompiling function fn in /data/users/williamwen/pytorch/playground.py:3
        triggered by the following guard failure(s):
        - 0/0: tensor 'L['x']' size mismatch at index 0. expected 3, actual 4

Dynamic Shapes
-------------------------
``torch.compile`` initially assumes tensor shapes are static/constant and guards based on these assumptions.
By using "dynamic shapes," we can get ``torch.compile`` to produce compiled code that can accept
tensor inputs with different shapes - we avoid recompiling every time shapes differ.
By default, automatic dynamic shapes are enabled (``torch.compile(dynamic=None)``):
if compilation fails due to a shape mismatch, recompilation is attempted with dynamic shapes.
Dynamic shapes can also be fully enabled (``dynamic=True``) or disabled (``dynamic=False``).

Below, we enable dynamic shapes and note that we no longer need to recompile.

.. code-block:: py

    import torch

    @torch.compile(dynamic=True)
    def fn(x):
        return x + 1

    fn(torch.ones(3, 3))
    fn(torch.ones(4, 4))

::

    $ TORCH_LOGS="dynamic,recompiles" python playground.py
    create_symbol s0 = 3 for L['x'].size()[0] [2, int_oo] at playground.py:5 in fn (_dynamo/variables/builder.py:2718 in <lambda>), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s0"
    produce_guards
    produce_guards

For more information on dynamic shapes, see `The dynamic shapes manual <https://docs.google.com/document/d/1GgvOe7C8_NVOMLOCwDaYV1mXXyHMXY7ExoewHqooxrs/edit#heading=h.fh8zzonyw8ng>`__.

Logging Tools
~~~~~~~~~~~~~

tlparse / TORCH_TRACE
-----------------------------

``tlparse`` / ``TORCH_TRACE`` are a pair of tools that produce compilation reports that look like this:
https://web.mit.edu/~ezyang/Public/bhack-20240609-tlparse/index.html.

Traces are very easy to collect. To collect a trace, run your reproduction command with

::

    TORCH_TRACE="/tmp/tracedir" python foo.py
    pip install tlparse
    tlparse /tmp/tracedir

This approach works even if you are running a distributed job, providing a trace for each rank.
Running ``tlparse`` will open your browser with HTML similar to the report linked above.
If you are making a bug report for a complicated problem that you don't have a standalone reproduction for,
you can still greatly assist PyTorch developers by attaching the trace log generated in ``/tmp/tracedir``.

.. warning:: The trace log contains all of your model code.
   Do not share the trace log if the model you are working on is sensitive. The trace log does NOT contain weights.

.. raw:: html

    <style>
        .red {background-color:#ff0000;}
        .green {background-color:#00ff00;}
        .dark-green {background-color:#027f02;}
    </style>

.. role:: red

.. role:: green

.. role:: dark-green

The output of ``tlparse`` is primarily aimed at PyTorch developers,
and the log format is easy to upload and share on GitHub.
However, as a non-PyTorch developer, you can still extract useful information from it.
We recommend starting with the inline help text in the report, which explains its contents.
Here are some insights you can gain from a ``tlparse``:

- What model code was compiled? You can see this by looking at the stack trie.
  This is especially useful if you're not familiar with the codebase being compiled!
- How many graph breaks / distinct compilation regions are there?
  (Each distinct compile is its own color coded block like :dark-green:`[0/0]`).
  Frames that are potentially graph-broken are light green :green:`[2/4]`.
  If there are a lot of frames, that is suspicious, and suggests that you had some catastrophic graph breaks,
  or maybe your code isn't a good match for ``torch.compile``.
- How many times did I recompile a particular frame? Something that recompiled a lot will look like:
  :dark-green:`[10/0]` :dark-green:`[10/1]` :dark-green:`[10/2]`
  - if something is being recompiled a lot, that is very suspicious and worth looking into, even if it isn't the root cause of your problem.
- Was there a compilation error? Frames that errored will look like :red:`[0/1]`.
- What intermediate compiler products did I generate for a given frame?
  For example, you can look at the high-level generated FX graph or the generated Triton code.
- Is there relevant information for a particular frame? You can find these in ``compilation_metrics``.

TORCH_LOGS
--------------

You can use the ``TORCH_LOGS`` environment variable to selectively enable parts of the ``torch.compile`` stack to log.
``TORCH_LOGS`` is in fact the source of logs for ``tlparse``. The format of the ``TORCH_LOGS`` environment variable looks like this:

::

    TORCH_LOGS="<option1>,<option2>,..." python foo.py


Useful high-level options include:

- ``graph_breaks``: logs locations of graph breaks in user code and the reason for the graph break
- ``guards``: logs guards that are generated
- ``recompiles``: logs which function recompiled and the guards that failed, leading to the recompilation
- ``dynamic``: logs related to dynamic shapes

Also, you can programmatically set logging options using ``torch._logging.set_logs``:

.. code-block:: py

    import torch
    torch._logging.set_logs(graph_breaks=True)
    ...

More ``TORCH_LOGS`` options are :ref:`detailed below <troubleshooting_torch_logs_options>`.
For the full list of options, see `torch._logging <https://pytorch.org/docs/stable/logging.html>`__
and `torch._logging.set_logs <https://pytorch.org/docs/stable/generated/torch._logging.set_logs.html#torch._logging.set_logs>`__.

tlparse vs. TORCH_LOGS
----------------------

Generally, we suggest first using ``tlparse`` when encountering issues.
``tlparse`` is ideal for debugging large models and gaining a high-level overview of how your model was compiled.
On the other hand, ``TORCH_LOGS`` is preferred for small examples and fine-grained debugging detail,
when we already have an idea of which ``torch.compile`` component is causing the problem.

Simple Workarounds
~~~~~~~~~~~~~~~~~~

Here, we describe some workarounds to ``torch.compile`` issues involving small code modifications
or changing some ``torch.compile`` settings.

Where to apply torch.compile?
---------------------------------

We recommend applying ``torch.compile`` to the highest-level function that doesn't cause excessive problems.
Typically, it is your train or eval step with the optimizer but without the loop, your top-level ``nn.Module``,
or some sub-``nn.Module`` instances. ``torch.compile`` specifically doesn't handle distributed wrapper modules like
DDP or FSDP very well, so consider applying ``torch.compile`` to the inner module passed to the wrapper.

.. code-block:: py

    # inference
    model = ...
    opt_model = torch.compile(model)

    for _ in range(N_ITERS):
        inp = ...
        out = opt_model(inp)

.. code-block:: py

    # training
    model = ...
    opt = torch.optim.Adam(model.parameters())

    @torch.compile
    def train(mod, data):
        opt.zero_grad(True)
        pred = mod(data[0])
        loss = torch.nn.CrossEntropyLoss()(pred, data[1])
        loss.backward()
        opt.step()

    for _ in range(N_ITERS):
        inp = ...
        train(model, inp)

.. code-block:: py

    # DistributedDataParallel
    model = ...
    opt_model = torch.compile(model)
    model_ddp = DistributedDataParallel(opt_model, ...)

    for _ in range(N_ITERS):
        inp = ...
        out = model_ddp(inp)

Disabling and Suppressing Errors
---------------------------------

For some model architectures, there are portions of the model which are particularly difficult to compile
- either there are many graph breaks, or there are crashes. You may want to explicitly disable these
portions of the model which are problematic so that you can apply ``torch.compile`` to the parts that work.
You can do this by using the ``@torch.compiler.disable`` decorator. When ``torch.compile`` attempts to call a
disabled function, it breaks the graph and skips tracing the disabled function, resuming tracing after the call.
By default, all recursive calls made from a disabled function are also disabled. Use the ``recursive=False``
option to allow compilation for recursive calls.

.. code-block:: py

    def bad1_inner(...):
        # skipped

    @torch.compiler.disable
    def bad1_outer(...):
        # skipped
        bad1_inner(...)

    def bad2_inner(...):
        # traced

    @torch.compiler.disable(recursive=False)
    def bad2_outer(...):
        # skipped
        bad2_inner(...)

    @torch.compile
    def fn(...):
        # graph break
        bad1_outer(...)
        ...
        # graph break
        bad2_outer(...)

For example, we use ``torch.compiler.disable`` to disable ``torch.compile`` on the sparse architecture in
recommendation models, as the sparse arch is difficult to compile. Preprocessing and logging functions
are other examples of functions that typically cause a lot of graph breaks and do not get value from being compiled.

If you are experiencing compiler crashes and you want to continue regardless, you can set
``torch._dynamo.config.suppress_errors = True``. When the compiler crashes, we will just skip tracing
the function and try again later. This is not best practice - it is better to eventually manually add
disable annotations as necessary.
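
A minimal sketch of this escape hatch:

.. code-block:: py

    import torch

    # Fall back to running a function eagerly whenever its compilation crashes.
    # Prefer explicit @torch.compiler.disable annotations where possible.
    torch._dynamo.config.suppress_errors = True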

Resolving graph breaks
----------------------

To maximize optimization opportunities, it's important to reduce the number of graph breaks.
Recall that you can see what graph breaks are happening using ``tlparse`` or ``TORCH_LOGS="graph_breaks"``.
In general, graph breaks are caused by one of the following:

1. You're trying to do something that fundamentally cannot be traced, such as data-dependent control flow.
2. You're trying to do something that is not yet supported.
   For example, we currently have limited support for tracing code that uses the built-in Python ``inspect`` module.
3. Your code has an error in it. For example, you may have tried calling a function with an incorrect number of arguments.

Graph break logs will tell you the user code location and reason for the graph break.
Unfortunately, many graph breaks are not actionable without a deeper understanding of Dynamo.
It can even be challenging to determine which of the three causes was the true cause of your graph break.
We are working on making graph break messages more actionable.

Additionally, the impact of lost optimization opportunities differs between graph breaks.
For example, graph breaks that happen in the middle of your model's ``forward`` are likely to have a more negative impact than
graph breaks in a preprocessing part at the beginning of the ``forward``. So it is not crucial to prevent *every single*
graph break, but rather to prevent the ones that cause significant performance hits.

If a graph break message doesn't suggest any action, you suspect that the cause of your graph break is (2),
and you believe that the graph break is causing performance hits,
then please report the graph break as an issue. If a function has many graph breaks,
consider disabling compilation on that function, as the overhead cost for the graph breaks may become prohibitive.

Below are some common graph breaks and some workarounds.

Data-dependent operations
^^^^^^^^^^^^^^^^^^^^^^^^^

``torch.compile`` graph breaks on data-dependent operations such as data-dependent control flow
(if-statements, loops with tensors) and direct tensor data accesses (``.item``, ``.data_ptr``).

.. code-block:: py

    import torch

    @torch.compile
    def fn(x):
        y = x.sum()
        if y > 0:
            return x + y.item()
        return x - y.item()

    fn(torch.ones(3, 3))

::

    $ TORCH_LOGS="graph_breaks" python playground.py
    Graph break in user code at /data/users/williamwen/pytorch/playground.py:6
    Reason: Data-dependent jump
    User code traceback:
    File "/data/users/williamwen/pytorch/playground.py", line 6, in fn
        if y > 0:

    Graph break in user code at /data/users/williamwen/pytorch/playground.py:7
    Reason: Unsupported: Tensor.item
    User code traceback:
    File "/data/users/williamwen/pytorch/playground.py", line 7, in torch_dynamo_resume_in_fn_at_6
        return x + y.item()
    Traceback (most recent call last):
    File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 616, in wrapper
        return inner_fn(self, inst)
            ^^^^^^^^^^^^^^^^^^^^
    File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 2288, in CALL
        self._call(inst)
    File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 2282, in _call
        self.call_function(fn, args, kwargs)
    File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 838, in call_function
        self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/data/users/williamwen/pytorch/torch/_dynamo/variables/misc.py", line 1038, in call_function
        return self.obj.call_method(tx, self.name, args, kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/data/users/williamwen/pytorch/torch/_dynamo/variables/tensor.py", line 527, in call_method
        result = handler_method(*args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/data/users/williamwen/pytorch/torch/_dynamo/variables/tensor.py", line 773, in method_item
        unimplemented("Tensor.item")
    File "/data/users/williamwen/pytorch/torch/_dynamo/exc.py", line 304, in unimplemented
        raise Unsupported(msg, case_name=case_name)
    torch._dynamo.exc.Unsupported: Tensor.item

The general workaround for these graph breaks is to avoid doing data-dependent operations. Some specific workarounds are:

- If your control flow doesn't actually depend on data values, consider modifying your code to perform control flow on constants.

.. code-block:: py

    # old
    x = torch.randn(3, 3)
    @torch.compile
    def fn(y):
        if x.sum() > 0:
            return y + x
        else:
            return y - x

    # new
    x = torch.randn(3, 3)
    cond = (x.sum() > 0).item()
    @torch.compile
    def fn(y):
        if cond:
            return y + x
        else:
            return y - x

- Use higher-order ops like ``torch.cond`` (https://pytorch.org/docs/main/cond.html) in place of data-dependent control flow

.. code-block:: py

    # old
    @torch.compile
    def fn(x):
        if x.sum() > 0:
            return x + 1
        return x - 1

    # new
    @torch.compile
    def fn(x):
        return torch.cond(
            x.sum() > 0,
            lambda x: x + 1,
            lambda x: x - 1,
            (x,),
        )

- If you have a ``.item()`` call, try ``torch._dynamo.config.capture_scalar_outputs = True`` or ``TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1`` (see the sketch after this list)
- Wrap problematic parts of the function in a custom op
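
For example, here is a minimal sketch showing the config flag preventing the ``.item()`` graph break (the function is illustrative):

.. code-block:: py

    import torch

    # Allow Tensor.item() to be captured into the graph as a symbolic scalar
    # instead of causing a graph break.
    torch._dynamo.config.capture_scalar_outputs = True

    @torch.compile
    def fn(x):
        return x + x.sum().item()

    fn(torch.ones(3, 3))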

Custom ops
^^^^^^^^^^

If you have code that ``torch.compile`` has trouble tracing through, either due to missing support or fundamental incompatibility,
you can consider wrapping the problematic code in a custom op.

Custom ops require a little bit of additional work to get them to be compatible with ``torch.compile``.
See https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details.
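
Below is a minimal sketch, modeled on the tutorial linked above, that wraps otherwise untraceable NumPy code in a custom op (the op name ``mylib::numpy_sin`` and the function are illustrative):

.. code-block:: py

    import numpy as np
    import torch

    # The custom op's body is treated as opaque; torch.compile will not trace into it.
    @torch.library.custom_op("mylib::numpy_sin", mutates_args=())
    def numpy_sin(x: torch.Tensor) -> torch.Tensor:
        return torch.from_numpy(np.sin(x.numpy(force=True))).to(x.device)

    # The "fake" implementation tells the compiler the output's metadata.
    @numpy_sin.register_fake
    def _(x):
        return torch.empty_like(x)

    @torch.compile
    def fn(x):
        return numpy_sin(x)

    fn(torch.ones(3, 3))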

Printing
^^^^^^^^

Printing/logging/issuing warnings will result in a graph break. If you have a function that makes many logging calls,
for example, a function that logs data about a training iteration, consider applying ``torch.compiler.disable`` on it.

Alternatively, you can try using ``torch._dynamo.config.reorderable_logging_functions``.
This config is used to reorder logging functions so that they are called at the end of the traced function,
thus avoiding a graph break. However, the logged contents may differ if, for example, a mutation occurs.

.. code-block:: py

    import torch

    torch._dynamo.config.reorderable_logging_functions.add(print)

    @torch.compile
    def fn(x):
        x += 1
        print("log!")
        return torch.sin(x)

    fn(torch.ones(3, 3))

::

    $ TORCH_LOGS="graph_breaks" python playground.py
    log!

Incorrect code
^^^^^^^^^^^^^^

Your code may be wrong, or it may be encountering an error from outside ``torch.compile``.
In the code below, we made a typo in the ``torch.sin`` call by providing an extra argument.

.. code-block:: py

    import torch

    @torch.compile
    def fn(x):
        y = torch.sin(x, x)
        return y

    fn(torch.ones(3, 3))

::

    $ TORCH_LOGS="graph_breaks" python playground.py
    Graph break in user code at /data/users/williamwen/pytorch/playground.py:5
    Reason: Unsupported: TypeError <built-in method sin of type object at 0x7fd6fd764600>: sin() takes 1 positional argument but 2 were given
    User code traceback:
    File "/data/users/williamwen/pytorch/playground.py", line 5, in fn
        y = torch.sin(x, x)
    ...

It can be difficult to tell from the logs whether the error is caused by your code or by a ``torch.compile`` bug.
To differentiate, we recommend running your code without ``torch.compile`` to see if the error persists.

Dealing with recompilations
---------------------------

You can view recompilations and their reasons using ``tlparse`` or ``TORCH_LOGS=recompiles``.

Is dynamic shapes enabled?
^^^^^^^^^^^^^^^^^^^^^^^^^^

Recompilations due to mismatched shapes are in the form:

::

    tensor 'L['x']' size mismatch at index 0. expected 3, actual 4

Make sure that the ``dynamic`` option of ``torch.compile`` is not set to ``False``.
The default option, ``dynamic=None``, will only attempt dynamic shapes after the first compilation.
You can set ``dynamic=True`` to compile with dynamic shapes upfront wherever possible.

For more information on dynamic shapes, see `The dynamic shapes manual <https://docs.google.com/document/d/1GgvOe7C8_NVOMLOCwDaYV1mXXyHMXY7ExoewHqooxrs/edit#heading=h.fh8zzonyw8ng>`__.

Changing the cache size limit
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There is a limit to how many times a function can be recompiled, determined by ``torch._dynamo.config.cache_size_limit``
and ``torch._dynamo.config.accumulated_cache_size_limit``.
If either limit is exceeded, then we will not attempt to compile the function again and instead will run the function eagerly.
``torch.compile`` will also issue a warning containing the affected function and which limit was hit.
In the example below, each function call results in a recompile attempt.
When we hit the cache size limit (8), we stop attempting to recompile.

.. code-block:: py

    import torch

    @torch.compile(dynamic=False)
    def fn(x):
        return x + 1

    for i in range(1, 10):
        fn(torch.ones(i))

::

    $ python playground.py
    torch._dynamo hit config.cache_size_limit (8)
        function: 'fn' (/data/users/williamwen/pytorch/playground.py:5)
        last reason: 0/0: tensor 'L['x']' size mismatch at index 0. expected 1, actual 9

If you know that the number of recompilations has a reasonable constant upper bound, you can raise the cache size limit.
If the cost of recompilation outweighs the benefit of compilation, then you can consider lowering the cache size limit.
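
For example, a sketch of raising the limit (the value ``16`` is illustrative):

.. code-block:: py

    import torch

    # Allow up to 16 compiled versions of each function before falling back to eager.
    torch._dynamo.config.cache_size_limit = 16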

Wrapping constants with tensors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, ``int`` / ``float`` variables are treated as constants and are guarded as such.
In the below example, we have a recompilation for each function call.

.. code-block:: py

    import torch

    @torch.compile
    def fn(x, c):
        return x + c

    for i in range(1, 10):
        fn(torch.ones(i), 0.5 + i)

::

    $ TORCH_LOGS="recompiles" python playground.py
    Recompiling function fn in /data/users/williamwen/pytorch/playground.py:3
        triggered by the following guard failure(s):
        - 0/7: L['c'] == 8.5
        - 0/6: L['c'] == 7.5
        - 0/5: L['c'] == 6.5
        - 0/4: L['c'] == 5.5
        - 0/3: L['c'] == 4.5
        - 0/2: L['c'] == 3.5
        - 0/1: L['c'] == 2.5
        - 0/0: L['c'] == 1.5
    torch._dynamo hit config.cache_size_limit (8)
        function: 'fn' (/data/users/williamwen/pytorch/playground.py:3)
        last reason: 0/0: L['c'] == 1.5

In particular, for LR schedulers, initializing with a constant can lead to recompilations:

.. code-block:: py

    import torch

    mod = torch.nn.Linear(3, 3)
    opt = torch.optim.Adam(mod.parameters(), lr=0.01)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, 0.9)

    @torch.compile
    def fn(inp):
        opt.zero_grad(True)
        out = mod(inp).sum()
        out.backward()
        opt.step()
        sched.step()

    for i in range(1, 10):
        fn(torch.ones(3, 3))

::

    $ TORCH_LOGS="recompiles" python playground.py
    Recompiling function step in /data/users/williamwen/pytorch/torch/optim/adam.py:189
        triggered by the following guard failure(s):
        - 3/7: L['self'].param_groups[0]['lr'] == 0.004782969000000002
        - 3/6: L['self'].param_groups[0]['lr'] == 0.005314410000000002
        - 3/5: L['self'].param_groups[0]['lr'] == 0.005904900000000002
        - 3/4: L['self'].param_groups[0]['lr'] == 0.006561000000000002
        - 3/3: L['self'].param_groups[0]['lr'] == 0.007290000000000001
        - 3/2: L['self'].param_groups[0]['lr'] == 0.008100000000000001
        - 3/1: L['self'].param_groups[0]['lr'] == 0.009000000000000001
        - 3/0: L['self'].param_groups[0]['lr'] == 0.01
    torch._dynamo hit config.cache_size_limit (8)
        function: 'step' (/data/users/williamwen/pytorch/torch/optim/adam.py:189)
        last reason: 3/0: L['self'].param_groups[0]['lr'] == 0.01

In both examples, we can wrap float variables in tensors in order to prevent recompilations.

.. code-block:: py

    # first example
    for i in range(1, 10):
        fn(torch.ones(i), torch.tensor(0.5 + i))

    # second example
    opt = torch.optim.Adam(mod.parameters(), lr=torch.tensor(0.01))
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, torch.tensor(0.9))

Reporting Issues
~~~~~~~~~~~~~~~~

If the workarounds provided above were not enough to get ``torch.compile`` working,
then you should consider reporting the issue to PyTorch.
But there are a few things that you can do to make our lives significantly easier.

Ablation
--------

Check which component of the ``torch.compile`` stack is causing the issue by using the ``backend=`` option for ``torch.compile``.
In particular, try the following (a scripted sketch follows this list):

- ``torch.compile(fn, backend="eager")``, which only runs TorchDynamo, the graph capture component of ``torch.compile``.
- ``torch.compile(fn, backend="aot_eager")``, which runs TorchDynamo and AOTAutograd, which additionally generates the backward graph during compilation.
- ``torch.compile(fn, backend="aot_eager_decomp_partition")``, which runs TorchDynamo and AOTAutograd with operator decompositions/partitions.
- ``torch.compile(fn, backend="inductor")``, which runs TorchDynamo, AOTAutograd, and TorchInductor, the backend ML compiler that generates compiled kernels.

If the failure occurs only with the ``inductor`` backend, you can additionally test various Inductor modes:

- ``torch.compile(fn, backend="inductor", mode="default")``
- ``torch.compile(fn, backend="inductor", mode="reduce-overhead")``
- ``torch.compile(fn, backend="inductor", mode="max-autotune")``

You can also check whether dynamic shapes are causing issues with any backend:

- ``torch.compile(fn, dynamic=True)`` (always use dynamic shapes)
- ``torch.compile(fn, dynamic=False)`` (never use dynamic shapes)
- ``torch.compile(fn, dynamic=None)`` (automatic dynamic shapes)

Bisecting
---------
Did you try on the latest nightly? Did something work in the past but now no longer works?
Can you bisect to determine the first nightly where your issue occurs?
Bisecting is especially helpful for performance, accuracy, or compile time regressions,
where it is not immediately obvious where the problem originates from.

Creating a reproducer
---------------------

Creating reproducers is a lot of work, and it is perfectly fine if you do not have the time to do it.
However, if you are a motivated user unfamiliar with the internals of ``torch.compile``,
creating a standalone reproducer can have a huge impact on our ability to fix the bug.
Without a reproducer, your bug report must contain enough information for us to identify the root cause of the problem and write a reproducer from scratch.

Here's a list of useful reproducers, ranked from most to least preferred:

1. **Self-contained, small reproducer:** A script with no external dependencies, under 100 lines of code, that reproduces the problem when run.
2. **Self-contained, large reproducer:** Even if it's large, being self-contained is a huge advantage!
3. **Non-self-contained reproducer with manageable dependencies:**
   For example, if you can reproduce the problem by running a script after ``pip install transformers``,
   that's manageable. We can likely run it and investigate.
4. **Non-self-contained reproducer requiring substantial setup:**  This might involve downloading datasets,
   multiple environment setup steps, or specific system library versions requiring a Docker image.
   The more complex the setup, the harder it is for us to recreate the environment.

   .. note::
       Docker simplifies setup but complicates changes to the environment, so it's not a perfect solution, though we'll use it if necessary.

Somewhat orthogonally, a reproducer that can be run in a single process is better than a reproducer
that requires multiprocess training (but once again, if you only have a multiprocess reproducer, we'll take it!).

Additionally, below is a non-exhaustive list of aspects to check in your
issue that you can attempt to replicate in your reproducer (a skeleton sketch follows the list):

- **Autograd**. Did you have tensor inputs with ``requires_grad=True``? Did you call ``backward()`` on the output?
- **Dynamic shapes**. Did you set ``dynamic=True``? Or did you run the test code multiple times with varying shapes?
- **Custom operators**. Is there a custom operator involved in the real workflow?
  Can you replicate some of its important characteristics using the Python custom operator API?
- **Configuration**. Did you set all the same configuration?
  This includes ``torch._dynamo.config`` and ``torch._inductor.config`` settings,
  as well as arguments to ``torch.compile`` like ``backend`` / ``mode``.
- **Context managers**. Did you replicate any active context managers?
  This could be ``torch.no_grad``, automatic mixed precision, ``TorchFunctionMode`` / ``TorchDispatchMode``,
  activation checkpointing, compiled autograd etc.
- **Tensor subclasses**. Is there a tensor subclass involved?
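
Below is a hypothetical skeleton tying some of these aspects together; every value in it (config flags, backend/mode arguments, shapes) is a placeholder to be replaced with whatever your real workflow uses:

.. code-block:: py

    import torch

    # Replicate relevant config (illustrative value).
    torch._dynamo.config.capture_scalar_outputs = False

    # Replicate backend/mode/dynamic arguments from your real workflow.
    @torch.compile(backend="inductor", mode="default", dynamic=None)
    def repro(x):
        return torch.sin(x) + 1  # replace with minimal failing code

    # Replicate autograd: a requires_grad input and a backward() call.
    x = torch.randn(4, 4, requires_grad=True)
    repro(x).sum().backward()

    # Replicate dynamic shapes by calling again with a different shape.
    y = torch.randn(8, 8, requires_grad=True)
    repro(y).sum().backward()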

Minifier
--------

The minifier is an early ``torch.compile`` tool that, given an FX graph that crashes when we attempt to run or compile it,
finds a subgraph that also crashes and outputs the code that performs that subgraph's operations.
Essentially, the minifier finds a minimal repro for a certain class of ``torch.compile``-related crashes.
This assumes that we were able to successfully trace through code.

Unfortunately, most of the time nowadays, the minifier doesn't work as expected, and alternative methods may be necessary.
This is likely because bugs that can be automatically reproduced in this manner are generally easier to fix
and have already been addressed, leaving more complex issues that do not reproduce easily.
However, it is straightforward to attempt using the minifier, so it is worth trying even if it may not succeed.

Instructions for operating the minifier can be found `here <https://pytorch.org/docs/stable/torch.compiler_troubleshooting_old.html>`__.
If the compiler is crashing, you can set ``TORCHDYNAMO_REPRO_AFTER="dynamo"`` or ``TORCHDYNAMO_REPRO_AFTER="aot"``.
The ``aot`` option is more likely to succeed, although it may not identify AOTAutograd issues. This will generate a ``repro.py`` file which may help to diagnose the problem.
For accuracy-related issues, consider setting ``TORCHDYNAMO_REPRO_LEVEL=4``. Note that this may not always successfully identify the problematic subgraph.
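
For example (mirroring the environment variables described above):

::

    TORCHDYNAMO_REPRO_AFTER="aot" python foo.py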

Debugging Deeper
~~~~~~~~~~~~~~~~

This section provides tools and techniques for independently debugging ``torch.compile`` issues
or for gaining a deeper understanding of the ``torch.compile`` stack.
These methods are more involved than those presented above and are used by PyTorch developers regularly
to debug real ``torch.compile`` issues.

Below is a high-level overview of the stack:

.. image:: _static/img/dynamo/td_stack.png

The stack comprises three main components: TorchDynamo, AOTAutograd, and Inductor.
Our debugging strategy involves first identifying the component in which the error occurs
and then individually debugging the component. To determine the component responsible for the issue,
see the `Ablation` section under `Reporting Issues` above. For guidance on debugging a specific component, consult the sections below.

TorchDynamo
-----------

Logging what Dynamo is tracing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``TORCH_LOGS=trace_bytecode`` option enables you to view the precise bytecode instructions that Dynamo is tracing,
as well as a symbolic representation of the Python interpreter stack. When encountering a graph break or crash,
it is advisable to inspect the last few bytecode instructions traced.

You can also use ``TORCH_LOGS=trace_source`` to see which lines of source code Dynamo is tracing through.
This is useful in combination with ``trace_bytecode`` to see the line of source code each traced bytecode instruction corresponds to.

Finally, you can use ``TORCH_LOGS=graph_code`` to see the Python code representing the FX graph that Dynamo traced.
You can view this code to double check that the correct ops are being traced.

.. code-block:: py

    import torch

    def g(x, y):
        return x + y

    @torch.compile(backend="eager")
    def f(x):
        x = torch.sin(x)
        x = g(x, x)
        return x

    f(torch.ones(3, 3))

::

    $ TORCH_LOGS="trace_bytecode,trace_source,graph_code" python playground.py
    TRACE starts_line /data/users/williamwen/pytorch/playground.py:6 in f ()
        @torch.compile(backend="eager")
    TRACE RESUME 0 []
    TRACE starts_line /data/users/williamwen/pytorch/playground.py:8 in f (f)
            x = torch.sin(x)
    TRACE LOAD_GLOBAL torch []
    TRACE LOAD_ATTR sin [NullVariable(), PythonModuleVariable(<module 'torch' from '/data/users/williamwen/pytorch/torch/__init__.py'>)]
    TRACE LOAD_FAST x [NullVariable(), TorchInGraphFunctionVariable(<built-in method sin of type object at 0x7f00f6964600>)]
    TRACE CALL 1 [NullVariable(), TorchInGraphFunctionVariable(<built-in method sin of type object at 0x7f00f6964600>), LazyVariableTracker()]
    TRACE STORE_FAST x [TensorVariable()]
    TRACE starts_line /data/users/williamwen/pytorch/playground.py:9 in f (f)
            x = g(x, x)
    TRACE LOAD_GLOBAL g []
    TRACE LOAD_FAST x [NullVariable(), UserFunctionVariable()]
    TRACE LOAD_FAST x [NullVariable(), UserFunctionVariable(), TensorVariable()]
    TRACE CALL 2 [NullVariable(), UserFunctionVariable(), TensorVariable(), TensorVariable()]
    TRACE starts_line /data/users/williamwen/pytorch/playground.py:3 in g (g) (inline depth: 1)
        def g(x, y):
    TRACE RESUME 0 []
    TRACE starts_line /data/users/williamwen/pytorch/playground.py:4 in g (g) (inline depth: 1)
            return x + y
    TRACE LOAD_FAST x []
    TRACE LOAD_FAST y [TensorVariable()]
    TRACE BINARY_OP 0 [TensorVariable(), TensorVariable()]
    TRACE RETURN_VALUE None [TensorVariable()]
    TRACE STORE_FAST x [TensorVariable()]
    TRACE starts_line /data/users/williamwen/pytorch/playground.py:10 in f (f)
            return x
    TRACE LOAD_FAST x []
    TRACE RETURN_VALUE None [TensorVariable()]
    TRACED GRAPH
    ===== __compiled_fn_1 =====
    /data/users/williamwen/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module):
        def forward(self, L_x_: "f32[3, 3][3, 1]cpu"):
            l_x_ = L_x_

            # File: /data/users/williamwen/pytorch/playground.py:8 in f, code: x = torch.sin(x)
            x: "f32[3, 3][3, 1]cpu" = torch.sin(l_x_);  l_x_ = None

            # File: /data/users/williamwen/pytorch/playground.py:4 in g, code: return x + y
            x_1: "f32[3, 3][3, 1]cpu" = x + x;  x = None
            return (x_1,)

Breakpointing Dynamo tracing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Inserting a breakpoint in Dynamo/user code is helpful at times to see what the state of Dynamo is when tracing through user code.
Unfortunately, inserting a breakpoint in the normal Python fashion will result in a graph break in TorchDynamo,
so we will not be able to view the state of Dynamo at the point where we intended to breakpoint.

The first method for setting a breakpoint is to insert it within the Dynamo source code. Three recommended locations to place a breakpoint are:

- In ``torch/_dynamo/symbolic_convert.py``, breakpoint at functions that are named after the problematic bytecode instruction,
  such as ``def CALL_FUNCTION`` and ``def STORE_ATTR``. Since some bytecode opcodes are executed frequently, you can conditionally breakpoint
  depending on inputs, for example, on the ``argval`` of the instruction or on the name of the object at the top of the stack.
- Breakpoint where the graph break or error originates from. Typically, graph breaks are emitted from a call to ``unimplemented(...)``.
- Breakpoint in the function ``_wrap`` in ``torch/_dynamo/variables/builder.py``. You will likely have to conditionally breakpoint on the input.
  This function determines how to symbolically represent a given value. Consider breakpointing here if you suspect that a value is represented incorrectly.

The second way to insert a breakpoint is to use ``torch._dynamo.comptime.comptime.breakpoint``:

.. code-block:: py

    from torch._dynamo.comptime import comptime

    @torch.compile
    def f(...):
        ...
        comptime.breakpoint()
        ...

A comptime breakpoint is convenient as it enables you to inspect the Dynamo state at a specific location within the user code being traced.
It does not require you to insert a breakpoint in the Dynamo source or to conditionally breakpoint based on variables.

When a comptime breakpoint is triggered, you can do the following:

- ``ctx.print_bt()`` to print the user stack trace
- ``ctx.print_locals()`` to print all current locals
- ``ctx.print_graph()`` to print the currently traced graph
- ``ctx.disas()`` to print the currently traced function's bytecode
- Use standard ``pdb`` commands, such as ``bt/u/d/n/s/r`` - you can go up the ``pdb`` stack to inspect more Dynamo internals

.. code-block:: py

    import torch
    from torch._dynamo.comptime import comptime

    @torch.compile(backend="eager")
    def f(x):
        y = x + 1
        comptime.breakpoint()
        y = y + 1
        return y

    f(torch.ones(3, 3))

::

    $ python playground.py
    --Return--
    > /data/users/williamwen/pytorch/torch/_dynamo/comptime.py(392)inner()->None
    -> builtins.breakpoint()
    (Pdb) ctx.print_bt()
    File "/data/users/williamwen/pytorch/playground.py", line 7, in f
        comptime.breakpoint()

    (Pdb) ctx.print_locals()
    x = FakeTensor(..., size=(3, 3))
    y = FakeTensor(..., size=(3, 3))
    (Pdb) bt
    ...
    /data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py(826)call_function()
    -> self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
    /data/users/williamwen/pytorch/torch/_dynamo/variables/misc.py(331)call_function()
    -> func(ComptimeContext(tx))
    > /data/users/williamwen/pytorch/torch/_dynamo/comptime.py(392)inner()->None
    -> builtins.breakpoint()
    (Pdb) ctx.print_graph()



    def forward(self, L_x_: "f32[3, 3]"):
        l_x_ = L_x_

        # File: /data/users/williamwen/pytorch/playground.py:6 in f, code: y = x + 1
        y: "f32[3, 3]" = l_x_ + 1;  l_x_ = y = None

..
    TODO(uncomment/update once we improve this API)
    Debugging large models
    ^^^^^^^^^^^^^^^^^^^^^^

    Debugging TorchDynamo on large models can be tricky, mainly because Dynamo traces through large amounts of code.
    It can be difficult to find the problematic function, or to determine where to place a breakpoint.
    Even if we've found the problematic function, we don't want to deal with logging spam.
    Fortunately, you can use ``TORCHDYNAMO_DEBUG_FUNCTION=<function name>``, which limits dynamo tracing to only functions with a specific name
    (exact match). This will allow you to filter all of the functions in the model to the function(s) of interest.
    Use this in combination with the above debugging strategies.

Bytecode generation errors
^^^^^^^^^^^^^^^^^^^^^^^^^^

Although uncommon, Dynamo may generate incorrect bytecode. You may suspect this is the case if you determine the following:

- Ablation reveals the error is happening at the TorchDynamo level
- The error is not being emitted from TorchDynamo stack frames
- The error looks more like a user error rather than a Dynamo error, or is a segmentation fault
- The error does not occur without ``torch.compile``

Bytecode generation bugs are generally tricky to fix and we recommend submitting an issue instead of trying to fix those yourself.
If you are interested in seeing the bytecode that Dynamo generates, you can use ``TORCH_LOGS=bytecode``.
You can see a high-level overview on what bytecode Dynamo generates `here <https://docs.google.com/presentation/d/1tMZOoAoNKF32CAm1C-WfzdVVgoEvJ3lp/edit?usp=sharing&ouid=114922067987692817315&rtpof=true&sd=true>`__.

AOTAutograd
-----------

AOTAutograd errors are typically difficult to debug - we recommend just submitting an issue.
AOTAutograd logging output is primarily helpful to see what the input to Inductor is.
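
For example, you can view the forward and backward graphs that AOTAutograd produces (that is, the input to Inductor) with:

::

    TORCH_LOGS="aot_graphs" python playground.py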

..
    TODO
    TorchInductor
    -------------

.. TODO

.. _troubleshooting_torch_logs_options:

Summary of TORCH_LOGS options
---------------------------------

A summary of helpful ``TORCH_LOGS`` options is:

.. list-table::
    :widths: 25 50
    :header-rows: 1

    * - Option
      - Description
    * - +all
      - Output debug logs from all ``torch.compile`` components
    * - +dynamo
      - Output debug logs from TorchDynamo
    * - +aot
      - Output debug logs from AOTAutograd
    * - +inductor
      - Output debug logs from TorchInductor
    * - dynamic
      - Output logs from dynamic shapes
    * - graph_code
      - Output the Python code for the FX graph that Dynamo generated
    * - graph_sizes
      - Output the tensor sizes of the FX graph that Dynamo generated
    * - trace_bytecode
      - Output the bytecode instructions that Dynamo is tracing through and the symbolic interpreter stack Dynamo is keeping track of
    * - trace_source
      - Output the line of code in the original source that Dynamo is currently tracing through
    * - bytecode
      - Output Dynamo-generated bytecode
    * - guards
      - Output generated guards
    * - recompiles
      - Output recompilation reasons (only the first guard check that fails)
    * - recompiles_verbose
      - Output all guard checks that fail when a recompilation occurs
    * - aot_graphs
      - Output graph generated by AOTAutograd
    * - aot_joint_graphs
      - Output the joint forward-backward graph generated by AOTAutograd
    * - output_code
      - Output code generated by Inductor
    * - kernel_code
      - Output code generated by Inductor on a per-kernel basis
    * - schedule
      - Output Inductor scheduling logs
    * - perf_hints
      - Output Inductor perf hint logs
    * - fusion
      - Output Inductor fusion logs

For the full list of options, see `torch._logging <https://pytorch.org/docs/stable/logging.html>`__
and `torch._logging.set_logs <https://pytorch.org/docs/stable/generated/torch._logging.set_logs.html#torch._logging.set_logs>`__.

Related Articles
~~~~~~~~~~~~~~~~

- `torch.compile tutorial <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`__
- `torch.compile fine-grained APIs <https://pytorch.org/docs/stable/torch.compiler_fine_grain_apis.html>`__
- `torch.compile FAQ <https://pytorch.org/docs/stable/torch.compiler_faq.html>`__
- `torch.compiler namespace overview <https://pytorch.org/docs/stable/torch.compiler.html#torch-compiler-overview>`__
- `torch.compiler API reference <https://pytorch.org/docs/stable/torch.compiler_api.html>`__
- `Profiling torch.compile <https://pytorch.org/docs/stable/torch.compiler_profiling_torch_compile.html>`__
- `torch.compile missing manual <https://docs.google.com/document/d/1y5CRfMLdwEoF1nTk9q8qEu1mgMUuUtvhklPKJ2emLU8/edit?usp=sharing>`__
- `The dynamic shapes manual <https://docs.google.com/document/d/1GgvOe7C8_NVOMLOCwDaYV1mXXyHMXY7ExoewHqooxrs/edit#heading=h.fh8zzonyw8ng>`__
- `TorchInductor caching tutorial <https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html>`__