++++++++++++++++++++++++++++++++++++++++++++
Comparison with byteplay and codetransformer
++++++++++++++++++++++++++++++++++++++++++++

History of the bytecode API design
==================================

The design of the bytecode module started with a single use case: reimplementing
the CPython peephole optimizer (implemented in C) in pure Python. The API went
through many iterations before reaching its current form.

bytecode now has a clear separation between concrete instructions, which use
integer arguments, and abstract instructions, which use Python objects for
arguments. Jump targets are labels or basic blocks. The control flow graph
abstraction is now an API well separated from the regular abstract bytecode,
which is a simple list of instructions.


byteplay and codetransformer
============================

The `byteplay <https://github.com/serprex/byteplay>`_ and `codetransformer
<https://pypi.python.org/pypi/codetransformer>`_ projects were a clear
inspiration for the design of the bytecode API. Sadly, the byteplay and
codetransformer APIs have design issues (at least for my specific use cases).


Free and cell variables
-----------------------

Converting a code object to bytecode and then back to code must not modify the
code object. It is an important requirement.

The LOAD_DEREF instruction supports free variables and cell variables. byteplay
and codetransformer use a simple string for the variable name. When the
bytecode is converted to a code object, they check whether the variable is a
free variable, and otherwise fall back to a cell variable.

The CPython code base contains a corner case: code having a free variable and a
cell variable with the same name. In that case, the heuristic produces invalid
code, which can lead to a crash.

bytecode uses the :class:`FreeVar` and :class:`CellVar` classes to tag the type
of the variable. Trying to use a simple string raises a :exc:`TypeError` in the
:class:`Instr` constructor.
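
The difference between guessing and tagging can be illustrated with a small
sketch. It only relies on standard code object attributes; ``guess_kind``,
``Free``, ``Cell`` and ``kind`` are hypothetical stand-ins written for this
illustration, not the actual byteplay, codetransformer or bytecode code.

```python
from dataclasses import dataclass


def outer():
    x = 1

    def inner():
        return x

    return inner


# The same name is a cell variable in one code object and a free variable
# in another one.
print(outer.__code__.co_cellvars)    # ('x',)
print(outer().__code__.co_freevars)  # ('x',)


def guess_kind(name, freevars, cellvars):
    """String-based heuristic: try free variables first, else assume cell."""
    return "free" if name in freevars else "cell"


@dataclass(frozen=True)
class Free:
    name: str


@dataclass(frozen=True)
class Cell:
    name: str


def kind(var):
    """Tagged variant: the caller states the kind, so nothing is guessed."""
    if isinstance(var, Free):
        return "free"
    if isinstance(var, Cell):
        return "cell"
    raise TypeError("expected Free or Cell, got %s" % type(var).__name__)


# With a name present in both tuples, the heuristic can never select the cell.
print(guess_kind("x", freevars=("x",), cellvars=("x",)))  # free
print(kind(Cell("x")))                                     # cell
```

When a single code object lists the same name in both tuples, the string-based
lookup has no way to express "the cell one", whereas the tagged form carries
that information explicitly.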

.. note::
   It's possible to fix this issue in byteplay and codetransformer, maybe even
   while keeping support for simple strings for free/cell variables for
   backward compatibility.


Line numbers
------------

codetransformer internally uses a dictionary mapping offsets to line numbers.
It is updated when the ``.steal()`` method is used.

byteplay uses a pseudo-instruction ``SetLineno`` to set the current line number
of the following instructions. These pseudo-instructions must be handled
whenever the bytecode is modified, especially when instructions are moved.

In FAT Python, some optimizations move instructions but their line numbers must
be kept. That's also why Python 3.6 was modified to support negative line
number deltas in ``code.co_lnotab``.

bytecode has a different design: line numbers are stored directly inside
instructions (:attr:`Instr.lineno` attribute). Moving an instruction keeps
the line number information by design.
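
As a rough illustration of this design, the toy model below stores the line
number on each instruction. ``Ins`` and ``line_deltas`` are hypothetical names
used only for this sketch and the opcode names are made up; this is not the
:class:`Instr` implementation.

```python
from dataclasses import dataclass


@dataclass
class Ins:
    name: str
    lineno: int


def line_deltas(instructions):
    """Line number difference between each instruction and the previous one."""
    linenos = [ins.lineno for ins in instructions]
    return [cur - prev for prev, cur in zip(linenos, linenos[1:])]


code = [Ins("OP_A", 1), Ins("OP_B", 2), Ins("OP_C", 2)]

# Move the first instruction to the end: its line number travels with it,
# so there is no separate offset -> line table to keep in sync.
code.append(code.pop(0))

print([(ins.name, ins.lineno) for ins in code])
# [('OP_B', 2), ('OP_C', 2), ('OP_A', 1)]
print(line_deltas(code))  # [0, -1]
```

The negative delta in the last line is the kind of value that a line table
must be able to encode once instructions can be reordered.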

bytecode also supports the pseudo-instruction :class:`SetLineno`. It was added
to simplify functions emitting bytecode. It's not used when an existing code
object is converted to bytecode.
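
The idea of such a pseudo-instruction can be sketched with a toy resolver.
``SetLine`` and ``assign_linenos`` are hypothetical names for this
illustration and the opcode names are made up; the sketch does not reproduce
how bytecode processes :class:`SetLineno`.

```python
from dataclasses import dataclass


@dataclass
class SetLine:
    """Hypothetical pseudo-instruction: sets the line of what follows."""
    lineno: int


def assign_linenos(items, first_lineno=1):
    """Return (name, lineno) pairs, consuming the SetLine markers."""
    lineno = first_lineno
    result = []
    for item in items:
        if isinstance(item, SetLine):
            lineno = item.lineno
        else:
            result.append((item, lineno))
    return result


emitted = ["OP_A", SetLine(3), "OP_B", "OP_C", SetLine(4), "OP_D"]
print(assign_linenos(emitted))
# [('OP_A', 1), ('OP_B', 3), ('OP_C', 3), ('OP_D', 4)]
```

A function emitting code only has to insert a marker when the line changes,
instead of passing a line number to every instruction it creates.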


Jump targets
------------

In codetransformer, a jump target is an instruction. Jump targets are computed
when the bytecode is converted to a code object.

byteplay and bytecode use labels. Jump targets are computed when the abstract
bytecode is converted to a code object.

.. note::
   A loop is needed in the conversion from bytecode to code: if a jump target
   is larger than 2**16, the size of the jump instruction changes (from 3 to 6
   bytes), so the other jump targets must be recomputed.

   bytecode handles this corner case. byteplay and codetransformer don't, but
   it should be easy to fix them.
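
The loop mentioned in the note can be sketched as a fixed-point iteration over
a toy instruction list. ``Op``, ``resolve_jumps`` and the ``PAD`` filler are
hypothetical names for this illustration, and the sketch assumes that a target
of 2**16 or more no longer fits in the short encoding; it is not the code used
by bytecode.

```python
from typing import NamedTuple, Optional

SHORT_JUMP = 3   # bytes, sizes taken from the note above
LONG_JUMP = 6
LIMIT = 2 ** 16  # assumption: targets >= LIMIT need the long encoding


class Op(NamedTuple):
    name: str
    size: int = 1                 # bytes used by a non-jump instruction
    target: Optional[str] = None  # label name, set only for jumps


def resolve_jumps(items):
    """Return ({label: offset}, {jump index: size}) once sizes are stable.

    ``items`` mixes Op instances and label names (plain strings).
    """
    # Start optimistic: every jump uses the short encoding.
    jump_sizes = {
        index: SHORT_JUMP
        for index, item in enumerate(items)
        if isinstance(item, Op) and item.target is not None
    }
    while True:
        # Compute label offsets with the current jump sizes.
        offset = 0
        labels = {}
        for index, item in enumerate(items):
            if isinstance(item, str):
                labels[item] = offset
            else:
                offset += jump_sizes.get(index, item.size)
        # Widen the jumps whose target no longer fits, then try again.
        changed = False
        for index, size in jump_sizes.items():
            if size == SHORT_JUMP and labels[items[index].target] >= LIMIT:
                jump_sizes[index] = LONG_JUMP
                changed = True
        if not changed:
            return labels, jump_sizes


program = [
    Op("JUMP", target="far"),
    Op("JUMP", target="near"),
    Op("PAD", size=65529),  # filler standing for many instructions
    "near",
    Op("PAD", size=10),
    "far",
]
labels, jump_sizes = resolve_jumps(program)
print(labels)      # {'near': 65541, 'far': 65551}
print(jump_sizes)  # {0: 6, 1: 6}
```

In this example, widening the first jump pushes the ``near`` label past the
limit, which in turn forces the second jump to grow. A single pass would miss
that. Sizes only ever grow, so the loop terminates.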


Control flow graph
------------------

The peephole optimizer has strong requirements on the control flow: an
optimization must not modify two instructions which are part of two different
basic blocks. Otherwise, the optimizer produces invalid code.

bytecode provides a control flow graph API for this use case.

byteplay and codetransformer don't.
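
To make the basic block constraint concrete, here is a toy splitter working on
a flat list of instructions and labels. ``split_basic_blocks``,
``peephole_pairs`` and the opcode names are hypothetical and chosen for this
sketch; it does not use or reproduce the control flow graph API of bytecode.

```python
# Hypothetical opcode names after which control may leave the current block.
ENDS_BLOCK = {"JUMP", "JUMP_IF_FALSE", "RETURN"}


def split_basic_blocks(items):
    """Split a flat list of (name, arg) tuples and label strings into blocks."""
    blocks = [[]]
    for item in items:
        if isinstance(item, str):
            # A label is a jump target: it always starts a new block.
            if blocks[-1]:
                blocks.append([])
            continue
        blocks[-1].append(item)
        if item[0] in ENDS_BLOCK:
            # Control may leave here, so the next instruction starts a block.
            blocks.append([])
    return [block for block in blocks if block]


def peephole_pairs(blocks):
    """Yield adjacent instruction pairs that never cross a block boundary."""
    for block in blocks:
        yield from zip(block, block[1:])


program = [
    ("LOAD", "x"),
    ("JUMP_IF_FALSE", "else"),
    ("LOAD", 1),
    ("RETURN", None),
    "else",
    ("LOAD", 2),
    ("RETURN", None),
]
blocks = split_basic_blocks(program)
print(len(blocks))  # 3

pairs = list(peephole_pairs(blocks))
# RETURN and the following LOAD are adjacent in the flat list, but they are
# never offered together because they belong to different blocks.
print((("RETURN", None), ("LOAD", 2)) in pairs)  # False
```

An optimizer that only rewrites the pairs produced this way cannot merge
instructions across a jump target or past a jump.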


Functions or methods
--------------------

This point is a matter of taste.

In bytecode, instructions are objects with methods like
:meth:`~Instr.is_final`, :meth:`~Instr.has_cond_jump`, etc.

The byteplay project uses functions taking an instruction as a parameter.