# A higher level definition of the bytecode interpreter
## Abstract
The CPython interpreter is defined in C, meaning that the semantics of the
bytecode instructions, the dispatching mechanism, error handling, and
tracing and instrumentation are all intermixed.
This document proposes a custom C-like DSL for defining the instruction
semantics, together with tools that generate code from the instruction
definitions.
These tools would be used to:
* Generate the main interpreter (done)
* Generate the tier 2 interpreter
* Generate documentation for instructions
* Generate metadata about instructions, such as stack use (done)
* Generate the tier 2 optimizer's abstract interpreter
Having a single definition file ensures that there is a single source
of truth for bytecode semantics.
Other tools that operate on bytecodes, like `frame.setlineno`
and the `dis` module, will be derived from the common semantic
definition, reducing errors.
## Motivation
The bytecode interpreter of CPython has traditionally been defined as standard
C code, but with a lot of macros.
The presence of these macros and the nature of bytecode interpreters means
that the interpreter is effectively defined in a domain specific language (DSL).
Rather than using an ad-hoc DSL embedded in the C code for the interpreter,
a custom DSL should be defined, and the semantics of the bytecode instructions
should be expressed in that DSL.
Generating the interpreter decouples low-level details of dispatching
and error handling from the semantics of the instructions, resulting
in more maintainable code and a potentially faster interpreter.
It also provides the ability to create and check optimizers and optimization
passes from the semantic definition, reducing errors.
## Rationale
As we improve the performance of CPython, we need to optimize larger regions
of code, use more complex optimizations and, ultimately, translate to machine
code.
All of these steps introduce the possibility of more bugs, and require more code
to be written. One way to mitigate this is through the use of code generators.
Code generators decouple the debugging of the code (the generator) from checking
the correctness (the DSL input).
For example, we are likely to want a new interpreter for the tier 2 optimizer
to be added in 3.12. That interpreter will have a different API, a different
set of instructions and potentially a different dispatching mechanism.
But the instructions it will interpret will be built from the same building
blocks as the instructions for the tier 1 (PEP 659) interpreter.
Rewriting all the instructions is tedious and error-prone, and changing the
instructions is a maintenance headache as both versions need to be kept in sync.
By using a code generator and using a common source for the instructions, or
parts of instructions, we can reduce the potential for errors considerably.
## Specification
This specification is a work in progress.
We update it as the need arises.
### Syntax
Each op definition has a kind, a name, a stack and instruction stream effect,
and a piece of C code describing its semantics:
```
  file:
    (definition | family | pseudo)+

  definition:
    "inst" "(" NAME "," stack_effect ")" "{" C-code "}"
    |
    "op" "(" NAME "," stack_effect ")" "{" C-code "}"
    |
    "macro" "(" NAME ")" "=" uop ("+" uop)* ";"

  stack_effect:
    "(" [inputs] "--" [outputs] ")"

  inputs:
    input ("," input)*

  outputs:
    output ("," output)*

  input:
    object | stream | array

  output:
    object | array

  uop:
    NAME | stream

  object:
    NAME [":" type] [ "if" "(" C-expression ")" ]

  type:
    NAME ["*"]

  stream:
    NAME "/" size

  size:
    INTEGER

  array:
    object "[" C-expression "]"

  family:
    "family" "(" NAME ")" "=" "{" NAME ("," NAME)+ [","] "}" ";"

  pseudo:
    "pseudo" "(" NAME "," stack_effect ["," "(" flags ")"] ")" "=" "{" NAME ("," NAME)+ [","] "}" ";"

  flags:
    flag ("|" flag)*

  flag:
    HAS_ARG | HAS_DEOPT | etc..
```
The following definitions may occur:
* `inst`: A normal instruction, as previously defined by `TARGET(NAME)` in `ceval.c`.
* `op`: A part of an instruction, from which macros can be constructed.
* `macro`: A bytecode instruction constructed from ops and cache effects.
`NAME` can be any ASCII identifier that is a C identifier and not a C or Python keyword.
`foo_1` is legal. `$` is not legal, nor is `struct` or `class`.
The optional `type` in an `object` is the C type. It defaults to `PyObject *`.
The objects before the "--" are the objects on top of the stack at the start of
the instruction. Those after the "--" are the objects on top of the stack at the
end of the instruction.
An `inst` without `stack_effect` is a transitional form to allow the original C code
definitions to be copied. It lacks information to generate anything other than the
interpreter, but is useful for initial porting of code.
Stack effect names may be `unused`, indicating the space is to be reserved
but no use of it will be made in the instruction definition.
This is useful to ensure that all instructions in a family have the same
stack effect.
The number in a `stream` defines how many code units are consumed from the
instruction stream. The value read is 16, 32 or 64 bits wide.
If the name is `unused` the size can be any value and that many code units
will be skipped in the instruction stream.
By convention, cache effects (`stream`) must precede the input effects.
The name `oparg` is pre-defined as a 32-bit value fetched from the instruction stream.
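As a rough illustration of how a `stream` entry of a given size might be read, the following sketch stores and loads values that span 1, 2 or 4 code units. The names `toy_codeunit`, `toy_write_cache` and `toy_read_cache` are hypothetical and are not part of CPython; the 16-bit code unit width and the native byte order (via `memcpy`) are assumptions of this sketch.

```C
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: one instruction-stream code unit is 16 bits. */
typedef uint16_t toy_codeunit;

/* Store `value` into `size` consecutive code units (size is 1, 2 or 4). */
static void
toy_write_cache(toy_codeunit *p, int size, uint64_t value)
{
    assert(size == 1 || size == 2 || size == 4);
    if (size == 1) {
        p[0] = (uint16_t)value;
    }
    else if (size == 2) {
        uint32_t v = (uint32_t)value;
        memcpy(p, &v, sizeof(v));
    }
    else {
        memcpy(p, &value, sizeof(value));
    }
}

/* Read back a 16, 32 or 64 bit value spanning `size` code units. */
static uint64_t
toy_read_cache(const toy_codeunit *p, int size)
{
    assert(size == 1 || size == 2 || size == 4);
    if (size == 1) {
        return p[0];
    }
    if (size == 2) {
        uint32_t v;
        memcpy(&v, p, sizeof(v));
        return v;
    }
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```

Using `memcpy` avoids unaligned loads when a 32 or 64 bit entry starts at an arbitrary code unit; a real implementation may choose a different strategy.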
### Special instruction annotations
Instruction headers may be prefixed by one or more annotations. The non-exhaustive
list of annotations and their meanings are as follows:
* `override`. For external use by other interpreter definitions to override the current
instruction definition.
* `pure`. This instruction has no side effects.
* `tierN`. This instruction is only used by the tier N interpreter.
### Special functions/macros
The C code may include special functions and macros that are understood by the tools as
part of the DSL.
Those include:
* `DEOPT_IF(cond, instruction)`. Deoptimize if `cond` is met.
* `ERROR_IF(cond)`. Jump to error handler if `cond` is true.
* `DECREF_INPUTS()`. Generate `Py_DECREF()` calls for the input stack effects.
* `SYNC_SP()`. Synchronizes the physical stack pointer with the stack effects.
* `INSTRUCTION_SIZE`. Replaced with the size of the instruction which is equal
to `1 + INLINE_CACHE_ENTRIES`.
Note that the use of `DECREF_INPUTS()` is optional -- manual calls
to `Py_DECREF()` or other approaches are also acceptable
(e.g. calling an API that "steals" a reference).
Variables can be defined in the input, in the output, or in the C code.
Variables defined in the input may not be assigned in the C code.
If an `ERROR_IF` occurs, all values will be removed from the stack;
they must already be `DECREF`'ed by the code block.
If a `DEOPT_IF` occurs, no values will be removed from the stack or
the instruction stream; no values must have been `DECREF`'ed or created.
These requirements result in the following constraints on the use of
`DEOPT_IF` and `ERROR_IF` in any instruction's code block:
1. Until the last `DEOPT_IF`, no objects may be allocated, `INCREF`ed,
or `DECREF`ed.
2. Before the first `ERROR_IF`, all input values must be `DECREF`ed,
and no objects may be allocated or `INCREF`ed, with the exception
of attempting to create an object and checking for success using
`ERROR_IF(result == NULL)`. (TODO: Unclear what to do with
intermediate results.)
3. No `DEOPT_IF` may follow an `ERROR_IF` in the same block.
(There is some wiggle room: these rules apply to dynamic code paths,
not to static occurrences in the source code.)
If code detects an error condition before the first `DECREF` of an input,
two idioms are valid:
- Use `goto error`.
- Use a block containing the appropriate `DECREF` calls ending in
`ERROR_IF(true)`.
An example of the latter would be:
```C
res = PyNumber_Add(left, right);
if (res == NULL) {
    DECREF_INPUTS();
    ERROR_IF(true);
}
```
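The ordering constraints on `DEOPT_IF` and `ERROR_IF` can be illustrated with a small self-contained sketch. It assumes that both macros expand to a conditional `goto`; that expansion, the `toy_obj` type with its plain integer reference count, and the `toy_add_int` function are all hypothetical and only model the rules stated above. Passing `NULL` as `storage` simulates an allocation failure.

```C
#include <assert.h>
#include <stddef.h>

/* Assumed expansions for this sketch; the generator decides the real form. */
#define DEOPT_IF(cond) do { if (cond) goto deopt; } while (0)
#define ERROR_IF(cond) do { if (cond) goto error; } while (0)

enum { TOY_TAG_INT, TOY_TAG_STR };
enum toy_outcome { TOY_OK, TOY_DEOPT, TOY_ERROR };

/* Hypothetical object with a plain integer reference count. */
typedef struct { int refcnt; int tag; int value; } toy_obj;

static enum toy_outcome
toy_add_int(toy_obj *left, toy_obj *right, toy_obj *storage, toy_obj **res)
{
    assert(left != NULL && right != NULL && res != NULL);
    /* Guards come first: nothing has been created or refcounted yet,
       so deoptimizing leaves the inputs untouched. */
    DEOPT_IF(left->tag != TOY_TAG_INT);
    DEOPT_IF(right->tag != TOY_TAG_INT);
    /* Attempt to create the result. */
    toy_obj *result = storage;
    if (result != NULL) {
        result->refcnt = 1;
        result->tag = TOY_TAG_INT;
        result->value = left->value + right->value;
    }
    /* Release all inputs before the first ERROR_IF. */
    left->refcnt--;
    right->refcnt--;
    ERROR_IF(result == NULL);
    *res = result;
    return TOY_OK;
deopt:
    return TOY_DEOPT;
error:
    return TOY_ERROR;
}
```

On the deopt path the reference counts of both inputs are unchanged, while on the error path both inputs have already been released, which matches the stack handling described for each case.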
### Semantics
The underlying execution model is a stack machine.
Operations pop values from the stack, and push values to the stack.
They also can look at, and consume, values from the instruction stream.
All members of a family
(which represents a specializable instruction and its specializations)
must have the same stack and instruction stream effect.
The same is true for all members of a pseudo instruction
(which is mapped by the bytecode compiler to one of its members).
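To make the execution model concrete, here is a minimal sketch of a stack machine whose operations pop and push values and, in one case, consume a code unit from the instruction stream. The opcodes (`TOY_PUSH`, `TOY_ADD`, `TOY_ADD_BIAS`, `TOY_HALT`), the code unit layout and the `toy_run` function are invented for illustration and do not correspond to real CPython instructions.

```C
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy opcodes; none of these exist in CPython. */
enum { TOY_PUSH, TOY_ADD, TOY_ADD_BIAS, TOY_HALT };

/* Pack an opcode (low byte) and an oparg (high byte) into one code unit. */
#define TOY_UNIT(op, arg) ((uint16_t)((op) | ((arg) << 8)))

#define TOY_STACK_SIZE 16

/* Run a toy program and return the value on top of the stack at TOY_HALT. */
static int
toy_run(const uint16_t *code)
{
    int stack[TOY_STACK_SIZE];
    int *stack_pointer = stack;
    const uint16_t *next_instr = code;
    for (;;) {
        uint16_t unit = *next_instr++;
        int opcode = unit & 0xFF;
        int oparg = unit >> 8;
        switch (opcode) {
        case TOY_PUSH:  /* ( -- value ) */
            assert(stack_pointer < stack + TOY_STACK_SIZE);
            *stack_pointer++ = oparg;
            break;
        case TOY_ADD: {  /* ( left, right -- res ) */
            assert(stack_pointer - stack >= 2);
            int right = stack_pointer[-1];
            int left = stack_pointer[-2];
            stack_pointer -= 1;
            stack_pointer[-1] = left + right;
            break;
        }
        case TOY_ADD_BIAS: {  /* ( bias/1, value -- res ) */
            assert(stack_pointer - stack >= 1);
            int bias = *next_instr;  /* look at the instruction stream */
            next_instr += 1;         /* consume one code unit */
            stack_pointer[-1] += bias;
            break;
        }
        case TOY_HALT:
            assert(stack_pointer - stack >= 1);
            return stack_pointer[-1];
        default:
            return -1;  /* unknown opcode */
        }
    }
}
```

For instance, a program that pushes 2 and 3, executes `TOY_ADD`, then `TOY_ADD_BIAS` with a cache entry of 10, and finally `TOY_HALT`, leaves 15 on top of the stack.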
## Examples
(Another source of examples can be found in the
[tests](https://github.com/python/cpython/blob/main/Lib/test/test_generated_cases.py).)
Some examples:
### Output stack effect
```C
inst ( LOAD_FAST, (-- value) ) {
    value = frame->f_localsplus[oparg];
    Py_INCREF(value);
}
```
This would generate:
```C
TARGET(LOAD_FAST) {
    PyObject *value;
    value = frame->f_localsplus[oparg];
    Py_INCREF(value);
    PUSH(value);
    DISPATCH();
}
```
### Input stack effect
```C
inst ( STORE_FAST, (value --) ) {
    SETLOCAL(oparg, value);
}
```
This would generate:
```C
TARGET(STORE_FAST) {
    PyObject *value = PEEK(1);
    SETLOCAL(oparg, value);
    STACK_SHRINK(1);
    DISPATCH();
}
```
### Input stack effect and cache effect
```C
op ( CHECK_OBJECT_TYPE, (type_version/2, owner -- owner) ) {
    PyTypeObject *tp = Py_TYPE(owner);
    assert(type_version != 0);
    DEOPT_IF(tp->tp_version_tag != type_version);
}
```
This might become (if it were an instruction):
```C
TARGET(CHECK_OBJECT_TYPE) {
    PyObject *owner = PEEK(1);
    uint32_t type_version = read32(next_instr);
    PyTypeObject *tp = Py_TYPE(owner);
    assert(type_version != 0);
    DEOPT_IF(tp->tp_version_tag != type_version);
    next_instr += 2;
    DISPATCH();
}
```
### More examples
For explanations see "Generating the interpreter" below.
```C
op ( CHECK_HAS_INSTANCE_VALUES, (owner -- owner) ) {
    PyDictOrValues dorv = *_PyObject_DictOrValuesPointer(owner);
    DEOPT_IF(!_PyDictOrValues_IsValues(dorv));
}
```
```C
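op ( LOAD_INSTANCE_VALUE, (index/1, owner -- null if (oparg & 1), res) ) {
    /* dorv is local to each op, so it has to be fetched again here. */
    PyDictOrValues dorv = *_PyObject_DictOrValuesPointer(owner);
    res = _PyDictOrValues_GetValues(dorv)->values[index];
    DEOPT_IF(res == NULL);
    Py_INCREF(res);
    null = NULL;
    Py_DECREF(owner);
}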
```
```C
macro ( LOAD_ATTR_INSTANCE_VALUE ) =
    counter/1 + CHECK_OBJECT_TYPE + CHECK_HAS_INSTANCE_VALUES +
    LOAD_INSTANCE_VALUE + unused/4 ;
```
```C
op ( LOAD_SLOT, (index/1, owner -- null if (oparg & 1), res) ) {
    char *addr = (char *)owner + index;
    res = *(PyObject **)addr;
    DEOPT_IF(res == NULL);
    Py_INCREF(res);
    null = NULL;
    Py_DECREF(owner);
}
```
```C
macro ( LOAD_ATTR_SLOT ) = counter/1 + CHECK_OBJECT_TYPE + LOAD_SLOT + unused/4;
```
```C
inst ( BUILD_TUPLE, (items[oparg] -- tuple) ) {
    tuple = _PyTuple_FromArraySteal(items, oparg);
    ERROR_IF(tuple == NULL);
}
```
```C
inst ( PRINT_EXPR ) {
    PyObject *value = POP();
    PyObject *hook = _PySys_GetAttr(tstate, &_Py_ID(displayhook));
    PyObject *res;
    if (hook == NULL) {
        _PyErr_SetString(tstate, PyExc_RuntimeError,
                         "lost sys.displayhook");
        Py_DECREF(value);
        goto error;
    }
    res = PyObject_CallOneArg(hook, value);
    Py_DECREF(value);
    ERROR_IF(res == NULL);
    Py_DECREF(res);
}
```
### Defining an instruction family
A _family_ maps a specializable instruction to its specializations.
Example: These opcodes all share the same instruction format:
```C
family(load_attr) = { LOAD_ATTR, LOAD_ATTR_INSTANCE_VALUE, LOAD_ATTR_SLOT };
```
### Defining a pseudo instruction
A _pseudo instruction_ is used by the bytecode compiler to represent a set of possible concrete instructions.
Example: `JUMP` may expand to `JUMP_FORWARD` or `JUMP_BACKWARD`:
```C
pseudo(JUMP, (--)) = { JUMP_FORWARD, JUMP_BACKWARD };
```
## Generating the interpreter
The generated C code for a single instruction includes a preamble and dispatch at the end
which can be easily inserted. What is more complex is ensuring the correct stack effects
and not generating excess pops and pushes.
For example, in `CHECK_HAS_INSTANCE_VALUES`, `owner` occurs in the input, so it cannot be
redefined. Thus, it doesn't need to be written and can be read without adjusting the stack pointer.
The C code generated for `CHECK_HAS_INSTANCE_VALUES` would look something like:
```C
{
    PyObject *owner = stack_pointer[-1];
    PyDictOrValues dorv = *_PyObject_DictOrValuesPointer(owner);
    DEOPT_IF(!_PyDictOrValues_IsValues(dorv));
}
```
When combining ops to form instructions, temporary values should be used,
rather than popping and pushing, such that `LOAD_ATTR_SLOT` would look something like:
```C
case LOAD_ATTR_SLOT: {
    PyObject *s1 = stack_pointer[-1];
    /* CHECK_OBJECT_TYPE */
    {
        PyObject *owner = s1;
        uint32_t type_version = read32(next_instr + 1);
        PyTypeObject *tp = Py_TYPE(owner);
        assert(type_version != 0);
        if (tp->tp_version_tag != type_version) goto deopt;
    }
    /* LOAD_SLOT */
    {
        PyObject *owner = s1;
        uint16_t index = *(next_instr + 1 + 2);
        char *addr = (char *)owner + index;
        PyObject *null;
        PyObject *res = *(PyObject **)addr;
        if (res == NULL) goto deopt;
        Py_INCREF(res);
        null = NULL;
        Py_DECREF(owner);
        if (oparg & 1) {
            /* null replaces owner; res is written above it below. */
            stack_pointer[-1] = null;
            stack_pointer += 1;
        }
        s1 = res;
    }
    next_instr += (1 + 1 + 2 + 1 + 4);
    stack_pointer[-1] = s1;
    DISPATCH();
}
```
## Other tools
From the instruction definitions we can generate the stack marking code used in `frame.setlineno`,
and the tables for use by disassemblers.
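As a sketch of what such generated metadata could look like, the functions below encode the stack and cache effects of a few of the example instructions from this document. The function names, the `toy_opcode` numbering and the table layout are assumptions made for illustration; they are not the output of the actual generator. The instruction size follows the `1 + INLINE_CACHE_ENTRIES` rule given for `INSTRUCTION_SIZE`.

```C
#include <assert.h>

/* Hypothetical opcode numbering for this sketch only. */
enum toy_opcode {
    TOY_LOAD_FAST, TOY_STORE_FAST, TOY_BUILD_TUPLE, TOY_LOAD_ATTR_SLOT
};

/* Number of values popped, derived from the inputs before "--". */
static int
toy_num_popped(int opcode, int oparg)
{
    switch (opcode) {
    case TOY_LOAD_FAST:      return 0;      /* (-- value) */
    case TOY_STORE_FAST:     return 1;      /* (value --) */
    case TOY_BUILD_TUPLE:    return oparg;  /* (items[oparg] -- tuple) */
    case TOY_LOAD_ATTR_SLOT: return 1;      /* owner */
    default:                 return -1;
    }
}

/* Number of values pushed, derived from the outputs after "--". */
static int
toy_num_pushed(int opcode, int oparg)
{
    switch (opcode) {
    case TOY_LOAD_FAST:      return 1;
    case TOY_STORE_FAST:     return 0;
    case TOY_BUILD_TUPLE:    return 1;
    case TOY_LOAD_ATTR_SLOT: return ((oparg & 1) ? 1 : 0) + 1;  /* null if (oparg & 1), res */
    default:                 return -1;
    }
}

/* Code units of inline cache following the instruction. */
static int
toy_cache_entries(int opcode)
{
    switch (opcode) {
    case TOY_LOAD_FAST:
    case TOY_STORE_FAST:
    case TOY_BUILD_TUPLE:    return 0;
    case TOY_LOAD_ATTR_SLOT: return 1 + 2 + 1 + 4;  /* counter/1, type_version/2, index/1, unused/4 */
    default:                 return -1;
    }
}

/* Size in code units: the instruction itself plus its cache entries. */
static int
toy_instruction_size(int opcode)
{
    int cache = toy_cache_entries(opcode);
    assert(cache >= 0);
    return 1 + cache;
}

/* Net change in stack depth caused by one instruction. */
static int
toy_net_stack_effect(int opcode, int oparg)
{
    return toy_num_pushed(opcode, oparg) - toy_num_popped(opcode, oparg);
}
```

A disassembler could use `toy_instruction_size` to skip over cache entries, and a stack depth checker could sum `toy_net_stack_effect` along a path through the bytecode.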