File: generators.rst

package info (click to toggle)
numba 0.61.2%2Bdfsg-2
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 17,316 kB
  • sloc: python: 211,580; ansic: 15,233; cpp: 6,544; javascript: 424; sh: 322; makefile: 173
file content (307 lines) | stat: -rw-r--r-- 11,324 bytes parent folder | download | duplicates (5)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307

.. _arch-generators:

===================
Notes on generators
===================

Numba recently gained support for compiling generator functions.  This
document explains some of the implementation choices.


Terminology
===========

For clarity, we distinguish between *generator functions* and
*generators*.  A generator function is a function containing one or
several ``yield`` statements.  A generator (sometimes also called "generator
iterator") is the return value of a generator function; it resumes
execution inside its frame each time :py:func:`next` is called.

A *yield point* is the place where a ``yield`` statement is called.
A *resumption point* is the place just after a *yield point* where execution
is resumed when :py:func:`next` is called again.


Function analysis
=================

Suppose we have the following simple generator function::

   def gen(x, y):
       yield x + y
       yield x - y

Here is its CPython bytecode, as printed out using :py:func:`dis.dis`::

  7           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                1 (y)
              6 BINARY_ADD
              7 YIELD_VALUE
              8 POP_TOP

  8           9 LOAD_FAST                0 (x)
             12 LOAD_FAST                1 (y)
             15 BINARY_SUBTRACT
             16 YIELD_VALUE
             17 POP_TOP
             18 LOAD_CONST               0 (None)
             21 RETURN_VALUE

When compiling this function with :envvar:`NUMBA_DUMP_IR` set to 1, the
following information is printed out::

   ----------------------------------IR DUMP: gen----------------------------------
   label 0:
       x = arg(0, name=x)                       ['x']
       y = arg(1, name=y)                       ['y']
       $0.3 = x + y                             ['$0.3', 'x', 'y']
       $0.4 = yield $0.3                        ['$0.3', '$0.4']
       del $0.4                                 []
       del $0.3                                 []
       $0.7 = x - y                             ['$0.7', 'x', 'y']
       del y                                    []
       del x                                    []
       $0.8 = yield $0.7                        ['$0.7', '$0.8']
       del $0.8                                 []
       del $0.7                                 []
       $const0.9 = const(NoneType, None)        ['$const0.9']
       $0.10 = cast(value=$const0.9)            ['$0.10', '$const0.9']
       del $const0.9                            []
       return $0.10                             ['$0.10']
   ------------------------------GENERATOR INFO: gen-------------------------------
   generator state variables: ['$0.3', '$0.7', 'x', 'y']
   yield point #1: live variables = ['x', 'y'], weak live variables = ['$0.3']
   yield point #2: live variables = [], weak live variables = ['$0.7']


What does it mean? The first part is the Numba IR, as already seen in
:ref:`arch_generate_numba_ir`.  We can see the two yield points (``yield $0.3``
and ``yield $0.7``).

The second part shows generator-specific information.  To understand it
we have to understand what suspending and resuming a generator means.

When suspending a generator, we are not merely returning a value to the
caller (the operand of the ``yield`` statement).  We also have to save the
generator's *current state* in order to resume execution.  In trivial use
cases, perhaps the CPU's register values or stack slots would be preserved
until the next call to next().  However, any non-trivial case will hopelessly
clobber those values, so we have to save them in a well-defined place.

What are the values we need to save?  Well, in the context of the Numba
Intermediate Representation, we must save all *live variables* at each
yield point.  These live variables are computed thanks to the control
flow graph.

Once live variables are saved and the generator is suspended, resuming
the generator simply involves the inverse operation: the live variables
are restored from the saved generator state.

.. note::
   It is the same analysis which helps insert Numba ``del`` instructions
   where appropriate.

Let's go over the generator info again::

   generator state variables: ['$0.3', '$0.7', 'x', 'y']
   yield point #1: live variables = ['x', 'y'], weak live variables = ['$0.3']
   yield point #2: live variables = [], weak live variables = ['$0.7']

Numba has computed the union of all live variables (denoted as "state
variables").  This will help define the layout of the :ref:`generator
structure <generator-structure>`.  Also, for each yield point, we have
computed two sets of variables:

* the *live variables* are the variables which are used by code following
  the resumption point (i.e. after the ``yield`` statement)

* the *weak live variables* are variables which are del'ed immediately
  after the resumption point; they have to be saved in :term:`object mode`,
  to ensure proper reference cleanup


.. _generator-structure:

The generator structure
=======================

Layout
------

Function analysis helps us gather enough information to define the
layout of the generator structure, which will store the entire execution
state of a generator.  Here is a sketch of the generator structure's layout,
in pseudo-code::

   struct gen_struct_t {
      int32_t resume_index;
      struct gen_args_t {
         arg_0_t arg0;
         arg_1_t arg1;
         ...
         arg_N_t argN;
      }
      struct gen_state_t {
         state_0_t state_var0;
         state_1_t state_var1;
         ...
         state_N_t state_varN;
      }
   }

Let's describe those fields in order.

* The first member, the *resume index*, is an integer telling the generator
  at which resumption point execution must resume.  By convention, it can
  have two special values: 0 means execution must start at the beginning of
  the generator (i.e. the first time :py:func:`next` is called); -1 means
  the generator is exhausted and resumption must immediately raise
  StopIteration.  Other values indicate the yield point's index starting from 1
  (corresponding to the indices shown in the generator info above).

* The second member, the *arguments structure* is read-only after it is first
  initialized.  It stores the values of the arguments the generator function
  was called with.  In our example, these are the values of ``x`` and ``y``.

* The third member, the *state structure*, stores the live variables as
  computed above.

Concretely, our example's generator structure (assuming the generator
function is called with floating-point numbers) is then::

   struct gen_struct_t {
      int32_t resume_index;
      struct gen_args_t {
         double arg0;
         double arg1;
      }
      struct gen_state_t {
         double $0.3;
         double $0.7;
         double x;
         double y;
      }
   }

Note that here, saving ``x`` and ``y`` is redundant: Numba isn't able to
recognize that the state variables ``x`` and ``y`` have the same value
as ``arg0`` and ``arg1``.

Allocation
----------

How does Numba ensure the generator structure is preserved long enough?
There are two cases:

* When a Numba-compiled generator function is called from a Numba-compiled
  function, the structure is allocated on the stack by the callee.  In this
  case, generator instantiation is practically costless.

* When a Numba-compiled generator function is called from regular Python
  code, a CPython-compatible wrapper is instantiated that has the right
  amount of allocated space to store the structure, and whose
  :c:member:`~PyTypeObject.tp_iternext` slot is a wrapper around the
  generator's native code.


Compiling to native code
========================

When compiling a generator function, three native functions are actually
generated by Numba:

* An initialization function.  This is the function corresponding
  to the generator function itself: it receives the function arguments and
  stores them inside the generator structure (which is passed by pointer).
  It also initialized the *resume index* to 0, indicating that the generator
  hasn't started yet.

* A next() function.  This is the function called to resume execution
  inside the generator.  Its single argument is a pointer to the generator
  structure and it returns the next yielded value (or a special exit code
  is used if the generator is exhausted, for quick checking when called
  from Numba-compiled functions).

* An optional finalizer.  In object mode, this function ensures that all
  live variables stored in the generator state are decref'ed, even if the
  generator is destroyed without having been exhausted.

The next() function
-------------------

The next() function is the least straight-forward of the three native
functions.  It starts with a trampoline which dispatches execution to the
right resume point depending on the *resume index* stored in the generator
structure.  Here is how the function start may look like in our example:

.. code-block:: llvm

   define i32 @"__main__.gen.next"(
      double* nocapture %retptr,
      { i8*, i32 }** nocapture readnone %excinfo,
      i8* nocapture readnone %env,
      { i32, { double, double }, { double, double, double, double } }* nocapture %arg.gen)
   {
     entry:
        %gen.resume_index = getelementptr { i32, { double, double }, { double, double, double, double } }* %arg.gen, i64 0, i32 0
        %.47 = load i32* %gen.resume_index, align 4
        switch i32 %.47, label %stop_iteration [
          i32 0, label %B0
          i32 1, label %generator_resume1
          i32 2, label %generator_resume2
        ]

     ; rest of the function snipped

(uninteresting stuff trimmed from the LLVM IR to make it more readable)

We recognize the pointer to the generator structure in ``%arg.gen``.
The trampoline switch has three targets (one for each *resume index* 0, 1
and 2), and a fallback target label named ``stop_iteration``.  Label ``B0``
represents the function's start, ``generator_resume1`` (resp.
``generator_resume2``) is the resumption point after the first
(resp. second) yield point.

After generation by LLVM, the whole native assembly code for this function
may look like this (on x86-64):

.. code-block:: asm

           .globl  __main__.gen.next
           .align  16, 0x90
   __main__.gen.next:
           movl    (%rcx), %eax
           cmpl    $2, %eax
           je      .LBB1_5
           cmpl    $1, %eax
           jne     .LBB1_2
           movsd   40(%rcx), %xmm0
           subsd   48(%rcx), %xmm0
           movl    $2, (%rcx)
           movsd   %xmm0, (%rdi)
           xorl    %eax, %eax
           retq
   .LBB1_5:
           movl    $-1, (%rcx)
           jmp     .LBB1_6
   .LBB1_2:
           testl   %eax, %eax
           jne     .LBB1_6
           movsd   8(%rcx), %xmm0
           movsd   16(%rcx), %xmm1
           movaps  %xmm0, %xmm2
           addsd   %xmm1, %xmm2
           movsd   %xmm1, 48(%rcx)
           movsd   %xmm0, 40(%rcx)
           movl    $1, (%rcx)
           movsd   %xmm2, (%rdi)
           xorl    %eax, %eax
           retq
   .LBB1_6:
           movl    $-3, %eax
           retq

Note the function returns 0 to indicate a value is yielded, -3 to indicate
StopIteration. ``%rcx`` points to the start of the generator structure,
where the resume index is stored.