File: 09-performance-overhead-of-using-decorators.md

Performance overhead of using decorators
========================================

This is the ninth post in my series of blog posts about Python decorators
and how I believe they are generally poorly implemented. It follows on from
the previous post titled [The @synchronized decorator as context
manager](08-the-synchronized-decorator-as-context-manager.md), with the
very first post in the series being [How you implemented your Python
decorator is
wrong](01-how-you-implemented-your-python-decorator-is-wrong.md).

The posts so far in this series were bashed out in quick succession in a
bit over a week. Because that was quite draining on the brain, and due to
other commitments, I took a bit of a break. Hopefully I can now get through
another burst of posts, initially about performance considerations when
implementing decorators, and then start a dive into how to implement the
object proxy which underlies the function wrapper that the decorator
mechanism described so far relies on.

Overhead in decorating a normal function
----------------------------------------

In this post I am only going to look at the overhead of decorating a normal
function with the decorator mechanism which has been described. The
relevant part of the decorator mechanism which comes into play in this case
is:

```python
import inspect

class function_wrapper(object_proxy):

    def __init__(self, wrapped, wrapper):
        super(function_wrapper, self).__init__(wrapped)
        self.wrapper = wrapper
        ...

    def __get__(self, instance, owner):
        ...

    def __call__(self, *args, **kwargs):
        return self.wrapper(self.wrapped, None, args, kwargs)

def decorator(wrapper):
    def _wrapper(wrapped, instance, args, kwargs):
        def _execute(wrapped):
            if instance is None:
                return function_wrapper(wrapped, wrapper)
            elif inspect.isclass(instance):
                return function_wrapper(wrapped, wrapper.__get__(None, instance))
            else:
                return function_wrapper(wrapped, wrapper.__get__(instance, type(instance)))
        return _execute(*args, **kwargs)
    return function_wrapper(wrapper, _wrapper)
```

If you want to refresh your memory of the complete code that was previously
presented you can check back to the last post where it was described in
full.
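
For context, the `object_proxy` base class which `function_wrapper` derives
from can be thought of along the lines of the following minimal sketch. This
is only enough to make the excerpt above self contained; the fuller version
presented earlier in the series, and the one in the `wrapt` module, copies
across and proxies many more attributes and special methods.

```python
class object_proxy(object):

    def __init__(self, wrapped):
        self.wrapped = wrapped
        try:
            # Mirror the name of the wrapped object so introspection
            # on the wrapper gives a sensible answer.
            self.__name__ = wrapped.__name__
        except AttributeError:
            pass

    @property
    def __class__(self):
        return self.wrapped.__class__

    def __getattr__(self, name):
        # Any attribute not found on the proxy itself is looked up
        # on the wrapped object.
        return getattr(self.wrapped, name)
```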

With our decorator factory, creating a decorator and then applying it to a
normal function looks like:

```python
@decorator
def my_function_wrapper(wrapped, instance, args, kwargs):
    return wrapped(*args, **kwargs)

@my_function_wrapper
def function():
    pass
```

This is in contrast to the same decorator created in the more traditional
way using a function closure.

```python
def my_function_wrapper(wrapped):
    def _my_function_wrapper(*args, **kwargs):
        return wrapped(*args, **kwargs)
    return _my_function_wrapper

@my_function_wrapper
def function():
    pass
```

Now what actually occurs in these two different cases when we make the call:

```python
function()
```

Tracing the execution of the function
-------------------------------------

In order to trace the execution of our code we can use Python's profile
hooks mechanism.

```python
import sys
def tracer(frame, event, arg):
    print(frame.f_code.co_name, event)

sys.setprofile(tracer)

function()
```

The purpose of the profile hook is to allow you to register a callback
function which is called on the entry and exit of all functions. Using this
we can trace the sequence of function calls that are being made.
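
Putting that together for the closure case, a self contained version of the
tracing harness might look as follows. This is a sketch only: the decorator
and decorated function from above are repeated so the snippet can be run on
its own.

```python
import sys

def my_function_wrapper(wrapped):
    def _my_function_wrapper(*args, **kwargs):
        return wrapped(*args, **kwargs)
    return _my_function_wrapper

@my_function_wrapper
def function():
    pass

def tracer(frame, event, arg):
    # Only report Python level call/return events, ignoring the
    # 'c_call' and related events for builtin functions.
    if event in ('call', 'return'):
        print(frame.f_code.co_name, event)

sys.setprofile(tracer)

function()

sys.setprofile(None)
```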

For the case of a decorator implemented as a function closure this yields:

```
_my_function_wrapper call
    function call
    function return
_my_function_wrapper return
```

So what we see here is that the nested function of our function closure is
called. This is because the decorator, in the case of using a function
closure, is replacing `function` with a reference to that nested function.
When that nested function is called, it then in turn calls the original
wrapped function.

For our implementation using our decorator factory, when we do the same
thing we instead get:

```
__call__ call
    my_function_wrapper call
        function call
        function return
    my_function_wrapper return
__call__ return
```

The difference here is that our decorator replaces `function` with an
instance of our function wrapper class. Because it is an instance of a
class, calling it as if it were a function invokes the `__call__()` method
on that instance. The `__call__()` method then invokes the user supplied
wrapper function, which in turn calls the original wrapped function.

The result therefore is that we have introduced an extra level of
indirection, or in other words an extra function call into the execution
path.

Keep in mind though that `__call__()` is actually a method and not just a
normal function. Being a method means there is more work going on behind
the scenes than for a normal function call. In particular, the method
retrieved from the class needs to be bound to the instance of our function
wrapper class before it can be called. This doesn't appear in the trace of
the calls, but it is occurring and will incur additional overhead.
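
To make that hidden binding step concrete, here is a small illustrative
snippet, separate from the decorator implementation itself, using a toy
class in place of the real function wrapper. Calling the instance is
roughly equivalent to looking `__call__()` up on the type and binding it to
the instance before invoking it.

```python
# Illustrative only: a toy class standing in for the function wrapper.
class toy_wrapper(object):
    def __call__(self):
        return 'called'

wrapper = toy_wrapper()

# What 'wrapper()' boils down to, spelt out long hand: look up the
# method on the type, bind it to the instance, then call it.
bound = type(wrapper).__call__.__get__(wrapper, type(wrapper))

print(wrapper())  # called
print(bound())    # called
```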

Timing the execution of the function
------------------------------------

By performing the trace above we know that our solution incurs an
additional method call overhead. How much actual extra overhead is this
resulting in though?

To try and measure the increase in overhead in each solution we can use the
`timeit` module to time the execution of our function call. As a baseline,
we first want to time the call of a function without any decorator applied.

```python
# benchmarks.py
def function():
    pass
```

To time this we use the command:

```sh
$ python -m timeit -s 'import benchmarks' 'benchmarks.function()'
```

The `timeit` module, when used in this way, will perform a suitably large
number of iterations of calling the function, divide the resulting total
time for all the calls by the number of calls, and end up with a time value
for a single call.
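
The same measurement could also be made from within Python using the
`timeit` API directly. The following sketch reports a comparable per call
figure, although unlike the command line tool it only performs a single run
rather than reporting the best of several.

```python
import timeit

number = 10000000

# Total time for 'number' calls of the function.
total = timeit.timeit('benchmarks.function()',
        setup='import benchmarks', number=number)

# Divide by the number of calls to get the per call figure.
print('%0.3f usec per loop' % (1e6 * total / number))
```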

For a 2012 model MacBook Pro this yields:

```
10000000 loops, best of 3: 0.132 usec per loop
```

Next up is to try with a decorator implemented as a function closure. For
this we get:

```
1000000 loops, best of 3: 0.326 usec per loop
```

And finally with our decorator factory:

```
1000000 loops, best of 3: 0.771 usec per loop
```

In this final case, rather than use the exact code as has been presented so
far in this series of blog posts, I have used the `wrapt` module
implementation of what has been described. That implementation works
slightly differently, as it has a few extra capabilities over what has been
described here and its design is also a little bit different. The overhead
will still be roughly equivalent, and if anything the figure will cast
things as being slightly worse than they would be for the more minimal
implementation.
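
For reference, the decorated cases timed above could be set up along the
following lines. This is a sketch rather than the exact benchmark code
used; it assumes the `wrapt` package is installed and uses illustrative
names for the extra benchmark functions.

```python
# benchmarks.py (extended)
import wrapt

def function():
    pass

# Decorator implemented as a function closure.
def closure_wrapper(wrapped):
    def _closure_wrapper(*args, **kwargs):
        return wrapped(*args, **kwargs)
    return _closure_wrapper

@closure_wrapper
def function_with_closure():
    pass

# Decorator implemented using the wrapt module.
@wrapt.decorator
def wrapt_wrapper(wrapped, instance, args, kwargs):
    return wrapped(*args, **kwargs)

@wrapt_wrapper
def function_with_wrapt():
    pass
```

Each variant is then timed in the same way as the baseline:

```sh
$ python -m timeit -s 'import benchmarks' 'benchmarks.function_with_closure()'
$ python -m timeit -s 'import benchmarks' 'benchmarks.function_with_wrapt()'
```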

Speeding up execution of the wrapper
------------------------------------

At this point there will no doubt be people wanting to point out that this
so called better way of implementing a decorator is too slow to be
practical to use, even if it is more correct as far as properly honouring
things such as the descriptor protocol for method invocation.

Is there therefore anything that can be done to speed up the implementation?

That is of course a stupid question for me to be asking because you should
realise by now that I would find a way. :-)

The path that can be taken at this point is to implement everything that
has been described for the function wrapper and object proxy as a Python C
extension module. For simplicity we can keep the decorator factory itself
implemented as pure Python code, since its execution is not time critical:
it is only invoked once, when the decorator is applied to the function, and
not on every call of the decorated function.

One thing I am definitely not going to do is blog about how to go about
implementing the function wrapper and object proxy as a Python C extension
module. Rest assured though that it works in the same way as the parallel
pure Python implementation. It does though obviously run a lot quicker,
being implemented as C code using the Python C APIs rather than as pure
Python code.
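
One common way of packaging such a hybrid, and roughly what the `wrapt`
module does, is to attempt to import the C extension and fall back to the
pure Python implementation when the extension has not been compiled. The
module names in the sketch below are illustrative rather than the exact
ones used.

```python
# Prefer the C extension when available, otherwise fall back to the
# pure Python implementation of the same classes.
try:
    from ._c_wrappers import object_proxy, function_wrapper
except ImportError:
    from ._py_wrappers import object_proxy, function_wrapper
```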

What then is the result of implementing the function wrapper and object
proxy as a Python C extension module? It is:

```
1000000 loops, best of 3: 0.382 usec per loop
```

So although a lot more effort was required in actually implementing the
function wrapper and object proxy as a Python C extension module, the
effort was well worth it, with the results now being very close to the
implementation of the decorator that used a function closure.

Normal functions vs methods of classes
--------------------------------------

So far we have only considered the case of decorating a normal function. As
expected, due to the introduction of an extra level of indirection, as well
as the function wrapper being implemented as a class, the overhead was
notably more, albeit still of the order of only half a microsecond.

All the same, by implementing our function wrapper and object proxy as C
code, we were able to speed things up to the point where the overhead above
that of a decorator implemented as a function closure was negligible.

What about when we decorate methods of a class, that is, instance methods,
class methods and static methods? For that you will need to wait until the
next blog post in this series on decorators.