=========================
Issues With Pickle Module
=========================

This article describes various limitations on what data can be stored using
the "pickle" module from a WSGI application script file. These limitations
arise because a WSGI application script file is not treated exactly the
same as a standard Python module.

Note that these limitations only apply to the WSGI application script file
which is the target of the WSGIScriptAlias, AddHandler or Action
directives. Any standard Python modules or packages which make up an
application and which are being imported from directories located in
``sys.path`` using the 'import' statement are not affected.

Packing And Script Reloading
----------------------------

The first source of problems and limitations is how the operation of the
"pickle" serialisation routine is affected by the ability of mod_wsgi to
automatically reload WSGI application script files. The particular types of
data which are known to be affected are function objects and class objects.

To illustrate the problems and where they arise, consider the following
output from an interactive Python session::

    >>> import pickle
    >>> def a(): pass
    ... 
    >>> pickle.dumps(a)
    'c__main__\na\np0\n.'
    >>> z = a
    >>> pickle.dumps(z)
    'c__main__\na\np0\n.'

As can be seen, it is possible to pickle a function object. This works even
through a second reference to the function object, although in that case
the pickled data still refers to the function by its original name.

If the original name is then deleted, however, and the copy of the function
object is pickled, a failure will occur::

    >>> del a
    >>> pickle.dumps(z)
    Traceback (most recent call last):
    ... <deleted>
    pickle.PicklingError: Can't pickle <function a at 0x612b0>: it's not found as __main__.a

The exception has been raised because the original function object was
deleted from where it was created. It occurs because the copy of the
original function object is still internally identified by the name which
it was assigned at the point of creation. The "pickle" serialisation
routine will check that the original object as identified by the name still
exists. If it doesn't exist, it will refuse to serialise the object.

Creating a new function object in place of the original function object
does not eliminate the problem, although it does result in a different sort
of exception::

    >>> def a(): pass
    ... 
    >>> pickle.dumps(z)
    Traceback (most recent call last):
    ... <deleted>
    pickle.PicklingError: Can't pickle <function a at 0x612b0>: it's not the same object as __main__.a

In this case, the "pickle" serialisation routine recognises that "a" exists
but realises that it is a different function object from the one from which
the "z" copy was originally made.

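The same checks can be reproduced on Python 3. The following is a minimal
sketch rather than anything taken from mod_wsgi: the module name
``pickle_demo_mod`` is hypothetical, and a stand-in module is registered in
``sys.modules`` so that the result does not depend on how the snippet is run.

```python
import pickle
import sys
import types

# Hypothetical stand-in module, registered so pickle can find it by name.
mod = types.ModuleType("pickle_demo_mod")
sys.modules[mod.__name__] = mod
exec("def a(): pass", mod.__dict__)

z = mod.a                       # second reference to the same function object
by_name = pickle.dumps(z)       # stored as a reference to pickle_demo_mod.a
round_trip_ok = pickle.loads(by_name) is mod.a

del mod.a                       # the original name no longer exists
try:
    pickle.dumps(z)
    missing_error = None
except pickle.PicklingError as exc:
    missing_error = exc         # lookup of the name failed

exec("def a(): pass", mod.__dict__)   # a new object now lives under the old name
try:
    pickle.dumps(z)
    replaced_error = None
except pickle.PicklingError as exc:
    replaced_error = exc        # name exists but refers to a different object
```

Both failures surface as ``pickle.PicklingError``. The exact wording of the
messages varies between Python versions, so it is not reproduced here.
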
Where the problems start occurring with mod_wsgi is when the function
object being saved is a reference, held outside of the module in which the
function was defined, to a function object from that module. If the module
holding the original function object is the WSGI application script file
and it has been reloaded by the automatic script reloading mechanism, an
attempt to pickle the object will fail. This is because the original
function object which had been copied from will have been replaced by a new
one when the script was reloaded.
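
The reload scenario can be imitated without mod_wsgi. In the sketch below,
``load_script``, ``SCRIPT_NAME`` and ``SCRIPT_SOURCE`` are hypothetical names
invented for illustration. The function only mimics the effect of a reload by
executing the same source in a fresh module object; it is not how mod_wsgi
itself loads scripts.

```python
import pickle
import sys
import types

SCRIPT_NAME = "demo_wsgi_script"    # hypothetical module name
SCRIPT_SOURCE = "def handler(environ):\n    return 'ok'\n"


def load_script():
    """Run the script source in a fresh module, replacing any earlier copy."""
    module = types.ModuleType(SCRIPT_NAME)
    exec(SCRIPT_SOURCE, module.__dict__)
    sys.modules[SCRIPT_NAME] = module
    return module


first = load_script()
saved_handler = first.handler       # reference held outside the script module
pickle.dumps(saved_handler)         # works: the name still maps to this object

second = load_script()              # simulated reload creates new objects
try:
    pickle.dumps(saved_handler)
    reload_error = None
except pickle.PicklingError as exc:
    reload_error = exc              # stale object no longer matches the name

# The function object from the reloaded module pickles without trouble.
fresh_ok = pickle.loads(pickle.dumps(second.handler)) is second.handler
```

Any long-lived registry or cache that keeps such references across a reload
would hold stale objects in the same way as ``saved_handler`` does here.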

This sort of problem, although it will not occur for an instance of a
class, will occur for the class object itself::

    >>> class B: pass
    ... 
    >>> b=B()
    >>> pickle.dumps(b)
    '(i__main__\nB\np0\n(dp1\nb.'
    >>> del B
    >>> pickle.dumps(b)
    '(i__main__\nB\np0\n(dp1\nb.'
    >>> class B: pass
    ... 
    >>> pickle.dumps(B)
    'c__main__\nB\np0\n.'
    >>> C = B
    >>> pickle.dumps(C)
    'c__main__\nB\np0\n.'
    >>> del B
    >>> pickle.dumps(C)
    Traceback (most recent call last):
    ... <deleted>
    pickle.PicklingError: Can't pickle <class __main__.B at 0x53ab0>: it's not found as __main__.B

Note though that for the case of a class instance, an appropriate class
object must exist at the same location when the serialised object is being
restored::

    >>> class B: pass
    ... 
    >>> b = B()
    >>> pickle.loads(pickle.dumps(b))
    <__main__.B instance at 0x41e40>
    >>> del B
    >>> pickle.loads(pickle.dumps(b))
    Traceback (most recent call last):
    ... <deleted>
    AttributeError: 'module' object has no attribute 'B'
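
The sessions above were recorded with Python 2 and old-style classes. As a
hedged comparison, the following sketch repeats the instance case on Python 3,
where pickling an instance also stores a reference to its class by name. The
module name ``pickle_demo_classes`` is hypothetical.

```python
import pickle
import sys
import types

# Hypothetical stand-in module holding the class definition.
mod = types.ModuleType("pickle_demo_classes")
sys.modules[mod.__name__] = mod
exec("class B:\n    pass\n", mod.__dict__)

b = mod.B()
data = pickle.dumps(b)              # records pickle_demo_classes.B by name
restored_ok = isinstance(pickle.loads(data), mod.B)

del mod.B                           # the class is no longer reachable by name

try:
    pickle.dumps(b)                 # on Python 3 the class lookup is checked here too
    dump_error = None
except pickle.PicklingError as exc:
    dump_error = exc

try:
    pickle.loads(data)              # restoring needs the class at the same location
    load_error = None
except AttributeError as exc:
    load_error = exc
```

On Python 3, then, an instance is affected at serialisation time as well as
at restore time once its class can no longer be found under its original name.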

Unpacking And Module Names
--------------------------

The second problem derives from how the mod_wsgi script loading mechanism
does not make use of the standard Python module importing mechanism. This
is necessary as the standard Python module importing mechanism requires
every loaded module to have a unique name, with each module residing in
``sys.modules`` under that name. Further, that name must be able to be
used to import the module.

The mod_wsgi script loading mechanism does not place modules in
``sys.modules`` under their original name so as to allow multiple modules
with the same name in different directories and also to avoid having to use
the ".py" extension for script files.

The consequence of modules not residing in ``sys.modules`` under their
original name is that function objects and class objects within such a
module may not be able to be converted back into objects from their
serialised form. This is because "pickle" automatically attempts to import
a module which isn't already loaded, and it will not be able to import the
WSGI application script file in that way.

The problem can be seen in the following output from an interactive Python
session::

    >>> import imp, pickle, sys
    >>> m = imp.new_module("m")
    >>> sys.modules["m"] = m
    >>> exec "class C: pass" in m.__dict__
    >>> c = m.C()
    >>> pickle.dumps(c)
    '(im\nC\np0\n(dp1\nb.'
    >>> pickle.loads(pickle.dumps(c))
    <m.C instance at 0x9a0d0>
    >>> del sys.modules["m"]
    >>> pickle.loads(pickle.dumps(c))
    Traceback (most recent call last):
    ... <deleted>
    ImportError: No module named m
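
A Python 3 rendering of the same experiment is sketched below. The module
name is hypothetical and is chosen so that it cannot be found on
``sys.path``, which stands in for a script file that the normal import
mechanism is unable to load.

```python
import pickle
import sys
import types

MODULE_NAME = "wsgi_script_not_on_path"   # hypothetical; must not be importable

m = types.ModuleType(MODULE_NAME)
sys.modules[MODULE_NAME] = m
exec("class C:\n    pass\n", m.__dict__)

data = pickle.dumps(m.C())                # pickled while the module is registered
loaded_while_registered = type(pickle.loads(data)) is m.C

del sys.modules[MODULE_NAME]              # the name no longer resolves to the module
try:
    pickle.loads(data)                    # pickle now tries a normal import
    import_error = None
except ImportError as exc:
    import_error = exc
```

The pickled data contains only the module name and the class name, so
restoring it depends entirely on that name being importable at load time.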

Summary Of Limitations
----------------------

Although the first problem described above could be avoided by disabling
script reloading, there is no way to work around the second problem
resulting from how mod_wsgi names modules when stored in ``sys.modules``.

In practice, this means that function objects, class objects and instances
of classes which are defined in a WSGI application script file should not
be stored using the "pickle" module.

To ensure that no strange problems are likely to occur, it is suggested
that only basic builtin Python types, i.e. scalars, tuples, lists and
dictionaries, be stored using the "pickle" module from a WSGI application
script file. That is, avoid any type of object which has user defined code
associated with it.
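
One way to follow this advice is to reduce an object to a plain dictionary
before pickling and to rebuild it after loading. The sketch below assumes a
hypothetical ``Settings`` class with hypothetical ``to_state`` and
``from_state`` helpers; none of these names come from mod_wsgi.

```python
import pickle


class Settings:
    """Hypothetical application object defined in the script file."""

    def __init__(self, user, retries):
        self.user = user
        self.retries = retries

    def to_state(self):
        # Reduce the object to builtin types only.
        return {"user": self.user, "retries": self.retries}

    @classmethod
    def from_state(cls, state):
        # Rebuild the object from the plain dictionary.
        return cls(state["user"], state["retries"])


settings = Settings("alice", 3)
payload = pickle.dumps(settings.to_state())

# The payload holds no reference to the class or to the module defining it.
class_name_absent = b"Settings" not in payload

restored = Settings.from_state(pickle.loads(payload))
```

Because only a dictionary of strings and integers is serialised, restoring
the data does not require the script file to be importable under any name.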

Note that this limitation only applies to the WSGI application script file;
it doesn't apply to normal Python modules imported using the Python "import"
statement.