1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
|
[section Pickle support]
[section Introduction]
Pickle is a Python module for object serialization, also known as persistence, marshalling, or flattening.
It is often necessary to save and restore the contents of an object to a file. One approach to this problem is to write a pair of functions that read and write data from a file in a special format. A powerful alternative approach is to use Python's pickle module. Exploiting Python's ability for introspection, the pickle module recursively converts nearly arbitrary Python objects into a stream of bytes that can be written to a file.
The Boost Python Library supports the pickle module through the interface as described in detail in the [@https://docs.python.org/2/library/pickle.html Python Library Reference for pickle]. This interface involves the special methods `__getinitargs__`, `__getstate__` and `__setstate__` as described in the following. Note that `Boost.Python` is also fully compatible with Python's cPickle module.
[endsect]
[section The Pickle Interface]
At the user level, the Boost.Python pickle interface involves three special methods:
[variablelist
[[__getinitargs__][When an instance of a Boost.Python extension class is pickled, the pickler tests if the instance has a `__getinitargs__` method. This method must return a Python `tuple` (it is most convenient to use a [link object_wrappers.boost_python_tuple_hpp.class_tuple `boost::python::tuple`]). When the instance is restored by the unpickler, the contents of this tuple are used as the arguments for the class constructor.
If `__getinitargs__` is not defined, `pickle.load` will call the constructor (`__init__`) without arguments; i.e., the object must be default-constructible.]]
[[__getstate__][When an instance of a `Boost.Python` extension class is pickled, the pickler tests if the instance has a `__getstate__` method. This method should return a Python object representing the state of the instance.]]
[[__setstate__][When an instance of a `Boost.Python` extension class is restored by the unpickler (`pickle.load`), it is first constructed using the result of `__getinitargs__` as arguments (see above). Subsequently the unpickler tests if the new instance has a `__setstate__` method. If so, this method is called with the result of `__getstate__` (a Python object) as the argument.]]
]
The three special methods described above may be `.def()`\ 'ed individually by the user. However, `Boost.Python` provides an easy to use high-level interface via the `boost::python::pickle_suite` class that also enforces consistency: `__getstate__` and `__setstate__` must be defined as pairs. Use of this interface is demonstrated by the following examples.
[endsect]
[section Example]
There are three files in `python/test` that show how to provide pickle support.
[section pickle1.cpp]
The C++ class in this example can be fully restored by passing the appropriate argument to the constructor. Therefore it is sufficient to define the pickle interface method `__getinitargs__`. This is done in the following way:
Definition of the C++ pickle function:
``
struct world_pickle_suite : boost::python::pickle_suite
{
static
boost::python::tuple
getinitargs(world const& w)
{
return boost::python::make_tuple(w.get_country());
}
};
``
Establishing the Python binding:
``
class_<world>("world", args<const std::string&>())
// ...
.def_pickle(world_pickle_suite())
// ...
``
[endsect]
[section pickle2.cpp]
The C++ class in this example contains member data that cannot be restored by any of the constructors. Therefore it is necessary to provide the `__getstate__`/`__setstate__` pair of pickle interface methods:
Definition of the C++ pickle functions:
``
struct world_pickle_suite : boost::python::pickle_suite
{
static
boost::python::tuple
getinitargs(const world& w)
{
// ...
}
static
boost::python::tuple
getstate(const world& w)
{
// ...
}
static
void
setstate(world& w, boost::python::tuple state)
{
// ...
}
};
``
Establishing the Python bindings for the entire suite:
``
class_<world>("world", args<const std::string&>())
// ...
.def_pickle(world_pickle_suite())
// ...
``
For simplicity, the `__dict__` is not included in the result of `__getstate__`. This is not generally recommended, but a valid approach if it is anticipated that the object's `__dict__` will always be empty. Note that the safety guard described below will catch the cases where this assumption is violated.
[endsect]
[section pickle3.cpp]
This example is similar to pickle2.cpp. However, the object's `__dict__` is included in the result of `__getstate__`. This requires a little more code but is unavoidable if the object's `__dict__` is not always empty.
[endsect]
[endsect]
[section Pitfall and Safety Guard]
The pickle protocol described above has an important pitfall that the end user of a Boost.Python extension module might not be aware of:
[*`__getstate__` is defined and the instance's `__dict__` is not empty.]
The author of a `Boost.Python` extension class might provide a `__getstate__` method without considering the possibilities that:
* his class is used in Python as a base class. Most likely the `__dict__` of instances of the derived class needs to be pickled in order to restore the instances correctly.
* the user adds items to the instance's `__dict__` directly. Again, the `__dict__` of the instance then needs to be pickled.
To alert the user to this highly unobvious problem, a safety guard is provided. If `__getstate__` is defined and the instance's `__dict__` is not empty, `Boost.Python` tests if the class has an attribute `__getstate_manages_dict__`. An exception is raised if this attribute is not defined:
``
RuntimeError: Incomplete pickle support (__getstate_manages_dict__ not set)
``
To resolve this problem, it should first be established that the `__getstate__` and `__setstate__` methods manage the instances's `__dict__` correctly. Note that this can be done either at the C++ or the Python level. Finally, the safety guard should intentionally be overridden. E.g. in C++ (from pickle3.cpp):
``
struct world_pickle_suite : boost::python::pickle_suite
{
// ...
static bool getstate_manages_dict() { return true; }
};
``
Alternatively in Python:
``
import your_bpl_module
class your_class(your_bpl_module.your_class):
__getstate_manages_dict__ = 1
def __getstate__(self):
# your code here
def __setstate__(self, state):
# your code here
``
[endsect]
[section Practical Advice]
* In `Boost.Python` extension modules with many extension classes, providing complete pickle support for all classes would be a significant overhead. In general complete pickle support should only be implemented for extension classes that will eventually be pickled.
* Avoid using `__getstate__` if the instance can also be reconstructed by way of `__getinitargs__`. This automatically avoids the pitfall described above.
* If `__getstate__` is required, include the instance's `__dict__` in the Python object that is returned.
[endsect]
[section Light-weight alternative: pickle support implemented in Python]
The pickle4.cpp example demonstrates an alternative technique for implementing pickle support. First we direct Boost.Python via the class_::enable_pickling() member function to define only the basic attributes required for pickling:
``
class_<world>("world", args<const std::string&>())
// ...
.enable_pickling()
// ...
``
This enables the standard Python pickle interface as described in the Python documentation. By "injecting" a `__getinitargs__` method into the definition of the wrapped class we make all instances pickleable:
``
# import the wrapped world class
from pickle4_ext import world
# definition of __getinitargs__
def world_getinitargs(self):
return (self.get_country(),)
# now inject __getinitargs__ (Python is a dynamic language!)
world.__getinitargs__ = world_getinitargs
``
See also the tutorial section on injecting additional methods from Python.
[endsect]
[endsect]
|