Boost.Python Pickle Support
Pickle is a Python module for object serialization, also known
as persistence, marshalling, or flattening.
It is often necessary to save the contents of an object to a file and
restore them later. One approach to this problem is to write a pair of
functions that write data to and read data from a file in a special
format. A powerful alternative approach is to use Python's pickle
module. Exploiting Python's capacity for introspection, the pickle
module recursively converts nearly arbitrary Python objects into a
stream of bytes that can be written to a file.
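As a minimal, plain-Python illustration of that round trip (independent of Boost.Python), the sketch below pickles a small object graph to bytes, writes it to a file, and reads it back. The inventory contents and the file name are arbitrary assumptions.

```python
import os
import pickle
import tempfile

# An ordinary object graph built from nested built-in types.
inventory = {"widgets": [1, 2, 3], "owner": ("Ada", 36), "active": True}

# pickle.dumps flattens the whole graph into a byte string.
data = pickle.dumps(inventory)
print(type(data).__name__)  # bytes

# The same stream can be written to a file and restored later.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "inventory.pkl")
    with open(path, "wb") as f:
        pickle.dump(inventory, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == inventory)  # True
```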
Boost.Python supports the pickle module through the interface
described in detail in the Python Library Reference for pickle.
This interface involves the special methods __getinitargs__,
__getstate__ and __setstate__, as described below. Note that
Boost.Python is also fully compatible with Python's cPickle module.
The Boost.Python Pickle Interface
At the user level, the Boost.Python pickle interface involves three special
methods:
- __getinitargs__
  When an instance of a Boost.Python extension class is pickled, the
  pickler checks whether the instance has a __getinitargs__ method.
  This method must return a Python tuple (it is most convenient to use
  a boost::python::tuple). When the instance is restored by the
  unpickler, the contents of this tuple are used as the arguments for
  the class constructor.
  If __getinitargs__ is not defined, pickle.load will call the
  constructor (__init__) without arguments; i.e., the object must be
  default-constructible.
- __getstate__
  When an instance of a Boost.Python extension class is pickled, the
  pickler checks whether the instance has a __getstate__ method.
  This method should return a Python object representing the state of
  the instance.
- __setstate__
  When an instance of a Boost.Python extension class is restored by the
  unpickler (pickle.load), it is first constructed using the result of
  __getinitargs__ as arguments (see above). Subsequently, the unpickler
  checks whether the new instance has a __setstate__ method. If so, this
  method is called with the result of __getstate__ (a Python object) as
  the argument.
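The order in which these hooks run can be traced with a pure-Python emulation. This is a sketch under assumptions, not Boost.Python's implementation: the hypothetical ProtocolMixin.__reduce__ collects the results of __getinitargs__ and __getstate__, pickle then rebuilds the object by calling the class with those arguments and passing the state to __setstate__, and the Counter class and calls list exist only to record the sequence. The _defines helper is needed because Python 3.11+ gives every object a default __getstate__.

```python
import pickle

calls = []  # records the order in which the protocol hooks run


def _defines(obj, name):
    """True if some class in obj's MRO, other than object, defines `name`."""
    return any(name in vars(k) for k in type(obj).__mro__ if k is not object)


class ProtocolMixin:
    """Hypothetical emulation of how the three hooks are consulted.

    Pickling calls __reduce__, which gathers constructor arguments and,
    if present, the state. Unpickling (done by pickle itself) calls
    type(self)(*initargs), then __setstate__(state) if a state was returned.
    """

    def __reduce__(self):
        initargs = tuple(self.__getinitargs__()) if _defines(self, "__getinitargs__") else ()
        if _defines(self, "__getstate__"):
            return (type(self), initargs, self.__getstate__())
        return (type(self), initargs)


class Counter(ProtocolMixin):
    def __init__(self, name):
        calls.append(("__init__", name))
        self.name = name
        self.count = 0

    def __getinitargs__(self):
        calls.append(("__getinitargs__",))
        return (self.name,)

    def __getstate__(self):
        calls.append(("__getstate__",))
        return self.count

    def __setstate__(self, state):
        calls.append(("__setstate__", state))
        self.count = state


c = Counter("clicks")
c.count = 5
calls.clear()  # ignore the __init__ call made above

restored = pickle.loads(pickle.dumps(c))
print(calls)
# [('__getinitargs__',), ('__getstate__',), ('__init__', 'clicks'), ('__setstate__', 5)]
```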
The three special methods described above may be .def()'ed
individually by the user. However, Boost.Python provides an
easy-to-use high-level interface via the boost::python::pickle_suite
class that also enforces consistency: __getstate__ and __setstate__
must be defined as a pair. Use of this interface is demonstrated by
the following examples.
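Boost.Python enforces the pairing rule in C++. Purely as a Python analogy, the sketch below shows the same idea: a suite object bundles getinitargs, getstate, and setstate, and a registration step maps them onto pickle hooks while rejecting an unpaired getstate or setstate. PickleSuite, install_pickle_suite, Account, and the suites are hypothetical names, not Boost.Python API.

```python
import pickle


class PickleSuite:
    """Hypothetical Python analogue of boost::python::pickle_suite.
    Subclasses override any of these with static methods."""
    getinitargs = None
    getstate = None
    setstate = None


def install_pickle_suite(cls, suite):
    """Hypothetical helper: wire a suite's functions into cls as pickle
    hooks, refusing a getstate without a setstate (or vice versa)."""
    if (suite.getstate is None) != (suite.setstate is None):
        raise TypeError("getstate and setstate must be defined as a pair")

    def __reduce__(self):
        args = tuple(suite.getinitargs(self)) if suite.getinitargs else ()
        if suite.getstate is None:
            return (type(self), args)
        return (type(self), args, suite.getstate(self))

    cls.__reduce__ = __reduce__
    if suite.setstate is not None:
        cls.__setstate__ = lambda self, state: suite.setstate(self, state)
    return cls


class Account:
    def __init__(self, owner):
        self.owner = owner
        self.balance = 0


class AccountSuite(PickleSuite):
    @staticmethod
    def getinitargs(a):
        return (a.owner,)

    @staticmethod
    def getstate(a):
        return (a.balance,)

    @staticmethod
    def setstate(a, state):
        (a.balance,) = state


install_pickle_suite(Account, AccountSuite)
acct = Account("ada")
acct.balance = 42
clone = pickle.loads(pickle.dumps(acct))
print(clone.owner, clone.balance)  # ada 42


class HalfSuite(PickleSuite):  # getstate without setstate: rejected
    @staticmethod
    def getstate(a):
        return (a.balance,)


try:
    install_pickle_suite(Account, HalfSuite)
    message = None
except TypeError as exc:
    message = str(exc)
print(message)  # getstate and setstate must be defined as a pair
```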
Examples
There are three files in
boost/libs/python/test that show how to
provide pickle support.
pickle1.cpp
The C++ class in this example can be fully restored by passing the
appropriate argument to the constructor. Therefore it is sufficient
to define the pickle interface method __getinitargs__. This is done
in the following way:
1. Definition of the C++ pickle function:
struct world_pickle_suite : boost::python::pickle_suite
{
    static
    boost::python::tuple
    getinitargs(world const& w)
    {
        return boost::python::make_tuple(w.get_country());
    }
};
2. Establishing the Python binding:
class_<world>("world", init<const std::string&>())
    // ...
    .def_pickle(world_pickle_suite())
    // ...
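A plain-Python analogue of this first example might look as follows. It is a sketch, not the Boost.Python mechanism: World is a hypothetical stand-in whose __slots__ play the role of C++ member data, and the explicit __reduce__ does by hand what Boost.Python does when it finds __getinitargs__, because Python 3's pickle module does not consult __getinitargs__ on its own.

```python
import pickle


class World:
    """Hypothetical Python stand-in for the C++ `world` class. __slots__
    plays the role of C++ member data, so instances have no __dict__."""

    __slots__ = ("_country",)

    def __init__(self, country):
        self._country = country

    def get_country(self):
        return self._country

    # Counterpart of world_pickle_suite::getinitargs.
    def __getinitargs__(self):
        return (self.get_country(),)

    # Wire __getinitargs__ in by hand: unpickling calls
    # World(*self.__getinitargs__()).
    def __reduce__(self):
        return (World, self.__getinitargs__())


w = World("California")
w2 = pickle.loads(pickle.dumps(w))
print(w2.get_country())  # California
```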
pickle2.cpp
The C++ class in this example contains member data that cannot be
restored by any of the constructors. Therefore it is necessary to
provide the __getstate__/__setstate__ pair of pickle interface
methods:
1. Definition of the C++ pickle functions:
struct world_pickle_suite : boost::python::pickle_suite
{
    static
    boost::python::tuple
    getinitargs(const world& w)
    {
        // ...
    }

    static
    boost::python::tuple
    getstate(const world& w)
    {
        // ...
    }

    static
    void
    setstate(world& w, boost::python::tuple state)
    {
        // ...
    }
};
2. Establishing the Python bindings for the entire suite:
class_<world>("world", init<const std::string&>())
    // ...
    .def_pickle(world_pickle_suite())
    // ...
For simplicity, the __dict__ is not included in the result of
__getstate__. This is not generally recommended, but it is a valid
approach if the object's __dict__ is expected to always be empty.
Note that the safety guard described below will catch the cases where
this assumption is violated.
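The same idea for this second example, again as a hypothetical pure-Python stand-in: the constructor restores the country, while __getstate__/__setstate__ carry a secret number that no constructor accepts. The secret-number member, its accessors, and the one-item state tuple are illustrative assumptions, and __reduce__ emulates the Boost.Python reduce step; none of this is copied from the actual test file.

```python
import pickle


class World:
    """Hypothetical stand-in: the country is set by the constructor, the
    secret number only afterwards. __slots__ models C++ member data, so
    there is no __dict__ to worry about."""

    __slots__ = ("_country", "_secret_number")

    def __init__(self, country):
        self._country = country
        self._secret_number = 0

    def get_country(self):
        return self._country

    def set_secret_number(self, number):
        self._secret_number = number

    def get_secret_number(self):
        return self._secret_number

    def __getinitargs__(self):
        return (self._country,)

    def __getstate__(self):
        # Only what the constructor cannot restore; no __dict__ included.
        return (self._secret_number,)

    def __setstate__(self, state):
        if len(state) != 1:
            raise ValueError("expected 1-item tuple, got %r" % (state,))
        self._secret_number = state[0]

    def __reduce__(self):
        # Emulated reduce step: constructor arguments plus state.
        return (World, self.__getinitargs__(), self.__getstate__())


w = World("Germany")
w.set_secret_number(7)
w2 = pickle.loads(pickle.dumps(w))
print(w2.get_country(), w2.get_secret_number())  # Germany 7
```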
pickle3.cpp
This example is similar to pickle2.cpp. However, the object's __dict__
is included in the result of __getstate__. This requires a little more
code but is unavoidable if the object's __dict__ is not always empty.
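A hypothetical pure-Python sketch of this third variant: member data lives in named slots, user-added attributes land in __dict__, and __getstate__ returns both so that __setstate__ can restore them. The class layout and the (dict, number) state format are assumptions made for illustration.

```python
import pickle


class World:
    """Hypothetical stand-in. The named slots model C++ member data;
    listing "__dict__" in __slots__ gives instances a __dict__, much like
    Boost.Python extension instances have one."""

    __slots__ = ("_country", "_secret_number", "__dict__")

    def __init__(self, country):
        self._country = country
        self._secret_number = 0

    def __getinitargs__(self):
        return (self._country,)

    def __getstate__(self):
        # Return the instance __dict__ together with the member data.
        return (dict(self.__dict__), self._secret_number)

    def __setstate__(self, state):
        if len(state) != 2:
            raise ValueError("expected 2-item tuple, got %r" % (state,))
        instance_dict, secret_number = state
        self.__dict__.update(instance_dict)
        self._secret_number = secret_number

    def __reduce__(self):
        return (World, self.__getinitargs__(), self.__getstate__())


w = World("France")
w._secret_number = 7
w.nickname = "Hexagone"  # a user-added attribute: it lands in w.__dict__
w2 = pickle.loads(pickle.dumps(w))
print(w2._country, w2._secret_number, w2.nickname)  # France 7 Hexagone
```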
Pitfall and Safety Guard
The pickle protocol described above has an important pitfall that the
end user of a Boost.Python extension module might not be aware of. It
arises when __getstate__ is defined and the instance's __dict__ is not
empty.
The author of a Boost.Python extension class might provide a
__getstate__ method without considering the possibilities that:
- the class is used in Python as a base class. Most likely the
  __dict__ of instances of the derived class needs to be pickled in
  order to restore the instances correctly.
- the user adds items to the instance's __dict__ directly. Again, the
  __dict__ of the instance then needs to be pickled.
To alert the user to this easily overlooked problem, a safety guard is
provided. If __getstate__ is defined and the instance's __dict__ is
not empty, Boost.Python checks whether the class has an attribute
__getstate_manages_dict__. An exception is raised if this attribute is
not defined:
RuntimeError: Incomplete pickle support (__getstate_manages_dict__ not set)
To resolve this problem, it should first be established that the
__getstate__ and __setstate__ methods manage the instance's __dict__
correctly. Note that this can be done either at the C++ or at the
Python level. Finally, the safety guard should be overridden
intentionally, e.g. in C++ (from pickle3.cpp):
struct world_pickle_suite : boost::python::pickle_suite
{
    // ...
    static bool getstate_manages_dict() { return true; }
};
Alternatively in Python:
import your_bpl_module

class your_class(your_bpl_module.your_class):

    __getstate_manages_dict__ = 1

    def __getstate__(self):
        # your code here
        ...

    def __setstate__(self, state):
        # your code here
        ...
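The guard and its override can be emulated in plain Python. In the sketch below, the hypothetical GuardedReduce.__reduce__ mirrors the rule stated above: if __getstate__ is defined and __dict__ is non-empty, pickling fails unless __getstate_manages_dict__ is set. Meter and ManagedMeter are illustrative classes, not part of Boost.Python, and the _defines helper exists because Python 3.11+ gives every object a default __getstate__.

```python
import pickle


def _defines(obj, name):
    """True if some class in obj's MRO, other than object, defines `name`."""
    return any(name in vars(k) for k in type(obj).__mro__ if k is not object)


class GuardedReduce:
    """Hypothetical emulation of the safety guard described above."""

    __slots__ = ()

    def __reduce__(self):
        initargs = tuple(self.__getinitargs__()) if _defines(self, "__getinitargs__") else ()
        instance_dict = getattr(self, "__dict__", None) or {}
        if _defines(self, "__getstate__"):
            if instance_dict and getattr(self, "__getstate_manages_dict__", None) is None:
                raise RuntimeError(
                    "Incomplete pickle support (__getstate_manages_dict__ not set)")
            return (type(self), initargs, self.__getstate__())
        if instance_dict:
            return (type(self), initargs, instance_dict)
        return (type(self), initargs)


class Meter(GuardedReduce):
    # "reading" models C++ member data; "__dict__" lets users add attributes.
    __slots__ = ("reading", "__dict__")

    def __init__(self):
        self.reading = 0

    def __getstate__(self):  # ignores __dict__: the pitfall
        return self.reading

    def __setstate__(self, state):
        self.reading = state


class ManagedMeter(Meter):
    __getstate_manages_dict__ = True  # deliberately overrides the guard

    def __getstate__(self):
        return (self.reading, dict(self.__dict__))

    def __setstate__(self, state):
        self.reading, extra = state
        self.__dict__.update(extra)


m = Meter()
m.label = "kitchen"  # user adds an item to __dict__
try:
    pickle.dumps(m)
    message = None
except RuntimeError as exc:
    message = str(exc)
print(message)  # Incomplete pickle support (__getstate_manages_dict__ not set)

mm = ManagedMeter()
mm.reading, mm.label = 5, "garage"
mm2 = pickle.loads(pickle.dumps(mm))
print(mm2.reading, mm2.label)  # 5 garage
```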
Practical Advice
- In Boost.Python extension modules with many extension classes,
  providing complete pickle support for all classes would be a
  significant overhead. In general, complete pickle support should
  only be implemented for extension classes that will eventually be
  pickled.
- Avoid using __getstate__ if the instance can also be reconstructed
  by way of __getinitargs__. This automatically avoids the pitfall
  described above.
- If __getstate__ is required, include the instance's __dict__ in the
  Python object that is returned.
© Copyright Ralf W. Grosse-Kunstleve 2001-2002. Permission to copy,
use, modify, sell and distribute this document is granted provided this
copyright notice appears in all copies. This document is provided "as
is" without express or implied warranty, and with no claim as to its
suitability for any purpose.
Updated: Aug 2002.