Serialization

Special Considerations

Object Tracking
Exporting Class Serialization
Class Information
Archive Portability
Numerics
Traits
Binary Archives
XML Archives
Archive Exceptions
Exception Safety

Object Tracking

Depending on how the class is used and other factors, serialized objects may be tracked by memory address. This prevents the same object from being written to or read from an archive multiple times. These stored addresses can also be used to delete objects created during a loading process that has been interrupted by an exception.

This could cause problems in programs where copies of different objects are saved from the same address.


template<class Archive>
void save(Archive & ar, const unsigned int version) const
{
    for(int i = 0; i < 10; ++i){
        A x = a[i];   // x occupies the same stack address on every iteration
        ar << x;
    }
}

In this case, the data to be saved exists on the stack. Each iteration of the loop updates the value on the stack, so although the data changes each iteration, its address doesn't. If a is an array of objects of a type tracked by memory address, the library will skip storing objects after the first, as it will assume that objects at the same address are really the same object.
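The effect can be reproduced with a toy model that records every saved address in a set. This is not Boost's implementation; toy_tracking_oarchive and the two helper functions are hypothetical. To make the address reuse deterministic, the copy is declared once outside the loop, so every value is written from the same address, just as the reused stack slot is above.

```cpp
#include <cassert>
#include <set>
#include <vector>

struct A { int value; };

// Toy archive (NOT Boost's): an object whose address was already saved is skipped.
class toy_tracking_oarchive {
public:
    // Takes a const reference, like the real output operators.
    toy_tracking_oarchive & operator<<(const A & a) {
        // insert().second is false when this address has been seen before
        if (m_seen.insert(&a).second)
            m_saved.push_back(a.value);
        return *this;
    }
    const std::vector<int> & saved() const { return m_saved; }
private:
    std::set<const void *> m_seen;
    std::vector<int> m_saved;
};

// Problematic pattern: every value passes through one object at one address.
std::vector<int> save_via_stack_copy(const A (&a)[10]) {
    toy_tracking_oarchive ar;
    A x;                      // one object, one address, reused each iteration
    for (int i = 0; i < 10; ++i) {
        x = a[i];
        ar << x;
    }
    return ar.saved();
}

// The fix: each array element is saved from its own address.
std::vector<int> save_in_place(const A (&a)[10]) {
    toy_tracking_oarchive ar;
    for (int i = 0; i < 10; ++i)
        ar << a[i];
    return ar.saved();
}
```

With a holding the values 0 through 9, save_via_stack_copy records only the first value, while save_in_place records all ten.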

To help detect such cases, output archive operators expect to be passed const reference arguments.

Given this, the above code will invoke a compile time assertion. The obvious fix in this example is to use


template<class Archive>
void save(Archive & ar, const unsigned int version) const
{
    for(int i = 0; i < 10; ++i){
        ar << a[i];   // each element is saved from its own address
    }
}

which will compile and run without problem. Because the output archive operators take const references, the process of serialization cannot change the state of the objects being serialized. Changing state during a save would add a non-obvious side effect to what should be a pure saving of state. This would almost surely be a mistake and a likely source of very subtle bugs.

Unfortunately, implementation issues currently prevent the detection of this kind of error when the data item is wrapped as a name-value pair.

A similar problem can occur when different objects are loaded to an address that is different from their final location:


template<class Archive>
void load(Archive & ar, const unsigned int version)
{
    for(int i = 0; i < 10; ++i){
        A x;
        ar >> x;
        m_set.insert(x);   // tracking still refers to the address of x
    }
}

In this case, the address of x is the one that is tracked rather than the address of the new item added to the set. Left unaddressed, this will break features that depend on tracking, such as loading an object through a pointer, and subtle bugs will be introduced into the program. This can be addressed by altering the above code as follows:


template<class Archive>
void load(Archive & ar, const unsigned int version)
{
    for(int i = 0; i < 10; ++i){
        A x;
        ar >> x;
        std::pair<std::set<A>::iterator, bool> result = m_set.insert(x);
        // tell the archive that the object loaded at &x now lives in the set
        ar.reset_object_address(& (*result.first), &x);
    }
}

This will adjust the tracking information to reflect the final resting place of the moved variable and thereby rectify the above problem.
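To see why moving the tracking entry matters, here is a toy model in which the loading side remembers objects by address. It is not Boost's implementation; toy_load_tracker and load_into_set are hypothetical, and only the member name reset_object_address and its (new address, old address) argument order are taken from the code above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>

// Toy bookkeeping (NOT Boost's): address of each loaded object -> object id.
class toy_load_tracker {
public:
    void record(const void * address, int object_id) { m_ids[address] = object_id; }

    // Move the entry from the temporary's address to the final location.
    void reset_object_address(const void * new_address, const void * old_address) {
        std::map<const void *, int>::iterator it = m_ids.find(old_address);
        if (it == m_ids.end()) return;
        int id = it->second;
        m_ids.erase(it);
        m_ids[new_address] = id;
    }

    bool is_tracked(const void * address) const { return m_ids.count(address) != 0; }
private:
    std::map<const void *, int> m_ids;
};

// Load 'count' values through one temporary and move each into 'out'.
// Returns true if every element of 'out' is known to the tracker.
bool load_into_set(int count, bool reset, std::set<int> & out) {
    toy_load_tracker tracker;
    int x;                          // the temporary "load" location
    for (int i = 0; i < count; ++i) {
        x = i;                      // stands in for: ar >> x;
        tracker.record(&x, i);
        std::pair<std::set<int>::iterator, bool> result = out.insert(x);
        if (reset)
            tracker.reset_object_address(&*result.first, &x);
    }
    for (std::set<int>::const_iterator it = out.begin(); it != out.end(); ++it)
        if (!tracker.is_tracked(&*it)) return false;
    return true;
}
```

Called with reset set to false, load_into_set returns false because the tracker only knows the address of the temporary. With reset set to true, every element of the set is tracked at its final address.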

If it is known a priori that no pointer values are duplicated, overhead associated with object tracking can be eliminated by setting the object tracking class serialization trait appropriately.

By default, data types designated primitive by the implementation level class serialization trait are never tracked. If it is desired to track a shared primitive object through a pointer (e.g. a long used as a reference count), it should be wrapped in a class/struct so that it is an identifiable type. The alternative of changing the implementation level of a long would affect all longs serialized in the whole program, which is probably not what one would intend.
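The benefit of wrapping can be sketched with a hand-written trait. This is a toy stand-in for the library's tracking trait with hypothetical names (toy_tracking, ref_count); in Boost itself the specialization would be made with the BOOST_CLASS_TRACKING macro. It loosely mirrors the defaults described here: primitive types are never tracked, while class types are tracked selectively.

```cpp
#include <cassert>
#include <type_traits>

// Toy trait (NOT Boost's tracking_level): a per-type tracking policy.
enum toy_tracking_type { toy_track_never, toy_track_selectively, toy_track_always };

// Default: arithmetic (primitive) types are never tracked,
// other types are tracked selectively.
template<class T>
struct toy_tracking {
    static const toy_tracking_type value =
        std::is_arithmetic<T>::value ? toy_track_never : toy_track_selectively;
};

// Hypothetical wrapper that gives a shared reference count its own type.
struct ref_count {
    long count;
};

// Specializing for the wrapper changes the policy for ref_count only;
// every other long in the program keeps the primitive default.
template<>
struct toy_tracking<ref_count> {
    static const toy_tracking_type value = toy_track_always;
};
```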

It is possible that we may want to track addresses even though the object is never serialized through a pointer. For example, a virtual base class need be saved/loaded only once. By setting this serialization trait to track_always, we can suppress redundant save/load operations.


BOOST_CLASS_TRACKING(my_virtual_base_class, boost::serialization::track_always)
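The following toy model shows why address tracking suppresses the redundant save. It is not Boost's implementation, and every class name except my_virtual_base_class is hypothetical. In a diamond hierarchy the virtual base is reached along two paths, but both paths lead to the same subobject address, so an address check saves it once.

```cpp
#include <cassert>
#include <set>

struct my_virtual_base_class { int shared; };
struct left_base  : virtual my_virtual_base_class { int l; };
struct right_base : virtual my_virtual_base_class { int r; };
struct derived : left_base, right_base { int d; };

// Toy archive (NOT Boost's) with "track always" behavior for the base.
class toy_oarchive {
public:
    toy_oarchive() : base_saves(0) {}
    void save_base(const my_virtual_base_class & b) {
        if (m_tracked.insert(&b).second)   // first visit to this address
            ++base_saves;
    }
    int base_saves;
private:
    std::set<const void *> m_tracked;
};

void save(toy_oarchive & ar, const left_base & x)  { ar.save_base(x); }
void save(toy_oarchive & ar, const right_base & x) { ar.save_base(x); }

// Saving a derived object visits the virtual base through both paths.
void save(toy_oarchive & ar, const derived & x) {
    save(ar, static_cast<const left_base &>(x));
    save(ar, static_cast<const right_base &>(x));
}
```

For a derived object, base_saves ends up as 1; without the address check it would be 2.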

Exporting Class Serialization

Elsewhere in this manual, we have described BOOST_CLASS_EXPORT. This is used to make the serialization library aware that code should be instantiated for serialization of a given class even though the class hasn't been otherwise referred to by the program.

There are several ways BOOST_CLASS_EXPORT could have been implemented.

One approach would be to instantiate serialization code for all archive classes included in the library. This would add to each executable a large amount of code that is most likely never called. Also it would needlessly slow down compilation of any program that uses the library. Finally, the list of archives would be "built-in" to the library, which would complicate the addition of new or custom archive classes.

Another approach would be for the library user to explicitly specify, for each class to be serialized, the archive classes for which code should be instantiated. Users would have to include the header files corresponding to the archive classes to be instantiated. The list of instantiated archive classes would have to be manually kept in sync with the archive class headers actually included. This was considered burdensome and error prone.

This implementation of BOOST_CLASS_EXPORT works in the following way:

All header modules of the form <boost/archive/*archive.hpp> are required to precede the header module export.hpp.
The header export.hpp builds a list of archive classes whose header modules have been previously included. It does this by checking which inclusion guard constants have been defined. The header known_archive_types.hpp lists the archive header files whose include guards will be checked. If you create your own archive class, you probably want to edit this file.
BOOST_CLASS_EXPORT(my_class) explicitly instantiates serialization code for my_class for each archive in the list.

Serialization code will be instantiated for a given archive class if and only if the module that defines that archive class has been included in the program. Given this, our program will contain all necessary code instantiations and no other.
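The include-guard check can be sketched with the preprocessor alone. The guard macros and archive names below are hypothetical, not the ones listed in known_archive_types.hpp; defining MY_TEXT_OARCHIVE_HPP stands in for having included a text archive header before the check runs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for "a text archive header was included earlier":
// its (hypothetical) include guard is now defined.
#define MY_TEXT_OARCHIVE_HPP

// Toy version of the check: only archives whose guards are already
// defined at this point end up in the instantiation list.
inline std::vector<std::string> archives_to_instantiate() {
    std::vector<std::string> list;
#ifdef MY_TEXT_OARCHIVE_HPP
    list.push_back("text_oarchive");
#endif
#ifdef MY_BINARY_OARCHIVE_HPP
    list.push_back("binary_oarchive");
#endif
#ifdef MY_XML_OARCHIVE_HPP
    list.push_back("xml_oarchive");
#endif
    return list;
}
```

Only text_oarchive ends up in the list. A header whose guard became defined after this check, that is, one included after export.hpp, would be missed, which is why the ordering rule exists.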

For many styles of code organization this header sequencing requirement presents little problem. Serialization code organized by class headers that are designed to be independent of archive implementations will look something like the following:

// A.hpp
// Note: to preserve independence from any particular archive implementation,
// no headers from <boost/archive/...> are included.
// Headers can be included in any order.
#include <boost/serialization/...>
#include <boost/serialization/export.hpp>
... // include other headers that A depends upon

class A {
	...
};

BOOST_CLASS_EXPORT(A) // note: the export name of this class

This style:

permits the header to include all aspects of the serialization implementation.
permits the header to be included anywhere else as part of some other class declaration.
reflects the concept of headers as a "library of types" which can be used independently in other programs or other parts of the same program.
reflects a fundamental principle of the serialization library design in that the specification of serialization of any class is independent of any archive implementation.

However, it might not always be possible or convenient to conform to the above style. Something like the following might be required or preferred:

// A.hpp
// headers can be included in any order
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
...
#include <boost/serialization/...>
...
// can't do the following because then A.hpp couldn't be included somewhere else
// #include <boost/serialization/export.hpp>

class A {
	...
};
// can't do the following because export.hpp is not included !!
//BOOST_CLASS_EXPORT(A) // note: the export name of this class

As noted in the comments, the header works as shown only because export.hpp is left out: #include <.../export.hpp> can't be used here without conflicting with other modules that use #include <.../*archive.hpp>. In this case we can move the export to an implementation file:

// A.cpp
#include "A.hpp"
...
// export.hpp header should be last;
#include <boost/serialization/export.hpp>
...
BOOST_CLASS_EXPORT(A)
...

Class Information

By default, for each class serialized, class information is written to the archive. This information includes version number, implementation level and tracking behavior. This is necessary so that the archive can be correctly deserialized even if a subsequent version of the program changes some of the current trait values for a class. The space overhead for this data is minimal. There is a little bit of runtime overhead since each class has to be checked to see if it has already had its class information included in the archive. In some cases, even this might be considered too much. This extra overhead can be eliminated by setting the implementation level class trait to: boost::serialization::object_serializable.

Turning off tracking and class information serialization will result in pure template inline code that in principle could be optimised down to a simple stream write/read. Eliminating all serialization overhead in this manner comes at a cost: once archives are released to users, the class serialization traits cannot be changed without invalidating the old archives. Including the class information in the archive assures us that they will be readable in the future even if the class definition is revised. A lightweight structure such as a display pixel might be declared in a header like this:


#include <boost/serialization/serialization.hpp>
#include <boost/serialization/level.hpp>
#include <boost/serialization/tracking.hpp>

// a pixel is a light weight struct which is used in great numbers.
struct pixel
{
    unsigned char red, green, blue;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int /* version */){
        // & works for both saving and loading; << would fail on input archives
        ar & red & green & blue;
    }
};

// eliminate serialization overhead at the cost of
// never being able to increase the version.
BOOST_CLASS_IMPLEMENTATION(pixel, boost::serialization::object_serializable)

// eliminate object tracking (even if serialized through a pointer)
// at the risk of a programming error creating duplicate objects.
BOOST_CLASS_TRACKING(pixel, boost::serialization::track_never)
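The trade-off can be illustrated with a toy writer that emits a small class-information header the first time it sees a class. The byte layout is invented for illustration and is not the format of any Boost archive; it only shows that the space cost is paid once per class, while the has-this-class-been-described check runs for every object.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <typeindex>
#include <typeinfo>

struct pixel { unsigned char red, green, blue; };

// Toy writer (NOT a Boost archive).
class toy_writer {
public:
    explicit toy_writer(bool write_class_info) : m_write_class_info(write_class_info) {}

    void write(const pixel & p) {
        // Runtime check made for every object: has this class been described?
        if (m_write_class_info &&
            m_described.insert(std::type_index(typeid(pixel))).second) {
            m_bytes += '\x01';  // hypothetical version byte
            m_bytes += '\x02';  // hypothetical implementation-level byte
            m_bytes += '\x03';  // hypothetical tracking byte
        }
        // The object data itself: three bytes per pixel.
        m_bytes += static_cast<char>(p.red);
        m_bytes += static_cast<char>(p.green);
        m_bytes += static_cast<char>(p.blue);
    }

    std::string::size_type size() const { return m_bytes.size(); }

private:
    bool m_write_class_info;
    std::set<std::type_index> m_described;
    std::string m_bytes;
};
```

Writing 1000 pixels produces 3003 bytes with class information and 3000 without.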

Archive Portability

Several archive classes create their data in the form of text or a portable binary format. It should be possible to save such an archive on one platform and load it on another. This is subject to a couple of conditions.

Numerics

The architecture of the machine reading the archive must be able to hold the data saved. For example, the gcc compiler reserves 4 bytes to store a variable of type wchar_t while other compilers reserve only 2 bytes. So it's possible that a value could be written that couldn't be represented by the loading program. This is a fairly obvious situation and easily handled by using the numeric types in <boost/cstdint.hpp>.
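A minimal sketch of the remedy. It uses the standard <cstdint> header so that it has no Boost dependency; <boost/cstdint.hpp> provides the same fixed-width names in namespace boost (for example boost::uint16_t and boost::int32_t). The struct and the helper are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Fixed-width members have the same range on every platform.
struct portable_record {
    std::uint16_t code_unit;  // rather than wchar_t, whose size varies
    std::int32_t  count;      // rather than long, whose size varies
};

static_assert(std::numeric_limits<std::uint16_t>::digits == 16, "16 value bits");
static_assert(std::numeric_limits<std::int32_t>::digits == 31, "31 value bits plus sign");

// Would a value saved from a 32-bit type survive loading into 16 bits?
inline bool fits_in_uint16(std::uint32_t value) {
    return value <= std::numeric_limits<std::uint16_t>::max();
}
```

For instance, the code point U+1F600 fits in a 4-byte wchar_t but not in 16 bits, so fits_in_uint16(0x1F600) is false.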

Traits

Another potential problem is illustrated by the following example:


template<class T>
struct my_wrapper {
    // ... constructor storing a reference to the wrapped T ...
    template<class Archive>
    void serialize ...
};

...

class my_class {
    friend class boost::serialization::access;
    wchar_t a;
    short unsigned b;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version){
        ar & my_wrapper<wchar_t>(a);
        ar & my_wrapper<short unsigned>(b);
    }
};

If my_wrapper uses default serialization traits, there could be a problem. With the default traits, each time a new type is added to the archive, bookkeeping information is added. So in this example, the archive would include such bookkeeping information for my_wrapper<wchar_t> and for my_wrapper<short unsigned>. Or would it? What about compilers that treat wchar_t as a synonym for unsigned short? In this case there is only one distinct type, not two. If archives are passed between programs built with compilers that differ in their treatment of wchar_t, the load operation will fail in a catastrophic way.
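The bookkeeping can be modeled by counting distinct types. In this toy (not Boost's implementation; toy_class_registry is hypothetical), one entry is recorded per distinct type. In standard C++ wchar_t is a distinct type, so the two wrappers below yield two entries; a compiler that treats wchar_t as a typedef for unsigned short would yield one, and two such programs would disagree about how much class information the archive contains.

```cpp
#include <cassert>
#include <set>
#include <typeindex>
#include <typeinfo>

template<class T>
struct my_wrapper {
    explicit my_wrapper(T & t) : value(t) {}
    T & value;
};

// Toy bookkeeping (NOT Boost's): one entry per distinct type seen.
class toy_class_registry {
public:
    template<class T>
    void note(const T &) { m_types.insert(std::type_index(typeid(T))); }

    std::set<std::type_index>::size_type distinct_types() const { return m_types.size(); }
private:
    std::set<std::type_index> m_types;
};
```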

One remedy for this is to assign serialization traits to the template my_wrapper such that class information for instantiations of this template is never serialized. This process is described above and has been used for Name-Value Pairs. Wrappers would typically be assigned such traits.

Another way to avoid this problem is to assign serialization traits to all specializations of the template my_wrapper for all primitive types so that class information is never saved. This is what has been done for our implementation of serializations for STL collections.

Binary Archives

Standard stream i/o on some systems will expand linefeed characters to carriage-return/linefeed on output. This creates a problem for binary archives. The easiest way to handle this is to open streams for binary archives in "binary mode" by using the flag ios::binary. If this is not done, the archive generated will be unreadable.

Unfortunately, no way has been found to detect this error before loading the archive. Debug builds will assert when this is detected so that may be helpful in catching this error.
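A minimal sketch of opening streams in binary mode with the standard library. round_trip_binary is a hypothetical helper; in a real program the same std::ios::binary flag would be passed when constructing the std::ofstream and std::ifstream handed to the binary archive constructors.

```cpp
#include <cassert>
#include <fstream>
#include <ios>
#include <iterator>
#include <string>

// Write bytes to a file opened in binary mode, read them back the same way,
// and report whether they came back unchanged.
bool round_trip_binary(const std::string & path, const std::string & bytes) {
    {
        // std::ios::binary disables newline translation where it would occur.
        std::ofstream out(path.c_str(), std::ios::out | std::ios::binary);
        if (!out) return false;
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
        if (!out) return false;
    }   // the stream is flushed and closed here
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) return false;
    std::string read_back((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
    return read_back == bytes;
}
```

On systems without newline translation the round trip succeeds with or without the flag; the flag matters on systems that do translate.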

XML Archives

XML archives present a somewhat special case. XML format has a nested structure that maps well to the "recursive class member visitor" pattern used by the serialization system. However, XML differs from other formats in that it requires a name for each data member. Our goal is to add this information to the class serialization specification while still permitting the serialization code to be used with any archive. This is achieved by requiring that all data serialized to an XML archive be serialized as a name-value pair. The first member is the name to be used as the XML tag for the data item, while the second is a reference to the data item itself. Any attempt to serialize data not wrapped in a name-value pair will be trapped at compile time. The system is implemented in such a way that for other archive classes, just the value portion of the data is serialized. The name portion is discarded during compilation. So by always using name-value pairs, it is guaranteed that all data can be serialized to all archive classes with maximum efficiency.
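The mechanism can be sketched with a toy name-value pair. The real library's wrapper is created with boost::serialization::make_nvp (or the BOOST_SERIALIZATION_NVP macro); the toy_ names below are hypothetical and only illustrate how one serialization function feeds both kinds of archive.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Toy name-value pair (NOT Boost's nvp): a tag name plus a reference.
template<class T>
struct toy_nvp {
    toy_nvp(const char * n, T & v) : name(n), value(v) {}
    const char * name;
    T & value;
};

template<class T>
toy_nvp<T> toy_make_nvp(const char * name, T & value) { return toy_nvp<T>(name, value); }

// XML-like archive: the name becomes the element tag.
struct toy_xml_oarchive {
    std::ostringstream os;
    template<class T>
    toy_xml_oarchive & operator<<(const toy_nvp<T> & p) {
        os << '<' << p.name << '>' << p.value << "</" << p.name << '>';
        return *this;
    }
};

// Text-like archive: the name is ignored and only the value is written.
struct toy_text_oarchive {
    std::ostringstream os;
    template<class T>
    toy_text_oarchive & operator<<(const toy_nvp<T> & p) {
        os << p.value << ' ';
        return *this;
    }
};

// One serialization function works with either archive.
template<class Archive>
void save_point(Archive & ar, int x, int y) {
    ar << toy_make_nvp("x", x) << toy_make_nvp("y", y);
}
```

Because the toy archives define operator<< only for toy_nvp, writing a bare value such as `xa << 5` does not compile, mirroring the compile-time check described above. One difference from the real library: the toy text archive ignores the name at run time rather than discarding it during compilation.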