Serialization

Tutorial

A Very Simple Case
Non Intrusive Version
Serializable Members
Derived Classes
Pointers
Arrays
STL Collections
Class Versioning
Splitting serialize into save/load
Archives
List of examples

An output archive is similar to an output data stream. Data can be saved to the archive with either the << or the & operator:


ar << data;
ar & data;

An input archive is similar to an input data stream. Data can be loaded from the archive with either the >> or the & operator:


ar >> data;
ar & data;

When these operators are invoked for primitive data types, the data is simply saved to or loaded from the archive. When invoked for class data types, the class's serialize function is invoked. Each serialize function uses the above operators to save or load its data members. This process continues recursively until all the data contained in the class is saved or loaded.
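To make the recursion concrete, here is a minimal sketch that uses only the standard library. It is a toy model, not the library's implementation: the names toy_oarchive, toy_iarchive, toy_position, toy_stop, toy_save and toy_load are all hypothetical, the toy handles only int, float and class types, and its serialize members take no version argument.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Toy output archive (hypothetical): & behaves like << for primitives.
struct toy_oarchive {
    std::ostream & os;
    explicit toy_oarchive(std::ostream & s) : os(s) {}
    toy_oarchive & operator&(const int & v)   { os << v << ' '; return *this; }
    toy_oarchive & operator&(const float & v) { os << v << ' '; return *this; }
    // Class types: hand the archive to the object's own serialize function.
    template<class T>
    toy_oarchive & operator&(const T & t) {
        // serialize is non-const because the same function is also used to load
        const_cast<T &>(t).serialize(*this);
        return *this;
    }
};

// Toy input archive (hypothetical): & behaves like >> for primitives.
struct toy_iarchive {
    std::istream & is;
    explicit toy_iarchive(std::istream & s) : is(s) {}
    toy_iarchive & operator&(int & v)   { is >> v; return *this; }
    toy_iarchive & operator&(float & v) { is >> v; return *this; }
    template<class T>
    toy_iarchive & operator&(T & t) { t.serialize(*this); return *this; }
};

struct toy_position {
    int degrees;
    int minutes;
    float seconds;
    template<class Archive>
    void serialize(Archive & ar) { ar & degrees; ar & minutes; ar & seconds; }
};

// A class whose members are themselves classes: serialize recurses into them.
struct toy_stop {
    toy_position latitude;
    toy_position longitude;
    template<class Archive>
    void serialize(Archive & ar) { ar & latitude; ar & longitude; }
};

// Save the root object; every nested member follows from this one call.
std::string toy_save(const toy_stop & s) {
    std::ostringstream os;
    toy_oarchive oa(os);
    oa & s;
    return os.str();
}

// Load reads the members back in the same order they were written.
toy_stop toy_load(const std::string & text) {
    std::istringstream is(text);
    toy_iarchive ia(is);
    toy_stop s = toy_stop();
    ia & s;
    return s;
}
```

Because one serialize function drives both directions, the order of members written and read cannot drift apart in this sketch.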

A Very Simple Case

These operators are used inside the serialize function to save and load class data members.

Included in this library is a program called demo.cpp which illustrates how to use this system. Below we excerpt code from this program to illustrate with the simplest possible case how this library is intended to be used.


#include <fstream>

// include headers that implement an archive in simple text format
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>

/////////////////////////////////////////////////////////////
// gps coordinate
//
// illustrates serialization for a simple type
//
class gps_position
{
private:
    friend class boost::serialization::access;
    // When the class Archive corresponds to an output archive, the
    // & operator is defined similar to <<.  Likewise, when the class Archive
    // is a type of input archive the & operator is defined similar to >>.
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        ar & degrees;
        ar & minutes;
        ar & seconds;
    }
    int degrees;
    int minutes;
    float seconds;
public:
    gps_position(){};
    gps_position(int d, int m, float s) :
        degrees(d), minutes(m), seconds(s)
    {}
};

int main() {
    // create class instance
    const gps_position g(35, 59, 24.567f);

    // save data to archive
    {
        // create and open a character archive for output
        std::ofstream ofs("filename");
        boost::archive::text_oarchive oa(ofs);
        // write class instance to archive
        oa << g;
        // archive and stream closed when destructors are called
    }

    // ... some time later restore the class instance to its original state
    gps_position newg;
    {
        // create and open an archive for input
        std::ifstream ifs("filename");
        boost::archive::text_iarchive ia(ifs);
        // read class state from archive
        ia >> newg;
        // archive and stream closed when destructors are called
    }
    return 0;
}

For each class to be saved via serialization, there must exist a function to save all the class members which define the state of the class. For each class to be loaded via serialization, there must exist a function to load these class members in the same sequence as they were saved. In the above example, these functions are generated by the template member function serialize.

Non Intrusive Version

The above formulation is intrusive. That is, it requires that classes whose instances are to be serialized be altered. This can be inconvenient in some cases. An equivalent alternative formulation permitted by the system would be:


#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>

class gps_position
{
public:
    int degrees;
    int minutes;
    float seconds;
    gps_position(){};
    gps_position(int d, int m, float s) :
        degrees(d), minutes(m), seconds(s)
    {}
};

namespace boost {
namespace serialization {

template<class Archive>
void serialize(Archive & ar, gps_position & g, const unsigned int version)
{
    ar & g.degrees;
    ar & g.minutes;
    ar & g.seconds;
}

} // namespace serialization
} // namespace boost

In this case the generated serialize functions are not members of the gps_position class. The two formulations function in exactly the same way.

The main application of non-intrusive serialization is to permit serialization to be implemented for classes without changing the class definition. In order for this to be possible, the class must expose enough information to reconstruct the class state. In this example, we presumed that the class had public members, which is not a common occurrence. Only classes which expose enough information to save and restore the class state will be serializable without changing the class definition.

Serializable Members

A serializable class with serializable members would look like this:


class bus_stop
{
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        ar & latitude;
        ar & longitude;
    }
    gps_position latitude;
    gps_position longitude;
protected:
    bus_stop(const gps_position & lat_, const gps_position & long_) :
    latitude(lat_), longitude(long_)
    {}
public:
    bus_stop(){}
    // See item # 14 in Effective C++ by Scott Meyers.
    // re non-virtual destructors in base classes.
    virtual ~bus_stop(){}
};

That is, members of class type are serialized just as members of primitive types are.

Note that saving an instance of the class bus_stop with one of the archive operators will invoke the serialize function which saves latitude and longitude. Each of these in turn will be saved by invoking serialize in the definition of gps_position. In this manner the whole data structure is saved by the application of an archive operator to just its root item.

Derived Classes

Derived classes should include serializations of their base classes.


#include <string>
#include <boost/serialization/base_object.hpp>

class bus_stop_corner : public bus_stop
{
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        // serialize base class information
        ar & boost::serialization::base_object<bus_stop>(*this);
        ar & street1;
        ar & street2;
    }
    std::string street1;
    std::string street2;
    virtual std::string description() const
    {
        return street1 + " and " + street2;
    }
public:
    bus_stop_corner(){}
    bus_stop_corner(const gps_position & lat_, const gps_position & long_,
        const std::string & s1_, const std::string & s2_
    ) :
        bus_stop(lat_, long_), street1(s1_), street2(s2_)
    {}
};

Note the serialization of the base classes from the derived class. Do NOT directly call the base class serialize functions. Doing so might seem to work but will bypass the code that tracks instances written to storage to eliminate redundancies. It will also bypass the writing of class version information into the archive. For this reason, it is advisable to always make member serialize functions private. The declaration friend class boost::serialization::access grants the serialization library access to private member variables and functions.

Pointers

Suppose we define a bus route as an array of bus stops. Given that we might have several types of bus stops (remember bus_stop is a base class), and that a given bus_stop might appear in more than one route, it's convenient to represent a bus route with an array of pointers to bus_stop.


class bus_route
{
    friend class boost::serialization::access;
    bus_stop * stops[10];
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        int i;
        for(i = 0; i < 10; ++i)
            ar & stops[i];
    }
public:
    bus_route(){}
};

Each member of the array stops will be serialized. But remember each member is a pointer - so what can this really mean? The whole object of this serialization is to permit reconstruction of the original data structures at another place and time. In order to accomplish this with a pointer, it is not sufficient to save the value of the pointer, rather the object it points to must be saved. When the member is later loaded, a new object has to be created and a new pointer has to be loaded into the class member.

If the same pointer is serialized more than once, only one instance is added to the archive. When the archive is read back, the data for the second occurrence is not read again; the only operation that occurs is that the second pointer is set equal to the first.
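The idea of writing a repeated pointer only once can be sketched with the standard library alone. This is an illustrative toy, separate from the tutorial's bus_route code and not the library's actual tracking mechanism: tracked_stop, tracking_oarchive, tracking_iarchive and the "new"/"ref" text tags are all hypothetical, and null pointers are not handled.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct tracked_stop { int id; };

// Saving: the first time an address is seen, write "new <index> <data>";
// on later occurrences write only "ref <index>".
class tracking_oarchive {
    std::ostream & os;
    std::map<const tracked_stop *, std::size_t> seen;
public:
    explicit tracking_oarchive(std::ostream & s) : os(s) {}
    void save(const tracked_stop * p) {
        std::map<const tracked_stop *, std::size_t>::iterator it = seen.find(p);
        if (it != seen.end()) {
            os << "ref " << it->second << ' ';
            return;
        }
        const std::size_t index = seen.size();
        seen[p] = index;
        os << "new " << index << ' ' << p->id << ' ';
    }
};

// Loading: "new" creates an object and remembers it; "ref" returns the
// pointer created earlier, so both pointers refer to the same object again.
class tracking_iarchive {
    std::istream & is;
    std::vector<tracked_stop *> loaded;
public:
    explicit tracking_iarchive(std::istream & s) : is(s) {}
    // The caller owns the returned object (each object is returned once as "new").
    tracked_stop * load() {
        std::string tag;
        std::size_t index = 0;
        is >> tag >> index;
        if (tag == "ref")
            return loaded.at(index);
        tracked_stop * p = new tracked_stop();
        is >> p->id;
        loaded.push_back(p);
        return p;
    }
};
```

Saving the same tracked_stop twice through one tracking_oarchive writes its data once, and loading twice yields two equal pointers to a single new object.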

Note that, in this example, the array consists of polymorphic pointers. That is, each array element points to one of several possible kinds of bus stops. So when the pointer is saved, some sort of class identifier must be saved. When the pointer is loaded, the class identifier must be read and an instance of the corresponding class must be constructed. Finally the data can be loaded into the newly created instance of the correct type. As can be seen in demo.cpp, serialization of pointers to derived classes through a base class pointer may require explicit enumeration of the derived classes to be serialized. This is referred to as "registration" or "export" of derived classes. This requirement and the methods of satisfying it are explained in detail elsewhere in the documentation.

All this is accomplished automatically by the serialization library. The bus_route class shown above is all that is necessary to accomplish the saving and loading of objects accessed through pointers.
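To see why a class identifier and some form of registration are needed for polymorphic pointers, consider the following standard-library-only sketch. It is a toy under stated assumptions, not the library's registration or export machinery: poly_stop, poly_corner, poly_destination, poly_register, poly_save and poly_load are hypothetical names, and the identifier is simply a string written in front of the object's data.

```cpp
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical base class; each derived class reports its own identifier.
struct poly_stop {
    virtual ~poly_stop() {}
    virtual std::string class_id() const = 0;
    virtual void save(std::ostream & os) const = 0;
    virtual void load(std::istream & is) = 0;
};

struct poly_corner : poly_stop {
    int street1, street2;
    poly_corner() : street1(0), street2(0) {}
    std::string class_id() const { return "poly_corner"; }
    void save(std::ostream & os) const { os << street1 << ' ' << street2 << ' '; }
    void load(std::istream & is) { is >> street1 >> street2; }
};

struct poly_destination : poly_stop {
    int code;
    poly_destination() : code(0) {}
    std::string class_id() const { return "poly_destination"; }
    void save(std::ostream & os) const { os << code << ' '; }
    void load(std::istream & is) { is >> code; }
};

// Registry: maps a class identifier to a function that constructs that class.
typedef poly_stop * (*poly_factory)();

std::map<std::string, poly_factory> & poly_registry() {
    static std::map<std::string, poly_factory> registry;
    return registry;
}

template<class T>
poly_stop * poly_create() { return new T(); }

// "Registration": the loader can only construct classes listed here.
template<class T>
void poly_register(const std::string & id) {
    poly_registry()[id] = &poly_create<T>;
}

// Save the identifier first, then the object's own data.
void poly_save(std::ostream & os, const poly_stop & stop) {
    os << stop.class_id() << ' ';
    stop.save(os);
}

// Read the identifier, construct the matching class, then load its data.
std::unique_ptr<poly_stop> poly_load(std::istream & is) {
    std::string id;
    is >> id;
    auto it = poly_registry().find(id);
    if (it == poly_registry().end())
        throw std::runtime_error("unregistered class: " + id);
    std::unique_ptr<poly_stop> stop(it->second());
    stop->load(is);
    return stop;
}
```

In this toy, loading an identifier that was never passed to poly_register fails with an exception, which mirrors why derived classes have to be made known before they can be restored through a base class pointer.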

Arrays

The above formulation is in fact more complex than necessary. The serialization library detects when the object being serialized is an array and emits code equivalent to the above. So the above can be shortened to:


class bus_route
{
    friend class boost::serialization::access;
    bus_stop * stops[10];
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        ar & stops;
    }
public:
    bus_route(){}
};
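How an archive might recognize an array and loop over it can be sketched with a template overload on array references. This is a toy using only the standard library; array_oarchive is a hypothetical name and the code is not taken from the serialization library.

```cpp
#include <cstddef>
#include <sstream>

// Toy output archive (hypothetical) that accepts int and built-in arrays.
struct array_oarchive {
    std::ostringstream os;

    array_oarchive & operator&(const int & v) {
        os << v << ' ';
        return *this;
    }

    // Chosen when the operand is a built-in array of any element type and
    // length; it emits the same loop one would otherwise write by hand.
    template<class T, std::size_t N>
    array_oarchive & operator&(const T (&a)[N]) {
        for (std::size_t i = 0; i < N; ++i)
            *this & a[i];
        return *this;
    }
};
```

Since the element type T may itself be an array, the same overload also walks multi-dimensional arrays in this sketch.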

STL Collections

The above example uses an array of members. More likely such an application would use an STL collection for such a purpose. The serialization library contains code for serialization of all STL classes. Hence, the reformulation below will also work as one would expect.


#include <boost/serialization/list.hpp>

class bus_route
{
    friend class boost::serialization::access;
    std::list<bus_stop *> stops;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        ar & stops;
    }
public:
    bus_route(){}
};

Class Versioning

Suppose we're satisfied with our bus_route class, build a program that uses it and ship the product. Some time later, it's decided that the program needs enhancement and the bus_route class is altered to include the name of the driver of the route. So the new version looks like:


#include <boost/serialization/list.hpp>
#include <boost/serialization/string.hpp>

class bus_route
{
    friend class boost::serialization::access;
    std::list<bus_stop *> stops;
    std::string driver_name;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        ar & driver_name;
        ar & stops;
    }
public:
    bus_route(){}
};

Great, we're all done. Except... what about people using our application who now have a bunch of files created under the previous program? How can these be used with our new program version?

In general, the serialization library stores a version number in the archive for each class serialized. By default this version number is 0. When the archive is loaded, the version number under which it was saved is read. The above code can be altered to exploit this:


#include <boost/serialization/list.hpp>
#include <boost/serialization/string.hpp>
#include <boost/serialization/version.hpp>

class bus_route
{
    friend class boost::serialization::access;
    std::list<bus_stop *> stops;
    std::string driver_name;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        // only save/load driver_name for newer archives
        if(version > 0)
            ar & driver_name;
        ar & stops;
    }
public:
    bus_route(){}
};

BOOST_CLASS_VERSION(bus_route, 1)

By application of versioning to each class, there is no need to try to maintain a versioning of files. That is, a file version is the combination of the versions of all its constituent classes. This system permits programs to be always compatible with archives created by all previous versions of a program with no more effort than required by this example.
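The effect of a stored per-class version can be illustrated without the library. In the sketch below, which is a toy and not the library's archive format, versioned_route and versioned_route_version are hypothetical names, the version is written as the first token, and driver_name is assumed to be a non-empty single word.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Version written by the current program (plays the role of BOOST_CLASS_VERSION).
const unsigned int versioned_route_version = 1;

struct versioned_route {
    std::string driver_name;  // added in version 1
    int stop_count;           // present since version 0
    versioned_route() : stop_count(0) {}

    // Saving always writes the current version, followed by the current layout.
    void save(std::ostream & os) const {
        os << versioned_route_version << ' '
           << driver_name << ' ' << stop_count << ' ';
    }

    // Loading reads the stored version first and skips members that the
    // older layout did not contain.
    void load(std::istream & is) {
        unsigned int version = 0;
        is >> version;
        if (version > 0)
            is >> driver_name;
        is >> stop_count;
    }
};
```

A file written by the old program, such as the text "0 12 ", still loads: driver_name stays empty and stop_count becomes 12.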

Splitting serialize into save/load

The serialize function is simple, concise, and guarantees that class members are saved and loaded in the same sequence - the key to the serialization system. However, there are cases where the load and save operations are not as similar as the examples used here. For example, this could occur with a class that has evolved through multiple versions. The above class can be reformulated as:


#include <boost/serialization/list.hpp>
#include <boost/serialization/string.hpp>
#include <boost/serialization/version.hpp>
#include <boost/serialization/split_member.hpp>

class bus_route
{
    friend class boost::serialization::access;
    std::list<bus_stop *> stops;
    std::string driver_name;
    template<class Archive>
    void save(Archive & ar, const unsigned int version) const
    {
        // note, version is always the latest when saving
        ar & driver_name;
        ar & stops;
    }
    template<class Archive>
    void load(Archive & ar, const unsigned int version)
    {
        if(version > 0)
            ar & driver_name;
        ar & stops;
    }
    BOOST_SERIALIZATION_SPLIT_MEMBER()
public:
    bus_route(){}
};

BOOST_CLASS_VERSION(bus_route, 1)

The macro BOOST_SERIALIZATION_SPLIT_MEMBER() generates code which invokes the save or load depending on whether the archive is used for saving or loading.
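One way such a dispatch can work is tag dispatch on a compile-time property of the archive. The following is a standard-library-only sketch of that idea, not the code the macro generates: split_oarchive, split_iarchive, split_route and the is_saving tag as used here are assumptions made for illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <type_traits>

// Toy archives (hypothetical): each advertises whether it saves or loads.
struct split_oarchive {
    typedef std::true_type is_saving;
    std::ostringstream os;
    template<class T>
    split_oarchive & operator&(const T & v) { os << v << ' '; return *this; }
};

struct split_iarchive {
    typedef std::false_type is_saving;
    std::istringstream is;
    explicit split_iarchive(const std::string & text) : is(text) {}
    template<class T>
    split_iarchive & operator&(T & v) { is >> v; return *this; }
};

class split_route {
    // save and load differ: name_length is derived data that is never
    // stored and is recomputed after loading.
    template<class Archive>
    void save(Archive & ar) const { ar & driver_name; }

    template<class Archive>
    void load(Archive & ar) {
        ar & driver_name;
        name_length = driver_name.size();
    }

    // Overloads selected by the archive's is_saving tag.
    template<class Archive>
    void dispatch(Archive & ar, std::true_type) { save(ar); }
    template<class Archive>
    void dispatch(Archive & ar, std::false_type) { load(ar); }

public:
    std::string driver_name;   // assumed to be a non-empty single word
    std::size_t name_length;
    split_route() : name_length(0) {}

    // Single entry point, as with an ordinary serialize function.
    template<class Archive>
    void serialize(Archive & ar) {
        dispatch(ar, typename Archive::is_saving());
    }
};
```

Callers still use one serialize entry point; only the class author sees that saving and loading take different paths.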

List of Examples

demo.cpp

This is the completed example used in this tutorial. It does the following:

Creates a structure of differing kinds of stops, routes and schedules
Displays it
Serializes it to a file named "testfile.txt" with one statement
Restores to another structure
Displays the restored structure

Output of this program is sufficient to verify that all the originally stated requirements for a serialization system are met with this system. The contents of the archive file can also be displayed as serialization files are ASCII text.

demo_xml.cpp

This is a variation of the original demo which supports XML archives in addition to the others. The extra wrapping macro, BOOST_SERIALIZATION_NVP(name), is needed to associate a data item name with the corresponding XML tag. It is important that 'name' be a valid XML tag, else it will be impossible to restore the archive. For more information see Name-Value Pairs. Here is what an XML archive looks like.

demo_xml_save.cpp and demo_xml_load.cpp

Note also that though our examples save and load the program data to an archive within the same program, this is merely a convenience for purposes of illustration. In general, the archive may or may not be loaded by the same program that created it.

The astute reader might notice that these examples contain a subtle but important flaw. They leak memory. The bus stops are created in the main function. The bus schedules may refer to these bus stops any number of times. At the end of the main function after the bus schedules are destroyed, the bus stops are destroyed. This seems fine. But what about the new_schedule data item created by the process of loading from an archive? This contains its own separate set of bus stops that are not referenced outside of the bus schedule. These won't be destroyed anywhere in the program, which is a memory leak.

There are a couple of ways of fixing this. One way is to explicitly manage the bus stops. However, a more robust and transparent way is to use shared_ptr rather than raw pointers. Along with serialization implementations for the Standard Library, the serialization library includes an implementation of serialization for boost::shared_ptr. Given this, it should be easy to alter any of these examples to eliminate the memory leak. This is left as an exercise for the reader.
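As a hint at the ownership side of that exercise, the sketch below shows why shared pointers remove the leak. It deliberately leaves out serialization and uses std::shared_ptr as a stand-in for boost::shared_ptr; counted_stop, shared_schedule, fake_load and live_stops_after_discarding_loaded_schedule are hypothetical names, and fake_load merely imitates what loading from an archive would produce.

```cpp
#include <list>
#include <memory>

// A stop that counts how many instances are currently alive.
struct counted_stop {
    static int live;
    counted_stop() { ++live; }
    ~counted_stop() { --live; }
    counted_stop(const counted_stop &) = delete;
    counted_stop & operator=(const counted_stop &) = delete;
};
int counted_stop::live = 0;

struct shared_schedule {
    std::list<std::shared_ptr<counted_stop> > stops;
};

// Stand-in for loading: creates a stop that nothing outside the schedule
// refers to, and references it twice.
shared_schedule fake_load() {
    shared_schedule s;
    std::shared_ptr<counted_stop> stop = std::make_shared<counted_stop>();
    s.stops.push_back(stop);
    s.stops.push_back(stop);
    return s;
}

// Returns the number of stops still alive once the loaded schedule is gone.
int live_stops_after_discarding_loaded_schedule() {
    {
        shared_schedule new_schedule = fake_load();
        // Here one stop is alive, shared by two list entries.
    }
    // The last shared_ptr was destroyed with the schedule, and the stop with it.
    return counted_stop::live;
}
```

With raw pointers the stop created during loading would outlive the schedule; here the live count returns to zero once the schedule is destroyed.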