Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

This is the documentation for a snapshot of the develop branch, built from commit 5533fcd373.
PrevUpHomeNext

Design Rationale

Suppose you need a C++ program to calculate the distance between two points. You might define a struct:

struct mypoint
{
    double x, y;
};

and a function, containing the algorithm:

double distance(mypoint const& a, mypoint const& b)
{
    double dx = a.x - b.x;
    double dy = a.y - b.y;
    return sqrt(dx * dx + dy * dy);
}

Quite simple, and it is usable, but not generic. For a library it has to be designed way further. The design above can only be used for 2D points, for the struct mypoint (and no other struct), in a Cartesian coordinate system. A generic library should be able to calculate the distance:

In this and following sections we will make the design step by step more generic.

Using Templates

The distance function can be changed into a template function. This is trivial and allows calculating the distance between other point types than just mypoint. We add two template parameters, allowing input of two different point types.

template <typename P1, typename P2>
double distance(P1 const& a, P2 const& b)
{
    double dx = a.x - b.x;
    double dy = a.y - b.y;
    return std::sqrt(dx * dx + dy * dy);
}

This template version is slightly better, but not much.

Consider a C++ class where member variables are protected... Such a class does not allow to access x and y members directly. So, this paragraph is short and we just move on.

Using Traits

We need to take a generic approach and allow any point type as input to the distance function. Instead of accessing x and y members, we will add a few levels of indirection, using a traits system. The function then becomes:

template <typename P1, typename P2>
double distance(P1 const& a, P2 const& b)
{
    double dx = get<0>(a) - get<0>(b);
    double dy = get<1>(a) - get<1>(b);
    return std::sqrt(dx * dx + dy * dy);
}

This adapted distance function uses a generic get function, with dimension as a template parameter, to access the coordinates of a point. This get forwards to the traits system, defined as following:

namespace traits
{
    template <typename P, int D>
    struct access {};
}

which is then specialized for our mypoint type, implementing a static method called get:

namespace traits
{
    template <>
    struct access<mypoint, 0>
    {
        static double get(mypoint const& p)
        {
            return p.x;
        }
    };
    // same for 1: p.y
    ...
}

Calling traits::access<mypoint, 0>::get(a) now returns us our x coordinate. Nice, isn't it? It is too verbose for a function like this, used so often in the library. We can shorten the syntax by adding an extra free function:

template <int D, typename P>
inline double get(P const& p)
{
    return traits::access<P, D>::get(p);
}

This enables us to call get<0>(a), for any point having the traits::access specialization, as shown in the distance algorithm at the start of this paragraph. So we wanted to enable classes with methods like x(), and they are supported as long as there is a specialization of the access struct with a static get function returning x() for dimension 0, and similar for 1 and y().

Dimension Agnosticism

Now we can calculate the distance between points in 2D, points of any structure or class. However, we wanted to have 3D as well. So we have to make it dimension agnostic. This complicates our distance function. We can use a for loop to walk through dimensions, but for loops have another performance than the straightforward coordinate addition which was there originally. However, we can make more usage of templates and make the distance algorithm as following, more complex but attractive for template fans:

template <typename P1, typename P2, int D>
struct pythagoras
{
    static double apply(P1 const& a, P2 const& b)
    {
        double d = get<D-1>(a) - get<D-1>(b);
        return d * d + pythagoras<P1, P2, D-1>::apply(a, b);
    }
};

template <typename P1, typename P2 >
struct pythagoras<P1, P2, 0>
{
    static double apply(P1 const&, P2 const&)
    {
        return 0;
    }
};

The distance function is calling that pythagoras structure, specifying the number of dimensions:

template <typename P1, typename P2>
double distance(P1 const& a, P2 const& b)
{
    BOOST_STATIC_ASSERT(( dimension<P1>::value == dimension<P2>::value ));

    return sqrt(pythagoras<P1, P2, dimension<P1>::value>::apply(a, b));
}

The dimension which is referred to is defined using another traits class:

namespace traits
{
    template <typename P>
    struct dimension {};
}

which has to be specialized again for the struct mypoint.

Because it only has to publish a value, we conveniently derive it from the Boost.MPL class boost::mpl::int_:

namespace traits
{
    template <>
    struct dimension<mypoint> : boost::mpl::int_<2>
    {};
}

Like the free get function, the library also contains a dimension meta-function.

template <typename P>
struct dimension : traits::dimension<P>
{};

Below is explained why the extra declaration is useful. Now we have agnosticism in the number of dimensions. Our more generic distance function now accepts points of three or more dimensions. The compile-time assertion will prevent point a having two dimension and point b having three dimensions.

Coordinate Type

We assumed double above. What if our points are in integer?

We can easily add a traits class, and we will do that. However, the distance between two integer coordinates can still be a fractionized value. Besides that, a design goal was to avoid square roots. We handle these cases below, in another paragraph. For the moment we keep returning double, but we allow integer coordinates for our point types. To define the coordinate type, we add another traits class, coordinate_type, which should be specialized by the library user:

namespace traits
{
    template <typename P>
    struct coordinate_type{};

    // specialization for our mypoint
    template <>
    struct coordinate_type<mypoint>
    {
        typedef double type;
    };
}

Like the access function, where we had a free get function, we add a proxy here as well. A longer version is presented later on, the short function would look like this:

template <typename P>
struct coordinate_type :    traits::coordinate_type<P> {};

We now can modify our distance algorithm again. Because it still returns double, we only modify the pythagoras computation class. It should return the coordinate type of its input. But, it has two input, possibly different, point types. They might also differ in their coordinate types. Not that that is very likely, but we’re designing a generic library and we should handle those strange cases. We have to choose one of the coordinate types and of course we select the one with the highest precision. This is not worked out here, it would be too long, and it is not related to geometry. We just assume that there is a meta-function select_most_precise selecting the best type.

So our computation class becomes:

template <typename P1, typename P2, int D>
struct pythagoras
{
    typedef typename select_most_precise
        <
            typename coordinate_type<P1>::type,
            typename coordinate_type<P2>::type
        >::type computation_type;

    static computation_type apply(P1 const& a, P2 const& b)
    {
        computation_type d = get<D-1>(a) - get<D-1>(b);
        return d * d + pythagoras <P1, P2, D-1> ::apply(a, b);
    }
};

Different Geometries

We have designed a dimension agnostic system supporting any point type of any coordinate type. There are still some tweaks but they will be worked out later. Now we will see how we calculate the distance between a point and a polygon, or between a point and a line-segment. These formulae are more complex, and the influence on design is even larger. We don’t want to add a function with another name:

template <typename P, typename S>
double distance_point_segment(P const& p, S const& s)

We want to be generic, the distance function has to be called from code not knowing the type of geometry it handles, so it has to be named distance. We also cannot create an overload because that would be ambiguous, having the same template signature. There are two solutions:

We select tag dispatching because it fits into the traits system. The earlier versions (previews) of Boost.Geometry used SFINAE but we found it had several drawbacks for such a big design, so the switch to tag dispatching was made.

With tag dispatching the distance algorithm inspects the type of geometry of the input parameters. The distance function will be changed into this:

template <typename G1, typename G2>
double distance(G1 const& g1, G2 const& g2)
{
    return dispatch::distance
        <
            typename tag<G1>::type,
            typename tag<G2>::type,
            G1, G2
        >::apply(g1, g2);
}

It is referring to the tag meta-function and forwarding the call to the apply method of a dispatch::distance structure. The tag meta-function is another traits class, and should be specialized for per point type, both shown here:

namespace traits
{
    template <typename G>
    struct tag {};

    // specialization
    template <>
    struct tag<mypoint>
    {
        typedef point_tag type;
    };
}

Free meta-function, like coordinate_system and get:

template <typename G>
struct tag : traits::tag<G> {};

Tags (point_tag, segment_tag, etc) are empty structures with the purpose to specialize a dispatch struct. The dispatch struct for distance, and its specializations, are all defined in a separate namespace and look like the following:

namespace dispatch {
    template < typename Tag1, typename Tag2, typename G1, typename G2 >
    struct distance
    {};

    template <typename P1, typename P2>
    struct distance < point_tag, point_tag, P1, P2 >
    {
        static double apply(P1 const& a, P2 const& b)
        {
            // here we call pythagoras
            // exactly like we did before
            ...
        }
    };

    template <typename P, typename S>
    struct distance
    <
        point_tag, segment_tag, P, S
    >
    {
        static double apply(P const& p, S const& s)
        {
            // here we refer to another function
            // implementing point-segment
            // calculations in 2 or 3
            // dimensions...
            ...
        }
    };

    // here we might have many more
    // specializations,
    // for point-polygon, box-circle, etc.

} // namespace

So yes, it is possible; the distance algorithm is generic now in the sense that it also supports different geometry types. One drawback: we have to define two dispatch specializations for point - segment and for segment - point separately. That will also be solved, in the paragraph reversibility below. The example below shows where we are now: different point types, geometry types, dimensions.

point a(1,1);
point b(2,2);
std::cout << distance(a,b) << std::endl;
segment s1(0,0,5,3);
std::cout << distance(a, s1) << std::endl;
rgb red(255, 0, 0);
rbc orange(255, 128, 0);
std::cout << "color distance: " << distance(red, orange) << std::endl;

Kernel Revisited

We described above that we had a traits class coordinate_type, defined in namespace traits, and defined a separate coordinate_type class as well. This was actually not really necessary before, because the only difference was the namespace clause. But now that we have another geometry type, a segment in this case, it is essential. We can call the coordinate_type meta-function for any geometry type, point, segment, polygon, etc, implemented again by tag dispatching:

template <typename G>
struct coordinate_type
{
    typedef typename dispatch::coordinate_type
        <
            typename tag<G>::type, G
        >::type type;
};

Inside the dispatch namespace this meta-function is implemented twice: a generic version and one specialization for points. The specialization for points calls the traits class. The generic version calls the point specialization, as a sort of recursive meta-function definition:

namespace dispatch
{

    // Version for any geometry:
    template <typename GeometryTag, typename G>
    struct coordinate_type
    {
        typedef typename point_type
            <
                GeometryTag, G
            >::type point_type;

        // Call specialization on point-tag
        typedef typename coordinate_type < point_tag, point_type >::type type;
    };

    // Specialization for point-type:
    template <typename P>
    struct coordinate_type<point_tag, P>
    {
        typedef typename
            traits::coordinate_type<P>::type
            type;
    };
}

So it calls another meta-function point_type. This is not elaborated in here but realize that it is available for all geometry types, and typedefs the point type which makes up the geometry, calling it type.

The same applies for the meta-function dimension and for the upcoming meta-function coordinate system.

Coordinate System

Until here we assumed a Cartesian system. But we know that the Earth is not flat. Calculating a distance between two GPS-points with the system above would result in nonsense. So we again extend our design. We define for each point type a coordinate system type using the traits system again. Then we make the calculation dependant on that coordinate system.

Coordinate system is similar to coordinate type, a meta-function, calling a dispatch function to have it for any geometry-type, forwarding to its point specialization, and finally calling a traits class, defining a typedef type with a coordinate system. We don’t show that all here again. We only show the definition of a few coordinate systems:

struct cartesian {};

template<typename DegreeOrRadian>
struct geographic
{
    typedef DegreeOrRadian units;
};

So Cartesian is simple, for geographic we can also select if its coordinates are stored in degrees or in radians.

The distance function will now change: it will select the computation method for the corresponding coordinate system and then call the dispatch struct for distance. We call the computation method specialized for coordinate systems a strategy. So the new version of the distance function is:

template <typename G1, typename G2>
double distance(G1 const& g1, G2 const& g2)
{
    typedef typename strategy_distance
        <
            typename coordinate_system<G1>::type,
            typename coordinate_system<G2>::type,
            typename point_type<G1>::type,
            typename point_type<G2>::type,
            dimension<G1>::value
        >::type strategy;

    return dispatch::distance
        <
            typename tag<G1>::type,
            typename tag<G2>::type,
            G1, G2, strategy
        >::apply(g1, g2, strategy());
}

The strategy_distance mentioned here is a struct with specializations for different coordinate systems.

template <typename T1, typename T2, typename P1, typename P2, int D>
struct strategy_distance
{
    typedef void type;
};

template <typename P1, typename P2, int D>
struct strategy_distance<cartesian, cartesian, P1, P2, D>
{
    typedef pythagoras<P1, P2, D> type;
};

So, here is our pythagoras again, now defined as a strategy. The distance dispatch function just calls its apply method.

So this is an important step: for spherical or geographical coordinate systems, another strategy (computation method) can be implemented. For spherical coordinate systems have the haversine formula. So the dispatching traits struct is specialized like this

template <typename P1, typename P2, int D = 2>
struct strategy_distance<spherical, spherical, P1, P2, D>
{
    typedef haversine<P1, P2> type;
};

// struct haversine with apply function
// is omitted here

For geography, we have some alternatives for distance calculation. There is the Andoyer method, fast and precise, and there is the Vincenty method, slower and more precise, and there are some less precise approaches as well.

Per coordinate system, one strategy is defined as the default strategy. To be able to use another strategy as well, we modify our design again and add an overload for the distance algorithm, taking a strategy object as a third parameter.

This new overload distance function also has the advantage that the strategy can be constructed outside the distance function. Because it was constructed inside above, it could not have construction parameters. But for Andoyer or Vincenty, or the haversine formula, it certainly makes sense to have a constructor taking the radius of the earth as a parameter.

So, the distance overloaded function is:

template <typename G1, typename G2, typename S>
double distance(G1 const& g1, G2 const& g2, S const& strategy)
{
    return dispatch::distance
        <
            typename tag<G1>::type,
            typename tag<G2>::type,
            G1, G2, S
        >::apply(g1, g2, strategy);
}

The strategy has to have a method apply taking two points as arguments (for points). It is not required that it is a static method. A strategy might define a constructor, where a configuration value is passed and stored as a member variable. In those cases a static method would be inconvenient. It can be implemented as a normal method (with the const qualifier).

We do not list all implementations here, Vincenty would cover half a page of mathematics, but you will understand the idea. We can call distance like this:

distance(c1, c2)

where c1 and c2 are Cartesian points, or like this:

distance(g1, g2)

where g1 and g2 are Geographic points, calling the default strategy for Geographic points (e.g. Andoyer), and like this:

distance(g1, g2, vincenty<G1, G2>(6275))

where a strategy is specified explicitly and constructed with a radius.

Point Concept

The five traits classes mentioned in the previous sections form together the Point Concept. Any point type for which specializations are implemented in the traits namespace should be accepted as a valid type. So the Point Concept consists of:

The last one is a class, containing the method get and the (optional) method set, the first four are metafunctions, either defining type or declaring a value (conform MPL conventions).

So we now have agnosticism for the number of dimensions, agnosticism for coordinate systems; the design can handle any coordinate type, and it can handle different geometry types. Furthermore we can specify our own strategies, the code will not compile in case of two points with different dimensions (because of the assertion), and it will not compile for two points with different coordinate systems (because there is no specialization). A library can check if a point type fulfills the requirements imposed by the concepts. This is handled in the upcoming section Concept Checking.

Return Type

We promised that calling std::sqrt was not always necessary. So we define a distance result struct that contains the squared value and is convertible to a double value. This, however, only has to be done for pythagoras. The spherical distance functions do not take the square root so for them it is not necessary to avoid the expensive square root call; they can just return their distance.

So the distance result struct is dependant on strategy, therefore made a member type of the strategy. The result struct looks like this:

template<typename T = double>
struct cartesian_distance
{
    T sq;
    explicit cartesian_distance(T const& v) : sq (v) {}

    inline operator T() const
    {
        return std::sqrt(sq);
    }
};

It also has operators defined to compare itself to other results without taking the square root.

Each strategy should define its return type, within the strategy class, for example:

typedef cartesian_distance<T> return_type;

or:

typedef double return_type;

for cartesian (pythagoras) and spherical, respectively.

Again our distance function will be modified, as expected, to reflect the new return type. For the overload with a strategy it is not complex:

template < typename G1, typename G2, typename Strategy >
typename Strategy::return_type distance( G1 const& G1 , G2 const& G2 , S const& strategy)

But for the one without strategy we have to select strategy, coordinate type, etc. It would be spacious to do it in one line so we add a separate meta-function:

template <typename G1, typename G2 = G1>
struct distance_result
{
    typedef typename point_type<G1>::type P1;
    typedef typename point_type<G2>::type P2;
    typedef typename strategy_distance
        <
            typename cs_tag<P1>::type,
            typename cs_tag<P2>::type,
            P1, P2
        >::type S;

    typedef typename S::return_type type;
};

and modify our distance function:

template <typename G1, typename G2>
inline typename distance_result<G1, G2>::type distance(G1 const& g1, G2 const& g2)
{
    // ...
}

Of course also the apply functions in the dispatch specializations will return a result like this. They have a strategy as a template parameter everywhere, making the less verbose version possible.

Axis Order

This is an important note for users who develop GIS applications, require compliance with OGC standards or use variety of coordinate reference systems (CRS).

Boost.Geometry convention is that coordinate values of 2D tuple is always given according to mathematical axis order: X,Y. Considering GIS applications, that always means easting,northing or, in terms of geographic coordinate system: longitude,latitude. Boost.Geometry does not allow or respect coordinate values listed in the axis order as specified by coordinate reference system (CRS).

In practice, users may easily adapt point types with alternate axis ordering and specify coordinate access in order as expected by Boost.Geometry.

Summary

In this design rationale, Boost.Geometry is step by step designed using tag dispatching, concepts, traits, and metaprogramming. We used the well-known distance function to show the design.

Boost.Geometry is designed like described here, with some more techniques as automatically reversing template arguments, tag casting, and reusing implementation classes or dispatch classes as policies in other dispatch classes.


PrevUpHomeNext