...one of the most highly
regarded and expertly designed C++ library projects in the
world.

— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards

Conditional type selection is the simplest basic construct of C++ template metaprogramming. Veldhuizen [Vel95a] was the first to show how to implement it, and Czarnecki and Eisenecker [CE00] first presented it as a standalone library primitive. The MPL defines the corresponding facility as follows:

template< typename Condition , typename T1 , typename T2 > struct if_ { typedef /*unspecified*/ type; };

Note that the first template parameter of the template is a type.

// usage/semantics typedef mpl::if_<mpl::true_c,char,long>::type t1; typedef mpl::if_<mpl::false_c,char,long>::type t2; BOOST_MPL_ASSERT_IS_SAME(t1, char); BOOST_MPL_ASSERT_IS_SAME(t2, long);

The construct is important because template metaprograms often contain a lot of decision-making code, and, as we will show, spelling it manually every time via (partial) class template specialization quickly becomes impractical. The template is also important from the point of encapsulating the compiler workarounds.

The way the C++ template instantiation mechanism works imposes some subtle limitations on applicability of the type selection primitive (`if_`), compared to a manually implemented equivalent of the selection code. For example, suppose we are implementing a `pointed_type` traits template such that `pointed_type<T>::type` instantiated for a `T` that is either a plain pointer (`U*`), `std::auto_ptr<U>`, or any of the Boost smart pointers [SPL], e.g. `boost::scoped_ptr<U>`, will give us the pointed type (`U`):

BOOST_MPL_ASSERT_IS_SAME(pointed_type<my*>::type, my); BOOST_MPL_ASSERT_IS_SAME(pointed_type< std::auto_ptr<my> >::type, my); BOOST_MPL_ASSERT_IS_SAME(pointed_type< boost::scoped_ptr<my> >::type, my);

Unfortunately, the straightforward application of `if_` to this problem does not work: ^{1}

template< typename T > struct pointed_type : mpl::if_< boost::is_pointer<T> , typename boost::remove_pointer<T>::type , typename T::element_type // #1 > { }; // the following code causes compilation error in line #1: // name followed by "::" must be a class or namespace name typedef pointed_type<char*>::type result;

Clearly, the expression `typename T::element_type` is not valid in the case of `T == char*`, and that's what the compiler is complaining about. Implementing the selection code manually solves the problem:

namespace aux { // general case template< typename T, bool is_pointer = false > struct select_pointed_type { typedef typename T::element_type type; }; // specialization for plain pointers template< typename T > struct select_pointed_type<T,true> { typedef typename boost::remove_pointer<T>::type type; }; } template< typename T > struct pointed_type : aux::select_pointed_type< T, boost::is_pointer<T>::value > { };

But this quickly becomes awkward if needs to be done repeatedly, and this awkwardness is compounded when partial specialization is not available. We can try to work around the problem as follows:

namespace aux { template< typename T > struct element_type { typedef typename T::element_type type; }; } template< typename T > struct pointed_type { typedef typename mpl::if_< boost::is_pointer<T> , typename boost::remove_pointer<T>::type , typename aux::element_type<T>::type >::type type; };

but this doesn't work either - the access to the `aux::element_type<T>`'s nested `type` member still forces the compiler to instantiate `element_type<T>` with `T == char*`, and that instantiation is, of course, invalid. Also, although in our case this does not lead to a compile error, the `boost::remove_pointer<T>` template always gets instantiated as well, and for the same reason (because we are accessing its nested `type` member). Unnecessary instantiation that is not fatal may or may be not a problem, depending on the ‘weight’ of the template (how much the instantiation taxes the compiler), but a general rule of thumb would be to avoid such code.

Returning to our error, to make the above code compile, we need to factor the act of ‘asking’ `aux::element_type<T>` for its nested `type` out of the `if_` invocation. The fact that both the `boost::remove_pointer<T>` trait template and `aux::element_type<T>` use the same naming convention for their result types makes the refactoring easier:

template< typename T > struct pointed_type { private: typedef typename mpl::if_< boost::is_pointer<T> , boost::remove_pointer<T> , aux::element_type<T> >::type func_; public: typedef typename func_::type type; };

Now the compiler is guaranteed not to instantiate both `boost::remove_pointer<T>` and `aux::element_type<T>`, even although they are used as actual parameters to the `if_` template, so we are allowed to get away with `aux::element_type<char*>` so long as it won't end up being selected as `func_`.

The above technique is so common in template metaprograms, that it even makes sense to facilitate the selection of a nested `type` member by introducing a high-level equivalent to `if_` - the one that will do the `func_::type` operation (that is called [nullary] metafunction class application) as a part of its invocation. The MPL provides such template - it's called `apply_if`. Using it, we can re-write the above code as simple as:

template< typename T > struct pointed_type { typedef typename mpl::apply_if< boost::is_pointer<T> , boost::remove_pointer<T> , aux::element_type<T> >::type type; };

To make our techniques review complete, let's consider a slightly different example - suppose we want to define a high-level wrapper around `boost::remove_pointer` traits template [TTL], which will strip the pointer qualification conditionally. We will call it `remove_pointer_if`:

template< typename Condition , typename T > struct remove_pointer_if { typedef typename mpl::if_< Condition , typename boost::remove_pointer<T>::type , T >::type type; };

Now the above works the first time, but it suffers from the problem we mentioned earlier - `boost::remove_pointer<T>` gets instantiated even if its result is never used. In the metaprogramming world compilation time is an important resource [Abr01], and it is wasted by unnecessary template instantiations. We've just seen how to deal with the problem when both arguments to `if_` are the results of nullary metafunction class applications, but in this example one of the arguments (`T`) is just a simple type, so the refactoring just doesn't seem possible.

The easiest way out of this situation would be to pass to `if_` a real nullary metafunction instead of `T` - the one that returns `T` on its invocation. The MPL provides a simple way to do it - we just substitute `identity<T>` and `apply_if` for `T` and `if_`:

template< typename Condition , typename T > struct remove_pointer_if { typedef typename mpl::apply_if< Condition , boost::remove_pointer<T> , mpl::identity<T> >::type type; };

which gives us exactly what we wanted.

In C++, the basic underlying language construct which allows parameterized compile-time computation is the *class template* ([ISO98], section 14.5.1 [temp.class]). A bare class template is the simplest possible model we could choose for metafunctions: it can take types and/or non-type arguments as actual template parameters, and instantiation ‘returns’ a new type. For example, the following produces a type derived from its arguments:

template< typename T1, typename T2 > struct derive : T1, T2 { };

However, this model is far too limiting: it restricts the metafunction result not only to class types, but to instantiations of a given class template, to say nothing of the fact that every metafunction invocation introduces an additional level of template nesting. While that might be acceptable for this particular metafunction, any model which prevented us from ‘returning’, say, `int` is obviously not general enough. To meet this basic requirement, we must rely on a nested type to provide our return value:

template< typename T1, typename T2 > struct derive { struct type : N1, N2 {}; }; // silly specialization, but demonstrates "returning" int template<> struct derive<void,void> { typedef int type; };

Veldhuizen [Vel95a] was first to talk about class templates of this form as ‘compile-time functions’, and Czarnecki and Eisenecker [CE00] have introduced ‘template metafunction’ as an equivalent term (they also use the simpler term ‘metafunction’, as do we). Czarnecki and Eisenecker have also recognized the limitations of the simple metafunction representation and suggested the form that we discuss in Section 2.2.3.

While syntactically simple, the simple template metafunction form does not always interact optimally with the rest of C++. In particular, the simple metafunction form makes it unnecessarily awkward and tedious to define and work with higher-order metafunctions (metafunctions that operate on other metafunctions). In order to pass a simple metafunction to another template, we need to use *template template parameters*:

// returns F(T1,F(T2,T3)) template< template<typename> class F , typename T1 , typename T2 , typename T3 > struct apply_twice { typedef typename F< T1 , typename F<T2,T3>::type >::type type; }; // a new metafunction returning a type derived from T1, T2, and T3 template< typename T1 , typename T2 , typename T3 > struct derive3 : apply_twice<derive,T1,T2,T3> { };

This looks different, but it seems to work. ^{2} However, things begin to break down noticeably when we want to ‘return’ a metafunction from our metafunction:

// returns G s.t. G(T1,T2,T3) == F(T1,F(T2,T3)) template< template<typename> class F > struct compose_self { template< typename T1 , typename T2 , typename T3 > struct type : apply_twice<F,T1,T2,T3> { }; };

The first and most obvious problem is that the result of applying `compose_self` is not itself a type, but a template, so it can't be passed in the usual ways to other metafunctions. A more subtle issue, however, is that the metafunction ‘returned’ is not exactly what we intended. Although it acts just like `apply_twice`, it differs in one important respect: its identity. In the C++ type system, `compose_self<F>::template type<T,U,V>` is not a synonym for `apply_twice<F,T,U,V>`, and any metaprogram which compared metafunctions would discover that fact.

Because C++ makes a strict distinction between type and class template template parameters, reliance on simple metafunctions creates a ‘wall’ between metafunctions and metadata, relegating metafunctions to the status of second-class citizens. For example, recalling our introduction to type sequences, there's no way to make a `cons` list of metafunctions:

typedef cons<derive, cons<derive3, nil> > derive_functions; // error!

We might consider redefining our `cons` cell so we can pass `derive` as the head element:

template < template< template<typename T, typename U> class F , typename Tail > struct cons;

However, now we have another problem: C++ templates are polymorphic with respect to their type arguments, but not with respect to template template parameters. The arity (number of parameters) of any template template parameter is strictly enforced, so we *still* can't embed `derive3` in a `cons` list. Moreover, polymorphism *between* types and metafunctions is not supported (the compiler expects one or the other), and as we've seen, the syntax and semantics of ‘returned’ metafunctions is different from that of returned types. Trying to accomplish everything with the simple template metafunction form would seriously limit the applicability of higher-order metafunctions and would have an overall negative effect on the both conceptual and implementation clarity, simplicity and size of the library.

Fortunately, the truism that ‘there is no problem in software which can't be solved by adding yet another level of indirection’ applies here. To elevate metafunctions to the status of first-class objects, the MPL introduces the concept of a ‘metafunction class’:

// metafunction class form of derive struct derive { template< typename N1, typename N2 > struct apply { struct type : N1, N2 {}; }; };

This form should look familiar to anyone acquainted with function objects in STL, with the nested `apply` template taking the same role as the runtime function-call operator. In fact, compile-time metafunction classes have the same relationship to metafunctions that runtime function objects have to functions:

// function form of add template< typename T > T add(T x, T y) { return x + y; } // function object form of add struct add { template< typename T > T operator()(T x, T y) { return x + y; } };

The metafunction class form solves all the problems with ordinary template metafunction mentioned earlier: since it is a regular class, it can be placed in compile-time metadata sequences and manipulated by other metafunctions using the same protocols as for any other metadata. We thereby avoid the code-duplication needed to provide versions of each library component to operate on ordinary metadata and on metafunctions with each distinct supported arity.

On the other hand, it seems that accepting metafunction classes as *the* representation for compile-time function entities imposes code duplication danger as well: if the library's own primitives, algorithms, etc. are represented as class templates, that means that one either cannot reuse these algorithms in the context of higher-order functions, or she have to duplicate all algorithms in the second form, so, for instance, there would be two versions of `find`:

// user-friendly form template< typename Sequence , typename T > struct find { typedef /* ... */ type; }; // "metafunction class" form struct find_func { template< typename Sequence, typename T > struct apply { typedef /* ... */ type; }; };

Of course, the third option is to eliminate ‘user-friendly form’ completely so one would always have to write:

typedef mpl::find::apply<list,long>::type iter; // or, if one prefers, // typedef mpl::apply< mpl::find,list,long >::type iter;

instead of

typedef mpl::find<list,long>::type iter;

That too would hurt usability, considering that the direct invocations of library's algorithms are far more often-used than passing algorithms as arguments to other algorithms/metafunctions.

The MPL's answer to this dilemma is *lambda expressions*. Lambda is the mechanism that enables the library to curry metafunctions and convert them into metafunction classes, so when one wants to pass the `find` algorithm as an argument to a higher-order metafunction, she just write:

using namespace mpl::placeholder; typedef mpl::apply< my_f, mpl::find<_1,_2> >::type result;

where `_1` and `_2` are placeholders for the first and second arguments to the resulting metafunction class. This preserves the intuitive syntax below for when the user wants to use `find` directly in her code:

typedef mpl::find<list,long>::type iter;

Lambda facility is described in more details in Section 3.

Compile-time iteration over a sequence (of types) is one of the basic concepts of template metaprogramming. Differences in types of objects being manipulated is the most common point of variability of similar but not identical code/design, and such designs are the direct target for some metaprogramming. Templates were originally designed to solve this exact problem (e.g. `std::vector`). However, without predefined abstractions/constructs for manipulating/iterating over *sequences* of types (as opposed to standalone types), and without known techniques for emulating these constructs using the current language facilities, their effect on helping high-level metaprogramming happen has been limited.

Czarnecki and Eisenecker [CE98], [CE00] were the first to introduce compile-time sequences of types and some simple algorithms on them, although the idea of representing common data structures like trees, lists, etc. at compile time, using class template composition has been around for a while (e.g. most of the expression template libraries build such trees as a part of their expression "parsing" process [Vel95b]). Alexandrescu [Ale01] used lists of types and some algorithms on them to implement several design patterns; the accompanying code is known as the Loki library [Loki].

Most of the algorithms in the Boost Metaprogramming Library operate on sequences. For example, searching for a type in a list looks like this:

typedef mpl::list<char,short,int,long,float,double> types; typedef mpl::find<types,long>::type iter;

Here, `find` accepts two parameters - a sequence to search (`types`) and the type to search for (`long`) - and returns an iterator `iter` pointing to the first element of the sequence such that `iter::type` is identical to `long`. If no such element exists, `iter` is identical to `end<types>::type`. Basically, this is how one would search for a value in a `std::list` or `std::vector`, except that `mpl::find` accepts the sequence as a single parameter, while `std::find` takes two iterators. Everything else is pretty much the same - the names are the same, the semantics are very close, there are iterators, and one can search not only by type, but also by using a predicate:

typedef mpl::find_if< types,boost::is_float<_> >::type iter;

This conceptual/syntactical similarity with the STL is not coincidental. Reusing the conceptual framework of the STL in the compile-time world allows us to apply familiar and sound approaches for dealing with sequential data structures. The algorithms and idioms which programmers already know from the STL can be applied again at compile-time. We consider this to be one of MPL's greatest strengths, distinguishing it from earlier attempts to build a template metaprogramming library.

In the `find` example above, we searched for the type in a sequence built using the `mpl::list` template; but `list` is not the only sequence that the library provides. Neither is `mpl::find` or any other algorithm hard-coded to work only with `list` sequences. `list` is just one model of MPL's Forward Sequence concept, and `find` works with anything that satisfies this concept's requirements. The hierarchy of sequence concepts in MPL is quite simple - a Sequence is any compile-time entity for which `begin<>` and `end<>` produce iterators to the range of its elements; a Forward Sequence is a Sequence whose iterators satisfy Forward Iterator requirements; a Bidirectional Sequence is a Forward Sequence whose iterators satisfy Bidirectional Iterator requirements; finally, a Random Access Sequence is a Bidirectional Sequence whose iterators satisfy Random Access Iterator requirements. ^{3}

Decoupling algorithms from particular sequence implementations (through iterators) allows a metaprogrammer to create her own sequence types and to retain the rest of the library at her disposal. For example, one can define a `tiny_list` for dealing with sequences of three types as follows:

template< typename TinyList, long Pos > struct tiny_list_item; template< typename TinyList, long Pos > struct tiny_list_iterator { typedef typename tiny_list_item<TinyList,Pos>::type type; typedef tiny_list_iterator<TinyList, Pos-1> prior; typedef tiny_list_iterator<TinyList, Pos+1> next; }; template< typename T0, typename T1, typename T2 > struct tiny_list { typedef tiny_list_iterator<tiny_list, 0> begin; typedef tiny_list_iterator<tiny_list, 3> end; typedef T0 type0; typedef T1 type1; typedef T2 type2; }; template< typename TinyList > struct tiny_list_item<TinyList,0> { typedef typename TinyList::type0 type; }; template< typename TinyList > struct tiny_list_item<TinyList,1> { typedef typename TinyList::type1 type; }; template< typename TinyList > struct tiny_list_item<TinyList,2> { typedef typename TinyList::type2 type; };

and then use it with any of the library algorithms as if it were `mpl::list`:

typedef tiny_list< char,short,int > types; typedef mpl::transform< types , boost::add_pointer<_1> >::type pointers;

Note that `tiny_list` is a model of Bidirectional Sequence; it would be a Random Access Sequence if we added `advance` and `distance` members to `tiny_list_iterator`:

template< typename TinyList, long Pos > struct tiny_list_iterator { static long const position = Pos; typedef typename tiny_list_item<TinyList,Pos>::type type; typedef tiny_list_iterator<TinyList, Pos-1> prior; typedef tiny_list_iterator<TinyList, Pos+1> next; template< typename N > struct advance { typedef tiny_list_iterator< TinyList , Pos + N::value > type; }; template< typename Other > struct distance { typedef mpl::integral_c< long , Other::position - position > type; }; };

While the `tiny_list` itself might be not that interesting (after all, it can hold only three elements), if the technique above could be automated so we would be able to define not-so-tiny sequences (with five, ten, twenty, etc. elements), it would be very valuable. ^{4}

External code generation is an option, but there exists a solution within the language. However, it is not a template metaprogramming, but rather *preprocessor metaprogramming*. In fact, MPL's `vector` - a fixed-size type sequence that provides random-access iterators - is implemented very much like the above `tiny_list` - using the Boost Preprocessor library [PRE].

So, the library provides its users with almost complete compile-time equivalent of the STL framework. Does it help them to solve their metaprogramming tasks? Let's return to our earlier `largest` example to see if we can rewrite it in a better way with what MPL has to offer. Well, actually, there is not much to look at, because the MPL implementation is a one-liner (we'll spread it out here for readability) ^{5} :

template< typename Sequence > struct largest { typedef typename mpl::max_element< Sequence mpl::less< mpl::size_of<_1> , mpl::size_of<_2> > >::type iter; typedef typename iter::type type; };

There are no more termination conditions with tricky pattern matching, no more partial specializations; and even more importantly, it's *obvious* what the above code does - even although it's all templates - something that one could not say about the original version.

For the purpose of examining a little bit more of the library's internal structure, let's look at how `max_element` from the above example is implemented. One might expect that *now* we will again see all these awkward partial specializations, esoteric pattern matching, etc. Well, let's see:

template< typename Sequence , typename Predicate > struct max_element { typedef typename mpl::iter_fold< Sequence , typename mpl::begin<Sequence>::type , if_< less< deref<_1>,deref<_2> >, _2, _1 > >::type type; };

The first thing to notice here is that this algorithm is implemented in terms of another one: `iter_fold`. In fact, this is probably the most important point of the example, because nearly all other generic sequence algorithms in the library are implemented in terms of `iter_fold`. If a user should ever need to implement her own sequence algorithm, she'll almost certainly be able to do so using this primitive, which means she won't have to resort to implementing hand-crafted iteration, pattern matching of special cases for loop termination, or workarounds for lack of partial specialization. It also means that her algorithm will automatically benefit from any optimizations the library has implemented, (e.g. recursion unrolling), and that it will work with any sequence that is a model of ForwardSequence, because `iter_fold` does not require anything more of its sequence argument.

`iter_fold` algorithm is basically a compile-time equivalent of the `fold` or `reduce` functions that comprise the basic and well-known primitives of many functional programming languages. An analogy more familiar to a C++ programmer would be the `std::accumulate` algorithm from the C++ standard library ([ISO98], section 26.4.1 [lib.accumulate]). However, `iter_fold` is designed to take advantage of the natural characteristics of recursive traversal: it accepts *two* metafunction class arguments, the first of which is applied to the state "on the way in" and the second of which is applied "on the way out".

The interface to `iter_fold` is defined in MPL as follows:

template< typename Sequence , typename InitialState , typename ForwardOp , typename BackwardOp = _1 > struct iter_fold { typedef /*unspecified*/ type; };

The algorithm ‘returns’ the result of two-way successive applications of binary `ForwardOp` and `BackwardOp` operations to iterators in range [`begin<Sequence>::type`, `end<Sequence>::type`) and previous result of an operation; the `InitialState` is logically placed before the sequence and included in the forward traversal. The result `type` is identical to `InitialState` if the sequence is empty.

The library also provides `iter_fold_backward`, `fold`, and `fold_backward` algorithms which wrap `iter_fold` to accommodate its most common usage patterns.

What we've seen so far were sequences (and algorithms on sequences) of types. It is both possible and easy to manipulate compile-time *values* using the library as well. The only thing to remember is that in C++, class template non-type template parameters give us one more example of non-polymorphic behavior. In other words, if one declared a metafunction to take a non-type template parameter (e.g. `long`) it's not possible to pass anything besides compile-time integral constants to it:

template< long N1, long N2 > struct equal_to { static bool const value = (N1 == N2); }; equal_to<5,5>::value; // ok equal_to<int,int>::value; // error!

And of course this doesn't work the other way around either:

typedef mpl::list<1,2,3,4,5> numbers; // error!

While this may be an obvious limitation, it imposes yet another dilemma on the library design: on the one hand, we don't want to restrict users to type manipulations only, and on the other hand, full support for integral manipulations would require at least duplication of most of the library facilities ^{6} - the same situation as we would have if we had chosen to represent metafunctions as ordinary class templates. The solution for this issue is the same as well: we represent integral values by wrapping them in types ^{7} . For example, to create a list of numbers one can write:

typedef mpl::list< mpl::int_c<1> , mpl::int_c<2> , mpl::int_c<3> , mpl::int_c<4> , mpl::int_c<5> > numbers;

Wrapping integral constants into types to make them first-class citizens is important well inside metaprograms, where one often doesn't know (and doesn't care) if the metafunctions she is using operate on types, integral values, other metafunctions, or something else, like fixed-point or rational numbers (`mpl::fixed_c` and `mpl::rational_c`).

But, from the user's perspective, the above example is much more verbose than the shorter, incorrect one. Thus, for the purpose of convenience, the library does provide users with a template that takes non-type template parameters, but offers a more compact notation:

typedef mpl::list_c<long,1,2,3,4,5> numbers;

There is a similar `vector` counterpart as well:

typedef mpl::vector_c<long,1,2,3,4,5> numbers;

Previous efforts to provide generalized metaprogramming facilities for C++ have always concentrated on `cons`-style type lists and a few core algorithms like `size` and `at`, which are tied to the specific sequence implementation. Such systems have an elegant simplicity reminiscent of the analogous functionality in pure functional Lisp. It is much more time-consuming to implement even a basic set of the sequence algorithms provided by equivalent run-time libraries (the STL in particular), but if we have learned anything from the STL, it is that tying those algorithms' implementations to a specific sequence implementation is a misguided effort!

The truth is that there is no single ‘best’ type sequence implementation for the same reasons that there will never be a single ‘best’ runtime sequence implementation. Furthermore, there are *already* quite a number of type list implementations in use today; and just as the STL algorithms can operate on sequences which don't come from STL containers, so the MPL algorithms are designed to work with foreign type sequences.

It may be an eye-opening fact for some that type lists are not the only useful compile-time sequence. Again, the need for a variety of compile-time containers arises for the same reasons that we have lists, vectors, deques, and sets in the C++ standard library - different containers have different functional and performance characteristics which determine not only applicability and efficiency of particular algorithms, but also the expressiveness or verbosity of the code that uses them. While runtime performance is not an issue for C++ metaprograms, compilation speed is often a significant bottleneck to advanced C++ software development [Abr01].

The MPL provides five built-in sequences: `list`, `list_c` (really just a `list` of value wrappers), `vector`, a randomly-accessible sequence of fixed maximum size, `vector_c`, and `range_c`, a randomly-accessible sequence of consecutive integral values. More important, however, is its ability to adapt to arbitrary sequence types. The only core operations that a sequence is required to provide in order to be used with the library algorithms are `begin<>` and `end<>` metafunctions which "return" iterators into the sequence. As with the STL, it is the iterators which are used to implement most of the general-purpose sequence algorithms the library provides. Also, as with the STL, algorithm specialization is used to take advantage of implementation knowledge about particular sequences: many of the "basic" sequence operations such as `back<>`, `front<>`, `size<>`, and `at<>` are specialized on sequence type to provide a more efficient implementation than the fully generic version.

Almost coincidentally, loop unrolling can be as important to compile-time iterative algorithms as it is to runtime algorithms. To see why, one must first remember that all "loops" in C++ metaprograms, are in fact, implemented with recursion, and that the template instantiation depth can be a valuable resource in a compiler implementation. In fact, Annex B of the C++ standard ([ISO98], annex B [limits]) *recommends* a minimum depth of 17 recursively nested template instantiations; but this is far too low for many serious metaprograms, some of which easily exceed the hard-coded instantiation limits of some otherwise excellent compilers. To see how this works in action, let's examine a straightforward implementation of the `fold` metafunction, which combines some algorithm state with each element of a sequence:

namespace aux { // unspecialized version combines the initial state and first element // and recurses to process the rest template< typename Start , typename Finish , typename State , typename BinaryFunction > struct fold_impl : fold_impl< typename Start::next , Finish , typename apply< BinaryFunction , State , typename Start::type >::type , BinaryFunction > { }; // specialization for loop termination template< typename Finish , typename State , typename BinaryFunction > struct fold_impl<Finish,Finish,State,BinaryFunction> { typedef State type; }; } // namespace aux // public interface template< typename Sequence , typename State , typename ForwardOp > struct fold : aux::fold_impl< , typename begin<Sequence>::type , typename end<Sequence>::type , State , typename lambda<ForwardOp>::type > { };

Although simple and elegant, this implementation will always incur at least as many levels of recursive template instantiation as there are elements in the input sequence. ^{8} The library addresses this problem by explicitly "unrolling" the recursion. To apply the technique to our `fold` example, we begin by factoring out a single step of the algorithm. Our `fold_impl_step` metafunction has two results: `type` (the next state), and `iterator` (the next sequence position).

template< typename BinaryFunction , typename State , typename Start , typename Finish > struct fold_impl_step { typedef typename apply< BinaryFunction , State , typename Start::type >::type type; typedef typename Start::next iterator; };

As with our main algorithm implementation, we specialize for the loop termination condition so that the step becomes a no-op:

template< typename BinaryFunction , typename State , typename Finish > struct fold_impl_step<BinaryFunction,State,Finish,Finish> { typedef State type; typedef Finish iterator; };

Now we can now reduce `fold`'s instantiation depth by any constant factor N simply by inserting N invocations of `fold_impl_step`. Here we've chosen a factor of 4:

template< typename Start , typename Finish , typename State , typename BinaryFunction > struct fold_impl { private: typedef fold_impl_step< BinaryFunction , State , Start , Finish > next1; typedef fold_impl_step< BinaryFunction , typename next1::type , typename next1::iterator , Finish > next2; typedef fold_impl_step< BinaryFunction , typename next2::type , typename next2::iterator , Finish > next3; typedef fold_impl_step< BinaryFunction , typename next3::type , typename next3::iterator , Finish > next4; typedef fold_impl_step< typename next4::iterator , Finish , typename next4::type , BinaryFunction > recursion; public: typedef typename recursion::type type; };

The MPL applies this unrolling technique across all algorithms with an unrolling factor tuned according to the demands of the C++ implementation in use, and with an option for the user to override the value. ^{9} This fact enables users to push beyond the metaprogramming limits they would usually encounter with more naive algorithm implementations. Experiments also show a small (up to 10%) increase in metaprogram instantiation speed on some compilers when loop unrolling is used.