Porting from Spirit 1.8.x

The current version of Spirit is a complete rewrite of earlier versions (we refer to earlier versions as Spirit.Classic). The parser generators are now only one part of the whole library. The parser submodule of Spirit is now called Spirit.Qi. It is conceptually different and exposes a completely different interface. Generally, there is no easy (or automated) way of converting parsers written for Spirit.Classic to Spirit.Qi. Therefore this section can give only guidelines on how to approach porting your older parsers to the current version of Spirit.

Include Files

The overall directory structure of the Spirit directories is described in the section Include Structure and the FAQ entry Header Hell. This should give you a good overview on how to find the needed header files for your new parsers. Moreover, each section in the Qi Reference lists the required include files needed for any particular component.

You can tell from the name of a header file which version it belongs to. While all main include files for Spirit.Classic have the string 'classic_' in their name, for instance:

#include <boost/spirit/include/classic_core.hpp>

we named all main include files for Spirit.Qi to have the string 'qi_' as part of their name, for instance:

#include <boost/spirit/include/qi_core.hpp>

The following table gives a rough list of corresponding header files in Spirit.Classic and Spirit.Qi. It can serve as a starting point only, as several components have either been moved to different submodules or might not exist in the newer version anymore. We list only include files for the topmost submodules. For header files required by lower level components, please refer to the reference documentation of the corresponding component.

Include file in Spirit.Classic	Include file in Spirit.Qi
`classic.hpp`	`qi.hpp`
`classic_actor.hpp`	none, use Boost.Phoenix for writing semantic actions
`classic_attribute.hpp`	none, use local variables for rules instead of closures, the primitive parsers now directly support lazy parameterization
`classic_core.hpp`	`qi_core.hpp`
`classic_debug.hpp`	`qi_debug.hpp`
`classic_dynamic.hpp`	none, use Spirit.Qi predicates instead of if_p, while_p, for_p (included by `qi_core.hpp`), the equivalent for lazy_p is now included by `qi_auxiliary.hpp`
`classic_error_handling.hpp`	none, included in `qi_core.hpp`
`classic_meta.hpp`	none
`classic_symbols.hpp`	none, included in `qi_core.hpp`
`classic_utility.hpp`	none, not part of Spirit.Qi anymore, these components will be added over time to the Repository

The Free Parse Functions

The free parse functions (i.e. the main parser API) have changed, both in their names and in their interface. In Spirit.Classic all free functions were named parse. In Spirit.Qi they are named either qi::parse or qi::phrase_parse, depending on whether parsing should use a skipper (qi::phrase_parse) or not (qi::parse). All free functions now return a simple bool: true means success (the parser has matched) and false means failure (the parser didn't match). This is equivalent to the old parse_info member hit. Spirit.Qi doesn't support tracking of the matched input length anymore. The old parse_info member full can be emulated by comparing the iterators after qi::parse has returned, since the first iterator is passed by reference and is advanced past the matched input.

All code examples in this section assume the following include statements and using directives to be inserted. For Spirit.Classic:

#include <boost/spirit/include/classic.hpp>
#include <boost/spirit/include/phoenix1.hpp>
#include <iostream>
#include <string>

using namespace boost::spirit::classic;

and for Spirit.Qi:

#include <boost/spirit/include/qi.hpp>
#include <boost/phoenix/operator.hpp>
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>

using namespace boost::spirit;

The following similar examples should clarify the differences. First the base example in Spirit.Classic:

std::string input("1,1");
parse_info<std::string::iterator> pi = parse(input.begin(), input.end(), int_p);

if (pi.hit)
    std::cout << "successful match!\n";

if (pi.full)
    std::cout << "full match!\n";
else
    std::cout << "stopped at: " << std::string(pi.stop, input.end()) << "\n";

std::cout << "matched length: " << pi.length << "\n";

And here is the equivalent piece of code using Spirit.Qi:

std::string input("1,1");
std::string::iterator it = input.begin();
bool result = qi::parse(it, input.end(), qi::int_);

if (result)
    std::cout << "successful match!\n";

if (it == input.end())
    std::cout << "full match!\n";
else
    std::cout << "stopped at: " << std::string(it, input.end()) << "\n";

// seldom needed: use std::distance to calculate the length of the match
std::cout << "matched length: " << std::distance(input.begin(), it) << "\n";

The changes required for phrase parsing (i.e. parsing using a skipper) are similar. Here is how phrase parsing works in Spirit.Classic:

std::string input(" 1, 1");
parse_info<std::string::iterator> pi = parse(input.begin(), input.end(), int_p, space_p);

if (pi.hit)
    std::cout << "successful match!\n";

if (pi.full)
    std::cout << "full match!\n";
else
    std::cout << "stopped at: " << std::string(pi.stop, input.end()) << "\n";

std::cout << "matched length: " << pi.length << "\n";

And here the equivalent example in Spirit.Qi:

std::string input(" 1, 1");
std::string::iterator it = input.begin();
bool result = qi::phrase_parse(it, input.end(), qi::int_, ascii::space);

if (result)
    std::cout << "successful match!\n";

if (it == input.end())
    std::cout << "full match!\n";
else
    std::cout << "stopped at: " << std::string(it, input.end()) << "\n";

// seldom needed: use std::distance to calculate the length of the match
std::cout << "matched length: " << std::distance(input.begin(), it) << "\n";

Note how character parsers live in a separate namespace (here boost::spirit::ascii::space), as Spirit.Qi now supports working with different character sets. See the section Character Encoding Namespace for more information.

Naming Conventions

In Spirit.Classic all parser primitives have suffixes appended to their names, encoding their type: "_p" for parsers, "_a" for lazy actions, "_d" for directives, etc. Spirit.Qi has nothing similar. The only suffix is a single trailing underscore "_", applied where the name would otherwise conflict with a keyword or predefined name (such as int_ for the integer parser). Overall, most, if not all, primitive parsers and directives have been renamed. Please see the Qi Quick Reference for an overview of the names of the available parser primitives, directives and operators.

Parser Attributes

In Spirit.Classic most of the parser primitives don't expose a specific attribute type; they expose the pair of iterators pointing to the matched input sequence. In Spirit.Qi all parsers expose a parser specific attribute type, so Qi introduces a special directive, raw[], to achieve an effect similar to Spirit.Classic. The raw[] directive exposes the pair of iterators pointing to the sequence matched by its embedded parser. We very much encourage you to rewrite your parsers to take advantage of the generated parser specific attributes, but sometimes it is helpful to get access to the underlying matched input sequence.

Grammars and Rules

The grammar<> and rule<> types are of equal importance to Spirit.Qi as they are to Spirit.Classic. Their main purpose is still the same: they allow you to define non-terminals, and they are the main building blocks for more complex parsers. Nevertheless, both types have been redesigned and their interfaces have changed. Let's have a look at two examples first; we'll explain the differences afterwards. Here is a simple grammar and its usage in Spirit.Classic:

struct roman : public grammar<roman>
{
    template <typename ScannerT>
    struct definition
    {
        definition(roman const& self)
        {
            hundreds.add
                ("C"  , 100)("CC"  , 200)("CCC"  , 300)("CD" , 400)("D" , 500)
                ("DC" , 600)("DCC" , 700)("DCCC" , 800)("CM" , 900) ;

            tens.add
                ("X"  , 10)("XX"  , 20)("XXX"  , 30)("XL" , 40)("L" , 50)
                ("LX" , 60)("LXX" , 70)("LXXX" , 80)("XC" , 90) ;

            ones.add
                ("I"  , 1)("II"  , 2)("III"  , 3)("IV" , 4)("V" , 5)
                ("VI" , 6)("VII" , 7)("VIII" , 8)("IX" , 9) ;

            first = eps_p         [phoenix::var(self.r) = phoenix::val(0)]
                >>  (  +ch_p('M') [phoenix::var(self.r) += phoenix::val(1000)]
                    ||  hundreds  [phoenix::var(self.r) += phoenix::_1]
                    ||  tens      [phoenix::var(self.r) += phoenix::_1]
                    ||  ones      [phoenix::var(self.r) += phoenix::_1]
                    ) ;
        }

        rule<ScannerT> first;
        symbols<unsigned> hundreds;
        symbols<unsigned> tens;
        symbols<unsigned> ones;

        rule<ScannerT> const& start() const { return first; }
    };

    roman(unsigned& r_) : r(r_) {}
    unsigned& r;
};

std::string input("MMIX");        // MMIX == 2009
unsigned value = 0;
roman r(value);
parse_info<std::string::iterator> pi = parse(input.begin(), input.end(), r);
if (pi.hit)
    std::cout << "successfully matched: " << value << "\n";

And here is a similar grammar and its usage in Spirit.Qi:

template <typename Iterator>
struct roman : qi::grammar<Iterator, unsigned()>
{
    roman() : roman::base_type(first)
    {
        hundreds.add
            ("C"  , 100)("CC"  , 200)("CCC"  , 300)("CD" , 400)("D" , 500)
            ("DC" , 600)("DCC" , 700)("DCCC" , 800)("CM" , 900) ;

        tens.add
            ("X"  , 10)("XX"  , 20)("XXX"  , 30)("XL" , 40)("L" , 50)
            ("LX" , 60)("LXX" , 70)("LXXX" , 80)("XC" , 90) ;

        ones.add
            ("I"  , 1)("II"  , 2)("III"  , 3)("IV" , 4)("V" , 5)
            ("VI" , 6)("VII" , 7)("VIII" , 8)("IX" , 9) ;

        // qi::_val refers to the attribute of the rule on the left hand side 
        first = eps          [qi::_val = 0]
            >>  (  +lit('M') [qi::_val += 1000]
                ||  hundreds [qi::_val += qi::_1]
                ||  tens     [qi::_val += qi::_1]
                ||  ones     [qi::_val += qi::_1]
                ) ;
    }

    qi::rule<Iterator, unsigned()> first;
    qi::symbols<char, unsigned> hundreds;
    qi::symbols<char, unsigned> tens;
    qi::symbols<char, unsigned> ones;
};

std::string input("MMIX");        // MMIX == 2009
std::string::iterator it = input.begin();
unsigned value = 0;
roman<std::string::iterator> r;
if (qi::parse(it, input.end(), r, value))
    std::cout << "successfully matched: " << value << "\n";

Both versions look similar enough, but we see several differences (we will cover each of those differences in more detail below):

Neither the grammars nor the rules depend on a scanner type anymore; both depend only on the underlying iterator type. That means the dreaded scanner business is no issue anymore!
Grammars have no embedded class definition anymore.
Grammars and rules may have an explicit attribute type specified in their definition.
Grammars do not have an explicit start() function anymore. Instead, the rule to use as the start rule is passed to the grammar base class on construction.

The first two points are tightly interrelated. The scanner business (see the number one FAQ of Spirit.Classic: The Scanner Business) has been a problem for a long time. The grammar and rule types have been specifically redesigned to avoid this problem in the future. This also means that we don't need any delayed instantiation of the inner definition class in a grammar anymore. So the redesign not only helped to fix a long standing design problem, it also simplified things considerably.

All Spirit.Qi parser components have well defined attribute types. Grammars and rules are no exception. But since both need to be generic enough to be usable for any parser, their attribute type has to be specified explicitly. In the example above the roman grammar and the rule first both have an unsigned attribute:

// grammar definition
template <typename Iterator>
struct roman : qi::grammar<Iterator, unsigned()> {...};

// rule definition
qi::rule<Iterator, unsigned()> first;

The notation used resembles the definition of a function type. This is very natural, as you can think of the synthesized attribute of the grammar and the rule as its 'return value'. In fact the rule and the grammar both 'return' an unsigned value - the value they matched.

	Note
	The function type notation allows you to specify parameters as well. These are interpreted as the types of the inherited attributes the rule or grammar expects to be passed during parsing. For more information please see the section about inherited and synthesized attributes for rules and grammars (Attributes).

If no attribute is desired, none needs to be specified. The default attribute type for both grammars and rules is unused_type, which is a special placeholder type. Generally, using unused_type as the attribute of a parser is interpreted as 'this parser has no attribute'. This is mostly used for parsers applied to parts of the input that carry no significant information, such as delimiters or structural elements needed only for the correct interpretation of the input.

The last difference might seem to be rather cosmetic and insignificant. But it turns out that not having to specify which rule in a grammar is the start rule (by returning it from the function start()) also means that any rule in a grammar can be directly used as the start rule. Nevertheless, the grammar base class gets initialized with the rule it has to use as the start rule in case the grammar instance is directly used as a parser.
