...one of the most highly regarded and expertly designed C++ library projects in the world.
-- Herb Sutter and Andrei Alexandrescu, C++ Coding Standards
The current version of Spirit is a complete rewrite of earlier versions (we refer to earlier versions as Spirit.Classic). The parser generators are now only one part of the whole library. The parser submodule of Spirit is now called Spirit.Qi. It is conceptually different and exposes a completely different interface. Generally, there is no easy (or automated) way of converting parsers written for Spirit.Classic to Spirit.Qi. Therefore this section can give only guidelines on how to approach porting your older parsers to the current version of Spirit.
The overall directory structure of the Spirit directories is described in the section Include Structure and the FAQ entry Header Hell. This should give you a good overview of how to find the header files needed for your new parsers. Moreover, each section in the Qi Reference lists the include files required for the component it describes.
The name of a header file tells you which version it belongs to. All main include files for Spirit.Classic have the string 'classic_' in their name, for instance:
#include <boost/spirit/include/classic_core.hpp>
we named all main include files for Spirit.Qi to have the string 'qi_' as part of their name, for instance:
#include <boost/spirit/include/qi_core.hpp>
The following table gives a rough list of corresponding header files in Spirit.Classic and Spirit.Qi. It can be used as a starting point only, as several components have either been moved to different submodules or might not exist in the newer version anymore. We list only include files for the topmost submodules. For the header files required by lower-level components, please refer to the reference documentation of the corresponding component.
| Include file in Spirit.Classic | Include file in Spirit.Qi |
|---|---|
|  |  |
|  | none, use Boost.Phoenix for writing semantic actions |
|  | none, use local variables for rules instead of closures, the primitives parsers now directly support lazy parameterization |
|  |  |
|  |  |
|  | none, use Spirit.Qi predicates instead of if_p, while_p, for_p (included by |
|  | none, included in |
|  | none |
|  | none, included in |
|  | none, not part of Spirit.Qi anymore, these components will be added over time to the Repository |
The free parse functions (i.e. the main parser API) have changed. This includes the names of the free functions as well as their interface. In Spirit.Classic all free functions were named parse. In Spirit.Qi they are named either qi::parse or qi::phrase_parse, depending on whether the parsing should be done using a skipper (qi::phrase_parse) or not (qi::parse). All free functions now return a simple bool. A returned true means success (i.e. the parser has matched) and a returned false means failure (i.e. the parser didn't match). This is equivalent to the hit member of the old parse_info. Spirit.Qi doesn't support tracking of the matched input length anymore. The old parse_info member full can be emulated by comparing the iterators after qi::parse has returned.
All code examples in this section assume the following include statements and using directives to be inserted. For Spirit.Classic:
#include <boost/spirit/include/classic.hpp>
#include <boost/spirit/include/phoenix1.hpp>
#include <iostream>
#include <string>
using namespace boost::spirit::classic;
and for Spirit.Qi:
#include <boost/spirit/include/qi.hpp>
#include <boost/phoenix/operator.hpp>
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>   // std::distance
using namespace boost::spirit;
The following similar examples should clarify the differences. First the base example in Spirit.Classic:
std::string input("1,1");
parse_info<std::string::iterator> pi = parse(input.begin(), input.end(), int_p);

if (pi.hit)
    std::cout << "successful match!\n";

if (pi.full)
    std::cout << "full match!\n";
else
    std::cout << "stopped at: " << std::string(pi.stop, input.end()) << "\n";

std::cout << "matched length: " << pi.length << "\n";
And here is the equivalent piece of code using Spirit.Qi:
std::string input("1,1");
std::string::iterator it = input.begin();
bool result = qi::parse(it, input.end(), qi::int_);

if (result)
    std::cout << "successful match!\n";

if (it == input.end())
    std::cout << "full match!\n";
else
    std::cout << "stopped at: " << std::string(it, input.end()) << "\n";

// seldom needed: use std::distance to calculate the length of the match
std::cout << "matched length: " << std::distance(input.begin(), it) << "\n";
The changes required for phrase parsing (i.e. parsing using a skipper) are similar. Here is how phrase parsing works in Spirit.Classic:
std::string input(" 1, 1");
parse_info<std::string::iterator> pi = parse(input.begin(), input.end(), int_p, space_p);

if (pi.hit)
    std::cout << "successful match!\n";

if (pi.full)
    std::cout << "full match!\n";
else
    std::cout << "stopped at: " << std::string(pi.stop, input.end()) << "\n";

std::cout << "matched length: " << pi.length << "\n";
And here the equivalent example in Spirit.Qi:
std::string input(" 1, 1");
std::string::iterator it = input.begin();
bool result = qi::phrase_parse(it, input.end(), qi::int_, ascii::space);

if (result)
    std::cout << "successful match!\n";

if (it == input.end())
    std::cout << "full match!\n";
else
    std::cout << "stopped at: " << std::string(it, input.end()) << "\n";

// seldom needed: use std::distance to calculate the length of the match
std::cout << "matched length: " << std::distance(input.begin(), it) << "\n";
Note how character parsers live in a separate namespace (here boost::spirit::ascii::space), as Spirit.Qi now supports working with different character sets. See the section Character Encoding Namespace for more information.
In Spirit.Classic all parser primitives have suffixes appended to their names, encoding their type: "_p" for parsers, "_a" for lazy actions, "_d" for directives, etc. In Spirit.Qi we don't have anything similar. The only suffix is a single trailing underscore "_", applied where the name would otherwise conflict with a keyword or predefined name (such as int_ for the integer parser). Overall, most, if not all, primitive parsers and directives have been renamed. Please see the Qi Quick Reference for an overview of the names of the available parser primitives, directives and operators.
In Spirit.Classic most of the parser primitives don't expose a specific attribute type. Most parsers expose the pair of iterators pointing to the matched input sequence. Since all parsers in Spirit.Qi expose a parser-specific attribute type, Spirit.Qi introduces a special directive, raw[], which achieves a similar effect to Spirit.Classic. The raw[] directive exposes the pair of iterators pointing to the sequence matched by its embedded parser. Even though we very much encourage you to rewrite your parsers to take advantage of the generated parser-specific attributes, it is sometimes helpful to get access to the underlying matched input sequence.
The grammar<> and rule<> types are of equal importance to Spirit.Qi as they are for Spirit.Classic. Their main purpose is still the same: they allow you to define non-terminals, and they are the main building blocks for more complex parsers. Nevertheless, both types have been redesigned and their interfaces have changed. Let's have a look at two examples first; we'll explain the differences afterwards.
Here is a simple grammar and its usage in Spirit.Classic:
struct roman : public grammar<roman>
{
    template <typename ScannerT>
    struct definition
    {
        definition(roman const& self)
        {
            hundreds.add
                ("C"  , 100)("CC"  , 200)("CCC"  , 300)("CD" , 400)("D" , 500)
                ("DC" , 600)("DCC" , 700)("DCCC" , 800)("CM" , 900) ;

            tens.add
                ("X"  , 10)("XX"  , 20)("XXX"  , 30)("XL" , 40)("L" , 50)
                ("LX" , 60)("LXX" , 70)("LXXX" , 80)("XC" , 90) ;

            ones.add
                ("I"  , 1)("II"  , 2)("III"  , 3)("IV" , 4)("V" , 5)
                ("VI" , 6)("VII" , 7)("VIII" , 8)("IX" , 9) ;

            first = eps_p         [phoenix::var(self.r) = phoenix::val(0)]
                >>  (  +ch_p('M') [phoenix::var(self.r) += phoenix::val(1000)]
                    || hundreds   [phoenix::var(self.r) += phoenix::_1]
                    || tens       [phoenix::var(self.r) += phoenix::_1]
                    || ones       [phoenix::var(self.r) += phoenix::_1]
                    ) ;
        }

        rule<ScannerT> first;
        symbols<unsigned> hundreds;
        symbols<unsigned> tens;
        symbols<unsigned> ones;

        rule<ScannerT> const& start() const { return first; }
    };

    roman(unsigned& r_) : r(r_) {}
    unsigned& r;
};

std::string input("MMIX");        // MMIX == 2009
unsigned value = 0;
roman r(value);
parse_info<std::string::iterator> pi = parse(input.begin(), input.end(), r);
if (pi.hit)
    std::cout << "successfully matched: " << value << "\n";
And here is a similar grammar and its usage in Spirit.Qi:
template <typename Iterator>
struct roman : qi::grammar<Iterator, unsigned()>
{
    roman() : roman::base_type(first)
    {
        hundreds.add
            ("C"  , 100)("CC"  , 200)("CCC"  , 300)("CD" , 400)("D" , 500)
            ("DC" , 600)("DCC" , 700)("DCCC" , 800)("CM" , 900) ;

        tens.add
            ("X"  , 10)("XX"  , 20)("XXX"  , 30)("XL" , 40)("L" , 50)
            ("LX" , 60)("LXX" , 70)("LXXX" , 80)("XC" , 90) ;

        ones.add
            ("I"  , 1)("II"  , 2)("III"  , 3)("IV" , 4)("V" , 5)
            ("VI" , 6)("VII" , 7)("VIII" , 8)("IX" , 9) ;

        // qi::_val refers to the attribute of the rule on the left hand side
        first = eps          [qi::_val = 0]
            >>  (  +lit('M') [qi::_val += 1000]
                || hundreds  [qi::_val += qi::_1]
                || tens      [qi::_val += qi::_1]
                || ones      [qi::_val += qi::_1]
                ) ;
    }

    qi::rule<Iterator, unsigned()> first;
    qi::symbols<char, unsigned> hundreds;
    qi::symbols<char, unsigned> tens;
    qi::symbols<char, unsigned> ones;
};

std::string input("MMIX");        // MMIX == 2009
std::string::iterator it = input.begin();
unsigned value = 0;
roman<std::string::iterator> r;
if (qi::parse(it, input.end(), r, value))
    std::cout << "successfully matched: " << value << "\n";
Both versions look similar enough, but we see several differences (we will cover each of those differences in more detail below):
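- Neither grammars nor rules depend on a scanner type anymore; both depend only on the underlying iterator type.
- Grammars have no embedded class definition anymore.
- Grammars and rules may have an explicit attribute type specified in their definition.
- Grammars do not need an explicit start() function anymore; the start rule is passed to the grammar base class instead.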
The first two points are tightly interrelated. The scanner business (see the number one FAQ entry of Spirit.Classic: The Scanner Business) has been a problem for a long time. The grammar and rule types have been specifically redesigned to avoid this problem in the future. This also means that we don't need any delayed instantiation of the inner definition class in a grammar anymore. So the redesign not only helped fix a long-standing design problem, it also simplified things considerably.
All Spirit.Qi parser components have well-defined attribute types. Grammars and rules are no exception. But since both need to be generic enough to be usable for any parser, their attribute type has to be explicitly specified. In the example above, the roman grammar and the rule first both have an unsigned attribute:
// grammar definition
template <typename Iterator>
struct roman : qi::grammar<Iterator, unsigned()> {...};

// rule definition
qi::rule<Iterator, unsigned()> first;
The notation used resembles the definition of a function type. This is very natural, as you can think of the synthesized attribute of the grammar and the rule as its 'return value'. In fact, the rule and the grammar both 'return' an unsigned value: the value they matched.
Note | |
---|---|
The function type notation also allows you to specify parameters. These are interpreted as the types of the inherited attributes that the rule or grammar expects to be passed during parsing. For more information please see the section about inherited and synthesized attributes for rules and grammars (Attributes). |
If no attribute is desired, none needs to be specified. The default attribute type for both grammars and rules is unused_type, which is a special placeholder type. Generally, using unused_type as the attribute of a parser is interpreted as 'this parser has no attribute'. This is mostly used for parsers applied to parts of the input that carry no significant information, such as delimiters or structural elements needed for correct interpretation of the input.
The last difference might seem rather cosmetic and insignificant. But it turns out that not having to specify which rule in a grammar is the start rule (by returning it from the function start()) also means that any rule in a grammar can be used directly as the start rule. Nevertheless, the grammar base class gets initialized with the rule it has to use as the start rule in case the grammar instance is used directly as a parser.