"...one of the most highly regarded and expertly designed C++ library projects in the world."
-- Herb Sutter and Andrei Alexandrescu, C++ Coding Standards
This section is not for the faint of heart. In here are distilled the inner workings of Spirit.Qi parsers, using real code from the Spirit library as examples. On the other hand, there is no reason to fear reading on. We tried to explain things step by step while highlighting the important insights.
The parser class is the base class for all parsers.
template <typename Derived>
struct parser
{
    struct parser_id;
    typedef Derived derived_type;
    typedef qi::domain domain;

    // Requirement: p.parse(f, l, context, skip, attr) -> bool
    //
    //  p:          a parser
    //  f, l:       first/last iterator pair
    //  context:    enclosing rule context (can be unused_type)
    //  skip:       skipper (can be unused_type)
    //  attr:       attribute (can be unused_type)

    // Requirement: p.what(context) -> info
    //
    //  p:          a parser
    //  context:    enclosing rule context (can be unused_type)

    // Requirement: P::template attribute<Ctx, Iter>::type
    //
    //  P:          a parser type
    //  Ctx:        A context type (can be unused_type)
    //  Iter:       An iterator type (can be unused_type)

    Derived const& derived() const
    {
        return *static_cast<Derived const*>(this);
    }
};
The parser class does not really know how to parse anything but instead relies on the template parameter Derived to do the actual parsing. This technique is known as the "Curiously Recurring Template Pattern" in template meta-programming circles. This inheritance strategy gives us the power of polymorphism without the virtual function overhead. In essence, this is a way to implement compile-time polymorphism.
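To make the pattern concrete, here is a minimal standalone sketch of CRTP dispatch. It is not Spirit code: the namespace sketch, char_parser and parse_with are hypothetical names invented for illustration, and the parse signature is deliberately cut down to an iterator pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {

// Base class: knows nothing about parsing, only how to reach Derived.
template <typename Derived>
struct parser {
    Derived const& derived() const {
        return *static_cast<Derived const*>(this);
    }
};

// Hypothetical derived parser that matches one specific character.
struct char_parser : parser<char_parser> {
    explicit char_parser(char c) : ch(c) {}

    bool parse(std::string::const_iterator& first,
               std::string::const_iterator const& last) const {
        if (first != last && *first == ch) {
            ++first;
            return true;
        }
        return false;
    }

    char ch;
};

// Generic code accepts any parser<Derived>. The call to parse() is
// resolved at compile time through derived(); no virtual call is made.
template <typename Derived>
bool parse_with(parser<Derived> const& p, std::string const& input,
                std::size_t& consumed) {
    std::string::const_iterator first = input.begin();
    bool ok = p.derived().parse(first, input.end());
    consumed = static_cast<std::size_t>(first - input.begin());
    return ok;
}

} // namespace sketch
```

Calling sketch::parse_with(sketch::char_parser('a'), "abc", n) goes through the base class reference, yet ends up in char_parser::parse without any virtual function table.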
The Derived parsers, PrimitiveParser, UnaryParser and BinaryParser, provide the necessary facilities for parser detection, introspection, transformation and visitation.
Derived parsers must support the following:
bool parse(f, l, context, skip, attr)
f, l: first/last iterator pair
context: enclosing rule context (can be unused_type)
skip: skipper (can be unused_type)
attr: attribute (can be unused_type)
parse is the main parser entry point. skipper can be an unused_type. It's a type used everywhere in Spirit to signify "don't-care". There is an overload of skip_over for unused_type that is simply a no-op. That way, we do not have to write multiple parse functions for phrase and character level parsing.
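The no-op overload idea can be sketched in a few lines of standalone C++. Everything below is an illustrative stand-in rather than Spirit's implementation: unused_type is an empty struct here, and space_skipper and parse_char are hypothetical helpers.

```cpp
#include <cassert>
#include <cctype>
#include <string>

namespace sketch {

struct unused_type {};  // stand-in for Spirit's "don't-care" type

// Hypothetical skipper that consumes one whitespace character per call.
struct space_skipper {
    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last) const {
        if (first != last &&
            std::isspace(static_cast<unsigned char>(*first))) {
            ++first;
            return true;
        }
        return false;
    }
};

// General case: apply the skipper until it no longer matches.
template <typename Iterator, typename Skipper>
void skip_over(Iterator& first, Iterator const& last,
               Skipper const& skipper) {
    while (skipper.parse(first, last)) {}
}

// Overload for unused_type: character-level parsing, nothing to skip.
template <typename Iterator>
void skip_over(Iterator&, Iterator const&, unused_type) {}

// A single parse function serves both phrase and character level parsing.
template <typename Iterator, typename Skipper>
bool parse_char(Iterator& first, Iterator const& last, char expected,
                Skipper const& skipper) {
    skip_over(first, last, skipper);
    if (first != last && *first == expected) {
        ++first;
        return true;
    }
    return false;
}

} // namespace sketch
```

Passing sketch::space_skipper() gives phrase level behaviour; passing sketch::unused_type() selects the empty overload, so the same parse_char matches at the character level with no skipping code generated.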
Here are the basic rules for parsing:
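The parser returns true if successful, false otherwise.
If successful, first is incremented N number of times, where N is the number of characters parsed. N can be zero -- an empty (epsilon) match.
If successful, the parsed attribute is assigned to attr.
If unsuccessful, first is reset to its position before entering the parser function. attr is untouched.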
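A small standalone function that obeys these rules might look like the sketch below. parse_two_digits is a hypothetical example, not a Spirit component; it works on a copy of the iterator so that a failed match leaves both first and attr exactly as they were.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Parses exactly two decimal digits into attr.
template <typename Iterator>
bool parse_two_digits(Iterator& first, Iterator const& last, int& attr) {
    Iterator it = first;  // work on a copy; failure leaves first alone
    int value = 0;
    for (int i = 0; i < 2; ++i) {
        if (it == last || *it < '0' || *it > '9')
            return false;  // first and attr are untouched
        value = value * 10 + (*it - '0');
        ++it;
    }
    first = it;    // success: advance by N = 2 characters
    attr = value;  // success: assign the parsed attribute
    return true;
}

} // namespace sketch
```

On the input "42x" the function returns true, leaves first pointing at 'x' and stores 42; on "4x" it returns false and neither argument changes.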
info what(context)
context: enclosing rule context (can be unused_type)
The what function should be obvious. It provides some information about “what” the parser is. It is used as a debugging aid, for example.
P::template attribute<Ctx, Iter>::type
P: a parser type
Ctx: a context type (can be unused_type)
Iter: an iterator type (can be unused_type)
The attribute metafunction returns the expected attribute type of the parser. In some cases, this is context dependent.
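The following standalone sketch shows one context-independent and one context-dependent attribute metafunction. The types int_parser, context_value_parser and example_context are hypothetical, and attribute_of is a simplified stand-in for the Spirit trait of the same name.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

struct unused_type {};

// Primitive: the attribute does not depend on the context.
struct int_parser {
    template <typename Context, typename Iterator>
    struct attribute { typedef int type; };
};

// Hypothetical parser whose attribute is taken from the context.
struct context_value_parser {
    template <typename Context, typename Iterator>
    struct attribute { typedef typename Context::value_type type; };
};

// Convenience wrapper that invokes the nested metafunction.
template <typename P, typename Context = unused_type,
          typename Iterator = unused_type>
struct attribute_of {
    typedef typename P::template attribute<Context, Iterator>::type type;
};

// Hypothetical context that exposes a value type.
struct example_context { typedef double value_type; };

static_assert(
    std::is_same<attribute_of<int_parser>::type, int>::value,
    "context-independent attribute");
static_assert(
    std::is_same<attribute_of<context_value_parser, example_context>::type,
                 double>::value,
    "context-dependent attribute");

} // namespace sketch
```

Because the nested template is only instantiated for the contexts actually used, context_value_parser never needs unused_type to have a value_type.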
In this section, we will dissect two parser types:
PrimitiveParser: a parser for primitive data (e.g. integer parsing).
UnaryParser: a parser that has a single subject (e.g. kleene star).
For our dissection study, we will use a Spirit primitive, the int_parser in the boost::spirit::qi namespace.
template <
    typename T
  , unsigned Radix = 10
  , unsigned MinDigits = 1
  , int MaxDigits = -1>
struct int_parser_impl
  : primitive_parser<int_parser_impl<T, Radix, MinDigits, MaxDigits> >
{
    // check template parameter 'Radix' for validity
    BOOST_SPIRIT_ASSERT_MSG(
        Radix == 2 || Radix == 8 || Radix == 10 || Radix == 16,
        not_supported_radix, ());

    template <typename Context, typename Iterator>
    struct attribute
    {
        typedef T type;
    };

    template <typename Iterator, typename Context
      , typename Skipper, typename Attribute>
    bool parse(Iterator& first, Iterator const& last
      , Context& /*context*/, Skipper const& skipper
      , Attribute& attr) const
    {
        qi::skip_over(first, last, skipper);
        return extract_int<T, Radix, MinDigits, MaxDigits>
            ::call(first, last, attr);
    }

    template <typename Context>
    info what(Context& /*context*/) const
    {
        return info("integer");
    }
};
The int_parser is derived from primitive_parser<Derived>, which in turn derives from parser<Derived>. Therefore, it supports the following requirements:
The parse member function
The what member function
The nested attribute metafunction
parse is the main entry point. For primitive parsers, the first thing to do is call:
qi::skip_over(first, last, skipper);
to do a pre-skip. After pre-skipping, the parser proceeds to do its thing.
The actual parsing code is placed in extract_int<T, Radix, MinDigits, MaxDigits>::call(first, last, attr);
This simple no-frills protocol is one of the reasons why Spirit is fast. If you know the internals of Spirit.Classic and perhaps even wrote some parsers with it, this simple Spirit mechanism is a joy to work with. There are no scanners and all that crap.
The what function just tells us that it is an integer parser. Simple.
The attribute metafunction returns the T template parameter. We associate the int_parser with some placeholders for the short_, int_, long_ and long_long types. But first, we enable these placeholders in namespace boost::spirit:
template <>
struct use_terminal<qi::domain, tag::short_>     // enables short_
  : mpl::true_ {};

template <>
struct use_terminal<qi::domain, tag::int_>       // enables int_
  : mpl::true_ {};

template <>
struct use_terminal<qi::domain, tag::long_>      // enables long_
  : mpl::true_ {};

template <>
struct use_terminal<qi::domain, tag::long_long>  // enables long_long
  : mpl::true_ {};
Notice that int_parser is placed in the namespace boost::spirit::qi while these enablers are in namespace boost::spirit. The reason is that these placeholders are shared by other Spirit domains. Spirit.Qi, the parser, is one domain. Spirit.Karma, the generator, is another domain. Other parser technologies may be developed and placed in yet another domain. Yet, all these can potentially share the same placeholders for interoperability. The interpretation of these placeholders is domain-specific.
Now that we enabled the placeholders, we have to write generators for them. The make_xxx stuff (in boost::spirit::qi namespace):
template <typename T>
struct make_int
{
    typedef int_parser_impl<T> result_type;
    result_type operator()(unused_type, unused_type) const
    {
        return result_type();
    }
};
The one above is our main generator. It's a simple function object with 2 (unused) arguments. These arguments are there to pass information to inner parser nodes. We'll see how that works later.
template <typename Modifiers> struct make_primitive<tag::short_, Modifiers> : make_int<short> {};
template <typename Modifiers> struct make_primitive<tag::int_, Modifiers> : make_int<int> {};
template <typename Modifiers> struct make_primitive<tag::long_, Modifiers> : make_int<long> {};
template <typename Modifiers> struct make_primitive<tag::long_long, Modifiers> : make_int<boost::long_long_type> {};
These specialize qi::make_primitive for specific tags. They all inherit from make_int, which does the actual work.
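The two hooks, enabling a placeholder per domain and mapping its tag to a generator, can be mimicked in standalone C++ as follows. This is only a sketch of the dispatch idea: the domain structs, the reduced int_parser and the undefined primary template are assumptions made for illustration, and std::true_type stands in for mpl::true_.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

struct unused_type {};

namespace tag { struct short_ {}; struct int_ {}; }

struct qi_domain {};     // stand-in for the parser domain
struct karma_domain {};  // stand-in for the generator domain

// Disabled by default; each domain opts in per placeholder tag.
template <typename Domain, typename Tag>
struct use_terminal : std::false_type {};

template <> struct use_terminal<qi_domain, tag::short_> : std::true_type {};
template <> struct use_terminal<qi_domain, tag::int_> : std::true_type {};

// Reduced integer parser; only the attribute type matters here.
template <typename T>
struct int_parser { typedef T attribute_type; };

// Main generator: a function object taking two unused arguments.
template <typename T>
struct make_int {
    typedef int_parser<T> result_type;
    result_type operator()(unused_type, unused_type) const {
        return result_type();
    }
};

// Assumption of this sketch: the primary template is left undefined,
// so a tag without a specialization fails to compile.
template <typename Tag, typename Modifiers>
struct make_primitive;

template <typename Modifiers>
struct make_primitive<tag::short_, Modifiers> : make_int<short> {};

template <typename Modifiers>
struct make_primitive<tag::int_, Modifiers> : make_int<int> {};

} // namespace sketch
```

In this sketch, use_terminal<karma_domain, tag::int_> stays false until that domain adds its own specialization, which mirrors how the same placeholder can be interpreted differently per domain.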
Let me present the kleene star (also in namespace spirit::qi):
template <typename Subject>
struct kleene : unary_parser<kleene<Subject> >
{
    typedef Subject subject_type;

    template <typename Context, typename Iterator>
    struct attribute
    {
        // Build a std::vector from the subject's attribute. Note
        // that build_std_vector may return unused_type if the
        // subject's attribute is an unused_type.
        typedef typename
            traits::build_std_vector<
                typename traits::
                    attribute_of<Subject, Context, Iterator>::type
            >::type
        type;
    };

    kleene(Subject const& subject)
      : subject(subject) {}

    template <typename Iterator, typename Context
      , typename Skipper, typename Attribute>
    bool parse(Iterator& first, Iterator const& last
      , Context& context, Skipper const& skipper
      , Attribute& attr) const
    {
        // create a local value if Attribute is not unused_type
        typedef typename traits::container_value<Attribute>::type
            value_type;
        value_type val = value_type();

        // Repeat while subject parses ok
        while (subject.parse(first, last, context, skipper, val))
        {
            // push the parsed value into our attribute
            traits::push_back(attr, val);
            traits::clear(val);
        }
        return true;
    }

    template <typename Context>
    info what(Context& context) const
    {
        return info("kleene", subject.what(context));
    }

    Subject subject;
};
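It looks similar in form to its primitive cousin, the int_parser. And, again, it has the same basic ingredients required by Derived: the nested attribute metafunction, the parse member function and the what member function.
kleene is a composite parser. It is a parser that composes another parser, its “subject”. It is a UnaryParser and subclasses from unary_parser. Like primitive_parser<Derived>, unary_parser<Derived> derives from parser<Derived>.
unary_parser<Derived> has these expression requirements on Derived:
p.subject: the subject parser (p is a UnaryParser)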
parse is the main parser entry point. Since this is not a primitive parser, we do not need to call qi::skip_over(first, last, skipper). The subject, if it is a primitive, will do the pre-skip. If it is another composite parser, it will eventually call a primitive parser somewhere down the line which will do the pre-skip. This makes it a lot more efficient than Spirit.Classic. Spirit.Classic puts the skipping business into the so-called "scanner" which blindly attempts a pre-skip every time we increment the iterator.
What is the attribute of the kleene? In general, it is a std::vector<T> where T is the attribute of the subject. There is a special case though. If T is an unused_type, then the attribute of kleene is also unused_type. traits::build_std_vector takes care of that minor detail.
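The special case is easy to picture as a metafunction with one explicit specialization. The version below is a standalone sketch of that idea with a stand-in unused_type; it is not the Spirit header.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

namespace sketch {

struct unused_type {};

// General case: wrap the subject's attribute in a std::vector.
template <typename T>
struct build_std_vector {
    typedef std::vector<T> type;
};

// Special case: an unused subject attribute stays unused.
template <>
struct build_std_vector<unused_type> {
    typedef unused_type type;
};

static_assert(
    std::is_same<build_std_vector<int>::type, std::vector<int> >::value,
    "int becomes std::vector<int>");
static_assert(
    std::is_same<build_std_vector<unused_type>::type, unused_type>::value,
    "unused_type stays unused_type");

} // namespace sketch
```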
So, let's parse. First, we need to provide a local attribute for the subject:
typename traits::attribute_of<Subject, Context>::type val;
traits::attribute_of<Subject, Context> simply calls the subject's nested attribute metafunction. (The listing above obtains its local value type from the caller's attribute instead, via traits::container_value<Attribute>.)
val starts out default initialized. This val is the one we'll pass to the subject's parse function.
The kleene repeats indefinitely while the subject parser is successful. On each successful parse, we push_back the parsed attribute to the kleene's attribute, which is expected to be, at the very least, compatible with a std::vector. In other words, although we say that we want our attribute to be a std::vector, we try to be more lenient than that. The caller of kleene's parse may pass a different attribute type. As long as it is also a conforming STL container with push_back, we are ok. Here is the kleene loop:
while (subject.parse(first, last, context, skipper, val))
{
    // push the parsed value into our attribute
    traits::push_back(attr, val);
    traits::clear(val);
}
return true;
Take note that we didn't call attr.push_back(val). Instead, we called a Spirit provided function:
traits::push_back(attr, val);
This is a recurring pattern. The reason why we do it this way is because attr can be unused_type. traits::push_back takes care of that detail.
The overload for unused_type is a no-op. Now, you can imagine why Spirit is fast! The parsers
are so simple and the generated code is as efficient as a hand rolled loop.
All these parser compositions and recursive parse invocations are extensively
inlined by a modern C++ compiler. In the end, you get a tight loop when
you use the kleene. No more excess baggage. If the attribute is unused,
then there is no code generated for that. That's how Spirit
is designed.
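To see the loop and the no-op overload working together, here is a standalone sketch. It drops the context and skipper arguments and fixes the subject's attribute to int, so it is far simpler than the real kleene; digit_parser and the free push_back overloads are hypothetical stand-ins.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

struct unused_type {};

// Container case: forward to the member function.
template <typename Container, typename T>
void push_back(Container& c, T const& val) { c.push_back(val); }

// unused_type case: nothing to store, so do nothing.
template <typename T>
void push_back(unused_type&, T const&) {}

// Hypothetical subject: parses a single decimal digit into an int.
struct digit_parser {
    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last, int& attr) const {
        if (first == last || *first < '0' || *first > '9')
            return false;
        attr = *first - '0';
        ++first;
        return true;
    }
};

// Simplified kleene: no context, no skipper, int subject attribute.
template <typename Subject>
struct kleene {
    explicit kleene(Subject const& s) : subject(s) {}

    template <typename Iterator, typename Attribute>
    bool parse(Iterator& first, Iterator const& last,
               Attribute& attr) const {
        int val = 0;
        // Repeat while the subject parses ok.
        while (subject.parse(first, last, val)) {
            sketch::push_back(attr, val);  // no-op for unused_type
            val = 0;
        }
        return true;  // zero matches is still a success
    }

    Subject subject;
};

} // namespace sketch
```

With a std::vector<int> as the attribute, the input "123a" yields the three digits; with an unused_type attribute the same loop only advances the iterator.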
The what function simply wraps the output of the subject's what function in an info tagged "kleene".
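The nesting can be illustrated with a toy info type. The struct below is a hypothetical stand-in that only stores a tag and an optional nested description; Spirit's own info class is richer, and the what functions here take no context argument.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical stand-in for info: a tag plus optional nested text.
struct info {
    explicit info(std::string const& t) : tag(t) {}
    info(std::string const& t, info const& child)
      : tag(t), value(child.str()) {}

    // Render as "tag" or "tag(nested)".
    std::string str() const {
        return value.empty() ? tag : tag + "(" + value + ")";
    }

    std::string tag;
    std::string value;
};

struct int_parser {
    info what() const { return info("integer"); }
};

template <typename Subject>
struct kleene {
    explicit kleene(Subject const& s) : subject(s) {}

    // Wrap whatever the subject reports in a "kleene" node.
    info what() const { return info("kleene", subject.what()); }

    Subject subject;
};

} // namespace sketch
```

A kleene over the toy int_parser describes itself as kleene(integer) in this sketch, which is the kind of nested description a debugging aid can print.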
Ok, now, like the int_parser, we have to hook our parser to the qi engine. Here's how we do it:
First, we enable the prefix star operator. In proto, it's called the "dereference":
template <>
struct use_operator<qi::domain, proto::tag::dereference> // enables *p
  : mpl::true_ {};
This is done in namespace boost::spirit like its friend, the use_terminal specialization for our int_parser. Obviously, we use use_operator to enable the dereference for the qi::domain.
Then, we need to write our generator (in namespace qi):
template <typename Elements, typename Modifiers> struct make_composite<proto::tag::dereference, Elements, Modifiers> : make_unary_composite<Elements, kleene> {};
This essentially says: for all expressions of the form *p, build a kleene parser. Elements is a Boost.Fusion sequence. For the kleene, which is a unary operator, expect only one element in the sequence. That element is the subject of the kleene.
We still don't care about the Modifiers. We'll see what the modifiers are all about when we look at directives in depth.