In-depth: The Parser

What makes Spirit tick? Now on to some details... The parser class is the most fundamental entity in the framework. A parser accepts a scanner comprised of a first-last iterator pair and returns a match object as its result. The iterators delimit the data currently being parsed. The match object evaluates to true if the parse succeeds, in which case the input is advanced accordingly. Each parser can represent a specific pattern or algorithm, or it can be a more complex parser formed as a composition of other parsers.

All parsers inherit from the base template class, parser:

template <typename DerivedT>
struct parser
{
    /*...*/

    DerivedT& derived();
    DerivedT const& derived() const;
};

This class is a protocol base class for all parsers. The parser class does not really know how to parse anything but instead relies on the template parameter DerivedT to do the actual parsing. This technique is known as the "Curiously Recurring Template Pattern" in template meta-programming circles. This inheritance strategy gives us the power of polymorphism without the virtual function overhead. In essence this is a way to implement compile time polymorphism.
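
To make the pattern concrete, here is a minimal sketch of compile time polymorphism through a CRTP base. The names parser_sketch, digit_sketch, upper_sketch, test and accepts are hypothetical stand-ins invented for illustration; they are not part of Spirit. The functions are marked constexpr only so that the behaviour can be checked at compile time.

```cpp
// Simplified stand-in for a CRTP base class; not Spirit's actual definition.
template <typename DerivedT>
struct parser_sketch
{
    // Downcast to the concrete parser. This is valid because DerivedT
    // inherits from parser_sketch<DerivedT>.
    constexpr DerivedT const& derived() const
    { return static_cast<DerivedT const&>(*this); }
};

// Two hypothetical concrete parsers, each supplying its own test().
struct digit_sketch : parser_sketch<digit_sketch>
{
    constexpr bool test(char ch) const { return ch >= '0' && ch <= '9'; }
};

struct upper_sketch : parser_sketch<upper_sketch>
{
    constexpr bool test(char ch) const { return ch >= 'A' && ch <= 'Z'; }
};

// Generic code written against the base. The call to test() is resolved
// at compile time, so no virtual function table is involved.
template <typename DerivedT>
constexpr bool accepts(parser_sketch<DerivedT> const& p, char ch)
{
    return p.derived().test(ch);
}
```

With these definitions, accepts(digit_sketch(), '7') selects digit_sketch::test and accepts(upper_sketch(), 'Q') selects upper_sketch::test, without either class declaring a virtual function.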

parser_category_t

Each derived parser has a typedef parser_category_t that defines its category. By default, if one is not specified, it will inherit from the base parser class which typedefs its parser_category_t as plain_parser_category. Some template classes are provided to distinguish different types of parsers. The following categories are the most generic. More specific types may inherit from these.

Parser categories

plain_parser_category     Your plain vanilla parser
binary_parser_category    A parser that has two subjects, a and b (e.g. alternative)
unary_parser_category     A parser that has a single subject (e.g. kleene star)
action_parser_category    A parser with an attached semantic action

    struct plain_parser_category {};
    struct binary_parser_category       : plain_parser_category {};
    struct unary_parser_category        : plain_parser_category {};
    struct action_parser_category       : unary_parser_category {};
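
One way such category tags can be put to use is overload selection on the tag type. The sketch below repeats the four category structs so that it is self-contained. The helper subject_count and the parser types primitive_sketch, star_sketch, alternative_sketch and action_sketch are hypothetical names for illustration only.

```cpp
struct plain_parser_category {};
struct binary_parser_category : plain_parser_category {};
struct unary_parser_category  : plain_parser_category {};
struct action_parser_category : unary_parser_category {};

// Hypothetical helper: one overload per generic category.
constexpr int subject_count(plain_parser_category)  { return 0; }
constexpr int subject_count(unary_parser_category)  { return 1; }
constexpr int subject_count(binary_parser_category) { return 2; }

// Dispatch on the parser's parser_category_t typedef.
template <typename ParserT>
constexpr int subject_count_of()
{
    return subject_count(typename ParserT::parser_category_t());
}

// Hypothetical parsers that only carry a category.
struct primitive_sketch   { typedef plain_parser_category  parser_category_t; };
struct star_sketch        { typedef unary_parser_category  parser_category_t; };
struct alternative_sketch { typedef binary_parser_category parser_category_t; };
struct action_sketch      { typedef action_parser_category parser_category_t; };
```

Because action_parser_category derives from unary_parser_category, action_sketch is handled by the unary overload without needing an overload of its own. This is what allows more specific categories to inherit from the generic ones.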

embed_t

Each parser has a typedef embed_t. This typedef specifies how a parser is embedded in a composite. By default, if one is not specified, the parser will be embedded by value. That is, a copy of the parser is placed as a member variable of the composite. Most parsers are embedded by value. In certain situations, however, this is not desirable or possible. One particular example is the rule. The rule, unlike other parsers, is embedded by reference.
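
As a rough illustration of how a composite might consult embed_t, consider the sketch below. All names ending in _sketch are hypothetical and the classes are heavily simplified; only the idea of a by-value default and a by-reference override comes from the text above.

```cpp
#include <type_traits>

// Simplified base: embed by value unless the derived parser says otherwise.
template <typename DerivedT>
struct parser_sketch
{
    typedef DerivedT embed_t;
};

// An ordinary parser keeps the inherited by-value default.
struct char_sketch : parser_sketch<char_sketch> {};

// A rule-like parser overrides embed_t so composites hold a reference.
struct rule_sketch : parser_sketch<rule_sketch>
{
    typedef rule_sketch const& embed_t;
};

// A unary composite stores its subject as the subject's embed_t.
template <typename SubjectT>
struct kleene_sketch
{
    typedef typename SubjectT::embed_t subject_embed_t;
    subject_embed_t subject;
};

// The ordinary parser is copied into the composite...
static_assert(std::is_same<kleene_sketch<char_sketch>::subject_embed_t,
                           char_sketch>::value, "embedded by value");
// ...while the rule-like parser is only referred to.
static_assert(std::is_reference<kleene_sketch<rule_sketch>::subject_embed_t>::value,
              "embedded by reference");
```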

The match

The match holds the result of a parser. A match object evaluates to true when a successful match is found, otherwise false. The length of the match is the number of characters (or tokens) that were successfully matched. This can be queried through its length() member function. A negative value means that the match is unsuccessful.

Each parser may have an associated attribute. This attribute is also returned to the client on a successful parse through the match object. We can get this attribute via the match's value() member function. Be warned though that the match's attribute may be invalid, in which case getting the attribute will result in an exception. The member function has_valid_attribute() can be queried to know if it is safe to get the match's attribute. The attribute may be set at any time through the member function value(v), where v is the new attribute value.

A match attribute is valid:

The match attribute is undefined:

The match class:

    template <typename T>
    class match
    {
    public:

        /*...*/

        typedef T attr_t;
        operator safe_bool() const; // convertible to a bool
        int length() const;
        bool has_valid_attribute() const;
        void value(T const&) const;
        T const& value();
    };
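
The checks a client is expected to make before reading the attribute can be sketched as follows. The struct match_sketch is a hypothetical stand-in that only mimics the shape of the interface above with an int attribute, and attribute_or is a made-up helper; neither is Spirit code.

```cpp
// Hypothetical stand-in mimicking the match interface for an int attribute.
struct match_sketch
{
    int  len;    // negative means the match was unsuccessful
    bool valid;  // whether attr holds a usable value
    int  attr;

    constexpr explicit operator bool() const { return len >= 0; }
    constexpr int  length() const { return len; }
    constexpr bool has_valid_attribute() const { return valid; }
    constexpr int  value() const { return attr; }
};

// Hypothetical helper: read the attribute only when the match succeeded
// and has_valid_attribute() says it is safe, else return the fallback.
template <typename MatchT, typename T>
constexpr T attribute_or(MatchT const& m, T const& fallback)
{
    return (m && m.has_valid_attribute()) ? T(m.value()) : fallback;
}
```

For example, attribute_or(match_sketch{3, true, 42}, -1) yields 42, while a failed match such as match_sketch{-1, false, 0} yields the fallback.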

match_result

It has been mentioned repeatedly that the parser returns a match object as its result. This is a simplification. Actually, for the sake of genericity, parsers are really not hard-coded to return a match object. More accurately, a parser returns an object that adheres to a conceptual interface, of which the match is an example. Nevertheless, we shall call the result type of a parser a match object regardless of whether it is actually a match class, a derivative or a totally unrelated type.

Meta-functions

What are meta-functions? We all know what functions look like. In simplest terms, a function accepts some arguments and returns a result. Here is the function we all love so much:

int identity_func(int arg)
{ return arg; } // return the argument arg

Meta-functions are essentially the same. These beasts also accept arguments and return a result. However, while functions work at runtime on values, meta-functions work at compile time on types (or constants, but we shall deal only with types). The meta-function is a template class (or struct). The template parameters are the arguments to the meta-function and a typedef within the class is the meta-function's return type. Here is the corresponding meta-function:

template <typename ArgT>
struct identity_meta_func
{ typedef ArgT type; }; // return the argument ArgT

The meta-function above is invoked as:

typename identity_meta_func<ArgT>::type

By convention, meta-functions return the result through the typedef type. Take note that typename is only required within templates.
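
The remark about typename can be illustrated with a short sketch. identity_meta_func is repeated from above; plain_int and holder are hypothetical names added for the example, and the static_assert lines use std::is_same from <type_traits> purely to confirm the returned types.

```cpp
#include <type_traits>

template <typename ArgT>
struct identity_meta_func
{ typedef ArgT type; }; // return the argument ArgT

// Outside a template the argument is a concrete type,
// so no typename keyword is needed.
typedef identity_meta_func<int>::type plain_int;

// Inside a template the result depends on T,
// so typename is required.
template <typename T>
struct holder
{
    typedef typename identity_meta_func<T>::type value_type;
};

static_assert(std::is_same<plain_int, int>::value,
              "identity_meta_func<int> returns int");
static_assert(std::is_same<holder<double>::value_type, double>::value,
              "identity_meta_func<T> returns T");
```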

The actual match type used by the parser depends on two types: the parser's attribute type and the scanner type. match_result is the meta-function that returns the desired match type given an attribute type and a scanner type.

Usage:

    typename match_result<ScannerT, T>::type

The meta-function basically answers the question "given a scanner type ScannerT and an attribute type T, what is the desired match type?" [ typename is only required within templates ].

The parse member function

Concrete sub-classes inheriting from parser must have a corresponding member function parse(...) compatible with the conceptual interface:

    template <typename ScannerT>
    RT
    parse(ScannerT const& scan) const;

where RT is the desired return type of the parser.
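
A minimal sketch of a parse member function with this shape follows. It assumes a scanner that holds the first iterator by reference so that a successful parse can advance the input. scanner_sketch, match_sketch, digit_parser_sketch and consumed_by_digit are hypothetical stand-ins, not Spirit's own classes, and the constexpr markers (C++14) are there only to allow compile time checks.

```cpp
// Hypothetical scanner: a first-last pair, with first held by reference.
struct scanner_sketch
{
    char const*& first; // advanced on a successful match
    char const*  last;
};

// Hypothetical match result: negative length means failure.
struct match_sketch
{
    int len;
    constexpr explicit operator bool() const { return len >= 0; }
    constexpr int length() const { return len; }
};

// A parser that matches exactly one decimal digit.
struct digit_parser_sketch
{
    template <typename ScannerT>
    constexpr match_sketch parse(ScannerT const& scan) const
    {
        if (scan.first != scan.last && *scan.first >= '0' && *scan.first <= '9')
        {
            ++scan.first;           // consume the matched character
            return match_sketch{1};
        }
        return match_sketch{-1};    // no match: input is left untouched
    }
};

// Hypothetical driver: returns how many characters were consumed,
// or -1 when the parse fails.
constexpr int consumed_by_digit(char const* text, int size)
{
    char const* first = text;
    scanner_sketch scan{first, text + size};
    match_sketch hit = digit_parser_sketch().parse(scan);
    return hit ? static_cast<int>(first - text) : -1;
}
```

Note that parse takes the scanner by const reference yet still advances the input, because the scanner refers to the iterator rather than owning a copy of it.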

The parser result

Concrete sub-classes inheriting from parser in most cases need to have a nested meta-function result that returns the result type of the parser's parse member function, given a scanner type. The meta-function has the form:

    template <typename ScannerT>
    struct result
    {
        typedef RT type;
    };

where RT is the desired return type of the parser. This is usually, but not always, dependent on the template parameter ScannerT. For example, given an attribute type int, we can use the match_result metafunction:

    template <typename ScannerT>
    struct result
    {
        typedef typename match_result<ScannerT, int>::type type;
    };

If a parser does not supply a result metafunction, a default is provided by the base parser class. The default is declared as:

    template <typename ScannerT>
    struct result
    {
        typedef typename match_result<ScannerT, nil_t>::type type;
    };

Without a result metafunction, notice that the parser's default attribute is nil_t (i.e. the parser has no attribute).
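
The difference between inheriting the default result and supplying one can be sketched with simplified stand-ins. Every name ending in _sketch below is hypothetical; in particular match_result_sketch ignores the scanner type, which the real match_result need not do.

```cpp
#include <type_traits>

struct nil_t_sketch {};   // stand-in for "no attribute"
struct scanner_sketch {}; // stand-in scanner type

template <typename T>
struct match_sketch { typedef T attr_t; };

// Simplified: the match type here depends on the attribute type only.
template <typename ScannerT, typename T>
struct match_result_sketch { typedef match_sketch<T> type; };

// Base class supplying the default result meta-function.
template <typename DerivedT>
struct parser_sketch
{
    template <typename ScannerT>
    struct result
    {
        typedef typename match_result_sketch<ScannerT, nil_t_sketch>::type type;
    };
};

// No result of its own: inherits the default (no attribute).
struct space_sketch : parser_sketch<space_sketch> {};

// Supplies its own result: the attribute type is int.
struct int_sketch : parser_sketch<int_sketch>
{
    template <typename ScannerT>
    struct result
    {
        typedef typename match_result_sketch<ScannerT, int>::type type;
    };
};

static_assert(std::is_same<space_sketch::result<scanner_sketch>::type::attr_t,
                           nil_t_sketch>::value, "default attribute");
static_assert(std::is_same<int_sketch::result<scanner_sketch>::type::attr_t,
                           int>::value, "custom attribute");
```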

parser_result

Given a scanner type ScannerT and a parser type ParserT, what will be the actual result of the parser? The answer to this question is provided by the parser_result meta-function.

Usage:

    typename parser_result<ParserT, ScannerT>::type

In general, the meta-function just forwards the invocation to the parser's result meta-function:

    template <typename ParserT, typename ScannerT>
    struct parser_result
    {
        typedef typename ParserT::template result<ScannerT>::type type;
    };

This is similar to a global function calling a member function. Most of the time, the usage above is equivalent to:

    typename ParserT::template result<ScannerT>::type

Yet, this should not be relied upon to be true all the time because the parser_result metafunction might be specialized for specific parser and/or scanner types.
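
The sketch below shows why the two spellings can diverge. parser_result_sketch mirrors the forwarding definition above, while word_sketch, scanner_a, scanner_b and the specialization itself are hypothetical and chosen only to make the difference visible.

```cpp
#include <type_traits>

// Forwarding meta-function, mirroring the general definition.
template <typename ParserT, typename ScannerT>
struct parser_result_sketch
{
    typedef typename ParserT::template result<ScannerT>::type type;
};

struct scanner_a {};
struct scanner_b {};

// Hypothetical parser whose nested result is always int.
struct word_sketch
{
    template <typename ScannerT>
    struct result { typedef int type; };
};

// Hypothetical specialization for one parser/scanner combination.
template <>
struct parser_result_sketch<word_sketch, scanner_b>
{
    typedef long type;
};

// Unspecialized case: both spellings agree.
static_assert(std::is_same<parser_result_sketch<word_sketch, scanner_a>::type,
                           word_sketch::result<scanner_a>::type>::value,
              "forwarded to the nested result");
// Specialized case: the nested result is bypassed.
static_assert(!std::is_same<parser_result_sketch<word_sketch, scanner_b>::type,
                            word_sketch::result<scanner_b>::type>::value,
              "specialization overrides the nested result");
```

Code that always goes through the free-standing meta-function picks up such specializations automatically; code that reaches into the parser's nested result does not.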

The parser_result metafunction makes the signature of the required parse member function almost canonical:

    template <typename ScannerT>
    typename parser_result<self_t, ScannerT>::type
    parse(ScannerT const& scan) const;

where self_t is a typedef to the parser.

parser class declaration

    template <typename DerivedT>
    struct parser
    {
        typedef DerivedT                embed_t;
        typedef DerivedT                derived_t;
        typedef plain_parser_category   parser_category_t;

        template <typename ScannerT>
        struct result
        {
            typedef typename match_result<ScannerT, nil_t>::type type;
        };

        DerivedT& derived();
        DerivedT const& derived() const;

        template <typename ActionT>
        action<DerivedT, ActionT>
        operator[](ActionT const& actor) const;
    };