Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

This is the documentation for an old version of Boost. Click here to view this page for the latest version.
PrevUpHomeNext

Employee - Parsing into structs

It's a common question in the Spirit General List: How do I parse and place the results into a C++ struct? Of course, at this point, you already know various ways to do it, using semantic actions. There are many ways to skin a cat. Spirit2, being fully attributed, makes it even easier. The next example demonstrates some features of Spirit2 that make this easy. In the process, you'll learn about:

First, let's create a struct representing an employee:

struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

Then, we need to tell Boost.Fusion about our employee struct to make it a first-class fusion citizen that the grammar can utilize. If you don't know fusion yet, it is a Boost library for working with heterogenous collections of data, commonly referred to as tuples. Spirit uses fusion extensively as part of its infrastructure.

In fusion's view, a struct is just a form of a tuple. You can adapt any struct to be a fully conforming fusion tuple:

BOOST_FUSION_ADAPT_STRUCT(
    client::employee,
    (int, age)
    (std::string, surname)
    (std::string, forename)
    (double, salary)
)

Now we'll write a parser for our employee. Inputs will be of the form:

employee{ age, "surname", "forename", salary }

Here goes:

template <typename Iterator>
struct employee_parser : qi::grammar<Iterator, employee(), ascii::space_type>
{
    employee_parser() : employee_parser::base_type(start)
    {
        using qi::int_;
        using qi::lit;
        using qi::double_;
        using qi::lexeme;
        using ascii::char_;

        quoted_string %= lexeme['"' >> +(char_ - '"') >> '"'];

        start %=
            lit("employee")
            >> '{'
            >>  int_ >> ','
            >>  quoted_string >> ','
            >>  quoted_string >> ','
            >>  double_
            >>  '}'
            ;
    }

    qi::rule<Iterator, std::string(), ascii::space_type> quoted_string;
    qi::rule<Iterator, employee(), ascii::space_type> start;
};

The full cpp file for this example can be found here: ../../example/qi/employee.cpp

Let's walk through this one step at a time (not necessarily from top to bottom).

template <typename Iterator>
struct employee_parser : grammar<Iterator, employee(), space_type>

employee_parser is a grammar. Like before, we make it a template so that we can reuse it for different iterator types. The grammar's signature is:

employee()

meaning, the parser generates employee structs. employee_parser skips white spaces using space_type as its skip parser.

employee_parser() : employee_parser::base_type(start)

Initializes the base class.

rule<Iterator, std::string(), space_type> quoted_string;
rule<Iterator, employee(), space_type> start;

Declares two rules: quoted_string and start. start has the same template parameters as the grammar itself. quoted_string has a std::string attribute.

Lexeme
lexeme['"' >> +(char_ - '"') >> '"'];

lexeme inhibits space skipping from the open brace to the closing brace. The expression parses quoted strings.

+(char_ - '"')

parses one or more chars, except the double quote. It stops when it sees a double quote.

Difference

The expression:

a - b

parses a but not b. Its attribute is just A; the attribute of a. b's attribute is ignored. Hence, the attribute of:

char_ - '"'

is just char.

Plus
+a

is similar to Kleene star. Rather than match everything, +a matches one or more. Like it's related function, the Kleene star, its attribute is a std::vector<A> where A is the attribute of a. So, putting all these together, the attribute of

+(char_ - '"')

is then:

std::vector<char>
Sequence Attribute

Now what's the attribute of

'"' >> +(char_ - '"') >> '"'

?

Well, typically, the attribute of:

a >> b >> c

is:

fusion::vector<A, B, C>

where A is the attribute of a, B is the attribute of b and C is the attribute of c. What is fusion::vector? - a tuple.

[Note] Note

If you don't know what I am talking about, see: Fusion Vector. It might be a good idea to have a look into Boost.Fusion at this point. You'll definitely see more of it in the coming pages.

Attribute Collapsing

Some parsers, especially those very little literal parsers you see, like '"', do not have attributes.

Nodes without attributes are disregarded. In a sequence, like above, all nodes with no attributes are filtered out of the fusion::vector. So, since '"' has no attribute, and +(char_ - '"') has a std::vector<char> attribute, the whole expression's attribute should have been:

fusion::vector<std::vector<char> >

But wait, there's one more collapsing rule: If the attribute is followed by a single element fusion::vector, The element is stripped naked from its container. To make a long story short, the attribute of the expression:

'"' >> +(char_ - '"') >> '"'

is:

std::vector<char>
Auto Rules

It is typical to see rules like:

r = p[_val = _1];

If you have a rule definition such as the above, where the attribute of the RHS (right hand side) of the rule is compatibe with the attribute of the LHS (left hand side), then you can rewrite it as:

r %= p;

The attribute of p automatically uses the attribute of r.

So, going back to our quoted_string rule:

quoted_string %= lexeme['"' >> +(char_ - '"') >> '"'];

is a simplified version of:

quoted_string = lexeme['"' >> +(char_ - '"') >> '"'][_val = _1];

The attribute of the quoted_string rule: std::string is compatible with the attribute of the RHS: std::vector<char>. The RHS extracts the parsed attribute directly into the rule's attribute, in-situ.

[Note] Note

r %= p and r = p are equivalent if there are no semantic actions associated with p.

Finally

We're down to one rule, the start rule:

start %=
    lit("employee")
    >> '{'
    >>  int_ >> ','
    >>  quoted_string >> ','
    >>  quoted_string >> ','
    >>  double_
    >>  '}'
    ;

Applying our collapsing rules above, the RHS has an attribute of:

fusion::vector<int, std::string, std::string, double>

These nodes do not have an attribute:

[Note] Note

In case you are wondering, lit("employee") is the same as "employee". We had to wrap it inside lit because immediately after it is >> '{'. You can't right-shift a char[] and a char - you know, C++ syntax rules.

Recall that the attribute of start is the employee struct:

struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

Now everything is clear, right? The struct employee IS compatible with fusion::vector<int, std::string, std::string, double>. So, the RHS of start uses start's attribute (a struct employee) in-situ when it does its work.


PrevUpHomeNext