"...one of the most highly regarded and expertly designed C++ library projects in the world."
-- Herb Sutter and Andrei Alexandrescu, C++ Coding Standards
It's a common question in the Spirit General List: How do I parse and place the results into a C++ struct? Of course, at this point, you already know various ways to do it, using semantic actions. There are many ways to skin a cat. Spirit2, being fully attributed, makes it even easier. The next example demonstrates some features of Spirit2 that make this easy. In the process, you'll learn about:
First, let's create a struct representing an employee:
struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};
Then, we need to tell Boost.Fusion about our employee struct to make it a first-class fusion citizen that the grammar can utilize. If you don't know fusion yet, it is a Boost library for working with heterogeneous collections of data, commonly referred to as tuples. Spirit uses fusion extensively as part of its infrastructure.
In fusion's view, a struct is just a form of a tuple. You can adapt any struct to be a fully conforming fusion tuple:
BOOST_FUSION_ADAPT_STRUCT(
    client::employee,
    (int, age)
    (std::string, surname)
    (std::string, forename)
    (double, salary)
)
Now we'll write a parser for our employee. Inputs will be of the form:
employee{ age, "surname", "forename", salary }
Here goes:
template <typename Iterator>
struct employee_parser : qi::grammar<Iterator, employee(), ascii::space_type>
{
    employee_parser() : employee_parser::base_type(start)
    {
        using qi::int_;
        using qi::lit;
        using qi::double_;
        using qi::lexeme;
        using ascii::char_;

        quoted_string %= lexeme['"' >> +(char_ - '"') >> '"'];

        start %=
            lit("employee")
            >> '{'
            >>  int_ >> ','
            >>  quoted_string >> ','
            >>  quoted_string >> ','
            >>  double_
            >>  '}'
            ;
    }

    qi::rule<Iterator, std::string(), ascii::space_type> quoted_string;
    qi::rule<Iterator, employee(), ascii::space_type> start;
};
The full cpp file for this example can be found here: ../../example/qi/employee.cpp
Let's walk through this one step at a time (not necessarily from top to bottom).
template <typename Iterator>
struct employee_parser : grammar<Iterator, employee(), space_type>
employee_parser is a grammar. Like before, we make it a template so that we can reuse it for different iterator types. The grammar's signature is:
employee()
meaning, the parser generates employee structs. employee_parser skips white spaces using space_type as its skip parser.
employee_parser() : employee_parser::base_type(start)
Initializes the base class.
rule<Iterator, std::string(), space_type> quoted_string;
rule<Iterator, employee(), space_type> start;
Declares two rules: quoted_string and start. start has the same template parameters as the grammar itself. quoted_string has a std::string attribute.
lexeme['"' >> +(char_ - '"') >> '"'];
lexeme inhibits space skipping for everything inside its square brackets, from the opening quote to the closing quote. The expression parses quoted strings.
+(char_ - '"')
parses one or more chars, except the double quote. It stops when it sees a double quote.
The expression:
a - b
parses a but not b. Its attribute is just A, the attribute of a. b's attribute is ignored. Hence, the attribute of:
char_ - '"'
is just char.
+a
is similar to the Kleene star. Rather than matching zero or more, +a matches one or more. Like its relative, the Kleene star, its attribute is a std::vector<A>, where A is the attribute of a. So, putting all these together, the attribute of
+(char_ - '"')
is then:
std::vector<char>
Now what's the attribute of
'"' >> +(char_ - '"') >> '"'
?
Well, typically, the attribute of:
a >> b >> c
is:
fusion::vector<A, B, C>
where A is the attribute of a, B is the attribute of b and C is the attribute of c. What is fusion::vector? - a tuple.
Note | |
---|---|
If you don't know what I am talking about, see: Fusion Vector. It might be a good idea to have a look into Boost.Fusion at this point. You'll definitely see more of it in the coming pages. |
Some parsers, especially those very little literal parsers you see, like '"', do not have attributes. Nodes without attributes are disregarded. In a sequence, like above, all nodes with no attributes are filtered out of the fusion::vector.
So, since '"' has no attribute, and +(char_ - '"') has a std::vector<char> attribute, the whole expression's attribute should have been:
fusion::vector<std::vector<char> >
But wait, there's one more collapsing rule: if the attribute is a single-element fusion::vector, the element is stripped naked from its container. To make a long story short, the attribute of the expression:
'"' >> +(char_ - '"') >> '"'
is:
std::vector<char>
It is typical to see rules like:
r = p[_val = _1];
If you have a rule definition such as the above, where the attribute of the RHS (right hand side) of the rule is compatible with the attribute of the LHS (left hand side), then you can rewrite it as:
r %= p;
The attribute of p automatically uses the attribute of r.
So, going back to our quoted_string rule:
quoted_string %= lexeme['"' >> +(char_ - '"') >> '"'];
is a simplified version of:
quoted_string = lexeme['"' >> +(char_ - '"') >> '"'][_val = _1];
The attribute of the quoted_string rule, std::string, is compatible with the attribute of the RHS, std::vector<char>. The RHS extracts the parsed attribute directly into the rule's attribute, in-situ.
We're down to one rule, the start rule:
start %=
    lit("employee")
    >> '{'
    >>  int_ >> ','
    >>  quoted_string >> ','
    >>  quoted_string >> ','
    >>  double_
    >>  '}'
    ;
Applying our collapsing rules above, the RHS has an attribute of:
fusion::vector<int, std::string, std::string, double>
These nodes do not have an attribute:
lit("employee")
'{'
','
'}'
Note | |
---|---|
In case you are wondering, |
Recall that the attribute of start is the employee struct:
struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};
Now everything is clear, right? The struct employee IS compatible with fusion::vector<int, std::string, std::string, double>. So, the RHS of start uses start's attribute (a struct employee) in-situ when it does its work.