The rules shown so far define terminal symbols, which represent indivisible units of grammar. To parse more complex input, the library offers parser combinators (also called compound rules): rules that accept one or more rules as parameters and combine them into a higher-order algorithm. This section introduces the compound rules provided by the library and shows how they can be used to express more complex grammars.
Consider the following grammar:
version = "v" dec-octet "." dec-octet
We can express this using tuple_rule, which matches one or more specified rules in sequence. The following defines a sequence using some character literals and two decimal octets, where a decimal octet is simply a number between 0 and 255:
constexpr auto version_rule = tuple_rule(
    delim_rule( 'v' ),
    dec_octet_rule,
    delim_rule( '.' ),
    dec_octet_rule );
This rule has a value type of std::tuple, whose element types correspond to the value type of each rule specified upon construction. The decimal octets are represented by the dec_octet_rule, which stores its result in an unsigned char:
result< std::tuple< string_view, unsigned char, string_view, unsigned char > > rv =
    parse( "v42.44800", version_rule );
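To make the idea of sequencing concrete, here is a minimal sketch that uses only the standard library. It is not how the library implements tuple_rule; the helpers match_char, match_dec_octet, and parse_version are hypothetical names invented for this illustration, and the octet matcher is simplified (it reads up to three digits and rejects values above 255). Each element is tried in order, the sequence fails as soon as one element fails, and the values are collected into a std::tuple that is read back with std::get:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>
#include <tuple>

// Hypothetical helper: consume one expected character from the front of `s`.
inline bool match_char(std::string_view& s, char c)
{
    if (s.empty() || s.front() != c)
        return false;
    s.remove_prefix(1);
    return true;
}

// Hypothetical helper (simplified): consume up to three digits and
// accept the number only if it lies in the range 0..255.
inline std::optional<unsigned char> match_dec_octet(std::string_view& s)
{
    unsigned value = 0;
    std::size_t n = 0;
    while (n < s.size() && n < 3 &&
           std::isdigit(static_cast<unsigned char>(s[n])))
    {
        value = value * 10 + static_cast<unsigned>(s[n] - '0');
        ++n;
    }
    if (n == 0 || value > 255)
        return std::nullopt;
    s.remove_prefix(n);
    return static_cast<unsigned char>(value);
}

// version = "v" dec-octet "." dec-octet
// The whole input must be consumed for the match to succeed.
inline std::optional<std::tuple<char, unsigned char, char, unsigned char>>
parse_version(std::string_view s)
{
    if (!match_char(s, 'v'))
        return std::nullopt;
    auto major = match_dec_octet(s);
    if (!major)
        return std::nullopt;
    if (!match_char(s, '.'))
        return std::nullopt;
    auto minor = match_dec_octet(s);
    if (!minor)
        return std::nullopt;
    if (!s.empty())
        return std::nullopt;
    return std::make_tuple('v', *major, '.', *minor);
}
```

In this sketch, parse_version( "v42.7" ) yields a tuple whose elements at index 1 and 3 hold 42 and 7, while an octet above 255 or a missing "v" produces an empty optional.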
To extract elements from std::tuple, the function std::get must be used. In this case, we don't care to know the value of the matching character literals. The tuple_rule discards match results whose value type is void. We can use the squelch compound rule to convert a matching value type to void, and reformulate our rule:
constexpr auto version_rule = tuple_rule(
    squelch( delim_rule( 'v' ) ),
    dec_octet_rule,
    squelch( delim_rule( '.' ) ),
    dec_octet_rule );

result< std::tuple< unsigned char, unsigned char > > rv =
    parse( "v42.44800", version_rule );
When all but one of the value types are void, the std::tuple is elided and the remaining value type is promoted to the result of the match:
// port = ":" unsigned-short

constexpr auto port_rule = tuple_rule(
    squelch( delim_rule( ':' ) ),
    unsigned_rule< unsigned short >{} );

result< unsigned short > rv = parse( ":443", port_rule );
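The two behaviors described above, dropping void value types and eliding a single-element tuple, can be modeled at compile time. The following is a sketch under our own assumptions and is not taken from the library's source; the names wrap_non_void_t, without_void_t, elide_single, and sequence_value_t are hypothetical:

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical metafunctions for illustration only.

// Map void to an empty tuple and any other T to a one-element tuple.
template<class T>
using wrap_non_void_t = std::conditional_t<
    std::is_void_v<T>, std::tuple<>, std::tuple<T>>;

// Concatenating the wrapped types drops every void from the list.
template<class... Ts>
using without_void_t = decltype(std::tuple_cat(
    std::declval<wrap_non_void_t<Ts>>()...));

// Leave most tuples unchanged...
template<class Tuple>
struct elide_single
{
    using type = Tuple;
};

// ...but unwrap a tuple that holds exactly one element.
template<class T>
struct elide_single<std::tuple<T>>
{
    using type = T;
};

// Value type of a sequence whose elements have value types Ts...
template<class... Ts>
using sequence_value_t =
    typename elide_single<without_void_t<Ts...>>::type;

// squelch( 'v' ), octet, squelch( '.' ), octet
static_assert(std::is_same_v<
    sequence_value_t<void, unsigned char, void, unsigned char>,
    std::tuple<unsigned char, unsigned char>>);

// squelch( ':' ), unsigned short
static_assert(std::is_same_v<
    sequence_value_t<void, unsigned short>,
    unsigned short>);
```

The two static_assert declarations mirror the version and port examples: the first keeps a two-element tuple, the second collapses to the single remaining type.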
BNF elements in brackets denote optional components. These are expressed using optional_rule, whose value type is an optional. For example, we can adapt the port rule from above to be an optional component:
// port = [ ":" unsigned-short ]

constexpr auto port_rule = optional_rule(
    tuple_rule(
        squelch( delim_rule( ':' ) ),
        unsigned_rule< unsigned short >{} ) );

result< optional< unsigned short > > rv = parse( ":8080", port_rule );

assert( rv->has_value() && rv->value() == 8080 );
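An optional component introduces two distinct outcomes that are easy to conflate: the rule matched but the component was absent, and the rule did not match at all. The sketch below illustrates that distinction with the standard library only. The type port_match and the function parse_optional_port are hypothetical names, and the behavior shown is our assumption for illustration rather than a description of optional_rule internals:

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical result type: `ok` reports whether the input matched,
// `port` reports whether the optional component was present.
struct port_match
{
    bool ok;
    std::optional<unsigned short> port;
};

// port = [ ":" unsigned-short ]
inline port_match parse_optional_port(std::string_view s)
{
    // Component absent: the match still succeeds, with an empty optional.
    if (s.empty())
        return { true, std::nullopt };

    // Anything other than ':' cannot start the component.
    if (s.front() != ':')
        return { false, std::nullopt };
    s.remove_prefix(1);

    unsigned short value = 0;
    auto const* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(s.data(), last, value);

    // Fail on missing digits, overflow, or trailing characters.
    if (ec != std::errc{} || ptr != last)
        return { false, std::nullopt };

    return { true, value };
}
```

With this sketch, ":8080" matches with a port of 8080, the empty string matches with no port, and ":http" or ":99999" do not match.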
In this example we build up a rule to represent an endpoint as an IPv4 address with an optional port:
// ipv4_address = dec-octet "." dec-octet "." dec-octet "." dec-octet
//
// port         = ":" unsigned-short
//
// endpoint     = ipv4_address [ port ]

constexpr auto endpoint_rule = tuple_rule(
    tuple_rule(
        dec_octet_rule, squelch( delim_rule( '.' ) ),
        dec_octet_rule, squelch( delim_rule( '.' ) ),
        dec_octet_rule, squelch( delim_rule( '.' ) ),
        dec_octet_rule ),
    optional_rule(
        tuple_rule(
            squelch( delim_rule( ':' ) ),
            unsigned_rule< unsigned short >{} ) ) );
This can be simplified; the library provides ipv4_address_rule, whose result type is ipv4_address, offering more utility than representing the address simply as a collection of four numbers:
constexpr auto endpoint_rule = tuple_rule(
    ipv4_address_rule,
    optional_rule(
        tuple_rule(
            squelch( delim_rule( ':' ) ),
            unsigned_rule< unsigned short >{} ) ) );

result< std::tuple< ipv4_address, optional< unsigned short > > > rv =
    parse( "192.168.0.1:443", endpoint_rule );
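Once an endpoint has been parsed, the caller typically unpacks the tuple and decides what to do when the port is absent. The sketch below shows one way to consume a value of that shape. To stay self-contained it substitutes standard library stand-ins (std::array for the address, std::optional for the port); the alias endpoint and the function format_endpoint are hypothetical and are not part of the library:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <tuple>

// Hypothetical stand-in for the endpoint value:
// four octets plus an optional port.
using endpoint = std::tuple<
    std::array<unsigned char, 4>,
    std::optional<unsigned short>>;

// Format as dotted decimal followed by the port, falling back to
// a caller-supplied default when the port was not present.
inline std::string format_endpoint(
    endpoint const& ep, unsigned short default_port)
{
    auto const& [addr, port] = ep;   // unpack the two tuple elements

    std::string out;
    for (std::size_t i = 0; i < addr.size(); ++i)
    {
        if (i != 0)
            out += '.';
        out += std::to_string(static_cast<unsigned>(addr[i]));
    }
    out += ':';
    out += std::to_string(
        static_cast<unsigned>(port.value_or(default_port)));
    return out;
}
```

Structured bindings are used here in place of repeated calls to std::get; either approach reads the same two elements.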
BNF elements separated by unquoted slashes represent a set of alternatives from which one element may match. We represent them using variant_rule, whose value type is a variant.
Consider the following HTTP production rule, which comes from RFC 7230:
request-target = origin-form / absolute-form / authority-form / asterisk-form
The request-target can be exactly one of these things.
Here we define the rule, using origin_form_rule, absolute_uri_rule, and authority_rule which come with the library, and obtain a result from parsing a string:
constexpr auto request_target_rule = variant_rule(
    origin_form_rule,
    absolute_uri_rule,
    authority_rule,
    delim_rule('*') );

result< variant< url_view, url_view, authority_view, string_view > > rv =
    parse( "/results.htm?page=4", request_target_rule );
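Note that the variant above lists url_view twice, so the alternative that matched cannot be identified by type alone; it has to be identified by index. The sketch below demonstrates this with std::variant, which may differ from the variant type the library uses. The alias request_target and the functions classify and form_name are hypothetical, and classify is a deliberately crude stand-in that inspects a few characters rather than implementing the RFC 7230 grammar:

```cpp
#include <cassert>
#include <string_view>
#include <variant>

// Hypothetical stand-in for the four request-target forms. Several
// alternatives share a type, so only the index tells them apart.
using request_target = std::variant<
    std::string_view,   // 0: origin-form
    std::string_view,   // 1: absolute-form
    std::string_view,   // 2: authority-form
    char>;              // 3: asterisk-form

// Crude classification for illustration; NOT a real HTTP parser.
// Alternatives are tried in a fixed order and the first match wins.
inline request_target classify(std::string_view s)
{
    if (s == "*")
        return request_target(std::in_place_index<3>, '*');
    if (!s.empty() && s.front() == '/')
        return request_target(std::in_place_index<0>, s);
    if (s.find("://") != std::string_view::npos)
        return request_target(std::in_place_index<1>, s);
    return request_target(std::in_place_index<2>, s);
}

// Dispatch on the index of the active alternative.
inline std::string_view form_name(request_target const& t)
{
    switch (t.index())
    {
    case 0:  return "origin-form";
    case 1:  return "absolute-form";
    case 2:  return "authority-form";
    default: return "asterisk-form";
    }
}
```

Because of the duplicated type, the alternatives are constructed with std::in_place_index and read back with index() or std::get with a numeric index, never with std::get by type.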
In the next section we discuss facilities to parse a repeating number of elements.