Thus far, the rules we have examined have one thing in common: the values they produce are fixed in size and known at compile time. However, grammars can also specify the repetition of elements. For example, consider the following grammar (loosely adapted from RFC 7230):
chunk-ext = *( ";" token )
The star operator in ABNF notation denotes repetition: in this case, zero or more occurrences of the expression in parentheses. This production can be expressed using the function range_rule, which returns a rule allowing for a prescribed number of repetitions of a specified rule. The following rule matches the grammar for chunk-ext defined above:
constexpr auto chunk_ext_rule = range_rule(
    tuple_rule( squelch( delim_rule( ';' ) ), token_rule( alnum_chars ) ) );
This rule produces a range, a ForwardRange whose value type is the same as the value type of the rule passed to the function. In this case, the type is string_view because the tuple has one unsquelched element, the token_rule. The range can be iterated to produce results, without allocating memory for each element. The following code:
system::result< range< core::string_view > > rv = parse( ";johndoe;janedoe;end", chunk_ext_rule );

for( auto s : rv.value() )
    std::cout << s << "\n";
produces this output:
johndoe
janedoe
end
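To make the matching behavior concrete, here is a minimal standard-library sketch of what the chunk-ext production accepts. It is not the library's implementation: `is_alnum` and `match_chunk_ext` are hypothetical names introduced only for illustration, and unlike the range described above, this version collects the tokens into a `std::vector` for simplicity.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of the library): true for [0-9A-Za-z].
inline bool is_alnum(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: match chunk-ext = *( ";" token ) against the
// entire input. Returns the tokens on success, std::nullopt otherwise.
inline std::optional<std::vector<std::string_view>>
match_chunk_ext(std::string_view s)
{
    std::vector<std::string_view> out;
    std::size_t pos = 0;
    while (pos < s.size())
    {
        // Every repetition begins with the ';' delimiter, which is
        // consumed but not stored (the role squelch plays above).
        if (s[pos] != ';')
            return std::nullopt;
        ++pos;

        // A token is one or more characters from the character set.
        std::size_t const start = pos;
        while (pos < s.size() && is_alnum(s[pos]))
            ++pos;
        if (pos == start)
            return std::nullopt; // delimiter without a token

        out.push_back(s.substr(start, pos - start));
    }
    assert(pos == s.size()); // the loop only exits at end of input
    return out;              // zero repetitions (empty input) also match
}
```

Calling `match_chunk_ext( ";johndoe;janedoe;end" )` yields the three tokens shown in the output above, while an empty string yields an empty list because the production allows zero repetitions.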
Sometimes a repetition is not so easily expressed using a single rule. Take, for example, the following grammar for a comma-delimited list of tokens, which must contain at least one element:
token-list = token *( "," token )
We can express this using the overload of range_rule which accepts two parameters: the rule to use when performing the first match, and the rule to use for performing every subsequent match. Both overloads of the function have additional, optional parameters for specifying the minimum number of repetitions, or both the minimum and maximum number of repetitions. Because our list must not be empty, we pass 1 as the minimum; the following rule captures the token-list grammar:
constexpr auto token_list_rule = range_rule(
    token_rule( alnum_chars ),
    tuple_rule( squelch( delim_rule( ',' ) ), token_rule( alnum_chars ) ),
    1 );
The following code:
system::result< range< core::string_view > > rv = parse( "johndoe,janedoe,end", token_list_rule );

for( auto s : rv.value() )
    std::cout << s << "\n";
produces this output:
johndoe
janedoe
end
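The first-match and subsequent-match split, together with the minimum and maximum bounds, can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the library's code: `take_token` and `match_token_list` are hypothetical names, the defaults for `min_count` and `max_count` are chosen for this example, and the results are copied into a `std::vector` for simplicity.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <limits>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper: consume one alphanumeric token starting at pos.
// On success pos is advanced past the token; on failure pos is unchanged.
inline std::optional<std::string_view>
take_token(std::string_view s, std::size_t& pos)
{
    std::size_t const start = pos;
    while (pos < s.size() &&
           std::isalnum(static_cast<unsigned char>(s[pos])))
        ++pos;
    if (pos == start)
        return std::nullopt;
    return s.substr(start, pos - start);
}

// Hypothetical helper: match token-list = token *( "," token ) against
// the entire input, with illustrative repetition bounds.
inline std::optional<std::vector<std::string_view>>
match_token_list(
    std::string_view s,
    std::size_t min_count = 1,
    std::size_t max_count = std::numeric_limits<std::size_t>::max())
{
    assert(min_count <= max_count);
    std::vector<std::string_view> out;
    std::size_t pos = 0;

    // First match: a bare token with no leading delimiter.
    if (auto t = take_token(s, pos))
        out.push_back(*t);

    // Every subsequent match: "," followed by a token.
    while (!out.empty() && out.size() < max_count &&
           pos < s.size() && s[pos] == ',')
    {
        std::size_t next = pos + 1;
        auto t = take_token(s, next);
        if (!t)
            break;      // leave pos at the comma; the match stops here
        out.push_back(*t);
        pos = next;
    }

    // Enforce the minimum and require that all input was consumed.
    if (out.size() < min_count || pos != s.size())
        return std::nullopt;
    return out;
}
```

With the default minimum of 1, `match_token_list( "johndoe,janedoe,end" )` produces the three tokens above and an empty string is rejected. Passing 0 as the minimum makes the empty list acceptable, which mirrors the effect of the optional repetition bounds described in the text.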
In the next section, we discuss the available rules which are specific to RFC 3986.
These are the rules and compound rules provided by the library. For more details please see the corresponding reference sections.
Table 1.34. Grammar Symbols
Name | Description |
---|---|
dec_octet_rule | Match an integer from 0 to 255. |
delim_rule | Match a character literal. |
literal_rule | Match a character string exactly. |
not_empty_rule | Make a matching empty string into an error instead. |
optional_rule | Ignore a rule if parsing fails, leaving the input pointer unchanged. |
range_rule | Match a repeating number of elements. |
token_rule | Match a string of characters from a character set. |
tuple_rule | Match a sequence of specified rules, in order. |
unsigned_rule | Match an unsigned integer in decimal form. |
variant_rule | Match one of a set of alternatives specified by rules. |