A Rule is an object which tries to match the beginning of an input character
buffer against a particular syntax. It returns a result containing a value if
the match was successful, or an error_code if the match failed.
Rules are not invoked directly. Instead they are passed as values to a parse
function, along with the input character buffer to process. The first overload
requires that the entire input string match; otherwise an error occurs.
The second overload advances the input buffer pointer to the first unconsumed
character upon success, allowing a stream of data to be parsed sequentially:
template< class Rule >
auto parse( string_view s, Rule const& r )
    -> result< typename Rule::value_type >;

template< class Rule >
auto parse( char const*& it, char const* end, Rule const& r )
    -> result< typename Rule::value_type >;
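The relationship between the two overloads can be sketched without the library. Everything in the block below is a hypothetical stand-in written for illustration: `result` is a tiny value-or-error holder rather than the library's type, `error::leftover` is an invented error name, `std::string_view` replaces the library's `string_view`, and `digit_rule_t` and `count_digits` are example names. It shows one plausible way the string overload could be layered on the pointer overload, and how the pointer overload lets a caller consume a stream element by element.

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical stand-ins, NOT the library's types
enum class error { none, mismatch, leftover };

template< class T >
struct result
{
    std::optional< T > val;
    error ec = error::none;

    result( T v ) : val( std::move( v ) ) {}
    result( error e ) : ec( e ) {}

    bool has_value() const { return val.has_value(); }
    T const& value() const { return *val; }
};

// Pointer overload: on success `it` is left at the first unconsumed character
template< class Rule >
auto parse( char const*& it, char const* end, Rule const& r )
    -> result< typename Rule::value_type >
{
    return r.parse( it, end );
}

// String overload: the rule must consume the entire input
template< class Rule >
auto parse( std::string_view s, Rule const& r )
    -> result< typename Rule::value_type >
{
    char const* it = s.data();
    char const* const end = it + s.size();
    auto rv = r.parse( it, end );
    if( rv.has_value() && it != end )
        return error::leftover; // matched, but input remains
    return rv;
}

// Example rule (hypothetical): matches one decimal digit
struct digit_rule_t
{
    using value_type = char;

    result< value_type > parse( char const*& it, char const* end ) const
    {
        if( it != end && *it >= '0' && *it <= '9' )
            return *it++;
        return error::mismatch; // pointer is left unchanged
    }
};

// Consumes digits one at a time until the first mismatch and
// returns how many were consumed; `it` ends at the first non-digit
inline int count_digits( char const*& it, char const* end )
{
    int n = 0;
    while( parse( it, end, digit_rule_t{} ).has_value() )
        ++n;
    return n;
}
```

In this sketch, calling `count_digits` on `"123,"` leaves the pointer on the comma, while the string overload rejects `"7,"` because one character is left over.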
To satisfy the Rule concept, a class or struct must declare the nested type
value_type indicating the type of value returned upon success, and a const
member function parse with a prescribed signature. In the following code we
define a rule that matches a single comma:
struct comma_rule_t
{
    // The type of value returned upon success
    using value_type = string_view;

    // The algorithm which checks for a match
    result< value_type >
    parse( char const*& it, char const* end ) const
    {
        if( it != end && *it == ',' )
            return string_view( it++, 1 );
        return error::mismatch;
    }
};
Since rules are passed by value, we declare a constexpr variable of the type
for syntactical convenience. Variable names for rules are usually suffixed
with _rule:
constexpr comma_rule_t comma_rule{};
Now we can call parse with the input string and the rule variable:
result< string_view > rv = parse( ",", comma_rule );

assert( rv.has_value() && rv.value() == "," );
Rule expressions can come in several styles. The rule defined above is a
compile-time constant. The unsigned_rule matches an unsigned decimal integer.
Here we construct the rule at run time and specify the type of unsigned
integer used to hold the result with a template parameter:
result< unsigned short > rv = parse( "16384", unsigned_rule< unsigned short >{} );
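To make the role of the template parameter concrete, here is a rough sketch of how a rule templated on its result type might be written. It is not the library's `unsigned_rule`: the names `my_unsigned_rule`, `error::overflow`, and `error::leftover`, and the small `result` holder, are all assumptions introduced for this example. The sketch is also deliberately simplified and accepts leading zeros, which the library's rule may treat differently.

```cpp
#include <cassert>
#include <limits>
#include <optional>
#include <string_view>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins, NOT the library's types
enum class error { none, mismatch, overflow, leftover };

template< class T >
struct result
{
    std::optional< T > val;
    error ec = error::none;

    result( T v ) : val( std::move( v ) ) {}
    result( error e ) : ec( e ) {}

    bool has_value() const { return val.has_value(); }
    T const& value() const { return *val; }
};

// Hypothetical rule: the template parameter selects the integer
// type that holds the parsed value
template< class U >
struct my_unsigned_rule
{
    static_assert( std::is_unsigned< U >::value, "U must be unsigned" );

    using value_type = U;

    result< value_type > parse( char const*& it, char const* end ) const
    {
        if( it == end || *it < '0' || *it > '9' )
            return error::mismatch;
        constexpr U limit = std::numeric_limits< U >::max();
        U v = 0;
        char const* p = it;
        while( p != end && *p >= '0' && *p <= '9' )
        {
            U const d = static_cast< U >( *p - '0' );
            // Reject the digit if v * 10 + d would exceed the limit
            if( v > ( limit - d ) / 10 )
                return error::overflow; // `it` is left unchanged
            v = static_cast< U >( v * 10 + d );
            ++p;
        }
        it = p; // commit only on success
        return v;
    }
};

// Minimal string overload of parse, as sketched for this example
template< class Rule >
auto parse( std::string_view s, Rule const& r )
    -> result< typename Rule::value_type >
{
    char const* it = s.data();
    char const* const end = it + s.size();
    auto rv = r.parse( it, end );
    if( rv.has_value() && it != end )
        return error::leftover;
    return rv;
}
```

Because the value type is part of the rule's type, the same input can succeed for one instantiation and overflow for another: in this sketch `"65536"` is rejected by `my_unsigned_rule< unsigned short >` on platforms where `unsigned short` is 16 bits.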
The function delim_rule returns a rule which matches the passed character
literal. This is a more general version of the comma rule which we defined
earlier. There is also an overload which matches exactly one character from
a character set.
result< string_view > rv = parse( ",", delim_rule(',') );
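The idea of a function that returns a rule can be illustrated with a small standalone sketch. The names `my_delim_rule_t` and `my_delim_rule`, the `result` holder, and `error::leftover` are hypothetical and exist only for this example; the library's own `delim_rule` may be implemented quite differently. The point is that the character to match becomes state stored in the rule object rather than being fixed in the rule's type, as it was for `comma_rule_t`.

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical stand-ins, NOT the library's types
enum class error { none, mismatch, leftover };

template< class T >
struct result
{
    std::optional< T > val;
    error ec = error::none;

    result( T v ) : val( std::move( v ) ) {}
    result( error e ) : ec( e ) {}

    bool has_value() const { return val.has_value(); }
    T const& value() const { return *val; }
};

// Hypothetical rule type: remembers which character to match
class my_delim_rule_t
{
    char ch_;

public:
    using value_type = std::string_view;

    constexpr explicit my_delim_rule_t( char ch ) noexcept
        : ch_( ch )
    {
    }

    result< value_type > parse( char const*& it, char const* end ) const
    {
        if( it != end && *it == ch_ )
            return std::string_view( it++, 1 );
        return error::mismatch;
    }
};

// Factory function: builds the rule from a character
constexpr my_delim_rule_t my_delim_rule( char ch ) noexcept
{
    return my_delim_rule_t( ch );
}

// Minimal string overload of parse, as sketched for this example
template< class Rule >
auto parse( std::string_view s, Rule const& r )
    -> result< typename Rule::value_type >
{
    char const* it = s.data();
    char const* const end = it + s.size();
    auto rv = r.parse( it, end );
    if( rv.has_value() && it != end )
        return error::leftover;
    return rv;
}
```

With this sketch, `parse( ",", my_delim_rule( ',' ) )` succeeds and `parse( ";", my_delim_rule( ',' ) )` fails, so a single rule type covers every delimiter.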
When a rule fails to match, or if the rule detects an unrecoverable problem
with the input, it returns a result assigned from an error_code indicating
the failure.
When using overloads of parse which have a character pointer as both an in
and out parameter, it is up to the rule to define which character is pointed
to upon error. When the rule matches successfully, the pointer is always
changed to point to the first unconsumed character in the input, or to the
end pointer if all input was consumed.
It is the responsibility of library and user-defined implementations of
compound rules (explained later) to rewind their internal pointer if a
parsing operation was unsuccessful and they wish to attempt parsing the same
input using a different rule. Users who extend the library's grammar by
defining their own custom rules should follow the behaviors described above
regarding the handling of errors and the modification of the caller's input
pointer.
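The rewinding responsibility described above can be sketched with a hypothetical two-alternative compound rule. None of the names below come from the library: `literal_rule_t`, `either_rule_t`, `either`, and the small `result` holder are assumptions made for this example. The literal rule intentionally leaves the pointer at the point of mismatch on failure, which is why the compound rule has to restore the saved position before trying its second alternative.

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins, NOT the library's types
enum class error { none, mismatch };

template< class T >
struct result
{
    std::optional< T > val;
    error ec = error::none;

    result( T v ) : val( std::move( v ) ) {}
    result( error e ) : ec( e ) {}

    bool has_value() const { return val.has_value(); }
    T const& value() const { return *val; }
};

// Hypothetical rule: matches a fixed string. On failure the
// pointer is left wherever the mismatch was detected.
struct literal_rule_t
{
    using value_type = std::string_view;

    std::string_view lit;

    result< value_type > parse( char const*& it, char const* end ) const
    {
        char const* const start = it;
        for( char c : lit )
        {
            if( it == end || *it != c )
                return error::mismatch;
            ++it;
        }
        return std::string_view( start, lit.size() );
    }
};

// Hypothetical compound rule: tries r0, then r1 on the same input
template< class R0, class R1 >
struct either_rule_t
{
    static_assert( std::is_same<
        typename R0::value_type,
        typename R1::value_type >::value,
        "both alternatives must produce the same value type" );

    using value_type = typename R0::value_type;

    R0 r0;
    R1 r1;

    result< value_type > parse( char const*& it, char const* end ) const
    {
        char const* const start = it; // remember where we began
        auto rv = r0.parse( it, end );
        if( rv.has_value() )
            return rv;
        it = start;                   // rewind before the second attempt
        rv = r1.parse( it, end );
        if( ! rv.has_value() )
            it = start;               // this sketch restores the pointer on failure
        return rv;
    }
};

// Helper that deduces the rule types
template< class R0, class R1 >
either_rule_t< R0, R1 > either( R0 r0, R1 r1 )
{
    return either_rule_t< R0, R1 >{ r0, r1 };
}
```

Given the input `"for!"`, the first alternative `"foo"` advances two characters before failing. Without the rewind, the second alternative `"for"` would start at `'r'` and fail as well; with it, the match succeeds and the pointer ends on `'!'`.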