Boost C++ Libraries


Qi Distinct Parser Directive

Description

The Spirit.Qi distinct parser is a directive component that helps avoid unwanted partial matches while parsing with a skipper. A simple example is the common task of matching a keyword. Consider:

"description" >> -lit(":") >> *(char_ - eol)

intended to match a line in a configuration file. Let's further assume that this rule is used with a space skipper and that we have the following strings in the input:

"description: ident\n"
"description ident\n"
"descriptionident\n"

It might seem unexpected, but the parser above matches all three inputs just fine, even though the third input should not match at all! To avoid the unwanted match we are forced to make our rule more complicated:

lexeme["description" >> !char_("a-zA-Z_0-9")] >> -lit(":") >> *(char_ - eol)

(the rule reads as: match "description" as long as it is not directly followed by a valid identifier character; lexeme[] ensures that no whitespace is skipped between the keyword and that check).
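
To see why the third input slips through, the keyword part of the rule can be modelled by hand. The sketch below is plain C++17, not Spirit code and not how Spirit implements anything; is_ident_char and match_keyword are hypothetical helpers introduced only for illustration. With guarded set to false it behaves like the bare "description" literal; with guarded set to true it adds the !char_("a-zA-Z_0-9") check directly after the keyword, as the lexeme[] version does.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper: the character class char_("a-zA-Z_0-9").
bool is_ident_char(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Hypothetical hand-written model of the keyword part of the rule (not
// Spirit code). Leading whitespace is skipped, as a space skipper would do,
// then the literal keyword is compared. With guarded == true, the negative
// lookahead !char_("a-zA-Z_0-9") is checked directly after the keyword with
// no skipping in between (the job lexeme[] does in the rule above).
// pos is advanced only on success.
bool match_keyword(std::string_view in, std::size_t& pos,
                   std::string_view keyword, bool guarded)
{
    assert(pos <= in.size());
    std::size_t p = pos;
    while (p < in.size() && std::isspace(static_cast<unsigned char>(in[p])) != 0)
        ++p;                                  // pre-skip whitespace
    if (in.substr(p, keyword.size()) != keyword)
        return false;                         // keyword not present
    p += keyword.size();
    if (guarded && p < in.size() && is_ident_char(in[p]))
        return false;                         // partial match: reject
    pos = p;
    return true;
}
```

With guarded == false, "descriptionident\n" passes the keyword check and leaves "ident" for *(char_ - eol) to consume, which is exactly the unwanted match described above.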

The distinct[] directive is meant to simplify the rule above:

distinct(char_("a-zA-Z_0-9"))["description"] >> -lit(":") >> *(char_ - eol)

Using the distinct[] component instead of the explicit sequence has the advantage that the tail (i.e. the char_("a-zA-Z_0-9")) can be encapsulated as a separate parser construct. The following code snippet illustrates the idea (for the full code of this example please see distinct.cpp):
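
The benefit of separating the tail can also be illustrated outside Spirit. In the following plain C++17 sketch (hypothetical names throughout: TailParser, DistinctSpec, ident_char_tail), the tail is modelled as a predicate over the remaining input and stored once in a reusable object, which can then guard any literal subject. It only mirrors the observable behavior of distinct(tail)[subject] for literal subjects under these assumptions; it is not the library's implementation.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string_view>

// Hypothetical stand-in for a tail parser: reports whether it would match
// at the very start of the remaining input.
using TailParser = std::function<bool(std::string_view)>;

// Hypothetical analogue of a stored distinct(tail) object. The tail is
// captured once; the object can then guard any literal subject.
struct DistinctSpec
{
    TailParser tail;

    // Models distinct(tail)[subject] for a literal subject: the subject must
    // match at pos, and the tail must NOT match right after it.
    // pos is advanced only on success.
    bool operator()(std::string_view in, std::size_t& pos,
                    std::string_view subject) const
    {
        assert(pos <= in.size());
        if (in.substr(pos, subject.size()) != subject)
            return false;                     // subject does not match
        std::size_t const end = pos + subject.size();
        if (tail(in.substr(end)))
            return false;                     // tail follows: partial match
        pos = end;
        return true;
    }
};

// Tail equivalent to char_("a-zA-Z_0-9"): a single identifier character.
bool ident_char_tail(std::string_view rest)
{
    return !rest.empty() &&
           (std::isalnum(static_cast<unsigned char>(rest.front())) != 0 ||
            rest.front() == '_');
}
```

For example, DistinctSpec const keyword{ident_char_tail}; accepts "if" in "if (x)" but rejects it in "iffy", and the same object can be reused unchanged for "description" or any other literal.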

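#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_distinct.hpp>
#include <string>

namespace spirit = boost::spirit;
namespace ascii = boost::spirit::ascii;
namespace repo = boost::spirit::repository;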

// Define metafunctions allowing to compute the type of the distinct()
// and ascii::char_() constructs
namespace traits
{
    // Metafunction allowing to get the type of any repository::distinct(...) 
    // construct
    template <typename Tail>
    struct distinct_spec
      : spirit::result_of::terminal<repo::tag::distinct(Tail)>
    {};

    // Metafunction allowing to get the type of any ascii::char_(...) construct
    template <typename String>
    struct char_spec
      : spirit::result_of::terminal<spirit::tag::ascii::char_(String)>
    {};
}

// Define a helper function allowing to create a distinct() construct from 
// an arbitrary tail parser
template <typename Tail>
inline typename traits::distinct_spec<Tail>::type
distinct_spec(Tail const& tail)
{
    return repo::distinct(tail);
}

// Define a helper function allowing to create an ascii::char_() construct
// from an arbitrary string representation
template <typename String>
inline typename traits::char_spec<String>::type
char_spec(String const& str)
{
    return ascii::char_(str);
}

// the following constructs the type of a distinct_spec holding a
// charset("0-9a-zA-Z_") as its tail parser
typedef traits::char_spec<std::string>::type charset_tag_type;
typedef traits::distinct_spec<charset_tag_type>::type keyword_tag_type;

// Define a new Qi 'keyword' directive usable as a shortcut for a
// repository::distinct(char_(std::string("0-9a-zA-Z_")))
std::string const keyword_spec("0-9a-zA-Z_");
keyword_tag_type const keyword = distinct_spec(char_spec(keyword_spec));

These definitions create a new Qi directive for recognizing keywords. This allows us to rewrite our parser expression for the description line as:

keyword["description"] >> -lit(":") >> *(char_ - eol)

which is much more readable and concise than the original parser expression. In addition, the new keyword[] directive can wrap any parser expression, not only strings as in the example above.

Header

// forwards to <boost/spirit/repository/home/qi/directive/distinct.hpp>
#include <boost/spirit/repository/include/qi_distinct.hpp>

Synopsis

distinct(tail)[subject]

Parameters

Parameter   Description

tail        The parser construct specifying what must not directly follow the
            subject for the overall expression to match.

subject     The parser construct used to match the current input. The distinct
            directive makes sure that no unexpected partial matches occur.

Both parameters can be arbitrarily complex parsers themselves.

Attribute

The distinct component exposes the attribute type of its subject as its own attribute type. If the subject does not expose any attribute (the type is unused_type), then the distinct component does not expose any attribute either.

a: A, b: B --> distinct(b)[a]: A

Example

The following example shows simple use cases of the distinct parser (for the full source code of this example please see distinct.cpp).

Prerequisites

In addition to the main header file needed to include the core components implemented in Spirit.Qi, we add the header file needed for the new distinct directive.

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_distinct.hpp>

To make all the code below more readable, we introduce the following namespace declarations.

using namespace boost::spirit;
using namespace boost::spirit::ascii;
using boost::spirit::repository::distinct;

Using the Distinct Directive to Match Keywords

We show several examples of how the distinct[] directive can be used to force correct behavior while matching keywords. The first two code snippets show the correct matching of the description keyword (in this hypothetical example we allow keywords to be directly followed by an optional "--"):

{
    std::string str("description ident");
    std::string::iterator first(str.begin());
    bool r = qi::phrase_parse(first, str.end()
      , distinct(alnum | '_')["description"] >> -lit("--") >> +(alnum | '_')
      , space);
    BOOST_ASSERT(r && first == str.end());
}

{
    std::string str("description--ident");
    std::string::iterator first(str.begin());
    bool r = qi::phrase_parse(first, str.end()
      , distinct(alnum | '_')["description"] >> -lit("--") >> +(alnum | '_')
      , space);
    BOOST_ASSERT(r && first == str.end());
}

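The last example shows that the overall parser expression correctly refuses to match "description-ident": distinct[] accepts "description" because '-' is not part of the tail, but a single '-' matches neither the optional "--" nor an identifier character, so the sequence fails as a whole: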

{
    std::string str("description-ident");
    std::string::iterator first(str.begin());
    bool r = qi::phrase_parse(first, str.end()
      , distinct(alnum | '_')["description"] >> -lit("--") >> +(alnum | '_')
      , space);
    BOOST_ASSERT(!r && first == str.begin());
}

