The char_ parser matches single characters. The char_ parser has an associated Character Encoding Namespace. This is needed when doing basic operations such as inhibiting case sensitivity and dealing with character ranges.
There are various forms of char_.
The no-argument form of char_ matches any character in the associated Character Encoding Namespace.
char_ // matches any character
The single-argument form of char_ (with a character argument) matches the supplied character.
char_('x')  // matches 'x'
char_(L'x') // matches L'x'
char_(x)    // matches x (a char)
char_ with two arguments matches a range of characters.
char_('a','z')   // alphabetic characters
char_(L'0',L'9') // digits
A range of characters is created from a low-high character pair. Such a parser matches a single character that is in the range, including both endpoints. Note, the first character must be before the second, according to the underlying Character Encoding Namespace.
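The inclusive-range rule can be illustrated without Spirit at all. The following is a minimal sketch, not Spirit's implementation: `char_range` and `matches` are hypothetical names, comparison uses the numeric character values of the platform's execution character set, and rejecting a reversed range with an exception is an assumption made for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a two-argument char_(lo, hi); not Spirit's code.
struct char_range
{
    unsigned char lo, hi;

    char_range(char l, char h)
      : lo(static_cast<unsigned char>(l)), hi(static_cast<unsigned char>(h))
    {
        // Assumption for this sketch: a reversed range is reported by throwing.
        if (lo > hi) throw std::invalid_argument("range is reversed");
    }

    // True if c lies in [lo, hi]; both endpoints are part of the range.
    bool matches(char c) const
    {
        unsigned char u = static_cast<unsigned char>(c);
        return lo <= u && u <= hi;
    }
};
```

With this sketch, `char_range('0','9').matches('9')` is true because the upper endpoint belongs to the range, while constructing `char_range('9','0')` throws.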
Character mapping is inherently platform dependent. The standard does not guarantee, for example, that 'A' < 'Z'. That is why, in Spirit2, we purposely attach a specific Character Encoding Namespace (such as ASCII or ISO-8859-1) to the char_ parser to eliminate such ambiguities.
Note | |
---|---|
Sparse bit vectors
To accommodate 16/32 and 64 bit characters, the char-set statically
switches from a
|
Lastly, when given a string (a plain C string, a std::basic_string, etc.), the string is regarded as a char-set definition string following a syntax that resembles POSIX style regular expression character sets (except that double quotes delimit the set elements instead of square brackets and there is no special negation ^ character). Examples:
char_("a-zA-Z")    // alphabetic characters
char_("0-9a-fA-F") // hexadecimal characters
char_("actgACTG")  // DNA identifiers
char_("\x7f\x7e")  // Hexadecimal 0x7F and 0x7E
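To make the definition-string syntax concrete, here is a minimal sketch of how such a string could be expanded into a membership table for 8-bit characters. It is not Spirit's implementation: `make_char_set` and `in_set` are hypothetical helpers, and the sketch assumes that `x-y` denotes an inclusive range only when a character appears on both sides of the hyphen.

```cpp
#include <bitset>
#include <cassert>
#include <string>

// Hypothetical helper, not part of Spirit: expand a definition string such
// as "a-zA-Z" into a 256-entry membership table.
inline std::bitset<256> make_char_set(const std::string& def)
{
    std::bitset<256> set;
    for (std::size_t i = 0; i < def.size(); ++i)
    {
        unsigned char lo = static_cast<unsigned char>(def[i]);
        // "x-y" with a character on both sides of '-' denotes a range.
        if (i + 2 < def.size() && def[i + 1] == '-')
        {
            unsigned char hi = static_cast<unsigned char>(def[i + 2]);
            for (unsigned c = lo; c <= hi; ++c) set.set(c);
            i += 2; // skip '-' and the upper bound
        }
        else
        {
            set.set(lo); // a single literal element
        }
    }
    return set;
}

// Membership test against a table built by make_char_set.
inline bool in_set(const std::bitset<256>& set, char c)
{
    return set.test(static_cast<unsigned char>(c));
}
```

For example, `in_set(make_char_set("0-9a-fA-F"), 'c')` is true and the same test with `'g'` is false. Because no character is treated as negation, a `^` in the definition string would simply be a member of the set in this sketch.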
lit, when passed a single character, behaves like the single-argument char_ except that lit does not synthesize an attribute. A plain char or wchar_t is equivalent to a lit.
Examples:
'x'
lit('x')
lit(L'x')
lit(c) // c is a char
// forwards to <boost/spirit/home/qi/char/char.hpp>
#include <boost/spirit/include/qi_char_.hpp>
Also, see Include Structure.
Name |
---|
|
|
In the table above, ns represents a Character Encoding Namespace.
Notation
c, f, l
A literal char, e.g. 'x', L'x' or anything that can be converted to a char or wchar_t, or a Lazy Argument that evaluates to anything that can be converted to a char or wchar_t.
ns
A Character Encoding Namespace.
cs
A String or a Lazy Argument that evaluates to a String that specifies a char-set definition string following a syntax that resembles POSIX style regular expression character sets (except the square brackets and the negation ^ character).
cp
A char parser, a char range parser or a char set parser.
Semantics of an expression is defined only where it differs from, or is not defined in PrimitiveParser.
Expression | Semantics |
---|---|
 | Create char parser from a char, |
 | Create a char parser from a char, |
 | Create a char parser that matches any character in the |
 | Create a char parser with |
 | Create a char-range parser that matches characters from range ( |
 | Create a char-set parser with |
 | Negate |
Expression | Attribute |
---|---|
 | |
 | |
 | The character type of the Character Encoding Namespace, |
 | The character type of the Character Encoding Namespace, |
 | The character type of the Character Encoding Namespace, |
 | The character type of the Character Encoding Namespace, |
 | The attribute of |
Complexity: O(N), except for char-sets with 16-bit (or more) characters (e.g. wchar_t). These have O(log N) complexity, where N is the number of distinct character ranges in the set.
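One way a logarithmic bound in the number of ranges can arise is a binary search over sorted, disjoint ranges. The sketch below illustrates that idea only; `range_set` and its members are hypothetical names, and nothing here should be read as Spirit's actual data structure.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sketch, not Spirit's implementation: a set of wide characters
// stored as sorted, disjoint, inclusive [first, second] ranges.
class range_set
{
public:
    typedef std::pair<wchar_t, wchar_t> range;

    // Insert [lo, hi], folding overlapping ranges into one.
    void add(wchar_t lo, wchar_t hi)
    {
        ranges_.push_back(range(lo, hi));
        std::sort(ranges_.begin(), ranges_.end());
        std::vector<range> merged;
        for (const range& r : ranges_)
        {
            if (!merged.empty() && r.first <= merged.back().second)
                merged.back().second = std::max(merged.back().second, r.second);
            else
                merged.push_back(r);
        }
        ranges_.swap(merged);
    }

    // Binary search over the ranges: O(log N) in the number of ranges.
    bool contains(wchar_t c) const
    {
        // Find the first range whose lower bound is greater than c.
        std::vector<range>::const_iterator it = std::upper_bound(
            ranges_.begin(), ranges_.end(), c,
            [](wchar_t v, const range& r) { return v < r.first; });
        // The candidate is the range just before it, if there is one.
        return it != ranges_.begin() && c <= (it - 1)->second;
    }

    // Number of distinct ranges currently stored.
    std::size_t size() const { return ranges_.size(); }

private:
    std::vector<range> ranges_;
};
```

Lookup cost in this sketch depends on how many ranges are stored, not on how many characters they cover, so a set built from a handful of wide ranges stays cheap to query.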
Note | |
---|---|
The test harness for the example(s) below is presented in the Basics Examples section. |
Some using declarations:
using boost::spirit::qi::lit;
using boost::spirit::ascii::char_;
Basic literals:
test_parser("x", 'x');        // plain literal
test_parser("x", lit('x'));   // explicit literal
test_parser("x", char_('x')); // ascii::char_
Range:
char ch;
test_parser_attr("5", char_('0','9'), ch); // ascii::char_ range
std::cout << ch << std::endl;              // prints '5'
Character set:
test_parser_attr("5", char_("0-9"), ch); // ascii::char_ set
std::cout << ch << std::endl;            // prints '5'
Lazy char_ using Phoenix:
namespace phx = boost::phoenix;
test_parser("x", phx::val('x'));                            // direct
test_parser("5", char_(phx::val('0'),phx::val('9')));       // ascii::char_ range