Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

Click here to view the latest version of this page.
PrevUpHomeNext

Supported Regular Expressions

Table 11. Regular expressions support

Expression

Meaning

x

Match any character x

.

Match any except newline (or optionally any character)

"..."

All characters taken as literals between double quotes, except escape sequences

[xyz]

A character class; in this case matches x, y or z

[abj-oZ]

A character class with a range in it; matches a, b any letter from j through o or a Z

[^A-Z]

A negated character class i.e. any character but those in the class. In this case, any character except an uppercase letter

r*

Zero or more r's (greedy), where r is any regular expression

r*?

Zero or more r's (abstemious), where r is any regular expression

r+

One or more r's (greedy)

r+?

One or more r's (abstemious)

r?

Zero or one r's (greedy), i.e. optional

r??

Zero or one r's (abstemious), i.e. optional

r{2,5}

Anywhere between two and five r's (greedy)

r{2,5}?

Anywhere between two and five r's (abstemious)

r{2,}

Two or more r's (greedy)

r{2,}?

Two or more r's (abstemious)

r{4}

Exactly four r's

{NAME}

The macro NAME (see below)

"[xyz]\"foo"

The literal string [xyz]\"foo

\X

If X is a, b, e, n, r, f, t, v then the ANSI-C interpretation of \x. Otherwise a literal X (used to escape operators such as *)

\0

A NUL character (ASCII code 0)

\123

The character with octal value 123

\x2a

The character with hexadecimal value 2a

\cX

A named control character X.

\a

A shortcut for Alert (bell).

\b

A shortcut for Backspace

\e

A shortcut for ESC (escape character 0x1b)

\n

A shortcut for newline

\r

A shortcut for carriage return

\f

A shortcut for form feed 0x0c

\t

A shortcut for horizontal tab 0x09

\v

A shortcut for vertical tab 0x0b

\d

A shortcut for [0-9]

\D

A shortcut for [^0-9]

\s

A shortcut for [\x20\t\n\r\f\v]

\S

A shortcut for [^\x20\t\n\r\f\v]

\w

A shortcut for [a-zA-Z0-9_]

\W

A shortcut for [^a-zA-Z0-9_]

(r)

Match an r; parenthesis are used to override precedence (see below)

(?r-s:pattern)

apply option 'r' and omit option 's' while interpreting pattern. Options may be zero or more of the characters 'i' or 's'. 'i' means case-insensitive. '-i' means case-sensitive. 's' alters the meaning of the '.' syntax to match any single character whatsoever. '-s' alters the meaning of '.' to match any character except '\n'.

rs

The regular expression r followed by the regular expression s (a sequence)

r|s

Either an r or and s

^r

An r but only at the beginning of a line (i.e. when just starting to scan, or right after a newline has been scanned)

r$

An r but only at the end of a line (i.e. just before a newline)


[Note] Note

POSIX character classes are not currently supported, due to performance issues when creating them in wide character mode.

[Tip] Tip

If you want to build tokens for syntaxes that recognize items like quotes ("'", '"') and backslash (\), here is example syntax to get you started. The lesson here really is to remember that both c++, as well as regular expressions require escaping with \ for some constructs, which can cascade.

quote1         = "'";            // match single "'"
quote2         = "\\\"";         // match single '"'
literal_quote1 = "\\'";          // match backslash followed by single "'"
literal_quote2 = "\\\\\\\"";     // match backslash followed by single '"'
literal_backslash = "\\\\\\\\";  // match two backslashs

Regular Expression Precedence
Macros

Regular expressions can be given a name and referred to in rules using the syntax {NAME} where NAME is the name you have given to the macro. A macro name can be at most 30 characters long and must start with a _ or a letter. Subsequent characters can be _, -, a letter or a decimal digit.


PrevUpHomeNext