Boost C++ Libraries

This is the documentation for an old version of Boost.

Boost.Regex

syntax_option_type

Synopsis

The type syntax_option_type is an implementation-defined bitmask type that controls how a regular expression string is interpreted. For convenience, all the constants listed here are also duplicated within the scope of class template basic_regex.

namespace boost{ namespace regex_constants{

typedef bitmask_type syntax_option_type; // bitmask_type is implementation defined
// these flags are standardized:
static const syntax_option_type normal;
static const syntax_option_type icase;
static const syntax_option_type nosubs;
static const syntax_option_type optimize;
static const syntax_option_type collate;
static const syntax_option_type ECMAScript = normal;
static const syntax_option_type JavaScript = normal;
static const syntax_option_type JScript = normal;
static const syntax_option_type basic;
static const syntax_option_type extended;
static const syntax_option_type awk;
static const syntax_option_type grep;
static const syntax_option_type egrep;
static const syntax_option_type sed = basic;
static const syntax_option_type perl;
// these are boost.regex specific:
static const syntax_option_type escape_in_lists;
static const syntax_option_type char_classes;
static const syntax_option_type intervals;
static const syntax_option_type limited_ops;
static const syntax_option_type newline_alt;
static const syntax_option_type bk_plus_qm;
static const syntax_option_type bk_braces;
static const syntax_option_type bk_parens;
static const syntax_option_type bk_refs;
static const syntax_option_type bk_vbar;
static const syntax_option_type use_except;
static const syntax_option_type failbit;
static const syntax_option_type literal;
static const syntax_option_type nocollate;
static const syntax_option_type perlex;
static const syntax_option_type emacs;

} // namespace regex_constants
} // namespace boost

Description

The type syntax_option_type is an implementation-defined bitmask type (C++ Standard, 17.3.2.1.2). Setting its elements has the effects listed in the table below. A valid value of type syntax_option_type always has exactly one of the elements normal, basic, extended, awk, grep, egrep, sed or perl set.
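Because syntax_option_type is a bitmask, a flag value is built by combining one grammar element with any modifier elements, such as icase, using |, and an individual element can be tested with &. The following is a minimal sketch, not Boost code: it assumes C++11 std::regex as a stand-in, because std::regex_constants adopted most of the flags marked as standardized above (icase, nosubs, optimize, collate, ECMAScript, basic, extended, awk, grep and egrep), so it builds without Boost. The helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: one grammar element (extended) combined with one
// modifier element (icase) using the bitmask operator |.
std::regex::flag_type extended_icase() {
    return std::regex::extended | std::regex::icase;
}

// Hypothetical helper: tests whether the icase element is set, using &.
bool has_icase(std::regex::flag_type f) {
    return (f & std::regex::icase) != std::regex::flag_type{};
}

// Full match of s against a POSIX extended pattern, ignoring case.
bool match_extended_icase(const std::string& pattern, const std::string& s) {
    const std::regex re(pattern, extended_icase());
    return std::regex_match(s, re);
}

// Usage: the combined value carries both elements.
void bitmask_examples() {
    assert(has_icase(extended_icase()));
    assert(!has_icase(std::regex::extended));
    assert(match_extended_icase("a+bc", "AAABC"));
}
```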

Note that for convenience all the constants listed here are duplicated within the scope of class template basic_regex, so you can use any of:

boost::regex_constants::constant_name

boost::regex::constant_name

boost::wregex::constant_name

in an interchangeable manner.
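The standardized C++11 std::regex follows the same convention, repeating the regex_constants values as static members of basic_regex. The sketch below uses std::regex in place of boost::regex so that it builds without Boost (an assumption for illustration, not this library's code); with Boost.Regex the corresponding spellings would be boost::regex_constants::icase, boost::regex::icase and boost::wregex::icase. The helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The namespace-scope constant and the class-scope copies have the same value,
// so the spellings can be used interchangeably (checked at compile time).
static_assert(std::regex_constants::icase == std::regex::icase,
              "regex::icase duplicates regex_constants::icase");
static_assert(std::regex::icase == std::wregex::icase,
              "narrow and wide regex classes share the constant");

// Hypothetical helper: namespace-scope spelling with a narrow regex.
bool match_boost_icase(const std::string& s) {
    const std::regex re("boost", std::regex_constants::icase);
    return std::regex_match(s, re);
}

// Hypothetical helper: the narrow class-scope spelling used with a wide regex.
bool match_boost_icase_wide(const std::wstring& s) {
    const std::wregex re(L"boost", std::regex::icase);
    return std::regex_match(s, re);
}

void spelling_examples() {
    assert(match_boost_icase("BOOST"));
    assert(match_boost_icase_wide(L"Boost"));
}
```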

Element

Effect if set

normal

Specifies that the grammar recognized by the regular expression engine uses its normal semantics, that is, the same as those given in ECMA-262, ECMAScript Language Specification, Section 15.10, RegExp (Regular Expression) Objects (FWD.1).

Boost.Regex also recognizes most Perl-compatible extensions in this mode.

icase

Specifies that matching of regular expressions against a character container sequence shall be performed without regard to case.

nosubs

Specifies that when a regular expression is matched against a character container sequence, no sub-expression matches are to be stored in the supplied match_results structure.

optimize

Specifies that the regular expression engine should pay more attention to the speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output. This flag currently has no effect in Boost.Regex.

collate

Specifies that character ranges of the form "[a-b]" should be locale-sensitive.

ECMAScript

The same as normal.

JavaScript

The same as normal.

JScript

The same as normal.

basic

Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions and Headers, Section 9, Regular Expressions (FWD.1).

extended

Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX extended regular expressions in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions and Headers, Section 9, Regular Expressions (FWD.1).

awk

Specifies that the grammar recognized by the regular expression engine is the same as that used by the POSIX utility awk in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and Utilities, Section 4, awk (FWD.1).

That is to say, the same as POSIX extended syntax, but with escape sequences permitted inside character sets (bracket expressions).

grep

Specifies that the grammar recognized by the regular expression engine is the same as that used by the POSIX utility grep in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and Utilities, Section 4, Utilities, grep (FWD.1).

That is to say, the same as POSIX basic syntax, but with the newline character acting as an alternation character in addition to "|".

egrep

Specifies that the grammar recognized by the regular expression engine is the same as that used by the POSIX utility grep when given the -E option in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and Utilities, Section 4, Utilities, grep (FWD.1).

That is to say, the same as POSIX extended syntax, but with the newline character acting as an alternation character in addition to "|".

sed

The same as basic.

perl

The same as normal.
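To make the differences between the grammar elements concrete, the sketch below compiles the same pattern text under different grammars. It is a minimal sketch that assumes C++11 std::regex as a stand-in for boost::regex, so that it builds without Boost; std::regex provides ECMAScript, basic and extended with the meanings given above, but has no normal, perl or sed names. The helpers matches and grammar_examples are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of text matches pattern when the
// pattern is interpreted using grammar g.
bool matches(const std::string& pattern, const std::string& text,
             std::regex::flag_type g) {
    const std::regex re(pattern, g);
    return std::regex_match(text, re);
}

void grammar_examples() {
    // POSIX basic: "+" is an ordinary character.
    assert(matches("a+", "a+", std::regex::basic));
    assert(!matches("a+", "aaa", std::regex::basic));
    // POSIX extended and ECMAScript: "+" means one or more.
    assert(matches("a+", "aaa", std::regex::extended));
    assert(matches("a+", "aaa", std::regex::ECMAScript));
    // Grouping uses \( and \) in basic syntax, ( and ) in extended syntax.
    assert(matches("\\(ab\\)*", "abab", std::regex::basic));
    assert(matches("(ab)*", "abab", std::regex::extended));
}
```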

The following constants are specific to this particular regular expression implementation and do not appear in the regular expression standardization proposal:

regbase::escape_in_lists: Allows the use of the escape character "\" in sets of characters; for example, [\]] represents the set of characters containing only "]". If this flag is not set, "\" is an ordinary character inside sets.

regbase::char_classes: When this bit is set, character classes [:classname:] are allowed inside character set declarations; for example, "[[:word:]]" represents the set of all characters that belong to the character class "word".

regbase::intervals: When this bit is set, repetition intervals are allowed; for example, "a{2,4}" represents a repeat of between 2 and 4 letter a's.

regbase::limited_ops: When this bit is set, all of "+", "?" and "|" are ordinary characters in all situations.

regbase::newline_alt: When this bit is set, the newline character "\n" has the same effect as the alternation operator "|".

regbase::bk_plus_qm: When this bit is set, "\+" represents the one-or-more repetition operator and "\?" represents the zero-or-one repetition operator. When this bit is not set, "+" and "?" are used instead.

regbase::bk_braces: When this bit is set, "\{" and "\}" are used for bounded repetitions and "{" and "}" are ordinary characters. This is the opposite of the default behavior.

regbase::bk_parens: When this bit is set, "\(" and "\)" are used to group sub-expressions and "(" and ")" are ordinary characters. This is the opposite of the default behavior.

regbase::bk_refs: When this bit is set, back references are allowed.

regbase::bk_vbar: When this bit is set, "\|" represents the alternation operator and "|" is an ordinary character. This is the opposite of the default behavior.

regbase::use_except: When this bit is set, a bad_expression exception is thrown on error. Use of this flag is deprecated: basic_regex always throws on error.

regbase::failbit: This bit is set on error. If regbase::use_except is not set, check this bit to determine whether a regular expression is valid before using it.

regbase::literal: All characters in the string are treated as literals; there are no special characters or escape sequences.

regbase::emacs: Provides compatibility with the Emacs editor; equivalent to bk_braces | bk_parens | bk_refs | bk_vbar.
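Since basic_regex always throws on error, the validity check that regbase::failbit used to provide becomes a try/catch around construction. The sketch below assumes C++11 std::regex as a stand-in, so the exception it catches is std::regex_error rather than the bad_expression named above; the helper names try_compile and validity_examples are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: tries to compile pattern under grammar g.
// Returns true on success; on failure stores the reason in code and
// returns false. std::basic_regex reports errors only by throwing, so the
// check is a try/catch around construction rather than a status bit.
bool try_compile(const std::string& pattern,
                 std::regex_constants::error_type& code,
                 std::regex::flag_type g = std::regex::ECMAScript) {
    try {
        const std::regex re(pattern, g);  // throws std::regex_error if invalid
        return true;
    } catch (const std::regex_error& e) {
        code = e.code();
        return false;
    }
}

void validity_examples() {
    std::regex_constants::error_type code{};
    assert(try_compile("a(b)", code));
    assert(!try_compile("a(b", code));  // unclosed parenthesis
    assert(code == std::regex_constants::error_paren);
}
```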

Revised 24 Oct 2003

Use, modification and distribution are subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
