Standards Conformance

C++

Boost.Regex is intended to conform to the Technical Report on C++ Library Extensions.

ECMAScript / JavaScript

All of the ECMAScript regular expression syntax features are supported, except that:

The escape sequence \u matches any upper case character (the same as [[:upper:]]) rather than a Unicode escape sequence; use \x{DDDD} for Unicode escape sequences.

Perl

Almost all Perl features are supported, except for:

(?{code}): not implementable in a compiled, strongly typed language.

(??{code}): not implementable in a compiled, strongly typed language.

POSIX

All the POSIX basic and extended regular expression features are supported, except that:

No character collating names are recognized except those specified in the POSIX standard for the C locale, unless they are explicitly registered with the traits class.

Character equivalence classes ([[=a=]] etc.) are probably buggy except on Win32. Implementing this feature requires knowledge of the format of the string sort keys produced by the system; if you need this feature and the default implementation doesn't work on your platform, you will need to supply a custom traits class.

Unicode

The following comments refer to Unicode Technical Standard #18: Unicode Regular Expressions version 11.

| Item | Feature | Support |
|------|---------|---------|
| 1.1 | Hex Notation | Yes: use \x{DDDD} to refer to code point U+DDDD. |
| 1.2 | Character Properties | All the names listed under the General Category Property are supported. Script names and Other Names are not currently supported. |
| 1.3 | Subtraction and Intersection | Indirectly supported by forward lookahead: (?=[[:X:]])[[:Y:]] gives the intersection of character properties X and Y; (?![[:X:]])[[:Y:]] gives everything in Y that is not in X (subtraction). |
| 1.4 | Simple Word Boundaries | Conforming: non-spacing marks are included in the set of word characters. |
| 1.5 | Caseless Matching | Supported. Note that at this level case transformations are 1:1; many-to-many case folding operations are not supported (for example "ß" to "SS"). |
| 1.6 | Line Boundaries | Supported, except that "." matches only one character of "\r\n". Other than that, line boundaries match correctly, including not matching in the middle of a "\r\n" sequence. |
| 1.7 | Code Points | Supported: provided you use the u32* algorithms, UTF-8, UTF-16 and UTF-32 are all treated as sequences of 32-bit code points. |
| 2.1 | Canonical Equivalence | Not supported: it is up to the user of the library to convert all text into the same canonical form as the regular expression. |
| 2.2 | Default Grapheme Clusters | Not supported. |
| 2.3 | Default Word Boundaries | Not supported. |
| 2.4 | Default Loose Matches | Not supported. |
| 2.5 | Named Properties | Supported: the expression "[[:name:]]" or \N{name} matches the named character "name". |
| 2.6 | Wildcard Properties | Not supported. |
| 3.1 | Tailored Punctuation | Not supported. |
| 3.2 | Tailored Grapheme Clusters | Not supported. |
| 3.3 | Tailored Word Boundaries | Not supported. |
| 3.4 | Tailored Loose Matches | Partial support: [[=c=]] matches characters with the same primary equivalence class as "c". |
| 3.5 | Tailored Ranges | Supported: [a-b] matches any character that collates in the range a to b, when the expression is constructed with the collate flag set. |
| 3.6 | Context Matches | Not supported. |
| 3.7 | Incremental Matches | Supported: pass the flag match_partial to the regex algorithms. |
| 3.8 | Unicode Set Sharing | Not supported. |
| 3.9 | Possible Match Sets | Not supported; however, this information is used internally to optimise the matching of regular expressions and to return quickly if no match is possible. |
| 3.10 | Folded Matching | Partial support: it is possible to achieve a similar effect by using a custom regular expression traits class. |
| 3.11 | Custom Submatch Evaluation | Not supported. |

