Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

This is the documentation for an old version of Boost. Click here to view this page for the latest version.

libs/regex/doc/collating_names.qbk

[/ 
  Copyright 2006-2007 John Maddock.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]


[section:collating_names Collating Names]

[section:digraphs Digraphs]

The following are treated as valid digraphs when used as a collating name:

"ae", "Ae", "AE", "ch", "Ch", "CH", "ll", "Ll", "LL", "ss", "Ss", "SS", "nj", "Nj", "NJ", "dz", "Dz", "DZ", "lj", "Lj", "LJ".

So for example the expression:

[pre \[\[.ae.\]-c\] ]

will match any character that collates between the digraph "ae" and the character "c".

[endsect]

[section:posix_symbolic_names POSIX Symbolic Names]

The following symbolic names are recognised as valid collating element names, 
in addition to any single character, this allows you to write for example:

[pre \[\[.left-square-bracket.\]\[.right-square-bracket.\]\]]

if you wanted to match either "\[" or "\]".

[table
[[Name][Character]]
[[NUL] 	[\\x00]]
[[SOH] 	[\\x01]]
[[STX] 	[\\x02]]
[[ETX] 	[\\x03]]
[[EOT] 	[\\x04]]
[[ENQ] 	[\\x05]]
[[ACK] 	[\\x06]]
[[alert] 	[\\x07]]
[[backspace] 	[\\x08]]
[[tab] 	[\\t]]
[[newline] 	[\\n]]
[[vertical-tab] 	[\\v]]
[[form-feed] 	[\\f]]
[[carriage-return] 	[\\r]]
[[SO] 	[\\xE]]
[[SI] 	[\\xF]]
[[DLE] 	[\\x10]]
[[DC1] 	[\\x11]]
[[DC2] 	[\\x12]]
[[DC3] 	[\\x13]]
[[DC4] 	[\\x14]]
[[NAK] 	[\\x15]]
[[SYN] 	[\\x16]]
[[ETB] 	[\\x17]]
[[CAN] 	[\\x18]]
[[EM] 	[\\x19]]
[[SUB] 	[\\x1A]]
[[ESC] 	[\\x1B]]
[[IS4] 	[\\x1C]]
[[IS3] 	[\\x1D]]
[[IS2] 	[\\x1E]]
[[IS1] 	[\\x1F]]
[[space] 	[\\x20]]
[[exclamation-mark] 	[!]]
[[quotation-mark] 	["]]
[[number-sign] 	[#]]
[[dollar-sign] 	[$]]
[[percent-sign] 	[%]]
[[ampersand] 	[&]]
[[apostrophe] 	[\']]
[[left-parenthesis] 	[(]]
[[right-parenthesis] 	[)]]
[[asterisk] 	[\*]]
[[plus-sign] 	[+]]
[[comma] 	[,]]
[[hyphen] 	[-]]
[[period] 	[.]]
[[slash] 	[ / ]]
[[zero] 	[0]]
[[one] 	[1]]
[[two] 	[2]]
[[three] 	[3]]
[[four] 	[4]]
[[five] 	[5]]
[[six] 	[6]]
[[seven] 	[7]]
[[eight] 	[8]]
[[nine] 	[9]]
[[colon] 	[\:]]
[[semicolon] 	[;]]
[[less-than-sign] 	[<]]
[[equals-sign] 	[=]]
[[greater-than-sign] 	[>]]
[[question-mark] 	[?]]
[[commercial-at] 	[@]]
[[left-square-bracket] 	[\[]]
[[backslash][\\]]
[[right-square-bracket][\]]]
[[circumflex][~]]
[[underscore][_]]
[[grave-accent][`]]
[[left-curly-bracket][{]]
[[vertical-line][|]]
[[right-curly-bracket][}]]
[[tilde][~]]
[[DEL][\\x7F]]
]

[endsect]

[section:named_unicode Named Unicode Characters]

When using [link boost_regex.unicode Unicode aware regular expressions] (with the `u32regex` type), all 
the normal symbolic names for Unicode characters (those given in Unidata.txt) 
are recognised.  So for example:

[pre \[\[.CYRILLIC CAPITAL LETTER I.\]\] ]

would match the Unicode character 0x0418.

[endsect]
[endsect]