Boost C++ Libraries

Unicode regular expression types

Header <boost/regex/icu.hpp> provides a regular expression traits class that handles UTF-32 characters:

class icu_regex_traits;

and a regular expression type based upon that:

typedef basic_regex<UChar32,icu_regex_traits> u32regex;

The type u32regex is the regular expression type to use for all Unicode regular expressions. Internally it uses UTF-32 code points, but it can be created from, and used to search, UTF-8 or UTF-16 encoded strings as well as UTF-32 ones.
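To make "internally it uses UTF-32 code points" concrete, here is a minimal standalone sketch of decoding UTF-8 into code points. It is not Boost.Regex's implementation: the helper name utf8_to_utf32 is hypothetical, it uses only the standard library, and it assumes well-formed input (no validation is attempted).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost.Regex): decodes well-formed UTF-8
// into UTF-32 code points. Malformed input is not detected by this sketch.
std::vector<char32_t> utf8_to_utf32(const std::string& in)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 0; k < extra && i < in.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}

// Small usage check: "a", U+00E9 and U+20AC take 1, 2 and 3 bytes in UTF-8
// but become exactly three code points.
void demo_utf8_to_utf32()
{
    const std::vector<char32_t> cps = utf8_to_utf32("a\xC3\xA9\xE2\x82\xAC");
    assert(cps.size() == 3);
    assert(cps[1] == 0xE9);
}
```

A pattern element such as "." is then matched against one code point rather than one byte, which is why a single regular expression type can serve all three encodings.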

The constructors and assign member functions of u32regex require UTF-32 encoded strings, but a series of overloaded algorithms called make_u32regex allow regular expressions to be created from UTF-8, UTF-16, or UTF-32 encoded strings:

template <class InputIterator>
u32regex make_u32regex(InputIterator i,
                       InputIterator j,
                       boost::regex_constants::syntax_option_type opt);

Effects: Creates a regular expression object from the iterator sequence [i,j). The character encoding of the sequence is determined based upon sizeof(*i): 1 implies UTF-8, 2 implies UTF-16, and 4 implies UTF-32.
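The size-based rule above can be sketched with a small standalone template. This only illustrates the selection logic described in the text and is not how Boost.Regex is actually written: the names encoding and encoding_of are assumptions made for this example.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical names used for illustration only.
enum class encoding { utf8, utf16, utf32 };

// Picks an encoding from the size of the iterator's value type,
// mirroring the rule: 1 implies UTF-8, 2 implies UTF-16, 4 implies UTF-32.
template <class InputIterator>
encoding encoding_of(InputIterator)
{
    using value_type =
        typename std::iterator_traits<InputIterator>::value_type;
    static_assert(sizeof(value_type) == 1 || sizeof(value_type) == 2 ||
                      sizeof(value_type) == 4,
                  "code unit size must be 1, 2 or 4 bytes");
    return sizeof(value_type) == 1   ? encoding::utf8
           : sizeof(value_type) == 2 ? encoding::utf16
                                     : encoding::utf32;
}

// Small usage check covering the three standard string types.
void demo_encoding_of()
{
    const std::string    s8  = "abc";
    const std::u16string s16 = u"abc";
    const std::u32string s32 = U"abc";
    assert(encoding_of(s8.begin())  == encoding::utf8);
    assert(encoding_of(s16.begin()) == encoding::utf16);
    assert(encoding_of(s32.begin()) == encoding::utf32);
}
```

Because the choice depends only on the code unit size, the caller is responsible for making sure the data really is in the implied encoding; a sequence of char holding Latin-1 text, for example, would still be treated as UTF-8.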

u32regex make_u32regex(const char* p,
                       boost::regex_constants::syntax_option_type opt
                           = boost::regex_constants::perl);

Effects: Creates a regular expression object from the null-terminated UTF-8 character sequence p.

u32regex make_u32regex(const unsigned char* p,
                       boost::regex_constants::syntax_option_type opt
                           = boost::regex_constants::perl);

Effects: Creates a regular expression object from the null-terminated UTF-8 character sequence p.

u32regex make_u32regex(const wchar_t* p,
                       boost::regex_constants::syntax_option_type opt
                           = boost::regex_constants::perl);

Effects: Creates a regular expression object from the null-terminated character sequence p. The character encoding of the sequence is determined based upon sizeof(wchar_t): 1 implies UTF-8, 2 implies UTF-16, and 4 implies UTF-32.

u32regex make_u32regex(const UChar* p,
                       boost::regex_constants::syntax_option_type opt
                           = boost::regex_constants::perl);

Effects: Creates a regular expression object from the null-terminated UTF-16 character sequence p.

template<class C, class T, class A>
u32regex make_u32regex(const std::basic_string<C, T, A>& s,
                       boost::regex_constants::syntax_option_type opt
                           = boost::regex_constants::perl);

Effects: Creates a regular expression object from the string s. The character encoding of the string is determined based upon sizeof(C): 1 implies UTF-8, 2 implies UTF-16, and 4 implies UTF-32.

u32regex make_u32regex(const UnicodeString& s,
                       boost::regex_constants::syntax_option_type opt
                           = boost::regex_constants::perl);

Effects: Creates a regular expression object from the UTF-16 encoded string s.

