Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. — Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

This is the documentation for an old version of Boost. Click here to view this page for the latest version.

Unicode and Boost.Regex

There are two ways to use Boost.Regex with Unicode strings:

Rely on wchar_t

If your platform's wchar_t type can hold Unicode strings, and your platform's C/C++ runtime correctly handles wide character constants (when passed to std::iswspace std::iswlower etc), then you can use boost::wregex to process Unicode. However, there are several disadvantages to this approach:

It's not portable: there's no guarantee on the width of wchar_t, or even whether the runtime treats wide characters as Unicode at all, most Windows compilers do so, but many Unix systems do not.
There's no support for Unicode-specific character classes: [[:Nd:]], [[:Po:]] etc.
You can only search strings that are encoded as sequences of wide characters, it is not possible to search UTF-8, or even UTF-16 on many platforms.

Use a Unicode Aware Regular Expression Type.

If you have the ICU library, then Boost.Regex can be configured to make use of it, and provide a distinct regular expression type (boost::u32regex), that supports both Unicode specific character properties, and the searching of text that is encoded in either UTF-8, UTF-16, or UTF-32. See: ICU string class support.