The C++ standard library offers a simple and powerful way to provide locale-specific information. It is done via the std::locale class, the container that holds all the required information about a specific culture, such as number formatting patterns, date and time formatting, currency, case conversion, etc.
All this information is provided by facets, special classes derived from the std::locale::facet base class. Facets are packed into a std::locale object, and because you can define your own facet classes, a locale can carry arbitrary information. The std::locale class keeps reference counts on its installed facets, so copying a locale is cheap: the copy shares the facets instead of duplicating them.
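A short sketch of that sharing, using only standard calls: a copied locale hands out the very same facet object as the original, and std::has_facet reports whether a facet type is present. The helper name shares_facet is illustrative, and identical addresses are what mainstream implementations (libstdc++, libc++) produce for a plain copy.

```cpp
#include <locale>

// Returns true when `copy` hands out the same std::ctype<char> facet object
// as `original`, i.e. copying the locale shared the facet instead of cloning it.
bool shares_facet(std::locale const &original, std::locale const &copy)
{
    // std::has_facet checks presence; std::use_facet would throw std::bad_cast
    // if the facet were missing. ctype<char> is present in every locale.
    if (!std::has_facet<std::ctype<char> >(original) ||
        !std::has_facet<std::ctype<char> >(copy))
        return false;
    return &std::use_facet<std::ctype<char> >(original) ==
           &std::use_facet<std::ctype<char> >(copy);
}
```

For example, after std::locale b = std::locale::classic(); the call shares_facet(std::locale::classic(), b) compares the addresses of the facet both locales refer to.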
Each facet installed into a std::locale object can be fetched with the std::use_facet function. For example, the std::ctype<Char> facet provides rules for case conversion, so it can be used to convert a character to upper case.
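A minimal sketch of that conversion, using std::use_facet together with the single-character and range overloads of std::ctype<char>::toupper. The helper names upper_char and upper_string are illustrative. Because std::ctype<char> works byte by byte, this approach cannot upper-case multi-byte UTF-8 text correctly.

```cpp
#include <locale>
#include <string>

// Upper-case a single character using the std::ctype<char> facet of `loc`.
char upper_char(char c, std::locale const &loc)
{
    std::ctype<char> const &ctype_facet = std::use_facet<std::ctype<char> >(loc);
    return ctype_facet.toupper(c);
}

// Upper-case a whole string with the range overload, which converts in place.
// Each byte is converted independently, so multi-byte encodings are not handled.
std::string upper_string(std::string text, std::locale const &loc)
{
    std::ctype<char> const &ctype_facet = std::use_facet<std::ctype<char> >(loc);
    if (!text.empty())
        ctype_facet.toupper(&text[0], &text[0] + text.size());
    return text;
}
```

With the classic locale, upper_char('a', std::locale::classic()) returns 'A'.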
A locale object can be imbued into an iostream so that the stream formats values according to that locale.
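A sketch of imbuing a locale into a stream; format_with, named_or_classic and european_punct are illustrative names. Named locales such as "en_US.UTF-8" exist only if they are installed (otherwise the std::locale constructor throws std::runtime_error), so the sketch falls back to the classic locale, and it also defines a small hypothetical numpunct facet whose effect is visible on every system.

```cpp
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Format `value` using the number-formatting facets of `loc`.
std::string format_with(std::locale const &loc, double value)
{
    std::ostringstream out;
    out.imbue(loc);   // subsequent output uses loc's num_put and numpunct facets
    out << value;
    return out.str();
}

// Construct a named locale, falling back to the classic "C" locale when the
// name is not installed on this system.
std::locale named_or_classic(char const *name)
{
    try {
        return std::locale(name);
    } catch (std::runtime_error const &) {
        return std::locale::classic();
    }
}

// Hypothetical facet: '.' groups thousands and ',' separates decimals.
struct european_punct : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};
```

The classic locale formats 1345.45 as "1345.45", while std::locale(std::locale::classic(), new european_punct) formats it as "1.345,45". The output for a named locale such as named_or_classic("en_US.UTF-8") depends on the locale data installed on the system.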
You can also create your own facets and install them into existing locale objects. For example, a facet can describe the measurement units a culture uses; once it is installed into a locale, code that prints a distance can fetch it from the stream's locale and print the value in the correct units.
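A minimal sketch of such a facet; measure, from_metric, name and print_distance are hypothetical names chosen for illustration. A custom facet needs a public static std::locale::id member named id, plus a constructor that forwards the reference count to std::locale::facet. Passing 0 (the default) lets the locale delete the facet when the last locale referring to it is destroyed.

```cpp
#include <cstddef>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical facet describing which unit a culture uses for distances.
class measure : public std::locale::facet {
public:
    enum measure_type { metric, inches };

    // Each facet class needs its own static id; std::use_facet looks it up.
    static std::locale::id id;

    explicit measure(measure_type m, std::size_t refs = 0)
        : std::locale::facet(refs), type_(m) {}

    // Convert a distance given in meters into this facet's unit.
    double from_metric(double meters) const
    {
        return type_ == inches ? meters / 0.0254 : meters;
    }

    // Short name of the unit.
    std::string name() const { return type_ == inches ? "in" : "m"; }

private:
    measure_type type_;
};

std::locale::id measure::id;

// Print a distance (in meters) using the measure facet of the stream's locale.
// std::use_facet throws std::bad_cast if the locale has no measure facet.
void print_distance(std::ostream &out, double meters)
{
    measure const &m = std::use_facet<measure>(out.getloc());
    out << m.from_metric(meters) << " " << m.name();
}
```

For example, after std::cout.imbue(std::locale(std::cout.getloc(), new measure(measure::inches))), the call print_distance(std::cout, 0.0254) prints "1 in". Code that cannot be sure the facet is installed can check std::has_facet<measure>(loc) first.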
This technique was adopted by the Boost.Locale library in order to provide powerful and correct localization. Instead of using the very limited C++ standard library facets, it uses ICU under the hood to create its own much more powerful ones.
There are numerous issues in the standard library that prevent you from using its full power. The most critical ones are the following.
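Setting the global locale has bad side effects. For example, suppose a program calls std::locale::global(std::locale("")) to make the system's default locale global, and then writes the numbers 1.1 and 1.3, separated by a comma, to a file named test.csv. What would be the content of test.csv? It may be the expected "1.1,1.3", or it may be "1,1,1,3" if the user's locale uses a comma as the decimal separator.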
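Setting the global locale affects even printf (when the new global locale has a name, std::locale::global also calls setlocale) and libraries like boost::lexical_cast, giving incorrect or unexpected formatting. In fact, many third-party libraries break in such a situation.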
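Unlike the standard library, Boost.Locale never changes basic number formatting, even when it uses std-based localization backends, so by default numbers are always formatted using the C-style locale. Localized number formatting requires specific flags.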
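The following sketch reproduces the test.csv effect with standard calls only. Instead of std::locale(""), whose result depends on the machine, it installs a hypothetical comma_decimal facet into the global locale, so the outcome is the same everywhere; write_row and write_row_portable are illustrative names. A stream created after the global locale changes picks up that locale, while a stream explicitly imbued with std::locale::classic() keeps the C-style output.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet imitating a locale whose decimal separator is ','.
struct comma_decimal : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
};

// Write two numbers separated by a comma into a freshly constructed stream.
// A new stream takes its locale from the global locale at construction time.
std::string write_row(double a, double b)
{
    std::ostringstream csv;
    csv << a << "," << b;
    return csv.str();
}

// Same row, but the stream is pinned to the classic "C" locale, so the output
// does not depend on whatever the global locale happens to be.
std::string write_row_portable(double a, double b)
{
    std::ostringstream csv;
    csv.imbue(std::locale::classic());
    csv << a << "," << b;
    return csv.str();
}
```

With comma_decimal installed globally, write_row(1.1, 1.3) yields "1,1,1,3" while write_row_portable(1.1, 1.3) still yields "1.1,1.3". Imbuing std::locale::classic() into streams that produce machine-readable output, such as CSV files, keeps them independent of the global locale.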
Number formatting is broken in some locales. For example, in the ru_RU.UTF-8 locale the number 1024 should be displayed as "1 024", where the thousands separator is the non-breaking space U+00A0. Unfortunately, many libraries don't handle this correctly: GCC and SunStudio, for example, output only "\xC2", the first byte of the two-byte UTF-8 sequence "\xC2\xA0" that encodes this code point, and thus generate invalid UTF-8.
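The root of the problem is that std::numpunct<char>::thousands_sep() returns exactly one char, so a two-byte UTF-8 separator cannot be expressed. The sketch below contrasts a hypothetical narrow facet that keeps only the first byte (mimicking the invalid output described above, not calling any real ru_RU facet) with a wide-character facet, where U+00A0 fits into a single wchar_t. The names truncated_nbsp, wide_nbsp, format_narrow and format_wide are illustrative.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Narrow facet: only one char is available, so only the first byte (0xC2)
// of the UTF-8 encoding of U+00A0 can be used as the separator.
struct truncated_nbsp : std::numpunct<char> {
    char do_thousands_sep() const override { return '\xC2'; }
    std::string do_grouping() const override { return "\3"; }
};

// Wide facet: the whole code point fits into one wchar_t.
struct wide_nbsp : std::numpunct<wchar_t> {
    wchar_t do_thousands_sep() const override { return L'\u00A0'; }
    std::string do_grouping() const override { return "\3"; }
};

// For 1024 this yields the bytes "1\xC2" "024", which is not valid UTF-8.
std::string format_narrow(long value)
{
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(), new truncated_nbsp));
    out << value;
    return out.str();
}

// For 1024 this yields L"1\u00A0" L"024".
std::wstring format_wide(long value)
{
    std::wostringstream out;
    out.imbue(std::locale(std::locale::classic(), new wide_nbsp));
    out << value;
    return out.str();
}
```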
Locale names are not standardized. For example, under MSVC you need to provide a name such as English_USA.1252, whereas on POSIX platforms it would be