Boost.Locale
Introduction to C++ Standard Library localization support

Getting familiar with standard C++ Locales

The C++ standard library offers a simple and powerful way to provide locale-specific information. It is done via the std::locale class, a container that holds all the required information about a specific culture, such as number formatting patterns, date and time formatting, currency, and case conversion.

All this information is provided by facets, special classes derived from the std::locale::facet base class. Such facets are packed into the std::locale class and allow you to provide arbitrary information about the locale. The std::locale class keeps reference counters on installed facets and can be efficiently copied.

Each facet that was installed into the std::locale object can be fetched using the std::use_facet function. For example, the std::ctype<Char> facet provides rules for case conversion, so you can convert a character to upper-case like this:

    std::ctype<char> const &ctype_facet = std::use_facet<std::ctype<char> >(some_locale);
    char upper_a = ctype_facet.toupper('a');
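The same facet can also convert a whole character range in one call. The following is a minimal sketch, assuming narrow single-byte text; to_upper_copy is a hypothetical helper name used only for illustration, not part of the standard library or Boost.Locale.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: return an upper-cased copy of `text`, using the
// std::ctype<char> facet installed in `loc`.
std::string to_upper_copy(std::string text, std::locale const &loc)
{
    // std::use_facet throws std::bad_cast when a facet is missing, but
    // std::ctype<char> is one of the facets present in every locale.
    std::ctype<char> const &ct = std::use_facet<std::ctype<char> >(loc);
    if (!text.empty()) {
        // The range overload converts [begin, end) in place.
        ct.toupper(&text[0], &text[0] + text.size());
    }
    return text;
}
```

Calling to_upper_copy("hello", std::locale::classic()) uses the rules of the "C" locale. Because std::ctype<char> works on single char values, this approach cannot handle multi-byte encodings such as UTF-8 correctly.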

A locale object can be imbued into an iostream so that the stream formats information according to that locale:

    std::cout.imbue(std::locale("en_US.UTF-8"));
    std::cout << 1345.45 << std::endl;
    std::cout.imbue(std::locale("ru_RU.UTF-8"));
    std::cout << 1345.45 << std::endl;

On a system where both locales are installed and handled correctly, this would display something like:

    1,345.45
    1 345,45
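Named locales such as "en_US.UTF-8" are not installed on every system, so the effect of imbue can also be illustrated with a hand-written facet. The sketch below is an assumption-laden illustration, not how any particular platform implements its locales: us_style_punct and format_grouped are hypothetical names, and the facet simply hard-codes US-style punctuation.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: hard-coded US-style number punctuation, so the
// example does not depend on which named locales the system provides.
class us_style_punct : public std::numpunct<char> {
protected:
    char do_decimal_point() const override { return '.'; }
    char do_thousands_sep() const override { return ','; }
    // "\3" means: group the integer digits in threes.
    std::string do_grouping() const override { return "\3"; }
};

// Hypothetical helper: format `value` through a stream imbued with a
// locale made of the classic "C" locale plus the facet above.
std::string format_grouped(double value)
{
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when unused.
    out.imbue(std::locale(std::locale::classic(), new us_style_punct));
    out << value;
    return out.str();
}
```

With this facet, format_grouped(1345.45) yields "1,345.45", while the same value written to a stream using the plain "C" locale yields "1345.45".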

You can also create your own facets and install them into existing locale objects. For example:

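    class measure : public std::locale::facet {
    public:
        typedef enum { inches, ... } measure_type;
        static std::locale::id id;
        // Every facet type needs its own static id so that std::use_facet can find it
        measure(measure_type m,size_t refs=0);
        double from_metric(double value) const;
        std::string name() const;
        ...
    };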

And now you can simply provide this information to a locale:

    std::locale::global(std::locale(std::locale("en_US.UTF-8"),new measure(measure::inches)));
    // Set the global locale to one built from the en_US locale plus the measure facet.

Now you can print a distance according to the correct locale:

    void print_distance(std::ostream &out,double value)
    {
        measure const &m = std::use_facet<measure>(out.getloc());
        // Fetch locale information from stream
        out << m.from_metric(value) << " " << m.name();
    }
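For reference, here is one way the pieces above could fit together in a complete program fragment. It is only a sketch under stated assumptions: the metric input is taken to be in centimeters, the centimeters enumerator and the distance_in_inches helper are hypothetical additions, and the facet is installed on top of the classic "C" locale so the example does not depend on "en_US.UTF-8" being available.

```cpp
#include <cstddef>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

class measure : public std::locale::facet {
public:
    // `centimeters` is a hypothetical extra value added for this sketch.
    enum measure_type { centimeters, inches };

    // Required: std::use_facet and std::has_facet look facets up by this id.
    static std::locale::id id;

    explicit measure(measure_type m, std::size_t refs = 0)
        : std::locale::facet(refs), type_(m) {}

    // Assumption: the metric input value is expressed in centimeters.
    double from_metric(double cm) const
    {
        return type_ == inches ? cm / 2.54 : cm;
    }

    std::string name() const { return type_ == inches ? "in" : "cm"; }

private:
    measure_type type_;
};

// The static id must be defined exactly once in the program.
std::locale::id measure::id;

void print_distance(std::ostream &out, double value)
{
    // Throws std::bad_cast if the stream's locale has no measure facet.
    measure const &m = std::use_facet<measure>(out.getloc());
    out << m.from_metric(value) << " " << m.name();
}

// Hypothetical helper: format a distance through a stream whose locale
// carries an inches-based measure facet.
std::string distance_in_inches(double cm)
{
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(), new measure(measure::inches)));
    print_distance(out, cm);
    return out.str();
}
```

Code that cannot be sure the facet is installed can check std::has_facet<measure>(out.getloc()) before calling std::use_facet, rather than relying on the std::bad_cast exception.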

This technique was adopted by the Boost.Locale library in order to provide powerful and correct localization. Instead of using the very limited C++ standard library facets, it uses ICU under the hood to create its own much more powerful ones.

Common Critical Problems with the Standard Library

Several issues with the standard library's localization support prevent it from being used to its full potential:

  • Setting the global locale has bad side effects.
    Consider the following code:
            int main()
            {
                std::locale::global(std::locale("")); 
                // Set system's default locale as global
                std::ofstream csv("test.csv");
                csv << 1.1 << ","  << 1.3 << std::endl;
            }
    

    What would be the content of test.csv? It may be "1.1,1.3", or it may be "1,1,1,3", which is probably not what you expected.
    Moreover, the global locale even affects printf and libraries such as boost::lexical_cast, leading to incorrect or unexpected formatting. In fact, many third-party libraries break in this situation.
    Unlike the standard localization library, Boost.Locale never changes the basic number formatting, even when it uses std-based localization backends, so by default numbers are always formatted according to the C locale. Localized number formatting requires specific flags.
  • Number formatting is broken on some locales.
    Some locales use the non-breaking space character U+00A0 as the thousands separator, so in the ru_RU.UTF-8 locale the number 1024 should be displayed as "1 024", where the space is the Unicode character U+00A0. Unfortunately, many standard library implementations do not handle this correctly: for example, GCC and SunStudio emit only the first byte, "\xC2", of the UTF-8 sequence "\xC2\xA0" that represents this code point, and thus generate invalid UTF-8.
  • Locale names are not standardized. For example, under MSVC you need to provide the name en-US or English_USA.1252, whereas on POSIX platforms it would be en_US.UTF-8 or en_US.ISO-8859-1.
    Moreover, MSVC does not support UTF-8 locales at all.
  • Many standard libraries provide only the C and POSIX locales. GCC, for example, supports localization only under Linux; on all other platforms, attempting to create a locale other than "C" or "POSIX" fails.
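When working with the plain standard library, two defensive habits follow from the problems listed above. The sketch below illustrates them with hypothetical helper names (locale_or_classic and csv_row); it is not a technique taken from Boost.Locale itself. The first helper relies on the fact that the std::locale constructor throws std::runtime_error for a name it cannot resolve, and the second explicitly imbues the classic "C" locale so that machine-readable output does not depend on the global locale.

```cpp
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: try to create a named locale and fall back to
// the classic "C" locale when the name is not supported on this platform.
std::locale locale_or_classic(char const *name)
{
    try {
        return std::locale(name);
    } catch (std::runtime_error const &) {
        return std::locale::classic();
    }
}

// Hypothetical helper: build a CSV row whose number formatting stays
// the same whatever std::locale::global was set to.
std::string csv_row(double a, double b)
{
    std::ostringstream csv;
    // Without this line the stream would copy the current global locale.
    csv.imbue(std::locale::classic());
    csv << a << "," << b;
    return csv.str();
}
```

The same imbue call can be applied to a std::ofstream before writing, which would keep the test.csv example above stable even after std::locale::global(std::locale("")) has been called.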