Boost C++ Libraries

Design Rationale

Why is it needed?

Why do we need a localization library, when standard C++ facets (should) provide most of the required functionality:

So why do we need such a library if all of this functionality is already in the standard library?

Almost every(!) facet has design flaws:

Also, many features are not really supported by std::locale at all: timezones (as mentioned above), text boundary analysis, number spelling, and many others. So it is clear that the standard C++ locales are problematic for real-world applications.
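One widely known limitation of this kind can be shown with the standard library alone. It is offered as an illustration, not as a reconstruction of the list this section refers to. The `std::ctype` facet converts case one character at a time, so it cannot express mappings that change the length of the text, such as the German "ß", whose full Unicode uppercase form is "SS". The helper name `std_upper` is hypothetical.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Uppercase a string with the standard std::ctype facet.
// The facet maps exactly one char to one char, so the result
// always has the same length as the input.
std::string std_upper(const std::string& text,
                      const std::locale& loc = std::locale::classic()) {
    const std::ctype<char>& facet = std::use_facet<std::ctype<char> >(loc);
    std::string result = text;
    for (char& c : result) {
        c = facet.toupper(c);
    }
    return result;
}
```

Calling `std_upper("abc")` yields `"ABC"`, but whatever locale is passed in, the output can never be longer than the input, so a one-to-many case mapping is out of reach for this interface.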

Why use an ICU wrapper instead of ICU?

ICU is a very good localization library, but it has several serious flaws:

For example, Boost.Locale provides direct integration with iostream, allowing a more natural way of formatting data:

    cout << "You have " << as::currency << 134.45 << " in your account as of " << as::datetime << std::time(0) << endl;
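The text does not say how a manipulator can change the way a plain `double` is printed later in the same expression. One plausible mechanism, sketched here with the standard library only, is to let the manipulator record the requested mode in the stream (through `std::ios_base::iword`) and to install a `std::num_put` facet that consults that mode. Every name in the sketch (`as_currency`, `mode_aware_num_put`, `render_balance`) is hypothetical, the "$" sign and two-decimal layout are hard-coded stand-ins for real locale data, and none of this is claimed to be how Boost.Locale is actually implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <iomanip>
#include <ios>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical per-stream slot that remembers the requested display mode.
inline int display_mode_index() {
    static const int index = std::ios_base::xalloc();
    return index;
}

const long mode_number = 0;    // default value of a fresh iword slot
const long mode_currency = 1;

// Manipulators only record formatting intent inside the stream.
inline std::ios_base& as_number(std::ios_base& ios) {
    ios.iword(display_mode_index()) = mode_number;
    return ios;
}
inline std::ios_base& as_currency(std::ios_base& ios) {
    ios.iword(display_mode_index()) = mode_currency;
    return ios;
}

// A num_put facet that checks the stream's stored mode before formatting.
class mode_aware_num_put : public std::num_put<char> {
protected:
    iter_type do_put(iter_type out, std::ios_base& ios, char_type fill,
                     double value) const override {
        if (ios.iword(display_mode_index()) != mode_currency) {
            return std::num_put<char>::do_put(out, ios, fill, value);
        }
        // Assumption: "$" and two decimals stand in for locale-specific rules.
        std::ostringstream tmp;
        tmp << '$' << std::fixed << std::setprecision(2) << value;
        const std::string text = tmp.str();
        return std::copy(text.begin(), text.end(), out);
    }
};

// Builds a message the same way the example above writes to cout.
std::string render_balance(double amount) {
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(), new mode_aware_num_put));
    out << "You have " << as_currency << amount << " in your account";
    return out.str();
}
```

With this arrangement the formatting request travels with the stream, while the knowledge of how to honour it lives in the facet, so user code stays a plain chain of `<<` operators.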

Why an ICU wrapper and not an implementation-from-scratch?

ICU is one of the best localization/Unicode libraries available. It consists of about half a million lines of well-tested, production-proven source code that today provides state-of-the-art localization tools.

Reimplementing even a small part of ICU's abilities is an infeasible project that would require many man-years. So the question is not whether we need to reimplement the Unicode and localization algorithms from scratch, but "Do we need a good localization library in Boost?"

Thus Boost.Locale wraps ICU with a modern C++ interface, allowing future reimplementation of parts with better alternatives, but bringing localization support to Boost today and not in the not-so-near-if-at-all future.

Why is the ICU API not exposed to the user?

The entire ICU API is hidden behind opaque pointers, and users have no access to it. This is done for several reasons:

Why use GNU Gettext catalogs for message formatting?

There are many available localization formats. The most popular so far are OASIS XLIFF, GNU gettext po/mo files, POSIX catalogs, Qt ts/qm files, Java properties, and Windows resources. However, the last three are useful only in their specific areas, and POSIX catalogs are too simple and limited, so there are only two reasonable options:

  1. Standard localization format OASIS XLIFF.
  2. GNU Gettext binary catalogs.

The first one generally seems like the more correct localization solution, but it requires XML parsing to load documents, it is a very complicated format, and even ICU requires it to be compiled into ICU resource bundles beforehand.

On the other hand:

So, even though the GNU Gettext mo catalog format is not an officially approved file format:

Note:
Boost.Locale does not use any GNU Gettext code; it only reimplements the logic for reading and using mo-files. This eliminates the biggest current flaw of GNU Gettext: the lack of thread safety when using multiple locales.
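To give a sense of how small the reading side of the format is, here is a minimal sketch that parses the fixed header of a mo-file from a byte buffer. It follows the layout documented for GNU gettext mo-files (a magic number `0x950412de`, which also reveals the byte order, followed by the revision, the number of strings, and the offsets of the two string tables). The type and function names are hypothetical and this is not Boost.Locale's code. Because the result is an ordinary value with no global state, separate threads can load separate catalogs independently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain-data view of the fixed part of a mo-file header.
struct mo_header {
    bool big_endian;                    // byte order detected from the magic
    std::uint32_t revision;             // file format revision
    std::uint32_t string_count;         // number of messages in the catalog
    std::uint32_t originals_offset;     // table of original strings
    std::uint32_t translations_offset;  // table of translated strings
};

// Read one 32-bit word at `pos` in the requested byte order.
std::uint32_t read_u32(const std::vector<unsigned char>& data,
                       std::size_t pos, bool big_endian) {
    const std::uint32_t b0 = data[pos], b1 = data[pos + 1];
    const std::uint32_t b2 = data[pos + 2], b3 = data[pos + 3];
    return big_endian ? ((b0 << 24) | (b1 << 16) | (b2 << 8) | b3)
                      : ((b3 << 24) | (b2 << 16) | (b1 << 8) | b0);
}

// Returns false if the buffer is too short or the magic does not match.
bool parse_mo_header(const std::vector<unsigned char>& data, mo_header& header) {
    const std::uint32_t magic = 0x950412deu;
    if (data.size() < 28) {
        return false;  // the header consists of seven 32-bit words
    }
    if (read_u32(data, 0, false) == magic) {
        header.big_endian = false;
    } else if (read_u32(data, 0, true) == magic) {
        header.big_endian = true;
    } else {
        return false;
    }
    header.revision = read_u32(data, 4, header.big_endian);
    header.string_count = read_u32(data, 8, header.big_endian);
    header.originals_offset = read_u32(data, 12, header.big_endian);
    header.translations_offset = read_u32(data, 16, header.big_endian);
    return true;
}
```

A real loader would go on to validate the offsets against the file size and read the length/offset pairs in both tables; the sketch stops at the header.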

Why is a plain number used for the representation of a date-time, instead of a Boost.DateTime date or Boost.DateTime ptime?

There are several reasons:

  1. A Gregorian Date by definition can't be used to represent locale-independent dates, because not all calendars are Gregorian.
  2. ptime could certainly be used, but it has several problems:
    • It is created from a GMT or local-time clock, whereas `time()` gives a representation that is independent of time zones (usually GMT). Only later should that value be presented in the time zone the user requests.
      The time zone is not a property of the time itself; it is a property of time formatting.
    • ptime already defines operator<< and operator>> for time formatting and parsing.
    • The existing facets for ptime formatting and parsing were not designed in a way that the user can override. The major formatting and parsing functions are not virtual. This makes it impossible to reimplement the formatting and parsing functions of ptime unless the developers of the Boost.DateTime library decide to change them.
      Also, the facets of ptime are not "correctly" designed in terms of division of formatting information and locale information. Formatting information should be stored within std::ios_base and information about locale-specific formatting should be stored in the facet itself.
      The user of the library should not have to create new facets to change simple formatting information like "display only the date" or "display both date and time."

Thus, at this point, ptime is not supported for formatting localized dates and times.
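The point that a time zone belongs to formatting rather than to the time value can be sketched with the standard library alone. In the following example a single `std::time_t` is rendered for different zones by passing the zone as a formatting argument. The function name `format_at_offset` is hypothetical, the zone is simplified to a fixed UTC offset in seconds, and the arithmetic assumes that `time_t` counts seconds since the epoch, as it does on POSIX systems.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Render one time point for a zone given as a fixed UTC offset.
// The time value itself is never changed; only its presentation is.
std::string format_at_offset(std::time_t when, long utc_offset_seconds) {
    // Assumption: time_t is a count of seconds (true on POSIX).
    const std::time_t shifted = when + utc_offset_seconds;
    const std::tm* parts = std::gmtime(&shifted);
    if (parts == 0) {
        return std::string();  // value not representable as calendar time
    }
    char buffer[32];
    std::strftime(buffer, sizeof buffer, "%Y-%m-%d %H:%M", parts);
    return buffer;
}
```

For the same value `86400`, an offset of `0` gives `1970-01-02 00:00`, while an offset of `-5 * 3600` gives `1970-01-01 19:00`: one number, two presentations. Note that `std::gmtime` uses shared static storage, so a production version would need a thread-safe alternative.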

Why are POSIX locale names used and not something like the BCP-47 IETF language tag?

There are several reasons:

Why do most parts of Boost.Locale work only on linear/contiguous chunks of text?

There are two reasons:

However:

Why is the entire Boost.Locale implementation hidden behind abstract interfaces instead of using template metaprogramming?

There are several major reasons:

Why does Boost.Locale not provide char16_t/char32_t for non-C++0x platforms?

There are several reasons:

These are exactly the reasons why Boost.Locale fails with the current, limited C++0x character support in GCC-4.5 (the second reason) and MSVC-2010 (the first reason).

So it is basically impossible to use character types that are not native to C++ with the C++ locale framework.

The best and most portable solution is to use the C++ char type with the UTF-8 encoding.
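As a small illustration of working with UTF-8 in ordinary `char` strings, the sketch below counts code points in a `std::string` by skipping UTF-8 continuation bytes (those of the form `10xxxxxx`). The function name `utf8_code_points` is hypothetical, and the input is assumed to be valid UTF-8; no validation is attempted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count code points in a UTF-8 encoded std::string.
// Every byte that is not a continuation byte starts a new code point.
std::size_t utf8_code_points(const std::string& text) {
    std::size_t count = 0;
    for (unsigned char byte : text) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 encoding of "grüßen", `size()` reports 8 bytes while `utf8_code_points` reports 6, which shows that a plain `std::string` can carry the text unchanged as long as code that needs character-level information is aware of the encoding.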