Boost C++ Libraries


This is the documentation for an old version of Boost.

Contents
Abstract
Introduction to endianness
Introduction to the Boost.Endian library
Choosing between conversion functions, buffer types, and arithmetic types
Built-in support for Intrinsics
Performance
   Timings
Overall FAQ
Release history
   Changes requested by formal review
   Other changes since formal review
Compatibility with interim releases
C++03 support for C++11 features
Future directions
Acknowledgements

Abstract

Boost.Endian provides facilities to manipulate the endianness of integers and user-defined types.

Introduction to endianness

Consider the following code:

int16_t i = 0x0102;
FILE * file = fopen("test.bin", "wb");   // binary file!
fwrite(&i, sizeof(int16_t), 1, file);
fclose(file);

On OS X, Linux, or Windows systems with an Intel CPU, a hex dump of the "test.bin" output file produces:

0201

On OS X systems with a PowerPC CPU, or Solaris systems with a SPARC CPU, a hex dump of the "test.bin" output file produces:

0102

What's happening here is that Intel CPUs order the bytes of an integer with the least-significant byte first, while SPARC CPUs place the most-significant byte first. Some CPUs, such as the PowerPC, allow the operating system to choose which ordering applies.
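The byte order of the running machine can also be observed without writing a file, by copying an integer's object representation into a byte array. The following is a minimal sketch in standard C++; the function names are illustrative assumptions and are not part of Boost.Endian.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the byte stored at the lowest address of the 16-bit value 0x0102.
// Illustrative helper; not part of Boost.Endian.
inline unsigned char first_byte_in_memory()
{
  const std::uint16_t i = 0x0102;
  unsigned char bytes[sizeof i];
  std::memcpy(bytes, &i, sizeof i);   // copy the object representation
  return bytes[0];
}

// 0x02 first means least-significant byte first (little endian).
inline bool is_little_endian() { return first_byte_in_memory() == 0x02; }

// 0x01 first means most-significant byte first (big endian).
inline bool is_big_endian() { return first_byte_in_memory() == 0x01; }

// Exactly one of the two orderings holds for a two-byte integer.
inline void check_byte_order()
{
  assert(is_little_endian() != is_big_endian());
}
```

On an Intel CPU is_little_endian() would be expected to return true, matching the 0201 hex dump above.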

Most-significant-byte-first ordering is traditionally called "big-endian" ordering and least-significant-byte-first is traditionally called "little-endian" ordering. The names are derived from Jonathan Swift's satirical novel Gulliver’s Travels, where rival kingdoms opened their soft-boiled eggs at different ends.

See Wikipedia's Endianness article for an extensive discussion of endianness.

Programmers can usually ignore endianness, except when reading a core dump on little-endian systems. They do have to deal with endianness, however, when exchanging binary integers and binary floating point values between computer systems with differing endianness, whether by physical file transfer or over a network. Programmers may also want to use the library when minimizing either internal or external data sizes is advantageous.

Introduction to the Boost.Endian library

Boost.Endian provides three different approaches to dealing with endianness. All three approaches support integers and user-defined types (UDTs).

Each approach has a long history of successful use, and each approach has use cases where it is preferred to the other approaches.

Endian conversion functions - The application uses the built-in integer types to hold values, and calls the provided conversion functions to convert byte ordering as needed. Both mutating and non-mutating conversions are supplied, and each comes in unconditional and conditional variants.

Endian buffer types - The application uses the provided endian buffer types to hold values, and explicitly converts to and from the built-in integer types. Buffer sizes of 8, 16, 24, 32, 40, 48, 56, and 64 bits (i.e. 1, 2, 3, 4, 5, 6, 7, and 8 bytes) are provided. Unaligned integer buffer types are provided for all sizes, and aligned buffer types are provided for 16, 32, and 64-bit sizes. The provided specific types are typedefs for a generic class template that may be used directly for less common use cases.

Endian arithmetic types - The application uses the provided endian arithmetic types, which supply the same operations as the built-in C++ arithmetic types. All conversions are implicit. Arithmetic sizes of 8, 16, 24, 32, 40, 48, 56, and 64 bits (i.e. 1, 2, 3, 4, 5, 6, 7, and 8 bytes) are provided. Unaligned integer types are provided for all sizes and aligned arithmetic types are provided for 16, 32, and 64-bit sizes. The provided specific types are typedefs for a generic class template that may be used directly in generic code or for less common use cases.
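To make the difference between the three approaches concrete, here is a simplified sketch of each idea in standard C++. Everything in namespace sketch is a hypothetical stand-in written for this illustration; it is not the Boost.Endian implementation or API, and it covers only unsigned 32-bit big endian values.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

namespace sketch {   // hypothetical names; not the Boost.Endian API

// --- Conversion-function style: value lives in a built-in integer --------
inline std::uint32_t reverse_bytes(std::uint32_t x)
{
  return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8)
       | ((x >> 8) & 0x0000FF00u)  | ((x >> 24) & 0x000000FFu);
}

inline bool native_is_little()
{
  const std::uint16_t probe = 1;
  unsigned char first;
  std::memcpy(&first, &probe, 1);   // lowest-addressed byte of the probe
  return first == 1;
}

// Conditional conversions: reverse only if native order is little endian.
inline std::uint32_t big_to_native(std::uint32_t x)
{ return native_is_little() ? reverse_bytes(x) : x; }

inline std::uint32_t native_to_big(std::uint32_t x)
{ return big_to_native(x); }        // byte reversal is its own inverse

// --- Buffer style: bytes stored big endian, conversions are explicit -----
class big_uint32_buf
{
public:
  explicit big_uint32_buf(std::uint32_t v = 0) { set(v); }
  void set(std::uint32_t v)
  {
    bytes_[0] = static_cast<unsigned char>(v >> 24);
    bytes_[1] = static_cast<unsigned char>(v >> 16);
    bytes_[2] = static_cast<unsigned char>(v >> 8);
    bytes_[3] = static_cast<unsigned char>(v);
  }
  std::uint32_t value() const
  {
    return (static_cast<std::uint32_t>(bytes_[0]) << 24)
         | (static_cast<std::uint32_t>(bytes_[1]) << 16)
         | (static_cast<std::uint32_t>(bytes_[2]) << 8)
         |  static_cast<std::uint32_t>(bytes_[3]);
  }
  const unsigned char* data() const { return bytes_; }
private:
  unsigned char bytes_[4];          // no alignment requirement
};

// --- Arithmetic style: same storage, conversions are implicit ------------
class big_uint32 : public big_uint32_buf
{
public:
  big_uint32(std::uint32_t v = 0) : big_uint32_buf(v) {}
  operator std::uint32_t() const { return value(); }
  big_uint32& operator+=(std::uint32_t rhs)
  { set(value() + rhs); return *this; }
};

// Small self-check showing the three styles side by side.
inline void demo()
{
  assert(native_to_big(big_to_native(0x01020304u)) == 0x01020304u);
  big_uint32_buf b(0x01020304u);
  assert(b.value() == 0x01020304u);   // explicit conversion
  big_uint32 a(1);
  a += 100;                           // implicit conversions inside +=
  std::uint32_t n = a;                // implicit conversion out
  assert(n == 101u);
}

} // namespace sketch
```

In this sketch the buffer class only stores and retrieves, so every conversion is visible at the call site, while the arithmetic class hides the same conversions behind operators.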

Boost.Endian is a header-only library. C++11 features affecting interfaces, such as noexcept, are used only if available. See C++03 support for C++11 features for details.

Choosing between conversion functions, buffer types, and arithmetic types

This section has been moved to its own Choosing the Approach page.

Built-in support for Intrinsics

Most compilers, including GCC, Clang, and Visual C++, supply built-in support for byte swapping intrinsics. The Endian library uses these intrinsics when available since they may result in smaller and faster generated code, particularly for optimized builds.

Defining the macro BOOST_ENDIAN_NO_INTRINSICS will suppress use of the intrinsics. This is useful when a compiler has no intrinsic support or fails to locate the appropriate header, perhaps because it is an older release or has very limited supporting libraries.

The macro BOOST_ENDIAN_INTRINSIC_MSG is defined as either "no byte swap intrinsics" or a string describing the particular set of intrinsics being used. This is useful for eliminating missing intrinsics as a source of performance issues.
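The selection logic described above might look roughly like the following sketch for a 32-bit swap. The SKETCH_* macro names and the swap32 functions are assumptions made for this illustration; the library's actual header and macro logic may differ. __builtin_bswap32 (GCC, Clang) and _byteswap_ulong (Visual C++) are the real compiler intrinsics.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
# include <cstdlib>                  // declares _byteswap_ulong
#endif

// Portable fallback: always available, used when no intrinsic is selected.
inline std::uint32_t swap32_portable(std::uint32_t x)
{
  return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8)
       | ((x >> 8) & 0x0000FF00u)  | ((x >> 24) & 0x000000FFu);
}

// SKETCH_NO_INTRINSICS is a hypothetical opt-out macro for this example.
#if !defined(SKETCH_NO_INTRINSICS) && (defined(__GNUC__) || defined(__clang__))
# define SKETCH_INTRINSIC_MSG "__builtin_bswap32"
inline std::uint32_t swap32(std::uint32_t x)
{ return __builtin_bswap32(x); }
#elif !defined(SKETCH_NO_INTRINSICS) && defined(_MSC_VER)
# define SKETCH_INTRINSIC_MSG "cstdlib _byteswap_ulong"
inline std::uint32_t swap32(std::uint32_t x)
{ return static_cast<std::uint32_t>(_byteswap_ulong(x)); }
#else
# define SKETCH_INTRINSIC_MSG "no byte swap intrinsics"
inline std::uint32_t swap32(std::uint32_t x)
{ return swap32_portable(x); }
#endif

// The intrinsic and the fallback must agree on every input.
inline void check_swap32()
{
  assert(swap32(0x01020304u) == swap32_portable(0x01020304u));
}
```

Printing SKETCH_INTRINSIC_MSG at startup is one way to confirm which path a given build actually took.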

Performance

Consider this problem:

Example 1

Add 100 to a big endian value in a file, then write the result to a file.

Endian arithmetic type approach:

big_int32_at x;

... read into x from a file ...

x += 100;

... write x to a file ...

Endian conversion function approach:

int32_t x;

... read into x from a file ...

big_to_native_inplace(x);
x += 100;
native_to_big_inplace(x);

... write x to a file ...

There will be no performance difference between the two approaches in optimized builds, regardless of the native endianness of the machine. That's because optimizing compilers will generate exactly the same code for each. That conclusion was confirmed by studying the generated assembly code for GCC and Visual C++. Furthermore, time spent doing I/O will determine the speed of this application.

Now consider a slightly different problem: 

Example 2

Add a million values to a big endian value in a file, then write the result to a file.

Endian arithmetic type approach:

big_int32_at x;

... read into x from a file ...

for (int32_t i = 0; i < 1000000; ++i)
  x += i;

... write x to a file ...

Endian conversion function approach:

int32_t x;

... read into x from a file ...

big_to_native_inplace(x);

for (int32_t i = 0; i < 1000000; ++i)
  x += i;

native_to_big_inplace(x);

... write x to a file ...

With the Endian arithmetic approach, on little endian platforms an implicit conversion from and then back to big endian is done inside the loop. With the Endian conversion function approach, the user has ensured the conversions are done outside the loop, so the code may run more quickly on little endian platforms.
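The difference between the two loops can be sketched without the library by operating directly on four big endian bytes. The helper names below are hypothetical, and unsigned arithmetic is used so that wraparound is well defined; whether a particular compiler hoists the conversions out of the first loop on its own is not guaranteed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: decode/encode a 32-bit value as 4 big endian bytes.
inline std::uint32_t load_big(const unsigned char* p)
{
  return (static_cast<std::uint32_t>(p[0]) << 24)
       | (static_cast<std::uint32_t>(p[1]) << 16)
       | (static_cast<std::uint32_t>(p[2]) << 8)
       |  static_cast<std::uint32_t>(p[3]);
}

inline void store_big(unsigned char* p, std::uint32_t v)
{
  p[0] = static_cast<unsigned char>(v >> 24);
  p[1] = static_cast<unsigned char>(v >> 16);
  p[2] = static_cast<unsigned char>(v >> 8);
  p[3] = static_cast<unsigned char>(v);
}

// Arithmetic-type style: each iteration decodes, adds, and re-encodes.
inline void add_in_loop(unsigned char* big, std::uint32_t n)
{
  assert(big != 0);
  for (std::uint32_t i = 0; i < n; ++i)
    store_big(big, load_big(big) + i);
}

// Conversion-function style: decode once, loop on a native value,
// then encode once.
inline void add_hoisted(unsigned char* big, std::uint32_t n)
{
  assert(big != 0);
  std::uint32_t x = load_big(big);
  for (std::uint32_t i = 0; i < n; ++i)
    x += i;
  store_big(big, x);
}
```

Both functions leave the same bytes in the buffer; only the number of conversions performed differs (2 * n versus 2).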

Timings

These tests were run against release builds on a circa 2012 4-core little endian X64 Intel Core i5-3570K CPU @ 3.40GHz under Windows 7.

Caveat emptor: The Windows CPU timer has very coarse granularity. Repeated runs of the same tests often yield considerably different results.

See test/loop_time_test.cpp for the actual code and benchmark/Jamfile.v2 for the build setup.

GNU C++ version 4.8.2 on Linux virtual machine
Iterations: 10'000'000'000, Intrinsics: __builtin_bswap16, etc.

Test Case                       Endian arithmetic type   Endian conversion function
16-bit aligned big endian       8.46 s                   5.28 s
16-bit aligned little endian    5.28 s                   5.22 s
32-bit aligned big endian       8.40 s                   2.11 s
32-bit aligned little endian    2.11 s                   2.10 s
64-bit aligned big endian       14.02 s                  3.10 s
64-bit aligned little endian    3.00 s                   3.03 s

Microsoft Visual C++ version 14.0
Iterations: 10'000'000'000, Intrinsics: cstdlib _byteswap_ushort, etc.

Test Case                       Endian arithmetic type   Endian conversion function
16-bit aligned big endian       8.27 s                   5.26 s
16-bit aligned little endian    5.29 s                   5.32 s
32-bit aligned big endian       8.36 s                   5.24 s
32-bit aligned little endian    5.24 s                   5.24 s
64-bit aligned big endian       13.65 s                  3.34 s
64-bit aligned little endian    3.35 s                   2.73 s

Overall FAQ

Is the implementation header only?

Yes.

Are C++03 compilers supported?

Yes.

Does the implementation use compiler intrinsic built-in byte swapping?

Yes, if available. See Built-in support for Intrinsics.

Why bother with endianness?

Binary data portability is the primary use case.

Does endianness have any uses outside of portable binary file or network I/O formats?

Using the unaligned integer types with a size tailored to the application's needs is a minor secondary use that saves internal or external memory space. For example, using big_int40_buf_t or big_int40_t in a large array saves a lot of space compared to one of the 64-bit types.
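The space saving can be illustrated with a hand-written 5-byte type. This is a sketch under the assumption that a struct holding only an unsigned char[5] has size 5, which holds on mainstream compilers; big_uint40_sketch is a hypothetical name, not the library's big_int40_buf_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical unaligned 40-bit big endian buffer (5 bytes of storage).
struct big_uint40_sketch
{
  unsigned char bytes[5];

  // Stores the low 40 bits of v, most-significant byte first.
  void set(std::uint64_t v)
  {
    for (int i = 4; i >= 0; --i)
    {
      bytes[i] = static_cast<unsigned char>(v & 0xFFu);
      v >>= 8;
    }
  }

  // Reassembles the 40-bit value into a native 64-bit integer.
  std::uint64_t value() const
  {
    std::uint64_t v = 0;
    for (int i = 0; i < 5; ++i)
      v = (v << 8) | bytes[i];
    return v;
  }
};

// Round-trips a value that needs more than 32 bits.
inline void check_round_trip()
{
  big_uint40_sketch x;
  x.set(0x0102030405ull);
  assert(x.value() == 0x0102030405ull);
}
```

Under that size assumption, an array of 1000 such elements occupies 5000 bytes, compared with 8000 bytes for 1000 64-bit integers.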

Why bother with binary I/O? Why not just use C++ Standard Library stream inserters and extractors?

Data interchange formats often specify binary integer data.

Binary integer data is smaller and therefore I/O is faster and file sizes are smaller. Transfer between systems is less expensive.

Furthermore, binary integer data is of fixed size, and so fixed-size disk records are possible without padding, easing sorting and allowing random access.

Disadvantages, such as the inability to use text utilities on the resulting files, limit usefulness to applications where the binary I/O advantages are paramount.

Which is better, big-endian or little-endian?

Big-endian tends to be preferred in a networking environment and is a bit more of an industry standard, but little-endian may be preferred for applications that run primarily on x86, x86-64, and other little-endian CPUs. The Wikipedia article gives more pros and cons.

Why are only big and little native endianness supported?

These are the only endian schemes that have any practical value today. PDP-11 and the other middle endian approaches are interesting curiosities but have no relevance for today's C++ developers. The same is true for architectures that allow runtime endianness switching. The specification for native ordering has been carefully crafted to allow support for such orderings in the future, should the need arise. Thanks to Howard Hinnant for suggesting this.

Why do both the buffer and arithmetic types exist?

Conversions in the buffer types are explicit. Conversions in the arithmetic types are implicit. This fundamental difference is a deliberate design feature that would be lost if the inheritance hierarchy were collapsed.

The original design provided only arithmetic types. Buffer types were requested during formal review by those wishing total control over when conversion occurs. They also felt that buffer types would be less likely to be misused by maintenance programmers not familiar with the implications of performing a lot of integer operations on the endian arithmetic integer types.

What is gained by using the buffer types rather than always just using the arithmetic types?

Assurance that hidden conversions are not performed. This is of overriding importance to users concerned about achieving the ultimate in terms of speed.

"Always just using the arithmetic types" is fine for other users. When the ultimate in speed needs to be ensured, the arithmetic types can be used in the same design patterns or idioms that would be used for buffer types, resulting in the same code being generated for either type.

What are the limitations of integer support?

Tests have only been performed on machines that use two's complement arithmetic. The Endian conversion functions only support 16, 32, and 64-bit aligned integers. The endian types only support 8, 16, 24, 32, 40, 48, 56, and 64-bit unaligned integers, and 8, 16, 32, and 64-bit aligned integers.

Why is there no floating point support?

An attempt was made to support four-byte floats and eight-byte doubles, limited to IEEE 754 (also known as ISO/IEC/IEEE 60559) floating point and further limited to systems where floating point endianness does not differ from integer endianness.

Even with those limitations, support for floating point types was not reliable and was removed. For example, simply reversing the endianness of a floating point number can result in a signaling NaN. For all practical purposes, binary serialization and endianness for integers are one and the same problem. That is not true for floating point numbers, so binary serialization interfaces and formats for floating point do not fit well in an endian-based library.
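The signaling NaN hazard can be seen by looking at bit patterns alone. The sketch below assumes float is a 32-bit IEEE 754 binary32 value and uses the common convention that a NaN with the most-significant fraction bit clear is signaling; the helper names are illustrative. The reversed pattern is deliberately inspected as an integer and never loaded into a float.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

inline std::uint32_t reverse_bytes(std::uint32_t x)
{
  return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8)
       | ((x >> 8) & 0x0000FF00u)  | ((x >> 24) & 0x000000FFu);
}

// True if bits encode a binary32 signaling NaN: exponent all ones,
// fraction non-zero, and the quiet bit (bit 22) clear.
inline bool is_signaling_nan_bits(std::uint32_t bits)
{
  const bool exponent_all_ones = (bits & 0x7F800000u) == 0x7F800000u;
  const bool fraction_nonzero  = (bits & 0x007FFFFFu) != 0;
  const bool quiet_bit_set     = (bits & 0x00400000u) != 0;
  return exponent_all_ones && fraction_nonzero && !quiet_bit_set;
}

// Reinterprets a bit pattern as a float (assumes sizeof(float) == 4).
inline float float_from_bits(std::uint32_t bits)
{
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// 0x0100807F is an ordinary, very small, finite float, yet its
// byte-reversed pattern 0x7F800001 is a signaling NaN.
inline void demonstrate()
{
  const std::uint32_t ordinary = 0x0100807Fu;
  assert(std::isfinite(float_from_bits(ordinary)));
  assert(is_signaling_nan_bits(reverse_bytes(ordinary)));
}
```

An integer has no such trap patterns on the two's complement machines discussed here, which is why the same reversal is harmless for integers.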

Release history

Changes requested by formal review

The library was reworked from top to bottom to accommodate changes requested during the formal review. See Mini-Review page for details.

Other changes since formal review

Compatibility with interim releases

Prior to the official Boost release, class template endian_arithmetic had been used for a decade or more with the same functionality but under the name endian. Other names also changed in the official release. If the macro BOOST_ENDIAN_DEPRECATED_NAMES is defined, the old, now-deprecated names are still supported. However, the class template name endian is only provided for compilers supporting C++11 template aliases. For C++03 compilers, the name will have to be changed to endian_arithmetic.

To support backward header compatibility, deprecated header boost/endian/endian.hpp forwards to boost/endian/arithmetic.hpp. It requires BOOST_ENDIAN_DEPRECATED_NAMES be defined. It should only be used while transitioning to the official Boost release of the library as it will be removed in some future release.

C++03 support for C++11 features

C++11 Feature        Action with C++03 Compilers
Scoped enums         Uses header boost/core/scoped_enum.hpp to emulate C++11 scoped enums.
noexcept             Uses BOOST_NOEXCEPT macro, which is defined as null for compilers not supporting this C++11 feature.
C++11 PODs (N2342)   Takes advantage of C++03 compilers that relax C++03 POD rules, but see Limitations here and here. Also see macros for explicit POD control here and here.

Future directions

Standardization. The plan is to submit Boost.Endian to the C++ standards committee for possible inclusion in a Technical Specification or the C++ standard itself.

Specializations for numeric_limits. Roger Leigh requested that all boost::endian types provide numeric_limits specializations. See GitHub issue 4.

Character buffer support. Peter Dimov pointed out during the mini-review that getting and setting basic arithmetic types (or <cstdint> equivalents) from/to an offset into an array of unsigned char is a common need. See Boost.Endian mini-review posting.
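The kind of facility being asked for might look like the following sketch, which gets and sets integers at a byte offset in an array of unsigned char. These functions are hypothetical illustrations of the need described; they are not an interface provided by this version of the library. Because they work byte by byte with shifts, they behave the same on big and little endian machines and impose no alignment requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers: big endian 16-bit access at a byte offset.
inline void store_big_u16(unsigned char* buf, std::size_t offset,
                          std::uint16_t v)
{
  assert(buf != 0);
  buf[offset]     = static_cast<unsigned char>(v >> 8);
  buf[offset + 1] = static_cast<unsigned char>(v & 0xFFu);
}

inline std::uint16_t load_big_u16(const unsigned char* buf,
                                  std::size_t offset)
{
  assert(buf != 0);
  return static_cast<std::uint16_t>((buf[offset] << 8) | buf[offset + 1]);
}

// Hypothetical helpers: little endian 32-bit access at a byte offset.
inline void store_little_u32(unsigned char* buf, std::size_t offset,
                             std::uint32_t v)
{
  assert(buf != 0);
  for (std::size_t i = 0; i < 4; ++i)
    buf[offset + i] = static_cast<unsigned char>(v >> (8 * i));
}

inline std::uint32_t load_little_u32(const unsigned char* buf,
                                     std::size_t offset)
{
  assert(buf != 0);
  std::uint32_t v = 0;
  for (std::size_t i = 0; i < 4; ++i)
    v |= static_cast<std::uint32_t>(buf[offset + i]) << (8 * i);
  return v;
}
```

The caller remains responsible for ensuring that offset plus the field size stays within the array; the sketch performs no bounds checking.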

Out-of-range detection. Peter Dimov suggested during the mini-review that throwing an exception on buffer values being out-of-range might be desirable. See the end of this posting and subsequent replies.

Acknowledgements

Comments and suggestions were received from Adder, Benaka Moorthi, Christopher Kohlhoff, Cliff Green, Daniel James, Dave Handley, Gennaro Proto, Giovanni Piero Deretta, Gordon Woodhull, dizzy, Hartmut Kaiser, Howard Hinnant, Jason Newton, Jeff Flinn, Jeremy Maitin-Shepard, John Filo, John Maddock, Kim Barrett, Marsh Ray, Martin Bonner, Mathias Gaunard, Matias Capeletto, Neil Mayhew, Nevin Liber, Olaf van der Spek, Paul Bristow, Peter Dimov, Pierre Talbot, Phil Endecott, Philip Bennefall, Pyry Jahkola, Rene Rivera, Robert Stewart, Roger Leigh, Roland Schwarz, Scott McMurray, Sebastian Redl, Tim Blechmann, Tim Moore, tymofey, Tomas Puverle, Vincente Botet, Yuval Ronen and Vitaly Budovsk. Apologies if anyone has been missed.


Last revised: 05 April, 2016

© Copyright Beman Dawes, 2011, 2013

Distributed under the Boost Software License, Version 1.0. See www.boost.org/LICENSE_1_0.txt