...one of the most highly
regarded and expertly designed C++ library projects in the
world.

— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards

The

algorithm used in this library is designed to provide best possible
worst-case performance, while still being cache-friendly. It provides
the better of `spreadsort`

*𝑶(N*log(K/S + S))* and *𝑶(N*log(N))*
worst-case time, where *K* is the log of the range.
The log of the range is normally the length in bits of the data type;
32 for a 32-bit integer.

`flash_sort`

(another hybrid algorithm), by comparison is
*𝑶(N)* for evenly distributed lists. The problem
is, `flash_sort`

is merely an MSD radix
sort combined with *𝑶(N*N)* insertion sort
to deal with small subsets where the MSD Radix Sort is inefficient,
so it is inefficient with chunks of data around the size at which it
switches to `insertion_sort`

, and ends up operating as an
enhanced MSD Radix Sort. For uneven distributions this makes it especially
inefficient.

and `integer_sort`

use introsort
instead, which provides `float_sort`

*𝑶(N*log(N))* performance
for these medium-sized pieces. Also, `flash_sort`

's *𝑶(N)*
performance for even distributions comes at the cost of cache misses,
which on modern architectures are extremely expensive, and in testing
on modern systems ends up being slower than cutting up the data in
multiple, cache-friendly steps. Also worth noting is that on most modern
computers, `log2(available RAM)/log2(L1 cache size)`

is
around 3, where a cache miss takes more than 3 times as long as an
in-cache random-access, and the size of *max_splits*
is tuned to the size of the cache. On a computer where cache misses
aren't this expensive, *max_splits* could be increased
to a large value, or eliminated entirely, and `integer_sort/float_sort`

would have the same *𝑶(N)* performance on even distributions.

Adaptive Left Radix (ALR) is similar to `flash_sort`

, but
more cache-friendly. It still uses insertion_sort. Because ALR uses
*𝑶(N*N)* `insertion_sort`

, it isn't efficient
to use the comparison-based fallback sort on large lists, and if the
data is clustered in small chunks just over the fallback size with
a few outliers, radix-based sorting iterates many times doing little
sorting with high overhead. Asymptotically, ALR is still *𝑶(N*log(K/S
+ S))*, but with a very small *S* (about
2 in the worst case), which compares unfavorably with the 11 default
value of *max_splits* for Spreadsort.

ALR also does not have the *𝑶(N*log(N))* fallback,
so for small lists that are not evenly distributed it is extremely
inefficient. See the `alrbreaker`

and `binaryalrbreaker`

testcases for examples; either replace the call to sort with a call
to ALR and update the ALR_THRESHOLD at the top, or as a quick comparison
make `get_max_count return ALR_THRESHOLD`

(20 by default
based upon the paper). These small tests take 4-10 times as long with
ALR as std::sort
in the author's testing, depending on the test system, because they
are trying to sort a highly uneven distribution. Normal Spreadsort
does much better with them, because `get_max_count`

is designed
around minimizing worst-case runtime.

`burst_sort`

is an efficient hybrid algorithm for strings
that uses substantial additional memory.

uses minimal additional memory by comparison. Speed comparisons between
the two haven't been made, but the better memory efficiency makes
`string_sort`

more general.
`string_sort`

`postal_sort`

and

are similar. A direct performance comparison would be welcome, but
an efficient version of `string_sort`

`postal_sort`

was not found in a
search for source.

is most similar to the American
flag sort algorithm. The main difference is that it doesn't
bother trying to optimize how empty buckets/piles are handled, instead
just checking to see if all characters at the current index are equal.
Other differences are using std::sort
as the fallback algorithm, and a larger fallback size (256 vs. 16),
which makes empty pile handling less important.
`string_sort`

Another difference is not applying the stack-size restriction. Because
of the equality check in

,
it would take `string_sort`

*m*m* memory worth of strings to force

to create a stack of depth `string_sort`

*m*. This problem isn't
a realistic one on modern systems with multi-megabyte stacksize limits,
where main memory would be exhausted holding the long strings necessary
to exceed the stacksize limit.

can be thought of as modernizing American
flag sort to take advantage of introsort
as a fallback algorithm. In the author's testing, American
flag sort (on `string_sort`

`std::strings`

) had comparable runtime
to introsort,
but making a hybrid of the two allows reduced overhead and substantially
superior performance.