Boost C++ Libraries of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards


Why spreadsort?

The spreadsort algorithm used in this library is designed to provide best possible worst-case performance, while still being cache-friendly. It provides the better of 𝑶(N*log(K/S + S)) and 𝑶(N*log(N)) worst-case time, where K is the log of the range. The log of the range is normally the length in bits of the data type; 32 for a 32-bit integer.

flash_sort (another hybrid algorithm), by comparison is 𝑶(N) for evenly distributed lists. The problem is, flash_sort is merely an MSD radix sort combined with 𝑶(N*N) insertion sort to deal with small subsets where the MSD Radix Sort is inefficient, so it is inefficient with chunks of data around the size at which it switches to insertion_sort, and ends up operating as an enhanced MSD Radix Sort. For uneven distributions this makes it especially inefficient.

integer_sort and float_sort use introsort instead, which provides 𝑶(N*log(N)) performance for these medium-sized pieces. Also, flash_sort's 𝑶(N) performance for even distributions comes at the cost of cache misses, which on modern architectures are extremely expensive, and in testing on modern systems ends up being slower than cutting up the data in multiple, cache-friendly steps. Also worth noting is that on most modern computers, log2(available RAM)/log2(L1 cache size) is around 3, where a cache miss takes more than 3 times as long as an in-cache random-access, and the size of max_splits is tuned to the size of the cache. On a computer where cache misses aren't this expensive, max_splits could be increased to a large value, or eliminated entirely, and integer_sort/float_sort would have the same 𝑶(N) performance on even distributions.

Adaptive Left Radix (ALR) is similar to flash_sort, but more cache-friendly. It still uses insertion_sort. Because ALR uses 𝑶(N*N) insertion_sort, it isn't efficient to use the comparison-based fallback sort on large lists, and if the data is clustered in small chunks just over the fallback size with a few outliers, radix-based sorting iterates many times doing little sorting with high overhead. Asymptotically, ALR is still 𝑶(N*log(K/S + S)), but with a very small S (about 2 in the worst case), which compares unfavorably with the 11 default value of max_splits for Spreadsort.

ALR also does not have the 𝑶(N*log(N)) fallback, so for small lists that are not evenly distributed it is extremely inefficient. See the alrbreaker and binaryalrbreaker testcases for examples; either replace the call to sort with a call to ALR and update the ALR_THRESHOLD at the top, or as a quick comparison make get_max_count return ALR_THRESHOLD (20 by default based upon the paper). These small tests take 4-10 times as long with ALR as std::sort in the author's testing, depending on the test system, because they are trying to sort a highly uneven distribution. Normal Spreadsort does much better with them, because get_max_count is designed around minimizing worst-case runtime.

burst_sort is an efficient hybrid algorithm for strings that uses substantial additional memory.

string_sort uses minimal additional memory by comparison. Speed comparisons between the two haven't been made, but the better memory efficiency makes string_sort more general.

postal_sort and string_sort are similar. A direct performance comparison would be welcome, but an efficient version of postal_sort was not found in a search for source.

string_sort is most similar to the American flag sort algorithm. The main difference is that it doesn't bother trying to optimize how empty buckets/piles are handled, instead just checking to see if all characters at the current index are equal. Other differences are using std::sort as the fallback algorithm, and a larger fallback size (256 vs. 16), which makes empty pile handling less important.

Another difference is not applying the stack-size restriction. Because of the equality check in string_sort, it would take m*m memory worth of strings to force string_sort to create a stack of depth m. This problem isn't a realistic one on modern systems with multi-megabyte stacksize limits, where main memory would be exhausted holding the long strings necessary to exceed the stacksize limit. string_sort can be thought of as modernizing American flag sort to take advantage of introsort as a fallback algorithm. In the author's testing, American flag sort (on std::strings) had comparable runtime to introsort, but making a hybrid of the two allows reduced overhead and substantially superior performance.