Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

PrevUpHomeNext

Hybrid Radix

There a two primary types of radix-based sorting:

Most-significant-digit Radix sorting (MSD) divides the data recursively based upon the top-most unsorted bits. This approach is efficient for even distributions that divide nicely, and can be done in-place (limited additional memory used). There is substantial constant overhead for each iteration to deal with the splitting structure. The algorithms provided here use MSD Radix Sort for their radix-sorting portion. The main disadvantage of MSD Radix sorting is that when the data is cut up into small pieces, the overhead for additional recursive calls starts to dominate runtime, and this makes worst-case behavior substantially worse than 𝑶(N*log(N)).

By contrast, integer_sort, float_sort, and string_sort all check to see whether Radix-based or Comparison-based sorting have better worst-case runtime, and make the appropriate recursive call. Because Comparison-based sorting algorithms are efficient on small pieces, the tendency of MSD radix sort to cut the problem up small is turned into an advantage by these hybrid sorts. It is hard to conceive of a common usage case where pure MSD radix sort would have any significant advantage over hybrid algorithms.

Least-significant-digit radix sort (LSD) sorts based upon the least-significant bits first. This requires a complete copy of the data being sorted, using substantial additional memory. The main advantage of LSD Radix Sort is that aside from some constant overhead and the memory allocation, it uses a fixed amount of time per element to sort, regardless of distribution or size of the list. This amount of time is proportional to the length of the radix. The other advantage of LSD Radix Sort is that it is a stable sorting algorithm, so elements with the same key will retain their original order.

One disadvantage is that LSD Radix Sort uses the same amount of time to sort "easy" sorting problems as "hard" sorting problems, and this time spent may end up being greater than an efficient 𝑶(N*log(N)) algorithm such as introsort spends sorting "hard" problems on large data sets, depending on the length of the datatype, and relative speed of comparisons, memory allocation, and random accesses.

The other main disadvantage of LSD Radix Sort is its memory overhead. It's only faster for large data sets, but large data sets use significant memory, which LSD Radix Sort needs to duplicate. LSD Radix Sort doesn't make sense for items of variable length, such as strings; it could be implemented by starting at the end of the longest element, but would be extremely inefficient.

All that said, there are places where LSD Radix Sort is the appropriate and fastest solution, so it would be appropriate to create a templated LSD Radix Sort similar to integer_sort and float_sort. This would be most appropriate in cases where comparisons are extremely slow.


PrevUpHomeNext