...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
On average case data where no patterns are detected
is effectively
a quicksort that uses median-of-3 pivot selection, switching to insertion
sort if the number of elements to be (recursively) sorted is small. The
overhead associated with detecting the patterns for the best case is so
small it lies within the error of measurement.
pdqsort
gets a great speedup over the traditional way of implementing quicksort
when sorting large arrays (1000+ elements). This is due to a new technique
described in "BlockQuicksort: How Branch Mispredictions don't affect
Quicksort" by Stefan Edelkamp and Armin Weiss. In short, we bypass
the branch predictor by using small buffers (entirely in L1 cache) of the
indices of elements that need to be swapped. We fill these buffers in a
branch-free way that's quite elegant (in pseudocode):
pdqsort
buffer_num = 0; buffer_max_size = 64; for (int i = 0; i < buffer_max_size; ++i) { // With branch: if (elements[i] < pivot) { buffer[buffer_num] = i; buffer_num++; } // Without: buffer[buffer_num] = i; buffer_num += (elements[i] < pivot); }