TECH OVERVIEW

Boost.Unordered: High-Performance Hash Containers for C++

Understanding the Container Options

Boost.Unordered gives you 12 different hash container types to choose from, organized into three main families. Think of these as tools in your performance toolbox—each one is optimized for different situations.

I. Closed-addressing containers (like boost::unordered_map and boost::unordered_set) work exactly like the std:: unordered containers, so you can drop them into existing code as faster replacements. They support C++11 and later standards.

II. Open-addressing containers are the speed champions. boost::unordered_flat_map and boost::unordered_flat_set store elements directly in the bucket array for maximum performance. If you need pointer stability (addresses that don't change), use boost::unordered_node_map and boost::unordered_node_set instead—they're slightly slower but still very fast.

III. Concurrent containers like boost::concurrent_flat_map and boost::concurrent_flat_set are designed for multithreaded programs where multiple threads need to access the same container safely.
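To make the three families concrete, here is a minimal sketch declaring one container from each (assuming Boost 1.83 or later, which is when the concurrent header arrived):

    #include <boost/unordered_map.hpp>                  // closed addressing (I)
    #include <boost/unordered/unordered_flat_map.hpp>   // open addressing (II)
    #include <boost/unordered/concurrent_flat_map.hpp>  // concurrent (III)
    #include <string>

    int main() {
        // Drop-in std::unordered_map replacement: same API, pointer stability.
        boost::unordered_map<std::string, int> compatible{{"a", 1}};

        // Fastest option: elements live in the bucket array, no pointer stability.
        boost::unordered_flat_map<std::string, int> fast{{"b", 2}};

        // Thread-safe: elements are accessed via visitation, not iterators.
        boost::concurrent_flat_map<std::string, int> shared;
        shared.insert({"c", 3});
        shared.visit("c", [](auto& kv) { ++kv.second; });
    }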

I. Closed-Addressing Containers: How boost::unordered_map Got So Fast

The Problem with Standard Implementations

Back in 2003, when the design of C++'s standard hash tables was settled (it later entered the standard via TR1 and C++11), the committee chose "closed addressing" (also called separate chaining) because open addressing wasn't mature yet. This decision became baked into the standard through requirements like the bucket API, pointer stability, and user-controllable load factors.

The standard also required that iterator increment be constant time and erase be constant time on average. These requirements forced standard libraries to use complicated workarounds that made their implementations slower.

For example, libstdc++ and libc++ link all nodes together across the entire container. To make this work, buckets point to the node before the first one in the bucket (not the first one itself), and each node stores its hash value. These extra pointers and stored hash values waste memory and slow things down.
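As an illustration, the per-node overhead difference looks roughly like this (a simplified sketch, not the libraries' actual code):

    #include <cstddef>

    // libstdc++/libc++ style: one singly linked list across the whole table.
    struct std_style_node {
        std_style_node* next;    // links ALL nodes in the container
        std::size_t cached_hash; // stored hash, kept to speed up rehash/erase
        // value_type value;     // the element itself follows
    };

    // Boost 1.80 style: lists are per bucket, so no cross-container links
    // and no cached hash are needed.
    struct boost_style_node {
        boost_style_node* next;  // links nodes within ONE bucket only
        // value_type value;
    };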

Boost's Solution (Released August 2022)

Boost.Unordered 1.80 went back to basics. Nodes are only linked within each bucket, not across the whole container. This makes deletion trivial—just remove the node from its bucket's list.

For iteration, Boost introduced bucket groups. Each group has a 32/64-bit mask showing which buckets are occupied, plus pointers linking groups together. To iterate, you use fast bit operations on the masks and jump between groups using the pointers. This takes only 4 bits per bucket and is very fast.
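A simplified sketch of the bitmask scan (this shows the general technique, not Boost's actual internals): each set bit in a group's occupancy mask is located with a count-trailing-zeros instruction and then cleared, so iteration cost is proportional to the number of occupied buckets.

    #include <bit>      // std::countr_zero (C++20)
    #include <cstdint>

    // Visit every occupied bucket recorded in a 64-bit occupancy mask.
    template <typename F>
    void for_each_occupied(std::uint64_t mask, F visit) {
        while (mask != 0) {
            int i = std::countr_zero(mask); // index of the lowest set bit
            visit(i);                       // bucket i in this group is occupied
            mask &= mask - 1;               // clear the lowest set bit
        }
    }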

Fast Modulo Magic

Hash tables need to map hash values to bucket positions. Traditional approaches use either expensive modulo operations with prime numbers, or power-of-two sizes with bit masking.

Boost uses prime numbers (for better distribution) but uses them in combination with Daniel Lemire's "fastmod" technique—it's as fast as power-of-two bit operations but gives you the better distribution of prime numbers. Even better, eliminating function pointer tables allows the compiler to inline code for extra speed.
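Lemire's fastmod replaces the division instruction with two multiplications against a precomputed constant. A sketch for 32-bit values, using the __uint128_t extension available in GCC and Clang:

    #include <cstdint>

    // Precompute once per table size d (d > 1): M = ceil(2^64 / d).
    inline std::uint64_t compute_m(std::uint32_t d) {
        return UINT64_MAX / d + 1;
    }

    // Returns a % d without a division instruction.
    inline std::uint32_t fastmod(std::uint32_t a, std::uint64_t m, std::uint32_t d) {
        std::uint64_t lowbits = m * a;
        return std::uint32_t((__uint128_t(lowbits) * d) >> 64);
    }

Because each prime table size gets its own precomputed constant, no function pointer table is needed and the compiler can inline the whole mapping.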

Real-World Speed Test: FastNetMon DDoS Detection

Pavel Odintsov, who runs one of the fastest DDoS detection products on the market (FastNetMon), tested the performance improvements using actual network traffic from a large ISP. The test used 131,000 unique IP addresses with real access patterns—not synthetic benchmark data.

Testing on an AMD Ryzen 5 3600 with gcc 12.1 showed the Boost 1.80 boost::unordered_map achieved 32.4M ops/sec vs std::unordered_map's 25.3M ops/sec.

That's a 28% speed improvement in real-world DDoS detection workloads. When you're trying to detect network attacks in under a second, this kind of performance gain matters.

Why It's Faster

Boost's improved layout uses less memory: only 12N + 0.5B bytes of overhead per element (on 64-bit systems) compared to libstdc++'s 16N + 8B bytes. Less memory means better cache performance. Combine that with fast modulo and one less pointer indirection per lookup, and you get substantial real-world speedups.
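To put those formulas in perspective: with N = 1,000,000 elements and roughly one bucket per element (B ≈ N), that works out to about 12.5 MB of overhead for Boost versus 24 MB for libstdc++, nearly half the memory traffic.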

II. Open-Addressing Containers: When to Use boost::unordered_flat_map

Starting in Boost 1.81 (December 2022), Boost added boost::unordered_flat_map and boost::unordered_flat_set—containers that break some C++ standard requirements in exchange for much better performance. By 2022, open addressing had clearly won the performance race.

Choose boost::unordered_flat_map when:

- You want the fastest possible lookups and insertions.
- You can live without pointer stability: elements move on rehash, so pointers and iterators to them are invalidated.
- You don't depend on standard-only features such as the bucket API.

Stick with boost::unordered_map when:

- You need a drop-in, fully standard-compliant replacement for std::unordered_map.
- You need stable element addresses (or use boost::unordered_node_map for pointer stability with near-flat speed).

Why Open Addressing Is Fast

Cache-friendly design: Elements live directly in the bucket array, not scattered in separate nodes. Modern CPUs love this because:

- Lookups touch the bucket array directly instead of chasing node pointers.
- Neighboring buckets share cache lines, so probing nearby positions is cheap.
- Access patterns are predictable, which suits hardware prefetchers.

The collision problem: When two elements hash to the same bucket, open addressing uses a "probing sequence" to find an empty bucket nearby. Boost uses a non-relocating approach (elements stay where they're inserted) to behave more like std::unordered_map.
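For illustration, one widely used probing sequence is quadratic probing, where the step size grows on each attempt; the triangular-number variant below visits every slot of a power-of-two table exactly once. (This shows the general technique, not necessarily Boost's exact sequence.)

    #include <cstddef>

    // Slot visited at the given probe step; table_size must be a power of two.
    inline std::size_t probe_slot(std::size_t start, std::size_t step,
                                  std::size_t table_size) {
        return (start + step * (step + 1) / 2) & (table_size - 1);
    }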

The main challenge: when you delete an element, you can't just mark its bucket as empty—that would break lookups for elements stored further along the probing sequence. Traditional solutions use "tombstones" (markers that say "something was here"), but those slow down lookups over time.

SIMD: Checking Multiple Buckets at Once

SIMD stands for "Single Instruction, Multiple Data"—CPU instructions that process multiple values in parallel. Originally designed for video processing, SIMD was later recognized by hash table implementers as a way to speed up lookups.

The basic idea: Instead of storing just elements, also maintain a metadata array with one byte per bucket. Each metadata byte holds a "reduced hash value"—a shortened version of the element's hash. When looking up an element, SIMD instructions can check 16 metadata bytes simultaneously to find potential matches, then only do the expensive full comparison on actual candidates.

This technique checks 16 buckets in constant time. Google's Abseil and Meta's F14 containers pioneered this approach.
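In SSE2 terms, the 16-byte match is three instructions. A minimal sketch of the idea (not Boost's actual code):

    #include <emmintrin.h> // SSE2 intrinsics
    #include <cstdint>

    // Compare one 16-byte metadata block against a reduced hash value.
    // Bit i of the result is set iff metadata[i] == reduced_hash.
    inline int match_group(const std::uint8_t* metadata,
                           std::uint8_t reduced_hash) {
        __m128i meta   = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(metadata));
        __m128i needle = _mm_set1_epi8(static_cast<char>(reduced_hash));
        return _mm_movemask_epi8(_mm_cmpeq_epi8(meta, needle));
    }

The resulting bitmask is then scanned with the same count-trailing-zeros trick shown earlier, so only candidate buckets receive a full key comparison.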

How boost::unordered_flat_map Works

Group-based organization: The bucket array is split into groups of 15 buckets. When inserting, the hash value selects a group (not an individual bucket). Elements fill groups from one end to the other, creating clusters of used buckets.

Metadata structure: Each group has a 16-byte metadata word: 15 bytes hold one reduced hash value per bucket (with a reserved value marking empty buckets), and the 16th is an "overflow byte" described below.

SIMD lookups: When searching a group, SIMD instructions match the lookup's reduced hash against all 15 metadata bytes simultaneously. Only matching buckets get full element comparisons.

The overflow byte trick: Here's the clever part that avoids tombstones. When a group fills up during insertion, set a bit in the overflow byte (based on the hash value) before moving to the next group. During lookup, if the corresponding overflow bit is 0, you can stop searching—the element definitely isn't in later groups.

This overflow byte acts like a Bloom filter: bits set to 1 mean "keep looking," bits set to 0 mean "definitely stop here."
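A sketch of the overflow-byte logic (field and function names here are hypothetical, for illustration only):

    #include <cstdint>
    #include <cstddef>

    struct GroupMeta {
        std::uint8_t reduced_hashes[15]; // one byte per bucket
        std::uint8_t overflow;           // 8-bit Bloom filter
    };

    // Insertion found this group full: record that `hash` spilled past it.
    inline void mark_overflow(GroupMeta& g, std::size_t hash) {
        g.overflow |= std::uint8_t(1u << (hash % 8));
    }

    // Lookup: can the element still live in a later group?
    inline bool may_continue(const GroupMeta& g, std::size_t hash) {
        return ((g.overflow >> (hash % 8)) & 1u) != 0; // 0 means stop here
    }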

No SIMD? No problem: On systems without SSE2 or Neon, Boost uses "bit interleaving" to pack the metadata into two 64-bit words, enabling reasonably fast operations without SIMD.

Preventing performance drift: Open addressing has a problem where repeated insertions and deletions gradually degrade performance. Boost's solution: when you delete an element whose overflow bit is set, the container lowers its maximum load threshold slightly. Eventually this triggers a rehash, restoring optimal performance.
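Reduced to pseudocode, and reusing the hypothetical GroupMeta sketch above, the rule is simple (an illustrative sketch, not Boost's actual implementation):

    // Erasing an overflowed element lowers the load limit a little; once
    // enough such erasures accumulate, the next insertion triggers a rehash
    // that rebuilds the table without stale overflow bits.
    inline void on_erase(const GroupMeta& g, std::size_t hash,
                         std::size_t& max_load) {
        if (may_continue(g, hash) && max_load > 0) --max_load;
    }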

Boost.Unordered vs Abseil

Simulation programs comparing boost::unordered_flat_map with absl::flat_hash_map reveal key performance differences.

How Abseil works: Abseil's Swiss Tables hash individual buckets (not groups), use 16-bucket SIMD scans, and store 7 bits of hash information per bucket with tombstones for deletion.

Successful lookups: Boost needs slightly more hops on average because free buckets cluster at group ends rather than distributing uniformly. However, actual comparison counts are nearly identical (within 1%) because Boost uses 7.99 bits of hash information versus Abseil's 7 bits—each extra bit roughly halves false matches.

Unsuccessful lookups: Boost is considerably faster here. Abseil's probe terminates only when finding a non-full group (all-or-nothing). Boost's overflow byte acts like a Bloom filter, providing 8 bits of termination information and making early termination 1.75x more likely. Under high load, Boost performs up to 3.2x fewer comparisons for unsuccessful lookups.

Real-World Performance Tests

Boost's aggregate benchmarks combine multiple operations using different key types (strings, integers, UUIDs) on an Intel Xeon E5-2683 @ 2.10GHz. Across these workloads, boost::unordered_flat_map comes out 14-29% ahead of Abseil's absl::flat_hash_map.

Independent Verification

Jackson Allan's extensive 2024 benchmark suite tested diverse conditions on AMD Ryzen 7 5800H with GCC 13.2.0, confirming boost::unordered_flat_map as "the all-around best performer, especially when hot in the cache." The analysis found very fast insertions, excellent performance for looking up and erasing nonexisting keys, very fast string key lookups, and excellent iteration performance due to key clustering within bucket groups.

Boost's advantage is particularly pronounced in low-key-count benchmarks (0 to 200,000 keys), suggesting it benefits more from cache residency than competing implementations.

III. Concurrent Containers: Multithreading with boost::concurrent_flat_map

Boost 1.83 added boost::concurrent_flat_map for programs where multiple threads need to access the same hash table. It uses the same fast open-addressing layout as boost::unordered_flat_map but adds smart locking.
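Because boost::concurrent_flat_map exposes no iterators, elements are accessed through visitation functions instead. A minimal usage sketch (assuming Boost 1.83 or later; see the documentation for the full visitation API):

    #include <boost/unordered/concurrent_flat_map.hpp>
    #include <string>
    #include <thread>
    #include <vector>

    int main() {
        boost::concurrent_flat_map<std::string, int> counts;

        std::vector<std::thread> workers;
        for (int t = 0; t < 4; ++t) {
            workers.emplace_back([&] {
                for (int i = 0; i < 1000; ++i) {
                    // Insert the key with value 1 if absent; otherwise run the
                    // visitor on the existing element, under the group's lock.
                    counts.try_emplace_or_visit("hits", 1,
                        [](auto& kv) { ++kv.second; });
                }
            });
        }
        for (auto& w : workers) w.join();

        counts.cvisit("hits", [](const auto& kv) {
            // kv.second == 4000 after all workers finish
        });
    }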

Two-level locking strategy:

- Container level: a lock that normal operations acquire in shared mode; exclusive access is only taken for whole-table operations such as rehashing.
- Group level: each bucket group carries its own lock, so two threads contend only when they touch the same group.

Mostly lock-free lookups: Hash calculation, probing, and SIMD matching all happen without locks. Only the final element comparison needs a group lock.

Smart insertions: Uses "transactional optimistic insertion" to prevent duplicate elements. The algorithm saves the group's insertion counter, performs the insertion, then checks whether another thread interfered; if so, it rolls back and retries. Even in worst-case scenarios, retries occur at parts-per-million rates.
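Reduced to its essentials, the pattern looks like this (hypothetical names; a sketch of the idea rather than Boost's actual code):

    #include <atomic>
    #include <cstdint>
    #include <mutex>

    struct Group {
        std::atomic<std::uint32_t> counter{0}; // bumped on every insertion
        std::mutex lock;
    };

    // found: lock-free probe for the key; do_insert/undo run under the lock.
    template <typename Find, typename Insert, typename Rollback>
    bool optimistic_insert(Group& g, Find found, Insert do_insert, Rollback undo) {
        for (;;) {
            std::uint32_t before = g.counter.load(std::memory_order_acquire);
            if (found()) return false;          // key already present
            std::lock_guard<std::mutex> lk(g.lock);
            do_insert();                        // optimistically insert
            if (g.counter.load(std::memory_order_relaxed) != before) {
                undo();                         // another thread may have raced us
                continue;                       // roll back and retry
            }
            g.counter.fetch_add(1, std::memory_order_release);
            return true;
        }
    }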

Performance vs Intel TBB

Benchmarks on AMD Ryzen 5 3600 show boost::concurrent_flat_map significantly outperforms tbb::concurrent_hash_map, particularly when many threads target a small set of keys (high-skew workloads). The fine-grained group locking (potentially thousands of groups) handles contention better than coarse 256-shard locking.

The results (GCC 12, x64; key skew of 0.01 = low, 0.5 = medium, 0.99 = high):

- 500k updates: boost::concurrent_flat_map sustains about 2x the ops/sec of tbb::concurrent_hash_map.

- 5M updates: boost::concurrent_flat_map sustains about 2.5x the ops/sec of tbb::concurrent_hash_map.

For cache-friendly workloads with 500,000 operations, Boost continues improving performance even beyond the physical core count, suggesting memory latency (not computation) is the bottleneck. Performance characteristics depend heavily on your specific CPU and memory architecture, so test on your target hardware for best results.

Conclusion: By The Numbers

Boost.Unordered establishes itself as the performance leader through systematic innovations across all container types. Here's what the numbers show:

For drop-in std replacement: boost::unordered_map delivers a 28% improvement over std::unordered_map in real-world DDoS detection workloads while maintaining complete API compatibility.

For maximum speed: boost::unordered_flat_map outperforms Abseil Swiss tables by 14-29% across diverse workloads, with particularly strong advantages for unsuccessful lookups (up to 3.2x better under high load) and integer key operations.

For multithreading: boost::concurrent_flat_map outperforms Intel TBB by 2 to 2.5x while providing excellent performance through fine-grained locking and mostly lock-free operations.

Independent benchmarking consistently identifies boost::unordered_flat_map as "the all-around best performer, especially when hot in the cache." The library provides high-performance hash containers matched to your specific requirements, whether you need standards compliance, maximum throughput, or thread safety.

Unordered on GitHub

Unordered documentation

Unordered website page

Sources:

boost::unordered_map is a new king of data structures:
https://medium.com/@pavel.odintsov/boost-unordered-map-is-a-new-king-of-data-structures-292124d3ee2

Advancing the state of the art for std::unordered_map implementation:
https://bannalia.blogspot.com/2022/06/advancing-state-of-art-for.html

Inside boost::unordered_flat_map:
https://bannalia.blogspot.com/2022/11/inside-boostunorderedflatmap.html

Inside boost::concurrent_flat_map:
https://bannalia.blogspot.com/2023/07/inside-boostconcurrentflatmap.html?m=1

Comprehensive C++ Hashmap Benchmarks 2022:
https://martin.ankerl.com/2022/08/27/hashmap-bench-01/

An Extensive Benchmark of C and C++ Hash Tables:
https://jacksonallan.github.io/c_cpp_hash_tables_benchmark/

Measuring std::unordered_map Badness:
https://artificial-mind.net/blog/2021/10/09/unordered-map-badness