Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

PrevUpHomeNext

Semi-Intrusive unordered associative containers: unordered_set, unordered_multiset

unordered_set and unordered_multiset performance notes
unordered_set and unordered_multiset hooks
unordered_set and unordered_multiset containers
Example
Custom bucket traits

Boost.Intrusive also offers hashed containers that can be very useful to implement fast-lookup containers. These containers (unordered_set and unordered_multiset) are semi-intrusive containers: they need additional memory apart from the hook stored in the value_type. This additional memory must be passed in the constructor of the container.

Unlike C++ TR1 unordered associative containers (which are also hashed containers), the contents of these semi-intrusive containers are not rehashed to maintain a load factor: that would require memory management and intrusive containers don't implement any memory management at all. However, the user can request an explicit rehashing passing a new bucket array. This also offers an additional guarantee over TR1 unordered associative containers: iterators are not invalidated when inserting an element in the container.

As with TR1 unordered associative containers, rehashing invalidates iterators, changes ordering between elements and changes which buckets elements appear in, but does not invalidate pointers or references to elements.

Apart from expected hash and equality function objects, Boost.Intrusive unordered associative containers' constructors take an argument specifying an auxiliary bucket vector to be used by the container.

The size overhead for a hashed container is moderate: 1 pointer per value plus a bucket array per container. The size of an element of the bucket array is usually one pointer. To obtain a good performance hashed container, the bucket length is usually the same as the number of elements that the container contains, so a well-balanced hashed container (bucket_count() is equal to size() ) will have an equivalent overhead of two pointers per element.

An empty, non constant-time size unordered_set or unordered_multiset has also the size of bucket_count() pointers.

Insertions, erasures, and searches, have amortized constant-time complexity in hashed containers. However, some worst-case guarantees are linear. See unordered_set or unordered_multiset for complexity guarantees of each operation.

Be careful with non constant-time size hashed containers: some operations, like empty(), have linear complexity, unlike other Boost.Intrusive containers.

unordered_set and unordered_multiset share the same hooks. This is an advantage, because the same user type can be inserted first in a unordered_multiset and after that in unordered_set without changing the definition of the user class.

template <class ...Options>
class unordered_set_base_hook;
template <class ...Options>
class unordered_set_member_hook;

unordered_set_base_hook and unordered_set_member_hook receive the same options explained in the section How to use Boost.Intrusive:

  • tag<class Tag> (for base hooks only): This argument serves as a tag, so you can derive from more than one base hook. Default: tag<default_tag>.
  • link_mode<link_mode_type LinkMode>: The linking policy. Default: link_mode<safe_link>.
  • void_pointer<class VoidPointer>: The pointer type to be used internally in the hook and propagated to the container. Default: void_pointer<void*>.

Apart from them, these hooks offer additional options:

  • store_hash<bool Enabled>: This option reserves additional space in the hook to store the hash value of the object once it's introduced in the container. When this option is used, the unordered container will store the calculated hash value in the hook and rehashing operations won't need to recalculate the hash of the value. This option will improve the performance of unordered containers when rehashing is frequent or hashing the value is a slow operation. Default: store_hash<false>.
  • optimize_multikey<bool Enabled>: This option reserves additional space in the hook that will be used to group equal elements in unordered multisets, improving significantly the performance when many equal values are inserted in these containers. Default: optimize_multikey<false>.
template<class T, class ...Options>
class unordered_set;

template<class T, class ...Options>
class unordered_multiset;

As mentioned, unordered containers need an auxiliary array to work. Boost.Intrusive unordered containers receive this auxiliary array packed in a type called bucket_traits (which can be also customized by a container option). All unordered containers receive a bucket_traits object in their constructors. The default bucket_traits class is initialized with a pointer to an array of buckets and its size:

#include <boost/intrusive/unordered_set.hpp>

using namespace boost::intrusive;

struct MyClass : public unordered_set_base_hook<>
{};

typedef unordered_set<MyClass>::bucket_type     bucket_type;
typedef unordered_set<MyClass>::bucket_traits   bucket_traits;

int main()
{
   bucket_type buckets[100];
   unordered_set<MyClass> uset(bucket_traits(buckets, 100));
   return 0;
}

Each hashed container needs its own bucket traits, that is, its own bucket vector. Two hashed containers can't share the same bucket_type elements. The bucket array must be destroyed after the container using it is destroyed, otherwise, the result is undefined.

unordered_set and unordered_multiset receive the same options explained in the section How to use Boost.Intrusive:

  • base_hook<class Hook> / member_hook<class T, class Hook, Hook T::* PtrToMember> / value_traits<class ValueTraits>: To specify the hook type or value traits used to configure the container. (To learn about value traits go to the section Containers with custom ValueTraits.)
  • constant_time_size<bool Enabled>: To activate the constant-time size() operation. Default: constant_time_size<true>
  • size_type<typename SizeType>: To specify the type that will be used to store the size of the container. Default: size_type<std::size_t>

And they also can receive additional options:

  • equal<class Equal>: Equality function for the objects to be inserted in containers. Default: equal< std::equal_to<T> >
  • hash<class Hash>: Hash function to be used in the container. Default: internal hash functor that hashes scalar types, triggering unqualified call to hash_valuefor non-scalar types (boost::hash<T> compatible protocol). If the ADL call does not find a viable candidate a boost::hash<T>() call is tried in case the user has included boost/container_hash/hash.hpp and a viable candidate is found there. Fails otherwise.
  • bucket_traits<class BucketTraits>: A type that wraps the bucket vector to be used by the unordered container. Default: a type initialized by the address and size of a bucket array and stores both variables internally.
  • power_2_buckets<bool Enabled>: The user guarantees that only bucket arrays with power of two length will be used. The container will then use masks instead of modulo operations to obtain the bucket number from the hash value. Masks are faster than modulo operations and for some applications modulo operations can impose a considerable overhead. In debug mode an assertion will be raised if the user provides a bucket length that is not power of two. Default: power_2_buckets<false>.
  • cache_begin<bool Enabled>: Note: this option is not compatible with auto_unlink hooks. Due to its internal structure, finding the first element of an unordered container (begin() operation) is amortized constant-time. It's possible to speed up begin() and other operations related to it (like clear()) if the container caches internally the position of the first element. This imposes the overhead of one pointer to the size of the container. Default: cache_begin<false>.
  • compare_hash<bool Enabled>: Note: this option requires store_hash<true> option in the hook. When the comparison function is expensive, (e.g. strings with a long common predicate) sometimes (specially when the load factor is high or we have many equivalent elements in an unordered_multiset and no optimize_multikey<> is activated in the hook) the equality function is a performance problem. Two equal values must have equal hashes, so comparing the hash values of two elements before using the comparison functor can speed up some implementations.
  • incremental<bool Enabled>: Activates incremental hashing (also known as Linear Hashing). This option implies power_2_buckets<true> and the container will require power of two buckets. For more information on incremental hashing, see Linear hash on Wikipedia Default: incremental<false>
  • key_of_value<class KeyOfValueFunctionObject>: A function object that will define the key_type of the value type to be stored. This type will allow a map-like interface. See Map and multimap-like interface with set and multiset for details. Default: key_type is equal to value_type (set-like interface).

Now let's see a small example using both hooks and both containers:

#include <boost/intrusive/unordered_set.hpp>
#include <vector>
#include <functional>

using namespace boost::intrusive;

class MyClass : public unordered_set_base_hook<>
{               //This is a derivation hook
   int int_;

   public:
   unordered_set_member_hook<> member_hook_; //This is a member hook

   MyClass(int i)
      :  int_(i)
   {}

   friend bool operator== (const MyClass &a, const MyClass &b)
   {  return a.int_ == b.int_;  }

   friend std::size_t hash_value(const MyClass &value)
   {  return std::size_t(value.int_); }
};

//Define an unordered_set that will store MyClass objects using the base hook
typedef unordered_set<MyClass>    BaseSet;

//Define an unordered_multiset that will store MyClass using the member hook
typedef member_hook<MyClass, unordered_set_member_hook<>, &MyClass::member_hook_>
   MemberOption;
typedef unordered_multiset< MyClass, MemberOption>  MemberMultiSet;

int main()
{
   typedef std::vector<MyClass>::iterator VectIt;

   //Create a vector with 100 different MyClass objects
   std::vector<MyClass> values;
   for(int i = 0; i < 100; ++i)  values.push_back(MyClass(i));

   //Create a copy of the vector
   std::vector<MyClass> values2(values);

   //Create a bucket array for base_set
   BaseSet::bucket_type base_buckets[100];

   //Create a bucket array for member_multi_set
   MemberMultiSet::bucket_type member_buckets[200];

   //Create unordered containers taking buckets as arguments
   BaseSet base_set(BaseSet::bucket_traits(base_buckets, 100));
   MemberMultiSet member_multi_set
      (MemberMultiSet::bucket_traits(member_buckets, 200));

   //Now insert values's elements in the unordered_set
   for(VectIt it(values.begin()), itend(values.end()); it != itend; ++it)
      base_set.insert(*it);

   //Now insert values's and values2's elements in the unordered_multiset
   for(VectIt it(values.begin()), itend(values.end()), it2(values2.begin())
      ; it != itend; ++it, ++it2){
      member_multi_set.insert(*it);
      member_multi_set.insert(*it2);
   }

   //Now find every element
   {
      VectIt it(values.begin()), itend(values.end());

      for(; it != itend; ++it){
         //base_set should contain one element for each key
         if(base_set.count(*it) != 1)           return 1;
         //member_multi_set should contain two elements for each key
         if(member_multi_set.count(*it) != 2)   return 1;
      }
   }
   return 0;
}

Instead of using the default bucket_traits class to store the bucket array, a user can define his own class to store the bucket array using the bucket_traits<> option. A user-defined bucket-traits must fulfill the following interface:

class my_bucket_traits
{
   bucket_ptr        bucket_begin();
   const_bucket_ptr  bucket_begin() const;
   std::size_t bucket_count() const;
};

The following bucket traits just stores a pointer to the bucket array but the size is a compile-time constant. Note the use of the auxiliary unordered_bucket and unordered_bucket_ptr utilities to obtain the type of the bucket and its pointer before defining the unordered container:

#include <boost/intrusive/unordered_set.hpp>
#include <vector>

using namespace boost::intrusive;

//A class to be inserted in an unordered_set
class MyClass : public unordered_set_base_hook<>
{
   int int_;

   public:
   MyClass(int i = 0) : int_(i)
   {}

   friend bool operator==(const MyClass &l, const MyClass &r)
      {  return l.int_ == r.int_;   }
   friend std::size_t hash_value(const MyClass &v)
   {  //Use your favorite hash function, like boost::hash or std::hash
      return std::size_t(v.int_);
   }
};

//Define the base hook option
typedef base_hook< unordered_set_base_hook<> >     BaseHookOption;

//Obtain the types of the bucket and the bucket pointer
typedef unordered_bucket<BaseHookOption>::type     BucketType;
typedef unordered_bucket_ptr<BaseHookOption>::type BucketPtr;

//The custom bucket traits.
class custom_bucket_traits
{
   public:
   static const int NumBuckets = 100;

   custom_bucket_traits(BucketPtr buckets)
      :  buckets_(buckets)
   {}

   //Functions to be implemented by custom bucket traits
   BucketPtr   bucket_begin() const {  return buckets_;  }
   std::size_t bucket_count() const {  return NumBuckets;}

   private:
   BucketPtr buckets_;
};

//Define the container using the custom bucket traits
typedef unordered_set<MyClass, bucket_traits<custom_bucket_traits> > BucketTraitsUset;

int main()
{
   typedef std::vector<MyClass>::iterator VectIt;
   std::vector<MyClass> values;

   //Fill values
   for(int i = 0; i < 100; ++i)  values.push_back(MyClass(i));

   //Now create the bucket array and the custom bucket traits object
   BucketType buckets[custom_bucket_traits::NumBuckets];
   custom_bucket_traits btraits(buckets);

   //Now create the unordered set
   BucketTraitsUset uset(btraits);

   //Insert the values in the unordered set
   for(VectIt it(values.begin()), itend(values.end()); it != itend; ++it)
      uset.insert(*it);

   return 0;
}

PrevUpHomeNext