Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

Click here to view the latest version of this page.
PrevUpHomeNext

Advanced lookup and insertion functions for associative containers

Advanced lookups
Advanced insertions
Positional insertions

Boost.Intrusive associative containers offer the same interface as STL associative containers. However, STL and TR1 ordered and unordered simple associative containers (std::set, std::multiset, std::tr1::unordered_set and std::tr1::unordered_multiset) have some inefficiencies caused by the interface: the user can only operate with value_type objects. When using these containers we must use iterator find(const value_type &value) to find a value. The same happens in other functions like equal_range, lower_bound, upper_bound, etc.

However, sometimes the object to be searched is quite expensive to construct:

#include <boost/intrusive/set.hpp>
#include <boost/intrusive/unordered_set.hpp>
#include <cstring>

using namespace boost::intrusive;

// Hash function for strings
struct StrHasher
{
   std::size_t operator()(const char *str) const
   {  
      std::size_t seed = 0;
      for(; *str; ++str)   boost::hash_combine(seed, *str);
      return seed;
   }
};

class Expensive : public set_base_hook<>, public unordered_set_base_hook<>
{
   std::string key_;
   // Other members...
  
   public:
   Expensive(const char *key)
      :  key_(key)
      {}  //other expensive initializations...

   const std::string & get_key() const
      {  return key_;   }

   friend bool operator <  (const Expensive &a, const Expensive &b)
      {  return a.key_ < b.key_;  }

   friend bool operator == (const Expensive &a, const Expensive &b)
      {  return a.key_ == b.key_;  }

   friend std::size_t hash_value(const Expensive &object)
      {  return StrHasher()(object.get_key().c_str());  }
};

// A set and unordered_set that store Expensive objects
typedef set<Expensive>           Set;
typedef unordered_set<Expensive> UnorderedSet;

// Search functions
Expensive *get_from_set(const char* key, Set &set_object)
{
   Set::iterator it = set_object.find(Expensive(key));
   if( it == set_object.end() )     return 0;
   return &*it;
}

Expensive *get_from_uset(const char* key, UnorderedSet &uset_object)
{
   UnorderedSet::iterator it = uset_object.find(Expensive (key));
   if( it == uset_object.end() )  return 0;
   return &*it;
}

Expensive is an expensive object to construct. If "key" c-string is quite long Expensive has to construct a std::string using heap memory. Like Expensive, many times the only member taking part in ordering issues is just a small part of the class. For example, with Expensive, only the internal std::string is needed to compare the object.

In both containers, if we call get_from_set/get_from_unordered_set in a loop, we might get a performance penalty, because we are forced to create a whole Expensive object to be able to find an equivalent one.

Sometimes this interface limitation is severe, because we might not have enough information to construct the object but we might have enough information to find the object. In this case, a name is enough to search Expensive in the container but constructing an Expensive might require more information that the user might not have.

To solve this, set/multiset offer alternative functions, which take any type comparable with the value and a functor that should be compatible with the ordering function of the associative container. unordered_set/unordered_multiset offers functions that take any key type and compatible hash and equality functions. Now, let's see the optimized search function:

// These compare Expensive and a c-string
struct StrExpComp
{
   bool operator()(const char *str, const Expensive &c) const
   {  return std::strcmp(str, c.get_key().c_str()) < 0;  }

   bool operator()(const Expensive &c, const char *str) const
   {  return std::strcmp(c.get_key().c_str(), str) < 0;  }
};

struct StrExpEqual
{
   bool operator()(const char *str, const Expensive &c) const
   {  return std::strcmp(str, c.get_key().c_str()) == 0;  }

   bool operator()(const Expensive &c, const char *str) const
   {  return std::strcmp(c.get_key().c_str(), str) == 0;  }
};

// Optimized search functions
Expensive *get_from_set_optimized(const char* key, Set &set_object)
{
   Set::iterator it = set_object.find(key, StrExpComp());
   if( it == set_object.end() )   return 0;
   return &*it;
}

Expensive *get_from_uset_optimized(const char* key, UnorderedSet &uset_object)
{
   UnorderedSet::iterator it = uset_object.find(key, StrHasher(), StrExpEqual());
   if( it == uset_object.end() )  return 0;
   return &*it;
}

This new arbitrary key overload is also available for other functions taking values as arguments:

  • equal_range
  • lower_bound
  • upper_bound
  • count
  • find
  • erase

Check set, multiset, unordered_set, unordered_multiset references to know more about those functions.

A similar issue happens with insertions in simple ordered and unordered associative containers with unique keys (std::set and std::tr1::unordered_set). In these containers, if a value is already present, the value to be inserted is discarded. With expensive values, if the value is already present, we can suffer efficiency problems.

set and unordered_set have insertion functions to check efficiently, without constructing the value, if a value is present or not and if it's not present, a function to insert it immediately without any further lookup. For example, using the same Expensive class, this function can be inefficient:

// Insertion functions
bool insert_to_set(const char* key, Set &set_object)
{
   Expensive *pobject = new Expensive(key);
   bool success = set_object.insert(*pobject).second;
   if(!success)   delete pobject;
   return success;
}

bool insert_to_uset(const char* key, UnorderedSet &uset_object)
{
   Expensive *pobject = new Expensive(key);
   bool success = uset_object.insert(*pobject).second;
   if(!success)   delete pobject;
   return success;
}

If the object is already present, we are constructing an Expensive that will be discarded, and this is a waste of resources. Instead of that, let's use insert_check and insert_commit functions:

// Optimized insertion functions
bool insert_to_set_optimized(const char* key, Set &set_object)
{
   Set::insert_commit_data insert_data;
   bool success = set_object.insert_check(key, StrExpComp(), insert_data).second;
   if(success) set_object.insert_commit(*new Expensive(key), insert_data);
   return success;
}

bool insert_to_uset_optimized(const char* key, UnorderedSet &uset_object)
{
   UnorderedSet::insert_commit_data insert_data;
   bool success = uset_object.insert_check
      (key, StrHasher(), StrExpEqual(), insert_data).second;
   if(success) uset_object.insert_commit(*new Expensive(key), insert_data);
   return success;
}

insert_check is similar to a normal insert but:

  • insert_check can be used with arbitrary keys
  • if the insertion is possible (there is no equivalent value) insert_check collects all the needed information in an insert_commit_data structure, so that insert_commit:
    • does not execute further comparisons
    • can be executed with constant-time complexity
    • has no-throw guarantee.

These functions must be used with care, since no other insertion or erasure must be executed between an insert_check and an insert_commit pair. Otherwise, the behaviour is undefined. insert_check and insert_commit will come in handy for developers programming efficient non-intrusive associative containers. See set and unordered_set reference for more information about insert_check and insert_commit.

With multiple ordered and unordered associative containers (multiset and unordered_multiset) there is no need for these advanced insertion functions, since insertions are always successful.

Some ordered associative containers offer low-level functions to bypass ordering checks and insert nodes directly in desired tree positions. These functions are provided for performance reasons when values to be inserted in the container are known to fulfill order (sets and multisets) and uniqueness (sets) invariants. A typical usage of these functions is when intrusive associative containers are used to build non-intrusive containers and the programmer wants to speed up assignments from other associative containers: if the ordering and uniqueness properties are the same, there is no need to waste time checking the position of each source value, because values are already ordered: back insertions will be much more efficient.

Note: These functions don't check preconditions so they must used with care. These are functions are low-level operations will break container invariants if ordering and uniqueness preconditions are not assured by the caller.

Let's see an example:

#include <boost/intrusive/set.hpp>
#include <vector>
#include <algorithm>
#include <cassert>

using namespace boost::intrusive;

//A simple class with a set hook
class MyClass : public set_base_hook<>
{
   public:
   int int_;

   MyClass(int i) :  int_(i) {}
   friend bool operator< (const MyClass &a, const MyClass &b)
      {  return a.int_ < b.int_;  }
   friend bool operator> (const MyClass &a, const MyClass &b)
      {  return a.int_ > b.int_;  }
};

int main()
{
   //Create some ORDERED elements
   std::vector<MyClass> values;
   for(int i = 0; i < 100; ++i)  values.push_back(MyClass(i));

   {  //Data is naturally ordered in the vector with the same criteria
      //as multiset's comparison predicate, so we can just push back
      //all elements, which is more efficient than normal insertion
      multiset<MyClass> mset;
      for(int i = 0; i < 100; ++i)  mset.push_back(values[i]);

      //Now check orderd invariant
      multiset<MyClass>::const_iterator next(mset.cbegin()), it(next++);
      for(int i = 0; i < 99; ++i, ++it, ++next) assert(*it < *next);
   }
   {  //Now the correct order for the set is the reverse order
      //so let's push front all elements
      multiset<MyClass, compare< std::greater<MyClass> > > mset;
      for(int i = 0; i < 100; ++i)  mset.push_front(values[i]);

      //Now check orderd invariant
      multiset<MyClass, compare< std::greater<MyClass> > >::
         const_iterator next(mset.cbegin()), it(next++);
      for(int i = 0; i < 99; ++i, ++it, ++next) assert(*it > *next);
   }
   {  //Now push the first and the last and insert the rest
      //before the last position using "insert_before"
      multiset<MyClass> mset;
      mset.insert_before(mset.begin(), values[0]);
      multiset<MyClass>::const_iterator pos =
         mset.insert_before(mset.end(), values[99]);
      for(int i = 1; i < 99; ++i)  mset.insert_before(pos, values[i]);

      //Now check orderd invariant
      multiset<MyClass>::const_iterator next(mset.cbegin()), it(next++);
      for(int i = 0; i < 99; ++i, ++it, ++next) assert(*it < *next);
   }

   return 0;
}

For more information about advanced lookup and insertion functions see associative containers' documentation (e.g. set, multiset, unordered_set and unordered_multiset references).


PrevUpHomeNext