Boost.Intrusive associative containers offer the same interface as STL associative containers. However, STL and TR1 ordered and unordered simple associative containers (std::set, std::multiset, std::tr1::unordered_set and std::tr1::unordered_multiset) have some inefficiencies caused by the interface: the user can only operate with value_type objects. When using these containers we must use iterator find(const value_type &value) to find a value. The same happens in other functions like equal_range, lower_bound, upper_bound, etc.
However, sometimes the object to be searched is quite expensive to construct:

    #include <boost/intrusive/set.hpp>
    #include <boost/intrusive/unordered_set.hpp>
    #include <boost/functional/hash.hpp>   // boost::hash_combine
    #include <cstring>
    #include <string>

    using namespace boost::intrusive;

    // Hash function for strings
    struct StrHasher
    {
       std::size_t operator()(const char *str) const
       {
          std::size_t seed = 0;
          for(; *str; ++str)
             boost::hash_combine(seed, *str);
          return seed;
       }
    };

    class Expensive : public set_base_hook<>, public unordered_set_base_hook<>
    {
       std::string key_;
       // Other members...

       public:
       Expensive(const char *key)
          : key_(key)
       {}  //other expensive initializations...

       const std::string & get_key() const
       {  return key_;   }

       friend bool operator <  (const Expensive &a, const Expensive &b)
       {  return a.key_ < b.key_;  }

       friend bool operator == (const Expensive &a, const Expensive &b)
       {  return a.key_ == b.key_;  }

       friend std::size_t hash_value(const Expensive &object)
       {  return StrHasher()(object.get_key().c_str());  }
    };

    // A set and unordered_set that store Expensive objects
    typedef set<Expensive>           Set;
    typedef unordered_set<Expensive> UnorderedSet;

    // Search functions
    Expensive *get_from_set(const char* key, Set &set_object)
    {
       Set::iterator it = set_object.find(Expensive(key));
       if( it == set_object.end() )     return 0;
       return &*it;
    }

    Expensive *get_from_uset(const char* key, UnorderedSet &uset_object)
    {
       UnorderedSet::iterator it = uset_object.find(Expensive(key));
       if( it == uset_object.end() )  return 0;
       return &*it;
    }

Expensive is an expensive object to construct. If the "key" C string is long, Expensive has to construct a std::string using heap memory. As with Expensive, often only a small part of a class takes part in ordering. For example, with Expensive, only the internal std::string is needed to compare objects.
With both containers, if we call get_from_set or get_from_uset in a loop, we might pay a performance penalty, because each call is forced to create a whole Expensive object just to find an equivalent one.
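To make the cost concrete, here is a minimal sketch that uses std::set rather than the intrusive containers. Counted is a hypothetical stand-in for Expensive that counts constructor calls, and temporaries_for_lookups is an illustrative helper; neither name belongs to any library.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for Expensive: counts calls to its key constructor.
struct Counted
{
   static int constructions;
   std::string key;

   explicit Counted(const char *k) : key(k) { ++constructions; }

   friend bool operator<(const Counted &a, const Counted &b)
   {  return a.key < b.key;  }
};

int Counted::constructions = 0;

// Looks up `key` n times through the value_type-only interface and returns
// how many temporary Counted objects those lookups had to construct.
int temporaries_for_lookups(const std::set<Counted> &s, const char *key, int n)
{
   assert(key != 0 && n >= 0);
   const int before = Counted::constructions;
   for(int i = 0; i < n; ++i)
      (void)s.find(Counted(key));   // one full temporary per lookup
   return Counted::constructions - before;
}
```

Every lookup builds one temporary, so the returned count equals n whether or not the key is present in the set.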
Sometimes this interface limitation is severe, because we might
not have enough information to construct the object but we might
have enough information to find the object.
In this case, a name is enough to search for an Expensive in the container, but constructing an Expensive might require more information that the user might not have.
To solve this, set/multiset offer alternative functions that take any type comparable with the value, plus a functor that must be compatible with the ordering function of the associative container. unordered_set/unordered_multiset offer functions that take any key type together with compatible hash and equality functions. Now, let's see the optimized search functions:

    // These compare Expensive and a c-string
    struct StrExpComp
    {
       bool operator()(const char *str, const Expensive &c) const
       {  return std::strcmp(str, c.get_key().c_str()) < 0;  }

       bool operator()(const Expensive &c, const char *str) const
       {  return std::strcmp(c.get_key().c_str(), str) < 0;  }
    };

    struct StrExpEqual
    {
       bool operator()(const char *str, const Expensive &c) const
       {  return std::strcmp(str, c.get_key().c_str()) == 0;  }

       bool operator()(const Expensive &c, const char *str) const
       {  return std::strcmp(c.get_key().c_str(), str) == 0;  }
    };

    // Optimized search functions
    Expensive *get_from_set_optimized(const char* key, Set &set_object)
    {
       Set::iterator it = set_object.find(key, StrExpComp());
       if( it == set_object.end() )   return 0;
       return &*it;
    }

    Expensive *get_from_uset_optimized(const char* key, UnorderedSet &uset_object)
    {
       UnorderedSet::iterator it = uset_object.find(key, StrHasher(), StrExpEqual());
       if( it == uset_object.end() )  return 0;
       return &*it;
    }

This new arbitrary key overload is also available for other functions taking values as arguments: equal_range, lower_bound, upper_bound, count, find and erase.
Check the set, multiset, unordered_set and unordered_multiset references to know more about those functions.
A similar issue happens with insertions in simple ordered and unordered associative containers with unique keys (std::set and std::tr1::unordered_set). In these containers, if a value is already present, the value to be inserted is discarded. With expensive values, constructing an object only to have it discarded is wasteful.
set and unordered_set have insertion functions that check efficiently, without constructing the value, whether a value is already present and, if it is not, a function to insert it immediately without any further lookup. For example, using the same Expensive class, these functions can be inefficient:

    // Insertion functions
    bool insert_to_set(const char* key, Set &set_object)
    {
       Expensive *pobject = new Expensive(key);
       bool success = set_object.insert(*pobject).second;
       if(!success)   delete pobject;
       return success;
    }

    bool insert_to_uset(const char* key, UnorderedSet &uset_object)
    {
       Expensive *pobject = new Expensive(key);
       bool success = uset_object.insert(*pobject).second;
       if(!success)   delete pobject;
       return success;
    }

If the object is already present, we construct an Expensive that is then discarded, which wastes resources. Instead, let's use the insert_check and insert_commit functions:

    // Optimized insertion functions
    bool insert_to_set_optimized(const char* key, Set &set_object)
    {
       Set::insert_commit_data insert_data;
       bool success = set_object.insert_check(key, StrExpComp(), insert_data).second;
       if(success) set_object.insert_commit(*new Expensive(key), insert_data);
       return success;
    }

    bool insert_to_uset_optimized(const char* key, UnorderedSet &uset_object)
    {
       UnorderedSet::insert_commit_data insert_data;
       bool success = uset_object.insert_check
          (key, StrHasher(), StrExpEqual(), insert_data).second;
       if(success) uset_object.insert_commit(*new Expensive(key), insert_data);
       return success;
    }

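insert_check is similar to a normal insert, with two differences. First, insert_check can be used with arbitrary keys. Second, if the insertion is possible, insert_check collects all the needed information in an insert_commit_data structure, so that insert_commit can insert the value without any further lookup or comparison, in constant time and without throwing.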
These functions must be used with care: no other insertion or erasure may be executed between an insert_check and insert_commit pair. Otherwise, the behaviour is undefined. insert_check and insert_commit will come in handy for developers programming efficient non-intrusive associative containers.
See the set and unordered_set references for more information about insert_check and insert_commit.
With ordered and unordered associative containers that allow equivalent keys (multiset and unordered_multiset) there is no need for these advanced insertion functions, since insertions are always successful.
For more information about advanced lookup and insertion functions see the set, multiset, unordered_set and unordered_multiset references.