Introduction
boost::hash
is an enhanced implementation of the
hash function object specified by
C++11 as std::hash
. It is the default hash function for
Boost.Unordered,
Boost.Intrusive's unordered associative
containers, Boost.MultiIndex's hash
indices, and Boost.Bimap's unordered_set_of
.
Out of the box, boost::hash
supports
-
standard integral types (integers, character types, and
bool
); -
standard floating point types (
float
,double
,long double
); -
pointers (to objects and to functions, but not pointers to members);
-
enumeration types;
-
C arrays;
-
std::complex
; -
std::pair
,std::tuple
; -
sequence-like types, both standard and user-defined (sequence-like types have
begin()
andend()
member functions returning iterators); -
unordered sequences, standard or user-defined (sequences for which the hash value does not depend on the element order, such as
std::unordered_set
andstd::unordered_map
); -
described structs and classes — ones that have been annotated with the
BOOST_DESCRIBE_STRUCT
orBOOST_DESCRIBE_CLASS
macros from Boost.Describe; -
std::unique_ptr
,std::shared_ptr
; -
std::type_index
; -
std::error_code
,std::error_condition
; -
std::optional
; -
std::variant
,std::monostate
.
boost::hash
is extensible; it’s possible for a user-defined type X
to make
iself hashable via boost::hash<X>
by defining an appropriate overload of the
function hash_value
. Many, if not most, Boost types already contain the
necessary support.
boost::hash
meets the requirements for std::hash
specified in the C++11
standard, namely, that for two different input values their corresponding hash
values are either guaranteed to be distinct, or the probability of their being
the same (a hash collision) is small. Standard unordered containers, and the
hash-based Boost containers, are designed to work well with such hash functions.
boost::hash
does not meet the stronger requirements often placed on hash
functions in a more general context. In particular, the hash function is not
cryptographic, is not collision-resistant against a determined adversary, and
does not necessarily possess good "avalanche" properties; that is, small
(single bit) perturbations in the input do not necessarily result in large
(half bits changing) perturbations in the output.
In particular, boost::hash
has traditionally been the identity function for
all integral types that fit into std::size_t
, because this guarantees lack of
collisions and is as fast as possible.
Recent Changes
Boost 1.81.0
Major update.
-
The specializations of
boost::hash
have been removed; it now always callshash_value
. -
Support for
BOOST_HASH_NO_EXTENSIONS
has been removed. The extensions are always enabled. -
All standard containers are now supported. This includes
std::forward_list
and the unordered associative containers. -
User-defined containers (types that have
begin()
andend()
member functions that return iterators) are now supported out of the box. -
Described structs and classes (those annotated with
BOOST_DESCRIBE_STRUCT
orBOOST_DESCRIBE_CLASS
) are now supported out of the box. -
hash_combine
has been improved. -
The performance (and quality, as a result of the above change) of string hashing has been improved.
boost::hash
for strings now passes SMHasher in 64 bit mode. -
The documentation has been substantially revised to reflect the changes.
Tutorial
When using a Boost container such as
Boost.Unordered, you don’t need to do
anything to use boost::hash
as it’s the default. To find out how to use
a user-defined type, read the section on extending boost::hash
for user types.
If you wish to use boost::hash
with the standard unordered associative
containers, pass it as a template parameter:
std::unordered_multiset<int, boost::hash<int> >
set_of_ints;
std::unordered_set<std::pair<int, int>, boost::hash<std::pair<int, int> > >
set_of_pairs;
std::unordered_map<int, std::string, boost::hash<int> > map_int_to_string;
To use boost::hash
directly, create an instance and call it as a function:
#include <boost/container_hash/hash.hpp>
int main()
{
boost::hash<std::string> string_hash;
std::size_t h = string_hash("Hash me");
}
or alternatively:
#include <boost/container_hash/hash.hpp>
int main()
{
std::size_t h = boost::hash<std::string>()("Hash me");
}
For an example of generic use, here is a function to generate a vector containing the hashes of the elements of a container:
template <class Container>
std::vector<std::size_t> get_hashes(Container const& x)
{
std::vector<std::size_t> hashes;
std::transform(x.begin(), x.end(), std::back_inserter(hashes),
boost::hash<typename Container::value_type>());
return hashes;
}
Extending boost::hash for User Types
boost::hash
is implemented by calling the function hash_value
. The
namespace isn’t specified so that it can detect overloads via argument
dependant lookup. So if there is a free function hash_value
in the same
namespace as a user type, it will get called.
If you have a structure library::book
, where each book is uniquely defined
by its member id
:
namespace library
{
struct book
{
int id;
std::string author;
std::string title;
// ....
};
bool operator==(book const& a, book const& b)
{
return a.id == b.id;
}
}
Then all you would need to do is write the function library::hash_value
:
namespace library
{
std::size_t hash_value(book const& b)
{
boost::hash<int> hasher;
return hasher(b.id);
}
}
And you can now use boost::hash
with book:
library::book knife(3458, "Zane Grey", "The Hash Knife Outfit");
library::book dandelion(1354, "Paul J. Shanley",
"Hash & Dandelion Greens");
boost::hash<library::book> book_hasher;
std::size_t knife_hash_value = book_hasher(knife);
// If std::unordered_set is available:
std::unordered_set<library::book, boost::hash<library::book> > books;
books.insert(knife);
books.insert(library::book(2443, "Lindgren, Torgny", "Hash"));
books.insert(library::book(1953, "Snyder, Bernadette M.",
"Heavenly Hash: A Tasty Mix of a Mother's Meditations"));
assert(books.find(knife) != books.end());
assert(books.find(dandelion) == books.end());
The full example can be found in examples/books.hpp and examples/books.cpp.
Tip
|
When writing a hash function, first look at how the equality function
works. Objects that are equal must generate the same hash value. When objects
are not equal they should generate different hash values. In this object
equality was based just on id so the hash function only hashes id . If it
was based on the object’s name and author then the hash function should take
them into account (how to do this is discussed in the next section).
|
Combining Hash Values
Say you have a point class, representing a two dimensional location:
class point
{
int x;
int y;
public:
point() : x(0), y(0) {}
point(int x, int y) : x(x), y(y) {}
bool operator==(point const& other) const
{
return x == other.x && y == other.y;
}
};
and you wish to use it as the key for an unordered_map
. You need to
customise the hash for this structure. To do this we need to combine the
hash values for x
and y
. The function boost::hash_combine
is supplied
for this purpose:
class point
{
...
friend std::size_t hash_value(point const& p)
{
std::size_t seed = 0;
boost::hash_combine(seed, p.x);
boost::hash_combine(seed, p.y);
return seed;
}
...
};
Calls to hash_combine
incrementally build the hash from the different
members of point
, it can be repeatedly called for any number of elements.
It calls hash_value
on the supplied element, and combines it with the seed.
Full code for this example is at examples/point.cpp.
Note that when using boost::hash_combine
the order of the calls matters.
std::size_t seed = 0;
boost::hash_combine(seed, 1);
boost::hash_combine(seed, 2);
and
std::size_t seed = 0;
boost::hash_combine(seed, 2);
boost::hash_combine(seed, 1);
result in a different values in seed
.
To calculate the hash of an iterator range you can use boost::hash_range
:
std::vector<std::string> some_strings;
std::size_t hash = boost::hash_range(some_strings.begin(), some_strings.end());
Since hash_range
works by repeatedly invoking hash_combine
on the elements
of the range, the hash value will also be dependent on the element order.
If you are calculating a hash value for a range where the order of the data
doesn’t matter, such as unordered_set
, you can use
boost::hash_unordered_range
instead.
std::unordered_set<std::string> set;
std::size_t hash = boost::hash_unordered_range(set.begin(), set.end());
When writing template classes, you might not want to include the main
hash.hpp
header as it’s quite an expensive include that brings in a lot of
other headers, so instead you can include the
<boost/container_hash/hash_fwd.hpp>
header which forward declares
boost::hash
, boost::hash_combine
, boost::hash_range
, and
boost::hash_unordered_range
. You’ll need to include the main header before
instantiating boost::hash
. When using a container that uses boost::hash
it
should do that for you, so your type will work fine with the Boost hash
containers. There’s an example of this in
examples/template.hpp and
examples/template.cpp.
To avoid including even hash_fwd.hpp
- which still requires the contents
of Boost.ContainerHash to be physically present - you are allowed to copy the
declarations from hash_fwd.hpp
(and only those) directly into your own
header. This is a special exception guaranteed by the library; in general,
you can’t declare library functions, Boost or otherwise, without risk of
breakage in a subsequent release.
Hashing User Types with Boost.Describe
Let’s look at our point
class again:
class point
{
int x;
int y;
public:
point() : x(0), y(0) {}
point(int x, int y) : x(x), y(y) {}
};
If you’re using C++14 or above, a much easier way to add
support for boost::hash
to point
is by using
Boost.Describe (and
get an automatic definition of operator==
for free):
#include <boost/describe/class.hpp>
#include <boost/describe/operators.hpp>
class point
{
int x;
int y;
BOOST_DESCRIBE_CLASS(point, (), (), (), (x, y))
public:
point() : x(0), y(0) {}
point(int x, int y) : x(x), y(y) {}
};
using boost::describe::operators::operator==;
using boost::describe::operators::operator!=;
(Full code for this example is at examples/point2.cpp.)
Since the point
class has been annotated with BOOST_DESCRIBE_CLASS
,
the library can enumerate its members (and base classes) and automatically
synthesize the appropriate hash_value
overload for it, without us needing
to do so.
Reference
<boost/container_hash/hash_fwd.hpp>
This header contains forward declarations for the library primitives. These declarations are guaranteed to be relatively stable, that is, best effort will be expended on their not changing from release to release, allowing their verbatim copy into user headers that do not wish to physically depend on Boost.ContainerHash.
namespace boost
{
namespace container_hash
{
template<class T> struct is_range;
template<class T> struct is_contiguous_range;
template<class T> struct is_unordered_range;
template<class T> struct is_described_class;
} // namespace container_hash
template<class T> struct hash;
template<class T> void hash_combine( std::size_t& seed, T const& v );
template<class It> void hash_range( std::size_t& seed, It first, It last );
template<class It> std::size_t hash_range( It first, It last );
template<class It> void hash_unordered_range( std::size_t& seed, It first, It last );
template<class It> std::size_t hash_unordered_range( It first, It last );
} // namespace boost
<boost/container_hash/hash.hpp>
Defines boost::hash
, and helper functions.
namespace boost
{
template<class T> struct hash;
template<class T> void hash_combine( std::size_t& seed, T const& v );
template<class It> void hash_range( std::size_t& seed, It first, It last );
template<class It> std::size_t hash_range( It first, It last );
template<class It> void hash_unordered_range( std::size_t& seed, It first, It last );
template<class It> std::size_t hash_unordered_range( It first, It last );
// Enabled only when T is an integral type
template<class T>
std::size_t hash_value( T v );
// Enabled only when T is an enumeration type
template<class T>
std::size_t hash_value( T v );
// Enabled only when T is a floating point type
template<class T>
std::size_t hash_value( T v );
template<class T>
std::size_t hash_value( T* const& v );
template<class T, std::size_t N>
std::size_t hash_value( T const (&v)[N] );
template<class T>
std::size_t hash_value( std::complex<T> const& v );
template<class A, class B>
std::size_t hash_value( std::pair<A, B> const& v );
template<class... T>
std::size_t hash_value( std::tuple<T...> const& v );
// Enabled only when container_hash::is_range<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
// Enabled only when container_hash::is_contiguous_range<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
// Enabled only when container_hash::is_unordered_range<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
// Enabled only when container_hash::is_described_class<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
template<class T>
std::size_t hash_value( std::shared_ptr<T> const& v );
template<class T, class D>
std::size_t hash_value( std::unique_ptr<T, D> const& v );
std::size_t hash_value( std::type_index const& v );
std::size_t hash_value( std::error_code const& v );
std::size_t hash_value( std::error_condition const& v );
template<class T>
std::size_t hash_value( std::optional<T> const& v );
std::size_t hash_value( std::monostate v );
template<class... T>
std::size_t hash_value( std::variant<T...> const& v );
} // namespace boost
hash<T>
template<class T> struct hash
{
std::size_t operator()( T const& v ) const;
};
operator()
std::size_t operator()( T const& v ) const;
- Returns:
-
hash_value(v)
. - Throws:
-
Only throws if
hash_value(v)
throws. - Remarks:
-
The call to
hash_value
is unqualified, so that user-supplied overloads will be found via argument dependent lookup.
hash_combine
template<class T> void hash_combine( std::size_t& seed, T const& v );
Called repeatedly to incrementally create a hash value from several variables.
- Effects:
-
Updates
seed
with a new hash value generated by deterministically combining it with the result ofboost::hash<T>()(v)
. - Throws:
-
Only throws if
boost::hash<T>()(v)
throws. On exception,seed
is not updated. - Remarks:
-
Equivalent to
seed = combine(seed, boost::hash<T>()(v))
, wherecombine(s, v)
is a mixing function that takes two arguments of typestd::size_t
and returnsstd::size_t
, with the following desirable properties:-
For a constant
s
, whenv
takes all possiblesize_t
values,combine(s, v)
should also take all possiblesize_t
values, producing a sequence that is close to random; that is, it should be a random permutation.This guarantees that for a given
seed
,combine
does not introduce hash collisions when none were produced byboost::hash<T>(v)
; that is, it does not lose information from the input. It also implies thatcombine(s, v)
, as a function ofv
, has good avalanche properties; that is, small (e.g. single bit) perturbations in the inputv
lead to large perturbations in the return value (half of the output bits changing, on average). -
For two different seeds
s1
ands2
,combine(s1, v)
andcombine(s2, v)
, treated as functions ofv
, should produce two different random permutations. -
combine(0, 0)
should not be 0. Since a common initial value ofseed
is zero,combine(0, 0) == 0
would imply that applyinghash_combine
on any sequence of zeroes, regardless of length, will produce zero. This is undesirable, as it would lead to e.g.std::vector<int>()
andstd::vector<int>(4)
to have the same hash value.
The current implementation uses the function
mix(s + 0x9e3779b9 + v)
ascombine(s, v)
, wheremix(x)
is a high quality mixing function that is a bijection over thestd::size_t
values, of the formx ^= x >> k1; x *= m1; x ^= x >> k2; x *= m2; x ^= x >> k3;
where the constants
k1
,k2
,k3
,m1
,m2
are suitably chosen.Note that
mix(0)
is 0. This is why we add the arbitrary constant0x9e3779b9
to meet the third requirement above. -
hash_range
template<class It> void hash_range( std::size_t& seed, It first, It last );
- Effects:
-
When
typename std::iterator_traits<It>::value_type
is notchar
,signed char
,unsigned char
,std::byte
, orchar8_t
,for( ; first != last; ++first ) { boost::hash_combine<typename std::iterator_traits<It>::value_type>( seed, *first ); }
Otherwise, bytes from
[first, last)
are coalesced in an unspecified manner and then passed tohash_combine
, more than one at a time. This is done in order to improve performance when hashing strings.
template<class It> std::size_t hash_range( It first, It last );
- Effects:
-
size_t seed = 0; boost::hash_range( seed, first, last ); return seed;
hash_unordered_range
template<class It> void hash_unordered_range( std::size_t& seed, It first, It last );
- Effects:
-
Updates
seed
with the values ofboost::hash<typename std::iterator_traits<It>::value_type>()(*i)
for eachi
in[first, last)
, such that the order of elements does not affect the final result.
template<class It> std::size_t hash_unordered_range( It first, It last );
- Effects:
-
size_t seed = 0; boost::hash_unordered_range( seed, first, last ); return seed;
hash_value
// Enabled only when T is an integral type
template<class T>
std::size_t hash_value( T v );
- Returns:
-
When the value of
v
fits intostd::size_t
, whenT
is an unsigned type, or intossize_t
, whenT
is a signed type,static_cast<std::size_t>(v)
.Otherwise, an unspecified value obtained by mixing the value bits of
v
.
// Enabled only when T is an enumeration type
template<class T>
std::size_t hash_value( T v );
- Returns:
-
static_cast<std::size_t>(v)
. - Remarks:
-
hash_value(std::to_underlying(v))
would be better, but C++03 compatibility mandates the current implementation.
// Enabled only when T is a floating point type
template<class T>
std::size_t hash_value( T v );
- Returns:
-
An unspecified value obtained by mixing the value bits of
v
. - Remarks:
-
When
sizeof(v) <= sizeof(std::size_t)
, the bits ofv
are returned as-is (except in the case of -0.0, which is treated as +0.0).
template<class T>
std::size_t hash_value( T* const& v );
- Returns:
-
An unspecified value derived from
reinterpret_cast<std::uintptr_t>(v)
.
template<class T, std::size_t N>
std::size_t hash_value( T const (&v)[N] );
- Returns:
-
boost::hash_range( v, v + N )
.
template<class T>
std::size_t hash_value( std::complex<T> const& v );
- Returns:
-
An unspecified value derived from
boost::hash<T>()(v.real())
andboost::hash<T>()(v.imag())
such that, ifv.imag() == 0
, the value is equal toboost::hash<T>()(v.real())
. - Remarks:
-
A more straightforward implementation would just have used
hash_combine
onv.real()
andv.imag()
, but the historical guarantee that real-valued complex numbers should match the hash value of their real part precludes it.This guarantee may be dropped in a future release, as it’s of questionable utility.
template<class A, class B>
std::size_t hash_value( std::pair<A, B> const& v );
- Effects:
-
std::size_t seed = 0; boost::hash_combine( seed, v.first ); boost::hash_combine( seed, v.second ); return seed;
template<class... T>
std::size_t hash_value( std::tuple<T...> const& v );
- Effects:
-
std::size_t seed = 0; boost::hash_combine( seed, std::get<0>(v) ); boost::hash_combine( seed, std::get<1>(v) ); // ... boost::hash_combine( seed, std::get<N-1>(v) ); return seed;
where
N
issizeof…(T)
.
// Enabled only when container_hash::is_range<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
- Returns:
-
boost::hash_range( v.begin(), v.end() )
. - Remarks:
-
This overload is only enabled when
container_hash::is_contiguous_range<T>::value
andcontainer_hash::is_unordered_range<T>::value
are bothfalse
.It handles all standard containers that aren’t contiguous or unordered, such as
std::deque
,std::list
,std::set
,std::map
.
// Enabled only when container_hash::is_contiguous_range<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
- Returns:
-
boost::hash_range( v.data(), v.data() + v.size() )
. - Remarks:
-
This overload handles all standard contiguous containers, such as
std::string
,std::vector
,std::array
,std::string_view
.
// Enabled only when container_hash::is_unordered_range<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
- Returns:
-
boost::hash_unordered_range( v.begin(), v.end() )
. - Remarks:
-
This overload handles the standard unordered containers, such as
std::unordered_set
andstd::unordered_map
.
// Enabled only when container_hash::is_described_class<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
- Effects:
-
std::size_t seed = 0; boost::hash_combine( seed, b1 ); boost::hash_combine( seed, b2 ); // ... boost::hash_combine( seed, bM ); boost::hash_combine( seed, m1 ); boost::hash_combine( seed, m2 ); // ... boost::hash_combine( seed, mN ); return seed;
where
bi
are the bases ofv
andmi
are its members.
template<class T>
std::size_t hash_value( std::shared_ptr<T> const& v );
template<class T, class D>
std::size_t hash_value( std::unique_ptr<T, D> const& v );
- Returns:
-
boost::hash<T*>( v.get() )
.
std::size_t hash_value( std::type_index const& v );
- Returns:
-
v.hash_code()
.
std::size_t hash_value( std::error_code const& v );
std::size_t hash_value( std::error_condition const& v );
- Effects:
-
std::size_t seed = 0; boost::hash_combine( seed, v.value() ); boost::hash_combine( seed, &v.category() ); return seed;
template<class T>
std::size_t hash_value( std::optional<T> const& v );
- Returns:
-
For a disengaged
v
, an unspecified constant value; otherwise,boost::hash<T>()( *v )
.
std::size_t hash_value( std::monostate v );
- Returns:
-
An unspecified constant value.
template<class... T>
std::size_t hash_value( std::variant<T...> const& v );
- Effects:
-
std::size_t seed = 0; boost::hash_combine( seed, v.index() ); boost::hash_combine( seed, x ); return seed;
where
x
is the currently contained value inv
. - Throws:
-
std::bad_variant_access
whenv.valueless_by_exception()
istrue
.
<boost/container_hash/is_range.hpp>
Defines the trait boost::container_hash::is_range
.
namespace boost
{
namespace container_hash
{
template<class T> struct is_range;
} // namespace container_hash
} // namespace boost
is_range<T>
template<class T> struct is_range
{
static constexpr bool value = /* see below */;
};
is_range<T>::value
is true
when, for a const value x
of type
T
, x.begin()
and x.end()
return iterators of the same type
It
(such that std::iterator_traits<It>
is a valid specialization.)
Users are allowed to specialize is_range
for their types if the
default behavior does not deduce the correct value.
<boost/container_hash/is_contiguous_range.hpp>
Defines the trait boost::container_hash::is_contiguous_range
.
namespace boost
{
namespace container_hash
{
template<class T> struct is_contiguous_range;
} // namespace container_hash
} // namespace boost
is_contiguous_range<T>
template<class T> struct is_contiguous_range
{
static constexpr bool value = /* see below */;
};
is_contiguous_range<T>::value
is true
when is_range<T>::value
is
true
and when, for a const value x
of type T
, x.data()
returns
a pointer to a type that matches the value_type
of the iterator returned
by x.begin()
and x.end()
, and x.size()
returns a value of an integral
type.
Users are allowed to specialize is_contiguous_range
for their types
if the default behavior does not deduce the correct value.
<boost/container_hash/is_unordered_range.hpp>
Defines the trait boost::container_hash::is_unordered_range
.
namespace boost
{
namespace container_hash
{
template<class T> struct is_unordered_range;
} // namespace container_hash
} // namespace boost
is_unordered_range<T>
template<class T> struct is_unordered_range
{
static constexpr bool value = /* see below */;
};
is_unordered_range<T>::value
is true
when is_range<T>::value
is
true
and when T::hasher
is a valid type.
Users are allowed to specialize is_unordered_range
for their types
if the default behavior does not deduce the correct value.
<boost/container_hash/is_described_class.hpp>
Defines the trait boost::container_hash::is_described_class
.
namespace boost
{
namespace container_hash
{
template<class T> struct is_described_class;
} // namespace container_hash
} // namespace boost
is_described_class<T>
template<class T> struct is_described_class
{
static constexpr bool value = /* see below */;
};
is_described_class<T>::value
is true
when
boost::describe::has_describe_bases<T>::value
is true
,
boost::describe::has_describe_members<T>::value
is true
, and
T
is not a union.
Users are allowed to specialize is_described_class
for their types
if the default behavior does not deduce the correct value.
Design and Implementation Notes
Quality of the Hash Function
Many hash functions strive to have little correlation between the input and output values. They attempt to uniformally distribute the output values for very similar inputs. This hash function makes no such attempt. In fact, for integers, the result of the hash function is often just the input value. So similar but different input values will often result in similar but different output values. This means that it is not appropriate as a general hash function. For example, a hash table may discard bits from the hash function resulting in likely collisions, or might have poor collision resolution when hash values are clustered together. In such cases this hash function will perform poorly.
But the standard has no such requirement for the hash function, it just requires that the hashes of two different values are unlikely to collide. Containers or algorithms designed to work with the standard hash function will have to be implemented to work well when the hash function’s output is correlated to its input. Since they are paying that cost a higher quality hash function would be wasteful.
The hash_value Customization Point
The way one customizes the standard std::hash
function object for user
types is via a specialization. boost::hash
chooses a different mechanism — an overload of a free function hash_value
in the user namespace that is
found via argument-dependent lookup.
Both approaches have their pros and cons. Specializing the function object
is stricter in that it only applies to the exact type, and not to derived
or convertible types. Defining a function, on the other hand, is easier
and more convenient, as it can be done directly in the type definition as
an inline
friend
.
The fact that overloads can be invoked via conversions did cause issues in
an earlier iteration of the library that defined hash_value
for all
integral types separately, including bool
. Especially under C++03,
which doesn’t have explicit
conversion operators, some types were
convertible to bool
to allow their being tested in e.g. if
statements,
which caused them to hash to 0 or 1, rarely what one expects or wants.
This, however, was fixed by declaring the built-in hash_value
overloads
to be templates constrained on e.g. std::is_integral
or its moral
equivalent. This causes types convertible to an integral to no longer
match, avoiding the problem.
hash_combine
The initial implementation of the library was based on Issue 6.18 of the
Library Extension Technical Report Issues List
(pages 63-67) which proposed the following implementation of hash_combine
:
template<class T>
void hash_combine(size_t & seed, T const & v)
{
seed ^= hash_value(v) + (seed << 6) + (seed >> 2);
}
taken from the paper "Methods for Identifying Versioned and Plagiarised Documents" by Timothy C. Hoad and Justin Zobel.
During the Boost formal review, Dave Harris pointed out that this suffers
from the so-called "zero trap"; if seed
is initially 0, and all the
inputs are 0 (or hash to 0), seed
remains 0 no matter how many input
values are combined.
This is an undesirable property, because it causes containers of zeroes to have a zero hash value regardless of their sizes.
To fix this, the arbitrary constant 0x9e3779b9
(the golden ratio in a
32 bit fixed point representation) was added to the computation, yielding
template<class T>
void hash_combine(size_t & seed, T const & v)
{
seed ^= hash_value(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
This is what shipped in Boost 1.33, the first release containing the library.
This function was a reasonable compromise between quality and speed for its
time, when the input consisted of char
s, but it’s less suitable for
combining arbitrary size_t
inputs.
In Boost 1.56, it was replaced by functions derived from Austin Appleby’s MurmurHash2 hash function round.
In Boost 1.81, it was changed again — to the equivalent of
mix(seed + 0x9e3779b9 + hash_value(v))
, where mix(x)
is a high quality
mixing function that is a bijection over the size_t
values, of the form
x ^= x >> k1;
x *= m1;
x ^= x >> k2;
x *= m2;
x ^= x >> k3;
This type of mixing function was originally devised by Austin Appleby as the "final mix" part of his MurmurHash3 hash function. He used
x ^= x >> 33;
x *= 0xff51afd7ed558ccd;
x ^= x >> 33;
x *= 0xc4ceb9fe1a85ec53;
x ^= x >> 33;
as the 64 bit function fmix64
and
x ^= x >> 16;
x *= 0x85ebca6b;
x ^= x >> 13;
x *= 0xc2b2ae35;
x ^= x >> 16;
as the 32 bit function fmix32
.
Several improvements of the 64 bit function have been subsequently proposed, by David Stafford, Pelle Evensen, and Jon Maiga. We currently use Jon Maiga’s function
x ^= x >> 32;
x *= 0xe9846af9b1a615d;
x ^= x >> 32;
x *= 0xe9846af9b1a615d;
x ^= x >> 28;
Under 32 bit, we use a mixing function proposed by "TheIronBorn" in a Github issue in the repository of Hash Prospector by Chris Wellons:
x ^= x >> 16;
x *= 0x21f0aaad;
x ^= x >> 15;
x *= 0x735a2d97;
x ^= x >> 15;
With this improved hash_combine
, boost::hash
for strings now passes the
SMHasher test suite by Austin Appleby
(for a 64 bit size_t
).
hash_range
The traditional implementation of hash_range(seed, first, last)
has been
for( ; first != last; ++first )
{
boost::hash_combine<typename std::iterator_traits<It>::value_type>( seed, *first );
}
(the explicit template parameter is needed to support iterators with proxy
return types such as std::vector<bool>::iterator
.)
This is logical, consistent and straightforward. In the common case where
typename std::iterator_traits<It>::value_type
is char
— which it is
in the common case of boost::hash<std::string>
— this however leaves a
lot of performance on the table, because processing each char
individually
is much less efficient than processing several in bulk.
In Boost 1.81, hash_range
was changed to process elements of type char
,
signed char
, unsigned char
, std::byte
, or char8_t
, four of a time.
A uint32_t
is composed from first[0]
to first[3]
, and that uint32_t
is fed to hash_combine
.
In principle, when size_t
is 64 bit, we could have used uint64_t
instead.
We do not, because this allows producing an arbitrary hash value by choosing
the input bytes appropriately (because hash_combine
is reversible.)
Allowing control only over 32 bits of the full 64 bit size_t
value makes
these "chosen plaintext attacks" harder.
This is not as harmful to performance as it first appears, because the
input to hash<string>
(e.g. the key in an unordered container) is often
short (9 to 13 bytes in some typical scenarios.)
Note that hash_range
has also traditionally guaranteed that the same element
sequence yields the same hash value regardless of the iterator type. This
property remains valid after the changes to char
range hashing. hash_range
,
applied to the char
sequence { 'a', 'b', 'c' }
, results in the same value
whether the sequence comes from char[3]
, std::string
, std::deque<char>
,
or std::list<char>
.
Links
A Proposal to Add Hash Tables to the Standard Library
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2003/n1456.html
The hash table proposal explains much of the design. The hash function object is discussed in Section D.
The C++ Standard Library Technical Report
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf
Contains the hash function specification in section 6.3.2.
Library Extension Technical Report Issues List
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1837.pdf
The library implements the extension described in Issue 6.18, pages 63-67.
Methods for Identifying Versioned and Plagiarised Documents
Timothy C. Hoad, Justin Zobel
https://people.eng.unimelb.edu.au/jzobel/fulltext/jasist03thz.pdf
Contains the hash function that the initial implementation of boost::hash_combine
was based on.
Performance in Practice of String Hashing Functions
M.V. Ramakrishna, J. Zobel
In Proc. Int. Conf. on Database Systems for Advanced Applications, pages 215-223, Melbourne, Australia, April 1997.
https://www.comp.nus.edu.sg/~lingtw/dasfaa_proceedings/DASFAA97/P215.pdf
Referenced in the above paper as the source of the hash function.
MurmurHash3 hash function source
Austin Appleby
https://github.com/aappleby/smhasher/blob/61a0530f28277f2e850bfc39600ce61d02b518de/src/MurmurHash3.cpp#L65-L90
Austin Appleby’s 32 and 64 bit finalization mixing functions that introduced the "xmxmx" general form of a high quality bijective transformation that approximates a random permutation.
SMHasher hash function test suite
Austin Appleby
https://github.com/aappleby/smhasher
Contains a battery of tests for evaluating hash functions. The current
64 bit implementation of boost::hash
for strings passes SMHasher.
Previous iterations did not.
Better Bit Mixing - Improving on MurmurHash3’s 64-bit Finalizer
David Stafford
https://zimbry.blogspot.com/2011/09/better-bit-mixing-improving-on.html
Describes the so-called "variant 13" mixing function, an improvement
over fmix64
from MurmurHash3, made famous by its adoption by the
splitmix64
random number generator.
Stronger, better, morer, Moremur; a better Murmur3-type mixer
Pelle Evensen
https://mostlymangling.blogspot.com/2019/12/stronger-better-morer-moremur-better.html
Describes Moremur, an improvement over MurmurHash3 fmix64
and Stafford
"variant 13".
Improved mx3 and the RRC test
Jon Maiga
http://jonkagstrom.com/mx3/mx3_rev2.html
Contains another improvement over MurmurHash3 fmix64
and "variant 13". This
is what the current implementation of boost::hash_combine
uses when
std::size_t
is 64 bits.
Prospecting for Hash Functions
Chris Wellons
https://nullprogram.com/blog/2018/07/31/
Describes Hash Prospector, a utility for discovering and evaluating mixing functions.
New best known functions
"TheIronBorn"
https://github.com/skeeto/hash-prospector/issues/19
Describes a good 32 bit mixing function, used by the current implementation
of boost::hash_combine
when std::size_t
is 32 bits.
Acknowledgements
This library is based on the design by Peter Dimov. During the initial development Joaquín M López Muñoz made many useful suggestions and contributed fixes.
The formal review was managed by Thorsten Ottosen, and the library reviewed by: David Abrahams, Alberto Barbati, Topher Cooper, Caleb Epstein, Dave Harris, Chris Jefferson, Bronek Kozicki, John Maddock, Tobias Swinger, Jaap Suter, Rob Stewart and Pavel Vozenilek. Since then, further constructive criticism has been made by Daniel Krügler, Alexander Nasonov and 沈慧峰.
The implementation of the hash function for pointers is based on suggestions made by Alberto Barbati and Dave Harris. Dave Harris also suggested an important improvement to boost::hash_combine
that was taken up.
Some useful improvements to the floating point hash algorithm were suggested by Daniel Krügler.
The original implementation came from Jeremy B. Maitin-Shepard’s hash table library, although this is a complete rewrite.
The documentation was converted from Quickbook to AsciiDoc by Christian Mazakas.
Change Log
Boost 1.67.0
-
Moved library into its own module,
container_hash
. -
Moved headers for new module name, now at:
<boost/container_hash/hash.hpp>
,<boost/container_hash/hash_fwd.hpp>
,<boost/container_hash/extensions.hpp>
. -
Added forwarding headers to support the old headers locations.
-
Support
std::string_view
,std::error_code
,std::error_condition
,std::optional
,std::variant
,std::monostate
where available. -
Update include paths from other Boost libraries.
-
Manually write out tuple overloads, rather than using the preprocessor to generate them. Should improve usability, due to better error messages, and easier debugging.
-
Fix tutorial example (#11017).
-
Quick fix for hashing
vector<bool>
when using libc++. Will try to introduce a more general fix in the next release.
Boost 1.66.0
-
Avoid float comparison warning when using Clang - this workaround was already in place for GCC, and was used when Clang pretends to be GCC, but the warning was appearing when running Clang in other contexts.
Boost 1.65.0
-
Support for
char16_t
,char32_t
,u16string
,u32string
Boost 1.64.0
-
Fix for recent versions of Visual C++ which have removed
std::unary_function
andstd::binary_function
(#12353).
Boost 1.63.0
-
Fixed some warnings.
-
Only define hash for
std::wstring
when we know we have awchar_t
. Otherwise there’s a compile error as there’s no overload for hashing the characters in wide strings (#8552).
Boost 1.58.0
-
Fixed strict aliasing violation (GitHub #3).
Boost 1.56.0
-
Removed some Visual C++ 6 workarounds.
-
Ongoing work on improving
hash_combine
. This changes the combine function which was previously defined in the reference documentation.
Boost 1.55.0
Boost 1.54.0
-
Ticket 7957: Fixed a typo.
Boost 1.53.0
-
Add support for
boost::int128_type
andboost::uint128_type
where available - currently only__int128
andunsigned __int128
on some versions of gcc. -
On platforms that are known to have the standard floating point functions, don’t use automatic detection - which can break if there are ambiguous overloads.
-
Fix undefined behaviour when using the binary
float
hash (Thomas Heller).
Boost 1.52.0
-
Restore
enum
support, which was accidentally removed in the last version. -
New floating point hasher - will hash the binary representation on more platforms, which should be faster.
Boost 1.51.0
-
Support the standard smart pointers.
-
hash_value
now implemented using SFINAE to avoid implicit casts to built in types when calling it. -
Updated to use the new config macros.
Boost 1.50.0
-
Ticket 6771: Avoid gcc’s
-Wfloat-equal
warning. -
Ticket 6806: Support
std::array
andstd::tuple
when available. -
Add deprecation warning to the long deprecated
boost/container_hash/detail/container_fwd.hpp
.
Boost 1.46.0
-
Avoid warning due with gcc’s
-Wconversion
flag.
Boost 1.44.0
-
Add option to prevent implicit conversions when calling
hash_value
by definingBOOST_HASH_NO_IMPLICIT_CASTS
. When usingboost::hash
for a type that does not havehash_value
declared but does have an implicit conversion to a type that does, it would use that implicit conversion to hash it. Which can sometimes go very wrong, e.g. using a conversion tobool
and only hashing to 2 possible values. Since fixing this is a breaking change and was only approached quite late in the release cycle with little discussion it’s opt-in for now. This, or something like it, will become the default in a future version.
Boost 1.43.0
-
Ticket 3866: Don’t foward declare containers when using gcc’s parallel library, allow user to stop forward declaration by defining the
BOOST_DETAIL_NO_CONTAINER_FWD
macro. -
Ticket 4038: Avoid hashing
0.5
and0
to the same number. -
Stop using deprecated
BOOST_HAS_*
macros.
Boost 1.42.0
-
Reduce the number of warnings for Visual C++ warning level 4.
-
Some code formatting changes to fit lines into 80 characters.
-
Rename an internal namespace.
Boost 1.40.0
-
Automatically configure the
float
functions using template metaprogramming instead of trying to configure every possibility manually. -
Workaround for when STLport doesn’t support long double.
Boost 1.39.0
-
Move the
hash_fwd.hpp
implementation into the hash subdirectory, leaving a forwarding header in the old location. You should still use the old location, the new location is mainly for implementation and possible modularization. -
Ticket 2412: Removed deprecated headers.
-
Ticket 2957: Fix configuration for vxworks.
Boost 1.38.0
-
Changed the warnings in the deprecated headers from 1.34.0 to errors. These will be removed in a future version of Boost.
-
Moved detail headers out of
boost/container_hash/detail
, since they are part offunctional/hash
, notcontainer_hash
.boost/container_hash/detail/container_fwd.hpp
has been moved toboost/detail/container_fwd.hpp
as it’s used outside of this library, the others have been moved toboost/functional/hash/detail
.
Boost 1.37.0
-
Ticket 2264: In Visual C++, always use C99 float functions for long double and float as the C++ overloads aren’t always availables.
Boost 1.36.0
-
Stop using OpenBSD’s dodgy
std::numeric_limits
. -
Using the boost typedefs for
long long
andunsigned long long
. -
Move the extensions into their own header.
Boost 1.35.0
-
Support for
long long
,std::complex
. -
Improved algorithm for hashing floating point numbers:
-
Improved portablity, as described by Daniel Krügler in a post to the boost users list.
-
Fits more information into each combine loop, which can reduce the the number of times combine is called and hopefully give a better quality hash function.
-
Improved the algorithm for hashing floating point numbers.
-
On Cygwin use a binary hash function for floating point numbers, as Cygwin doesn’t have decent floating point functions for
long double
. -
Never uses
fpclass
which doesn’t supportlong double
. -
Ticket 1064: Removed unnecessary use of errno.
-
-
Explicitly overload for more built in types.
-
Minor improvements to the documentation.
-
A few bug and warning fixes:
-
Ticket 1509: Suppress another Visual C++ warning.
-
Some workarounds for the Sun compilers.
-
Boost 1.34.1
-
Ticket 952: Suppress incorrect 64-bit warning on Visual C++.
Boost 1.34.0
-
Use declarations for standard classes, so that the library doesn’t need to include all of their headers
-
Deprecated the
<boost/functional/hash/*.hpp>
headers. Now a single header,<boost/functional/hash.hpp>
is used. -
Add support for the
BOOST_HASH_NO_EXTENSIONS
macro, which disables the extensions to TR1. -
Minor improvements to the hash functions for floating point numbers.
-
Update the portable example to hopefully be more generally portable.
Boost 1.33.1
-
Fixed the points example, as pointed out by 沈慧峰.
Boost 1.33.0
-
Initial Release
Copyright and License
This documentation is
-
Copyright 2005-2008 Daniel James
-
Copyright 2022 Peter Dimov
and is distributed under the Boost Software License, Version 1.0.