What Is the Default Hash Function Used in C++ std::unordered_map

What is the default hash function used in C++ std::unordered_map?

The function object std::hash<> is used.

Standard specializations exist for all built-in types, and some other standard library types
such as std::string and std::thread::id. See the std::hash reference documentation for the full list.

For other types to be used in a std::unordered_map, you will have to specialize std::hash<> or create your own function object.

The chance of collision is completely implementation-dependent. However, integers are limited to a fixed range, while strings can be arbitrarily long, so I'd say there is a much better chance of collision with strings.

As for the implementation in GCC, the specialization for built-in types just returns the bit pattern. Here's how they are defined in bits/functional_hash.h:

/// Partial specializations for pointer types.
template<typename _Tp>
  struct hash<_Tp*> : public __hash_base<size_t, _Tp*>
  {
    size_t
    operator()(_Tp* __p) const noexcept
    { return reinterpret_cast<size_t>(__p); }
  };

// Explicit specializations for integer types.
#define _Cxx_hashtable_define_trivial_hash(_Tp) \
  template<> \
    struct hash<_Tp> : public __hash_base<size_t, _Tp> \
    { \
      size_t \
      operator()(_Tp __val) const noexcept \
      { return static_cast<size_t>(__val); } \
    };

/// Explicit specialization for bool.
_Cxx_hashtable_define_trivial_hash(bool)

/// Explicit specialization for char.
_Cxx_hashtable_define_trivial_hash(char)

/// ...

The specialization for std::string is defined as:

#ifndef _GLIBCXX_COMPATIBILITY_CXX0X
  /// std::hash specialization for string.
  template<>
    struct hash<string>
    : public __hash_base<size_t, string>
    {
      size_t
      operator()(const string& __s) const noexcept
      { return std::_Hash_impl::hash(__s.data(), __s.length()); }
    };

Some further search leads us to:

struct _Hash_impl
{
  static size_t
  hash(const void* __ptr, size_t __clength,
       size_t __seed = static_cast<size_t>(0xc70f6907UL))
  { return _Hash_bytes(__ptr, __clength, __seed); }
  ...
};
...
// Hash function implementation for the nontrivial specialization.
// All of them are based on a primitive that hashes a pointer to a
// byte array. The actual hash algorithm is not guaranteed to stay
// the same from release to release -- it may be updated or tuned to
// improve hash quality or speed.
size_t
_Hash_bytes(const void* __ptr, size_t __len, size_t __seed);

_Hash_bytes is an external function from libstdc++. A bit more searching led me to this file, which states:

// This file defines Hash_bytes, a primitive used for defining hash
// functions. Based on public domain MurmurHashUnaligned2, by Austin
// Appleby. http://murmurhash.googlepages.com/

So the default hashing algorithm GCC uses for strings is MurmurHashUnaligned2.

How does std::unordered_map actually use hash functions?

For most standard library containers, the answer would be: however the implementation likes. It's a detail left up to the writer of the library.

However, unordered_map is a little peculiar in that respect because it not only has to behave in a certain way, but it also has constraints applied to how it's implemented.

From the standard: http://eel.is/c++draft/unord.req#general-9

The elements of an unordered associative container are organized into buckets. Keys with the same hash code appear in the same bucket. The number of buckets is automatically increased as elements are added to an unordered associative container, so that the average number of elements per bucket is kept below a bound. Rehashing invalidates iterators, changes ordering between elements, and changes which buckets elements appear in, but does not invalidate pointers or references to elements. For unordered_multiset and unordered_multimap, rehashing preserves the relative ordering of equivalent elements.

In short, the map has N buckets at any given time. The result of the hash function is used to pick a bucket by doing something along the lines of bucket_id = hash_value % N. If the buckets start to get too "full", the map will increase N, and reorganize its contents.

How things are organized within a bucket is not really specified. It's typically a linked list.

How is the hashcode of std::unordered_map calculated?

std::map is not a hashmap.

It's implemented using self-balancing trees; therefore, std::map has O(log n) lookup times.

std::unordered_map is a hashmap.

std::unordered_map has average-case O(1) lookup times (O(n) in the worst case), and it's parameterized by a hash function object that defaults to std::hash, whose results are implementation-dependent.

Therefore, there's no one hashing function that is used across implementations.

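You're free to provide your own hashing functor, as long as it meets the same requirements as std::hash (callable with the key type, returning std::size_t, and returning equal values for equal keys), but beware: there lie dragons. A lot of research went into making different C++ implementations fast.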

How can I change the default seed in std::hash?

Note that, according to Wikipedia, MurmurHash (https://en.wikipedia.org/wiki/MurmurHash) is a non-cryptographic hash function, thus not suited if strong cryptography is needed.

However, in a std::unordered_map, the hash is not used for security reasons but to organize the key/value pairs into buckets of memory. Moreover, for example in gcc, basic types are not hashed at all: integers are simply converted to size_t with static_cast, and pointers with reinterpret_cast. From http://en.cppreference.com/w/cpp/utility/hash :

Notably, some implementations use trivial (identity) hash functions
which map an integer to itself. In other words, these hash functions
are designed to work with unordered associative containers, but not as
cryptographic hashes, for example.

If you nevertheless want to change the seed of the hash, you need to implement a functor object and provide your own hash algorithm. The code below should give you an idea how to hook in your own hash implementation or how to directly use MurmurHash2 and provide a seed.

The line indicated with //HACK will use the hash function from the gcc library implementation (std::_Hash_impl::hash) and will depend on a particular compiler/library implementation. As others pointed out, direct use of this function is discouraged.

If types other than std::string need to be hashed, additional template specializations need to be implemented.

#include <cstddef>
#include <string>
#include <unordered_map>
#include "MurmurHash2.h"  // third-party header, not part of the standard library

template <class T> struct MyHash;

// Hash functor for std::string that uses a custom seed.
template<> struct MyHash<std::string>
{
    std::size_t operator()(std::string const& s) const noexcept
    {
        std::size_t seed = static_cast<std::size_t>(0xdeadbeef);
        //return std::_Hash_impl::hash(s.data(), s.length(), seed); //HACK
        return MurmurHash2 ( s.data(), s.length(), seed );
    }
};

int main()
{
    // The custom functor is passed as the third template argument.
    std::unordered_map<std::string,std::string,MyHash<std::string> >
        u_map { {"s1","A"} , {"s2","B"} };
    return 0;
}

Get MurmurHash from GitHub.

Is there a default hash function for an unordered_set of a custom class?

If you don't specify your own hash functor as template argument, it will default to std::hash<MyClass>, which does not exist unless you define it.

Best define your own specialization of std::hash inside namespace std:

namespace std {
    template <>
    struct hash<MyClass>
    {
        typedef MyClass argument_type;
        typedef std::size_t result_type;

        result_type operator()(const MyClass & t) const
        {
            /* ..calculate hash value for t */
        }
    };
}

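And make sure you include this code before the declaration of your unordered_set. This way you can declare the set simply as std::unordered_set<MyClass> with no need for further template arguments. Note that MyClass also needs an operator== (or a custom equality predicate), because the container compares keys that land in the same bucket.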

You didn't specify what MyClass looks like inside, but a typical situation is that your user-defined type simply consists of several simple-type members, for which a default hash function exists. In this case, you will probably want to combine the hash values for the individual types to a hash value for the entire combination. The Boost library provides a function called hash_combine for this purpose. Of course, there is no guarantee that it will work well in your particular case (it depends on the distribution of data values and the likelihood of collisions), but it provides a good and easy-to-use starting point.

Here is an example of how to use it, assuming MyClass consists of two string members:

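#include <string>
#include <unordered_set>
#include <boost/functional/hash.hpp>

struct MyClass
{
    std::string _s1;
    std::string _s2;

    // Unordered containers also need equality to compare keys in a bucket.
    bool operator==(const MyClass & other) const
    {
        return _s1 == other._s1 && _s2 == other._s2;
    }
};

namespace std {
    template <>
    struct hash<MyClass>
    {
        typedef MyClass argument_type;
        typedef std::size_t result_type;

        result_type operator()(const MyClass & t) const
        {
            // Fold the hash of each member into one value.
            std::size_t val { 0 };
            boost::hash_combine(val,t._s1);
            boost::hash_combine(val,t._s2);
            return val;
        }
    };
}

int main()
{
    std::unordered_set<MyClass> s;
    /* ... */
    return 0;
}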

C++ Hash function for string in unordered_map

The C++ standard library provides template specializations of std::hash for the various string classes. You can simply use std::string as the key type for std::unordered_map:

#include <string>
#include <unordered_map>

int main()
{
    std::unordered_map<std::string, int> map;
    map["string"] = 10;
    return 0;
}

Does std::unordered_map<int, float> still have to hash the integer to get to the value?

I would like to know whether std::unordered_map< int, float > still has to hash the given integer

Yes it does.

I need to perform this operation very fast many times

Have you completed your project and confirmed that this is its bottleneck? If not, watch out, since you might end up as a victim of premature optimization!

how would I go about redefining it?

You have to write your own code then. Example: C++ unordered_map using a custom class type as the key, where you would use struct Key { int value; };.


