Unordered_Map Hash Function C++

unordered_map hash function c++

This is an unfortunate omission in C++11; Boost has the answer in terms of hash_combine. Feel free to just paste it from them! Here's how I hash pairs:

template <class T>
inline void hash_combine(std::size_t & seed, const T & v)
{
  std::hash<T> hasher;
  seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

namespace std
{
  template<typename S, typename T> struct hash<pair<S, T>>
  {
    inline size_t operator()(const pair<S, T> & v) const
    {
      size_t seed = 0;
      ::hash_combine(seed, v.first);
      ::hash_combine(seed, v.second);
      return seed;
    }
  };
}

You can use hash_combine as the basis for many other things, like tuples and ranges, so you could hash an entire (ordered) container, for example, as long as each member is individually hashable.

Now you can just declare a new map:

std::unordered_map<std::pair<int, int>, my_mapped_type> mymap;

If you want to use your homebrew hasher (which hasn't got good statistical properties), you have to specify the template parameters explicitly:

std::unordered_map<std::pair<int,int>, int, pairHash> yourmap;

Note that there's no need to specify a copy of a hasher object, as the default is to default-construct one for you.

How does std::unordered_map actually use hash functions?

For most standard library containers, the answer would be: However it feels like, it's an implementation detail left up to the writer of the library.

However, unordered_map is a little peculiar in that respect because it not only has to behave in a certain way, but it also has contraints applied to how it's implemented.

From the standard: http://eel.is/c++draft/unord.req#general-9

The elements of an unordered associative container are organized into buckets. Keys with the same hash code appear in the same bucket. The number of buckets is automatically increased as elements are added to an unordered associative container, so that the average number of elements per bucket is kept below a bound. Rehashing invalidates iterators, changes ordering between elements, and changes which buckets elements appear in, but does not invalidate pointers or references to elements. For unordered_multiset and unordered_multimap, rehashing preserves the relative ordering of equivalent elements.

In short, the map has N buckets at any given time. The result of the hash function is used to pick a bucket by doing something along the lines of bucket_id = hash_value % N. If the buckets start to get too "full", the map will increase N, and reorganize its contents.

How are things organized within a bucket is not really specified. It's typically a linked list.

Unordered_map of unordered_map vs custom hash function for pair key C++?

I would use std::unordered_map<std::pair<std::string, std::string>, Value, [pair_hash][1]> for two reasons:

Performance

Of course, you can measure your two versions with your favorite profiler, but basing on my experience - the number of allocation is what matters here the most - so see:

flat_map.insert(key, value)); will create on average just one new bucket (or extend one), whilst

auto it = map2.insert(make_pair(key.first, map1{}));
it->second.insert(make_pair(key.second, value));

have to create empty map1 - what might not be zero-cost. Then it has to add/extend two buckets (list associated with the given hash value).

Maintainability/Readability

The Second reason is more important for me. Flat(one) map is easy to use. You could see in insert example already that it is more complicated, but consider erase - it is so complicated, that it is easy to make a mistake:


void remove(
   std::unordered_map<std::string, 
           std::unordered_map<std::string, Value>>& map,
    std::pair<std::string, std::string> const& key)
{
   auto it1 = map.find(key.first);
   if (it1 == map.end()) return;
   it1->second.erase(key.second);
   // easy to forget part
   if (it1->second.empty())
   {
       map.erase(it1);
   }
}

unordered_map with custom hash function and comparison predicate gives compilation error

bool operator()(struct keys const& k1, struct keys const& k2)

must be const

//                                                            VVVVV
bool operator()(struct keys const& k1, struct keys const& k2) const

You can also see it in the error message that the function must be callable on a const reference of the object

//                           VVVVV       V
static_assert(__is_invocable<const _Equal&, const _Key&, const _Key&>{},

I couldn't find anything in the standard that explicitly says, it must be const, at most in the named requirement for comparators:

[...] evaluation of that expression is not allowed to call non-const functions through the dereferenced iterators.

C++ Hash function for string in unordered_map

C++ STL provides template specializations of std::hash for the various string classes. You could just specify std::string as key type for std::unordered_map:

#include <string>
#include <unordered_map>

int main()
{
    std::unordered_map<std::string, int> map;
    map["string"] = 10;
    return 0;
}

What is the default hash function used in C++ std::unordered_map?

The function object std::hash<> is used.

Standard specializations exist for all built-in types, and some other standard library types
such as std::string and std::thread. See the link for the full list.

For other types to be used in a std::unordered_map, you will have to specialize std::hash<> or create your own function object.

The chance of collision is completely implementation-dependent, but considering the fact that integers are limited between a defined range, while strings are theoretically infinitely long, I'd say there is a much better chance for collision with strings.

As for the implementation in GCC, the specialization for builtin-types just returns the bit pattern. Here's how they are defined in bits/functional_hash.h:

  /// Partial specializations for pointer types.
  template<typename _Tp>
    struct hash<_Tp*> : public __hash_base<size_t, _Tp*>
    {
      size_t
      operator()(_Tp* __p) const noexcept
      { return reinterpret_cast<size_t>(__p); }
    };

  // Explicit specializations for integer types.
#define _Cxx_hashtable_define_trivial_hash(_Tp)     \
  template<>                        \
    struct hash<_Tp> : public __hash_base<size_t, _Tp>  \
    {                                                   \
      size_t                                            \
      operator()(_Tp __val) const noexcept              \
      { return static_cast<size_t>(__val); }            \
    };

  /// Explicit specialization for bool.
  _Cxx_hashtable_define_trivial_hash(bool)

  /// Explicit specialization for char.
  _Cxx_hashtable_define_trivial_hash(char)

  /// ...

The specialization for std::string is defined as:

#ifndef _GLIBCXX_COMPATIBILITY_CXX0X
  /// std::hash specialization for string.
  template<>
    struct hash<string>
    : public __hash_base<size_t, string>
    {
      size_t
      operator()(const string& __s) const noexcept
      { return std::_Hash_impl::hash(__s.data(), __s.length()); }
    };

Some further search leads us to:

struct _Hash_impl
{
  static size_t
  hash(const void* __ptr, size_t __clength,
       size_t __seed = static_cast<size_t>(0xc70f6907UL))
  { return _Hash_bytes(__ptr, __clength, __seed); }
  ...
};
...
// Hash function implementation for the nontrivial specialization.
// All of them are based on a primitive that hashes a pointer to a
// byte array. The actual hash algorithm is not guaranteed to stay
// the same from release to release -- it may be updated or tuned to
// improve hash quality or speed.
size_t
_Hash_bytes(const void* __ptr, size_t __len, size_t __seed);

_Hash_bytes is an external function from libstdc++. A bit more searching led me to this file, which states:

// This file defines Hash_bytes, a primitive used for defining hash
// functions. Based on public domain MurmurHashUnaligned2, by Austin
// Appleby.  http://murmurhash.googlepages.com/

So the default hashing algorithm GCC uses for strings is MurmurHashUnaligned2.

How to hash an unordered_map?

The problem here is that there is no guarantee that the items even have an ordering among them.

So, sorting the items may very well not work for arbitrary unordered containers. You have 2 options:

Just XOR the hashes of all the individual elements. This is the fastest.
First sort the hashes of the containers, and then hash those. This may result in a better hash.

Unordered_Map Hash Function C++