Super High Performance C/C++ Hash Map (Table, Dictionary)

.NET HashTable Vs Dictionary - Can the Dictionary be as fast?

System.Collections.Generic.Dictionary<TKey, TValue> and System.Collections.Hashtable classes both maintain a hash table data structure internally. None of them guarantee preserving the order of items.

Leaving boxing/unboxing issues aside, most of the time, they should have very similar performance.

The primary structural difference between them is that Dictionary relies on chaining (maintaining a list of items for each hash table bucket) to resolve collisions whereas Hashtable uses rehashing for collision resolution (when a collision occurs, tries another hash function to map the key to a bucket).

There is little benefit to use Hashtable class if you are targeting for .NET Framework 2.0+. It's effectively rendered obsolete by Dictionary<TKey, TValue>.

Simple hashmap implementation in C++

Most compilers should define std::hash_map for you; in the coming C++0x standard, it will be part of the standard library as std::unordered_map. The STL Page on it is fairly standard. If you use Visual Studio, Microsoft has a page on it.

If you want to use your class as the value, not as the key, then you don't need to do anything special. All primitive types (things like int, char, bool and even char *) should "just work" as keys in a hash_map. However, for anything else you will have to define your own hashing and equality functions and then write "functors" that wrap them in a class.

Assuming your class is called MyClass and you have already defined:

size_t MyClass::HashValue() const { /* something */ }
bool MyClass::Equals(const MyClass& other) const { /* something */ }

You will need to define two functors to wrap those methods in objects.

struct MyClassHash {
size_t operator()(const MyClass& p) const {
return p.HashValue();
}
};

struct MyClassEqual {
bool operator()(const MyClass& c1, const MyClass& c2) const {
return c1.Equals(c2);
}
};

And instantiate your hash_map/hash_set as:

hash_map<MyClass, DataType, MyClassHash, MyClassEqual> my_hash_map;
hash_set<MyClass, MyClassHash, MyClassEqual> my_hash_set;

Everything should work as expected after that.

C++ std::unordered_map fastest way to insert new element only if it doesn't exist

This is what try_emplace is for. The API is set up in such a way as to avoid constructing nodes, or even the value, unless it's strictly necessary. It comes form N4279.

In your case, that would be:

auto [it, success] = map.try_emplace(key, largeObject);

Each of the four options in the OP has issues:

  • map[key] = largeObject doesn't actually do what you're asking for, it would overwrite the existing item. And even if it wasn't there, it requires default construction and copy assignment.

  • The approaches with count and find both require two lookups.

  • map.insert(std::make_pair<uint64_t, LargeObject>(key, largeObject)); is a single lookup but requires constructing the large object, and the pair, unconditionally.

Not mentioned in the OP is another option: map.emplace(key, largeObject); This has the issue that it's actually under-specified whether or not the pair is created in the case that the key exists. It does on some implementations. The motivation for try_emplace was to properly specify the API so that the pair definitely does not get created if the key already exists.

What are the differences between a HashMap and a Hashtable in Java?

There are several differences between HashMap and Hashtable in Java:

  1. Hashtable is synchronized, whereas HashMap is not. This makes HashMap better for non-threaded applications, as unsynchronized Objects typically perform better than synchronized ones.

  2. Hashtable does not allow null keys or values. HashMap allows one null key and any number of null values.

  3. One of HashMap's subclasses is LinkedHashMap, so in the event that you'd want predictable iteration order (which is insertion order by default), you could easily swap out the HashMap for a LinkedHashMap. This wouldn't be as easy if you were using Hashtable.

Since synchronization is not an issue for you, I'd recommend HashMap. If synchronization becomes an issue, you may also look at ConcurrentHashMap.

Is it always necessary to make hash table number of buckets a prime number for performance reason?

The answer is "usually you don't need a table whose size is a prime number, but there are some implementation reasons why you might want to do this."

Fundamentally, hash tables work best when hash codes are spread out as close to uniformly at random as possible. That prevents items from clustering in any one location within the table. At some level, provided that you have a good enough hash function to make this happen, the size of the table doesn't matter.

So why do folks say to pick tables whose size is a prime? There are two main reasons for this, and they're due to specific cases that don't arise in all hash tables.

One reason why you sometimes see prime-sized tables is due to a specific way of building hash functions. You can build reasonable hash functions by picking functions of the form h(x) = (ax + b) mod p, where a is a number in {1, 2, ..., p-1} and b is a number in the {0, 1, 2, ..., p-1}, assuming that p is a prime. If p isn't prime, hash functions of this form don't spread items out uniformly. As a result, if you're using a hash function like this one, then it makes sense to pick a table whose size is a prime number.

The second reason you see advice about prime-sized tables is if you're using an open-addressing strategy like quadratic probing or double hashing. These hashing strategies work by hashing items to some initial location k. If that slot is full, we look at slot (k + r) mod T, where T is the table size and r is some offset. If that slot is full, we then check (k + 2r) mod T, then (k + 3r) mod T, etc. If the table size is a prime number and r isn't zero, this has the nice, desirable property that these indices will cycle through all the different positions in the table without ever repeating, ensuring that items are nicely distributed over the table. With non-prime table sizes, it's possible that this strategy gets stuck cycling through a small number of slots, which gives less flexibility in positions and can cause insertions to fail well before the table fills up.

So assuming you aren't using double hashing or quadratic probing, and assuming you have a strong enough hash function, feel free to size your table however you'd like.



Related Topics



Leave a reply



Submit