.NET HashTable Vs Dictionary - Can the Dictionary be as fast?
The System.Collections.Generic.Dictionary<TKey, TValue> and System.Collections.Hashtable classes both maintain a hash table data structure internally. Neither of them guarantees that the order of items is preserved. Leaving boxing/unboxing issues aside, the two should have very similar performance most of the time.
The primary structural difference between them is that Dictionary relies on chaining (maintaining a list of items for each hash table bucket) to resolve collisions, whereas Hashtable uses rehashing (when a collision occurs, it tries another hash function to map the key to a bucket).
There is little benefit to using the Hashtable class if you are targeting .NET Framework 2.0 or later; it has effectively been rendered obsolete by Dictionary<TKey, TValue>.
Simple hashmap implementation in C++
Most pre-C++11 compilers ship a hash_map for you (typically in a vendor namespace such as __gnu_cxx or stdext rather than std); as of the C++11 standard it is part of the standard library as std::unordered_map. The SGI STL documentation on hash_map is a reasonable reference, and if you use Visual Studio, Microsoft has a documentation page on its implementation.
If you want to use your class as the value, not as the key, then you don't need to do anything special. All primitive types (things like int, char, bool and even char *) should "just work" as keys in a hash_map. However, for anything else you will have to define your own hashing and equality functions and then write "functors" that wrap them in a class.
Assuming your class is called MyClass and you have already defined:
size_t MyClass::HashValue() const { /* something */ }
bool MyClass::Equals(const MyClass& other) const { /* something */ }
You will need to define two functors to wrap those methods in objects.
struct MyClassHash {
    size_t operator()(const MyClass& p) const {
        return p.HashValue();
    }
};

struct MyClassEqual {
    bool operator()(const MyClass& c1, const MyClass& c2) const {
        return c1.Equals(c2);
    }
};
And instantiate your hash_map/hash_set as:
hash_map<MyClass, DataType, MyClassHash, MyClassEqual> my_hash_map;
hash_set<MyClass, MyClassHash, MyClassEqual> my_hash_set;
Everything should work as expected after that.
C++ std::unordered_map fastest way to insert new element only if it doesn't exist
This is what try_emplace is for. The API is set up in such a way as to avoid constructing nodes, or even the value, unless it's strictly necessary. It comes from proposal N4279.
In your case, that would be:
auto [it, success] = map.try_emplace(key, largeObject);
Each of the four options in the OP has issues:
- map[key] = largeObject doesn't actually do what you're asking for; it would overwrite the existing item. And even if the key weren't there, it requires default construction and copy assignment.
- The approaches with count and find both require two lookups.
- map.insert(std::make_pair<uint64_t, LargeObject>(key, largeObject)); is a single lookup but requires constructing the large object, and the pair, unconditionally.
Not mentioned in the OP is another option: map.emplace(key, largeObject);
This has the issue that it's actually under-specified whether or not the pair is created in the case that the key exists. It does on some implementations. The motivation for try_emplace was to properly specify the API so that the pair definitely does not get created if the key already exists.
What are the differences between a HashMap and a Hashtable in Java?
There are several differences between HashMap and Hashtable in Java:
- Hashtable is synchronized, whereas HashMap is not. This makes HashMap better for non-threaded applications, as unsynchronized objects typically perform better than synchronized ones.
- Hashtable does not allow null keys or values. HashMap allows one null key and any number of null values.
- One of HashMap's subclasses is LinkedHashMap, so in the event that you'd want predictable iteration order (which is insertion order by default), you could easily swap out the HashMap for a LinkedHashMap. This wouldn't be as easy if you were using Hashtable.
Since synchronization is not an issue for you, I'd recommend HashMap. If synchronization becomes an issue, you may also look at ConcurrentHashMap.
Is it always necessary to make hash table number of buckets a prime number for performance reason?
The answer is "usually you don't need a table whose size is a prime number, but there are some implementation reasons why you might want to do this."
Fundamentally, hash tables work best when hash codes are spread out as close to uniformly at random as possible. That prevents items from clustering in any one location within the table. At some level, provided that you have a good enough hash function to make this happen, the size of the table doesn't matter.
So why do folks say to pick tables whose size is a prime? There are two main reasons for this, and they're due to specific cases that don't arise in all hash tables.
One reason why you sometimes see prime-sized tables is due to a specific way of building hash functions. You can build reasonable hash functions by picking functions of the form h(x) = (ax + b) mod p, where a is a number in {1, 2, ..., p-1} and b is a number in {0, 1, 2, ..., p-1}, assuming that p is a prime. If p isn't prime, hash functions of this form don't spread items out uniformly. As a result, if you're using a hash function like this one, it makes sense to pick a table whose size is a prime number.
The second reason you see advice about prime-sized tables is if you're using an open-addressing strategy like quadratic probing or double hashing. These hashing strategies work by hashing items to some initial location k. If that slot is full, we look at slot (k + r) mod T, where T is the table size and r is some offset. If that slot is full, we then check (k + 2r) mod T, then (k + 3r) mod T, etc. If the table size is a prime number and r isn't zero, this has the nice, desirable property that these indices will cycle through all the different positions in the table without ever repeating, ensuring that items are nicely distributed over the table. With non-prime table sizes, it's possible that this strategy gets stuck cycling through a small number of slots, which gives less flexibility in positions and can cause insertions to fail well before the table fills up.
So assuming you aren't using double hashing or quadratic probing, and assuming you have a strong enough hash function, feel free to size your table however you'd like.