What's the Best Hashing Algorithm to Use on an STL String When Using hash_map

Fast String Hashing Algorithm with low collision rates with 32 bit integer

One of the FNV variants should meet your requirements. They're fast, and produce fairly evenly distributed outputs.
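
As an illustration of what an FNV variant looks like, here is a minimal sketch of 32-bit FNV-1a over a std::string. The function name fnv1a_32 is made up for this example; the two constants are the published 32-bit FNV offset basis and FNV prime.

```cpp
#include <cstdint>
#include <string>

// 32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
// fnv1a_32 is a hypothetical name chosen for this sketch.
inline std::uint32_t fnv1a_32(const std::string& s)
{
    std::uint32_t hash = 2166136261u;   // 32-bit FNV offset basis
    for (unsigned char c : s) {
        hash ^= c;                      // mix in the next byte
        hash *= 16777619u;              // 32-bit FNV prime (wraps modulo 2^32)
    }
    return hash;
}
```

To use it as the hasher of a hash container, it can be wrapped in a small function object whose operator() returns fnv1a_32(key).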

How to use hash_map with char* and do string compare?

Well, std::strcmp is declared by the standard library when you #include <cstring>. The SGI hash_map documentation gives a strcmp-based example of writing your own equality-testing function for char* keys (quoting from the beginning of the SGI doc):

#include <cstring>

// Compares C strings by content rather than by pointer value.
struct eqstr
{
    bool operator()(const char* s1, const char* s2) const
    {
        return strcmp(s1, s2) == 0;
    }
};

I have to say I agree with the author of the link in your post, where he says that it is already a mistake for hash_map<char*> to use by default a string-based hash<char*>. But I usually use hash_maps (or, lately, boost::unordered_maps) on C++ std::strings for this kind of thing anyway.
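
For completeness, here is a rough sketch of the same idea with std::unordered_map rather than SGI's hash_map. The names cstr_hash, cstr_equal, cstr_map and lookup_via_copy are hypothetical, and the multiply-by-31 hasher is only a placeholder. The point is that both the hasher and the equality predicate must look at the characters, not at the pointer value.

```cpp
#include <cstddef>
#include <cstring>
#include <unordered_map>

// Hypothetical hasher: hashes the characters the pointer refers to,
// not the pointer itself (simple multiply-by-31 scheme, chosen for brevity).
struct cstr_hash
{
    std::size_t operator()(const char* s) const
    {
        std::size_t h = 7;
        for (; *s != '\0'; ++s)
            h = h * 31 + static_cast<unsigned char>(*s);
        return h;
    }
};

// Equality by content, in the spirit of the SGI eqstr example.
struct cstr_equal
{
    bool operator()(const char* a, const char* b) const
    {
        return std::strcmp(a, b) == 0;
    }
};

typedef std::unordered_map<const char*, int, cstr_hash, cstr_equal> cstr_map;

// Inserts under one pointer and looks up through a different buffer
// holding the same characters; returns -1 if the key is not found.
inline int lookup_via_copy()
{
    cstr_map m;
    m["hello"] = 23;
    char key[] = "hello";   // distinct address, same characters
    cstr_map::const_iterator it = m.find(key);
    return it == m.end() ? -1 : it->second;
}
```

Note that such a map stores only the pointers, so the strings they point to must outlive the map. That is one reason std::string keys are usually less error-prone.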

Good Hash Function for Strings

Hashes usually don't just sum the characters; otherwise stop and pots would have the same hash.

And you wouldn't limit the hash to the first n characters, because otherwise house and houses would have the same hash.

Generally, hashes take each value and multiply the running result by a prime number (which makes it more likely to generate unique hashes). So you could do something like:

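// s is the std::string being hashed; unsigned arithmetic wraps around on overflow
unsigned int hash = 7;
for (std::string::size_type i = 0; i < s.size(); ++i) {
    hash = hash * 31 + static_cast<unsigned char>(s[i]);
}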
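
The pitfall with sums can be checked directly. The sketch below compares a purely additive hash with the multiply-by-31 scheme; sum_hash and mul31_hash are hypothetical names used only for this illustration.

```cpp
#include <string>

// Additive hash: ignores character order, so anagrams always collide.
inline unsigned int sum_hash(const std::string& s)
{
    unsigned int h = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        h += static_cast<unsigned char>(s[i]);
    return h;
}

// Multiplicative hash: each step scales the running value by 31 before
// adding the next character, so the position of a character matters.
inline unsigned int mul31_hash(const std::string& s)
{
    unsigned int h = 7;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        h = h * 31 + static_cast<unsigned char>(s[i]);
    return h;
}
```

With these definitions, sum_hash gives the same value for "stop" and "pots", while mul31_hash separates them, and it also separates "house" from "houses" because every character is consumed.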

Should I cache the hash code of an STL string used as a hash key?

I don't have experience with caching hash codes, but I've done some work recently converting std::map to std::tr1::unordered_map. Two thoughts come to mind. First, try profiling that relatively simple change before anything else: it sometimes makes things worse, depending on what your code is doing, but it might give you enough speedup on its own before you try optimizing further. Second, what does your profiler say about the other 90% of your initialization time? Even if you optimized the global dictionary stuff down to zero time, you would improve performance by at most 10%.
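
For reference, one possible shape of a cached hash is sketched below. This is not something the answer above has measured or recommends; the types cached_key, cached_key_hash, cached_key_equal and cached_map are hypothetical, and whether it helps at all is exactly what profiling would have to show.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type: stores the string together with its hash,
// which is computed once, at construction.
struct cached_key
{
    std::string text;
    std::size_t hash;

    explicit cached_key(const std::string& s)
        : text(s), hash(std::hash<std::string>()(s)) {}
};

// Hasher that returns the stored value instead of rehashing the string.
struct cached_key_hash
{
    std::size_t operator()(const cached_key& k) const { return k.hash; }
};

struct cached_key_equal
{
    bool operator()(const cached_key& a, const cached_key& b) const
    {
        // Cheap hash comparison first, full string comparison only if needed.
        return a.hash == b.hash && a.text == b.text;
    }
};

typedef std::unordered_map<cached_key, int, cached_key_hash, cached_key_equal> cached_map;
```

The saving only materializes if the same cached_key object is reused across many lookups. Building a fresh key for every lookup hashes the string each time anyway.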

What is the best way to use a HashMap in C++?

The standard library includes the ordered and the unordered map containers (std::map and std::unordered_map). In an ordered map (std::map) the elements are sorted by key, and insertion and access take O(log n). The standard library usually implements ordered maps as red-black trees, but this is just an implementation detail. In an unordered map (std::unordered_map) insertion and access take O(1) on average (O(n) in the worst case, when many keys collide). It is just another name for a hash table.

An example with (ordered) std::map:

#include <map>
#include <string>
#include <iostream>
#include <cassert>

int main()
{
    std::map<std::string, int> m;
    m["hello"] = 23;
    // check if key is present
    if (m.find("world") != m.end())
        std::cout << "map contains key world!\n";
    // retrieve (note: operator[] inserts a default value if the key is missing)
    std::cout << m["hello"] << '\n';
    std::map<std::string, int>::iterator i = m.find("hello");
    assert(i != m.end());
    std::cout << "Key: " << i->first << " Value: " << i->second << '\n';
    return 0;
}

Output:


23
Key: hello Value: 23

If you need ordering in your container and are fine with the O(log n) runtime then just use std::map.

Otherwise, if you really need a hash table (average O(1) insert/access), check out std::unordered_map, which has an API similar to std::map (e.g. in the above example you just have to search and replace map with unordered_map).
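
A small sketch of the same operations on std::unordered_map follows. The helper names make_example and lookup_or are invented for this illustration. It uses find for lookups, which, unlike operator[], never inserts a missing key.

```cpp
#include <string>
#include <unordered_map>

// Builds the same one-element table as the std::map example.
inline std::unordered_map<std::string, int> make_example()
{
    std::unordered_map<std::string, int> m;
    m["hello"] = 23;
    return m;
}

// Returns the value stored under key, or fallback if the key is absent.
// find() works on a const map and does not modify it.
inline int lookup_or(const std::unordered_map<std::string, int>& m,
                     const std::string& key, int fallback)
{
    std::unordered_map<std::string, int>::const_iterator i = m.find(key);
    return i != m.end() ? i->second : fallback;
}
```

Apart from the header and the type name, the calls are the same ones the std::map example uses.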

The unordered_map container was introduced with the C++11 standard revision. Thus, depending on your compiler, you have to enable C++11 features (e.g. when using GCC 4.8 you have to add -std=c++11 to the CXXFLAGS).

Even before the C++11 release GCC supported unordered_map - in the namespace std::tr1. Thus, for old GCC compilers you can try to use it like this:

#include <tr1/unordered_map>

std::tr1::unordered_map<std::string, int> m;

It is also part of Boost, i.e. you can use the corresponding Boost header for better portability.

Collision detection in STL's hash_map

Assuming you've got the full STL, it actually includes a hash function, hash<T>, which in its included form is suitable for a few different key types including char* (C strings). I don't know details of its performance, but the STL is generally engineered to have acceptable performance for most applications.

As for collisions, that's for hash_map to deal with; you needn't worry about it.
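
To see that the container resolves collisions on its own, here is a sketch with a deliberately bad hasher. It uses std::unordered_map instead of hash_map as an assumption for portability, and constant_hash, colliding_map and make_colliding_map are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Deliberately bad hasher: every key gets the same hash value,
// so every insertion collides with all previous ones.
struct constant_hash
{
    std::size_t operator()(const std::string&) const { return 42; }
};

typedef std::unordered_map<std::string, int, constant_hash> colliding_map;

// Stores three distinct keys that all share one hash value.
inline colliding_map make_colliding_map()
{
    colliding_map m;
    m["one"] = 1;
    m["two"] = 2;
    m["three"] = 3;
    return m;
}
```

All three keys remain distinct and retrievable, because the container falls back on key equality within a bucket. What suffers is speed: with every key in one bucket, lookups degrade toward a linear search.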

stdext::hash_map unclear hash function

Why does the order not match the insertion order?

That's because a stdext::hash_map (and the platform-independent standard library version std::unordered_map from C++11) doesn't maintain/guarantee any reasonable order of its elements, not even insertion order. That's because it is a hashed container, with the individual elements' position based on their hash value and the size of the container. So you won't be able to maintain a reasonable order for your data with such a container.

What you can use to keep your elements in a guaranteed order is a good old std::map. But this doesn't order elements by insertion order either, but by the order induced by the comparison predicate (which can be configured to respect insertion time, but that would be quite unintuitive and not that easy at all).

For anything else you won't get around rolling your own (or searching for other libraries; I don't know if Boost has something like that). For example, add all elements to a linear std::vector/std::list for insertion-order iteration and maintain an additional std::(unordered_)map pointing into that vector/list for O(1)/O(log n) retrieval if necessary.
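
The roll-your-own approach could look roughly like the sketch below. The class insertion_ordered_map and its members are hypothetical. It stores indices into the vector rather than pointers or iterators, since those would be invalidated when the vector reallocates.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical container: a vector keeps the insertion order, and an
// unordered_map from key to vector index gives average O(1) lookup.
class insertion_ordered_map
{
public:
    typedef std::pair<std::string, int> entry;

    // Inserts a new key, or overwrites the value of an existing key
    // without changing its position.
    void set(const std::string& key, int value)
    {
        auto it = index_.find(key);
        if (it != index_.end()) {
            entries_[it->second].second = value;
        } else {
            index_[key] = entries_.size();
            entries_.push_back(entry(key, value));
        }
    }

    // Returns a pointer to the value, or a null pointer if the key is absent.
    const int* find(const std::string& key) const
    {
        auto it = index_.find(key);
        return it == index_.end() ? nullptr : &entries_[it->second].second;
    }

    // Entries in the order in which their keys were first inserted.
    const std::vector<entry>& entries() const { return entries_; }

private:
    std::vector<entry> entries_;
    std::unordered_map<std::string, std::size_t> index_;
};
```

Erasing is left out on purpose: removing from the middle of the vector would shift the later indices, so a real implementation would need to fix up the index map or use a std::list instead.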


