Choosing Between std::map and std::unordered_map

Choosing between std::map and std::unordered_map

As already mentioned, map allows you to iterate over the elements in sorted order, but unordered_map does not. This is very important in many situations, for example when displaying a collection (e.g. an address book). It also shows up in more indirect ways, such as (1) starting iteration from the iterator returned by find(), or (2) the existence of member functions like lower_bound().
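For illustration, here is a minimal sketch (the address-book data is made up) of the sorted traversal that only the ordered container guarantees:

```cpp
#include <iostream>
#include <map>
#include <string>

int main() {
    // Hypothetical address book: iterating a std::map visits keys in sorted
    // order, which is what you want when displaying the collection.
    std::map<std::string, std::string> book{
        {"Carol", "555-0100"}, {"Alice", "555-0199"}, {"Bob", "555-0123"}};

    for (const auto& [name, phone] : book)
        std::cout << name << ": " << phone << '\n';   // Alice, Bob, Carol

    // The same loop over a std::unordered_map would visit the entries in an
    // unspecified order that can change as the container grows.
}
```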

There is also a difference in the worst-case search complexity:

  • For map, it is O(log N).

  • For unordered_map, it is O(N). [This may happen when the hash function is poor, leading to many hash collisions; see the sketch below.]

The same applies to the worst-case deletion complexity.
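To make that worst case concrete, here is a contrived sketch with a deliberately degenerate hash function (BadHash is made up for illustration); every key lands in the same bucket, so lookups degrade to a linear scan:

```cpp
#include <string>
#include <unordered_map>

// Deliberately bad hash: every key gets the same hash value, so all elements
// end up in one bucket and find() becomes an O(N) chain traversal.
struct BadHash {
    std::size_t operator()(const std::string&) const noexcept { return 42; }
};

int main() {
    std::unordered_map<std::string, int, BadHash> m;
    m["alpha"] = 1;
    m["beta"]  = 2;
    // With a reasonable hash this would be O(1) on average; here it scans
    // one long bucket instead.
    auto it = m.find("beta");
    return it != m.end() ? 0 : 1;
}
```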

Is there any advantage of using map over unordered_map in case of trivial keys?

Don't forget that map keeps its elements ordered. If you can't give that up, obviously you can't use unordered_map.

Something else to keep in mind is that unordered_map generally uses more memory. map just has a few house-keeping pointers plus the memory for each object. In contrast, unordered_map has a big bucket array (which can get quite big in some implementations) in addition to the memory for each object. If you need to be memory-aware, map should prove better, because it lacks the large array.
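If you want a rough feel for that extra allocation, you can ask the container how many buckets it maintains; a small sketch (the exact number printed is implementation-defined):

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<int, std::string> um;
    for (int i = 0; i < 1000; ++i)
        um[i] = "x";

    // The bucket array is the extra allocation a std::map does not have;
    // its size depends on the implementation and the current load factor.
    std::cout << "elements: " << um.size()
              << ", buckets: " << um.bucket_count() << '\n';
}
```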

So, if you need pure lookup-retrieval, I'd say unordered_map is the way to go. But there are always trade-offs, and if you can't afford them, then you can't use it.

Just from personal experience, I found an enormous improvement in performance (measured, of course) when using unordered_map instead of map in a main entity look-up table.

On the other hand, I found it was much slower at repeatedly inserting and removing elements. It's great for a relatively static collection of elements, but if you're doing tons of insertions and deletions the hashing + bucketing seems to add up. (Note, this was over many iterations.)

How to choose between map and unordered_map?

In practice, if memory is not an issue, unordered_map is almost always faster when all you need is single-element access.

The worst case is theoretical and corresponds to a single hash value accounting for all of the elements; it is not of practical relevance. An unordered_map only becomes slower than a map once at least log N elements hash to the same bucket, which is also not of practical relevance. In some special scenarios you could use a specific hashing algorithm that ensures a more uniform distribution. For ordinary strings that don't share a specific pattern, the generic hash function that comes with unordered_map is just as good.
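If you ever need to supply your own hash, whether for distribution reasons or because the key type has no std::hash specialization, you pass it as a template argument; a sketch under made-up names (PairHash and the key type are illustrative only):

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical custom hasher for a (name, id) pair key, combining the two
// standard hashes. Shown only to illustrate that the hash function is a
// replaceable template parameter.
struct PairHash {
    std::size_t operator()(const std::pair<std::string, int>& p) const noexcept {
        std::size_t h1 = std::hash<std::string>{}(p.first);
        std::size_t h2 = std::hash<int>{}(p.second);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));  // boost-style combine
    }
};

int main() {
    std::unordered_map<std::pair<std::string, int>, double, PairHash> prices;
    prices[{"widget", 7}] = 9.99;
    return prices.count({"widget", 7}) == 1 ? 0 : 1;
}
```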

If you want to traverse the map (using iterators) in a sorted fashion, you cannot use unordered_map. map, on the other hand, not only allows that, but can also give you the next element based on an approximation of the key (see the lower_bound and upper_bound methods).
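A short sketch of that ordered-only capability (the data is made up): lower_bound finds the first element whose key is not less than the query, which has no equivalent in unordered_map:

```cpp
#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<int, std::string> events{{10, "start"}, {20, "pause"}, {30, "stop"}};

    // First element whose key is not less than 15.
    auto it = events.lower_bound(15);
    if (it != events.end())
        std::cout << it->first << " -> " << it->second << '\n';  // 20 -> pause
}
```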

In practice, when must `std::unordered_map` be used instead of `std::map`?

I know the difference between them, say, internal implementation and the time complexity of searching for an element.

In that case, you should know that the average asymptotic lookup complexity of the unordered map is constant, while that of the ordered map is logarithmic. This means there is some container size beyond which lookups will be faster with the unordered map.

But I really can't find a circumstance where std::unordered_map could not be replaced by std::map.

If the container is large enough, and you cannot afford the cost of the ordered map's lookup, then you cannot choose to give up the faster unordered map lookup.

Another case where an ordered map cannot be used is when there is no cheap function to compare the relative order of the keys.
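For example, here is a hypothetical key type (Edge and EdgeHash are made up) that is cheap to hash and to compare for equality, but has no natural ordering, so unordered_map is the more comfortable fit:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key with no meaningful "less than": the two endpoints of an
// undirected edge. Equality and hashing are cheap and symmetric, but any
// ordering we invented would be arbitrary, so std::map is an awkward fit.
struct Edge {
    std::string a, b;
    bool operator==(const Edge& o) const {
        return (a == o.a && b == o.b) || (a == o.b && b == o.a);
    }
};

struct EdgeHash {
    std::size_t operator()(const Edge& e) const noexcept {
        // Symmetric combine so (a, b) and (b, a) hash identically.
        return std::hash<std::string>{}(e.a) ^ std::hash<std::string>{}(e.b);
    }
};

int main() {
    std::unordered_map<Edge, double, EdgeHash> weights;
    weights[{"u", "v"}] = 1.5;
    return weights.count({"v", "u"}) == 1 ? 0 : 1;
}
```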

map vs unordered_map for few elements

Under the premise that you always need to measure in order to figure out what's more appropriate in terms of performance, if all these things are true:

  1. Changes to the map are not frequent;
  2. The map contains a maximum of 10 elements;
  3. Lookups will be frequent;
  4. You care a lot about performance;

Then I would say you would be better off putting your elements in an std::vector and performing a plain iteration over all your elements to find the one you're looking for.

An std::vector will allocate its elements in a contiguous region of memory, so cache locality is likely to grant you greater performance: the time required to fetch a cache line from main memory after a cache miss is at least an order of magnitude higher than the time required to access the CPU cache.
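A minimal sketch of what that looks like in practice (the lookup helper and the data are made up):

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// For a handful of entries, a flat vector of pairs plus a linear find_if is
// often faster than any tree or hash container, because the whole table fits
// in one or two cache lines.
int lookup(const std::vector<std::pair<std::string, int>>& table,
           const std::string& key) {
    auto it = std::find_if(table.begin(), table.end(),
                           [&](const auto& kv) { return kv.first == key; });
    return it != table.end() ? it->second : -1;  // -1 as a hypothetical "not found"
}

int main() {
    std::vector<std::pair<std::string, int>> table{
        {"red", 0xFF0000}, {"green", 0x00FF00}, {"blue", 0x0000FF}};
    return lookup(table, "green") == 0x00FF00 ? 0 : 1;
}
```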

Quite interestingly, it seems like Boost's flat_map is ideal for your use case (courtesy of Praetorian):

flat_map is similar to std::map but it's implemented like an ordered vector. (from the online documentation)

So if using Boost is an option for you, you may want to try this one.

Choice between std::vector and std::unordered_map when searching among few items?

If you are not going to change your data and you need to do many searches, I suggest you try a std::vector and sort it. Then you can use a search algorithm such as binary_search, lower_bound or upper_bound from the STL, exploiting the fact that the container is sorted.

You get the best of both: locality and O(log N) complexity.
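A small sketch of that approach (the data is made up): sort once up front, then every lookup uses the standard binary-search algorithms:

```cpp
#include <algorithm>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> words{"pear", "apple", "cherry", "banana"};

    // Sort once; afterwards every lookup is O(log N) with good locality.
    std::sort(words.begin(), words.end());

    bool found = std::binary_search(words.begin(), words.end(), "cherry");

    // lower_bound gives the position, useful if you also need the element.
    auto it = std::lower_bound(words.begin(), words.end(), "banana");

    return (found && it != words.end() && *it == "banana") ? 0 : 1;
}
```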

Which container, std::map or std::unordered_map, is suitable for my case?

You shouldn't need to understand how red-black trees work in order to understand how to use a std::map. It's simply an associative array where the keys are in order (lexicographical order, in the case of string keys, at least with the default comparison function). That means that you can not only look keys up in a std::map, you can also make queries which depend on order. For example, you can find the largest key in the map which is not greater than the key you have. You can find the next larger key. Or (again in the case of strings) you can find all keys which start with the same prefix.
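As a sketch of that last point, here is a prefix query over string keys (the data is made up); all keys sharing a prefix form a contiguous range in a std::map:

```cpp
#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<std::string, int> m{
        {"app", 1}, {"apple", 2}, {"apricot", 3}, {"banana", 4}};

    const std::string prefix = "ap";

    // Start at the first key >= prefix and walk forward while the prefix matches.
    for (auto it = m.lower_bound(prefix);
         it != m.end() && it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        std::cout << it->first << '\n';   // app, apple, apricot
    }
}
```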

If you iterate over all the key-value pairs in a std::map, you will see them in order by key. That can be very useful, sometimes.

The extra functionality comes at a price. std::map is usually slower than std::unordered_map (though not always; for large string keys, the overhead of computing the hash function might be noticeable), and the underlying data structure has a certain amount of overhead, so they may occupy more space. The usual advice is to use a std::map if you find the fact that the keys are ordered to be essential or even useful.

But if you've benchmarked and concluded that for your application, a std::map is also faster, then go ahead and use it :)


It is occasionally useful to have a map whose mapped type is bool, but only if you need to distinguish between keys whose corresponding value is false and keys which are not present in the map at all. In effect, a std::map<T, bool> (or std::unordered_map<T, bool>) provides a ternary choice for each possible key.

If you don't need to distinguish between the two false cases, and you don't frequently change a key's value, then you may well be better off with a std::set (or std::unordered_set), which is exactly the same data structure but without the overhead of the bool in each element. (Although only one bit of the bool is useful, alignment considerations may end up using 8 additional bytes for each entry.) Other than storage space, there won't be much (if any) performance difference, though.

If you do really need a ternary case, then you would be well-advised to make the value an enum rather than a bool. What do true and false mean in the context of your usage? My guess is that they don't mean "true" and "false". Instead, they mean something like "is an attribute path" and "is an element path". That distinction could be made much clearer (and therefore less accident-prone) by using enum PathType {ATTRIBUTE_PATH, ELEMENT_PATH};. That will not involve any additional resources, since the bool is occupying eight bytes of storage in any case (because of alignment).
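A sketch of that suggestion (the paths are made up; PathType is the enum proposed above):

```cpp
#include <map>
#include <string>

// Hypothetical names: the enum documents what each state means, whereas a
// bare bool does not.
enum PathType { ATTRIBUTE_PATH, ELEMENT_PATH };

int main() {
    std::map<std::string, PathType> paths;
    paths["/root/@id"]  = ATTRIBUTE_PATH;
    paths["/root/item"] = ELEMENT_PATH;

    // A missing key is still distinguishable from either enum value,
    // preserving the "ternary" behaviour described above.
    auto it = paths.find("/root/missing");
    return it == paths.end() ? 0 : 1;
}
```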


By the way, there is no guarantee that the underlying data structure is precisely a red-black tree, although the performance guarantees would be difficult to achieve without some kind of self-balancing tree. I don't know of such an implementation, but it would be possible to use k-ary trees (for some small k) to take advantage of SIMD vector comparison operations, for example. Of course, that would need to be customized for appropriate key types.

If you do want to understand red-black trees, you could do worse than Robert Sedgewick's standard textbook on Algorithms. On the book's website, you'll find a brief illustrated explanation in the chapter on balanced trees.


