How to Choose Between Map and Unordered_Map

How to choose between map and unordered_map?

In practice, if memory is not an issue, unordered_map is almost always faster when you want single-element access.

The worst case, in which a single hash bucket ends up holding all of the elements, is theoretical and of no practical relevance. The unordered_map also gets slower than a tree once more than about log N elements hash to the same bucket, but that too rarely matters in practice. In some special scenarios you could use a specific hashing algorithm that ensures a more uniform distribution. For ordinary strings that don't share a specific pattern, the generic hash functions that come with unordered_map are just as good.

If you want to traverse the map (using iterators) in a sorted fashion, you cannot use unordered_map. In contrast, map not only allows that, but can also give you the next element relative to an approximate key (see the lower_bound and upper_bound methods).
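
For illustration, here is a minimal sketch of both properties; the keys and values are invented for the example:

#include <iostream>
#include <map>
#include <string>

int main() {
    // Keys are kept sorted, so iteration visits them in ascending order.
    std::map<int, std::string> m{{3, "three"}, {1, "one"}, {7, "seven"}};
    for (const auto& [key, value] : m)   // prints 1, 3, 7
        std::cout << key << " -> " << value << '\n';

    // lower_bound: the first element whose key is not less than 5.
    auto it = m.lower_bound(5);          // here: {7, "seven"}
    if (it != m.end())
        std::cout << "first key >= 5 is " << it->first << '\n';
}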

Is there any advantage of using map over unordered_map in case of trivial keys?

Don't forget that map keeps its elements ordered. If you can't give that up, obviously you can't use unordered_map.

Something else to keep in mind is that unordered_map generally uses more memory. map just has a few house-keeping pointers per node plus the memory for each object. In contrast, unordered_map has a big array of buckets (these can get quite big in some implementations), plus additional memory for each object. If you need to be memory-aware, map should prove better, because it lacks the large array.

So, if you need pure lookup-retrieval, I'd say unordered_map is the way to go. But there are always trade-offs, and if you can't afford them, then you can't use it.

Just from personal experience, I found an enormous improvement in performance (measured, of course) when using unordered_map instead of map in a main entity look-up table.

On the other hand, I found it was much slower at repeatedly inserting and removing elements. It's great for a relatively static collection of elements, but if you're doing tons of insertions and deletions the hashing + bucketing seems to add up. (Note, this was over many iterations.)

Choosing between std::map and std::unordered_map

As already mentioned, map allows you to iterate over the elements in sorted order, while unordered_map does not. This is very important in many situations, for example when displaying a collection (e.g. an address book). It also manifests in other, indirect ways, such as: (1) starting iteration from the iterator returned by find(), or (2) the existence of member functions like lower_bound().
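
To illustrate point (1), here is a minimal address-book sketch (the entries are invented) that starts iterating at the iterator returned by find():

#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<std::string, std::string> book{
        {"alice", "555-0100"}, {"bob", "555-0101"}, {"carol", "555-0102"}};

    // Everything from "bob" onwards, in sorted key order - impossible
    // with unordered_map, whose iteration order is unspecified.
    for (auto it = book.find("bob"); it != book.end(); ++it)
        std::cout << it->first << ": " << it->second << '\n';
}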

There is also a difference in worst-case search complexity:

  • For map, it is O(log N)

  • For unordered_map, it is O(N) [this can happen when a poor hash function leads to too many hash collisions]

The same applies to worst-case deletion complexity.

map vs unordered_map for few elements

Under the premise that you always need to measure in order to figure out what's more appropriate in terms of performance, if all these things are true:

  1. Changes to the map are not frequent;
  2. The map contains a maximum of 10 elements;
  3. Lookups will be frequent;
  4. You care a lot about performance;

Then I would say you would be better off putting your elements in an std::vector and performing a plain iteration over all your elements to find the one you're looking for.

An std::vector will allocate its elements in a contiguous region of memory, so cache locality is likely to give you better performance: the time required to fetch a cache line from main memory after a cache miss is at least an order of magnitude greater than the time required to access the CPU cache.
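
A minimal sketch of that approach, assuming string keys and an invented Entry type:

#include <algorithm>
#include <string>
#include <vector>

struct Entry {
    std::string key;
    int value;
};

// Plain linear scan. For a handful of elements the whole vector fits in
// a cache line or two, which tends to beat both map and unordered_map.
int* find_value(std::vector<Entry>& entries, const std::string& key) {
    auto it = std::find_if(entries.begin(), entries.end(),
                           [&](const Entry& e) { return e.key == key; });
    return it != entries.end() ? &it->value : nullptr;
}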

Quite interestingly, it seems like Boost's flat_map is ideal for your use case (courtesy of Praetorian):

flat_map is similar to std::map but it's implemented like an ordered vector. (from the online documentation)

So if using Boost is an option for you, you may want to try this one.
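
A minimal usage sketch, assuming Boost.Container is available:

#include <boost/container/flat_map.hpp>
#include <iostream>
#include <string>

int main() {
    // Same interface as std::map, but backed by a sorted vector:
    // contiguous storage, binary-search lookup, sorted iteration.
    boost::container::flat_map<int, std::string> fm;
    fm[2] = "two";
    fm[1] = "one";

    for (const auto& [k, v] : fm)   // iterates in key order: 1, 2
        std::cout << k << " -> " << v << '\n';
}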

Choice between map or unordered_map for keys consisting of calculated double values

"is it possible to write a Hash function for four doubles (or even just a regular double) that takes a comparison error tolerance into account."

No, it is not.

That sounds like a very definite statement; how can I be so sure?

Let's assume you want a tolerance of 0.00001. The exact value doesn't matter; we're just using it as an example. It means that for this hash function:

  • hash(1.0) must return the same value as hash(1.00001)

so that they can be considered equal. But it also means:

  • hash(1.00001) must return the same value as hash(1.00002)

for the same reason, and

  • hash(1.00002) must return the same value as hash(1.00003)

...and so on, up to the largest representable double value, effectively ad infinitum. The same is true for values below 1.

Following this chain of equalities, any hash function that allows for tolerance would have to return the same hash for all values, making it useless.
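
To see why the obvious workaround fails too, here is a sketch of the tempting "quantize, then hash" attempt (using the tolerance value from the example above):

#include <cmath>
#include <cstddef>
#include <functional>

// Broken attempt: snap each value to a grid of width `tolerance` and hash
// the grid index, hoping nearby values share a bucket.
std::size_t tolerant_hash(double x, double tolerance = 0.00001) {
    return std::hash<long long>{}(
        static_cast<long long>(std::floor(x / tolerance)));
}

// Values near a grid boundary defeat it: 0.9999999 and 1.0000001 differ by
// far less than the tolerance, yet they fall in adjacent grid cells and get
// different hashes, so "equal" keys land in different buckets.

A structure that supports range queries (like the tree suggested below) sidesteps this, because a lookup can inspect neighbouring cells; a single hash lookup cannot.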

P.S. To actually recommend an approach that does work: a four-dimensional analogue of a quadtree (technically something like a sedecimtree, with 16 children per node) is probably best.

In practice, when must `std::unordered_map` be used instead of `std::map`?

I know the difference between them, say, internal implementation and time complexity for searching for an element.

In that case, you should know that the average asymptotic complexity of element lookup in an unordered map is constant, while the complexity in an ordered map is logarithmic. This means that there is some container size beyond which lookup will be faster using the unordered map.
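
Where that crossover lies depends on constants the asymptotics hide, so measure. A rough micro-benchmark sketch (the container size and timing method are arbitrary choices here):

#include <chrono>
#include <iostream>
#include <map>
#include <unordered_map>

template <typename Map>
long long time_lookups(const Map& m, int n) {
    auto start = std::chrono::steady_clock::now();
    long long sum = 0;
    for (int i = 0; i < n; ++i)
        sum += m.find(i)->second;             // every key is present
    auto stop = std::chrono::steady_clock::now();
    std::cout << "(checksum " << sum << ") "; // defeat dead-code elimination
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

int main() {
    const int n = 1'000'000;
    std::map<int, int> m;
    std::unordered_map<int, int> um;
    for (int i = 0; i < n; ++i) { m[i] = i; um[i] = i; }

    std::cout << "map:           " << time_lookups(m, n) << " ms\n";
    std::cout << "unordered_map: " << time_lookups(um, n) << " ms\n";
}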

But I really can't find a circumstance where std::unordered_map could not be replaced by std::map.

If the container is large enough and you cannot afford the cost of the ordered map's logarithmic lookup, then you cannot replace the faster unordered map.

Another case where an ordered map cannot be used is when there is no cheap function to compare the relative order of keys.

Selection of map or unordered_map based on the key's type

You raise a good point... but you are focusing on the wrong part.

The problem is not the type of the key per se, but the hash function used to derive a hash value for that key.

Lexicographical ordering is easy: if you tell me you want to order a structure according to its 3 fields (and they already support ordering themselves), then I'll just write:

// Requires Boost.Tuple: boost::tie comes from <boost/tuple/tuple.hpp>,
// and the comparison operators from <boost/tuple/tuple_comparison.hpp>.
bool operator<(Struct const& left, Struct const& right) {
    return boost::tie(left._1, left._2, left._3)
         < boost::tie(right._1, right._2, right._3);
}

And I am done!

However, writing a hash function is difficult. You need some knowledge about the distribution of your data (statistics), you might need to prevent specially crafted attacks, etc. Honestly, I do not expect many people to be able to craft a good hash function. And the worst part is, composition is difficult too: given two independent fields, combining their hash values correctly is hard (hint: boost::hash_combine).
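
For what a correct composition looks like, here is a sketch using boost::hash_combine for the three-field Struct from above (the field types are assumed for the example):

#include <boost/functional/hash.hpp>  // boost::hash_combine
#include <cstddef>
#include <string>

struct Struct {
    int _1;
    double _2;
    std::string _3;
};

// Naive compositions go wrong easily; XOR-ing the field hashes, for
// instance, gives {a, b} and {b, a} the same value. hash_combine mixes
// each field into a running seed, so field order matters.
std::size_t hash_value(Struct const& s) {
    std::size_t seed = 0;
    boost::hash_combine(seed, s._1);
    boost::hash_combine(seed, s._2);
    boost::hash_combine(seed, s._3);
    return seed;
}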

So, indeed, if you have no idea what you are doing and you are handling user-supplied data, just stick to a map. It may be slower (measure it), but it's safer.

Which one is better, STL map or unordered_map, for the following cases?

Which one performs faster for insert, delete, and look-up? Which one takes less memory and less time to clear from memory? Any explanations are heartily welcome!

For a specific use, you should try both with your actual data and usage patterns and see which is actually faster... there are enough factors that it's dangerous to assume either will always "win".

implementation and characteristics of unordered maps / hash tables

Academically: as the number of elements increases towards infinity, those operations on a std::unordered_map (the C++ library's offering for what computer science terms a "hash map" or "hash table") will tend to keep taking the same amount of time, O(1), ignoring memory limits, caching effects, etc. With a std::map (a balanced binary tree), by contrast, each time the number of elements doubles it will typically need one extra comparison operation, so it gets gradually slower: O(log2 N).

std::unordered_map implementations are effectively forced to use open hashing (separate chaining): the fundamental expectation is that there will be a contiguous array of "buckets", each logically a container of any values hashing to it.

It generally serves to picture the hash table as a vector<list<pair<key, value>>>, where getting from a vector element to a value involves at least one pointer dereference, following the list-head pointer stored in the bucket to the initial list node. The performance of insert/find/erase depends on the length of that list, which on average equals the unordered_map's load_factor().

If the max_load_factor is lowered (the default is 1.0), there will be fewer collisions, but more reallocation/rehashing during insertion and more wasted memory (which can itself hurt performance through increased cache misses).
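
A sketch showing these knobs through the standard bucket interface (the element count is arbitrary):

#include <iostream>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> um;
    um.max_load_factor(0.5f);   // rehash sooner: fewer collisions,
                                // more buckets, more memory
    for (int i = 0; i < 1000; ++i)
        um[i] = i;

    // load_factor() == size() / bucket_count(), kept <= max_load_factor().
    std::cout << "size:         " << um.size() << '\n'
              << "bucket_count: " << um.bucket_count() << '\n'
              << "load_factor:  " << um.load_factor() << '\n';
}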

The memory usage for this most obvious of unordered_map implementations comprises the contiguous array of bucket_count() list-head-iterator/pointer-sized buckets plus one doubly-linked list node per key/value pair: typically bucket_count() + 2 * size() extra pointers of overhead, adjusted for any rounding-up of dynamic memory allocation request sizes the implementation might do (for example, ask for 100 bytes and you might get 128, 256, or 512). An implementation's dynamic memory routines might also use some memory to track allocated/available regions.

Still, the C++ Standard leaves room for real-world implementations to make some of their own performance/memory-usage decisions. They could, for example, keep the old contiguous array of buckets around for a while after allocating a new larger array, so rehashing values into the latter can be done gradually to reduce the worst-case performance at the cost of average-case performance as both arrays are consulted during operations.

implementation and characteristics of maps / balanced binary trees

A map is a balanced binary tree (typically a red-black tree), and can be expected to employ pointers linking distinct heap memory regions returned by different calls to new. As well as the key/value data, each node in the tree needs parent, left, and right pointers (see Wikipedia's binary tree article if lost).

comparison

So, both unordered_map and map need to allocate a node per key/value pair; the former typically has two pointers/iterators of overhead for prev/next-node linkage, the latter three, for parent/left/right. In addition, unordered_map has the single contiguous allocation for bucket_count() buckets (== size() / load_factor()).

For most purposes that's not a dramatic difference in memory usage, and the deallocation time difference for one extra region is unlikely to be noticeable.

another alternative

For those occasions when the container is populated up front and then repeatedly searched without further inserts/erases, it can sometimes be fastest to use a sorted vector, searched with the Standard algorithms binary_search, equal_range, lower_bound, and upper_bound. This has the advantage of a single contiguous memory allocation, which is much more cache friendly. It generally outperforms map, but unordered_map may still be faster - measure if you care.
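
A minimal sketch of that pattern, with an invented int-to-string table:

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;

// Populate and sort once up front; afterwards every lookup is a binary
// search over contiguous memory.
const std::string* lookup(const std::vector<Entry>& sorted, int key) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), key,
                               [](const Entry& e, int k) { return e.first < k; });
    return (it != sorted.end() && it->first == key) ? &it->second : nullptr;
}

// Usage: fill a vector<Entry>, std::sort it by .first, then call lookup().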


