Why Would Anyone Use Set Instead of Unordered_Set

Why would anyone use set instead of unordered_set?

When the order matters: if you want to iterate over the items of the set in sorted order, std::set guarantees that, while std::unordered_set does not.

Should I use std::set or std::unordered_set for a set of pointers?

As Ami Tavory commented, if you don't need order, then it's usually best to go for unordered containers. The reason is that if ordering somehow improved performance, unordered containers would still be free to use it internally, and hence get the same or better complexity anyhow.

A downside of unordered collections is that they usually require a hash function for the key type. If it's too hard or expensive to make one, then containers which don't use hashes might be better.

In C++'s standard library, the average insertion complexity for std::set is O(log N), whereas for std::unordered_set it's O(1). Aside from that, there are probably fewer cache misses on average when using std::unordered_set.

At the end of the day though, this is just theory. You should try something that sounds good enough and profile it to see if it really is.

what is the difference between set and unordered_set in C++?

I think you've generally answered your own question, however, this:

Not as compact as tree. (for practical purposes the load factor is never 1)

is not necessarily true. Each node of a tree (we'll assume it's a red-black tree) holding a type T uses space equal to at least 2 * sizeof(pointer) + sizeof(T) + sizeof(bool). The pointer overhead may be 3 * sizeof(pointer) if the tree stores a parent pointer in each node.

Compare this to a hash map: there will be wasted array space in every hash map because the load factor is < 1, as you've said. However, assuming the hash map uses singly linked lists for chaining (and really, there's no good reason not to), each inserted element takes only sizeof(T) + sizeof(pointer).

Note that this analysis ignores any overhead which may come from extra space used by alignment.

For any element type T with a small size (so, any basic type), the pointers and other overhead dominate. At a load factor above 0.5 (for example), the std::unordered_set may indeed use less memory than the equivalent std::set.

The other big missing point is the fact that iterating through a std::set is guaranteed to produce an ordering from smallest to largest, based on the given comparison function, while iterating through an std::unordered_set will return the values in a "random" order.

When to use std::unordered_set instead of std::set?

std::set keeps its elements sorted, while std::unordered_set does not. If you need all the elements in sorted order, you should use std::set.

When is it worth it to change from std::vector to std::unordered_set?

There are many factors affecting the point where the std::vector falls behind other approaches. See std::vector faster than std::unordered_set? and Performance of vector sort/unique/erase vs. copy to unordered_set for some of the reasons why this is the case. As a consequence, any calculation of this point would have to be rather involved and complicated. The most expedient way to find this point is performance testing.

Keep in mind that some of the factors depend on the hardware being used. So not only do you need performance testing on your development machine, but also performance testing on "typical" end-user machines. Your development machine can give you a gut check, though (as can an online tool like Quick Bench), letting you know that you are not even in the ballpark yet. (The intuition of individual programmers is notoriously unreliable. Some people see 100 as a big number and worry about performance; others don't worry until numbers hit the billions. Either of these people would be blown away by the other's view.)

Given the difficulty in determining the point where std::vector falls behind, I think this is a good place to give a reminder that premature optimization is usually a waste of time. Investigating this performance out of curiosity is fine, but until this is identified as a performance bottleneck, don't hold up work on more important aspects of your project for this. Choose the approach that fits your code best.

That being said, I personally would assume the break point is well, well over 10 items. So for the 5-element collection of the question, I would be inclined to use a vector and not look back.

For the operation count(), which is faster: std::set<void*> or std::unordered_set<void*>?

I think the important bit is in the small footnote:

The amount of elements is so few that even a std::vector would perform fine.

std::vector is more cache friendly than std::set and std::unordered_set, because it allocates its elements in a contiguous region of memory (this is mandated by the Standard).

And although lookup and insertion complexity is worse for std::vector than for std::set or std::unordered_set (linear versus O(log N) and amortized O(1), respectively), data locality and a higher cache hit rate are likely to dominate the computational complexity and yield better performance.

In general, anyway, the performance impact of the data structures you choose also depends on the kinds of operations you want to perform on them and on their frequency; this is not mentioned in your post.

As always, however, measure the different alternatives before you commit to one, and always favor a simpler design when you do not have evidence that it represents a bottleneck preventing your application from meeting its performance requirements.


