std::lower_bound slower for std::vector than std::map::find
This is a somewhat more interesting nut to crack! Before discussing my findings so far, let me point out that the associative container's find() behaves differently from std::map::find(): if the key isn't found, the former returns the lower bound while the latter returns end(). To fix this, associative::find() needs to be changed to become something like this:
auto rc = std::lower_bound(this->internal_.begin(), this->internal_.end(), key, this->comp_);
return rc != this->internal_.end() && !this->comp_(key, rc->first) ? rc : this->internal_.end();
Now that we are more likely to compare apples to apples (I haven't verified that the logic is really correct now), let's investigate the performance. I'm not quite convinced that the approach used to test performance really holds water, but I'm sticking with it for now, and I could definitely improve the performance of the associative container. I don't think I have found all the performance issues in the code but I have, at least, made some progress. The biggest one is noticing that the comparison function used in the associative container is pretty bad, because it keeps making copies. This puts the container at a disadvantage. If you check the comparator now you probably don't see it, because it looks as if this comparator is passing by reference! The issue is actually rather subtle: the underlying container has a value_type of std::pair<key_type, mapped_type> but the comparator takes std::pair<key_type const, mapped_type> as argument! Fixing this gives the associative container quite a bit of a performance boost.
To implement a comparator class which has no opportunity to fail to match the arguments exactly, I'm using a simple helper to detect whether a type is a std::pair<L, R>:
template <typename> struct is_pair { enum { value = false }; };
template <typename F, typename S> struct is_pair<std::pair<F, S>> { enum { value = true }; };
... and then I replaced the comparator with this, slightly more complicated, one:
class value_compare {
    key_compare pred_;
public:
    inline value_compare(key_compare pred = key_compare()) : pred_(pred) {}
    template <typename L, typename R>
    inline typename std::enable_if<is_pair<L>::value && is_pair<R>::value, bool>::type
    operator()(L const& left, R const& right) const {
        return pred_(left.first, right.first);
    }
    template <typename L, typename R>
    inline typename std::enable_if<is_pair<L>::value && !is_pair<R>::value, bool>::type
    operator()(L const& left, R const& right) const {
        return pred_(left.first, right);
    }
    template <typename L, typename R>
    inline typename std::enable_if<!is_pair<L>::value && is_pair<R>::value, bool>::type
    operator()(L const& left, R const& right) const {
        return pred_(left, right.first);
    }
    template <typename L, typename R>
    inline typename std::enable_if<!is_pair<L>::value && !is_pair<R>::value, bool>::type
    operator()(L const& left, R const& right) const {
        return pred_(left, right);
    }
    inline key_compare key_comp() const { return pred_; }
};
This generally brings the two approaches quite a bit closer together. Given that I would expect the std::vector<T> with lower_bound() approach to be a lot better than using std::map<K, T>, I feel the investigation isn't over yet.
Addendum:
Rethinking the exercise a bit more, I spotted why I felt uncomfortable with the implementation of the predicate class: it is way too complex! This can be done a lot more simply by not using std::enable_if for a change, which nicely reduces the code to something much easier to read. The key is to get the key:
template <typename Key>
Key const& get_key(Key const& value) { return value; }
template <typename Key, typename Value>
Key const& get_key(std::pair<Key, Value> const& pair) { return pair.first; }
With this implementation to get hold of a "key" from either a value or a pair of values, the predicate object can just define one very simple function call operator:
template <typename L, typename R>
bool operator()(L const& l, R const& r)
{
    return this->pred_(get_key<key_type>(l), get_key<key_type>(r));
}
There is a slight trick in this as well, though: the expected key_type
needs to be passed to the get_key()
function. Without this the predicate wouldn't work in cases where the key_type
is itself a std::pair<F, S>
of objects.
Binary search of range in std::map slower than map::find() search of whole map
There is a reason that std::map::find()
exists. The implementation already does a binary search, as the std::map
has a balanced binary tree as implementation.
Your implementation of binary search is much slower because you can't take advantage of that binary tree.
If you want to find the middle of the map, you start with std::advance: it takes the first node (which is at a leaf of the tree) and navigates through several pointers towards what you consider to be the middle. Afterwards, you again need to go from one of these leaf nodes to the next, again following a lot of pointers.
The result: on top of a lot more looping, you get a lot of cache misses, especially when the map is large.
If you want to improve the lookups in your map, I would recommend using a different structure. When ordering isn't important, you could use std::unordered_map. When order is important, you could use a sorted std::vector<std::pair<Key, Value>>. In case you have Boost available, this already exists in a class called boost::container::flat_map.
Why is vector faster than map in one test, but not the other?
I see some problems with the code ( http://ideone.com/41iKt ) you posted on ideone.com. (ideone actually shows vector as faster, but a local build with the Visual Studio 11 Developer Preview shows map as faster.)
First, I moved the map variable and used it to initialize the vector, to get the same element ordering and uniquing, and then I gave lower_bound a custom comparator that only compares first, since that's what the map will be doing. After these changes Visual Studio 11 shows the vector as faster for the same 100,000 elements (although the ideone time doesn't change significantly). http://ideone.com/b3OnH
With test_size reduced to 8, map is still faster. This isn't surprising: this is the way algorithmic complexity works; with small N, all the constants in the function that truly describes the run time matter. I have to raise test_size to about 2,700 for vector to pull even and then ahead of map on this system.
C++ STL Map vs Vector speed
You effectively have a number of alternatives.
Libraries exist:
- Loki::AssocVector: the interface of a map implemented over a vector of pairs; faster than a map for small or frozen sets because of cache locality.
- Boost.MultiIndex: provides both a List with fast lookup and an example of implementing an MRU List (Most Recently Used), which caches the last accessed elements.
Critics
- Map lookup and retrieval take O(log N), but the items may be scattered throughout memory, thus not playing well with caching strategies.
- Vectors are more cache friendly; however, unless you sort it you'll have O(N) performance on find. Is that acceptable?
- Why not use an unordered_map? It provides O(1) lookup and retrieval (though the constant may be high) and is certainly suited to this task. If you have a look at Wikipedia's article on Hash Tables you'll realize that there are many strategies available and you can certainly pick one that suits your particular usage pattern.
Why is a binary search over a sorted vector slower than std::set find?
For the updated question:
-O1 and -O2 give the same performance for the two methods; -Ox slows down the vector version.
To see why, you need to look at the disassembly or at the details of the -Ox level. It has nothing to do with the algorithmic properties of set.find and lower_bound/binary_search.
Regarding the locality of data: for reasonable implementations, a binary_search and a set::find have exactly the same locality of data. The set might even win, with the data being read in a left-to-right fashion.
Searching std::map in O(n) for a partial key
In C++14 you can use a transparent comparator to search on a partial key:
struct CompareFirstTwo {
    using is_transparent = void;
    bool operator()(const tuple<A, B, C>& lhs, const tuple<A, B, C>& rhs) const ...
    bool operator()(const tuple<A, B>& lhs, const tuple<A, B, C>& rhs) const ...
    bool operator()(const tuple<A, B, C>& lhs, const tuple<A, B>& rhs) const ...
};
Use the comparator above in a call to equal_range
to ignore the third field in the tuple.
How to find an element in a specified range in std::map?
You can use std::lower_bound
, std::upper_bound
or std::equal_range
for that as std::map
iterators and data in the map satisfy the requirement for those functions, though you should be aware that it will be less efficient than std::map::find()
due to linear iterator increments.
From std::lower_bound
documentation
The number of comparisons performed is logarithmic in the distance between first and last (at most log2(last - first) + O(1) comparisons). However, for non-LegacyRandomAccessIterators, the number of iterator increments is linear.
emphasis is mine.
When to choose std::vector over std::map for key-value data?
I only rarely use std::vector
with a linear search (except in conjunction with binary searching as described below). I suppose for a small enough amount of data it would be better, but with that little data it's unlikely that anything is going to provide a huge advantage.
Depending on usage pattern, a binary search on an std::vector
can make sense though. A std::map
works well when you need to update the data regularly during use. In quite a few cases, however, you load up some data and then you use the data -- but after you've loaded the data, it mostly remains static (i.e., it changes very little, if at all).
In this case, it can make a lot of sense to load the data into a vector, sort it if necessary, and then do binary searches on the data (e.g. std::lower_bound
, std::equal_range
). This gives pretty much the best of both worlds -- low-complexity binary searches and good cache usage from high locality of reference (i.e., the vector is contiguous, as opposed to the linked structure of a std::map
). The shortcoming, of course, is that insertions and deletions are slow -- but this is one time I have used your original idea: store newly inserted data separately until it reaches some limit, and only then sort it in with the rest of the data, so a single search consists of a binary search of the main body of the data followed by a linear search of the (small amount of) newly inserted data.
Finding nearest point with lower-bound... but data is not sorted
Copy your vector into a std::set and then use the same logic as you did for the vector to find the element.