std::lower_bound slower for std::vector than std::map::find

This is a somewhat more interesting nut to crack! Before discussing my findings so far, let me point out that associative::find() behaves differently from std::map::find(): if the key isn't found, the former returns the lower bound while the latter returns end(). To fix this, associative::find() needs to be changed to something like this:

auto rc = std::lower_bound(this->internal_.begin(), this->internal_.end(), key, this->comp_);
return rc != this->internal_.end() && !this->comp_(key, rc->first) ? rc : this->internal_.end();

Now that we are more likely to compare apples to apples (I haven't verified that the logic is really correct now), let's go on to investigate the performance. I'm not quite convinced that the approach used to test performance really holds water, but I'm sticking with it for now, and I could definitely improve the performance of the associative container. I don't think I have found all the performance issues in the code, but I have at least made some progress.

The biggest one was noticing that the comparison function used in the associative container is pretty bad because it keeps making copies, which puts this container at a disadvantage. If you check the comparator now you probably won't see it, because it looks as if the comparator takes its arguments by reference! The issue is actually rather subtle: the underlying container has a value_type of std::pair<key_type, mapped_type>, but the comparator takes std::pair<key_type const, mapped_type> as argument! Since the argument types don't match exactly, every comparison converts the stored pair, copying the key. Fixing this gives the associative container quite a performance boost.

To implement a comparator class which has no opportunity to fail to match the arguments exactly, I'm using a simple helper to detect whether a type is a std::pair<L, R>:

template <typename>               struct is_pair                  { enum { value = false }; };
template <typename F, typename S> struct is_pair<std::pair<F, S>> { enum { value = true }; };

... and then I replaced the comparator with this, slightly more complicated, one:

class value_compare {
    key_compare pred_;
public:
    inline value_compare(key_compare pred = key_compare()) : pred_(pred) {}
    template <typename L, typename R>
    inline typename std::enable_if<is_pair<L>::value && is_pair<R>::value, bool>::type
    operator()(L const& left, R const& right) const {
        return pred_(left.first, right.first);
    }
    template <typename L, typename R>
    inline typename std::enable_if<is_pair<L>::value && !is_pair<R>::value, bool>::type
    operator()(L const& left, R const& right) const {
        return pred_(left.first, right);
    }
    template <typename L, typename R>
    inline typename std::enable_if<!is_pair<L>::value && is_pair<R>::value, bool>::type
    operator()(L const& left, R const& right) const {
        return pred_(left, right.first);
    }
    template <typename L, typename R>
    inline typename std::enable_if<!is_pair<L>::value && !is_pair<R>::value, bool>::type
    operator()(L const& left, R const& right) const {
        return pred_(left, right);
    }
    inline key_compare key_comp() const { return pred_; }
};

This generally brings the two approaches quite a bit closer together. Given that I would expect the std::vector<T>-with-lower_bound() approach to be a lot better than using std::map<K, T>, I feel the investigation isn't over yet.

Addendum:

Rethinking the exercise a bit more, I spotted why I felt uncomfortable with the implementation of the predicate class: it is way too complex! It can be done a lot more simply by not using std::enable_if for a change, which nicely reduces the code to something much easier to read. The key is to get the key:

template <typename Key>
Key const& get_key(Key const& value) { return value; }
template <typename Key, typename Value>
Key const& get_key(std::pair<Key, Value> const& pair) { return pair.first; }

With this implementation to get hold of a "key" from either a value or a pair of values, the predicate object can just define one very simple function call operator:

template <typename L, typename R>
bool operator()(L const& l, R const& r) const
{
    return this->pred_(get_key<key_type>(l), get_key<key_type>(r));
}

There is a slight trick in this as well, though: the expected key_type needs to be passed to the get_key() function. Without this the predicate wouldn't work in cases where the key_type is itself a std::pair<F, S> of objects.

Binary search of range in std::map slower than map::find() search of whole map

There is a reason that std::map::find() exists. The implementation already does a binary search, since std::map is implemented as a balanced binary tree.

Your implementation of binary search is much slower because it can't take advantage of that binary tree. To get to the middle of the map, you start with std::advance: it takes the first node (which is at a leaf of the tree) and navigates through several pointers towards what you consider to be the middle. Afterwards, you again need to go from one of these leaf nodes to the next, again following a lot of pointers.

The result: on top of a lot more looping, you get a lot of cache misses, especially when the map is large.

If you want to improve the lookups in your map, I would recommend using a different structure. When ordering isn't important, you could use std::unordered_map. When order is important, you could use a sorted std::vector<std::pair<Key, Value>>. If you have Boost available, this already exists in a class called boost::container::flat_map.

Why is vector faster than map in one test, but not the other?

I see some problems with the code ( http://ideone.com/41iKt ) you posted on ideone.com. (ideone actually shows vector as faster, but a local build with the Visual Studio 11 Developer Preview shows map faster.)

First I moved the map variable and used it to initialize the vector to get the same element ordering and uniquing, and then I gave lower_bound a custom comparator that only compares first, since that's what map will be doing. After these changes Visual Studio 11 shows the vector as faster for the same 100,000 elements (although the ideone time doesn't change significantly). http://ideone.com/b3OnH

With test_size reduced to 8, map is still faster. This isn't surprising, because this is how algorithmic complexity works: with small N, all the constants in the function that truly describes the run time matter. I had to raise test_size to about 2700 for vector to pull even with, and then ahead of, map on this system.

C++ STL Map vs Vector speed

You effectively have a number of alternatives.

Libraries exist:

  • Loki::AssocVector: the interface of a map implemented over a vector of pairs, faster than a map for small or frozen sets because of cache locality.
  • Boost.MultiIndex: provides both List with fast lookup and an example of implementing a MRU List (Most Recently Used) which caches the last accessed elements.

Critics

  • Map lookup and retrieval take O(log N), but the items may be scattered throughout memory, thus not playing well with caching strategies.
  • Vectors are more cache-friendly; however, unless you sort it, find will be O(N). Is that acceptable?
  • Why not use an std::unordered_map? It provides O(1) lookup and retrieval (though the constant may be high) and is certainly suited to this task. If you have a look at Wikipedia's article on hash tables, you'll realize that there are many strategies available and you can certainly pick one that suits your particular usage pattern.

Why is a binary search over a sorted vector slower than std::set::find?

For the updated question:

-O1 and -O2 gives the same performance for the two methods.

-Ox slows down the vector version.

To see why this is, you need to look at the disassembly or at the details of the -Ox level. It has nothing to do with the algorithmic properties of set::find and lower_bound/binary_search.

Regarding locality of data: a binary_search and a set::find have, for reasonable implementations, exactly the same data locality. The set might even win, with the data being read in a left-to-right fashion.

Searching std::map in O(n) for a partial key

In C++14 you can use a transparent comparator overload to search on a partial key:

struct CompareFirstTwo {
    using is_transparent = void;
    bool operator()(const tuple<A, B, C>& lhs, const tuple<A, B, C>& rhs) const ...
    bool operator()(const tuple<A, B>& lhs, const tuple<A, B, C>& rhs) const ...
    bool operator()(const tuple<A, B, C>& lhs, const tuple<A, B>& rhs) const ...
};

Use the comparator above in a call to equal_range to ignore the third field in the tuple.

How to find an element in a specified range in std::map?

You can use std::lower_bound, std::upper_bound or std::equal_range for that, as std::map iterators and the data in the map satisfy the requirements of those functions, though you should be aware that it will be less efficient than std::map::find() due to linear iterator increments.

From std::lower_bound documentation

The number of comparisons performed is logarithmic in the distance between first and last (at most log2(last - first) + O(1) comparisons). However, for non-LegacyRandomAccessIterators, the number of iterator increments is linear.

Emphasis is mine.

When to choose std::vector over std::map for key-value data?

I only rarely use std::vector with a linear search (except in conjunction with binary searching as described below). I suppose for a small enough amount of data it would be better, but with that little data it's unlikely that anything is going to provide a huge advantage.

Depending on usage pattern, a binary search on an std::vector can make sense though. A std::map works well when you need to update the data regularly during use. In quite a few cases, however, you load up some data and then you use the data -- but after you've loaded the data, it mostly remains static (i.e., it changes very little, if at all).

In this case, it can make a lot of sense to load the data into a vector, sort it if necessary, and then do binary searches on the data (e.g. std::lower_bound, std::equal_range). This gives pretty much the best of both worlds -- low-complexity binary searches and good cache usage from high locality of reference (i.e., the vector is contiguous, as opposed to the linked structure of a std::map). The shortcoming, of course, is that insertions and deletions are slow -- but this is one time I have used your original idea -- store newly inserted data separately until it reaches some limit, and only then sort it in with the rest of the data, so a single search consists of a binary search of the main body of the data, followed by a linear search of the (small amount) of newly inserted data.

Finding nearest point with lower-bound... but data is not sorted

Copy your vector into a std::set and then use the same logic as you did for the vector to find the element.


