Vector or Map, Which One to Use

vector or map, which one to use?

I presume you're comparing map<A, B> with vector<pair<A, B> >.

Firstly, finding an item in a very small vector can easily be faster than the same thing in a map, because all the memory in a vector is always contiguous (and so plays more nicely with computers' caches and such things), and the number of comparisons needed to find something in a vector might be about the same as for a map. Finding an element in a map needs fewer operations in the limit of very large containers.

The point where maps become faster than vectors depends on the implementation, on your processor, what data is in the map, and subtle things like what memory is in the processor's cache. Typically, the point where map becomes faster would be about 5-30 elements.

An alternative is to use a hash container. They are often named hash_map or unordered_map. Classes named hash_map are not part of the official standard (and there are a few variants out there); std::tr1::unordered_map is. A hash map is often faster than a normal map for lookups, regardless of how many elements are in it, but whether it is actually faster depends on what the key is, how it is hashed, what values you have to deal with, and how the key is compared in std::map. It doesn't keep things in a specific order like std::map, but you've said that you don't care about that. I'd recommend hash maps particularly if the keys are integers or pointers, because these hash very quickly.
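For illustration, here is a minimal sketch contrasting the two (using std::unordered_map, the C++11 name for tr1's unordered_map; the integer keys and sample values are just for demonstration):

#include <iostream>
#include <map>
#include <unordered_map>

int main()
{
    // Ordered map: O(log n) lookups, keys kept in sorted order.
    std::map<int, const char*> ordered{{1, "one"}, {2, "two"}, {3, "three"}};

    // Hash map: O(1) expected lookups, no ordering guarantee.
    // Integer keys like these hash very quickly.
    std::unordered_map<int, const char*> hashed(ordered.begin(), ordered.end());

    std::cout << ordered.find(2)->second << '\n';  // "two"
    std::cout << hashed.find(2)->second << '\n';   // "two"
}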

vector vs map performance confusion

I found the slides for easier reference (I can't see the graphs, but I guess that might be because of the proprietary file format). The relevant slide is number 39, which describes the problem being solved:

§ Generate N random integers and insert them into a sequence so that each is inserted in its proper position in the numerical order.

§ Remove elements one at a time by picking a random position in the sequence and removing the element there.
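For concreteness, a vector-based version of this benchmark could look roughly like the following sketch (the choice of N, std::rand and std::lower_bound is mine, not taken from the slides):

#include <algorithm>
#include <cstdlib>
#include <vector>

int main()
{
    const int N = 20000;
    std::vector<int> seq;

    // Phase 1: insert each random integer at its proper sorted position.
    for (int i = 0; i < N; ++i) {
        int v = std::rand();
        seq.insert(std::lower_bound(seq.begin(), seq.end(), v), v);
    }

    // Phase 2: remove elements one at a time from random positions.
    while (!seq.empty())
        seq.erase(seq.begin() + std::rand() % seq.size());
}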

Now, it should be rather obvious that a linked list is not a good choice for this problem. Even though a list is much better than a vector for inserting/removing at the beginning or in the middle, it's not good for inserting/removing at a random position because of the need for a linear search. And linear search is much faster with vectors because of better cache efficiency.

Sutter suggests that a map (or a tree in general) would seem a natural choice for this algorithm because you get O(log n) search. And indeed, it does beat the vector quite easily for large N values in the insertion part.

Here comes the but. You need to remove the nth element (for random n). This is where I believe your code is cheating. You remove the elements in the order they were inserted, effectively using the input vector as a lookup table for finding the value of an element at a "random" position so that you can search for it in O(log n). So you're really using a combination of a set and a vector to solve the problem.

A regular binary search tree such as the one used for std::map or std::set (which I assume Sutter used) doesn't have a fast algorithm for finding the nth element. Here's one which is claimed to be O(log n) on average and O(n) in the worst case. But std::map and std::set don't provide access to the underlying tree structure, so for those you're stuck with in-order traversal (correct me if I'm wrong), which is a linear search again! I'm actually surprised that the map version is competitive with the vector one in Sutter's results.

For O(log n) complexity, you need a structure such as an order statistic tree, which is unfortunately not provided by the standard library. There's the GNU policy-based STL map as shown here.
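As a sketch, an order statistic tree via GCC's policy-based data structures extension (__gnu_pbds; a GCC-specific, non-standard header) might look like this:

#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <functional>
#include <iostream>

// Red-black tree augmented with subtree sizes, giving O(log n)
// find_by_order (nth smallest element) and order_of_key (rank of a key).
typedef __gnu_pbds::tree<int, __gnu_pbds::null_type, std::less<int>,
                         __gnu_pbds::rb_tree_tag,
                         __gnu_pbds::tree_order_statistics_node_update> ost;

int main()
{
    ost t;
    for (int v : {5, 1, 9, 3}) t.insert(v);

    std::cout << *t.find_by_order(2) << '\n'; // 3rd smallest element: 5
    std::cout << t.order_of_key(9) << '\n';   // number of keys smaller than 9: 3
    t.erase(t.find_by_order(1));              // erase the 2nd smallest (3)
}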

Here is some quick test code I made for vector vs. set vs. ost (vs. vector with binary search for good measure): https://ideone.com/DoqL4H
The set is much slower, while the other tree-based structure is faster than the vector, which is not in line with Sutter's results.

order statistics tree: 15.958ms
vector binary search: 99.661ms
vector linear search: 282.032ms
set: 2004.04ms

(N = 20000; the difference is only going to grow in favor of the ost with bigger values)

In short, I came to the same conclusion that Sutter's original results seem odd, but for a slightly different reason. It seems to me that better asymptotic complexity wins over lower constant factors this time.

Note that the problem description doesn't exclude the possibility of duplicate random values, so using map / set instead of multimap / multiset is cheating a bit in favor of the map / set, but I assume that has only minor significance when the value domain is much larger than N. Also, pre-reserving the vector doesn't improve the performance significantly (around 1% when N = 20000).

advantages of std::set vs vectors or maps

Both std::set and std::map are associative containers. The difference is that a std::set contains only the key, while a std::map stores an associated value alongside it. Choosing one over the other depends mainly on the task at hand. If you want to build a dictionary of all the words that appear in a text, you could use a std::set<std::string>, but if you also want to count how many times each word appears (i.e., associate a value with the key), then you would need a std::map<std::string, int>. If you don't need to associate that count, carrying around an unnecessary int makes no sense.
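A short sketch of that dictionary example (the sample sentence is mine):

#include <iostream>
#include <map>
#include <set>
#include <sstream>
#include <string>

int main()
{
    std::istringstream text("the cat sat on the mat");

    std::set<std::string> words;        // key only: which words occurred
    std::map<std::string, int> counts;  // key + value: how often each occurred

    std::string w;
    while (text >> w) {
        words.insert(w);
        ++counts[w];  // operator[] default-constructs the int to 0 first
    }

    std::cout << "distinct words: " << words.size() << '\n';         // 5
    std::cout << "\"the\" occurred " << counts["the"] << " times\n"; // 2
}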

When to choose std::vector over std::map for key-value data?

I only rarely use std::vector with a linear search (except in conjunction with binary searching as described below). I suppose for a small enough amount of data it would be better, but with that little data it's unlikely that anything is going to provide a huge advantage.

Depending on the usage pattern, though, a binary search on a std::vector can make sense. A std::map works well when you need to update the data regularly during use. In quite a few cases, however, you load up some data and then use it -- and after you've loaded the data, it remains mostly static (i.e., it changes very little, if at all).

In this case, it can make a lot of sense to load the data into a vector, sort it if necessary, and then do binary searches on the data (e.g., std::lower_bound, std::equal_range). This gives pretty much the best of both worlds -- low-complexity binary searches and good cache usage from high locality of reference (i.e., the vector is contiguous, as opposed to the linked structure of a std::map). The shortcoming, of course, is that insertions and deletions are slow -- but this is one time I have used your original idea: store newly inserted data separately until it reaches some limit, and only then sort it in with the rest of the data. A single search then consists of a binary search of the main body of the data, followed by a linear search of the (small amount of) newly inserted data.
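A minimal sketch of that scheme (the class name and merge threshold are illustrative, not from any library):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// A sorted main body plus a small unsorted buffer of recent insertions.
template <typename K, typename V>
class SortedLookup {
    std::vector<std::pair<K, V>> sorted_;  // bulk of the data, kept sorted by key
    std::vector<std::pair<K, V>> recent_;  // newly inserted data, unsorted
    static const std::size_t kMergeThreshold = 64;

public:
    void insert(const K& k, const V& v) {
        recent_.emplace_back(k, v);
        if (recent_.size() > kMergeThreshold) {
            // Fold the buffer into the main body and re-sort.
            sorted_.insert(sorted_.end(), recent_.begin(), recent_.end());
            std::sort(sorted_.begin(), sorted_.end(),
                      [](const std::pair<K, V>& a, const std::pair<K, V>& b) {
                          return a.first < b.first;
                      });
            recent_.clear();
        }
    }

    const V* find(const K& k) const {
        // Binary search over the sorted main body...
        auto it = std::lower_bound(sorted_.begin(), sorted_.end(), k,
                                   [](const std::pair<K, V>& p, const K& key) {
                                       return p.first < key;
                                   });
        if (it != sorted_.end() && it->first == k)
            return &it->second;
        // ...then a linear scan of the small recent buffer.
        for (const auto& p : recent_)
            if (p.first == k)
                return &p.second;
        return nullptr;
    }
};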

C++ STL Map vs Vector speed

You effectively have a number of alternatives.

Libraries exist:

  • Loki::AssocVector: the interface of a map implemented over a vector of pairs, faster than a map for small or frozen sets because of cache locality.
  • Boost.MultiIndex: provides both a list with fast lookup and an example of implementing an MRU (Most Recently Used) list, which caches the most recently accessed elements.

Criticisms

  • Map lookup and retrieval take O(log N), but the items may be scattered throughout memory, thus not playing well with caching strategies.
  • Vectors are more cache-friendly; however, unless you sort it, you'll have O(N) performance on find. Is that acceptable?
  • Why not use an unordered_map? They provide O(1) lookup and retrieval (though the constant may be high) and are certainly suited to this task. If you have a look at Wikipedia's article on hash tables, you'll realize that there are many strategies available, and you can certainly pick one that suits your particular usage pattern.

What's the difference between 2d vector and map of vector?

They are fundamentally different. While you may be able to do both g2[0] and g1[0], the behavior is vastly different. Assume there is nothing at index 0: std::map will default-construct a new value_type, in this case a vector, and return a reference to it, whereas std::vector has undefined behavior, typically either segfaulting or returning garbage.
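A small sketch of that difference, assuming g1 is the std::map<int, std::vector<int>> and g2 the 2D vector from the question:

#include <iostream>
#include <map>
#include <vector>

int main()
{
    std::map<int, std::vector<int>> g1;
    std::vector<std::vector<int>> g2;

    g1[0].push_back(42);  // OK: the map default-constructs an empty vector at key 0
    std::cout << g1[0].size() << '\n';  // 1

    // g2[0].push_back(42);  // undefined behavior: g2 is empty, index 0 is out of range
    g2.resize(1);            // the element must exist first
    g2[0].push_back(42);     // now well-defined
}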

They are also completely different in terms of memory layout. While std::map is backed by a red-black tree, std::vector is contiguous in memory. So inserting into the map will always result in a dynamic allocation somewhere in memory, whereas the vector only reallocates when its current capacity is exceeded. Note, however, that a vector of vectors is not contiguous in memory. The outer vector, which is itself contiguous in memory, is made up of vectors that look roughly like this in terms of data:

struct vector
{
    T* data;          // points to a separate heap allocation holding the elements
    size_t capacity;  // how many elements fit before a reallocation is needed
    size_t size;      // how many elements are currently in use
};

Each of the inner vectors owns its own dynamic memory allocation at data.

The advantage of the map is that it does not have to be densely populated, i.e., you can have something at index 0 and at index 12902 without all the stuff in between, plus it is sorted. If you don't need the sorted property and can use C++11, consider std::unordered_map. The vector is always densely populated, i.e., at size 10000, elements 0-9999 all exist.
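For example, a quick sketch of the sparse vs. dense difference:

#include <map>
#include <vector>

int main()
{
    // Sparse: only the two inserted keys are stored, and they stay sorted.
    std::map<int, int> sparse;
    sparse[0]     = 1;
    sparse[12902] = 2;

    // Dense: the vector must allocate every slot up to the largest index.
    std::vector<int> dense(12903);
    dense[0]     = 1;
    dense[12902] = 2;
}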

std::map int, int vs. vector of vector

I need a container to store a value (int) according to two attributes,
source (int) and destination (int)

std::map<std::pair<int, int>, int>

A subsequent process needs to check this container, to see if an
element exists in for a particular source, and a particular
destination - it will need to differentiate between an empty 'space'
and a filled one.

std::map::find

http://www.cplusplus.com/reference/map/map/find/

The container has the possibility of being very sparse.

Use a std::map. The "correct" choice of a container is based on how you need to find things and how you need to insert/delete things. If you want to find things fast, use a map.
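Putting those pieces together, a minimal sketch (the sample coordinates and values are mine):

#include <iostream>
#include <map>
#include <utility>

int main()
{
    // (source, destination) -> value; empty "spaces" simply have no entry.
    std::map<std::pair<int, int>, int> grid;

    grid[{3, 7}] = 42;  // fill one (source, destination) slot

    auto it = grid.find({3, 7});
    if (it != grid.end())
        std::cout << "found: " << it->second << '\n';  // found: 42

    if (grid.find({1, 2}) == grid.end())
        std::cout << "(1, 2) is an empty space\n";     // not present
}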


