Why Is Iterating Through `std::vector` Faster Than Iterating Through `std::array`?

Why is iterating over a std::set so much slower than over a std::vector?

Isn't a set just a structured vector that checks whether each item is unique upon insertion?

No, far from it. These data structures are completely different, and the main distinction is memory layout: std::vector stores its elements in one contiguous block of memory, while std::set is a node-based container in which every element is allocated separately and lives at its own place in memory, possibly far from the others, and certainly in a way that makes it impossible for the processor to prefetch data for fast traversal. For std::vector it's quite the opposite: since the next element always sits right next to the current one in memory, the CPU can load upcoming elements into its cache ahead of time, and when it actually processes them it only has to reach into the cache to retrieve the values - which is very fast compared to a RAM access.
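As a rough illustration, here is a minimal sketch (the element count and the timing helper are mine, not from the question) that makes the layout difference visible by summing the same values from both containers:

#include <chrono>
#include <iostream>
#include <numeric>
#include <set>
#include <vector>

int main() {
    const int N = 5'000'000;
    std::vector<int> vec(N);
    std::iota(vec.begin(), vec.end(), 0);      // 0, 1, 2, ...
    std::set<int> s(vec.begin(), vec.end());   // same values, node-based

    auto time_sum = [](const auto& c) {
        auto t0 = std::chrono::steady_clock::now();
        long long sum = 0;
        for (int x : c) sum += x;              // sequential traversal
        auto t1 = std::chrono::steady_clock::now();
        std::cout << "sum=" << sum << ", "
                  << std::chrono::duration<double, std::milli>(t1 - t0).count()
                  << " ms\n";
    };

    time_sum(vec);   // contiguous: cache- and prefetch-friendly
    time_sum(s);     // pointer chasing between scattered nodes
}

Expect the vector traversal to win by a large factor; the exact ratio depends on the hardware.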

Note that it's a common need to have a sorted, unique collection of data that is laid out contiguously in memory, and the committee eventually addressed it: P1222's flat_set was adopted into C++23 as std::flat_set.
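For example, assuming a C++23 standard library that already ships the <flat_set> header, usage looks like this small sketch:

#include <flat_set>   // C++23, adopted from P1222; needs a recent standard library
#include <iostream>

int main() {
    // Sorted, unique, and contiguous underneath (backed by std::vector by default).
    std::flat_set<int> s{3, 1, 4, 1, 5};   // duplicate 1 is dropped, order kept sorted

    s.insert(2);
    for (int x : s)
        std::cout << x << ' ';             // prints: 1 2 3 4 5
    std::cout << '\n';
}

The trade-off is the mirror image of std::set: iteration and lookup are cache-friendly, but insertion into the middle shifts elements, so it suits read-mostly workloads.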

Matt Austern's "Why you shouldn't use set (and what you should use instead)" is an interesting read, too.

Why is iterating through `std::vector` faster than iterating through `std::array`?

The difference is due to the memory pages backing the array not being resident in the process's address space: a global-scope array is stored in the .bss section of the executable and is zero-initialized lazily, so its pages haven't been paged in yet. The vector, on the other hand, has just been allocated and zero-filled, so its memory pages are already present.

If you add

std::fill_n(v.data(), n, 1); // included in <algorithm>

as the first line of main to bring the pages in (pre-fault them), the array's time becomes the same as the vector's.


On Linux, you can instead call mlock(v.data(), v.size() * sizeof(v[0])); to bring the pages into the address space. See man mlock for full details.
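Putting the pieces together, here is a minimal sketch of the pre-fault idea (the array name and size are hypothetical):

#include <algorithm>
#include <array>
#include <chrono>
#include <iostream>

constexpr std::size_t n = 10'000'000;
std::array<int, n> v{};   // global: lives in .bss, paged in lazily on first touch

int main() {
    // Pre-fault: writing every element forces the kernel to map the pages now,
    // so the timed loop below measures iteration, not page faults.
    std::fill_n(v.data(), n, 1);

    auto t0 = std::chrono::steady_clock::now();
    long long sum = 0;
    for (int x : v) sum += x;
    auto t1 = std::chrono::steady_clock::now();

    std::cout << "sum=" << sum << ", "
              << std::chrono::duration<double, std::milli>(t1 - t0).count()
              << " ms\n";
}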

What's faster, iterating an STL vector with vector::iterator or with at()?

Why not write a test and find out?

Edit: My bad - I thought I was timing the optimised version but wasn't. On my machine, compiled with g++ -O2, the iterator version is slightly slower than the operator[] version, but probably not significantly so.

#include <vector>
#include <iostream>
#include <ctime>
using namespace std;

int main() {
    const int BIG = 20000000;
    vector<int> v;
    for (int i = 0; i < BIG; i++) {
        v.push_back(i);
    }

    // Time the iterator version.
    time_t now = time(0);
    cout << "start" << endl;
    int n = 0;
    for (vector<int>::iterator it = v.begin(); it != v.end(); ++it) {
        n += *it;
    }
    cout << time(0) - now << endl;

    // Time the operator[] version.
    now = time(0);
    for (size_t i = 0; i < v.size(); ++i) {
        n += v[i];
    }
    cout << time(0) - now << endl;

    // Use n so the compiler can't discard the loops entirely.
    return n != 0;
}

std::list vs std::vector iteration

The efficiency gains from cache locality due to the compact representation of data structures can be rather dramatic. In the case of vectors compared to lists, the compact representation can be better not just for reading but even for inserting elements (despite the shifting a vector must do), up to the order of 500K elements on one particular architecture, as demonstrated in Figure 3 of this article by Bjarne Stroustrup:

http://www2.research.att.com/~bs/Computer-Jan12.pdf

(Publisher site: http://www.computer.org/portal/web/csdl/doi/10.1109/MC.2011.353)

I think that if this is a critical factor for your program, you should profile it on your architecture.
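If you want to reproduce the flavor of Stroustrup's experiment yourself, here is a minimal sketch (element count and random values are arbitrary) that inserts values into their sorted position, using a linear search for both containers so the comparison is fair:

#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <list>
#include <vector>

// Insert n random values into a container, keeping it sorted. The list only
// splices in a node while the vector shifts elements, yet the linear search
// dominates, and searching contiguous memory is far cheaper.
template <typename Container>
double timed_sorted_insert(int n) {
    Container c;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        int value = std::rand();
        auto pos = std::find_if(c.begin(), c.end(),
                                [value](int x) { return x >= value; });
        c.insert(pos, value);
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const int n = 20'000;
    std::cout << "vector: " << timed_sorted_insert<std::vector<int>>(n) << " ms\n";
    std::cout << "list:   " << timed_sorted_insert<std::list<int>>(n) << " ms\n";
}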

Is std::vector so much slower than plain arrays?

Using the following:

g++ -O3 Time.cpp -I <MyBoost>

./a.out

UseArray completed in 2.196 seconds

UseVector completed in 4.412 seconds

UseVectorPushBack completed in 8.017 seconds

The whole thing completed in 14.626 seconds

So array is twice as quick as vector.

But after looking at the code in more detail this is expected: you run across the vector twice and the array only once. Note: when you resize() the vector you are not only allocating the memory but also running through it and calling the constructor on each member.

Re-arranging the code slightly so that the vector initializes each object only once:

    std::vector<Pixel> pixels(dimensions * dimensions, Pixel(255, 0, 0));

Now doing the same timing again:

g++ -O3 Time.cpp -I <MyBoost>

./a.out

UseVector completed in 2.216 seconds

The vector now performs only slightly worse than the array. IMO this difference is insignificant and could be caused by a whole bunch of things not associated with the test.

I would also point out that you are not correctly initializing/destroying the Pixel objects in the UseArray() method, as neither the constructor nor the destructor is called. That may not be an issue for this simple class, but anything slightly more complex (i.e. with pointers, or members with pointers) will cause problems.
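To make that concrete: with raw storage you must run the constructors and destructors yourself. A minimal sketch (the Pixel layout is assumed from the question):

#include <cstdlib>   // malloc/free
#include <new>       // placement new

struct Pixel {
    unsigned char r, g, b;
    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b) {}
    ~Pixel() {}      // imagine this released a resource
};

int main() {
    const std::size_t n = 1000;
    // Raw storage: no Pixel constructor has run yet.
    Pixel* pixels = static_cast<Pixel*>(std::malloc(n * sizeof(Pixel)));

    for (std::size_t i = 0; i < n; ++i)
        new (pixels + i) Pixel(255, 0, 0);   // placement new: construct in place

    // ... use pixels ...

    for (std::size_t i = 0; i < n; ++i)
        pixels[i].~Pixel();                  // explicit destructor call
    std::free(pixels);
}

This bookkeeping is exactly what std::vector does on your behalf.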

Why is std::vector slower than an array?

Let us observe how GCC optimizes this test program:

#include <vector>

int main()
{
    int len = 800000;
    int* Data = new int[len];

    int arr[3] = { 255, 0, 0 };
    std::vector<int> vec = { 255, 0, 0 };

    for (int i = 0; i < len; i++) {
        Data[i] = vec[0];
    }
    for (int i = 0; i < len; i++) {
        Data[i] = arr[0];
    }
    delete[] Data;
}

The compiler rightly notices that the vector's contents are constant and eliminates the vector itself. Exactly the same code is generated for both loops, so it should be irrelevant whether the first loop uses the array or the vector:

.L2:
movups XMMWORD PTR [rcx], xmm0
add rcx, 16
cmp rsi, rcx
jne .L2

What makes the difference in your test program is the order of the loops: the comments point out that when a third loop is added to the beginning, both loops take the same time.

I would expect that with a modern compiler, accessing a vector is approximately as fast as accessing an array when optimization is enabled and debugging is disabled. If there is an observable difference in your actual program, the problem lies somewhere else.
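To illustrate the loop-order point, here is a sketch of the test with a warm-up pass added (the timing helper is mine, not from the original): faulting in Data's pages first means neither timed loop pays the first-touch cost, so both should report similar times.

#include <chrono>
#include <iostream>
#include <vector>

int main() {
    const int len = 800'000;
    int* Data = new int[len];

    int arr[3] = { 255, 0, 0 };
    std::vector<int> vec = { 255, 0, 0 };

    // Warm-up: touch every element so the timed loops see resident pages.
    for (int i = 0; i < len; i++) Data[i] = 0;

    auto time_loop = [&](int value) {
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < len; i++) Data[i] = value;
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::micro>(t1 - t0).count();
    };

    std::cout << "vector source: " << time_loop(vec[0]) << " us\n";
    std::cout << "array source:  " << time_loop(arr[0]) << " us\n";

    delete[] Data;
}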

Speed accessing a std::vector by iterator vs by operator[]/index?

The performance difference is likely negligible or nonexistent (the compiler might optimise them to be identical); you should worry about other things, like whether your program is correct (a slow but correct program is better than a fast but incorrect one). There are other advantages to using iterators, though, such as being able to change the underlying container to one with no operator[] without modifying your loops. See this question for more.
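For instance, a loop written against iterators (or a range-for, which uses them under the hood) keeps working unchanged if the container becomes a std::list; a minimal sketch:

#include <iostream>
#include <list>
#include <vector>

// Works for any container exposing begin()/end(); no operator[] required.
template <typename Container>
long long sum(const Container& c) {
    long long total = 0;
    for (const auto& x : c)
        total += x;
    return total;
}

int main() {
    std::vector<int> v{1, 2, 3};
    std::list<int>   l{1, 2, 3};
    std::cout << sum(v) << ' ' << sum(l) << '\n';   // prints: 6 6
}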

const_iterators will most likely have no, or negligible, performance difference compared to ordinary iterators. They are designed to improve the correctness of your program by preventing you from modifying things that shouldn't be modified, not for performance. The same goes for the const keyword in general.

In short, optimisation should not be a concern of yours until two things have happened: 1) you have noticed it runs too slowly and 2) you have profiled the bottlenecks. For 1), if it ran ten times slower than it could, but is only ever run once and takes 0.1ms, who cares? For 2), make sure it's definitely the bottleneck, otherwise optimising it will have nearly no measurable effect on performance!

Why is it so slow iterating over a big std::list?

Lists have terrible (nonexistent) cache locality. Every node is a separate memory allocation and may be anywhere, so every time you follow a pointer from one node to the next, you jump to a new, unrelated place in memory. And yes, that hurts performance quite a bit: a cache miss may be two orders of magnitude slower than a cache hit. In a vector or deque, pretty much every access will be a cache hit. A vector is one single contiguous block of memory, so iterating over it is as fast as you're going to get. A deque is several smaller blocks of memory, so it introduces the occasional cache miss, but they'll still be rare, and iteration will still be very fast because you're getting mostly cache hits.

A list will be almost all cache misses. And performance will suck.

In practice, a linked list is hardly ever the right choice from a performance point of view.

Edit:
As a comment pointed out, another problem with lists is data dependencies. A modern CPU likes to overlap operations. But it can't do that if the next instruction depends on the result of this one.

If you're iterating over a vector, that's no problem. The next address to read can be computed on the fly, without ever having to touch memory: if you're reading at address x now, then the next element is located at address x + sizeof(T), where T is the element type. So there are no dependencies, and the CPU can start loading the next element, or the one after it, immediately, while still processing an earlier element. That way the data will be ready when we need it, which further helps mask the cost of accessing data in RAM.

In a list, we need to follow a pointer from node i to node i+1, and until i+1 has been loaded, we don't even know where to look for i+2. We have a data dependency, so the CPU is forced to read nodes one at a time, and it can't start reading future nodes ahead of time, because it doesn't yet know where they are.

If list traversal weren't almost all cache misses, this wouldn't be a big problem; but since we're getting a lot of cache misses, these delays are costly.
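The dependency is easy to see even without the containers; a minimal sketch contrasting an address the CPU can compute ahead of time with one it must load first:

#include <cstddef>
#include <iostream>
#include <vector>

struct Node { int value; Node* next; };

// Linear scan: the address of element i+1 is p + i + 1, known immediately,
// so loads for future elements can be issued while this one is in flight.
long long sum_array(const int* p, std::size_t n) {
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += p[i];
    return total;
}

// Pointer chase: the address of the next node is stored *inside* the current
// node, so each load must complete before the next one can even begin.
long long sum_list(const Node* head) {
    long long total = 0;
    for (const Node* p = head; p != nullptr; p = p->next)
        total += p->value;
    return total;
}

int main() {
    std::vector<int> data{1, 2, 3, 4};
    std::vector<Node> nodes(4);
    for (std::size_t i = 0; i < 4; ++i)
        nodes[i] = { static_cast<int>(i + 1), i + 1 < 4 ? &nodes[i + 1] : nullptr };
    std::cout << sum_array(data.data(), data.size()) << ' '
              << sum_list(&nodes[0]) << '\n';        // prints: 10 10
}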


