std::vector versus std::array in C++

std::vector is a template class that encapsulates a dynamic array [1], stored on the heap, that grows and shrinks automatically as elements are added or removed. It provides all the hooks (begin(), end(), iterators, etc.) that make it work well with the rest of the STL. It also has several useful methods for operations that would be cumbersome on a normal array, such as inserting elements in the middle of a vector (it handles all the work of moving the following elements behind the scenes).

Since it stores the elements in memory allocated on the heap, it has some overhead compared to static arrays.
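
To make the growing and the middle insertion concrete, here is a minimal sketch. The helper names make_sequence and insert_at are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds {0, 1, ..., n-1} one element at a time.
// The vector enlarges its heap buffer on its own whenever it runs out of room.
std::vector<int> make_sequence(std::size_t n) {
    std::vector<int> v;  // starts empty
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
    }
    return v;
}

// Hypothetical helper: inserts `value` before index `pos`. The vector shifts
// the elements from `pos` onward one slot to the right by itself.
void insert_at(std::vector<int>& v, std::size_t pos, int value) {
    assert(pos <= v.size());
    v.insert(v.begin() + static_cast<std::ptrdiff_t>(pos), value);
}
```

Calling insert_at(v, 2, 99) on {0, 1, 2, 3} gives {0, 1, 99, 2, 3}; doing the same on a plain array means moving the tail by hand.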

std::array is a template class that encapsulates a statically-sized array, stored inside the object itself, which means that, if you instantiate the class on the stack, the array itself will be on the stack. Its size has to be known at compile time (it's passed as a template parameter), and it cannot grow or shrink.

It's more limited than std::vector, but it's often more efficient, especially for small sizes, because in practice it's mostly a lightweight wrapper around a C-style array. It's safer than a C-style array, though, since the implicit conversion to pointer is disabled, and it provides much of the STL-related functionality of std::vector and of the other containers, so you can use it easily with STL algorithms and the like. Still, because its size is fixed, it's much less flexible than std::vector.
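
A small sketch of what "works with STL algorithms" looks like for std::array. The alias Scores and the two helper functions are invented for the example.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>

// Hypothetical alias: the size (5) is part of the type and fixed at compile time.
using Scores = std::array<int, 5>;

static_assert(std::tuple_size<Scores>::value == 5, "size is a compile-time constant");

// Sorts a copy with std::sort, exactly as one would sort a vector.
Scores sorted_copy(Scores s) {
    std::sort(s.begin(), s.end());
    assert(std::is_sorted(s.begin(), s.end()));
    return s;
}

// Sums the elements with std::accumulate.
int sum(const Scores& s) {
    return std::accumulate(s.begin(), s.end(), 0);
}

// Note: `int* p = some_scores;` does not compile, because std::array has no
// implicit conversion to a pointer; use some_scores.data() when one is needed.
```

Passing Scores by value copies all five elements, which a C-style array parameter would not do (it would decay to a pointer instead).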

For an introduction to std::array, have a look at this article; for a quick introduction to std::vector and to the operations that are possible on it, you may want to look at its documentation.




  1. Actually, I think that in the standard vectors are described in terms of the maximum complexity of the different operations (e.g. random access in constant time, iteration over all the elements in linear time, addition and removal of elements at the end in amortized constant time, etc.), but AFAIK there's no way of fulfilling such requirements other than using a dynamic array. As stated by @Lucretiel, the standard actually requires that the elements are stored contiguously, so it is a dynamic array, stored where the associated allocator puts it.

Using arrays or std::vectors in C++, what's the performance gap?

Using C++ arrays with new (that is, using dynamic arrays) should be avoided. You have to keep track of the size yourself, and you need to delete them manually and do all sorts of housekeeping.

Using arrays on the stack is also discouraged because you don't have range checking, and passing the array around loses any information about its size (array-to-pointer conversion). Use std::array in that case (boost::array before C++11), which wraps a C++ array in a small class and provides a size function and iterators to iterate over it.
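
The size loss and the missing range check can be sketched as follows. The function names sum_c, sum_std and is_out_of_range are made up for this example, and it uses std::array, the standard counterpart of boost::array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// A C array argument decays to a pointer, so the length must travel separately
// and nothing stops the caller from passing the wrong one.
int sum_c(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// std::array carries its length in its type, so the callee can ask for size().
template <std::size_t N>
int sum_std(const std::array<int, N>& a) {
    int total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) total += a[i];
    return total;
}

// at() is the range-checked accessor: it throws instead of reading out of bounds.
template <std::size_t N>
bool is_out_of_range(const std::array<int, N>& a, std::size_t i) {
    try {
        (void)a.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Note that operator[] on std::array is still unchecked, just like on a raw array; only at() adds the check.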

Now the std::vector vs. native C++ arrays (taken from the internet):

// Comparison of assembly code generated for basic indexing, dereferencing, 
// and increment operations on vectors and arrays/pointers.

// Assembly code was generated by gcc 4.1.0 invoked with g++ -O3 -S on a
// x86_64-suse-linux machine.

#include <vector>

struct S
{
    int padding;

    std::vector<int> v;
    int * p;
    std::vector<int>::iterator i;
};

int pointer_index (S & s) { return s.p[3]; }
// movq 32(%rdi), %rax
// movl 12(%rax), %eax
// ret

int vector_index (S & s) { return s.v[3]; }
// movq 8(%rdi), %rax
// movl 12(%rax), %eax
// ret

// Conclusion: Indexing a vector is the same damn thing as indexing a pointer.

int pointer_deref (S & s) { return *s.p; }
// movq 32(%rdi), %rax
// movl (%rax), %eax
// ret

int iterator_deref (S & s) { return *s.i; }
// movq 40(%rdi), %rax
// movl (%rax), %eax
// ret

// Conclusion: Dereferencing a vector iterator is the same damn thing
// as dereferencing a pointer.

void pointer_increment (S & s) { ++s.p; }
// addq $4, 32(%rdi)
// ret

void iterator_increment (S & s) { ++s.i; }
// addq $4, 40(%rdi)
// ret

// Conclusion: Incrementing a vector iterator is the same damn thing as
// incrementing a pointer.

Note: If you allocate arrays with new and the elements are non-class objects (like plain int) or classes without a user-defined constructor, and you don't want the elements initialized up front, new-allocated arrays can have a performance advantage, because std::vector initializes all elements to default values (0 for int, for example) on construction (credits to @bernie for reminding me).
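
The difference in initialization can be sketched like this. The helper names are invented for the example, and std::unique_ptr<int[]> is used here only so the new-allocated array gets deleted automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// std::vector<int>(n) value-initializes its elements: every int starts at 0.
std::vector<int> zeroed(std::size_t n) {
    return std::vector<int>(n);
}

// new int[n] leaves the ints uninitialized, so no time is spent zeroing them.
// Reading an element before writing it would be undefined behaviour.
std::unique_ptr<int[]> uninitialized(std::size_t n) {
    return std::unique_ptr<int[]>(new int[n]);
}

// Writes every element, so the buffer is safe to read afterwards.
void fill_iota(int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) data[i] = static_cast<int>(i);
}
```

Whether skipping the zero-fill is measurable depends on the array size and on what happens next; if every element is overwritten right away, the saving is at most one pass over the memory.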

std::vector vs std::array vs normal array

push_back takes a long time compared to arr[i]=x;.

Sorry but you are showing your lack of experience with vectors here, because your examples do two different things.

You are comparing something like this code

vector<int> vec;          // vector created with size zero
for (...)
    vec.push_back(x);     // vector size increases

with this code

int arr[N];
for (...)
    arr[i] = x;

The difference is that in the first case the vector has size 0 and its size increases as you add items to it (this takes extra time), but in the second case the array starts out at its final size. With an array this is how it must be, but with vectors you have a choice. If you know what the final size of the vector is, you should code it like this

vector<int> vec(N);       // vector created at size N, note use () not []
for (...)
    vec[i] = x;

That is the code you should be comparing with the array code for efficiency.

You might also want to research the resize and reserve methods of a vector. Vectors (if nothing else) are much more flexible than arrays.
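
As a starting point for that research, here is a rough sketch of the two methods side by side. The function names are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// reserve(n): capacity grows but size stays 0, so elements are still added
// with push_back. No reallocation happens until size would exceed capacity.
std::vector<int> fill_with_reserve(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    assert(v.capacity() >= n && v.empty());
    const int* before = v.data();
    for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<int>(i));
    assert(v.data() == before);  // the buffer never moved
    return v;
}

// resize(n): size becomes n (new ints are zero), so v[i] is valid immediately.
std::vector<int> fill_with_resize(std::size_t n) {
    std::vector<int> v;
    v.resize(n);
    for (std::size_t i = 0; i < n; ++i) v[i] = static_cast<int>(i);
    return v;
}
```

Both end up with the same contents; reserve avoids the repeated reallocations of a naive push_back loop, while resize behaves like constructing the vector at size N.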

C++ performance std::array vs std::vector

C++ aliasing rules don't let the compiler prove that glob[i] += stuff doesn't modify one of the elements of const vec v1 {1.0,-1.0,1.0}; or v2.

const on a std::vector means the "control block" pointers can be assumed not to be modified after it's constructed, but the memory is still dynamically allocated, and all the compiler knows is that it effectively has a const double * in static storage.

Nothing in the std::vector implementation lets the compiler rule out some other non-const pointer pointing into that storage. For example, the double *data in the control block of glob.

C++ doesn't provide a way for library implementers to give the compiler the information that the storage for different std::vectors doesn't overlap. They can't use __restrict (even on compilers that support that extension) because that could break programs that take the address of a vector element. See the C99 documentation for restrict.


But with const arr a1 {1.0,-1.0,1.0}; and a2, the doubles themselves can go in read-only static storage, and the compiler knows this. Therefore it can evaluate comb(a1[0],a2[0]); and so on at compile time. In @Xirema's answer, you can see the asm output loads constants .LC1 and .LC2. (Only two constants because both a1[0]+a2[0] and a1[2]+a2[2] are 1.0+1.0. The loop body uses xmm2 as a source operand for addsd twice, and the other constant once.)


But couldn't the compiler still do the sums once outside the loop at runtime?

No, again because of potential aliasing. It doesn't know that stores into glob[i+0..3] won't modify the contents of v1[0..2], so it reloads from v1 and v2 every time through the loop after the store into glob.

(It doesn't have to reload the vector<> control block pointers, though, because type-based strict aliasing rules let it assume that storing a double doesn't modify a double*.)

The compiler could have checked that glob.data() + 0 .. N-3 didn't overlap with either of v1/v2.data() + 0 .. 2, and made a different version of the loop for that case, hoisting the three comb() results out of the loop.

This is a useful optimization that some compilers do when auto-vectorizing if they can't prove lack of aliasing. It's clearly a missed optimization in your case that gcc doesn't check for overlap, because doing so would make the function run much faster. But the question is whether the compiler could reasonably guess that it was worth emitting asm that checks at runtime for overlap and has 2 different versions of the same loop. With profile-guided optimization, it would know the loop is hot (runs many iterations) and would be worth spending extra time on. But without that, the compiler might not want to risk bloating the code too much.

ICC19 (Intel's compiler) in fact does do something like that here, but it's weird: if you look at the beginning of assemble_vec (on the Godbolt compiler explorer), it loads the data pointer from glob, then adds 8 and subtracts the pointer again, producing a constant 8. Then it branches at runtime on 8 > 784 (not taken) and then -8 < 784 (taken). It looks like this was supposed to be an overlap check, but it maybe used the same pointer twice instead of v1 and v2? (784 = 8*100 - 16 = sizeof(double)*N - 16)

Anyway, it ends up running the ..B2.19 loop that hoists all 3 comb() calculations, and interestingly does 2 iterations at once of the loop with 4 scalar loads and stores to glob[i+0..4], and 6 addsd (scalar double) add instructions.

Elsewhere in the function body, there's a vectorized version that uses 3x addpd (packed double), just storing / reloading 128-bit vectors that partially overlap. This will cause store-forwarding stalls, but out-of-order execution may be able to hide that. It's just really weird that it branches at runtime on a calculation that will produce the same result every time, and never uses that loop. Smells like a bug.


If glob[] had been a static array, you'd still have had a problem, because the compiler can't know that v1/v2.data() aren't pointing into that static array.

I thought if you accessed it through double *__restrict g = &glob[0];, there wouldn't have been a problem at all. That will promise the compiler that g[i] += ... won't affect any values that you access through other pointers, like v1[0].

In practice, that does not enable hoisting of comb() for gcc, clang, or ICC -O3. But it does for MSVC. (I've read that MSVC doesn't do type-based strict aliasing optimizations, but it's not reloading glob.data() inside the loop so it has somehow figured out that storing a double won't modify a pointer. But MSVC does define the behaviour of *(int*)my_float for type-punning, unlike other C++ implementations.)

For testing, I put this on Godbolt

//__attribute__((noinline))
void assemble_vec()
{
    double *__restrict g = &glob[0];   // Helps MSVC, but not gcc/clang/ICC
    // std::vector<double> &g = glob;  // actually hurts ICC it seems?
    // #define g glob                  // so use this as the alternative to __restrict
    for (size_t i=0; i<N-2; ++i)
    {
        g[i] += comb(v1[0],v2[0]);
        g[i+1] += comb(v1[1],v2[1]);
        g[i+2] += comb(v1[2],v2[2]);
    }
}

We get this from MSVC outside the loop:

    movsd   xmm2, QWORD PTR [rcx]       # v2[0]
    movsd   xmm3, QWORD PTR [rcx+8]
    movsd   xmm4, QWORD PTR [rcx+16]
    addsd   xmm2, QWORD PTR [rax]       # += v1[0]
    addsd   xmm3, QWORD PTR [rax+8]
    addsd   xmm4, QWORD PTR [rax+16]
    mov     eax, 98                     ; 00000062H

Then we get an efficient-looking loop.

So this is a missed optimization for gcc/clang/ICC.
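
If the compiler won't hoist the three sums, you can do it by hand. The following is only a sketch under assumptions: comb() is taken to be a plain addition (as the asm suggests), the function name assemble_hoisted is made up, and the vectors are passed as parameters instead of being globals. It is equivalent to the original loop only when glob does not share storage with v1 or v2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed stand-in for the question's comb(); the real one may differ.
inline double comb(double a, double b) { return a + b; }

// Reads v1/v2 once before the loop. c0..c2 are locals whose address is never
// taken, so stores into glob cannot change them and nothing is reloaded.
void assemble_hoisted(std::vector<double>& glob,
                      const std::vector<double>& v1,
                      const std::vector<double>& v2) {
    assert(v1.size() >= 3 && v2.size() >= 3);
    const double c0 = comb(v1[0], v2[0]);
    const double c1 = comb(v1[1], v2[1]);
    const double c2 = comb(v1[2], v2[2]);
    const std::size_t n = glob.size();
    for (std::size_t i = 0; i + 2 < n; ++i) {  // same range as i < N-2
        glob[i]     += c0;
        glob[i + 1] += c1;
        glob[i + 2] += c2;
    }
}
```

Writing the loop bound as i + 2 < n rather than i < n - 2 also avoids unsigned wrap-around when glob has fewer than two elements.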

What is the difference between std::array and std::vector? When do you use one over other?

std::array is just a class version of the classic C array. That means its size is fixed at compile time and it will be allocated as a single chunk (e.g. taking space on the stack). The advantage it has is slightly better performance because there is no indirection between the object and the arrayed data.

std::vector is a small class containing pointers into the heap. (So a non-empty std::vector gets its storage from its allocator, which by default calls new.) It is slightly slower to access because those pointers have to be followed to reach the arrayed data. In exchange, it can be resized, and it takes only a trivial amount of stack space no matter how large it is.

As for when to use one over the other, honestly std::vector is almost always what you want. Creating large objects on the stack is generally frowned upon, and the extra level of indirection is usually irrelevant. (For example, if you iterate through all of the elements, the extra memory access only happens once at the start of the loop.)

The vector's elements are guaranteed to be contiguous, so you can pass &vec[0] to any function expecting a pointer to an array; e.g., C library routines. (As an aside, std::vector<char> buf(8192); is a great way to allocate a local buffer for calls to read/write or similar without directly invoking new.)
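
A short sketch of the local-buffer idea, using std::snprintf as the C routine so the example runs without any file or socket. The function name format_value and the buffer size of 64 are arbitrary choices for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats an int through a C API that expects a char* and a length.
// The vector owns the heap buffer and frees it when the function returns.
std::string format_value(int value) {
    std::vector<char> buf(64);
    // buf.data() is the C++11 spelling of &buf[0].
    const int written = std::snprintf(buf.data(), buf.size(), "value=%d", value);
    assert(written >= 0 && static_cast<std::size_t>(written) < buf.size());
    return std::string(buf.data(), static_cast<std::size_t>(written));
}
```

The same pattern works for read() or fread(): pass buf.data() and buf.size(), then use only as many bytes as the call reports.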

That said, the lack of that extra level of indirection, plus the compile-time constant size, can make std::array significantly faster for a very small array that gets created/destroyed/accessed a lot.

So my advice would be: Use std::vector unless (a) your profiler tells you that you have a problem and (b) the array is tiny.

c++11 std::array vs static array vs std::vector

First things first: if you are going to learn C++, learn C++11. The previous C++ standard was released in 2003, meaning it's already ten years old. That's a lot in the IT world. C++11 skills will also translate smoothly to the upcoming C++1y (most probably C++14) standard.

The main difference between std::vector and std::array is dynamic versus static storage (in both size and allocation). So if you want to have a matrix class that's always, say, 4x4, std::array<float, 4*4> will do just fine.

Both of these classes provide a .data() member, which should produce a compatible pointer. Note, however, that std::vector<std::vector<float>> will NOT occupy a contiguous 16*sizeof(float) block of memory (so v[0].data() won't work). If you need a dynamically sized matrix, use a single vector and resize it to width*height elements.

Since access to the elements will be a bit harder (v[width * y + x] or v[height * x + y]), you might want to provide a wrapper class that lets you access any element by its row/column pair.
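
Such a wrapper could look roughly like this. The class name Matrix, its members, and the choice of row-major layout (the v[width * y + x] variant) are all assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: a rows x cols matrix stored in one contiguous vector.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0f) {}

    // Row-major indexing: element (row, col) lives at row * cols + col.
    float& operator()(std::size_t row, std::size_t col) {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }
    float operator()(std::size_t row, std::size_t col) const {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

    // One pointer to the whole rows * cols block, usable by C-style APIs.
    const float* data() const { return data_.data(); }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<float> data_;
};
```

With this, m(1, 2) = 5.0f reads like 2D indexing while the storage stays a single allocation.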

Since you've also mentioned C-style arrays: std::array provides a nicer interface to the same type of storage, and thus should be preferred; there's nothing to gain from static arrays over std::array.

Should I use std::vector instead of array

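One interesting thing to note is that while many std::vector operations invalidate iterators, std::array iterators are never invalidated during the array's lifetime. Note: after std::swap on two std::arrays, an iterator still points to the same position in the same array (so the value it refers to changes), whereas after swapping two vectors an iterator follows its element into the other container.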
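
The swap behaviour is easy to check with a tiny experiment. The two function names are invented for this sketch.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// Swapping vectors exchanges their heap buffers, so an iterator taken from `a`
// keeps referring to the same element, which now belongs to `b`.
int value_seen_after_vector_swap() {
    std::vector<int> a{1, 2, 3}, b{4, 5, 6};
    std::vector<int>::iterator it = a.begin();
    std::swap(a, b);
    assert(it == b.begin());
    return *it;
}

// Swapping arrays exchanges the elements themselves, so the iterator keeps
// referring to the first slot of `a`, which now holds the value from `b`.
int value_seen_after_array_swap() {
    std::array<int, 3> a = {{1, 2, 3}}, b = {{4, 5, 6}};
    std::array<int, 3>::iterator it = a.begin();
    std::swap(a, b);
    assert(it == a.begin());
    return *it;
}
```

The first function returns 1 and the second returns 4. A side effect of the same difference: swapping vectors is constant time, while swapping arrays is linear in their length.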

See more:
http://en.cppreference.com/w/cpp/container/array

Good summary of advantages of arrays:
https://stackoverflow.com/a/4004027/7537900

This point seemed most interesting:

fixed-size arrays can be embedded directly into a struct or object,
which can improve memory locality and reducing the number of heap
allocations needed

Not having tested that, I'm not sure it's actually true though.
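
The "embedded directly" part, at least, can be checked without a benchmark. The sketch below uses two made-up structs and a made-up helper, points_inside, to test whether the element storage lies within the bytes of the enclosing object. It says nothing about how much faster either layout is.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical structs: same logical contents, different storage for pos.
struct ParticleA { int id; std::array<float, 3> pos; };
struct ParticleV { int id; std::vector<float> pos; };

// True if address p falls within the bytes occupied by obj.
template <typename T>
bool points_inside(const T& obj, const void* p) {
    assert(p != nullptr);
    const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(&obj);
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    return addr >= begin && addr < begin + sizeof(T);
}
```

For a ParticleA the floats sit inside the struct, so a vector of ParticleA is one contiguous block; for a ParticleV each object holds only the vector's bookkeeping and the floats live in a separate heap allocation.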

Here is a discussion of 2D vectors vs. arrays in the context of competitive programming on CodeChef:
https://discuss.codechef.com/questions/49278/whether-to-use-arrays-or-vectors-in-c

Apparently memory in a 2D vector (a vector of vectors) is contiguous only within each row, not across rows, whereas a 2D array is contiguous in both dimensions.

std::array vs C-array vs std::vector

From the GCC documentation about Optimization:

Without any optimization option, the compiler's goal is to reduce the
cost of compilation and to make debugging produce the expected
results. Statements are independent: if you stop the program with a
breakpoint between statements, you can then assign a new value to any
variable or change the program counter to any other statement in the
function and get exactly the results you expect from the source code.

Without the -O option, GCC does not care about performance. It cares about debugging using gdb or a similar tool. To actually compile programs that run as fast as possible, you should use the -O options, for example -O2.

std::vector vs std::array performance

I changed your code to this:

std::array<double, 1000000> array;

double total = 0;
std::fill(array.begin(), array.end(), 0.);

for (unsigned j = 0; j < 1000; ++j)
{
    auto start = std::chrono::high_resolution_clock::now();

    for (unsigned i = 0; i < array.size(); i++)
    {
        array[i] += 1.;
    }

    auto end = std::chrono::high_resolution_clock::now();
    // Accumulate over all 1000 runs rather than keeping only the last one.
    total += std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

std::cout << total << " for Array." << std::endl;

std::vector<double> vector(1000000, 0.);
total = 0;

for (unsigned j = 0; j < 1000; ++j)
{
    auto start = std::chrono::high_resolution_clock::now();

    for (unsigned i = 0; i < vector.size(); i++)
    {
        vector[i] += 1.;
    }

    auto end = std::chrono::high_resolution_clock::now();
    // Accumulate over all 1000 runs rather than keeping only the last one.
    total += std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

std::cout << total << " for Vector." << std::endl;

My results using -O3:

8123 for Array.
8117 for Vector.

Seems to me that both are equally fast.


