Std::Vector Reserve() and Push_Back() Is Faster Than Resize() and Array Index, Why

std::vector reserve() and push_back() is faster than resize() and array index, why?

Does resize initialize the newly allocated vector where reserve just allocates but does not construct?

Yes.

Why is resizing a vector faster than reserve and push_back?

push_back is costlier operation than index based access even if allocation has been taken care of before hand by reserve.

  1. push_back will need to take care of end pointer so that vector size can be computed correctly
  2. push_back will check for realloaction need. Essentially a branch prediction.
  3. push_back will cause copy (or move) of value to be pushed back. In case of int, it shouldn't cause performance difference.

If you see assembly conversion (Taken from godbolt link given in question), index operation is non branching sequence of few moves and shift operation while push_back is much more involved. In long running loop (1000000 in given example), this difference will matter. Compiler optimization level can definitely impact the difference.

For index operator []

    push    rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov QWORD PTR [rbp-16], rsi
mov rax, QWORD PTR [rbp-8]
mov rax, QWORD PTR [rax]
mov rdx, QWORD PTR [rbp-16]
sal rdx, 2
add rax, rdx
pop rbp
ret

For push_back

    push    rbp
mov rbp, rsp
sub rsp, 16
mov QWORD PTR [rbp-8], rdi
mov QWORD PTR [rbp-16], rsi
mov rax, QWORD PTR [rbp-8]
mov rdx, QWORD PTR [rax+8]
mov rax, QWORD PTR [rbp-8]
mov rax, QWORD PTR [rax+16]
cmp rdx, rax
je .L73
mov rax, QWORD PTR [rbp-8] // When allocation is not needed
mov rcx, QWORD PTR [rax+8]
mov rax, QWORD PTR [rbp-8]
mov rdx, QWORD PTR [rbp-16]
mov rsi, rcx
mov rdi, rax
call void std::allocator_traits<std::allocator<int> >::construct<int, int const&>(std::allocator<int>&, int*, int const&)
mov rax, QWORD PTR [rbp-8]
mov rax, QWORD PTR [rax+8]
lea rdx, [rax+4]
mov rax, QWORD PTR [rbp-8]
mov QWORD PTR [rax+8], rdx
jmp .L75
.L73: // When allocation is needed
mov rax, QWORD PTR [rbp-8]
mov rdi, rax
call std::vector<int, std::allocator<int> >::end()
mov rcx, rax
mov rdx, QWORD PTR [rbp-16]
mov rax, QWORD PTR [rbp-8]
mov rsi, rcx
mov rdi, rax
.L75:
nop
leave
ret

What is better: reserve vector capacity, preallocate to size or push back in loop?

Better performance will be reached when avoiding dynamic reallocation, so try to have the vector memory be big enough to receive all elements.

Your first solution will be more efficient because if nSize is bigger than default vector capacity, the second one will need a reallocation to be able to store all elements.

As commented by Melkon, reserve is even better:

void myfnc_1(void *a_src, uint32_t a_segment) {
size_t nSize = GetSize();
std::vector<std::string> values;
values.reserve( nSize );
char* v = a_src;

for (size_t i = 0; i < nSize; ++i) {
values.push_back( std::string( v, a_segment ) );
v += a_segment;
}
}

Does declaring a vector with size offer any improvements over using push_back in C++

With

vector<int> Array(n);

you create a vector that contains n elements, all memory needed for those elements is allocated immediately.

When you use e.g.

Array.push_back(value);

then the vector needs to be resized, which could mean the memory have to be reallocated and all the contents have to be copied to the new memory.


Instead of creating an array with a set size, you could instead preallocate (or reserve) memory:

vector<int> Array;  // An empty vector
Array.reserve(n); // Reserve memory, but keep the size as zero (it's still empty)
Array.push_back(value); // No reallocation needed, size is now one

This is useful when you have a vector of objects that can't be default-constructed.

Important concepts to learn: The vector size and its capacity and what the difference between them is.

The capacity is the number of elements the vector have allocated memory for.

The size is the current number of elements in the vector.

It's quite common for the capacity to be different from the size. And it must always be true that capacity >= size.

why std::vector::push_back much slower than a manual implementation?

The difference is mainly due to less favorable codegen in the std::vector version. We can see it if we compare the generated assembly (godbolt link).

Your loop (skipping the reallocation part):

$LL4@test:
xor esi, esi
xor edi, edi
$LL7@test:
mov r15, rbx
cmp rdi, rbx
jne SHORT $LN27@test
<...skip...>
$LN27@test:
mov DWORD PTR [r14+rdi*4], esi
inc rdi
inc esi
cmp esi, 100000 ; 000186a0H
jl $LL7@test
sub r12, r13
jne $LL4@test

std::vector::push_back loop (again, skipping the reallocation part):

$LL4@test:
xor ebx, ebx
mov DWORD PTR i$1[rsp], ebx
$LL7@test:
cmp rcx, QWORD PTR v$[rsp+16]
je SHORT $LN26@test
mov DWORD PTR [rcx], ebx
mov rcx, QWORD PTR v$[rsp+8]
add rcx, 4
mov QWORD PTR v$[rsp+8], rcx
jmp SHORT $LN5@test
$LN26@test:
<...skip...>
$LN5@test:
inc ebx
mov DWORD PTR i$1[rsp], ebx
cmp ebx, 100000 ; 000186a0H
jl SHORT $LL7@test
mov rcx, QWORD PTR v$[rsp]
mov QWORD PTR v$[rsp+8], rcx
sub rdi, 1
jne SHORT $LL4@test

Clearly we can see more code (11 vs. 8 instructions in the hot path) and more indirection to memory (5 vs. 1). So it's no surprise that it's slower.

More generally, more complex code == slower code.

Could the two versions be optimized identically? I see no reason why not. MSVC 19.28 just can't do it.

Why push_back is slower than operator[] for a previously allocated vector

push_back does a bounds check. operator[] does not. So even if you've reserved the space, push_back is going to have an extra conditional check that operator[] will not have. Additionally, it will increase the size value (reserve only sets the capacity), so it will update that every time.

In short, push_back is doing more than what operator[] is doing - which is why it is slower (and more accurate).

Choice between vector::resize() and vector::reserve()

The two functions do vastly different things!

The resize() method (and passing argument to constructor is equivalent to that) will insert or delete appropriate number of elements to the vector to make it given size (it has optional second argument to specify their value). It will affect the size(), iteration will go over all those elements, push_back will insert after them and you can directly access them using the operator[].

The reserve() method only allocates memory, but leaves it uninitialized. It only affects capacity(), but size() will be unchanged. There is no value for the objects, because nothing is added to the vector. If you then insert the elements, no reallocation will happen, because it was done in advance, but that's the only effect.

So it depends on what you want. If you want an array of 1000 default items, use resize(). If you want an array to which you expect to insert 1000 items and want to avoid a couple of allocations, use reserve().

EDIT: Blastfurnace's comment made me read the question again and realize, that in your case the correct answer is don't preallocate manually. Just keep inserting the elements at the end as you need. The vector will automatically reallocate as needed and will do it more efficiently than the manual way mentioned. The only case where reserve() makes sense is when you have reasonably precise estimate of the total size you'll need easily available in advance.

EDIT2: Ad question edit: If you have initial estimate, then reserve() that estimate. If it turns out to be not enough, just let the vector do it's thing.

resize versus push_back in std::vector : does it avoid an unnecessary copy assignment?

At least with GCC, it doesn't matter which you use (Results below). However, if you get to the point where you are having to worry about it, you should be using pointers or (even better) some form of smart pointers.. I would of course recommend the ones in the boost library.

If you wanted to know which was better to use in practice, I would suggest either push_back or reserve as resize will resize the vector every time it is called unless it is the same size as the requested size. push_back and reserve will only resize the vector if needed. This is a good thing as if you want to resize the vector to size+1, it may already be at size+20, so calling resize would not provide any benefit.

Test Code

#include <iostream>
#include <vector>

class Elem{
public:
Elem(){
std::cout << "Construct\n";
}
Elem(const Elem& e){
std::cout << "Copy\n";
}
~Elem(){
std::cout << "Destruct\n";
}
};

int main(int argc, char* argv[]){
{
std::cout << "1\n";
std::vector<Elem> v;
v.push_back(Elem());
}

{
std::cout << "\n2\n";
std::vector<Elem> v;
v.resize(v.size()+1);
}
}

Test Output

1
Construct
Copy
Destruct
Destruct

2
Construct
Copy
Destruct
Destruct

std::vector::resize() vs. std::vector::reserve()

There are two different methods for a reason:

std::vector::reserve will allocate the memory but will not resize your vector, which will have a logical size the same as it was before.

std::vector::resize will actually modify the size of your vector and will fill any space with objects in their default state. If they are ints, they will all be zero.

After reserve, in your case, you will need a lot of push_backs to write to element 5.
If you don't wish to do that then in your case you should use resize.

One thing about reserve: if you then add elements with push_back, until you reach the capacity you have reserved, any existing references, iterators or pointers to data in your vector will remain valid. So if I reserve 1000 and my size is 5, the &vec[4] will remain the same until the vector has 1000 elements. After that, I can call push_back() and it will work, but the stored pointer of &vec[4] earlier may no longer be valid.

push_back is more efficient than emplace_back?

push_back is not more efficient, and the results you observe are due to the vector resizing itself.

When you call emplace after push_back, the vector has to resize itself to make room for the second element. This means that it has to move the A that was originally inside the vector, making emplace appear more complex.

If you reserve enough space in the vector beforehand, this doesn't happen. Notice the call to va.reserve(2) after va's creation:

#include <iostream>     
#include <vector>

class A
{
public:
A() {std::cout << "A const" << std::endl;}
~A() {std::cout << "A dest" << std::endl;}
A(const A& a) {std::cout << "A copy const" << std::endl;}
A(A&& a) {std::cout << "A move const" << std::endl;}
A& operator=(const A& a) {std::cout << "A copy operator=" << std::endl; return *this; }
A& operator=(A&& a) {std::cout << "A move operator=" << std::endl; return *this; }
};

int main () {
std::vector<A> va;
// Now there's enough room for two elements
va.reserve(2);
std::cout <<"push:" << std::endl;
va.push_back(A());
std::cout <<std::endl<< "emplace:" << std::endl;
va.emplace_back(A());

std::cout <<std::endl<< "end:" << std::endl;

return 0;
}

The corresponding output is:

push:
A const
A move const
A dest

emplace:
A const
A move const
A dest

end:
A dest
A dest

Can we make things even more efficient? Yes! emplace_back takes whatever arguments you provide it, and forwards them to A's constructor. Because A has a constructor that takes no arguments, you can also use emplace_back with no arguments! In other words, we change

va.emplace_back(A());

to

va.emplace_back(); // No arguments necessary since A is default-constructed

This results in no copy, and no move:

push:
A const
A move const
A dest

emplace:
A const

end:
A dest
A dest

A note on vectors resizing: It's important to note that the implementation of std::vector is smart. If A had been a trivially copyable type, std::vector might have been able resize in-place without additional copying using a system function similar to realloc. However because As constructors and destruction contain code, realloc can't be used here.



Related Topics



Leave a reply



Submit