Benefits of Using Reserve() in a Vector - C++

It's useful if you have an idea how many elements the vector will ultimately hold - it can help the vector avoid repeatedly allocating memory (and having to move the data to the new memory).

In general it's an optimization that you probably shouldn't need to worry about, and it's not harmful either (at worst you end up wasting memory if you overestimate).

One area where it can be more than an optimization is when you want to ensure that existing iterators do not get invalidated by adding new elements.

For example, a push_back() call may invalidate existing iterators into the vector (if a reallocation occurs). However, if you've reserved enough capacity, you can ensure that no reallocation will occur. This is a technique that doesn't need to be used very often, though.
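A minimal sketch of the idea (the counts here are arbitrary):

#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(100);            // guarantee capacity for 100 elements
    v.push_back(1);

    auto it = v.begin();       // iterator to the first element
    for (int i = 0; i < 99; ++i)
        v.push_back(i);        // size() never exceeds capacity(): no reallocation

    return *it;                // 'it' is still valid; without the reserve() it might not be
}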

Why is it useful to use std::vector::reserve?

You use reserve() when you know that at least n elements will go into the vector. Growing the vector's storage is a costly operation: new memory is allocated, the old contents are copied (or moved) into it, and the old storage is freed. If you know up front that you will get at least 10000 elements, it's better to reserve that size instead of letting the vector reallocate memory more times than necessary.

In simple words, it's all about efficiency.
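As a sketch, using the 10000-element figure from above:

#include <vector>

std::vector<int> make_data() {
    std::vector<int> v;
    v.reserve(10000);              // one allocation up front
    for (int i = 0; i < 10000; ++i)
        v.push_back(i);            // no reallocations inside the loop
    return v;
}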

Choice between vector::resize() and vector::reserve()

The two functions do vastly different things!

The resize() method (and passing a size argument to the constructor, which is equivalent) inserts or deletes the appropriate number of elements to bring the vector to the given size (it has an optional second argument to specify their value). It affects size(): iteration will go over all those elements, push_back() will insert after them, and you can access them directly using operator[].

The reserve() method only allocates memory and leaves it uninitialized. It affects only capacity(); size() is unchanged. The objects have no values, because nothing has been added to the vector. If you then insert elements, no reallocation will happen, because the allocation was done in advance, but that's the only effect.

So it depends on what you want. If you want an array of 1000 default items, use resize(). If you want an array to which you expect to insert 1000 items and want to avoid a couple of allocations, use reserve().
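To make the difference concrete, a small sketch:

vector<int> a, b;

a.resize(1000);        // a.size() == 1000, elements value-initialized
a[0] = 42;             // fine: element 0 exists

b.reserve(1000);       // b.size() == 0, b.capacity() >= 1000
// b[0] = 42;          // undefined behaviour: there is no element 0
b.push_back(42);       // correct way to add elements after reserve()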

EDIT: Blastfurnace's comment made me read the question again and realize that in your case the correct answer is: don't preallocate manually. Just keep inserting the elements at the end as you need; the vector will automatically reallocate as needed, and will do it more efficiently than the manual way mentioned. The only case where reserve() makes sense is when you have a reasonably precise estimate of the total size, easily available in advance.

EDIT2: Regarding the question edit: if you have an initial estimate, then reserve() that estimate. If it turns out not to be enough, just let the vector do its thing.

Vector: initialization or reserve?

Both variants have different semantics, i.e. you are comparing apples and oranges.

The first gives you a vector of n value-initialized elements (zero for built-in types like int); the second variant reserves memory for n elements but does not create any.

Choose whichever better fits your needs; what is "better" depends on the situation.
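In code, the two variants look like this:

vector<int> a(1000);   // size() == 1000, each element value-initialized (0 for int)
vector<int> b;
b.reserve(1000);       // size() == 0, capacity() >= 1000, no elements yet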

In C++, is it beneficial if each thread allocates the memory that it will later (in a different parallel region) write to?

Allocation is done in parallel (and with an appropriate version of malloc it might be faster)

Each thread works with data it allocated itself, which would be faster to access (I have heard this from a colleague, but have no idea if that is correct).

Yes, but most allocators do not scale well in parallel. The less scalable allocators use a centralized allocation policy protected by a giant lock, which results in slower execution due to the overhead of the lock (and an effect called cache-line bouncing). The best allocators try to perform local allocation when possible, but they often need to update some atomic counters, and atomic operations do not scale well. Jemalloc, for example, is known to scale quite well and uses an arena-based allocation policy that benefits from thread-local allocations. A relatively recent update of the default glibc allocator should also benefit from thread-local allocations. AFAIK, the Windows allocator does not scale well.

Creating a parallel region has an overhead that might affect the performance.

Yes. In fact, malloc is generally pretty fast, since it makes only a few system calls (see sbrk or mmap) to get blocks of memory that are then split up and tracked. Allocations on mainstream x86-64 processors take no more than a few dozen nanoseconds. The overhead of initializing the OpenMP fork-join section and distributing the work is generally a few microseconds (this is very dependent on the runtime and the scheduling policy, though). As a result, you need to perform a lot of allocations with a scalable allocator for the parallelization to be useful. In the end, the benefit of doing that should be negligible compared to the computation time (and it can cause more harm than good). If your computation is malloc-bound, that is really bad, and you should redesign the code to reduce allocations in the first place.
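For reference, a minimal sketch of the two-region pattern being discussed (the sizes and the work per element are made up for illustration):

#include <omp.h>
#include <vector>

int main() {
    const int n = omp_get_max_threads();
    std::vector<std::vector<int>> data(n);

    // First parallel region: each thread allocates its own sub-vector.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
        data[i].resize(1000000);

    // Second parallel region: with the same static schedule, iteration i
    // should land on the thread that allocated data[i].
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
        for (int& x : data[i])
            x = i;
}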

a) Are these pros and cons correct or incorrect?

Overall, yes.

Are there other benefits/disadvantages in using parallel vs serial allocation here?

One important point to consider is false sharing: if two variables are stored in the same cache line and are modified by two threads, the program can be slowed down by a cache-line bouncing effect due to cache coherence. As a result, a vector<vector<int>> can be an issue if the properties of each sub-vector are modified in parallel by multiple threads (like the size, for example), since these attributes may be stored on the same cache line as a few others. Thread-local allocations can help to reduce this effect if the allocator permits it. This is also true for the integers in the vector, but the effect should be negligible assuming the vectors are big enough.

Do "number of threads" or "theSize" affect the pros/cons?

Yes. The overhead of the OpenMP section will be bigger with more threads, and the allocator will scale less well with more threads. As a result, the benefit of using this strategy on many-core architectures is small.


Actually, vector<vector<int>> is known to be a pretty inefficient data structure due to the number of allocations done, and also due to the memory fragmentation/diffusion caused by the allocations of the sub-vectors. If you want to deal with matrices, please do not do that; use a flattened 2D array instead. If you deal with jagged arrays of fixed size, consider a flattened array that is the concatenation of all the jagged arrays, with an additional array storing the start/end of each slice. This is far more cache-friendly, since the data is stored contiguously in memory, and it drastically reduces the number of allocations. The only case where this structure is a poor fit is when the jagged arrays are (quite frequently) dynamically resized.
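A minimal sketch of the flattened jagged-array layout (the names here are made up for illustration):

#include <cstddef>
#include <vector>

// All rows concatenated into one contiguous buffer; offsets[i] marks
// where row i starts and offsets[i + 1] where it ends, so 'offsets'
// has one more entry than there are rows.
struct FlatJagged {
    std::vector<int>         values;
    std::vector<std::size_t> offsets;

    int* row_begin(std::size_t i) { return values.data() + offsets[i]; }
    int* row_end(std::size_t i)   { return values.data() + offsets[i + 1]; }
};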


Update in response to the edit:

The main question for me is this: if thread i is working (reading/writing) on data[i] and data[i] only, would it affect the performance whether data[i] is allocated by thread i or by a different thread (e.g. the main thread in a non-parallel region)?

Yes, the i-th thread will certainly impact the performance of others (typically the (i-1)-th and/or the (i+1)-th). The main issue is the frequent change of the vector length, which should cause false sharing. This depends on the size of a std::vector on the target platform, but it is usually 24 bytes (at least for GCC, Clang and ICC), so a few of them can be stored on the same cache line (64 bytes on x86-64 platforms). In addition, doing many push_back calls causes the allocator to be quite saturated (depending on the allocator; see the explanation above). A few large vector resizes are generally enough to saturate the memory bandwidth (though this depends on the actual platform, the vector size, and the proportion of resizes compared to computation). In practice, one has to benchmark to be sure.

You can avoid false sharing with some padding. I also advise you to use a better-suited data structure that reduces memory bandwidth usage, like a std::deque (or even a simple vector of raw buffers). But this depends on the actual operation computed. Sometimes it is faster to compute a problem using multiple passes (e.g. one to compute the size of the items and one to actually perform the operation). But again, this depends on the actual problem being solved.
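As a sketch of the padding idea, each thread's slot can be aligned to a cache line (64 bytes is assumed here; over-aligned allocation in standard containers requires C++17):

#include <vector>

// Each slot occupies its own 64-byte cache line, so two threads
// updating neighbouring slots no longer share a line.
struct alignas(64) PaddedSlot {
    std::vector<int> v;    // ~24 bytes on typical implementations, padded to 64
};

std::vector<PaddedSlot> data(8);    // e.g. one slot per thread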

Should I worry about memory fragmentation with std::vector?

The answer to your worries may be std::deque. It gives you a similar interface to std::vector's, but works better with fragmented memory, since it allocates several small arrays instead of one large one. It is actually less efficient than std::vector in some respects, but for your case it may be a good trade-off.
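Since std::deque offers nearly the same interface, switching is usually a drop-in change:

#include <deque>

std::deque<int> d;       // grows in fixed-size chunks; elements never move on growth
d.push_back(1);          // same usage as std::vector (but no reserve()/capacity())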

What is the advantage of using vector<char> as an input buffer over a char array?

A vector<char> is essentially just a managed character array.

So you can write:

{
    vector<char> buf(4096);
    ...
    // recv() takes (socket, buffer, length, flags); read into the unfilled tail
    int result = recv(fd, &buf[received_so_far], buf.size() - received_so_far, 0);
    ...
}

The vector "knows" its size, so you can use buf.size() everywhere and never have to worry about overrunning your buffer. You can also change the size in the declaration and have it take effect everywhere without any messy #defines.

This use of buf will allocate the underlying array on the heap, and it will free it automatically when buf goes out of scope, no matter how that happens (e.g. exceptions or early return). So you get the nice semantics of stack allocation while still keeping large objects on the heap.

You can use buf.swap(other) to hand ownership of the underlying character array to another vector<char> very efficiently. (This is a good idea for network traffic... Modern networks are fast. The last thing you want to do is create yet another copy of every byte you receive from the network.) And you still do not have to worry about explicitly freeing the memory.
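For example (since C++11, std::move expresses the same transfer):

std::vector<char> incoming(4096);
// ... fill 'incoming' from the socket ...
std::vector<char> queued;
queued.swap(incoming);   // O(1): internal pointers are exchanged, no bytes copied
// or: std::vector<char> queued = std::move(incoming);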

Those are the big advantages that come to mind off the top of my head for this particular application.

Is it possible to store a function reference in an unordered_map or vector?

Regardless of whether this is something you should be doing or not (the comments under your question cover this already), it's still worth answering your question as asked.

The [] operator of map types is a veritable swiss army knife. You can do a lot of things with it.

  • You can assign a value.
  • You can look up a value.
  • You can look up a value even if it doesn't exist yet.

Because of this, using that operator imposes some requirements on whatever type is stored in the map. In this specific case, you are running into the requirement that the type has to be value-initializable, which references cannot be.

This applies to regular references as well by the way, not just function references.

So the answer is: You can store references in a map as much as you like as long as you don't use any function of the map that requires the reference to do something it's not allowed to.

In this case, you can use umap.at(0)(10.1). One of the big differences between [] and at() is that if the key is not set yet, at() will throw an exception instead of creating a value-initialized value.
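A minimal sketch of that usage; std::reference_wrapper is used here as the portable way to store the reference, and print_double is a made-up example function:

#include <cstdio>
#include <functional>
#include <unordered_map>

void print_double(double x) { std::printf("%f\n", x); }

int main() {
    // Like a plain reference, std::reference_wrapper is not
    // value-initializable, so operator[] would not compile,
    // but emplace() and at() work fine.
    std::unordered_map<int, std::reference_wrapper<void(double)>> umap;
    umap.emplace(0, print_double);
    umap.at(0)(10.1);    // calls print_double(10.1)
}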

Strange behaviour when vector elements are set using indexing

std::vector::reserve() is not the same as std::vector::resize().

b.reserve(a.size()) does not change b's size. You'll need to use b.resize(a.size()) for that.


vector<int> a{1,3,6,10,56,9};
vector<int> b;
b.resize(a.size());

or create b with the right size to start with.

vector<int> a{1,3,6,10,56,9};
vector<int> b(a.size());

