How does the capacity of std::vector grow automatically? What is the rate?
The rate at which the capacity of a vector grows is implementation-dependent. Implementations almost invariably choose exponential growth, in order to meet the amortized constant time requirement for the push_back operation. What amortized constant time means, and how exponential growth achieves it, is interesting.
Every time a vector's capacity grows, the existing elements need to be copied (or moved, since C++11). If you 'amortize' this cost out over the lifetime of the vector, it turns out that growing the capacity by an exponential factor yields an amortized constant cost per insertion.
This probably seems a bit odd, so let me explain to you how this works...
- size: 1 capacity 1 - No elements have been copied, the cost per element for copies is 0.
- size: 2 capacity 2 - When the vector's capacity was increased to 2, the first element had to be copied. Average copies per element is 0.5
- size: 3 capacity 4 - When the vector's capacity was increased to 4, the first two elements had to be copied. Average copies per element is (2 + 1 + 0) / 3 = 1.
- size: 4 capacity 4 - Average copies per element is (2 + 1 + 0 + 0) / 4 = 3 / 4 = 0.75.
- size: 5 capacity 8 - Average copies per element is (3 + 2 + 1 + 1 + 0) / 5 = 7 / 5 = 1.4
- ...
- size: 8 capacity 8 - Average copies per element is (3 + 2 + 1 + 1 + 0 + 0 + 0 + 0) / 8 = 7 / 8 = 0.875
- size: 9 capacity 16 - Average copies per element is (4 + 3 + 2 + 2 + 1 + 1 + 1 + 1 + 0) / 9 = 15 / 9 ≈ 1.67
- ...
- size 16 capacity 16 - Average copies per element is 15 / 16 = 0.938
- size 17 capacity 32 - Average copies per element is 31 / 17 ≈ 1.82
As you can see, every time the capacity jumps, the number of copies goes up by the previous size of the array. But because the array has to double in size before the capacity jumps again, the number of copies per element always stays less than 2.
If you grew the capacity to 1.5 * N instead of 2 * N, you would end up with a very similar effect, except the upper bound on the copies per element would be higher: for a growth factor f the bound is f / (f - 1), which gives 3 for a factor of 1.5 versus 2 for doubling.
I suspect an implementation would choose 1.5 over 2 both to save a bit of space and because 1.5 is closer to the golden ratio. I have an intuition (currently not backed up by any hard data) that a growth rate in line with the golden ratio (because of its relationship to the Fibonacci sequence) will prove to be the most efficient growth rate for real-world loads in terms of minimizing both extra space used and time.
Why are a vector's size() and capacity() different after push_back()?
The Standard mandates that std::vector<T>::push_back() has amortized O(1) complexity. This means the expansion has to be geometric, say doubling the amount of storage each time it has been filled.
Simple example: sequentially push_back 32 ints into a std::vector<int>. You will store all of them once, and also do 31 copies if you double the capacity each time it runs out. Why 31? Before storing the 2nd element, you copy the 1st; before storing the 3rd, you copy elements 1-2; before storing the 5th, you copy 1-4; and so on. So you copy 1 + 2 + 4 + 8 + 16 = 31 times, with 32 stores.
Doing the formal analysis shows that you get O(N) stores and copies for N elements. This means amortized O(1) complexity per push_back (often only a store without a copy, sometimes a store and a sequence of copies).
Because of this expansion strategy, you will have size() < capacity() most of the time. Look up shrink_to_fit and reserve to learn how to control a vector's capacity in a more fine-grained manner.
Note: with a geometric growth rate, any factor larger than 1 will do, and there have been some studies claiming that 1.5 gives better performance because of less wasted memory (at some point the reallocation can reuse previously freed blocks).
What does std::vector look like in memory?
It roughly looks like this (excuse my MS Paint masterpiece):
The std::vector instance you have on the stack is a small object containing a pointer to a heap-allocated buffer, plus some extra variables to keep track of the size and capacity of the vector.
So it seems as though when I push_back() to the numbers vector, its older elements change their location.
The heap-allocated buffer has a fixed capacity. When you reach the end of the buffer, a new buffer will be allocated somewhere else on the heap and all the previous elements will be moved into the new one. Their addresses will therefore change.
Does it maybe store them together, but move them all together when more space is needed?
Roughly, yes. Iterator and address stability of elements is guaranteed with std::vector only if no reallocation takes place.
I am aware that std::vector is a contiguous container only since C++17
The memory layout of std::vector hasn't changed since its first appearance in the Standard. ContiguousContainer is just a "concept" that was added to differentiate contiguous containers from others at compile-time.
Remaining memory before relocating a vector in C++
std::vector has methods for this case:
- capacity() returns the number of elements for which memory is currently allocated. If the number of elements in the vector would exceed this number, reallocation happens.
- reserve() allocates memory for the given number of elements. If the given number is not greater than capacity(), it does nothing; otherwise, it reallocates to provide the required capacity.
Initial capacity of vector in C++
The standard doesn't specify what the initial capacity of a container should be, so you're relying on the implementation. A common implementation will start the capacity at zero, but there's no guarantee. On the other hand, there's no way to improve on the strategy of std::vector<int> iv; iv.reserve(2345); so stick with it.