How does the capacity of std::vector grow automatically? What is the rate?
The rate at which the capacity of a vector grows is implementation-dependent. Implementations almost invariably choose exponential growth, in order to meet the amortized constant time requirement for the push_back operation. What amortized constant time means, and how exponential growth achieves it, is interesting.
Every time a vector's capacity grows, the existing elements need to be copied (or moved, since C++11). If you 'amortize' this cost out over the lifetime of the vector, it turns out that growing the capacity by an exponential factor yields an amortized constant cost per insertion.
This probably seems a bit odd, so let me explain to you how this works...
- size: 1 capacity 1 - No elements have been copied, the cost per element for copies is 0.
- size: 2 capacity 2 - When the vector's capacity was increased to 2, the first element had to be copied. Average copies per element is 0.5
- size: 3 capacity 4 - When the vector's capacity was increased to 4, the first two elements had to be copied. Average copies per element is (2 + 1 + 0) / 3 = 1.
- size: 4 capacity 4 - Average copies per element is (2 + 1 + 0 + 0) / 4 = 3 / 4 = 0.75.
- size: 5 capacity 8 - Average copies per element is (3 + 2 + 1 + 1 + 0) / 5 = 7 / 5 = 1.4
- ...
- size: 8 capacity 8 - Average copies per element is (3 + 2 + 1 + 1 + 0 + 0 + 0 + 0) / 8 = 7 / 8 = 0.875
- size: 9 capacity 16 - Average copies per element is (4 + 3 + 2 + 2 + 1 + 1 + 1 + 1 + 0) / 9 = 15 / 9 ≈ 1.67
- ...
- size 16 capacity 16 - Average copies per element is 15 / 16 = 0.938
- size 17 capacity 32 - Average copies per element is 31 / 17 ≈ 1.82
As you can see, every time the capacity jumps, the number of copies goes up by the previous size of the array. But because the array has to double in size before the capacity jumps again, the number of copies per element always stays less than 2.
If you grew the capacity to 1.5 * N instead of 2 * N, you would end up with a very similar effect, except the upper bound on the copies per element would be higher: for a growth factor f the bound is f / (f - 1), which gives 3 for a factor of 1.5 versus 2 for doubling.
I suspect an implementation would choose 1.5 over 2 both to save a bit of space and because 1.5 is closer to the golden ratio. I have an intuition (currently not backed up by any hard data) that a growth rate in line with the golden ratio (because of its relationship to the Fibonacci sequence) will prove to be the most efficient growth rate for real-world loads in terms of minimizing both extra space used and time.
Why are a vector's size() and capacity() different after push_back()?
The Standard mandates that std::vector<T>::push_back() has amortized O(1) complexity. This means the expansion has to be geometric, say doubling the amount of storage each time it has been filled.
Simple example: sequentially push_back 32 ints into a std::vector<int>. You will store all of them once, and also do 31 copies if you double the capacity each time it runs out. Why 31? Before storing the 2nd element, you copy the 1st; before storing the 3rd, you copy elements 1-2; before storing the 5th, you copy 1-4; and so on. So you copy 1 + 2 + 4 + 8 + 16 = 31 times, with 32 stores.
Doing the formal analysis shows that you get O(N) stores and copies for N elements. This means amortized O(1) complexity per push_back (often only a store without a copy, sometimes a store and a sequence of copies).
Because of this expansion strategy, you will have size() < capacity() most of the time. Look up shrink_to_fit and reserve to learn how to control a vector's capacity in a more fine-grained manner.
Note: with a geometric growth rate, any factor larger than 1 will do, and there have been some studies claiming that 1.5 gives better performance because of less wasted memory (at some point the reallocation can reuse previously freed blocks).
What does std::vector look like in memory?
It roughly looks like this (excuse my MS Paint masterpiece):
The std::vector instance you have on the stack is a small object containing a pointer to a heap-allocated buffer, plus some extra variables to keep track of the size and capacity of the vector.
So it seems as though when I push_back() to the numbers vector, its older elements change their location.
The heap-allocated buffer has a fixed capacity. When you reach the end of the buffer, a new buffer will be allocated somewhere else on the heap and all the previous elements will be moved into the new one. Their addresses will therefore change.
Does it maybe store them together, but move them all together when more space is needed?
Roughly, yes. Iterator and address stability of elements is guaranteed with std::vector only if no reallocation takes place.
I am aware that std::vector is a contiguous container only since C++17
The memory layout of std::vector hasn't changed since its first appearance in the Standard. ContiguousContainer is just a "concept" that was added to differentiate contiguous containers from others at compile-time.
Remaining memory before relocating a vector in C++
std::vector has methods for this case:
- capacity() returns the number of elements for which memory is currently allocated. If the number of elements in the vector would exceed this number, reallocation happens.
- reserve() allocates memory for the given number of elements. If the given number is not greater than capacity(), it does nothing; otherwise, it reallocates to provide the required capacity.
Initial capacity of vector in C++
The standard doesn't specify what the initial capacity of a container should be, so you're relying on the implementation. A common implementation will start the capacity at zero, but there's no guarantee. On the other hand, there's no way to improve on the strategy of std::vector<int> iv; iv.reserve(2345); so stick with it.