Vector Memory Allocation Strategy


How a vector grows is implementation-defined, so different standard library implementations can use different strategies, resulting in different capacities after inserting the same number of elements.

If you need to rely on how much storage is allocated, use the vector's reserve() and/or resize() methods.
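As a minimal sketch of the difference between the two (the numbers are arbitrary): reserve() only sets aside capacity, while resize() actually creates elements.

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;

    v.reserve(100);   // capacity() >= 100, size() still 0:
                      // storage is set aside, no elements are created
    std::cout << v.size() << ' ' << v.capacity() << '\n';

    v.resize(100);    // size() == 100: elements are actually created
                      // (value-initialized ints, i.e. zeros)
    std::cout << v.size() << ' ' << v.capacity() << '\n';
}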

What is the memory/runtime efficiency of std::vector, and what is its memory allocation strategy?

Why would it behave in such a way?

Because even though this wastes some memory, it makes insertions faster by reducing the number of reallocations.

It keeps push_back complexity at amortized O(1), whereas increasing the capacity by 1 and reallocating each time would make it O(n).

“…reallocates a piece of memory that is three times larger than its previous size.” I wonder if this is always the case.

The standard just says that push_back has to have amortized O(1) complexity, and compilers (more precisely, standard library implementations) are free to achieve that by any means.

This disallows increasing the capacity by 1 each time, as that would make the complexity O(n). Achieving amortized O(1) requires the capacity to grow by a multiplicative factor N, but N can have any value greater than 1. As shown in the comments, N can vary between insertions, and is normally between 1.5 and 2.
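If you want to see what your own implementation does, a small sketch like the following prints the capacity every time it changes during a sequence of push_backs (the output is implementation-dependent):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last = v.capacity();
    std::cout << "initial capacity: " << last << '\n';
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last) {          // a reallocation happened
            last = v.capacity();
            std::cout << "size " << v.size()
                      << " -> capacity " << last << '\n';
        }
    }
}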

What about the memory/time efficiency of std::vector compared to built-in static and dynamic arrays?

Access speed should be nearly the same. Insertion and removal speed can't be compared, because built-in arrays have a fixed size. Vectors might use more memory because, as you noticed, they can allocate memory for more elements than they actually have.

Why would the book suggest always using std::vector even in a simple scenario which a simple static array could handle?

There's no need to replace fixed-size arrays with vectors, as long as the static arrays are small enough to fit on the stack.

std::vector should be used instead of manually allocating arrays with new to reduce the risk of memory leaks and other bugs.
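As a small illustration of why (a sketch, with the actual work elided):

#include <cstddef>
#include <vector>

// Manually managed array: the delete[] is easy to forget, or to skip on
// an early return or an exception, which leaks the memory.
void manual(std::size_t n) {
    int* data = new int[n]();
    // ... use data ...
    delete[] data;                // must not be forgotten
}

// std::vector owns its storage and releases it automatically, even if an
// exception is thrown while it is alive.
void automatic(std::size_t n) {
    std::vector<int> data(n);
    // ... use data ...
}                                 // storage released here

int main() {
    manual(10);
    automatic(10);
}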

Initial capacity of vector in C++

The standard doesn't specify what the initial capacity of a container should be, so you're relying on the implementation. A common implementation will start the capacity at zero, but there's no guarantee. On the other hand, there's no way to improve on your strategy of std::vector<int> iv; iv.reserve(2345); so stick with it.
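For instance, a tiny check of that strategy (the printed initial capacity is whatever your implementation chooses, commonly 0):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> iv;
    std::cout << iv.capacity() << '\n';   // commonly 0, but not guaranteed

    iv.reserve(2345);                     // now capacity() >= 2345: no
                                          // reallocation until that is exceeded
    std::cout << iv.capacity() << '\n';
}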

Allocating ~10GB of vectors - how can I speed it up?

Is there something I could do to speed up the memory allocation? I tried allocating one large vector instead, but that failed with bad_alloc -- I guess a 10GB vector isn't OK.

I mainly wanted to respond by addressing this one part: bad_alloc exceptions tend to be misunderstood. They're not necessarily the result of "running out of memory" -- they're the result of the system failing to find a contiguous block of unused pages. You could have plenty more than enough memory available and still get a bad_alloc if you get in the habit of trying to allocate massive blocks of contiguous memory, simply because the system can't find a contiguous set of pages that are free. You can't necessarily avoid bad_alloc by "making sure plenty of memory is free"; as you might have already seen, having over 100 gigabytes of RAM can still leave you vulnerable to it when trying to allocate a mere 10 GB block.

The way to avoid this is to allocate memory in smaller chunks instead of one epic array. At a large enough scale, structures like unrolled lists can start to offer favorable performance over a gigantic array, along with an exponentially lower probability of ever getting a bad_alloc exception unless you actually do exhaust all the available memory. There is actually a point, at a large enough size, where contiguity and the locality of reference it provides cease to be beneficial and may actually hinder memory performance (mainly due to paging, not caching).

For the kind of epic-scale input you're handling, you might actually get better performance out of std::deque, given its page-friendly nature (it's one of the few cases where deque can really shine over vector without needing push_front). It's something to potentially try if you don't need perfect contiguity.

Naturally it's best if you measure this with an actual profiler. It'll help us home in on the actual problem, though it might not be completely shocking (surprising, but maybe not shocking) that you're bottlenecked by memory here instead of disk IO, given the kind of "massive number of massive blocks" you're allocating (disk IO is slow, but heap allocation can sometimes be expensive if you're really stressing the system). It depends a lot on the system's allocation strategy, but even slab or buddy allocators can fall back to a much slower code path if you allocate such epic blocks of memory en masse, and allocations may even start to require something resembling a search or more access to secondary storage in those extreme cases. (Here I'm afraid I'm not sure exactly what goes on under the hood when allocating so many massive blocks; I have "felt" and measured these kinds of bottlenecks before, but never quite figured out what the OS was doing exactly, so this paragraph is purely conjecture.)

Here it's kind of counter-intuitive but you can often get better performance allocating a larger number of smaller blocks. Typically that makes things worse, but if we're talking about 3 million floats per memory block and a thousand memory blocks like it, it might help to start allocating in, say, page-friendly 4k chunks. Typically it's cheaper to pre-allocate memory in large blocks in advance and pool it, but "large" in this case is more like 4 kilobyte blocks, not 10 gigabyte blocks.

std::deque will typically do this kind of thing for you, so it might be the quickest thing to try out to see if it helps. With std::deque, you should be able to make a single one for all 10 GB worth of contents without splitting it into smaller containers to avoid bad_alloc. It also doesn't have the overhead of zero-initializing the entire contents that some have cited, and push_backs to it are constant-time even in the worst case (not amortized constant time as with std::vector), so I would try std::deque, actually using push_back, instead of pre-sizing it and using operator[]. You could read the file contents in small chunks at a time (e.g. using 4k byte buffers) and just push back the floats. It's something to try, anyway.
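As a hedged sketch of that approach (the file name is a placeholder, and it assumes the file is simply raw binary floats, which may not match your actual format):

#include <cstddef>
#include <cstdio>
#include <deque>
#include <vector>

int main() {
    std::FILE* f = std::fopen("huge_input.bin", "rb");      // placeholder name
    if (!f) return 1;

    std::deque<float> values;                        // no single 10 GB block needed
    std::vector<float> buffer(4096 / sizeof(float)); // small, page-friendly read buffer

    std::size_t got = 0;
    while ((got = std::fread(buffer.data(), sizeof(float),
                             buffer.size(), f)) > 0) {
        for (std::size_t i = 0; i < got; ++i)
            values.push_back(buffer[i]);             // worst-case constant time
    }
    std::fclose(f);
}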

Anyway, these are all just educated guesses without code and profiling measurements, but these are some things to try out after your measurements.

Memory-mapped files may also be the ideal solution for this scenario: let the OS handle all the tricky details of what it takes to access the file's contents.

How does Rust allocate more space for a vector?

The documentation on Vec says this:

Vec does not guarantee any particular growth strategy when reallocating when full, nor when reserve is called. The current strategy is basic and it may prove desirable to use a non-constant growth factor. Whatever strategy is used will of course guarantee O(1) amortized push.

The guarantee of amortized O(1) push implies that Rust does not reallocate on every push. It must look more like the second scenario that you described, where it allocates extra capacity to make room for additional elements to be pushed.

If we dig a little into the source code for the Rust standard library, we will see that in this situation (pushing one element at a time), the current implementation of Vec in fact doubles the capacity of the Vec on each reallocation. This exact strategy, of course, is not specified and could change in future implementations.

If you know from the start how big your Vec will need to be, you can construct it with Vec::with_capacity to avoid the need for any reallocations.

C++ : Vector Allocator behavior, memory allocation and smart pointers

Q: Is my understanding correct?

A: Your understanding is partially correct.

  • p1 and p2 are created on the stack, using the default no-argument constructor that you've defined.
  • The default allocator may be used to allocate more memory when you call push_back(), but will not always do so. It will never default-construct a new instance of Point, though.

Q: If new Point objects are created by the Allocator, why do I see only two lines of "Points created"?

A: New objects are not being created by the allocator - the allocator only allocates more memory, if needed. The objects that you insert in the vector are copy constructed. Because you have not created a copy constructor, the compiler has generated one for you.

Q: How does the Allocator assign the original values to the x, y fields of the newly created objects? Does it use a raw memory copy?

A: As stated in the previous answer, the allocator only allocates memory and does not create or destroy objects. The copying of the fields is done by the copy constructor that gets invoked when you call push_back. The automatically generated copy constructor does a member-wise copy of each of the class's members. In your case, x and y are primitive types, so they're just copied directly. If the members were complex objects, their copy constructors would be invoked.
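A sketch that makes this visible, assuming a Point roughly like the one in the question (only the no-argument constructor prints a message; the copy constructor is the compiler-generated member-wise one):

#include <iostream>
#include <vector>

struct Point {
    double x = 0.0, y = 0.0;
    Point() { std::cout << "Point created\n"; }
    // no user-defined copy constructor: the compiler generates one that
    // copies x and y member-wise
};

int main() {
    Point p1, p2;                 // prints "Point created" twice
    p1.x = 1.0;
    p2.y = 2.0;

    std::vector<Point> points;
    points.push_back(p1);         // no "Point created" here: p1 and p2 are
    points.push_back(p2);         // copy constructed into the vector's storage
    std::cout << points[0].x << ' ' << points[1].y << '\n';   // 1 2
}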

Q: Is the shared pointer the recommended way to return a vector from a method?

A: This depends on your use case and is opinion-based. My personal advice, and this goes for all kinds of objects, is:

  • If your use case allows it, return by value (that is, std::vector<Point> getPoints())
  • If you need dynamically allocated storage, or the object that you want to return might not exist (e.g. because construction failed), return a std::unique_ptr. This applies to pretty much all factory functions that you might want to create. Even if you later want to share ownership (see the next point), you can construct a shared_ptr by moving from a unique_ptr (std::shared_ptr<T> shared = std::move(unique);). A sketch of both of these options follows this list.
  • Avoid using shared_ptr unless you really need shared ownership of the pointer. shared_ptrs are more complex to reason about, can create hard-to-debug cycles leading to memory leaks, and are heavier in terms of performance (because of the atomic operations on their refcount and the additional memory allocated for a control block). If you think you need a shared_ptr, reconsider your design and think about whether you can use a unique_ptr instead.
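A minimal sketch of the first two points, with hypothetical function names (getPoints and tryGetPoints are placeholders, and the Point type is assumed from the question):

#include <iostream>
#include <memory>
#include <vector>

struct Point { double x, y; };

// Point 1: return by value; cheap in practice thanks to move semantics
// and copy elision.
std::vector<Point> getPoints() {
    return {{0.0, 0.0}, {1.0, 2.0}};
}

// Point 2: a factory that may fail can return a unique_ptr, using nullptr
// to mean "nothing".
std::unique_ptr<std::vector<Point>> tryGetPoints(bool ok) {
    if (!ok) return nullptr;
    return std::make_unique<std::vector<Point>>(getPoints());
}

int main() {
    std::vector<Point> byValue = getPoints();

    std::unique_ptr<std::vector<Point>> unique = tryGetPoints(true);
    // Upgrade to shared ownership only if it is really needed (point 3):
    std::shared_ptr<std::vector<Point>> shared = std::move(unique);

    std::cout << byValue.size() << ' ' << (shared ? shared->size() : 0) << '\n';
}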

How this works:

Internally, a std::vector uses memory allocated on the heap by the default allocator (or by a custom one, if you provided it). This allocation happens behind the scenes, and the amount of storage allocated is independent of the number of elements currently in the vector (but is always enough for at least size() elements). You can find out how many elements the vector has allocated storage for with the capacity() function. When you call push_back(), this is what happens:

  1. If there is enough storage (as determined by capacity()) to hold one more element, the argument that you passed to push_back is copied into that storage using the copy constructor (for the push_back(const T& value) overload) or moved from using the move constructor (for the push_back(T&& value) overload).
  2. If there is not enough storage (i.e. the new size() would exceed capacity()), more memory is allocated, sufficient to hold the new elements. How much memory is allocated is implementation-defined; a common pattern is to double the capacity that the vector previously had, up to a threshold after which memory is allocated in larger buckets. You may use reserve() before inserting elements to make sure that your vector has enough capacity to hold at least that many elements without new allocations. After the new memory has been allocated, the vector relocates all existing elements into the new storage by either copying them, or moving them if they are not copy-insertable. This reallocation invalidates all iterators and references to elements of the vector. (Caveat: the rules for exactly when elements are copied vs. moved during reallocation are a bit more complex, but this is the general case; see the sketch after this list.)
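As a sketch of what happens to the existing elements on reallocation, here is a type that reports whether it is copied or moved (the move constructor is marked noexcept; without that, typical implementations copy during reallocation to preserve the strong exception guarantee):

#include <iostream>
#include <vector>

struct Noisy {
    Noisy() = default;
    Noisy(const Noisy&) { std::cout << "copy\n"; }
    Noisy(Noisy&&) noexcept { std::cout << "move\n"; }
};

int main() {
    std::vector<Noisy> v;
    for (int i = 0; i < 4; ++i) {
        std::cout << "push_back #" << i + 1
                  << " (capacity " << v.capacity() << ")\n";
        v.push_back(Noisy{});   // the temporary argument is moved in; any
                                // reallocation also moves the old elements
    }
}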

Is std::vector X(0) guaranteed not to allocate?

That constructor call invokes the explicit vector( size_type count ) overload, which:

Constructs the container with count default-inserted instances of T. No copies are made.

The only guarantee you get is that the vector will be empty: its size() will be 0. Implementations are allowed to allocate whatever they want for bookkeeping or anything else on initialization.

So if your question is whether you can count on X taking up 0 bytes of free store, the answer is no.
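If you want to observe what your particular implementation does, one hedged way is a minimal counting allocator (a sketch; the name CountingAllocator is made up here):

#include <cstddef>
#include <iostream>
#include <vector>

static std::size_t g_allocations = 0;   // number of calls to allocate()

template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

int main() {
    std::vector<int, CountingAllocator<int>> X(0);
    // Whether this prints 0 is entirely up to the implementation; the
    // standard makes no guarantee either way.
    std::cout << "allocations: " << g_allocations << '\n';
}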

Memory issue with std::vector in C++

Your vector needs to successively reallocate more memory to keep growing. It does this by reserving a new, larger area of memory and copying the old data over. It depends on the implementation how much more memory is reserved, but a typical strategy is to allocate twice as much memory (libstdc++ does this).

This means that, in the worst case, your total memory requirement could be close to three times as much as your raw memory requirement:

Let's say your vector currently holds 90,000,000 elements and, by bad luck, its capacity is also 90,000,000¹. To insert the 90,000,001st element, std::vector now reserves twice as much memory (180,000,000 elements), copies all the old elements over, and then destroys the old array.

Therefore, even though you “only” need 100,000,000 elements, you briefly have to allocate storage for 270,000,000 elements. This corresponds to around 9.10 GiB, even though your 100M vector only requires 3.35 GiB.

This can be neatly avoided by putting the following line in front of your nested initialisation loop:

mesh_points_A.reserve(N * N);

¹ More realistically, the capacity is probably a power of two, e.g. 2²⁶ = 67,108,864; that's still 6.75 GiB of memory for the resizing.
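As a sketch of the fix (the element type, the name mesh_points_A, and N are stand-ins for whatever the original code uses; N is kept small here so the sketch runs quickly):

#include <cstddef>
#include <vector>

struct MeshPoint { double x, y, z; };    // placeholder element type

int main() {
    const std::size_t N = 1000;          // the real code uses a much larger N
    std::vector<MeshPoint> mesh_points_A;
    mesh_points_A.reserve(N * N);        // one allocation up front: no doubling,
                                         // no copying, no 2-3x peak memory use
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            mesh_points_A.push_back({double(i), double(j), 0.0});
}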


