Why the Libc++ Std::Vector Internally Keeps Three Pointers Instead of One Pointer and Two Sizes

Why the libc++ std::vector internally keeps three pointers instead of one pointer and two sizes?

It's because the rationale is that performance should be optimized for iterators, not indices.

(In other words, performance should be optimized for begin()/end(), not size()/operator[].)

Why? Because iterators are generalized pointers, and thus C++ encourages their use, and in return ensures that their performance matches those of raw pointers when the two are equivalent.

To see why it's a performance issue, notice that the typical for loop is as follows:

for (It i = items.begin(); i != items.end(); ++i)
...

Except in the most trivial cases, if we kept track of sizes instead of pointers, what would happen is that the comparison i != items.end() would turn into i != items.begin() + items.size(), taking more instructions than you'd expect. (The optimizer generally has a hard time factoring out the code in many cases.) This slows things down dramatically in a tight loop, and hence this design is avoided.

(I've verified this is a performance problem when trying to write my own replacement for std::vector.)


Edit: As Yakk pointed out in the comments, using indices instead of pointers can also result in the generation of a multiplication instruction when the element sizes aren't powers of 2, which is pretty expensive and noticeable in a tight loop. I didn't think of this when writing this answer, but it's a phenomenon that's bitten me before (e.g. see here)... bottom line is, in a tight loop everything matters.

One of them is not like the others: why is std::vector::size implemented in terms of pointers on all major standard library implementations?

why is std::vector::size implemented in terms of pointers on all major standard library implementations?

Because it can be implemented in terms of pointer subtraction. And because the standard library implementers chose to do that.

Is is something in the standard?

No. I'm sure that it would be standard conforming to store the size as a member.

What is weird is that all other standard containers (except std::deque) store the size as a data member.

This is hardly surprising. No other data structure can use pointers as iterators besides an array.

std::string is implemented as an array, so it could also use pointer subtraction. If it isn't done so, it is because the implementers chose to not do so. This may, or might not be due to convenience related to small string optimisation.

How does vectors in C++ use memory?

As addressed in the comments by Kamil, a std::vector keeps track of three pointers internally. One pointer to the begin, one to end and one to the end of allocated memory (see stack post). Now, the size of a pointer should be 8 bytes on any 64-bit C/C++ compiler so, 3 * 8 bytes = 24 bytes (see wiki).

Why does the implementation of std::any use a function pointer + function op codes, instead of a pointer to a virtual table + virtual calls?

Consider a typical use case of a std::any: You pass it around in your code, move it dozens of times, store it in a data structure and fetch it again later. In particular, you'll likely return it from functions a lot.

As it is now, the pointer to the single "do everything" function is stored right next to the data in the any. Given that it's a fairly small type (16 bytes on GCC x86-64), any fits into a pair of registers. Now, if you return an any from a function, the pointer to the "do everything" function of the any is already in a register or on the stack! You can just jump directly to it without having to fetch anything from memory. Most likely, you didn't even have to touch memory at all: You know what type is in the any at the point you construct it, so the function pointer value is just a constant that's loaded into the appropriate register. Later, you use the value of that register as your jump target. This means there's no chance for misprediction of the jump because there is nothing to predict, the value is right there for the CPU to consume.

In other words: The reason that you get the jump target for free with this implementation is that the CPU must have already touched the any in some way to obtain it in the first place, meaning that it already knows the jump target and can jump to it with no additional delay.

That means there really is no indirection to speak of with the current implementation if the any is already "hot", which it will be most of the time, especially if it's used as a return value.

On the other hand, if you use a table of function pointers somewhere in a read-only section (and let the any instance point to that instead), you'll have to go to memory (or cache) every single time you want to move or access it. The size of an any is still 16 bytes in this case but fetching values from memory is much, much slower than accessing a value in a register, especially if it's not in a cache. In a lot of cases, moving an any is as simple as copying its 16 bytes from one location to another, followed by zeroing out the original instance. This is pretty much free on any modern CPU. However, if you go the pointer table route, you'll have to fetch from memory every time, wait for the reads to complete, and then do the indirect call. Now consider that you'll often have to do a sequence of calls on the any (i.e. move, then destruct) and this will quickly add up. The problem is that you don't just get the address of the function you want to jump to for free every time you touch the any, the CPU has to fetch it explicitly. Indirect jumps to a value read from memory are quite expensive since the CPU can only retire the jump operation once the entire memory operation has finished. That doesn't just include fetching a value (which is potentially quite fast because of caches) but also address generation, store forwarding buffer lookup, TLB lookup, access validation, and potentially even page table walks. So even if the jump address is computed quickly, the jump won't retire for quite a long while. In general, "indirect-jump-to-address-from-memory" operations are among the worst things that can happen to a CPU's pipeline.

TL;DR: As it is now, returning an any doesn't stall the CPU's pipeline (the jump target is already available in a register so the jump can retire pretty much immediately). With a table-based solution, returning an any will stall the pipeline twice: Once to fetch the address of the move function, then another time to fetch the destructor. This delays retirement of the jump quite a bit since it'll have to wait not only for the memory value but also for the TLB and access permission checks.

Code memory accesses, on the other hand, aren't affected by this since the code is kept in microcode form anyway (in the µOp cache). Fetching and executing a few conditional branches in that switch statement is therefore quite fast (and even more so when the branch predictor gets things right, which it almost always does).

The operation of the sizeof operator in C++

The size of each int is 4 so the sizeof v should then be 25*4 bytes seemingly but in effect, it is still 16! Why?

You're confusing sizeof(std::vector) and std::vector::size(), the former will return the size of vector itself, not including the size of elements it holds. The latter will return the count of the elements, you can get all their size by std::vector::size() * sizeof(int).

so why is it 16? And why 16 and not another number?

What is sizeof(std::vector) depends on implmentation, mostly implemented with three pointers. For some cases (such as debug mode) the size might increase for the convenience.

C++ sizeof Vector is 24?

While the public interface of std::vector is defined by the standard, there can be different implementations: in other words, what's under the hood of std::vector can change from implementation to implementation.

Even in the same implementation (for example: the STL implementation that comes with a given version of Visual C++), the internals of std::vector can change from release builds and debug builds.

The 24 size you see can be explained as 3 pointers (each pointer is 8 bytes in size on 64-bit architectures; so you have 3 x 8 = 24 bytes). These pointers can be:

  • begin of vector
  • end of vector
  • end of reserved memory for vector (i.e. vector's capacity)

How can I propagate const when returning a std::vectorint* from a const method?

You're asking for std::experimental::propagate_const. But since it is an experimental feature, there is no guarantee that any specific toolchain is shipped with an implementation. You may consider implementing your own. There is an MIT licensed implementation, however. After including the header:

using namespace xpr=std::experimental;
///...
std::vector<xpr::propagate_const<int*>> my_ptr_vec;

Note however that raw pointer is considered evil so you may need to use std::unique_ptr or std::shared_ptr. propagate_const is supposed to accept smart pointers as well as raw pointer types.

What is the memory/runtime efficiency of std::vector, and what is its memory allocation strategy?

why would it execute in such a way

Because even though this wastes some memory, it makes insertions faster, by reducing the number of reallocations.

It keeps push_back complexity at amortized O(1), while increasing the size by 1 and reallocating each time would make it O(n).

reallocates a piece of memory that is three times larger than its previous size. I wonder if this is always the case

The standard just says that push_back has to have amortized O(1) complexity, and compilers (more precisely, standard library implementations) are free to achieve that by any means.

This disallows increasing capacity by 1 each time, as that would make the complexity O(n). Apparently achieving amortized O(1) requires the size to be increased by N times, but N can have any value. As show in the comments, N can wary between insertions, and is normally between 1.5 and 2.

memory/time efficiency of std::vector compared to built-in static and dynamic arrays?

Access speed should be nearly the same. Insertion and removal speed can't be compared because arrays have fixed size. Vectors might use more memory because, as you noticed, they can allocate memory for more elements that they actually have.

Why would the book suggest always using std::vector even in a simple scenario which a simple static array could handle?

There's no need to replace fixed-size arrays with vectors, as long as static arrays are small enough to fit into the stack.

std::vector should be used instead of manually allocating arrays with new to reduce the risk of memory leaks and other bugs.



Related Topics



Leave a reply



Submit