How Does Std::Vector Support Contiguous Memory for Custom Objects of Unknown Size

How does std::vector support contiguous memory for custom objects of unknown size

Your concept of size is flawed. A std::vector<type> has a compile time known size of space it is going to take up. It also has a run time size that it may use (this is allocated at run time and the vector holds a pointer to it). You can picture it laid out like

+--------+
|        |
| Vector |
|        |
|        |
+--------+
     |
     |
     v
+-------------------------------------------------+
|         |         |         |         |         |
| Element | Element | Element | Element | Element |
|         |         |         |         |         |
+-------------------------------------------------+

So when you have a vector of things that have a vector in them, each Element becomes the vector and then those point of to their own storage somewhere else like

+--------+
|        |
| Vector |
|        |
|        |
+----+---+
     |
     |
     v
+----+----+---------+---------+
| Object  | Object  | Object  |
|  with   |  with   |  with   |
| Vector  | Vector  | Vector  |
+----+----+----+----+----+----+
     |         |         |    +---------+---------+---------+---------+---------+
     |         |         |    |         |         |         |         |         |
     |         |         +--->+ Element | Element | Element | Element | Element |
     |         |              |         |         |         |         |         |
     |         |              +-------------------------------------------------+
     |         |    +-------------------------------------------------+
     |         |    |         |         |         |         |         |
     |         +--->+ Element | Element | Element | Element | Element |
     |              |         |         |         |         |         |
     |              +-------------------------------------------------+
     |    +-------------------------------------------------+
     |    |         |         |         |         |         |
     +--->+ Element | Element | Element | Element | Element |
          |         |         |         |         |         |
          +---------+---------+---------+---------+---------+

This way all of the vectors are next to each other, but the elements the vectors have can be anywhere else in memory. It is for this reason you don't want to use a std:vector<std::vector<int>> for a matrix. All of the sub vectors get memory to wherever so there is no locality between the rows.

Do note that this applies to all of the allocator aware containers as they do not store the elements inside the container directly. This is not true for std::array as, like a raw array, the elements are part of the container. If you have an std::array<int, 20> then it is at least sizeof(int) * 20 bytes in size.

What does std::vector look like in memory?

It roughly looks like this (excuse my MS Paint masterpiece):

vector memory layout

The std::vector instance you have on the stack is a small object containing a pointer to a heap-allocated buffer, plus some extra variables to keep track of the size and and capacity of the vector.

So it seems as though when I push_back() to the numbers vector, its older elements change their location.

The heap-allocated buffer has a fixed capacity. When you reach the end of the buffer, a new buffer will be allocated somewhere else on the heap and all the previous elements will be moved into the new one. Their addresses will therefore change.

Does it maybe store them together, but moves them all together, when more space is needed?

Roughly, yes. Iterator and address stability of elements is guaranteed with std::vector only if no reallocation takes place.

I am aware, that std::vector is a contiguous container only since C++17

The memory layout of std::vector hasn't changed since its first appearance in the Standard. ContiguousContainer is just a "concept" that was added to differentiate contiguous containers from others at compile-time.

std::vector and contiguous memory of multidimensional arrays

For reference the way I currently create a 2D array in a contiguous memory block is by first making a (dynamic) array of float* of length N, allocating all N*5 floats in one array and then copying the address of every 5th element into the first array of float*.

That's not a 2D array, that's an array of pointers. If you want a real 2D array, this is how it's done:

float (*p)[5] = new float[N][5];

p [0] [0] = 42;   // access first element
p[N-1][4] = 42;   // access last element

delete[] p;

Note there is only a single allocation. May I suggest reading more about using arrays in C++?

How memory works if vectors inside class array are dynamically push_backed?

Consider this example:

struct foo {
    std::vector<int> x;
    std::vector<int> y;
};

Now sizeof(foo) is a compile time constant. It does not change when you add elements to the vectors. Also sizeof(std::vector<int>) is constant.

The size of a foo instance is not increasing when the size() of the vector grows. It is similar to having a dynamic array (only for the sake of the example):

struct bar {
   int* c_array;
};

Here, sizeof(bar) is likely to be just sizeof(int*), because it is just a pointer, even though it could point to the first element of a c-style array, or to a single int.

Default vector memory size

It's implementation defined (i.e. may differ between multiple compilers).

The total amount of allocated memory can be queried using std::vector::capacity() function.

To read on, check out this post: size vs capacity of a vector?.

How are vectors(in C++) having elements of variable size allocated?

Every object type has a fixed size. This is what sizeof returns. A vector itself typically holds a pointer to the array of objects, the number of objects for which space has been allocated, and the number of objects actually contained. The size of these three things is independent of the number of elements in the vector.

For example, a vector<int> might contain:

1) An int * holding the address of the data.

2) A size_t holding the number of objects we've allocated space for

3) A size_t holding the number of objects contained in the vector.

This will probably be somewhere around 24 bytes, regardless of how many objects are in the vector. And this is what sizeof(vector<int>) will return.

Allocating a std::vector with custom allocator and type unknown at runtime

Just allocating a bunch of memory does not a vector make. Your calloc solution is wrong because it doesn't actually construct a vector, plus the memory you would construct it in has not been properly aligned. The fact that you've repeatedly observed this method "working" is pure bad luck; you can, in theory and in practice, expect it to catastrophically blow up.

Even if all you needed to do was allocate some memory, you still couldn't do it with a vector of unknown type because you don't know how large a vector<unknown-type> is. Certainly you can't construct one. Even your vector<T*>→vector<void*> idea doesn't work: you're forgetting that each vector specialisation is a completely distinct type, and relationships between value types do not create relationships between the resulting container types.

Your approach is too naive to successfully generate C++ objects in general. There is a reason that C++ gives us operator new and allocators that work in harmony. The old C-style approach of "give memory, use memory" just isn't enough. The fact that you then want to "send data to client" is a red flag too: you can't serialise complex C++ objects by dumping their immediate bytes to a network.

Finally, my obligatory warning that C++ is an abstraction, and that if you are trying to write "clever" source code that manipulates objects at the byte level, you need to be very careful. C++ expects you to literally tell it when you want an object to be created, and literally tell it when you want that object to be destroyed; failing to do so then pretending that the object exists is prone to all manner of weirdness, since compilers (including but not limited to during their optimisation phase) take full advantage of the complex, heavily semantic rules that make up the language. In general you really do need to "tell it what you mean" rather than trying to go behind its back. (Some limited exceptions to this rule exist, mostly leniencies surrounding objects of built-in type, and the ability to alias existing, properly-constructed objects using a char*; none of these exceptions apply to your requirement.)

Stick with the construction/allocation facilities provided, and write a serialiser.

Creating object on stack, from a class with dynamic members

How compiler determines the required size for "a", when it is not yet known which constructor is going to be called?

This question hinges on two misunderstandings:

First, Test a; does call the default constructor. Next, the size of objects to be allocated on the stack is a constant: sizeof(Test). This size does not change when more elements are added to the vector member, because those elements to not contribute to the size of a Test, they are allocated dynamically on the heap.

what about if i used "new" for object creation?

No difference concerning the size of the object. sizeof(Test) is a compile time constant and does not change depending on how you create the object. Though with Test* p = new Test; you'll have only p on the stack and its size is sizeof(Test*), while the object is on the heap.

PS: Actually "heap" and "stack" are not terms used by the standard, but as they are so common I think its ok to use them here. More formally correct would be to talk about automatic and dynamic storage duration (https://en.cppreference.com/w/cpp/language/storage_duration).

How Does Std::Vector Support Contiguous Memory for Custom Objects of Unknown Size