Memory Layout C++ Objects

memory layout C++ objects

Each class lays out its data members in the order of declaration.

The compiler is allowed to place padding between members to make access efficient (but it is not allowed to re-order).
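A minimal sketch of the two rules above, using a hypothetical struct named Sample (not from the original question). It only checks properties that hold on any conforming compiler; the exact amount of padding is platform-dependent, so it is computed rather than assumed.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

// Hypothetical struct used only for illustration.
struct Sample {
    char tag;    // small member first
    int  count;  // stricter alignment, so padding usually precedes it
    char flag;   // trailing padding usually follows it
};

// Members sit at increasing offsets, in declaration order.
static_assert(offsetof(Sample, tag) < offsetof(Sample, count), "order kept");
static_assert(offsetof(Sample, count) < offsetof(Sample, flag), "order kept");

// Padding can only make the struct at least as large as the sum of its members.
static_assert(sizeof(Sample) >= 2 * sizeof(char) + sizeof(int), "no shrinking");

// Number of padding bytes this particular compiler inserted.
constexpr std::size_t padding_bytes() {
    return sizeof(Sample) - (2 * sizeof(char) + sizeof(int));
}

void demo_padding() {
    // The int member must land on a suitably aligned offset.
    assert(offsetof(Sample, count) % alignof(int) == 0);
}
```

Reordering the members so that the two char fields are adjacent would typically reduce padding_bytes(), which is why declaration order matters for object size.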

How dynamic_cast<> is implemented is a compiler detail and is not defined by the standard. It depends entirely on the ABI used by the compiler.

reinterpret_cast<> does not transform the object; it only tells the compiler to treat the pointer as pointing to a different type. The only thing you can rely on is that casting a pointer to void* and back to the same pointer-to-class type gives you the original pointer.
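The round trip described above can be sketched as follows. Widget is a hypothetical type introduced only for this example.

```cpp
#include <cassert>

// Hypothetical type used only for illustration.
struct Widget {
    int id;
};

// Returns true when Widget* -> void* -> Widget* yields the original pointer.
bool round_trips(Widget* original) {
    void* erased = reinterpret_cast<void*>(original);        // drop the type
    Widget* restored = reinterpret_cast<Widget*>(erased);    // restore the same type
    return restored == original;
}

void demo_round_trip() {
    Widget w{7};
    assert(round_trips(&w));
}
```

Casting the void* to any type other than the original one is where the guarantees stop.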

memory layout of C++ object

Non-virtual member functions are very much like regular non-member functions; the only difference is that a pointer to the class instance is passed as a hidden first argument on each call.

The compiler does this automatically, so (in pseudo-code) your call b.fun() can be compiled into

B::fun(&b);

where B::fun can be seen as an ordinary function. The address of this function does not have to be stored in the object itself (all objects of the class use the same function), so the size of the class does not include it.
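To make the pseudo-code concrete, here is a rough sketch that pairs a member function with a hand-written free function taking the instance explicitly. The names B_fun and BData are hypothetical, and the size equality is what mainstream compilers produce rather than something the standard spells out.

```cpp
#include <cassert>

struct B {
    int value;
    int fun() const { return value + 1; }  // non-virtual member function
};

// Same data, no member functions.
struct BData {
    int value;
};

// Hand-written equivalent: the instance is passed as an explicit first argument.
int B_fun(const B* self) { return self->value + 1; }

// Expected on mainstream compilers: member functions add nothing to the object.
static_assert(sizeof(B) == sizeof(BData), "member functions take no object space");

void demo_member_call() {
    B b{41};
    assert(b.fun() == B_fun(&b));  // both calls compute the same result
}
```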

Where in the C++ Standard is the memory layout of objects documented?

[class.mem]/18:

Non-static data members of a (non-union) class with the same access control are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with different access control is unspecified. Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions and virtual base classes.

and [class.mem]/25:

If a standard-layout class object has any non-static data members, its address is the same as the address of its first non-static data member. Otherwise, its address is the same as the address of its first base class subobject (if any). [ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. — end note ] [ Note: The object and its first subobject are pointer-interconvertible ([basic.compound], [expr.static.cast]). — end note ]

There is also [dcl.array], which indicates that arrays are contiguous in memory, [class.bit], which talks about bit-fields, and [intro.object], which talks about object size and the concept of overlapping subobjects.

There may be other places. There's no one spot.
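The first-member guarantee from [class.mem]/25 quoted above can be checked with a short sketch. Header is a hypothetical standard-layout struct chosen for the example.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical standard-layout struct used only for illustration.
struct Header {
    int    kind;
    double payload;
};

static_assert(std::is_standard_layout<Header>::value,
              "the guarantee only applies to standard-layout classes");

// True when the object and its first non-static data member share an address.
bool first_member_shares_address(const Header& h) {
    return static_cast<const void*>(&h) == static_cast<const void*>(&h.kind);
}

void demo_first_member() {
    Header h{1, 2.0};
    assert(first_member_shares_address(h));
}
```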

What does an object look like in memory?

Static class members are treated almost exactly like global variables / functions. Because they are not tied to an instance, there is nothing to discuss regarding memory layout.

Class member variables are duplicated for each instance as you can imagine, as each instance can have its own unique values for every member variable.
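As a small illustration of the difference between static and per-instance members, consider the sketch below. Tracked and IdOnly are hypothetical types, and the size comparison reflects what mainstream compilers do.

```cpp
#include <cassert>

// Hypothetical types used only for illustration.
struct Tracked {
    static int live_count;  // one copy shared by every instance
    int id;                 // one copy per instance
};
int Tracked::live_count = 0;

struct IdOnly {
    int id;
};

// Expected on mainstream compilers: the static member is stored elsewhere.
static_assert(sizeof(Tracked) == sizeof(IdOnly), "static member is not in the object");

void demo_static_vs_instance() {
    Tracked a{1};
    Tracked b{2};
    a.id = 10;                                 // changes only a
    assert(b.id == 2);
    assert(&a.live_count == &b.live_count);    // same single variable
}
```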

Class member functions only exist once in a code segment in memory. At a low level, they are just like normal global functions but they receive a pointer to this. With Visual Studio on x86, it's via ecx register using thiscall calling convention.

With virtual functions and polymorphism, the memory layout gets more complicated: each instance typically carries a hidden pointer to a "vtable", which is basically a table of function pointers that determines which implementation gets called for that instance.
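The effect of virtual functions can be observed with a sketch like the one below. Plain, Poly and Derived are hypothetical types; the size overhead is an implementation detail that common ABIs exhibit, not something the standard requires, so it is computed rather than asserted at compile time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types used only for illustration.
struct Plain {
    int x = 0;
    int get() const { return x; }
};

struct Poly {
    int x = 0;
    virtual int get() const { return x; }
    virtual ~Poly() = default;
};

struct Derived : Poly {
    int get() const override { return x * 2; }
};

// Extra bytes per object, typically a vtable pointer plus alignment padding.
std::size_t virtual_overhead() { return sizeof(Poly) - sizeof(Plain); }

// The call is dispatched at run time through the object's dynamic type.
int call_through_base(const Poly& p) { return p.get(); }

void demo_virtual_dispatch() {
    Derived d;
    d.x = 21;
    assert(call_through_base(d) == 42);  // Derived::get is selected
}
```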

What is the C++ memory layout of objects/structs etc?

Yes, the standard doesn't say how objects are to be represented in memory. To get an idea of how C++ objects are normally represented, read the book Inside the C++ Object Model.

Storage layout of C objects

No, I don't think there is an explicit statement that we can't know anything about relative object layout. In fact, the C standard is even more radical than that: you are not even allowed to use the < operator to compare pointers to two objects that are not elements of the same array, nor may you do arithmetic between pointers to objects that are not part of the same array.

So the whole question of "layout" cannot even be formulated with the terminology that the C standard provides.

Why using memory layout for running a C program

Stack, heap, data and text are located in physical memory, not distinct from it. Memory is allocated for different purposes with different behaviour in terms of scope and persistence, and to facilitate that the linker segments (or divides up) the memory for different purposes.

In many embedded systems, the code (text segment) and constant data reside in ROM which is physically different from RAM. The linker needs to know where that ROM space is located in the memory map.

The stack is temporary space used for local data storage, function parameters and return call/function addresses. It is continuously used and reused as functions are called and variables go in and out of scope.

The heap is used for dynamic memory allocation through functions such as malloc() / free(). It is where memory is allocated from at run time rather than being statically allocated or automatically allocated on the stack. Heap allocations persist until they are explicitly returned to the heap, rather than having "scope" and being automatically instantiated and destroyed.

The data segment is where statically allocated data resides. This is where static and global data reside. Objects in this memory are instantiated at program start and persist for as long as the code is executing.

In practice there are generally two segments for static data, data and bss. data is for explicitly non-zero initialised data. These objects exist in read/write memory, but the initialiser values for this memory are in text. When the program starts, the start-up code that runs before main() copies the initial values to the allocated RAM segment. The bss segment is simply initialised to zero, the default initial value for static data.

So:

bss and data must be distinct spaces to facilitate efficient initialisation.
text must be distinct because it is either located and executed in place in ROM or, in systems where it is loaded into RAM, loaded most efficiently by copying a contiguous block of code to the run-time location.
heap is a run-time pool of memory. It is certainly possible to distribute the heap across non-contiguous memory, but in the simple case it will generally be a single contiguous block.
The stack concept is an artefact of how (most) microprocessors work at the machine level, so it is a natural model for a compiled language. The stack segment itself is the call/data stack used in the main() thread. Some processors switch to a separate stack for interrupt handling (some don't). If multi-threading is used, typically each thread has its own stack. These thread stacks may be instantiated dynamically from the heap or statically allocated in bss for example.

The point is that C code is compiled to object code and then linked to form the final binary executable. The linker is responsible for locating code and data so requires a memory map to know what to put where. The stack must be contiguous because that is how the machine works and it is required for local automatically created and destroyed data.
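The storage categories described above can be sketched in code. The segment named in each comment is where a typical toolchain places the object; the actual placement is decided by the compiler and linker, and all names here are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

int initialised_global = 42;   // typically data: non-zero initial value copied in at start-up
int zeroed_global;             // typically bss: zero-initialised before main() runs
const char message[] = "hi";   // constant data, often kept in read-only memory or ROM

// Parameters and locals live on the stack for the duration of the call.
int sum_on_stack(int a, int b) {
    int local = a + b;
    return local;
}

// Heap storage persists until it is explicitly released with std::free().
int* make_on_heap(int value) {
    int* p = static_cast<int*>(std::malloc(sizeof(int)));
    if (p != nullptr) {
        *p = value;
    }
    return p;
}

void demo_storage() {
    assert(zeroed_global == 0);        // static data defaults to zero
    int* p = make_on_heap(7);
    assert(p != nullptr && *p == 7);
    std::free(p);                      // return the block to the heap
}
```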

Do C++ Objects (Standard Definition) persist in memory map files?

No, objects do not persist this way.

C++ objects are defined primarily by their lifetime, which is scoped to the program.

So if you want to recycle an object from raw storage, there has to be a brand new object in program (2) with its own lifetime. reinterpret_cast'ing memory does not create a new object, so that doesn't work.

Now, you might think that inplace-newing an object with a trivial constructor at that memory location could do the trick:

struct MyObj {
  int x;
  int y;
  float z;
};

void foo(char* raw_data) {
  // Any previous content of raw_data must be treated as lost from here on.
  MyObj* obj = new (raw_data) MyObj();
}

But you can't do that either. The compiler is allowed to (and demonstrably does sometimes) assume that such a construction mangles up the memory. See C++ placement new after memset for more details, as well as a demonstration.

If you want to initialize an object from a given storage representation, you must use memcpy() or an equivalent:

void foo(char* raw_data) {
  MyObj obj;

  static_assert(std::is_trivially_copyable_v<MyObj>);
  std::memcpy(&obj, raw_data, sizeof(MyObj));
}
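As a usage sketch of the memcpy() approach, the following round-trips an object through a byte buffer. It repeats the MyObj struct so that it stands alone; load_from_bytes and store_to_bytes are hypothetical helper names, and the buffer is assumed to hold at least sizeof(MyObj) bytes written by the same program on the same platform.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct MyObj {
    int x;
    int y;
    float z;
};

// memcpy is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable<MyObj>::value, "memcpy needs a trivially copyable type");

// Copies the object representation out to raw storage.
void store_to_bytes(const MyObj& obj, unsigned char* raw) {
    std::memcpy(raw, &obj, sizeof(MyObj));
}

// Builds a real object, then fills it from the stored representation.
MyObj load_from_bytes(const unsigned char* raw) {
    MyObj obj;
    std::memcpy(&obj, raw, sizeof(MyObj));
    return obj;
}

void demo_memcpy_round_trip() {
    MyObj source{1, 2, 3.5f};
    unsigned char buffer[sizeof(MyObj)];
    store_to_bytes(source, buffer);
    MyObj copy = load_from_bytes(buffer);
    assert(copy.x == 1 && copy.y == 2 && copy.z == 3.5f);
}
```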

Addendum: It is possible to do the equivalent of the desired reinterpret_cast<> by restomping the memory with its original content after creating the object (inspired by the IOC proposal).

#include <cstddef>
#include <cstring>
#include <memory>
#include <new>
#include <type_traits>

template<typename T> 
T* start_lifetime_as(void *p) 
  requires std::is_trivially_copyable_v<T> {
  
  constexpr std::size_t size = sizeof(T);
  constexpr std::size_t align = alignof(T);

  auto aligned_p = std::assume_aligned<align>(p);

  std::aligned_storage_t<size, align> tmp;
  std::memcpy(&tmp, aligned_p, size);

  T* t_ptr = new (aligned_p) T{};
  std::memcpy(t_ptr , &tmp, size);

  return std::launder<T>(t_ptr);
}


void foo(char* raw_data) {
  MyObj* obj = start_lifetime_as<MyObj>(raw_data);
}

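This should be well-defined as long as that memory location contains only raw data and no prior object. The technique itself relies only on rules that have been in place since C++11, although the code as written uses std::launder (C++17) as well as a requires-clause and std::assume_aligned (C++20). Also, from cursory testing, compilers seem to do a good job of optimizing the copies away.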

