Memory Alignment in C-Structs

Structure padding and packing

Padding aligns structure members to "natural" address boundaries: for example, int members get offsets that are multiples of 4 (offset % 4 == 0) on a 32-bit platform. Padding is on by default. It inserts the following "gaps" into your first structure:

struct mystruct_A {
    char a;
    char gap_0[3]; /* inserted by compiler: for alignment of b */
    int b;
    char c;
    char gap_1[3]; /* -"-: for alignment of the whole struct in an array */
} x;
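
The gaps can be checked rather than taken on faith. A minimal sketch, assuming a C11 compiler: it declares the same members without the explicit gap arrays and asks the compiler where it put them. print_layout_A and gap_before_b are hypothetical helper names, not part of the original answer.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>

/* Same members as above, without the explicit gap arrays. */
struct mystruct_A {
    char a;
    int b;
    char c;
};

/* Hypothetical helper: prints where the compiler actually put each member. */
void print_layout_A(void)
{
    printf("offset of a: %zu\n", offsetof(struct mystruct_A, a));
    printf("offset of b: %zu\n", offsetof(struct mystruct_A, b));
    printf("offset of c: %zu\n", offsetof(struct mystruct_A, c));
    printf("sizeof:      %zu\n", sizeof(struct mystruct_A));
}

/* Hypothetical helper: bytes of padding inserted between a and b. */
size_t gap_before_b(void)
{
    return offsetof(struct mystruct_A, b) - sizeof(char);
}
```

On a target with a 4-byte, 4-byte-aligned int, the offsets come out as 0, 4 and 8 and the size as 12, which matches gap_0 and gap_1 above. Other targets may report different numbers.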

Packing, on the other hand, prevents the compiler from inserting padding. It has to be requested explicitly; under GCC it's __attribute__((__packed__)), so the following:

struct __attribute__((__packed__)) mystruct_A {
    char a;
    int b;
    char c;
};

would produce a structure of size 6 on a 32-bit architecture.

A note, though: unaligned memory access is slower on architectures that allow it (like x86 and amd64), and is explicitly prohibited on strict-alignment architectures like SPARC.
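
One common way to stay safe on strict-alignment targets is to copy bytes out of the unaligned location instead of dereferencing an int pointer into it. The sketch below assumes GCC or Clang (for the attribute syntax used above); packed_A and read_int_unaligned are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>  /* memcpy */

/* GCC/Clang spelling, as in the text above. */
struct __attribute__((__packed__)) packed_A {
    char a;
    int b;   /* starts at offset 1, so it is not int-aligned */
    char c;
};

/*
 * Hypothetical helper: reads an int stored at any byte offset in buf.
 * memcpy makes no alignment assumption about its arguments, so the
 * compiler has to emit code that is valid for an unaligned source.
 */
int read_int_unaligned(const unsigned char *buf, size_t offset)
{
    int value;

    assert(buf != NULL);
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

Accessing p->b directly through a pointer to the packed struct is also fine, because the compiler knows the member is packed. The trouble starts when the address of b is taken and passed around as a plain int *.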

Memory alignment in C-structs

At least on most machines, a type is only ever aligned to a boundary as large as the type itself [Edit: you can't really demand any "more" alignment than that, because you have to be able to create arrays, and you can't insert padding into an array]. On your implementation, short is apparently 2 bytes, and int 4 bytes.

That means your first struct is aligned to a 2-byte boundary. Since all the members are 2 bytes apiece, no padding is inserted between them.

The second contains a 4-byte item, which gets aligned to a 4-byte boundary. Since it's preceded by 6 bytes, 2 bytes of padding are inserted between v3 and i, giving 6 bytes of data in the shorts, 2 bytes of padding, and 4 more bytes of data in the int, for a total of 12.

Data Alignment for C struct

The answer depends on the compiler, platform and compile options. Some examples:

https://godbolt.org/z/4tAzB_

I'm afraid the author of the book does not understand the topic.

Structure memory alignment in C

The compiler pads structure members to keep things aligned properly for each data type.

See this link for information on how structure padding actually works. There is a lot of very detailed information there. See also this link for alignment information specific to the intel processor for the various data types.

In essence, because you have a 10-byte char array followed by an int, the compiler has padded the char array with an extra 2 bytes so that the int will be aligned properly on a 4-byte boundary (that is, an address evenly divisible by 4).

It is as if you declared your structure like this:

struct a
{
    char arr[10];
    char _padding[2];
    int i;
    float b;
};

Out of habit, I usually allocate char arrays with sizes that are evenly divisible by 4. That way the compiler doesn't have to do it for me, and it makes it easier to visualize what the data structure looks like in memory.
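
The amount of padding follows from a simple round-up rule. The sketch below is an illustration of that arithmetic, not something a compiler exposes; align_up and padding_needed are hypothetical names, and the bit trick assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rounds offset up to the next multiple of align.
 * Assumption: align is a power of two, which is what the bit mask relies on.
 */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Padding needed after offset bytes before a member aligned to align. */
size_t padding_needed(size_t offset, size_t align)
{
    return align_up(offset, align) - offset;
}
```

For the structure above, padding_needed(10, 4) is 2, which is the _padding[2] member. A 12-byte array would give padding_needed(12, 4), which is 0.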

x86 Memory Alignment of struct vs. cache line?

I think the part you might be missing is the alignment requirement that the compiler imposes for various types.

Integer types are generally aligned to a multiple of their own size (e.g. a 64-bit integer will be aligned to 8 bytes); so-called "natural alignment". This is not a strict architectural requirement of x86; unaligned loads and stores still work, but since they are less efficient, the compiler prefers to avoid them.

An aggregate, like a struct, is aligned according to the highest alignment requirement of its members, and padding will be inserted between members if needed to ensure that each one is properly aligned. Padding will also be added at the end so that the overall size of the struct is a multiple of its required alignment.

So in your example, struct Obj has alignment 8, and its size will be rounded up to 48 (with 6 bytes of padding at the end). So there is no need for 24 bytes of padding to be inserted after c[4] (I think you meant to write the padding at addresses 40-63); your obj can be placed at address 40. d can then be placed at address 88.

Note that none of this has anything to do with the cache line size. Objects are not by default aligned to cache lines, though "natural alignment" will ensure that no integer load or store ever has to cross a cache line.
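
The rule for aggregates can be observed with C11's _Alignof. This is a sketch under assumptions: struct sample is a made-up aggregate (the struct Obj from the question is not shown here), and print_alignments is a hypothetical helper.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical aggregate; this is not the struct Obj from the question. */
struct sample {
    char    tag;
    int64_t big;    /* the member with the strictest alignment */
    short   small;
};

/* Hypothetical helper: prints member alignments next to the struct's own. */
void print_alignments(void)
{
    printf("alignof(char)=%zu alignof(short)=%zu alignof(int64_t)=%zu\n",
           _Alignof(char), _Alignof(short), _Alignof(int64_t));
    printf("alignof(struct sample)=%zu sizeof(struct sample)=%zu\n",
           _Alignof(struct sample), sizeof(struct sample));
}
```

The struct's alignment matches that of int64_t, and its size is a multiple of that alignment because of the tail padding after small.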

sizeof(), alignment in C structs:

Your struct must be 8*N bytes long, since it has a member with 8 bytes (double). That means the struct sits in the memory at an address (A) divisible by 8 (A%8 == 0), and its end address will be (A + 8N) which will also be divisible by 8.

From there, you store two 4-byte variables (int + float), meaning you now occupy the memory area [A,A+8). Now you store an 8-byte variable (double). There is no need for padding since (A+8) % 8 == 0 [since A%8 == 0]. So, with no padding you get 4+4+8 == 16.

If you change the order to int -> double -> float, you'll occupy 24 bytes, since the double variable's original address will not be divisible by 8 and the compiler will have to pad 4 bytes to get to a valid address (and the struct will also have padding at the end).

|--------||--------||--------||--------||--------||--------||--------||--------|
|   each ||   cell ||  here  ||represen||-ts  4  || bytes  ||        ||        |
|--------||--------||--------||--------||--------||--------||--------||--------|

A         A+4       A+8       A+12      A+16      A+20      A+24                    [addresses]
|--------||--------||--------||--------||--------||--------||--------||--------|    
|   int  ||  float || double || double ||        ||        ||        ||        |    [content - basic case]
|--------||--------||--------||--------||--------||--------||--------||--------|

first padding to ensure the double sits on an address that is divisible by 8
last  padding to ensure the struct size is divisible by the largest member's size (8)
|--------||--------||--------||--------||--------||--------||--------||--------|    
|   int  || padding|| double || double || float  || padding||        ||        |    [content - change order case]
|--------||--------||--------||--------||--------||--------||--------||--------|
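
Both orderings from the diagram can be compared with sizeof. A minimal sketch; the struct names ordered and reordered and the helper print_order_sizes are hypothetical.

```c
#include <stdio.h>

struct ordered   { int i; float f; double d; };  /* basic case */
struct reordered { int i; double d; float f; };  /* change-order case */

/* Hypothetical helper: prints the size of each ordering. */
void print_order_sizes(void)
{
    printf("int, float, double: %zu bytes\n", sizeof(struct ordered));
    printf("int, double, float: %zu bytes\n", sizeof(struct reordered));
}
```

With 4-byte int and float and an 8-byte double aligned to 8, as assumed in the answer, the sizes are 16 and 24. A target that aligns double to 4 would report 16 for both.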

memory alignment into structure - alignment size equal to largest member size

The OVERALL alignment of the structure should be that of the element with the greatest alignment requirement, and its size is padded up to a multiple of that alignment. This is required to ensure that, for example, an array of structures is always aligned. If you didn't have that, an array of struct { int x; char c; }; would have the first element aligned, but the next three would have x unaligned.
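
The array argument can be checked at run time. The sketch below is an illustration; struct pair stands in for the anonymous struct { int x; char c; } above, and all_x_aligned is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pair {
    int  x;
    char c;
    /* tail padding goes here so that sizeof is a multiple of the alignment */
};

/* Returns 1 if every x in the array sits on an int boundary, 0 otherwise. */
int all_x_aligned(const struct pair *arr, size_t n)
{
    assert(arr != NULL);
    for (size_t k = 0; k < n; k++) {
        if ((uintptr_t)(const void *)&arr[k].x % _Alignof(int) != 0)
            return 0;
    }
    return 1;
}
```

Because consecutive elements are exactly sizeof(struct pair) bytes apart, the check passes for any array length as long as that size is a multiple of the int alignment.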

It is often possible to convince the compiler to generate a "packed" data structure (with no alignment padding) and use that to get a packed array, but it's a bad idea to use that in all but very special cases, because at best it's slower, at worst it causes execution to stop due to "unaligned access trap" in the processor.

If the size of int is four bytes [it is for all compilers I'm aware of; long is either 4 or 8 bytes, depending on the compiler], it will be 4-byte aligned on both 32- and 64-bit platforms (at least on x86).

If you want to have a 7 byte "gap" in a struct, this would work:

struct X { 
   char c;
   uint64_t x;
};

which would of course be laid out as:

struct X { 
   char c;
   char padding[7]; 
   uint64_t x;
};

Memory Alignment in C/C++

The examples given in the book are highly dependent on the compiler and computer architecture used. If you test them in your own program, you may get totally different results from the author's. I will assume a 64-bit architecture, because the author does too, from what I've read in the description.
Let's look at the examples one by one:

ReallySlowStruct
If the compiler supports struct members that are not byte-aligned, the start of "d" will be at the seventh bit of the first byte of the struct. That sounds very good for saving memory. The problem is that C does not allow bit addressing. So to save newValue to the "d" member, the compiler must do a whole lot of bit-shifting operations: save the first two bits of "newValue" in byte 0, shifted 6 bits to the right; then shift "newValue" two bits to the left and save it starting at byte 1. Byte 1 is a non-aligned memory location, which means the bulk memory transfer instructions won't work and the compiler must store one byte at a time.

SlowStruct
It gets better. The compiler can get rid of all the bit-fiddling. But writing "d" may still require writing one byte at a time, because it is not aligned to the native "int" size. The native size on a 64-bit system is 8, so on a target that enforces alignment, a value at an address not divisible by 8 has to be accessed one byte at a time. And worse, if I switch off packing, I will waste a lot of memory space: every member that is followed by an int will be padded with enough bytes to let the integer start at a memory location divisible by 8. In this case, char a and c will both take up 8 bytes.

FastStruct
This is aligned to the size of int on the target machine. "d" takes up 8 bytes, as it should. Because the chars are all bundled in one place, the compiler does not pad them and does not waste space; chars are only 1 byte each, so they need no padding. The complete structure adds up to an overall size of 16 bytes, which is divisible by 8, so no padding is needed.

In most scenarios, you never have to be concerned with alignment because the default alignment is already optimal. In some cases, however, you can achieve significant performance improvements, or memory savings, by specifying a custom alignment for your data structures.
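
In standard C, a custom alignment can be requested with C11's alignas. The sketch below is one possible use, assuming a C11 compiler; the 64-byte figure is an assumed cache-line size that does not come from the text, and padded_counter is a hypothetical type.

```c
#include <stdalign.h>  /* alignas, alignof (C11) */
#include <stdint.h>

/* Assumption: 64 bytes per cache line. Check this for the actual target. */
#define ASSUMED_CACHE_LINE 64

/*
 * Hypothetical type: a counter that occupies a whole (assumed) cache line,
 * so two counters in an array never share one.
 */
struct padded_counter {
    alignas(ASSUMED_CACHE_LINE) uint64_t value;
};
```

The trade-off is space: each struct padded_counter takes 64 bytes to hold 8 bytes of data, so this only pays off where sharing a cache line is the actual bottleneck.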

In terms of memory space, the compiler pads the structure in a way that naturally aligns each element of the structure.

struct x_
{
   char a;     // 1 byte
   int b;      // 4 bytes
   short c;    // 2 bytes
   char d;     // 1 byte
} bar[3];

struct x_ is padded by the compiler and thus becomes:

// Shows the actual memory layout
struct x_
{
   char a;           // 1 byte
   char _pad0[3];    // padding to put 'b' on 4-byte boundary
   int b;            // 4 bytes
   short c;          // 2 bytes
   char d;           // 1 byte
   char _pad1[1];    // padding to make sizeof(x_) multiple of 4
} bar[3];

Source: https://docs.microsoft.com/en-us/cpp/cpp/alignment-cpp-declarations?view=vs-2019

Struct alignment may waste memory?

All instances of st2 have the same layout. Where the object is stored must not affect that layout. Whether the object is a member or not must not affect that layout. The compiler cannot simply pick one of the members of mySt and store it outside of the object.

Consider passing st2 into a function:

void some_function(st2&);

That function cannot know where that object comes from. But it has to be able to access the members of the object. The layout of the object has to be known at compile time. If one instance of st2 would have different layout from another, how would the function know?
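
The same point can be sketched in C, with a pointer in place of the C++ reference. The definitions of st2 and mySt below are hypothetical, since the originals are not shown, and read_d is a made-up function playing the role of some_function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical definitions; the original st2 and mySt are not shown. */
struct st2 {
    char   c;
    double d;
};

struct mySt {
    char       tag;
    struct st2 inner;   /* laid out exactly like a standalone st2 */
};

/* Reads d through a pointer; it cannot tell where the st2 object lives. */
double read_d(const struct st2 *p)
{
    assert(p != NULL);
    return p->d;   /* compiled as a load at a fixed offset from p */
}
```

read_d works the same for a standalone struct st2 and for the inner member of a struct mySt, which is only possible because the offset of d is fixed at compile time.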

Data alignment of structure using malloc

The question erroneously assumes that

struct st *p = malloc(sizeof(*p));

is the same as

struct st *p = malloc(13);

It is not. To test,

printf ("Size of st is %zu\n", sizeof (*p));

which prints 24, not 13.

The proper way to allocate and manage structures is with sizeof(X), and not by assuming anything about how the elements are packed or aligned.
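
A minimal sketch of that advice follows. The original struct st is not shown, so the definition below is a hypothetical one whose members add up to 13 bytes with typical sizes (1 + 8 + 4); st_new is a made-up helper name.

```c
#include <stdlib.h>

/* Hypothetical struct: 13 bytes of members, but more once padding is added. */
struct st {
    char   c;
    double d;
    int    i;
};

/* Allocates one zero-initialised struct st; the caller must free() it. */
struct st *st_new(void)
{
    struct st *p = malloc(sizeof *p);   /* not malloc(13) */

    if (p != NULL) {
        p->c = 0;
        p->d = 0.0;
        p->i = 0;
    }
    return p;
}
```

Writing sizeof *p rather than sizeof(struct st) keeps the allocation correct even if the type of p changes later. On a typical x86-64 build this struct is 24 bytes, so malloc(13) would leave i outside the allocated block.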