How Does Malloc Understand Alignment

how does malloc understand alignment?

Alignment requirements are recursive: The alignment of any struct is simply the largest alignment of any of its members, and this is understood recursively.

For example, and assuming that each fundamental type's alignment equals its size (this is not always true in general), the struct X { int; char; double; } has the alignment of double, and it will be padded to be a multiple of the size of double (e.g. 4 (int), 1 (char), 3 (padding), 8 (double)). The struct Y { int; X; float; } has the alignment of X, which is the largest and equal to the alignment of double, and Y is laid out accordingly: 4 (int), 4 (padding), 16 (X), 4 (float), 4 (padding).

(All numbers are just examples and could differ on your machine.)

Therefore, by breaking it down to the fundamental types, we only need to know a handful of fundamental alignments, and among those there is a well-known largest. C++ even defines a type max_align_t whose alignment is that largest alignment.

All malloc() needs to do is to pick an address that's a multiple of that value.

Which guarantees does malloc make about memory alignment?

Accdording to this documentation page,

the address of a block returned by malloc or realloc in the GNU system is always a multiple of eight (or sixteen on 64-bit systems).

In general, malloc implementations are system-specific. All of them keep some memory for their own bookkeeping (e.g. the actual length of the allocated block) in order to be able to release that memory correctly when you call free. If you need to align to a specific boundary, use other functions, such as posix_memalign.

What is aligned memory allocation?

Alignment requirements specify what address offsets can be assigned to what types. This is completely implementation-dependent, but is generally based on word size. For instance, some 32-bit architectures require all int variables start on a multiple of four. On some architectures, alignment requirements are absolute. On others (e.g. x86) flouting them only comes with a performance penalty.

malloc is required to return an address suitable for any alignment requirement. In other words, the returned address can be assigned to a pointer of any type. From C99 §7.20.3 (Memory management functions):

The pointer returned if the allocation
succeeds is suitably aligned so that
it may be assigned to a pointer to any
type of object and then used to access
such an object or an array of such
objects in the space allocated (until
the space is explicitly deallocated).

Data alignment of structure using malloc

The question erroneously assumes that

struct st *p = malloc(sizeof(*p));

is the same as

struct st *p = malloc(13);

It is not. To test,

printf ("Size of st is %d\n", sizeof (*p));

which prints 24, not 13.

The proper way to allocate and manage structures is with sizeof(X), and not by assuming anything about how the elements are packed or aligned.

Why is malloc 16 byte aligned?

x86-64 System V uses x87 for long double, the 80-bit type. And pads it to 16-byte, with alignof(long double) == 16 so a long double will never cross a cache-line boundary. (Worth it or not, IDK; likely SSE2 was one of the motivations for supporting 16-byte alignment cheaply).

But anyway, SSE stuff isn't the only thing contributing to alignof(max_align_t) == 16 (which sets the minimum alignment that malloc is allowed to return).

The existence of__m128i doesn't directly contribute to max_align_t at all, for example 32-bit C implementations support it with lower malloc guarantees. Certainly the existence of __m256i on systems supporting AVX didn't increase the alignment guarantees for allocators. (How to solve the 32-byte-alignment issue for AVX load/store operations?). But certainly it's convenient for vectorization, both auto and manual, that malloced memory is aligned enough for movaps, especially on older CPUs when x86-64 was new and movups had penalties even when the memory was aligned. It's hard for a compiler to take advantage of that guarantee if it only sees a float*, you could have passed it a pointer into the middle of an allocation. But if it can see the malloc of an output array, it knows it will be aligned if auto-vectorizing a loop that writes to that newly malloced space.

BTW, ISO C would let malloc for a small allocation (like 1 to 15 bytes) return less-aligned space, since the space could still be used to hold any type that would fit. In C, an object can't require more alignment than its size. (e.g. you can't typedef an int that always has to be at the start of a cache line, or if you do the sizeof expands with padding.)

aligned malloc c++ implementation

Why sum twice the offset?

offset isn't exactly being summed twice. First use of offset is for the size to allocate:

void* p = (void * ) malloc(required_bytes + offset);

Second time is for the alignment:

void* q = (void * ) (((size_t)(p) + offset) & ~(alignment - 1));

Explanation:
~(alignment - 1) is a negation of offset (remember, int offset = alignment - 1;) which gives you the mask you need to satisfy the alignment requested. Arithmetic-wise, adding the offset and doing bitwise and (&) with its negation gives you the address of the aligned pointer.

How does this arithmetic work? First, remember that the internal call to malloc() is for required_bytes + offset bytes. As in, not the alignment you asked for. For example, you wanted to allocate 10 bytes with alignment of 16 (so the desired behavior is to allocate the 10 bytes starting in an address that is divisible with 16). So this malloc() from above will give you 10+16-1=25 bytes. Not necessarily starting at the right address in terms of being divisible with 16). But then this 16-1 is 0x000F and its negation (~) is 0xFFF0. And now we apply the bitwise and like this: p + 15 & 0xFFF0 which will cause every pointer p to be a multiple of 16.

But wait, why add this offset of alignment - 1 in the first place? You do it because once you get the pointer p returned by malloc(), the one thing you cannot do -- do in order to find the nearest address which is a multiple of the alignment requested -- is look for it before p, as this could cross into an address space of something allocated before p. For this, you begin by adding alignment - 1, which, think about it, is exactly the maximum by which you'd have to advance to get your alignment.

* Thanks to user DevSolar for some additional phrasing.

Note 1: For this way to work the alignment must be a power of 2. This snippet does not enforce such a thing and so could cause unexpected behavior.

Note 2: An interesting question is how could you implement a free() version for such an allocation, with the return value from this function.

How Does Malloc Understand Alignment