How does malloc understand alignment?
Alignment requirements are recursive: the alignment of any struct is simply the largest alignment of any of its members, and the same rule applies in turn to members that are themselves structs.
For example, and assuming that each fundamental type's alignment equals its size (this is not always true in general), the struct X { int; char; double; } has the alignment of double, and it will be padded to be a multiple of the size of double (e.g. 4 (int), 1 (char), 3 (padding), 8 (double)). The struct Y { int; X; float; } has the alignment of X, which is the largest and equal to the alignment of double, and Y is laid out accordingly: 4 (int), 4 (padding), 16 (X), 4 (float), 4 (padding).
(All numbers are just examples and could differ on your machine.)
Therefore, by breaking it down to the fundamental types, we only need to know a handful of fundamental alignments, and among those there is a well-known largest. C11 and C++11 even define a type, max_align_t, whose alignment is that largest alignment.
All malloc() needs to do is pick an address that's a multiple of that value.
Which guarantees does malloc make about memory alignment?
According to this documentation page,
the address of a block returned by malloc or realloc in the GNU system is always a multiple of eight (or sixteen on 64-bit systems).
In general, malloc implementations are system-specific. All of them keep some memory for their own bookkeeping (e.g. the actual length of the allocated block) in order to be able to release that memory correctly when you call free. If you need to align to a specific boundary, use other functions, such as posix_memalign.
What is aligned memory allocation?
Alignment requirements specify what address offsets can be assigned to what types. This is completely implementation-dependent, but is generally based on word size. For instance, some 32-bit architectures require all int variables to start on a multiple of four. On some architectures, alignment requirements are absolute. On others (e.g. x86) flouting them only comes with a performance penalty.
malloc is required to return an address suitable for any alignment requirement. In other words, the returned address can be assigned to a pointer of any type. From C99 §7.20.3 (Memory management functions):
The pointer returned if the allocation
succeeds is suitably aligned so that
it may be assigned to a pointer to any
type of object and then used to access
such an object or an array of such
objects in the space allocated (until
the space is explicitly deallocated).
Data alignment of structure using malloc
The question erroneously assumes that
struct st *p = malloc(sizeof(*p));
is the same as
struct st *p = malloc(13);
It is not. To test,
printf("Size of st is %zu\n", sizeof(*p));
which prints 24, not 13.
The proper way to allocate and manage structures is with sizeof(X), and not by assuming anything about how the elements are packed or aligned.
Why is malloc 16 byte aligned?
x86-64 System V uses x87 for long double, the 80-bit type, and pads it to 16 bytes, with alignof(long double) == 16, so a long double will never cross a cache-line boundary. (Worth it or not, IDK; likely SSE2 was one of the motivations for supporting 16-byte alignment cheaply.)
But anyway, SSE stuff isn't the only thing contributing to alignof(max_align_t) == 16 (which sets the minimum alignment that malloc is allowed to return).
The existence of __m128i doesn't directly contribute to max_align_t at all; for example, 32-bit C implementations support it with lower malloc guarantees. Certainly the existence of __m256i on systems supporting AVX didn't increase the alignment guarantees for allocators. (See "How to solve the 32-byte-alignment issue for AVX load/store operations?") But certainly it's convenient for vectorization, both auto and manual, that malloced memory is aligned enough for movaps, especially on older CPUs when x86-64 was new and movups had penalties even when the memory was aligned. It's hard for a compiler to take advantage of that guarantee if it only sees a float*, since you could have passed it a pointer into the middle of an allocation. But if it can see the malloc of an output array, it knows it will be aligned if auto-vectorizing a loop that writes to that newly malloced space.
BTW, ISO C would let malloc for a small allocation (like 1 to 15 bytes) return less-aligned space, since the space could still be used to hold any type that would fit. In C, an object can't require more alignment than its size. (E.g. you can't typedef an int that always has to be at the start of a cache line, or if you do, the sizeof expands with padding.)
Aligned malloc C++ implementation
Why sum twice the offset?
offset isn't exactly being summed twice. The first use of offset is for the size to allocate:
void* p = (void * ) malloc(required_bytes + offset);
The second use is for the alignment:
void* q = (void * ) (((size_t)(p) + offset) & ~(alignment - 1));
Explanation: ~(alignment - 1) is the bitwise complement of offset (remember, int offset = alignment - 1;), which gives you the mask you need to satisfy the requested alignment. Arithmetic-wise, adding the offset and then doing a bitwise AND (&) with its complement rounds the address up to the aligned pointer.
How does this arithmetic work? First, remember that the internal call to malloc() is for required_bytes + offset bytes, not just the number of bytes you asked for. For example, say you wanted to allocate 10 bytes with an alignment of 16 (so the desired behavior is to allocate the 10 bytes starting at an address that is divisible by 16). The malloc() from above will give you 10+16-1=25 bytes, not necessarily starting at an address divisible by 16. But then this 16-1 is 0x000F and its complement (~) is 0xFFF0. And now we apply the bitwise AND like this: (p + 15) & 0xFFF0, which yields a multiple of 16 for every pointer p.
But wait, why add this offset of alignment - 1 in the first place? You do it because once you get the pointer p returned by malloc(), the one thing you cannot do in order to find the nearest address that is a multiple of the requested alignment is look for it before p, as this could cross into the address space of something allocated before p. Instead, you begin by adding alignment - 1, which, think about it, is exactly the maximum by which you'd have to advance to get your alignment.
* Thanks to user DevSolar for some additional phrasing.
Note 1: For this way to work the alignment must be a power of 2. This snippet does not enforce such a thing and so could cause unexpected behavior.
Note 2: An interesting question is how you could implement a free() version for such an allocation, given only the return value from this function.