Dynamic Aligned Memory Allocation in C++11

Dynamic memory allocation with aligned storage

The C++11 standard requires that allocation functions such as ::operator new return memory that is aligned to alignof(std::max_align_t) [basic.stc.dynamic/2]:

The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type with a fundamental alignment requirement […]

Thus, as long as the type of the object you're creating via your new expression is not an overaligned type (one that requires an alignment more strict than alignof(std::max_align_t)), everything's fine. For overaligned types, you would indeed have to allocate storage of sufficient size to manually align the pointer, e.g., using std::align, and then construct your object at a suitable address, e.g., via placement new…

Starting with C++17, new automatically takes care of this. To allocate storage that requires alignment more strict than __STDCPP_DEFAULT_NEW_ALIGNMENT__ (the alignment that allocation functions are at least required to supply), a new-expression will call an allocation function that is explicitly given the alignment of the storage to be allocated as an argument [expr.new]/14 …

How to allocate aligned memory only using the standard library?

Original answer

{
    void *mem = malloc(1024+16);
    void *ptr = ((char *)mem+16) & ~ 0x0F;
    memset_16aligned(ptr, 0, 1024);
    free(mem);
}

Fixed answer

{
    void *mem = malloc(1024+15);
    void *ptr = ((uintptr_t)mem+15) & ~ (uintptr_t)0x0F;
    memset_16aligned(ptr, 0, 1024);
    free(mem);
}

Explanation as requested

The first step is to allocate enough spare space, just in case. Since the memory must be 16-byte aligned (meaning that the leading byte address needs to be a multiple of 16), adding 16 extra bytes guarantees that we have enough space. Somewhere in the first 16 bytes, there is a 16-byte aligned pointer. (Note that malloc() is supposed to return a pointer that is sufficiently well aligned for any purpose. However, the meaning of 'any' is primarily for things like basic types — long, double, long double, long long, and pointers to objects and pointers to functions. When you are doing more specialized things, like playing with graphics systems, they can need more stringent alignment than the rest of the system — hence questions and answers like this.)

The next step is to convert the void pointer to a char pointer; GCC notwithstanding, you are not supposed to do pointer arithmetic on void pointers (and GCC has warning options to tell you when you abuse it). Then add 16 to the start pointer. Suppose malloc() returned you an impossibly badly aligned pointer: 0x800001. Adding the 16 gives 0x800011. Now I want to round down to the 16-byte boundary — so I want to reset the last 4 bits to 0. 0x0F has the last 4 bits set to one; therefore, ~0x0F has all bits set to one except the last four. Anding that with 0x800011 gives 0x800010. You can iterate over the other offsets and see that the same arithmetic works.

The last step, free(), is easy: you always, and only, return to free() a value that one of malloc(), calloc() or realloc() returned to you — anything else is a disaster. You correctly provided mem to hold that value — thank you. The free releases it.

Finally, if you know about the internals of your system's malloc package, you could guess that it might well return 16-byte aligned data (or it might be 8-byte aligned). If it was 16-byte aligned, then you'd not need to dink with the values. However, this is dodgy and non-portable — other malloc packages have different minimum alignments, and therefore assuming one thing when it does something different would lead to core dumps. Within broad limits, this solution is portable.

Someone else mentioned posix_memalign() as another way to get the aligned memory; that isn't available everywhere, but could often be implemented using this as a basis. Note that it was convenient that the alignment was a power of 2; other alignments are messier.

One more comment — this code does not check that the allocation succeeded.

Amendment

Windows Programmer pointed out that you can't do bit mask operations on pointers, and, indeed, GCC (3.4.6 and 4.3.1 tested) does complain like that. So, an amended version of the basic code — converted into a main program, follows. I've also taken the liberty of adding just 15 instead of 16, as has been pointed out. I'm using uintptr_t since C99 has been around long enough to be accessible on most platforms. If it wasn't for the use of PRIXPTR in the printf() statements, it would be sufficient to #include <stdint.h> instead of using #include <inttypes.h>. [This code includes the fix pointed out by C.R., which was reiterating a point first made by Bill K a number of years ago, which I managed to overlook until now.]

#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void memset_16aligned(void *space, char byte, size_t nbytes)
{
    assert((nbytes & 0x0F) == 0);
    assert(((uintptr_t)space & 0x0F) == 0);
    memset(space, byte, nbytes);  // Not a custom implementation of memset()
}

int main(void)
{
    void *mem = malloc(1024+15);
    void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F);
    printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr);
    memset_16aligned(ptr, 0, 1024);
    free(mem);
    return(0);
}

And here is a marginally more generalized version, which will work for sizes which are a power of 2:

#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void memset_16aligned(void *space, char byte, size_t nbytes)
{
    assert((nbytes & 0x0F) == 0);
    assert(((uintptr_t)space & 0x0F) == 0);
    memset(space, byte, nbytes);  // Not a custom implementation of memset()
}

static void test_mask(size_t align)
{
    uintptr_t mask = ~(uintptr_t)(align - 1);
    void *mem = malloc(1024+align-1);
    void *ptr = (void *)(((uintptr_t)mem+align-1) & mask);
    assert((align & (align - 1)) == 0);
    printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr);
    memset_16aligned(ptr, 0, 1024);
    free(mem);
}

int main(void)
{
    test_mask(16);
    test_mask(32);
    test_mask(64);
    test_mask(128);
    return(0);
}

To convert test_mask() into a general purpose allocation function, the single return value from the allocator would have to encode the release address, as several people have indicated in their answers.

Problems with interviewers

Uri commented: Maybe I am having [a] reading comprehension problem this morning, but if the interview question specifically says: "How would you allocate 1024 bytes of memory" and you clearly allocate more than that. Wouldn't that be an automatic failure from the interviewer?

My response won't fit into a 300-character comment...

It depends, I suppose. I think most people (including me) took the question to mean "How would you allocate a space in which 1024 bytes of data can be stored, and where the base address is a multiple of 16 bytes". If the interviewer really meant how can you allocate 1024 bytes (only) and have it 16-byte aligned, then the options are more limited.

Clearly, one possibility is to allocate 1024 bytes and then give that address the 'alignment treatment'; the problem with that approach is that the actual available space is not properly determinate (the usable space is between 1008 and 1024 bytes, but there wasn't a mechanism available to specify which size), which renders it less than useful.
Another possibility is that you are expected to write a full memory allocator and ensure that the 1024-byte block you return is appropriately aligned. If that is the case, you probably end up doing an operation fairly similar to what the proposed solution did, but you hide it inside the allocator.

However, if the interviewer expected either of those responses, I'd expect them to recognize that this solution answers a closely related question, and then to reframe their question to point the conversation in the correct direction. (Further, if the interviewer got really stroppy, then I wouldn't want the job; if the answer to an insufficiently precise requirement is shot down in flames without correction, then the interviewer is not someone for whom it is safe to work.)

The world moves on

The title of the question has changed recently. It was Solve the memory alignment in C interview question that stumped me. The revised title (How to allocate aligned memory only using the standard library?) demands a slightly revised answer — this addendum provides it.

C11 (ISO/IEC 9899:2011) added function aligned_alloc():

7.22.3.1 The aligned_alloc function

Synopsis
#include <stdlib.h>
void *aligned_alloc(size_t alignment, size_t size);
Description
The aligned_alloc function allocates space for an object whose alignment is
specified by alignment, whose size is specified by size, and whose value is
indeterminate. The value of alignment shall be a valid alignment supported by the implementation and the value of size shall be an integral multiple of alignment.

Returns
The aligned_alloc function returns either a null pointer or a pointer to the allocated space.

And POSIX defines posix_memalign():

#include <stdlib.h>

int posix_memalign(void **memptr, size_t alignment, size_t size);
DESCRIPTION

The posix_memalign() function shall allocate size bytes aligned on a boundary specified by alignment, and shall return a pointer to the allocated memory in memptr. The value of alignment shall be a power of two multiple of sizeof(void *).

Upon successful completion, the value pointed to by memptr shall be a multiple of alignment.

If the size of the space requested is 0, the behavior is implementation-defined; the value returned in memptr shall be either a null pointer or a unique pointer.

The free() function shall deallocate memory that has previously been allocated by posix_memalign().

RETURN VALUE

Upon successful completion, posix_memalign() shall return zero; otherwise, an error number shall be returned to indicate the error.

Either or both of these could be used to answer the question now, but only the POSIX function was an option when the question was originally answered.

Behind the scenes, the new aligned memory function do much the same job as outlined in the question, except they have the ability to force the alignment more easily, and keep track of the start of the aligned memory internally so that the code doesn't have to deal with specially — it just frees the memory returned by the allocation function that was used.

Dynamically allocate properly-aligned memory: is the new-expression on char arrays suitable?

I can't find where the standard guarantees your proposed code to work. First, I cannot find the part of the standard that backs what you've quoted from CppReference.com, but even if we take that claim on faith, it still only says that it may allocate additional space. If it doesn't, you're sunk.

The standard does speak to the alignment of memory returned by operator new[]: "The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type …." (C++03, §3.7.2.1/2; C++11, §3.7.4.1/2) However, in the context where you're planning to allocate the memory, the type you plan to store in it isn't a complete type. And besides, the result of operator new[] isn't necessarily the same as the result of the new-expression new char[…]; the latter is allowed to allocate additional space for its own array bookkeeping.

You could use C++11's std::align. To guarantee that you allocate space that can be aligned to the required amount, you'd have to allocate object_size() + object_alignment() - 1 bytes, but in practice, allocating only object_size() bytes will probably be fine. Thus, you might try using std::align something like this:

size_t buffer_size = object_size();
void* p = operator new(buffer_size);
void* original_p = p;
if (!std::align(object_alignment(), object_size(), p, buffer_size) {
  // Allocating the minimum wasn't enough. Allocate more and try again.
  operator delete(p);
  buffer_size = object_size() + object_alignment() - 1;
  p = operator new(buffer_size);
  original_p = p;
  if (!std::align(object_alignment(), object_size(), p, buffer_size)) {
    // still failed. invoke error-handler
    operator delete(p);
  }
}
object_construct(p);
object_destruct(p);
operator delete(original_p);

The allocator described in another question accomplishes much the same thing. It's templated on the type of object being allocated, which you don't have access to, but it's not required to be that way. The only times it uses its template type argument are to evaluate sizeof and alignof, which you already have from your object_size and object_alignment functions.

That seems like a lot to require from consumers of your library. It would be much more convenient for them if you moved the allocation behind the API as well:

void* object_construct() {
  return new internal_object();
}

Make sure to move the destruction, too, by calling delete, not just the destructor.

That makes any alignment questions go away because the only module that really needs to know it is the one that already knows everything else about the type being allocated.

How to create objects in C++ with dynamic alignment requirements?

You can do the following:

auto storage = static_cast<Foo*>(std::aligned_alloc(n * sizeof(Foo), alignment));
std::uninitialized_default_construct_n(storage, n);
auto ptr = std::launder(storage);

// use ptr to refer to Foo objects

std::destroy_n(storage, n);
free(storage);

Casting a pointer returned by std::aligned_alloc to Foo* is not undefined behavior. It would be UB if you tried to dereference it right after std::aligned_alloc before std::uninitialized_default_construct_n created Foo objects.

Edit.

The code above is technically undefined behavior. But it seems that in C++ there is no 100% standard-conforming way to do such an allocation without UB. From the practical point to view this code is reliable and safe. std::launder(storage) should probably be used to access Foo objects through storage pointer.

See this question for details and discussions.

why is there no aligned calloc in C11

The best guess I could offer is that an aligned_calloc specifically goes against one of the C1X charter's explicit goals:

Unlike for C9X, the consensus at the London meeting was that there should be no
invention, without exception. Only those features that have a history and are in common
use by a commercial implementation should be considered. Also there must be care to
standardize these features in a way that would make the Standard and the commercial
implementation compatible.

http://www.open-std.org/JTC1/SC22/wg14/www/docs/n1250.pdf

Looking around at commercial implementations, aligned_malloc was widely available and common to most every platform. An aligned calloc would have required more than wrapping on many platforms to offer more than the aligned_malloc() + memset() pair, thus could be considered inventive and thus was left out.

That'd be my best guess.