Is There Any Guarantee of Alignment of the Address Returned by C++'s New Operation

Is there any guarantee of alignment of the address returned by C++'s new operation?

The alignment has the following guarantee from the standard (3.7.3.1/2):

The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type and then used to access the object or array in the storage allocated (until the storage is explicitly deallocated by a call to a corresponding deallocation function).

EDIT: Thanks to timday for highlighting a bug in gcc/glibc where the guarantee does not hold.

EDIT 2: Ben's comment highlights an interesting edge case. The requirements on the allocation routines are for those provided by the standard only. If the application has its own version, then there's no such guarantee on the result.
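The guarantee quoted above can be checked at run time. What follows is a minimal sketch, assuming a C++11 compiler and the default (non-replaced) operator new; the helper names is_aligned_to and default_new_is_max_aligned are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper: true if p is a multiple of `alignment`,
// which must be a power of two.
inline bool is_aligned_to(const void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Allocates a block large enough for std::max_align_t with the default
// operator new and reports whether it is aligned for that type.
inline bool default_new_is_max_aligned()
{
    void* p = ::operator new(sizeof(std::max_align_t));
    const bool ok = is_aligned_to(p, alignof(std::max_align_t));
    ::operator delete(p);
    return ok;
}
```

On a conforming implementation default_new_is_max_aligned() should return true. If the program replaces operator new, the result depends entirely on that replacement.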

Do malloc and new return an int-aligned address?

Yes, both malloc (in C and C++) and new char[N] (in C++) return a pointer that is maximally aligned. The alignment is that of the type max_align_t from <stddef.h> or <cstddef>.

gcc-4.8 has a bug in this regard, fixed in gcc-4.9. As a workaround, include <stddef.h> and use ::max_align_t without the std:: prefix.

Support for over-aligned types, i.e. those whose alignment is greater than the above maximum, is implementation-defined. For example, POSIX offers posix_memalign to allocate memory with much greater alignments (e.g. page-aligned).
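As a rough illustration of the posix_memalign route, here is a small sketch for POSIX systems only. The wrapper name allocate_overaligned is hypothetical, and the 4096-byte figure in the usage note is an assumed page size, not something the original answer states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>   // posix_memalign, free (POSIX, not ISO C++)

// Hypothetical wrapper: returns `size` bytes aligned to `alignment`,
// or nullptr on failure. posix_memalign requires `alignment` to be a
// power of two and a multiple of sizeof(void*).
inline void* allocate_overaligned(std::size_t alignment, std::size_t size)
{
    void* p = nullptr;
    if (posix_memalign(&p, alignment, size) != 0)
        return nullptr;
    assert(reinterpret_cast<std::uintptr_t>(p) % alignment == 0);
    return p;   // release with free()
}
```

For instance, allocate_overaligned(4096, 100) would request 100 bytes starting on a 4096-byte boundary. The returned pointer is released with free(), not with delete.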

C++ new is 64-byte aligned and equal to cache line size

The implementation choices for a heap manager make a lot more sense when you consider what happens after a large number of allocations and deallocations.

A call to malloc() needs to locate an unused block of sufficient size.
The block it finds could be bigger than requested, in which case malloc() can either split off the difference as a new free block or waste it. A naive strategy of picking the free block closest in size is called best fit. If it goes on to create new free blocks from the remainder, you could alternatively call it worst leave.

After prolonged use, the best-fit approach results in a large amount of fragmentation, caused by small blocks that are unlikely ever to be allocated again, and the cost of searching the free blocks becomes high.

Consequently, high-performance heap managers don't work like this. Instead they operate as pool allocators for various fixed block sizes. Schemes in which the block sizes are powers of 2 (e.g. 64, 128, 256, 512...) are the norm, although throwing in some intermediates is probably worthwhile too (e.g. 48, 96, 192...). In this scheme, malloc() and free() are both O(1) operations, and the critical sections in allocation are minimal, potentially per pool, which becomes important in a multi-threaded environment.
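To make the size-class idea concrete, here is a minimal sketch of how a request might be rounded up to a power-of-two block size. It is not taken from any real heap manager; the names kMinBlock and size_class are hypothetical, and the minimum of 64 is simply the figure discussed in this answer.

```cpp
#include <cassert>
#include <cstddef>

// Assumed smallest block size; 64 is the figure discussed in this answer.
constexpr std::size_t kMinBlock = 64;

// Rounds a request up to the next power-of-two size class (>= kMinBlock).
// Every request in the same class would be served from the same pool, so
// a freed block can always be reused by a later request of that class.
inline std::size_t size_class(std::size_t request)
{
    // Requests above the largest power of two would overflow the doubling.
    assert(request <= (static_cast<std::size_t>(-1) >> 1) + 1);
    std::size_t block = kMinBlock;
    while (block < request)
        block *= 2;
    return block;
}
```

With this rounding, a 100-byte request lands in the 128-byte class and wastes 28 bytes, which is the internal waste the answer accepts in exchange for avoiding fragmentation.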

The wasting of memory in small allocations is a much lesser evil than fragmentation, O(n) alloc/dealloc complexity and poor multi-threaded performance.

The minimum block size relative to the cache-line size is one of those classic engineering trade-offs, and it's a safe bet that Microsoft did quite a bit of experimentation to arrive at 64 as their minimum. FWIW, 64 bytes is also the cache-line size of most current x86 CPUs, though some other architectures use larger lines.

Is it guaranteed that array elements in C will be stored consecutively, with no padding?

Yes, it is guaranteed. If padding bytes are added, they are added within struct some_type, but not in between two array elements.

E. g.:

struct S
{
    int n;
    short s;

// this is just for illustration WHERE byte padding (typically) would occur!!!
#if BYTE_ALIGNMENT >= 4
    unsigned char : 0;
    unsigned char : 0;
#endif
};
struct S s[2];
size_t d = (char*)(s + 1) - (char*)s;

With byte alignment set to 4 or 8 (or an even larger power of 2), this struct has a size of 8 and d is likewise 8. With byte alignment set to 1 or 2, the struct has a size of 6, and so does d. (This assumes a 4-byte int and a 2-byte short.)

Note: This is not the only place where padding bytes can occur: If you switched members n and s, padding bytes would be needed in between s and n to get n correctly aligned. On the other hand, no padding bytes would be necessary after n any more as the structure size would assure correct alignment already.
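The two member orders described above can be inspected with offsetof. The sketch below is written in C++ to match the other examples here; the struct names NThenS and SThenN and the helper member_bytes are hypothetical, and the actual amount of padding depends on the platform's ABI.

```cpp
#include <cassert>
#include <cstddef>

struct NThenS { int n; short s; };   // padding, if any, goes after s
struct SThenN { short s; int n; };   // padding, if any, goes between s and n

// Bytes occupied by the members themselves, without any padding.
constexpr std::size_t member_bytes()
{
    return sizeof(int) + sizeof(short);
}

// These hold on every conforming implementation, whatever the padding is.
static_assert(sizeof(NThenS) % alignof(NThenS) == 0,
              "the size is a multiple of the alignment");
static_assert(sizeof(NThenS) >= member_bytes(),
              "padding can only add bytes");
static_assert(sizeof(SThenN) >= member_bytes(),
              "padding can only add bytes");
```

Printing offsetof(SThenN, n) and sizeof(SThenN) - member_bytes() on a given platform shows where the padding ended up and how large it is.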

Referring to the standard: C11, 6.2.5.20:

An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. Array types are characterized by their element type and by the number of elements in the array. [...]

(Highlighting by me!).
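The contiguity requirement can also be expressed as a check on the distance between neighbouring elements. This is a small sketch in C++ (the same reasoning applies in C); the helper name element_stride is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct S
{
    int   n;
    short s;
};

// Byte distance between element i and element i + 1 of an array of S.
// Element i + 1 may be the one-past-the-end position.
inline std::size_t element_stride(const S* array, std::size_t i)
{
    assert(array != nullptr);
    const char* a = reinterpret_cast<const char*>(array + i);
    const char* b = reinterpret_cast<const char*>(array + i + 1);
    return static_cast<std::size_t>(b - a);
}

// The array as a whole is exactly N elements, with nothing in between.
static_assert(sizeof(S[4]) == 4 * sizeof(S), "no padding between elements");
```

Whatever padding the compiler adds is already counted in sizeof(S), so the stride between any two neighbours is always sizeof(S).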

C++: Alignment when casting byte buffer to another type

Your example violates the strict aliasing rule.
int64_view will point to the first byte of the buffer regardless, but the access through it can be unaligned. Some platforms allow that, some do not. Either way, it is UB in C++.

For example:

#include <cstdint>
#include <cstddef>
#include <iostream>
#include <iomanip>

#define COUNT 8

struct alignas(1) S
{
    char _pad;
    char buf[COUNT * sizeof(int64_t)];
};

int main()
{
    S s;
    int64_t* int64_view alignas(8) = static_cast<int64_t*>(static_cast<void*>(&s.buf));

    std::cout << std::hex << "s._pad     at " << (void*)(&s._pad) << " aligned as " << alignof(s._pad)     << std::endl;
    std::cout << std::hex << "s.buf      at " << (void*)(s.buf)   << " aligned as " << alignof(s.buf)      << std::endl;
    std::cout << std::hex << "int64_view at " << int64_view       << " aligned as " << alignof(int64_view) << std::endl;

    for(std::size_t i = 0; i < COUNT; ++i)
    {
        int64_view[i] = i;
    }

    for(std::size_t i = 0; i < COUNT; ++i)
    {
        std::cout << std::dec << std::setw(2) << i << std::hex << " " << int64_view + i << " : " << int64_view[i] << std::endl;
    }
}

Now compile and run it with -fsanitize=undefined:

$ g++ -fsanitize=undefined -Wall -Wextra -std=c++20 test.cpp -o test

$ ./test
s._pad     at 0x7ffffeb42300 aligned as 1
s.buf      at 0x7ffffeb42301 aligned as 1
int64_view at 0x7ffffeb42301 aligned as 8
test.cpp:26:23: runtime error: store to misaligned address 0x7ffffeb42301 for type 'int64_t', which requires 8 byte alignment
0x7ffffeb42301: note: pointer points here
 7f 00 00  bf 11 00 00 00 00 00 00  ff ff 00 00 01 00 00 00  20 23 b4 fe ff 7f 00 00  7c a4 9d 2b 98
              ^ 
test.cpp:31:113: runtime error: load of misaligned address 0x7ffffeb42301 for type 'int64_t', which requires 8 byte alignment
0x7ffffeb42301: note: pointer points here
 7f 00 00  bf 00 00 00 00 00 00 00  00 01 00 00 00 00 00 00  00 02 00 00 00 00 00 00  00 03 00 00 00
              ^ 
 0 0x7ffffeb42301 : 0
 1 0x7ffffeb42309 : 1
 2 0x7ffffeb42311 : 2
 3 0x7ffffeb42319 : 3
 4 0x7ffffeb42321 : 4
 5 0x7ffffeb42329 : 5
 6 0x7ffffeb42331 : 6
 7 0x7ffffeb42339 : 7

It works on x86_64, but the behavior is undefined and the unaligned accesses cost execution speed.


In C++20 there is std::bit_cast. It will not help with the unaligned access in this example, but it can resolve some aliasing issues.

UPDATE:
There are instructions on x86_64 that require aligned access, for example the aligned SSE loads and stores, which require 16-byte alignment. If you use these instructions on an unaligned address, the application will crash with a "general protection fault".
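One well-defined way to read and write int64_t values in a byte buffer at an arbitrary address is to copy them with std::memcpy rather than casting the pointer. The sketch below is not part of the original example; the helper names load_i64 and store_i64 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: read or write the index-th int64_t stored in a
// byte buffer that may start at any address. std::memcpy has no alignment
// requirement and does not violate the aliasing rules.
inline std::int64_t load_i64(const char* buf, std::size_t index)
{
    assert(buf != nullptr);
    std::int64_t value;
    std::memcpy(&value, buf + index * sizeof value, sizeof value);
    return value;
}

inline void store_i64(char* buf, std::size_t index, std::int64_t value)
{
    assert(buf != nullptr);
    std::memcpy(buf + index * sizeof value, &value, sizeof value);
}
```

Compilers typically turn such fixed-size copies into plain load and store instructions on targets that tolerate unaligned access, so the cost over the pointer cast is usually small.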

Why are addresses not consecutive when allocating single bytes?

glibc's malloc, for small memory allocations less than 16 bytes, simply allocates the memory as 16 bytes. This is to prevent external fragmentation upon the freeing of this memory, where blocks of free memory are too small to be used in the general case to fulfill new malloc operations.

A block allocated by malloc must also be large enough to store the data required to track it in the data structure which stores free blocks.

This behaviour, while increasing internal fragmentation, decreases overall fragmentation throughout the system.

Source:
http://repo.or.cz/w/glibc.git/blob/HEAD:/malloc/malloc.c
(Read line 108 in particular)

/*
...
Minimum allocated size: 4-byte ptrs:  16 bytes    (including 4 overhead)
...
*/

Furthermore, all addresses returned by malloc in glibc are aligned to 2 * sizeof(size_t) bytes, which is 8 bytes (64 bits) on 32-bit systems (such as yours) and 16 bytes (128 bits) on 64-bit systems.
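The spacing and alignment of small allocations can be observed directly. This is a minimal sketch, not tied to glibc; the names AllocReport and probe_allocations are hypothetical, and the addresses it reports are allocator-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Addresses of two back-to-back allocations of the same size.
struct AllocReport
{
    std::uintptr_t first;
    std::uintptr_t second;
    bool both_max_aligned;   // aligned to alignof(std::max_align_t)?
};

// Hypothetical probe: allocates two blocks of `size` bytes and records
// where they ended up. Nothing guarantees that they are adjacent.
inline AllocReport probe_allocations(std::size_t size)
{
    void* a = std::malloc(size);
    void* b = std::malloc(size);
    assert(a != nullptr && b != nullptr);

    AllocReport r;
    r.first  = reinterpret_cast<std::uintptr_t>(a);
    r.second = reinterpret_cast<std::uintptr_t>(b);
    r.both_max_aligned = r.first  % alignof(std::max_align_t) == 0 &&
                         r.second % alignof(std::max_align_t) == 0;
    std::free(b);
    std::free(a);
    return r;
}
```

Calling probe_allocations(1) and comparing first and second shows the gap between two one-byte allocations. With the minimum chunk size described above, that gap is expected to be far larger than one byte, although the exact value depends on the allocator and its version.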
