Double Alignment

Double alignment

An extended comment:

According to GCC documentation about -malign-double:

Aligning double variables on a two-word boundary produces code that runs somewhat faster on a Pentium at the expense of more memory.
On x86-64, -malign-double is enabled by default.
Warning: if you use the -malign-double switch, structures containing the above types are aligned differently than the published application binary interface specifications for the 386 and are not binary compatible with structures in code compiled without that switch.

A word here means an i386 word, which is 32 bits, so a two-word boundary is 8 bytes.
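A minimal sketch of how the double alignment rule shows up in struct layout. The struct `sample` and the helper names are hypothetical, chosen only for illustration. Under the i386 SysV ABI (for example `gcc -m32`) the offset of `d` is expected to be 4; with `-malign-double`, or on x86-64, it is expected to be 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical struct for illustration: an int followed by a double. */
struct sample {
    int i;
    double d;
};

/* The double can never start before the end of the int member. */
static_assert(offsetof(struct sample, d) >= sizeof(int),
              "d is placed after i");

/* Offset of the double member; the padding after `i` is
 * this value minus sizeof(int). */
size_t sample_double_offset(void)
{
    return offsetof(struct sample, d);
}

/* Print the layout so builds with different flags can be compared. */
void print_sample_layout(void)
{
    printf("offsetof(d) = %zu, sizeof(struct sample) = %zu\n",
           sample_double_offset(), sizeof(struct sample));
}
```

Calling `print_sample_layout()` from `main` in two builds, one with and one without `-malign-double`, should make the binary incompatibility mentioned in the warning visible.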

Windows uses 64-bit alignment of double values even in 32-bit mode, while Unices that conform to the SysV i386 ABI use 32-bit alignment. The 32-bit Windows API, Win32, comes from Windows NT 3.1, which, unlike current-generation Windows versions, targeted Intel i386, Alpha, MIPS and even the obscure Intel i860. As native RISC systems like Alpha and MIPS require double values to be 64-bit aligned (otherwise a hardware fault occurs), portability might have been the rationale behind the 64-bit alignment in the Win32 i386 ABI.

64-bit x86 systems, also known as AMD64, x86-64, or x64, use 64-bit alignment for double values. A misaligned access still works, but when it crosses a cache-line or page boundary the hardware performs an expensive "fix-up" which considerably slows down memory access. That's why double values are 64-bit aligned in all modern x86-64 ABIs (SysV and Windows x64).

Alignment when using a long double in a structure

First of all, your compiler is creating 32-bit output even though you have a 64-bit processor. Anyway, you're assuming that things need to be aligned to a boundary identical to their size, which isn't true in general. In particular, in your case, long doubles take up 12 bytes, but only need to be aligned to a 4-byte boundary. As such, after your single-byte char, the compiler inserts 3 bytes of padding to get to a 4-byte boundary, and then inserts your 12-byte long double. 1+3+12 is 16, so struct try is 16 bytes long.
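The arithmetic above can be checked with a short sketch. The struct is named `try` after the one in the question, but the member names `c` and `ld` and the helper functions are assumptions. The final rounding step is not needed in the 32-bit case described above (16 is already a multiple of 4); it is included so that the same calculation also holds on targets where `long double` has a larger size and alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Struct named after the one in the question; member names are assumed. */
struct try {
    char c;
    long double ld;
};

/* The first member of a struct always sits at offset 0. */
static_assert(offsetof(struct try, c) == 0, "c starts the struct");

/* Number of padding bytes the compiler inserted between `c` and `ld`. */
size_t try_padding_after_char(void)
{
    return offsetof(struct try, ld) - sizeof(char);
}

/* Size computed by hand: char + padding + long double,
 * rounded up to a multiple of the struct's own alignment. */
size_t try_expected_size(void)
{
    size_t end = sizeof(char) + try_padding_after_char() + sizeof(long double);
    size_t align = _Alignof(struct try);
    return (end + align - 1) / align * align;
}
```

On a 32-bit SysV build this should give 3 bytes of padding and a total of 16; on x86-64, where `long double` occupies 16 bytes with 16-byte alignment, the same functions should give 15 and 32.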

Alignment of a struct with two doubles is 4 even though double is aligned to 8 (32bit)

The document that specifies how this should behave on your architecture is the i386 System V psABI (see also the x86 tag wiki). In it we can read that the required alignment of double is 4. But it has this interesting note:

The Intel386 architecture does not require doubleword alignment for double-precision values. Nevertheless, for data structure compatibility with other Intel architectures, compilers may provide a method to align double-precision values on doubleword boundaries.
A compiler that provides the doubleword alignment mentioned above can generate code (data structures and function calling sequences) that do not conform to the Intel386 ABI. Programs built with the doubleword alignment facility can thus violate conformance to the Intel386 ABI. See "Aggregates and Unions" below and "Function Calling Sequence" later in this chapter for more information.

GCC doesn't want to violate the ABI for structs (where alignment is quite relevant) so it correctly uses the alignment of 4 for doubles inside structs.

This behavior is even contrary to the ISO C standard

ISO C is completely irrelevant here since __alignof__ is not part of any C standard. The compiler could do anything, like fetch a picture of a cat from the internet and show it to you, and that would be a behavior completely compliant with the C standard.

C11 does specify an _Alignof operator though. Interestingly enough, if we use the _Alignof operator that is part of the C11 standard, GCC reports other (correct) numbers:

$ cat foo.c
#include <stdio.h>
struct dd { double d1; double d2; };
int main()
{
    printf("%d\n", (int)__alignof__(double));
    printf("%d\n", (int)__alignof__(struct dd));
}
$ cc -m32 -o foo foo.c && ./foo
8
4
$ ed foo.c
[...]
$ cat foo.c
#include <stdio.h>
struct dd { double d1; double d2; };
int main()
{
    printf("%d\n", (int)_Alignof(double));
    printf("%d\n", (int)_Alignof(struct dd));
}
$ cc -m32 -o foo foo.c && ./foo
4
4

The wording of the C standard isn't very specific about what should happen in ABIs where types inside structs have lower alignment than they do outside.

After careful reading of the standard's wording, and some debate, gcc developers decided that _Alignof should tell you the minimum alignment that you will ever see for a value of that type in a strict C11 program (https://gcc.gnu.org/ml/gcc-patches/2013-12/msg00435.html). (This is what you want for a use-case like writing a garbage collector that scans blocks of memory for potential pointers.) Note that C11 doesn't include __attribute__((packed)), and casting unaligned pointers is UB.

This mailing list post explains why they changed C11 _Alignof, but not C++ alignof or the GNU __alignof__ extension.

GNU C's __alignof__ continues to mean how gcc will align that type as a global or local, outside of a struct, i.e. the maximum/recommended alignment. The current version of the i386 SysV ABI doesn't say anything about aligning double to 8B ever; that's purely optional behaviour by current compilers for performance.

Having _Alignof(double) <= _Alignof(struct containing_double) appears to satisfy all the requirements in the C11 standard, even though the preferred alignment for double is 8B. double works when crossing a 4B boundary, it's just slow if it crosses a cache-line or page.
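A small sketch of the two notions of alignment discussed here, assuming GCC or Clang since `__alignof__` is a GNU extension. The struct name `containing_double` comes from the wording above; the function names are hypothetical. As noted for clang, a compiler whose `_Alignof` matches `__alignof__` may report the relation differently in 32-bit mode, so the relation is checked at run time rather than asserted at compile time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct wrapping a single double. */
struct containing_double {
    double d;
};

/* The preferred alignment should never be below the C11 minimum. */
static_assert(__alignof__(double) >= _Alignof(double),
              "preferred alignment is at least the minimum alignment");

/* GNU extension: alignment used for a double outside of a struct. */
size_t preferred_double_alignment(void)
{
    return __alignof__(double);
}

/* C11 operator: with gcc, the minimum alignment a double can have. */
size_t c11_double_alignment(void)
{
    return _Alignof(double);
}

/* Nonzero if _Alignof(double) <= _Alignof(struct containing_double). */
int alignment_relation_holds(void)
{
    return _Alignof(double) <= _Alignof(struct containing_double);
}
```

With `gcc -m32` the first two functions are expected to return 8 and 4 respectively, matching the transcript earlier in this answer; on x86-64 both should return 8.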

(But note that _Atomic long long doesn't "work" if it's not 8B aligned, so clang gives it 8B alignment even inside structs. Current gcc is broken for C11 stdatomic 8B types on the 32-bit SysV ABI, and will hopefully change to match clang.)

In clang, _Alignof seems to be the same as __alignof__. So it disagrees with gcc about the C11 operator (but not about struct layout, except for C11 stdatomic).

See some test cases on the Godbolt compiler explorer with gcc7.2 and clang4.0. Remove the -xc to compile as C++ instead of C.

Somewhat related: gcc7 increased the alignment of max_align_t in 32-bit mode from 8 to 16, for _Float128, but malloc(8) or strdup("abc") might still return only 8B-aligned blocks.

gcc's stddef.h implements max_align_t with a struct with members like

long long __max_align_ll __attribute__((__aligned__(__alignof__(long long))));

to make sure that the resulting struct really does have as large an alignment requirement (_Alignas) as its members. It also has long double and __float128 members.
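A rough sketch of that technique, assuming GCC or Clang because it relies on the GNU `__attribute__` and `__alignof__` extensions. The type `my_max_align_t`, its member names, and the helper function are hypothetical and are not the ones used in gcc's stddef.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for max_align_t: each member is forced to its
 * preferred (outside-of-struct) alignment, so the struct inherits it. */
typedef struct {
    long long ll __attribute__((__aligned__(__alignof__(long long))));
    long double ld __attribute__((__aligned__(__alignof__(long double))));
} my_max_align_t;

/* The struct must be at least as aligned as its most-aligned member. */
static_assert(_Alignof(my_max_align_t) >= __alignof__(long long),
              "struct is at least as aligned as long long");

/* Alignment requirement of the stand-in type. */
size_t my_max_alignment(void)
{
    return _Alignof(my_max_align_t);
}
```

Without the attribute, a `long long` member inside a struct would only get 4-byte alignment on the 32-bit SysV ABI, which is why the members are annotated explicitly.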

Why double in C is 8 bytes aligned?

The reason to align a data value of size 2^N on a boundary of 2^N is to avoid the possibility that the value will be split across a cache line boundary.

The x86-32 processor can fetch a double from any word boundary (8-byte aligned or not) in at most two 32-bit memory reads. But if the value is split across a cache line boundary, the time to fetch the second word may be quite long because a second cache line has to be fetched from memory. This needlessly hurts processor performance. (As a practical matter, current processors don't fetch 32 bits from memory at a time; they tend to fetch much bigger values on much wider buses to enable really high data bandwidths; the actual time to fetch both words, if they are in the same cache line and already cached, may be just 1 clock.)

A free consequence of this alignment scheme is that such values also do not cross page boundaries. This avoids the possibility of a page fault in the middle of a data fetch.
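The cache-line argument can be sketched as simple address arithmetic. The 64-byte line size is an assumption (common on x86 but not guaranteed), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; 64 bytes is common on x86 but not guaranteed. */
#define CACHE_LINE_SIZE 64u

/* The sketch below assumes the usual 8-byte double. */
static_assert(sizeof(double) == 8, "sketch assumes an 8-byte double");

/* True if an object of `size` bytes (size > 0) starting at `addr`
 * touches two different cache lines. */
bool straddles_cache_line(uintptr_t addr, size_t size)
{
    uintptr_t first_line = addr / CACHE_LINE_SIZE;
    uintptr_t last_line = (addr + size - 1) / CACHE_LINE_SIZE;
    return first_line != last_line;
}

/* True if a double placed at some multiple of `align` (align > 0)
 * within one cache line can spill into the next line. */
bool double_can_straddle(size_t align)
{
    for (uintptr_t addr = 0; addr < CACHE_LINE_SIZE; addr += align) {
        if (straddles_cache_line(addr, sizeof(double)))
            return true;
    }
    return false;
}
```

With 8-byte alignment the last possible start offset in a line is 56, so the value ends at byte 63 and never spills over; with 4-byte alignment a double starting at offset 60 covers bytes 60 to 67 and crosses into the next line. Because a page is a whole number of cache lines, the same reasoning covers page boundaries.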

So, you should align doubles on 8 byte boundaries for performance reasons. And the compilers know this and just do it for you.
