Struct Member Alignment - How to Assume No Padding

struct member alignment - is it possible to assume no padding

On systems that actually offer those types, it is highly likely to work. On, say, a 36-bit system those types would not be available in the first place.

GCC provides an attribute

__attribute__ ((packed))

With similar effect.

Structure padding and packing

Padding aligns structure members to "natural" address boundaries - say, int members would have offsets, which are mod(4) == 0 on 32-bit platform. Padding is on by default. It inserts the following "gaps" into your first structure:

struct mystruct_A {
    char a;
    char gap_0[3]; /* inserted by compiler: for alignment of b */
    int b;
    char c;
    char gap_1[3]; /* -"-: for alignment of the whole struct in an array */
} x;

Packing, on the other hand prevents compiler from doing padding - this has to be explicitly requested - under GCC it's __attribute__((__packed__)), so the following:

struct __attribute__((__packed__)) mystruct_A {
    char a;
    int b;
    char c;
};

would produce structure of size 6 on a 32-bit architecture.

A note though - unaligned memory access is slower on architectures that allow it (like x86 and amd64), and is explicitly prohibited on strict alignment architectures like SPARC.

How do I organize members in a struct to waste the least space on alignment?

(Don't apply these rules without thinking. See ESR's point about cache locality for members you use together. And in multi-threaded programs, beware false sharing of members written by different threads. Generally you don't want per-thread data in a single struct at all for this reason, unless you're doing it to control the separation with a large alignas(128). This applies to atomic and non-atomic vars; what matters is threads writing to cache lines regardless of how they do it.)

Rule of thumb: largest to smallest alignof(). There's nothing you can do that's perfect everywhere, but by far the most common case these days is a sane "normal" C++ implementation for a normal 32 or 64-bit CPU. All primitive types have power-of-2 sizes.

Most types have alignof(T) = sizeof(T), or alignof(T) capped at the register width of the implementation. So larger types are usually more-aligned than smaller types.

Struct-packing rules in most ABIs give struct members their absolute alignof(T) alignment relative to the start of the struct, and the struct itself inherits the largest alignof() of any of its members.

Put always-64-bit members first (like double, long long, and int64_t). ISO C++ of course doesn't fix these types at 64 bits / 8 bytes, but in practice on all CPUs you care about they are. People porting your code to exotic CPUs can tweak struct layouts to optimize if necessary.
then pointers and pointer-width integers: size_t, intptr_t, and ptrdiff_t (which may be 32 or 64-bit). These are all the same width on normal modern C++ implementations for CPUs with a flat memory model.
Consider putting linked-list and tree left/right pointers first if you care about x86 and Intel CPUs. Pointer-chasing through nodes in a tree or linked list has penalties when the struct start address is in a different 4k page than the member you're accessing. Putting them first guarantees that can't be the case.
then long (which is sometimes 32-bit even when pointers are 64-bit, in LLP64 ABIs like Windows x64). But it's guaranteed at least as wide as int.
then 32-bit int32_t, int, float, enum. (Optionally separate int32_t and float ahead of int if you care about possible 8 / 16-bit systems that still pad those types to 32-bit, or do better with them naturally aligned. Most such systems don't have wider loads (FPU or SIMD) so wider types have to be handled as multiple separate chunks all the time anyway).
ISO C++ allows int to be as narrow as 16 bits, or arbitrarily wide, but in practice it's a 32-bit type even on 64-bit CPUs. ABI designers found that programs designed to work with 32-bit int just waste memory (and cache footprint) if int was wider. Don't make assumptions that would cause correctness problems, but for "portable performance" you just have to be right in the normal case.
People tuning your code for exotic platforms can tweak if necessary. If a certain struct layout is perf-critical, perhaps comment on your assumptions and reasoning in the header.
then short / int16_t
then char / int8_t / bool
(for multiple bool flags, especially if read-mostly or if they're all modified together, consider packing them with 1-bit bitfields.)

(For unsigned integer types, find the corresponding signed type in my list.)

A multiple-of-8 byte array of narrower types can go earlier if you want it to. But if you don't know the exact sizes of types, you can't guarantee that int i + char buf[4] will fill an 8-byte aligned slot between two doubles. But it's not a bad assumption, so I'd do it anyway if there was some reason (like spatial locality of members accessed together) for putting them together instead of at the end.

Exotic types: x86-64 System V has alignof(long double) = 16, but i386 System V has only alignof(long double) = 4, sizeof(long double) = 12. It's the x87 80-bit type, which is actually 10 bytes but padded to 12 or 16 so it's a multiple of its alignof, making arrays possible without violating the alignment guarantee.

And in general it gets trickier when your struct members themselves are aggregates (struct or union) with a sizeof(x) != alignof(x).

Another twist is that in some ABIs (e.g. 32-bit Windows if I recall correctly) struct members are aligned to their size (up to 8 bytes) relative to the start of the struct, even though alignof(T) is still only 4 for double and int64_t.

This is to optimize for the common case of separate allocation of 8-byte aligned memory for a single struct, without giving an alignment guarantee. i386 System V also has the same alignof(T) = 4 for most primitive types (but malloc still gives you 8-byte aligned memory because alignof(maxalign_t) = 8). But anyway, i386 System V doesn't have that struct-packing rule, so (if you don't arrange your struct from largest to smallest) you can end up with 8-byte members under-aligned relative to the start of the struct.

Most CPUs have addressing modes that, given a pointer in a register, allow access to any byte offset. The max offset is usually very large, but on x86 it saves code size if the byte offset fits in a signed byte ([-128 .. +127]). So if you have a large array of any kind, prefer putting it later in the struct after the frequently used members. Even if this costs a bit of padding.

Your compiler will pretty much always make code that has the struct address in a register, not some address in the middle of the struct to take advantage of short negative displacements.

Eric S. Raymond wrote an article The Lost Art of Structure Packing. Specifically the section on Structure reordering is basically an answer to this question.

He also makes another important point:

9. Readability and cache locality
While reordering by size is the simplest way to eliminate slop, it’s not necessarily the right thing. There are two more issues: readability and cache locality.

In a large struct that can easily be split across a cache-line boundary, it makes sense to put 2 things nearby if they're always used together. Or even contiguous to allow load/store coalescing, e.g. copying 8 or 16 bytes with one (unaliged) integer or SIMD load/store instead of separately loading smaller members.

Cache lines are typically 32 or 64 bytes on modern CPUs. (On modern x86, always 64 bytes. And Sandybridge-family has an adjacent-line spatial prefetcher in L2 cache that tries to complete 128-byte pairs of lines, separate from the main L2 streamer HW prefetch pattern detector and L1d prefetching).

Fun fact: Rust allows the compiler to reorder structs for better packing, or other reasons. IDK if any compilers actually do that, though. Probably only possible with link-time whole-program optimization if you want the choice to be based on how the struct is actually used. Otherwise separately-compiled parts of the program couldn't agree on a layout.

(@alexis posted a link-only answer linking to ESR's article, so thanks for that starting point.)

Skip/avoid alignment padding bytes when calculating struct checksum

Is there a generic way to skip/avoid alignment padding bytes when calculating the checksum of a C struct?

There is no such mechanism on which a strictly conforming program can rely. This follows from

the fact that C implementations are permitted to lay out structures with arbitrary padding following any member or members, for any reason or none, and
the fact that

When a value is stored in an object of structure or union type,
including in a member object, the bytes of the object representation
that correspond to any padding bytes take unspecified values.
(C2011, 6.2.6.1/6)

The former means that the standard provides no conforming way to guarantee that a structure layout contains no padding, and the latter means that in principle, there is nothing you can do to control the values of the padding bytes -- even if you initially zero-fill a structure instance, any padding takes indeterminate values as soon as you assign to that object or to any of its members.

In practice, it is likely that any of the approaches you mention in the question will do the job where the C implementation and the nature of the data permit. But only (2), computing the checksum member by member, can be used by a strictly-conforming program, and that one is not "generic" as I take you to mean that term. This is what I would choose. If you have many distinct structures that require checksumming, then it might be worthwhile to deploy a code generator or macro magic to help you with maintaining things.

On the other hand, your most reliable way to provide for generic checksumming is to exercise an implementation-specific extension that enables you to avoid structures containing any padding (your (1)). Note that this will tie you to a specific C implementation or implementations that implement such an extension compatibly, that it may not work at all on some systems (such as those where misaligned access is a hard error), and that it may reduce performance on other systems.

Your (4) is an alternative way to avoid padding, but it would be a portability and maintenance nightmare. Nevertheless, it could provide for generic checksumming, in the sense that the checksum algorithm wouldn't need to pay attention to individual members. But note also that this also places a requirement for initialization behavior analogous to (3). That would come cheaper, but it would not be altogether automatic.

In practice, C implementations do not wantonly modify padding bytes, but they don't necessarily go out of their way to preserve them, either. In particular, even if you zero-filled rigorously, per your (3), padding is not guaranteed to be copied by whole-structure assignment or when you pass or return a structure by value. If you want to do any of those things then you need to take measures on the receiving side to ensure zero-filling, and requires member-by-member attention.

Why is the compiler adding padding to a struct that's already 4-byte aligned?

From the GCC documentation:

Note that the alignment of any given struct or union type is required by the ISO C standard to be at least a perfect multiple of the lowest common multiple of the alignments of all of the members of the struct or union in question.

In other words, because your struct contains members of type uint64_t (aligned to 8 bytes), the struct itself must be aligned to an 8-byte boundary.

Assumption of structure padding in C

However, this does build on the assumption that the first member of struct will be stored immediately after word boundary. Is it always so?

Yes.

When a structure type is defined, the alignment requirement of the structure will be at least the strictest alignment requirement of its members. For example, if a structure has members with alignment requires of 1 byte, 8 bytes, and 4 bytes, the alignment requirement of the structure will be 8 bytes. The compiler will figure this out automatically when the structure is defined. (Technically, the C standard might permit the compiler to give the structure an even greater alignment—I do not see any rule against it—but that is not done in practice.)

Then, whenever the C implementation reserves memory for a structure object (as when you define an object of that type, such as struct foo x), it will ensure the memory is aligned as required for that structure. That results in the alignment requirements of the members being satisfied too. When a program allocates memory with malloc, the returned memory is always aligned as necessary for any object of the requested size.

(If you do any “funny stuff” in a program to set your own memory locations for objects, such as placing one in the middle of memory allocated with malloc, you are responsible for getting the alignment right.)

Further, the structure will be padded at the end if necessary so that its total size is a multiple of that alignment requirement. Then, in an array of those structures, each successive element of the array will begin at a properly aligned location too.

Actual sizeof padded struct, (sum of member sizes, without padding for alignment)

You can #include the same header twice. Once with packed enabled and once without. And obviously the macro would also change the name of the packed struct to something different to the real struct name.

Here's an example. The header file, say test.h, is shown below. It shows two struct which have different unpacked sizes but same packed sizes.

#ifdef ENABLE_PACKED
#define PACKED(x) __attribute__ ((__packed__)) x##_packed
#else
#define PACKED(x) x
#endif

struct PACKED(my_struct1) {
    char c1;
    int i1;
    char c2;
    int i2;
};

struct PACKED(my_struct2) {
    char c1;
    char c2;
    int i1;
    int i2;
};

#ifdef ENABLE_PACKED
_Static_assert(sizeof(struct my_struct1_packed) ==
               sizeof(struct my_struct2_packed), "Error");
#endif

#undef PACKED

Note that you really only need to define ENABLE_PACKED and #include test.h in one file for the static assert to be tested. So you can even just create a dummy.c file which includes test.h twice and have your build compile dummy.c every time but not actually use it in any real release object. That way, all your real .c files do not even need to know about this at all and can just include all header files normally.

Why isn't sizeof for a struct equal to the sum of sizeof of each member?

This is because of padding added to satisfy alignment constraints. Data structure alignment impacts both performance and correctness of programs:

Mis-aligned access might be a hard error (often SIGBUS).
Mis-aligned access might be a soft error.
- Either corrected in hardware, for a modest performance-degradation.
- Or corrected by emulation in software, for a severe performance-degradation.
- In addition, atomicity and other concurrency-guarantees might be broken, leading to subtle errors.

Here's an example using typical settings for an x86 processor (all used 32 and 64 bit modes):

struct X
{
    short s; /* 2 bytes */
             /* 2 padding bytes */
    int   i; /* 4 bytes */
    char  c; /* 1 byte */
             /* 3 padding bytes */
};

struct Y
{
    int   i; /* 4 bytes */
    char  c; /* 1 byte */
             /* 1 padding byte */
    short s; /* 2 bytes */
};

struct Z
{
    int   i; /* 4 bytes */
    short s; /* 2 bytes */
    char  c; /* 1 byte */
             /* 1 padding byte */
};

const int sizeX = sizeof(struct X); /* = 12 */
const int sizeY = sizeof(struct Y); /* = 8 */
const int sizeZ = sizeof(struct Z); /* = 8 */

One can minimize the size of structures by sorting members by alignment (sorting by size suffices for that in basic types) (like structure Z in the example above).

IMPORTANT NOTE: Both the C and C++ standards state that structure alignment is implementation-defined. Therefore each compiler may choose to align data differently, resulting in different and incompatible data layouts. For this reason, when dealing with libraries that will be used by different compilers, it is important to understand how the compilers align data. Some compilers have command-line settings and/or special #pragma statements to change the structure alignment settings.