Strict Aliasing and Alignment

Best practices for object oriented patterns with strict aliasing and strict alignment in C

You can avoid the compiler warning by casting to void * first:

struct Derived1 *dp = (struct Derived1 *) (void *) bp;

(After the cast to void *, the conversion to struct Derived1 * is automatic in the above declaration, so you could remove the cast.)

The methods of using a pointer-to-a-pointer or a union to reinterpret a pointer are not correct; they violate the aliasing rule, as a struct Derived1 * and a struct Base * are not suitable types for aliasing each other. Do not use those methods.

(Due to C 2018 6.2.6.1 28, which says “… All pointers to structure types shall have the same representation and alignment requirements as each other…,” an argument can be made that reinterpreting one pointer-to-a-structure as another through a union is supported by the C standard. Footnote 49 says “The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.” At best, however, this is a kludge in the C standard and should be avoided when possible.)

Why does using a union type and/or pointer-to-pointer violate strict aliasing rules? In other words what is fundamentally different between what is done in main (taking address of b member) and what is done in print_val() other than the direction of the conversion? Both yield the same situation - two pointers that point to the same memory, which are different struct types - a struct Base* and a struct Derived1*.

It would seem to me that if this were violating strict aliasing rules in any way, the introduction of an intermediate void* cast would not change the fundamental problem.

The strict aliasing violation occurs in aliasing the pointer, not in aliasing the structure.

If you have a struct Derived1 *dp or a struct Base *bp and you use it to access a place in memory where there actually is a struct Derived1 or, respectively, a struct Base, then there is no aliasing violation because you are accessing an object through an lvalue of its type, which is allowed by the aliasing rule.

However, this question suggested aliasing a pointer. In *((struct Derived1 **)&bp);, &bp is the location where there is a struct Base *. This address of a struct Base * is converted to the address of a struct Derived1 **, and then * forms an lvalue of type struct Derived1 *. The expression is then used to access a struct Base * using a type of struct Derived1 *. There is no match for that in the aliasing rule; none of the types it lists for accessing a struct Base * are a struct Derived1 *.

C undefined behavior. Strict aliasing rule, or incorrect alignment?

The code indeed breaks the strict aliasing rule. However, there is not only an aliasing violation, and the crash doesn't happen because of the aliasing violation. It happens because the unsigned short pointer is incorrectly aligned; even the pointer conversion itself is undefined if the result is not suitably aligned.

C11 (draft n1570) Appendix J.2:

1 The behavior is undefined in the following circumstances:

....

  • Conversion between two pointer types produces a result that is incorrectly aligned (6.3.2.3).

With 6.3.2.3p7 saying

[...] If the resulting pointer is not correctly aligned [68] for the referenced type, the behavior is undefined. [...]

unsigned short has alignment requirement of 2 on your implementation (x86-32 and x86-64), which you can test with

_Static_assert(_Alignof(unsigned short) == 2, "alignof(unsigned short) == 2");

However, you're forcing the u16 *key2 to point to an unaligned address:

u16 *key2 = (u16 *) (keyc + 1);  // we've already got undefined behaviour *here*!

There are countless programmers that insist that unaligned access is guaranteed to work in practice on x86-32 and x86-64 everywhere, and there wouldn't be any problems in practice - well, they're all wrong.

Basically what happens is that the compiler notices that

for (size_t i = 0; i < len; ++i)
hash += key2[i];

can be executed more efficiently using the SIMD instructions if suitably aligned. The values are loaded into the SSE registers using MOVDQA, which requires that the argument is aligned to 16 bytes:

When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.

For cases where the pointer is not suitably aligned at start, the compiler will generate code that will sum the first 1-7 unsigned shorts one by one, until the pointer is aligned to 16 bytes.

Of course if you start with a pointer that points to an odd address, not even adding 7 times 2 will land one to an address that is aligned to 16 bytes. Of course the compiler will not even generate code that will detect this case, as "the behaviour is undefined, if conversion between two pointer types produces a result that is incorrectly aligned" - and ignores the situation completely with unpredictable results, which here means that the operand to MOVDQA will not be properly aligned, which will then crash the program.


It can be easily proven that this can happen even without violating any strict aliasing rules. Consider the following program that consists of 2 translation units (if both f and its caller are placed into one translation unit, my GCC is smart enough to notice that we're using a packed structure here, and doesn't generate code with MOVDQA):

translation unit 1:

#include <stdlib.h>
#include <stdint.h>

size_t f(uint16_t *keyc, size_t len)
{
size_t hash = len;
len = len / 2;

for (size_t i = 0; i < len; ++i)
hash += keyc[i];
return hash;
}

translation unit 2

#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <inttypes.h>

size_t f(uint16_t *keyc, size_t len);

struct mystruct {
uint8_t padding;
uint16_t contents[100];
} __attribute__ ((packed));

int main(void)
{
struct mystruct s;
size_t len;

srand(time(NULL));
scanf("%zu", &len);

char *initializer = (char *)s.contents;
for (size_t i = 0; i < len; i++)
initializer[i] = rand();

printf("out %zu\n", f(s.contents, len));
}

Now compile and link them together:

% gcc -O3 unit1.c unit2.c
% ./a.out
25
zsh: segmentation fault (core dumped) ./a.out

Notice that there is no aliasing violation there. The only problem is the unaligned uint16_t *keyc.

With -fsanitize=undefined the following error is produced:

unit1.c:10:21: runtime error: load of misaligned address 0x7ffefc2d54f1 for type 'uint16_t', which requires 2 byte alignment
0x7ffefc2d54f1: note: pointer points here
00 00 00 01 4e 02 c4 e9 dd b9 00 83 d9 1f 35 0e 46 0f 59 85 9b a4 d7 26 95 94 06 15 bb ca b3 c7
^

Memory alignment and strict aliasing for continuous block of raw bytes

First, it is unclear from your question, but I will assume that there is no other code inbetween the individual snippets you are showing.

Snippet 1. has undefined behavior because the pointer get will return cannot actually be pointing to a float object. ::operator new does implicitly create objects and return a pointer to a suitable created object, but that object would have to be a unsigned char object part of an unsigned char array in order to give the pointer arithmetic in reinterpret_cast<unsigned char*>(data) + bshift defined behavior.

However, then the return value of get<float>(sizeof(float)); would also be a pointer to an unsigned char object. Writing through a float glvalue to a unsigned char violates the aliasing rules.

This could be remedied by either using std::launder before returning the pointer from get or better by explicitly creating the object:

template<class T>
T* get(std::size_t bshift)
{
return new(reinterpret_cast<unsigned char*>(data) + bshift) T;
}

Although this will create a new object with indeterminate value each time it is called.

std::launder would be sufficient here without creating a new object since ::operator new can implicitly create an unsigned char array which provides storage for a float object which is also implicitly-created. (Assuming all objects used in this way fit in the storage, are correctly aligned (see below), do not overlap and are implicit-lifetime types):

template<class T>
T* get(std::size_t bshift)
{
return std::launder(reinterpret_cast<T*>(reinterpret_cast<unsigned char*>(data) + bshift));
}

However, 2. and 3. have undefined behavior even with this modification if alignof(float) != 1 (which is very likely to be true). You cannot create and start the lifetime of an object with wrong alignment either implicitly or explicitly. (Although it may technically be possible to create an object with wrong alignment explicitly without starting its lifetime.)

For 4. and 5., assuming the above weren't undefined behavior due to misalignment or out-of-bounds access and given the assumptions in the question, I think these snippets should have defined behavior. Note however that it is extremely unlikely that these requirements are satisfied.

For 6., if you offset everything you need to again take care not to go out-of-bounds of the allocation and not to violate the alignment of any of the involved types. (Including the alignment of POD_struct for 5.)

For 7. the formulation is very vague, so I am not sure what you mean. But you explicitly generally can't interpret memory as a different type than it is. In your examples 4. and 5. you are copying object representations, which is different.


To be clear again: Practically speaking your code has UB due to the alignment violations. This probably extends even to platforms that allow unaligned access, because the compiler may optimize code based on the assumption of pointer alignment. Compilers may offer type annotations to indicate that a pointer may be unaligned. You need to use these or other tools if you want to implement unaligned access in practice.

strict aliasing and memory alignment

Just disable alias-based optimization and call it a day

If your problems are in fact caused by optimizations related to strict aliasing, then -fno-strict-aliasing will solve the problem. Additionally, in that case, you don't need to worry about losing optimization because, by definition, those optimizations are unsafe for your code and you can't use them.

Good point by Praetorian. I recall one developer's hysteria prompted by the introduction of alias analysis in gcc. A certain Linux kernel author wanted to (A) alias things, and (B) still get that optimization. (That's an oversimplification but it seems like -fno-strict-aliasing would solve the problem, not cost much, and they all must have had other fish to fry.)

Does casting to a char pointer to increment a pointer by a certain amount and then accessing as a different type violate strict aliasing?

Does casting to a char pointer to increment a pointer by a certain amount and then accessing as a different type violate strict aliasing?

Not inherently so.

Normally, accessing an int * casted from a char * violates strict aliasing rules

Not necessarily. Strict aliasing is about the (effective) type of the pointed-to object. It is quite possible for the object to which a char * points to be an int, or compatible with int, or to be assigned effective type int as a consequence of the (write) access. In such cases, casting to int * and dereferencing the result is perfectly valid.

There are, yes, lots of cases in which casting a char * to an int * and then dereferencing the result would constitute a strict-aliasing violation, but it is not specifically because of the involvement of, or the casting to or from, type char *.

The above applies regardless of how the particular char * value was obtained, so in your particular example case, too. If the result of your pointer computation is a valid pointer, and the object to which it points is genuinely an (effective) int or is compatible with int in one of the specific ways documented in section 6.5 of the language spec, then reading the pointed-to value via the pointer is fine. Otherwise, it is a strict-aliasing violation.

Attempting to dereference a pointer value that is not correctly aligned for its type is a potential issue in general with pointer manipulation, but the strict aliasing rule is stronger than and effectively inclusive of pointer alignment considerations. If you have an access that satisfies the strict aliasing rule then the pointer involved must be satisfactorily aligned for its type. The reverse is not necessarily true.


Do note, however, that although on many platforms, your align16() will indeed attempt to perform a read of a 16-byte-aligned object, the C language specifications do not require that to be so. Pointer-to-integer and integer-to-pointer conversions are explicitly allowed, but their results are implementation defined. It is not necessarily the case that value on the integer side of such a conversion reports on or controls the alignment of the pointer on the other side.

How does the standard deal with such case, accessing a pointer modified while casted to a uintptr_t?

See above. Pointer-to-integer and integer-to-pointer conversions have implementation-defined effect as far as the language spec is concerned. However, on most implementations you're likely to meet, your two versions of align16() will have equivalent behavior.

Strict aliasing, and alignment questions when partitioning memory

Yes, the program is UB. When you do this:

for(std::size_t i = 0; i < array_count; ++i) {
p32[i] = i;
p16[i] = i * 2;
}

There are no uint32_t or uint16_t objects that p32 or p16 point to. calloc just gives you bytes, not objects. You can't just reinterpret_cast objects into existence. On top of that, indexing is only defined for arrays, and p32 does not point to an array.

To make it well defined, you'd have to create an array object. However, placement-new for arrays is broken, so you're left with manually initializing a bunch of uint32_ts like:

auto p32 = reinterpret_cast<uint32_t*>(pv);
for (int i = 0; i < array_count; ++i) {
new (p32+i) uint32_t; // NB: this does no initialization, but it does satisfy
// [intro.object] in actually creating an object
}

This would then run into a separate issue: CWG 2182. Now we have array_count uint32_ts, but we don't have a uint32_t[array_count] so indexing is still UB. Basically, there's just no way in purely-by-the-letter-of-the-standard C++ to write this. See also my similar question on the topic.


That said, the amount of code that does this in the wild is tremendous and every implementation will allow you to do it.

What is the strict aliasing rule?

A typical situation where you encounter strict aliasing problems is when overlaying a struct (like a device/network msg) onto a buffer of the word size of your system (like a pointer to uint32_ts or uint16_ts). When you overlay a struct onto such a buffer, or a buffer onto such a struct through pointer casting you can easily violate strict aliasing rules.

So in this kind of setup, if I want to send a message to something I'd have to have two incompatible pointers pointing to the same chunk of memory. I might then naively code something like this:

typedef struct Msg
{
unsigned int a;
unsigned int b;
} Msg;

void SendWord(uint32_t);

int main(void)
{
// Get a 32-bit buffer from the system
uint32_t* buff = malloc(sizeof(Msg));

// Alias that buffer through message
Msg* msg = (Msg*)(buff);

// Send a bunch of messages
for (int i = 0; i < 10; ++i)
{
msg->a = i;
msg->b = i+1;
SendWord(buff[0]);
SendWord(buff[1]);
}
}

The strict aliasing rule makes this setup illegal: dereferencing a pointer that aliases an object that is not of a compatible type or one of the other types allowed by C 2011 6.5 paragraph 71 is undefined behavior. Unfortunately, you can still code this way, maybe get some warnings, have it compile fine, only to have weird unexpected behavior when you run the code.

(GCC appears somewhat inconsistent in its ability to give aliasing warnings, sometimes giving us a friendly warning and sometimes not.)

To see why this behavior is undefined, we have to think about what the strict aliasing rule buys the compiler. Basically, with this rule, it doesn't have to think about inserting instructions to refresh the contents of buff every run of the loop. Instead, when optimizing, with some annoyingly unenforced assumptions about aliasing, it can omit those instructions, load buff[0] and buff[1] into CPU registers once before the loop is run, and speed up the body of the loop. Before strict aliasing was introduced, the compiler had to live in a state of paranoia that the contents of buff could change by any preceding memory stores. So to get an extra performance edge, and assuming most people don't type-pun pointers, the strict aliasing rule was introduced.

Keep in mind, if you think the example is contrived, this might even happen if you're passing a buffer to another function doing the sending for you, if instead you have.

void SendMessage(uint32_t* buff, size_t size32)
{
for (int i = 0; i < size32; ++i)
{
SendWord(buff[i]);
}
}

And rewrote our earlier loop to take advantage of this convenient function

for (int i = 0; i < 10; ++i)
{
msg->a = i;
msg->b = i+1;
SendMessage(buff, 2);
}

The compiler may or may not be able to or smart enough to try to inline SendMessage and it may or may not decide to load or not load buff again. If SendMessage is part of another API that's compiled separately, it probably has instructions to load buff's contents. Then again, maybe you're in C++ and this is some templated header only implementation that the compiler thinks it can inline. Or maybe it's just something you wrote in your .c file for your own convenience. Anyway undefined behavior might still ensue. Even when we know some of what's happening under the hood, it's still a violation of the rule so no well defined behavior is guaranteed. So just by wrapping in a function that takes our word delimited buffer doesn't necessarily help.

So how do I get around this?

  • Use a union. Most compilers support this without complaining about strict aliasing. This is allowed in C99 and explicitly allowed in C11.

      union {
    Msg msg;
    unsigned int asBuffer[sizeof(Msg)/sizeof(unsigned int)];
    };
  • You can disable strict aliasing in your compiler (f[no-]strict-aliasing in gcc))

  • You can use char* for aliasing instead of your system's word. The rules allow an exception for char* (including signed char and unsigned char). It's always assumed that char* aliases other types. However this won't work the other way: there's no assumption that your struct aliases a buffer of chars.

Beginner beware

This is only one potential minefield when overlaying two types onto each other. You should also learn about endianness, word alignment, and how to deal with alignment issues through packing structs correctly.

Footnote

1 The types that C 2011 6.5 7 allows an lvalue to access are:

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.


Related Topics



Leave a reply



Submit