What Is the Strict Aliasing Rule

What is the strict aliasing rule?

A typical situation where you encounter strict aliasing problems is when overlaying a struct (like a device/network msg) onto a buffer of the word size of your system (like a pointer to uint32_ts or uint16_ts). When you overlay a struct onto such a buffer, or a buffer onto such a struct through pointer casting you can easily violate strict aliasing rules.

So in this kind of setup, if I want to send a message to something I'd have to have two incompatible pointers pointing to the same chunk of memory. I might then naively code something like this:

typedef struct Msg
{
unsigned int a;
unsigned int b;
} Msg;

void SendWord(uint32_t);

int main(void)
{
// Get a 32-bit buffer from the system
uint32_t* buff = malloc(sizeof(Msg));

// Alias that buffer through message
Msg* msg = (Msg*)(buff);

// Send a bunch of messages
for (int i = 0; i < 10; ++i)
{
msg->a = i;
msg->b = i+1;
SendWord(buff[0]);
SendWord(buff[1]);
}
}

The strict aliasing rule makes this setup illegal: dereferencing a pointer that aliases an object that is not of a compatible type or one of the other types allowed by C 2011 6.5 paragraph 71 is undefined behavior. Unfortunately, you can still code this way, maybe get some warnings, have it compile fine, only to have weird unexpected behavior when you run the code.

(GCC appears somewhat inconsistent in its ability to give aliasing warnings, sometimes giving us a friendly warning and sometimes not.)

To see why this behavior is undefined, we have to think about what the strict aliasing rule buys the compiler. Basically, with this rule, it doesn't have to think about inserting instructions to refresh the contents of buff every run of the loop. Instead, when optimizing, with some annoyingly unenforced assumptions about aliasing, it can omit those instructions, load buff[0] and buff[1] into CPU registers once before the loop is run, and speed up the body of the loop. Before strict aliasing was introduced, the compiler had to live in a state of paranoia that the contents of buff could change by any preceding memory stores. So to get an extra performance edge, and assuming most people don't type-pun pointers, the strict aliasing rule was introduced.

Keep in mind, if you think the example is contrived, this might even happen if you're passing a buffer to another function doing the sending for you, if instead you have.

void SendMessage(uint32_t* buff, size_t size32)
{
for (int i = 0; i < size32; ++i)
{
SendWord(buff[i]);
}
}

And rewrote our earlier loop to take advantage of this convenient function

for (int i = 0; i < 10; ++i)
{
msg->a = i;
msg->b = i+1;
SendMessage(buff, 2);
}

The compiler may or may not be able to or smart enough to try to inline SendMessage and it may or may not decide to load or not load buff again. If SendMessage is part of another API that's compiled separately, it probably has instructions to load buff's contents. Then again, maybe you're in C++ and this is some templated header only implementation that the compiler thinks it can inline. Or maybe it's just something you wrote in your .c file for your own convenience. Anyway undefined behavior might still ensue. Even when we know some of what's happening under the hood, it's still a violation of the rule so no well defined behavior is guaranteed. So just by wrapping in a function that takes our word delimited buffer doesn't necessarily help.

So how do I get around this?

  • Use a union. Most compilers support this without complaining about strict aliasing. This is allowed in C99 and explicitly allowed in C11.

      union {
    Msg msg;
    unsigned int asBuffer[sizeof(Msg)/sizeof(unsigned int)];
    };
  • You can disable strict aliasing in your compiler (f[no-]strict-aliasing in gcc))

  • You can use char* for aliasing instead of your system's word. The rules allow an exception for char* (including signed char and unsigned char). It's always assumed that char* aliases other types. However this won't work the other way: there's no assumption that your struct aliases a buffer of chars.

Beginner beware

This is only one potential minefield when overlaying two types onto each other. You should also learn about endianness, word alignment, and how to deal with alignment issues through packing structs correctly.

Footnote

1 The types that C 2011 6.5 7 allows an lvalue to access are:

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

std::launder and strict aliasing rule

The strict aliasing rule is a restriction on the type of the glvalue actually used to access an object. All that matters for the purpose of that rule are a) the actual type of the object, and b) the type of the glvalue used for the access.

The intermediate casts the pointer travels through are irrelevant, as long as they preserve the pointer value. (This goes both ways; no amount of clever casts - or laundering, for that matter - will cure a strict aliasing violation.)

f is valid as long as ptr actually points to an object of type int, assuming that it accesses that object via int_ptr without further casting.

example_1 is valid as written; the reinterpret_casts do not change the pointer value.

example_2 is invalid because it gives f a pointer that doesn't actually point to an int object (it points to the out-of-lifetime first element of the storage array). See Is there a (semantic) difference between the return value of placement new and the casted value of its operand?

What is the rationale behind the strict aliasing rule?

Since, in this example, all the code is visible to a compiler, a compiler can, hypothetically, determine what is requested and generate the desired assembly code. However, demonstration of one situation in which a strict aliasing rule is not theoretically needed does nothing to prove there are not other situations where it is needed.

Consider if the code instead contains:

foo(&val, ptr)

where the declaration of foo is void foo(uint64_t *a, uint32_t *b);. Then, inside foo, which may be in another translation unit, the compiler would have no way of knowing that a and b point to (parts of) the same object.

Then there are two choices: One, the language may permit aliasing, in which case the compiler, while translating foo, cannot make optimizations relying on the fact that *a and *b are different. For example, whenever something is written to *b, the compiler must generate assembly code to reload *a, since it may have changed. Optimizations such as keeping a copy of *a in registers while working with it would not be allowed.

The second choice, two, is to prohibit aliasing (specifically, not to define the behavior if a program does it). In this case, the compiler can make optimizations relying on the fact that *a and *b are different.

The C committee chose option two because it offers better performance while not unduly restricting programmers.

Strict aliasing rule

Yeah, it's invalid, but not because you're converting a char* to an A*: it's because you are not obtaining a A* that actually points to an A* and, as you've identified, none of the type aliasing options fit.

You'd need something like this:

#include <new>
#include <iostream>

struct A
{
int t;
};

char *buf = new char[sizeof(A)];

A* ptr = new (buf) A;
ptr->t = 1;

// Also valid, because points to an actual constructed A!
A *ptr2 = reinterpret_cast<A*>(buf);
std::cout << ptr2->t;

Now type aliasing doesn't come into it at all (though keep reading because there's more to do!).

  • (live demo with -Wstrict-aliasing=2)

In reality, this is not enough. We must also consider alignment. Though the above code may appear to work, to be fully safe and whatnot you will need to placement-new into a properly-aligned region of storage, rather than just a casual block of chars.

The standard library (since C++11) gives us std::aligned_storage to do this:

using Storage = std::aligned_storage<sizeof(A), alignof(A)>::type;
auto* buf = new Storage;

Or, if you don't need to dynamically allocate it, just:

Storage data;

Then, do your placement-new:

new (buf) A();
// or: new(&data) A();

And to use it:

auto ptr = reinterpret_cast<A*>(buf);
// or: auto ptr = reinterpret_cast<A*>(&data);

All in it looks like this:

#include <iostream>
#include <new>
#include <type_traits>

struct A
{
int t;
};

int main()
{
using Storage = std::aligned_storage<sizeof(A), alignof(A)>::type;

auto* buf = new Storage;
A* ptr = new(buf) A();

ptr->t = 1;

// Also valid, because points to an actual constructed A!
A* ptr2 = reinterpret_cast<A*>(buf);
std::cout << ptr2->t;
}

(live demo)

Even then, since C++17 this is somewhat more complicated; see the relevant cppreference pages for more information and pay attention to std::launder.

Of course, this whole thing appears contrived because you only want one A and therefore don't need array form; in fact, you'd just create a bog-standard A in the first place. But, assuming buf is actually larger in reality and you're creating an allocator or something similar, this makes some sense.

C++'s Strict Aliasing Rule - Is the 'char' aliasing exemption a 2-way street?

The aliasing rule means that the language only promises your pointer dereferences to be valid (i.e. not trigger undefined behaviour) if:

  • You access an object through a pointer of a compatible class: either its actual class or one of its superclasses, properly cast. This means that if B is a superclass of D and you have D* d pointing to a valid D, accessing the pointer returned by static_cast<B*>(d) is OK, but accessing that returned by reinterpret_cast<B*>(d) is not. The latter may have failed to account for the layout of the B sub-object inside D.
  • You access it through a pointer to char. Since char is byte-sized and byte-aligned, there is no way you could not be able to read data from a char* while being able to read it from a D*.

That said, other rules in the standard (in particular those about array layout and POD types) can be read as ensuring that you can use pointers and reinterpret_cast<T*> to alias two-way between POD types and char arrays if you make sure to have a char array of the apropriate size and alignment.

In other words, this is legal:

int* ia = new int[3];
char* pc = reinterpret_cast<char*>(ia);
// Possibly in some other function
int* pi = reinterpret_cast<int*>(pc);

While this may invoke undefined behaviour:

char* some_buffer; size_t offset; // Possibly passed in as an argument
int* pi = reinterpret_cast<int*>(some_buffer + offset);
pi[2] = -5;

Even if we can ensure that the buffer is big enough to contain three ints, the alignment might not be right. As with all instances of undefined behaviour, the compiler may do absolutely anything. Three common ocurrences could be:

  • The code might Just Work (TM) because in your platform the default alignment of all memory allocations is the same as that of int.
  • The pointer cast might round the address to the alignment of int (something like pi = pc & -4), potentially making you read/write to the wrong memory.
  • The pointer dereference itself may fail in some way: the CPU could reject misaligned accesses, making your application crash.

Since you always want to ward off UB like the devil itself, you need a char array with the correct size and alignment. The easiest way to get that is simply to start with an array of the "right" type (int in this case), then fill it through a char pointer, which would be allowed since int is a POD type.

Addendum: after using placement new, you will be able to call any function on the object. If the construction is correct and does not invoke UB due to the above, then you have successfully created an object at the desired place, so any calls are OK, even if the object was non-POD (e.g. because it had virtual functions). After all, any allocator class will likely use placement new to create the objects in the storage that they obtain. Note that this only necessarily true if you use placement new; other usages of type punning (e.g. naïve serialization with fread/fwrite) may result in an object that is incomplete or incorrect because some values in the object need to be treated specially to maintain class invariants.

Is this strict aliasing violation? Can any type pointer alias a char pointer?

Strict aliasing means that to dereference a T* ptr, there must be a T object at that address, alive obviously. Effectively this means you cannot naively bit-cast between two incompatible types and also that a compiler can assume that no two pointers of incompatible types point to the same location.

The exception is unsigned char , char and std::byte, meaning you can reinterpret cast any object pointer to a pointer of these 3 types and dereference it.

(T*)ptr; is valid because at ptr there exists a T object. That is all that is required, it does not matter how you got that pointer*, through how many casts it went. There are some more requirements when T has constant members but that has to do more with placement new and object resurrection - see this answer if you are interested.

*It does matter even in case of no const members, probably, not sure, relevant question . @eerorika 's answer is more correct to suggest std::launder or assigning from the placement new expression.

For the record, a void* can alias any other type pointer, and any type pointer can alias a void*.

That is not true, void is not one of the three allowed types. But I assume you are just misinterpreting the word "alias" - strict aliasing only applies when a pointer is dereferenced, you are of course free to have as many pointers pointing to wherever you want as long as you do not dereference them. Since void* cannot be dereferenced, it's a moo point.

Addresing your second example

char* buffer = (char*)malloc(16); //OK

// Assigning pointers is always defined the rules only say when
// it is safe to dereference such pointer.
// You are missing a cast here, pointer cannot be casted implicitly in C++, C produces a warning only.
float* pFloat = buffer;
// -> float* pFloat =reinterpret_cast<float*>(buffer);

// NOT OK, there is no float at `buffer` - violates strict aliasing.
*pFloat = 6;
// Now there is a float
new (pFloat) float;
// Yes, now it is OK.
*pFloat = 7;

Strict Aliasing Rules and Placement New

What is the proper way to comply with the strict aliasing rule when using placement new?

The correct way is to use std::aligned_storage. That code sample doesn't guarantee correct storage alignment for Fred, so it should not be used.

The correct way to do this is:

#include <new>         // For placement new
#include <type_traits> // For std::aligned_storage

struct Fred {
// ...
};

void someCode() {
std::aligned_storage<sizeof(Fred), alignof(Fred)>::type memory;
// Alternatively, you can remove the "alignof(Fred)" template parameter if you
// are okay with the default alignment, but note that doing so may result in
// greater alignment than necessary and end up wasting a few bytes.
Fred* f = new(&memory) Fred();
}

Wouldn't the above code violate C++'s strict aliasing rule since place and memory are different types, yet reference the same memory location?

Now, as for your concerns about aliasing between f, place, and memory in the original code, note that there isn't any aliasing violation. The strict aliasing rule means that you can't "dereference a pointer that aliases an incompatible type". Since you can't dereference a void* (and it's legal to convert a pointer to/from a void*), there's no risk of place causing a strict aliasing violation.

Strict Aliasing Rule and Type Aliasing in C++

No, it's not legal and you have Undefined Behavior:

8.2.1 Value category [basic.lval]

11 If a program attempts to access the stored value of an object
through a glvalue of other than one of the following types the
behavior is undefined: 63

(11.1) — the dynamic type of the object,

(11.2) — a cv-qualified version of the dynamic type of the object,

(11.3) — a type similar (as defined in 7.5) to the dynamic type of the
object,

(11.4) — a type that is the signed or unsigned type corresponding to
the dynamic type of the object,

(11.5) — a type that is the signed or unsigned type corresponding to a
cv-qualified version of the dynamic type of the object,

(11.6) — an aggregate or union type that includes one of the
aforementioned types among its elements or nonstatic data members
(including, recursively, an element or non-static data member of a
subaggregate or contained union),

(11.7) — a type that is a (possibly cv-qualified) base class type of
the dynamic type of the object,

(11.8) — a char, unsigned char, or std::byte type



63) The intent of this list is to specify those circumstances in which
an object may or may not be aliased.



Related Topics



Leave a reply



Submit