Does Accessing the First Field of a Struct via a C Cast Violate Strict Aliasing

Does accessing the first field of a struct via a C cast violate strict aliasing?

First, it is legal to cast in C. §6.7.2.1/13:

Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa.
There may be unnamed padding within a structure object, but not at its
beginning.

The aliasing rule reads as follows (§6.5/7):

An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:

a type compatible with the effective type of the object,

a qualified version of a type compatible with the effective type of the object,

a type that is the signed or unsigned type corresponding to the effective type of the
object,

a type that is the signed or unsigned type corresponding to a qualified version of the
effective type of the object,

an aggregate or union type that includes one of the aforementioned types among its
members (including, recursively, a member of a subaggregate or contained union), or

a character type.

Here you would be accessing it via pointers of a "type compatible with the effective type of the object" and "an aggregate or union type that includes one of the aforementioned types among its members", so no problem with aliasing either. So in C, it is indeed perfectly legal to access the first member of a structure by casting the pointer to the structure to the type of the member in question.

In C++, however, you'll often find vtables and other things at the start of a C++ object. In your specific case, however, your structure is of standard layout, and so this is explicitly allowed (§9.2/20 in n3290, thanks Luc Danton! - C++03 apparently has a similar rule, expressed in terms of POD objects).

How does struct inheritance not violate the strict aliasing rule?

A. It seems to me this technique breaks the strict aliasing rule. Am I wrong, and if so, why?

Yes, you are wrong. I'll consider two cases:

Case 1: The `C` is fully initialized

That would be this, for example:

C *c = malloc(sizeof(*c));
*c = (C){0};  // or equivalently, "*c = (C){{{0}}}" to satisfy overzealous compilers

In that case, all the bytes of the representation of a C are set, and the effective type of the object comprising those bytes is C. This comes from paragraph 6.5/6 of the standard:

If a value is stored into an object having no declared type through an
lvalue having a type that is not a character type, then the type of
the lvalue becomes the effective type of the object for that access
and for subsequent accesses that do not modify the stored value.

But structure and array types are aggregate types, which means that objects of such types contain other objects within them. In particular, each C contains a B identified as its member base. Because the allocated object is, at this point, effectively a C, it contains a sub-object that is effectively a B. One syntax for an lvalue referring to that B is c->base. The type of that expression is B, so it is consistent with the strict-aliasing rule to use it to access the B to which it refers. That has to be ok, else structures (and arrays) would not work at all, whether dynamically allocated or not.^*

But, as discussed in my answer to your previous question, (B *)c is guaranteed to be equal (in value and type) to &c->base. Thus *(B *)c is another lvalue referring to the B that is the first member of *c. That the syntax of that expression is different from that of the previous lvalue we considered is of no account. It is an lvalue of type B, associated with an object of type B, so using it to access the object to which it refers is one of the cases allowed by the SAR.

None of this is any different from the statically and automatically allocated cases.

Case 2: The `C` is not fully initialized

That could be something like this:

C *c = malloc(sizeof(*c));
*(B *)c = (B){0};

We have thereby assigned to the initial B-sized portion of the allocated object via an lvalue of type B, so the effective type of that initial portion is B. The allocated space does not at this point contain an object of (effective) type C. We can access the B and its members, read or write, via any acceptably-typed lvalues referring to them, as discussed above. But we have a strict aliasing violation if we

attempt to read *c as a whole (e.g. C c2 = *c;);
attempt to read C members other than base (e.g. X x = c->another;); or
attempt to read the allocated object via an lvalue of most unrelated types (e.g. Unrelated_but_not_char u = *(Unrelated_but_not_char *) c;

The first two of those cases are of interest here, and they make sense in terms of the dynamically allocated object, when interpreted as a C, not being fully initialized. Similar incomplete-initialization cases can arise with automatically allocated objects, too; they also produce undefined behavior, but by different rules.

Note well, however, that there is no strict aliasing violation for any write to the allocated space, because any such write will (re)assign the effective type of (at least) the region that is written to.

And that brings us to the main tricksome bit. What if we do this:

C *c = malloc(sizeof(*c));
c->base = (B){0};

? Or this:

C *c = malloc(sizeof(*c));
c->another = 0;

The allocated object does not have any effective type before the first write to it (and in particular, it does not have effective type C), so do write-to-member expressions via *c even make sense? Are they well-defined? The letter of the standard might support an argument that they do not, but no implementation adopts such interpretation, and there is no reason to think that any ever would.

The interpretation most consistent with both the letter of the standard and universal practice is that writing through a member-access lvalue constitutes simultaneously writing to the member and to its host aggregate, thus setting the effective type of the whole region, even though only one member's value is written. Of course, that still does not make it ok to read members whose values have not been written -- because their values are indeterminate, not because of the SAR.

That leaves this case:

C *c = malloc(sizeof(*c));
*(B *)c = (B){0};
B b2 = c->base;            // What about this?

That is, if the effective type of an initial region of the allocated space is B, can we use a member-access lvalue based on type C to read the stored value of that B region? Again, one might argue not, on the basis that there is no actual C, but in practice, no implementation makes that interpretation. The effective type of the object being read -- the initial region of the allocated space -- is the same as the type of the lvalue used for access, so in that sense there is no SAR violation. That the host C is wholly hypothetical is a question primarily of syntax, not semantics, because the same region can definitely be read as an object of the same type via an alternative expression.

^* But the SAR nevertheless forestalls any debate on this point by providing that "an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union)" is among the types that may be accessed. This clears any ambiguity surrounding the position that accessing a member also constitutes accessing any objects containing it.

Is void** an exception to strict aliasing rules?

Basically, is this code legal when strict aliasing is enabled?

No. The effective type of pi is int* but you lvalue access the pointer variable through a void*. De-referencing a pointer to give an access which doesn't correspond to the effective type of the object is a strict aliasing violation - with some exceptions, this isn't one.

In your second example, both parameters to the function are set to point at an object of effective type int* which is done here: f(&a, (char **) &a);. Therefore *b inside the function is indeed a strict aliasing violation, since you are using a char* type for the access.

In your third example you do the same but with a void*. This is also a strict aliasing violation. There is nothing special with void* or void** in this context.

Why your compilers exhibits a certain form of undefined behavior in some situations is not very meaningful to speculate about. Although void* must by definition be convertible to/from any other object pointer type, so they very likely have the representation internally, even though that's not an explicit requirement from the standard.

Also you are using -fno-strict-aliasing which turns off various pointer aliasing-based optimizations. If you wish to provoke strange and unexpected results, you shouldn't use that option.

What is the strict aliasing rule?

A typical situation where you encounter strict aliasing problems is when overlaying a struct (like a device/network msg) onto a buffer of the word size of your system (like a pointer to uint32_ts or uint16_ts). When you overlay a struct onto such a buffer, or a buffer onto such a struct through pointer casting you can easily violate strict aliasing rules.

So in this kind of setup, if I want to send a message to something I'd have to have two incompatible pointers pointing to the same chunk of memory. I might then naively code something like this:

typedef struct Msg
{
    unsigned int a;
    unsigned int b;
} Msg;

void SendWord(uint32_t);

int main(void)
{
    // Get a 32-bit buffer from the system
    uint32_t* buff = malloc(sizeof(Msg));
    
    // Alias that buffer through message
    Msg* msg = (Msg*)(buff);
    
    // Send a bunch of messages    
    for (int i = 0; i < 10; ++i)
    {
        msg->a = i;
        msg->b = i+1;
        SendWord(buff[0]);
        SendWord(buff[1]);   
    }
}

The strict aliasing rule makes this setup illegal: dereferencing a pointer that aliases an object that is not of a compatible type or one of the other types allowed by C 2011 6.5 paragraph 7¹ is undefined behavior. Unfortunately, you can still code this way, maybe get some warnings, have it compile fine, only to have weird unexpected behavior when you run the code.

(GCC appears somewhat inconsistent in its ability to give aliasing warnings, sometimes giving us a friendly warning and sometimes not.)

To see why this behavior is undefined, we have to think about what the strict aliasing rule buys the compiler. Basically, with this rule, it doesn't have to think about inserting instructions to refresh the contents of buff every run of the loop. Instead, when optimizing, with some annoyingly unenforced assumptions about aliasing, it can omit those instructions, load buff[0] and buff[1] into CPU registers once before the loop is run, and speed up the body of the loop. Before strict aliasing was introduced, the compiler had to live in a state of paranoia that the contents of buff could change by any preceding memory stores. So to get an extra performance edge, and assuming most people don't type-pun pointers, the strict aliasing rule was introduced.

Keep in mind, if you think the example is contrived, this might even happen if you're passing a buffer to another function doing the sending for you, if instead you have.

void SendMessage(uint32_t* buff, size_t size32)
{
    for (int i = 0; i < size32; ++i) 
    {
        SendWord(buff[i]);
    }
}

And rewrote our earlier loop to take advantage of this convenient function

for (int i = 0; i < 10; ++i)
{
    msg->a = i;
    msg->b = i+1;
    SendMessage(buff, 2);
}

The compiler may or may not be able to or smart enough to try to inline SendMessage and it may or may not decide to load or not load buff again. If SendMessage is part of another API that's compiled separately, it probably has instructions to load buff's contents. Then again, maybe you're in C++ and this is some templated header only implementation that the compiler thinks it can inline. Or maybe it's just something you wrote in your .c file for your own convenience. Anyway undefined behavior might still ensue. Even when we know some of what's happening under the hood, it's still a violation of the rule so no well defined behavior is guaranteed. So just by wrapping in a function that takes our word delimited buffer doesn't necessarily help.

So how do I get around this?

Use a union. Most compilers support this without complaining about strict aliasing. This is allowed in C99 and explicitly allowed in C11.
```
  union {
      Msg msg;
      unsigned int asBuffer[sizeof(Msg)/sizeof(unsigned int)];
  };
```
You can disable strict aliasing in your compiler (f[no-]strict-aliasing in gcc))
You can use char* for aliasing instead of your system's word. The rules allow an exception for char* (including signed char and unsigned char). It's always assumed that char* aliases other types. However this won't work the other way: there's no assumption that your struct aliases a buffer of chars.

Beginner beware

This is only one potential minefield when overlaying two types onto each other. You should also learn about endianness, word alignment, and how to deal with alignment issues through packing structs correctly.

Footnote

¹ The types that C 2011 6.5 7 allows an lvalue to access are:

a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.

Best practices for object oriented patterns with strict aliasing and strict alignment in C

You can avoid the compiler warning by casting to void * first:

struct Derived1 *dp = (struct Derived1 *) (void *) bp;

(After the cast to void *, the conversion to struct Derived1 * is automatic in the above declaration, so you could remove the cast.)

The methods of using a pointer-to-a-pointer or a union to reinterpret a pointer are not correct; they violate the aliasing rule, as a struct Derived1 * and a struct Base * are not suitable types for aliasing each other. Do not use those methods.

(Due to C 2018 6.2.6.1 28, which says “… All pointers to structure types shall have the same representation and alignment requirements as each other…,” an argument can be made that reinterpreting one pointer-to-a-structure as another through a union is supported by the C standard. Footnote 49 says “The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.” At best, however, this is a kludge in the C standard and should be avoided when possible.)

Why does using a union type and/or pointer-to-pointer violate strict aliasing rules? In other words what is fundamentally different between what is done in main (taking address of b member) and what is done in print_val() other than the direction of the conversion? Both yield the same situation - two pointers that point to the same memory, which are different struct types - a struct Base* and a struct Derived1*.
It would seem to me that if this were violating strict aliasing rules in any way, the introduction of an intermediate void* cast would not change the fundamental problem.

The strict aliasing violation occurs in aliasing the pointer, not in aliasing the structure.

If you have a struct Derived1 *dp or a struct Base *bp and you use it to access a place in memory where there actually is a struct Derived1 or, respectively, a struct Base, then there is no aliasing violation because you are accessing an object through an lvalue of its type, which is allowed by the aliasing rule.

However, this question suggested aliasing a pointer. In *((struct Derived1 **)&bp);, &bp is the location where there is a struct Base *. This address of a struct Base * is converted to the address of a struct Derived1 **, and then * forms an lvalue of type struct Derived1 *. The expression is then used to access a struct Base * using a type of struct Derived1 *. There is no match for that in the aliasing rule; none of the types it lists for accessing a struct Base * are a struct Derived1 *.

Comparing struct pointers, casting away members, and UB

Section 6.2.5 Types paragraph 28 of the C standard says:

[...] All pointers to structure types shall have the same representation and alignment requirements as each other. [...]

Section 6.3.2.3 Pointers paragraph 1 says:

A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.

And paragraph 7 says:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned⁶⁸⁾ for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. [...]

And footnote 68 says:

In general, the concept "correctly aligned" is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.

Because all pointers to structure types have the same representation, the conversions between pointers to void and pointers to structure types must be the same for all pointers to structure types. So it seems that a pointer to structure type A could be converted by a cast operator directly to a pointer to structure type B without an intermediate conversion to a pointer to void as long as the pointer is "correctly aligned" for structure type B. (This may be a weak argument.)

The question remains when, in the case of two structure types A and B where the initial sequence of structure type A consists of all the members of structure type B, a pointer to structure type A is guaranteed to be correctly aligned for structure type B (the reverse is obviously not guaranteed). As far as I can tell, the C standard makes no such guarantee. So strictly speaking, a pointer to the larger structure type A might not be correctly aligned for the smaller structure type B, and if it is not, the behavior is undefined. For a "sane" compiler, the larger structure type A would not have weaker alignment than the smaller structure type B, but for an "insane" compiler, that might not be the case.

Regarding the second question about accessing members of the truncated (shorter) structure using the pointer derived from the full (longer) structure, then as long as the pointer is correctly aligned for the shorter structure (see above for why that might not be true for an "insane" compiler), and as long as strict aliasing rules are avoided (for example, by going through an intermediate pointer to void in an intermediate external function call across compilation unit boundaries), then accessing the members through the pointer to the shorter structure type should be perfectly fine. There is a special guarantee for that when objects of both structure types appear as members of the same union type. Section 6.3.2.3 Structure and union members paragraph 6 says:

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union
object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union
is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

However, since the offsets of members within a structure type does not depend on whether an object of the structure type appears in a union type or not, the above implies that any structures with a common initial sequence of members will have those common members at the same offsets within their respective structure types.

Does reinterpret_casting std::aligned_storage* to T* without std::launder violate strict-aliasing rules?

I asked a related question in the ISO C++ Standard - Discussion forum. I learned the answer from those discussions, and write it here to hope to help someone else who is confused about this question. I will keep updating this answer according to those discussions.

Before P0137, refer to [basic.compound] paragraph 3:

If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.

and [expr.static.cast] paragraph 13:

If the original pointer value represents the address A of a byte in memory and A satisfies the alignment requirement of T, then the resulting pointer value represents the same address as the original pointer value, that is, A.

The expression reinterpret_cast<const T*>(data+pos) represents the address of the previously created object of type T, thus points to that object. Indirection through this pointer indeed get that object, which is well-defined.

However after P0137, the definition for a pointer value is changed and the first block-quoted words is deleted. Now refer to [basic.compound] paragraph 3:

Every value of pointer type is one of the following:

a pointer to an object or function (the pointer is said to point to the object or function), or

...

and [expr.static.cast] paragraph 13:

If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.

The expression reinterpret_cast<const T*>(data+pos) still points to the object of type std::aligned_storage<...>::type, and indirection get a lvalue referring to that object, though the type of the lvalue is const T. Evaluation of the expression v1[0] in the example tries to access the value of the std::aligned_storage<...>::type object through the lvalue, which is undefined behavior according to [basic.lval] paragraph 11 (i.e. the strict-aliasing rules):

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

the dynamic type of the object,

a cv-qualified version of the dynamic type of the object,

a type similar (as defined in [conv.qual]) to the dynamic type of the object,

a type that is the signed or unsigned type corresponding to the dynamic type of the object,

a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,

an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),

a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,

a char, unsigned char, or std::byte type.

Does Accessing the First Field of a Struct via a C Cast Violate Strict Aliasing