Strict Aliasing Rule and 'Char *' Pointers

Best practices for object oriented patterns with strict aliasing and strict alignment in C

You can avoid the compiler warning by casting to void * first:

struct Derived1 *dp = (struct Derived1 *) (void *) bp;

(After the cast to void *, the conversion to struct Derived1 * is automatic in the above declaration, so you could remove the cast.)

The methods of using a pointer-to-a-pointer or a union to reinterpret a pointer are not correct; they violate the aliasing rule, as a struct Derived1 * and a struct Base * are not suitable types for aliasing each other. Do not use those methods.

(Due to C 2018 6.2.6.1 28, which says “… All pointers to structure types shall have the same representation and alignment requirements as each other…,” an argument can be made that reinterpreting one pointer-to-a-structure as another through a union is supported by the C standard. Footnote 49 says “The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.” At best, however, this is a kludge in the C standard and should be avoided when possible.)

Why does using a union type and/or pointer-to-pointer violate strict aliasing rules? In other words what is fundamentally different between what is done in main (taking address of b member) and what is done in print_val() other than the direction of the conversion? Both yield the same situation - two pointers that point to the same memory, which are different struct types - a struct Base* and a struct Derived1*.

It would seem to me that if this were violating strict aliasing rules in any way, the introduction of an intermediate void* cast would not change the fundamental problem.

The strict aliasing violation occurs in aliasing the pointer, not in aliasing the structure.

If you have a struct Derived1 *dp or a struct Base *bp and you use it to access a place in memory where there actually is a struct Derived1 or, respectively, a struct Base, then there is no aliasing violation because you are accessing an object through an lvalue of its type, which is allowed by the aliasing rule.

However, this question suggested aliasing a pointer. In *((struct Derived1 **)&bp);, &bp is the location where there is a struct Base *. This address of a struct Base * is converted to the address of a struct Derived1 **, and then * forms an lvalue of type struct Derived1 *. The expression is then used to access a struct Base * using a type of struct Derived1 *. There is no match for that in the aliasing rule; none of the types it lists for accessing a struct Base * are a struct Derived1 *.

Is this strict aliasing violation? Can any type pointer alias a char pointer?

Strict aliasing means that to dereference a T* ptr, there must be a T object at that address, alive obviously. Effectively this means you cannot naively bit-cast between two incompatible types and also that a compiler can assume that no two pointers of incompatible types point to the same location.

The exception is unsigned char , char and std::byte, meaning you can reinterpret cast any object pointer to a pointer of these 3 types and dereference it.

(T*)ptr; is valid because at ptr there exists a T object. That is all that is required, it does not matter how you got that pointer*, through how many casts it went. There are some more requirements when T has constant members but that has to do more with placement new and object resurrection - see this answer if you are interested.

*It does matter even in case of no const members, probably, not sure, relevant question . @eerorika 's answer is more correct to suggest std::launder or assigning from the placement new expression.

For the record, a void* can alias any other type pointer, and any type pointer can alias a void*.

That is not true, void is not one of the three allowed types. But I assume you are just misinterpreting the word "alias" - strict aliasing only applies when a pointer is dereferenced, you are of course free to have as many pointers pointing to wherever you want as long as you do not dereference them. Since void* cannot be dereferenced, it's a moo point.

Addresing your second example

char* buffer = (char*)malloc(16); //OK

// Assigning pointers is always defined the rules only say when
// it is safe to dereference such pointer.
// You are missing a cast here, pointer cannot be casted implicitly in C++, C produces a warning only.
float* pFloat = buffer;
// -> float* pFloat =reinterpret_cast<float*>(buffer);

// NOT OK, there is no float at `buffer` - violates strict aliasing.
*pFloat = 6;
// Now there is a float
new (pFloat) float;
// Yes, now it is OK.
*pFloat = 7;

With strict aliasing in C++11, is it defined to _write_ to a char*, then _read_ from an aliased nonchar*?

is it defined to _write_ to a char*, then _read_ from an aliased nonchar*?

Yes.

Must the printed value reflect the change in [alias-write]?

Yes.

Strict aliasing says ((un)signed) char* can alias anything. The word "access" means both read and write operations.

C++'s Strict Aliasing Rule - Is the 'char' aliasing exemption a 2-way street?

The aliasing rule means that the language only promises your pointer dereferences to be valid (i.e. not trigger undefined behaviour) if:

  • You access an object through a pointer of a compatible class: either its actual class or one of its superclasses, properly cast. This means that if B is a superclass of D and you have D* d pointing to a valid D, accessing the pointer returned by static_cast<B*>(d) is OK, but accessing that returned by reinterpret_cast<B*>(d) is not. The latter may have failed to account for the layout of the B sub-object inside D.
  • You access it through a pointer to char. Since char is byte-sized and byte-aligned, there is no way you could not be able to read data from a char* while being able to read it from a D*.

That said, other rules in the standard (in particular those about array layout and POD types) can be read as ensuring that you can use pointers and reinterpret_cast<T*> to alias two-way between POD types and char arrays if you make sure to have a char array of the apropriate size and alignment.

In other words, this is legal:

int* ia = new int[3];
char* pc = reinterpret_cast<char*>(ia);
// Possibly in some other function
int* pi = reinterpret_cast<int*>(pc);

While this may invoke undefined behaviour:

char* some_buffer; size_t offset; // Possibly passed in as an argument
int* pi = reinterpret_cast<int*>(some_buffer + offset);
pi[2] = -5;

Even if we can ensure that the buffer is big enough to contain three ints, the alignment might not be right. As with all instances of undefined behaviour, the compiler may do absolutely anything. Three common ocurrences could be:

  • The code might Just Work (TM) because in your platform the default alignment of all memory allocations is the same as that of int.
  • The pointer cast might round the address to the alignment of int (something like pi = pc & -4), potentially making you read/write to the wrong memory.
  • The pointer dereference itself may fail in some way: the CPU could reject misaligned accesses, making your application crash.

Since you always want to ward off UB like the devil itself, you need a char array with the correct size and alignment. The easiest way to get that is simply to start with an array of the "right" type (int in this case), then fill it through a char pointer, which would be allowed since int is a POD type.

Addendum: after using placement new, you will be able to call any function on the object. If the construction is correct and does not invoke UB due to the above, then you have successfully created an object at the desired place, so any calls are OK, even if the object was non-POD (e.g. because it had virtual functions). After all, any allocator class will likely use placement new to create the objects in the storage that they obtain. Note that this only necessarily true if you use placement new; other usages of type punning (e.g. naïve serialization with fread/fwrite) may result in an object that is incomplete or incorrect because some values in the object need to be treated specially to maintain class invariants.

Initializing a struct pointer with char array

This code is incorrect in Standard C:

char buf[4096];
read(fd1, buf, 4096); // Assume error handling, omitted for brevity
struct read_format* rf = (struct read_format*) buf;
printf("%llu\n", rf->nr);

There are two issues -- and these are distinct issues which should not be conflated -- :

  • buf might not be correctly aligned for struct read_format. If it isn't, the behaviour is undefined.
  • Accessing rf->nr violates the strict aliasing rule and the behaviour is undefined. An object with declared type char cannot be read of written by an expression of type . unsigned long long. Note that the converse is not true.

Why does it appear to work? Well, "undefined" does not mean "must explode". It means the C Standard no longer specifies the program's behaviour. This sort of code is somewhat common in real code bases. The major compiler vendors -- for now -- include logic so that this code will behave as "expected", otherwise too many people would complain.

The "expected" behaviour is that accessing *rf should behave as if there exists a struct read_format object at the address, and the bytes of that object are the same as the bytes of buf . Similar to if the two were in a union.

The code could be made compliant with a union:

union
{
char buf[4096];
struct read_format rf;
} u;

read(fd1, u.buf, sizeof u.buf);
printf("%llu\n", u.rf->nr);

The strict aliasing rule is "disabled" for union members accessed by name; and this also addresses the alignment problem since the union will be aligned for all members.

It's up to you whether to be compliant, or trust that compilers will continue put practicality ahead of maximal optimization within the constraints permitted by the Standard.

Is casting strings from `wchar_t` to `char16_t` legal if encoding and width is the same?

You already know the answer to this: strictly speaking, no.

wchar_t is not char16_t. Neither derives from the other. Neither is similar to the other. Neither is a signed/unsigned version of the other. Neither is an aggregate containing the other.And neither of them is a bytewise type (char, etc).

So you cannot access a wchar_t through a pointer/reference to a char16_t.

If strict avoidance of strict aliasing is your goal, you're going to have to copy the data to a different object. That is valid, assuming they both have the same representation.



Related Topics



Leave a reply



Submit