Has a std::byte Pointer the Same Aliasing Implications as char*

Has a std::byte pointer the same aliasing implications as char*?

From the current Standard draft ([basic.types]/2):

For any object (other than a base-class subobject) of trivially
copyable type T, whether or not the object holds a valid value of type
T, the underlying bytes ([intro.memory]) making up the object can be
copied into an array of char, unsigned char, or std::byte
([cstddef.syn]). If the content of that array is copied back into
the object, the object shall subsequently hold its original value.

So yes, the same aliasing rules apply for the three types, just as cppreference sums up.
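As a minimal sketch of the round trip that the quoted paragraph describes, the function below copies an object into a std::byte array and back. The Sample type and the round_trip name are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable type, used only for illustration.
struct Sample {
    int   id;
    float weight;
};
static_assert(std::is_trivially_copyable_v<Sample>,
              "the guarantee only covers trivially copyable types");

// Copies the bytes of `s` into a std::byte array, overwrites `s`,
// then copies the bytes back so that `s` holds its original value again.
inline Sample round_trip(Sample s) {
    std::byte storage[sizeof(Sample)];
    std::memcpy(storage, &s, sizeof s);   // object -> byte array
    s = Sample{0, 0.0f};                  // clobber the object
    std::memcpy(&s, storage, sizeof s);   // byte array -> object
    return s;
}
```

Calling round_trip(Sample{7, 1.5f}) should give back an object whose id is 7 and whose weight is 1.5f.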

It also might be valuable to mention ([basic.lval]/8.8):

If a program attempts to access the stored value of an object through
a glvalue of other than one of the following types the behavior is
undefined:

  • a char, unsigned char, or std::byte type.

Is this strict aliasing violation? Can any type pointer alias a char pointer?

Strict aliasing means that to dereference a T* ptr, there must be a T object at that address, alive obviously. Effectively this means you cannot naively bit-cast between two incompatible types and also that a compiler can assume that no two pointers of incompatible types point to the same location.

The exception is unsigned char, char and std::byte: you can reinterpret_cast any object pointer to a pointer to one of these three types and dereference it.
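A short sketch of that exception, assuming C++17 for std::byte: the hypothetical helper byte_sum reads the object representation of any object through a std::byte pointer and adds the byte values up.

```cpp
#include <cstddef>

// Hypothetical helper: sums the bytes that make up `object`.
// Reading through const std::byte* is permitted for any object type.
template <typename T>
unsigned byte_sum(const T& object) {
    const std::byte* p = reinterpret_cast<const std::byte*>(&object);
    unsigned sum = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        sum += std::to_integer<unsigned>(p[i]);
    }
    return sum;
}
```

The sum does not depend on byte order, so for a 4-byte unsigned holding 0x01020304 it should come out as 1 + 2 + 3 + 4 on both little-endian and big-endian machines.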

(T*)ptr; is valid because at ptr there exists a T object. That is all that is required, it does not matter how you got that pointer*, through how many casts it went. There are some more requirements when T has constant members but that has to do more with placement new and object resurrection - see this answer if you are interested.

*It probably does matter even when there are no const members; I am not sure, see the relevant question. @eerorika's answer is more correct to suggest std::launder or assigning from the placement new expression.

For the record, a void* can alias any other type pointer, and any type pointer can alias a void*.

That is not true, void is not one of the three allowed types. But I assume you are just misinterpreting the word "alias": strict aliasing only applies when a pointer is dereferenced. You are of course free to have as many pointers pointing wherever you want as long as you do not dereference them. Since void* cannot be dereferenced, it's a moot point.

Addressing your second example

char* buffer = (char*)malloc(16); // OK

// Assigning pointers is always defined; the rules only say when
// it is safe to dereference such a pointer.
// You are missing a cast here: pointers cannot be converted implicitly in C++ (C only produces a warning).
float* pFloat = buffer;
// -> float* pFloat = reinterpret_cast<float*>(buffer);

// NOT OK under the pre-C++20 rules: there is no float at `buffer`, so this violates strict aliasing.
*pFloat = 6;
// Now there is a float
new (pFloat) float;
// Yes, now it is OK.
*pFloat = 7;
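A hedged variant of the same idea that avoids the questionable first write entirely: create the float with placement new before touching the storage, and keep the pointer that the new-expression returns. The function name make_float is an assumption made for this sketch.

```cpp
#include <cstdlib>
#include <new>

// Hypothetical helper: allocates raw storage, creates a float in it with
// placement new, and returns the pointer produced by the new-expression.
// Returns nullptr if the allocation fails. The caller releases the storage
// with std::free.
inline float* make_float(float initial) {
    void* raw = std::malloc(sizeof(float));  // malloc storage is suitably aligned for float
    if (raw == nullptr) {
        return nullptr;
    }
    return new (raw) float(initial);         // a float object now lives in the storage
}
```

Every later read or write goes through the returned float*, so there is never an access to storage that holds no float.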

Is there a legal way to convert an unsigned char pointer to a std::byte pointer?

The strict aliasing rule never forbids any pointer conversions. It is about the type of an expression accessing an object.

std::byte may alias any other type, this is mentioned in the cppreference page you linked, as well as in the strict aliasing rule in the Standard of course (C++17 basic.lval/8.8). So it is fine to use reinterpret_cast<std::byte *> and then read or write the array of unsigned char.

If you use an expression of type uint32_t to read or write an array of unsigned char, that would violate the strict aliasing rule.
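One common way to stay within the rule when a uint32_t has to come out of an unsigned char array is to copy the bytes instead of dereferencing a converted pointer. This is only a sketch; load_u32 is a hypothetical name, and the result uses the host's byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: builds a std::uint32_t from four raw bytes.
// std::memcpy copies the bytes, so no uint32_t glvalue ever refers to
// the unsigned char array, and alignment of `bytes` does not matter.
inline std::uint32_t load_u32(const unsigned char* bytes) {
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

By contrast, *reinterpret_cast<const std::uint32_t*>(bytes) would be the access that the paragraph above describes as a violation.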

Aliasing accesses through a std::bit_cast()ed pointer

Converting the pointer value is irrelevant. What matters is the object. You have a pointer to an object of type X, but the pointer's type is Y. Trying to access the object of type X through a pointer/reference to unrelated type Y is where the UB comes from.

How you obtained those pointers is mostly irrelevant. So bit_cast is no better than reinterpret_cast in this regard.

If there is no sockaddr_in there, then you can't pretend that there is one. However, it's possible that implicit object creation in C++20 already solves this matter, depending on your code. If it does, then it still doesn't matter how you get the pointer.

Does the aliasing loophole apply to signed characters?

No, the provision does not extend to signed char.

[basic.lval]

8 If a program attempts to access the stored value of an object
through a glvalue of other than one of the following types the
behavior is undefined:

  • [...]
  • a char, unsigned char, or std::byte type.

The quote above contains the very last bullet that pertains to aliasing with character types. signed char is excluded.

Nevertheless, this is also part of the subject CWG Issue 350 deals with, and so may change. Given the direction the issue has taken, the intent is for it to be (eventually, hopefully?) well-defined.

With strict aliasing in C++11, is it defined to _write_ to a char*, then _read_ from an aliased nonchar*?

is it defined to _write_ to a char*, then _read_ from an aliased nonchar*?

Yes.

Must the printed value reflect the change in [alias-write]?

Yes.

Strict aliasing says char* and unsigned char* can alias anything. The word "access" means both read and write operations.
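A small sketch of a write through a character pointer followed by a read through the original type. The function name fill_with_ones is hypothetical, and the expected result assumes unsigned has no padding bits, which holds on mainstream platforms.

```cpp
#include <cstddef>

// Hypothetical helper: sets every byte of an unsigned object to 0xFF
// through an unsigned char alias, then reads the object by name.
inline unsigned fill_with_ones() {
    unsigned value = 0;
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&value);
    for (std::size_t i = 0; i < sizeof value; ++i) {
        bytes[i] = 0xFF;   // aliased write: modifies `value` itself
    }
    return value;          // the read must reflect the bytes written above
}
```

Because the writes are valid accesses to value, the compiler may not assume that value is still 0 at the return statement.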

C++'s Strict Aliasing Rule - Is the 'char' aliasing exemption a 2-way street?

The aliasing rule means that the language only promises your pointer dereferences to be valid (i.e. not trigger undefined behaviour) if:

  • You access an object through a pointer of a compatible class: either its actual class or one of its superclasses, properly cast. This means that if B is a superclass of D and you have D* d pointing to a valid D, accessing the pointer returned by static_cast<B*>(d) is OK, but accessing that returned by reinterpret_cast<B*>(d) is not. The latter may have failed to account for the layout of the B sub-object inside D.
  • You access it through a pointer to char. Since char is byte-sized and byte-aligned, anything that can be read through a D* can also be read through a char*.
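The difference between the two casts in the first bullet can be sketched with a small hierarchy. The extra base A and the function read_b are hypothetical additions; they exist only so that the B sub-object is unlikely to sit at the start of D.

```cpp
// Hypothetical hierarchy: D has two non-empty bases, so the B sub-object
// is not necessarily located at the same address as the D object.
struct A { int a = 1; };
struct B { int b = 2; };
struct D : A, B { int d = 3; };

// static_cast applies whatever pointer adjustment is needed to reach
// the B sub-object inside *derived.
inline int read_b(D* derived) {
    B* base = static_cast<B*>(derived);
    return base->b;
}

// reinterpret_cast<B*>(derived)->b would reuse the address of the D object
// unchanged and ignore the layout of D, so that access is not safe.
```

With a default-constructed D, read_b returns the value stored in the B part of the object.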

That said, other rules in the standard (in particular those about array layout and POD types) can be read as ensuring that you can use pointers and reinterpret_cast<T*> to alias two-way between POD types and char arrays if you make sure to have a char array of the appropriate size and alignment.

In other words, this is legal:

int* ia = new int[3];
char* pc = reinterpret_cast<char*>(ia);
// Possibly in some other function
int* pi = reinterpret_cast<int*>(pc);

While this may invoke undefined behaviour:

char* some_buffer; size_t offset; // Possibly passed in as an argument
int* pi = reinterpret_cast<int*>(some_buffer + offset);
pi[2] = -5;

Even if we can ensure that the buffer is big enough to contain three ints, the alignment might not be right. As with all instances of undefined behaviour, the compiler may do absolutely anything. Three common occurrences could be:

  • The code might Just Work (TM) because in your platform the default alignment of all memory allocations is the same as that of int.
  • The pointer cast might round the address to the alignment of int (something like pi = pc & -4), potentially making you read/write to the wrong memory.
  • The pointer dereference itself may fail in some way: the CPU could reject misaligned accesses, making your application crash.

Since you always want to ward off UB like the devil itself, you need a char array with the correct size and alignment. The easiest way to get that is simply to start with an array of the "right" type (int in this case), then fill it through a char pointer, which would be allowed since int is a POD type.
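A minimal sketch of that advice: the destination is a real int array, and it is filled byte by byte through a char pointer from a source buffer that may be misaligned. The name fill_ints and its parameters are assumptions made for this example.

```cpp
#include <cstddef>

// Hypothetical helper: fills `count` ints in `dest` from raw bytes in `src`.
// `dest` points to genuine int objects, so its alignment is correct by
// construction; `src` may sit at any address because it is only read as char.
inline void fill_ints(int* dest, const char* src, std::size_t count) {
    char* dest_bytes = reinterpret_cast<char*>(dest);
    for (std::size_t i = 0; i < count * sizeof(int); ++i) {
        dest_bytes[i] = src[i];   // write the ints through a char pointer
    }
}
```

The source bytes can start at some_buffer + offset for any offset, since no int* is ever formed from the buffer.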

Addendum: after using placement new, you will be able to call any function on the object. If the construction is correct and does not invoke UB due to the above, then you have successfully created an object at the desired place, so any calls are OK, even if the object was non-POD (e.g. because it had virtual functions). After all, any allocator class will likely use placement new to create the objects in the storage that they obtain. Note that this is only necessarily true if you use placement new; other usages of type punning (e.g. naïve serialization with fread/fwrite) may result in an object that is incomplete or incorrect because some values in the object need to be treated specially to maintain class invariants.

Aliasing T* with char* is allowed. Is it also allowed the other way around?

Some of your code is questionable due to the pointer conversions involved. Keep in mind that in those instances reinterpret_cast<T*>(e) has the semantics of static_cast<T*>(static_cast<void*>(e)) because the types that are involved are standard-layout. (I would in fact recommend that you always use static_cast via cv void* when dealing with storage.)

A close reading of the Standard suggests that during a pointer conversion to or from T* it is assumed that there really is an actual object of type T involved, which is hard to fulfill in some of your snippets, even when 'cheating' thanks to the triviality of the types involved (more on this later). That would be beside the point however because...

Aliasing is not about pointer conversions. This is the C++11 text that outlines the rules that are commonly referred to as 'strict aliasing' rules, from 3.10 Lvalues and rvalues [basic.lval]:

10 If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • the dynamic type of the object,
  • a cv-qualified version of the dynamic type of the object,
  • a type similar (as defined in 4.4) to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • a char or unsigned char type.

(This is paragraph 15 of the same clause and subclause in C++03, with some minor changes in the text with e.g. 'lvalue' being used instead of 'glvalue' since the latter is a C++11 notion.)

In the light of those rules, let's assume that an implementation provides us with magic_cast<T*>(p) which 'somehow' converts a pointer to another pointer type. Normally this would be reinterpret_cast, which yields unspecified results in some cases, but as I've explained before this is not so for pointers to standard-layout types. Then it's plainly true that all of your snippets are correct (substituting reinterpret_cast with magic_cast), because no glvalues are involved whatsoever with the results of magic_cast.

Here is a snippet that appears to incorrectly use magic_cast, but which I will argue is correct:

// assume constexpr max
constexpr auto alignment = max(alignof(int), alignof(short));
alignas(alignment) char c[sizeof(int)];
// I'm assuming here that the OP really meant to use &c and not c
// this is, however, inconsequential
auto p = magic_cast<int*>(&c);
*p = 42;
*magic_cast<short*>(p) = 42;

To justify my reasoning, assume this superficially different snippet:

// alignment same as before
alignas(alignment) char c[sizeof(int)];

auto p = magic_cast<int*>(&c);
// end lifetime of c
c.~decltype(c)();
// reuse storage to construct new int object
new (&c) int;

*p = 42;

auto q = magic_cast<short*>(p);
// end lifetime of int object
p->~decltype(0)();
// reuse storage again
new (p) short;

*q = 42;

This snippet is carefully constructed. In particular, in new (&c) int; I'm allowed to use &c even though c was destroyed due to the rules laid out in paragraph 5 of 3.8 Object lifetime [basic.life]. Paragraph 6 of same gives very similar rules to references to storage, and paragraph 7 explains what happens to variables, pointers and references that used to refer to an object once its storage is reused -- I will refer collectively to those as 3.8/5-7.

In this instance &c is (implicitly) converted to void*, which is one of the correct uses of a pointer to storage that has not yet been reused. Similarly p is obtained from &c before the new int is constructed. Its definition could perhaps be moved to after the destruction of c, depending on how deep the implementation magic is, but certainly not after the int construction: paragraph 7 would apply and this is not one of the allowed situations. The construction of the short object also relies on p becoming a pointer to storage.

Now, because int and short are trivial types, I don't have to use the explicit calls to destructors. I don't need the explicit calls to the constructors, either (that is to say, the calls to the usual, Standard placement new declared in <new>). From 3.8 Object lifetime [basic.life]:

1 [...] The lifetime of an object of type T begins when:

  • storage with the proper alignment and size for type T is obtained, and
  • if the object has non-trivial initialization, its initialization is complete.

The lifetime of an object of type T ends when:

  • if T is a class type with a non-trivial destructor (12.4), the destructor call starts, or
  • the storage which the object occupies is reused or released.

This means that I can rewrite the code such that, after folding the intermediate variable q, I end up with the original snippet.

Do note that p cannot be folded away. That is to say, the following is definitely incorrect:

alignas(alignment) char c[sizeof(int)];
*magic_cast<int*>(&c) = 42;
*magic_cast<short*>(&c) = 42;

If we assume that an int object is (trivially) constructed with the second line, then that must mean &c becomes a pointer to storage that has been reused. Thus the third line is incorrect -- although due to 3.8/5-7 and not due to aliasing rules strictly speaking.

If we don't assume that, then the second line is a violation of aliasing rules: we're accessing what is actually a char c[sizeof(int)] object through a glvalue of type int, which is not one of the allowed exceptions. By comparison, *magic_cast<unsigned char*>(&c) = 42; would be fine (we would assume a short object is trivially constructed on the third line).

Just like Alf, I would also recommend that you explicitly make use of the Standard placement new when using storage. Skipping destruction for trivial types is fine, but when encountering *some_magic_pointer = foo; you're very much likely facing either a violation of 3.8/5-7 (no matter how magically that pointer was obtained) or of the aliasing rules. This means storing the result of the new expression, too, since you most likely can't reuse the magic pointer once your object is constructed -- due to 3.8/5-7 again.

Reading the bytes of an object (this means using char or unsigned char) is fine however, and you don't even need to use reinterpret_cast or anything magic at all. static_cast via cv void* is arguably fine for the job (although I do feel like the Standard could use some better wording there).
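For illustration, here is a sketch of reading bytes with static_cast via const void* and no reinterpret_cast at all. The helper same_bytes is a hypothetical name; it compares the object representations of two objects of the same type.

```cpp
#include <cstddef>

// Hypothetical helper: returns true when `lhs` and `rhs` consist of the
// same bytes. The byte pointers are obtained with static_cast through
// const void* rather than with reinterpret_cast.
template <typename T>
bool same_bytes(const T& lhs, const T& rhs) {
    const unsigned char* a =
        static_cast<const unsigned char*>(static_cast<const void*>(&lhs));
    const unsigned char* b =
        static_cast<const unsigned char*>(static_cast<const void*>(&rhs));
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        if (a[i] != b[i]) {
            return false;
        }
    }
    return true;
}
```

For types with padding the comparison also covers the padding bytes, so it is stricter than operator== would be.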

Object access through an unsigned char alias, what happens on load and on store?

Accessing is OK, but what can be asserted about the value? For
example, if I make a store through the object name, then a store through
the aliasing pointer, and then a load through the object name, is there
no risk that the compiler optimizes away the last load and replaces it
with an immediate equal to the first store?

Access is defined in [defns.access] to mean:

read or modify the value of an object

So modifying the value via *alias_to_array_s='Y'; is just as acceptable as reading it.

The compiler is allowed to optimize load/stores via the as-if rule. Your program doesn't have any observable behavior. If the assert passes, the compiler is free to replace g() with an empty body and not call it at all. If you are really worried about the compiler reordering the load/stores, you should be using volatile or look into memory barriers.
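The store, store, load sequence from the question can be sketched as follows. This is a hypothetical reconstruction, not the asker's original code; only the name alias_to_array_s is taken from the text above.

```cpp
// Hypothetical reconstruction of the sequence discussed above.
inline char store_alias_load() {
    char s[] = "X";
    s[0] = 'A';                                    // store through the object name
    unsigned char* alias_to_array_s =
        reinterpret_cast<unsigned char*>(s);
    *alias_to_array_s = 'Y';                       // store through the alias
    return s[0];                                   // load through the object name
}
```

Since the aliased store is a valid modification of s, any optimization has to behave as if the final load sees 'Y', not the value from the first store.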


