Does Reinterpret_Cast Lead to Undefined Behavior

Does reinterpret_cast lead to undefined behavior?

As A<double> and A<const double> are unrelated types, the behavior is actually unspecified (originally I thought undefined), and correspondingly yes, it's a bad idea to use in real life: you never know which system(s) or compiler(s) you may port to that change the behavior in strange ways.

Reference:

5.2.10/11:

An lvalue expression of type T1 can be cast to the type “reference to
T2” if an expression of type “pointer to T1” can be explicitly
converted to the type “pointer to T2” using a reinterpret_cast. That
is, a reference cast reinterpret_cast<T&>(x) has the same effect as
the conversion *reinterpret_cast<T*>(&x) with the built-in & and *
operators (and similarly for reinterpret_cast<T&&>(x)).

So they've redirected us to an earlier section 5.2.10/7:

An object pointer can be explicitly converted to an object pointer of
a different type. ... ... Converting a prvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are
object types and where the alignment requirements of T2 are no
stricter than those of T1) and back to its original type yields the
original pointer value. The result of any other such pointer
conversion is unspecified.

If f and g are algorithms that work on containers, the easy solution is to change them to template algorithms that work on ranges (iterator pairs).
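A minimal sketch of that suggestion, assuming the algorithms only need to read the elements. The names sum_range and sum_both_ways are hypothetical stand-ins for f and g, which the question does not show here:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for f and g: written against an iterator pair,
// so it does not care whether the elements are double or const double.
template <class InputIt>
double sum_range(InputIt first, InputIt last) {
    double total = 0.0;
    for (; first != last; ++first) {
        total += *first;  // read-only access works for both element types
    }
    return total;
}

// Calls the same algorithm through mutable and through const iterators,
// without any cast between container types.
inline double sum_both_ways(std::vector<double>& values) {
    const std::vector<double>& read_only = values;
    return sum_range(values.begin(), values.end()) +
           sum_range(read_only.begin(), read_only.end());
}
```

Because the element type is deduced from the iterators, no conversion between A<double> and A<const double> is needed at all.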

Is it undefined behavior to `reinterpret_cast` a `T*` to `T(*)[N]`?

An array object and its first element are not pointer-interconvertible [*], so the result of the reinterpret_cast is a pointer of type "pointer to array of 8 int" whose value is "pointer to a[0]" [1]. In other words, despite the type, it does not actually point to any array object.

The code then applies the array-to-pointer conversion to the lvalue that resulted from dereferencing such a pointer (as a part of the indexing expression (*p)[0]) [2]. That conversion's behavior is only specified when the lvalue actually refers to an array object [3]. Since the lvalue in this case does not, the behavior is undefined by omission [4].


[*] If the question is "why is an array object and its first element not pointer-interconvertible?", it has already been asked: Pointer interconvertibility vs having the same address.

[1] See [expr.reinterpret.cast]/7, [conv.ptr]/2, [expr.static.cast]/13 and [basic.compound]/4.

[2] See [basic.lval]/6, [expr.sub] and [expr.add].

[3] [conv.array]: "The result is a pointer to the first element of the array."

[4] [defns.undefined]: undefined behavior is "behavior for which this document imposes no requirements", including "when this document omits any explicit definition of behavior".
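For contrast, here is a sketch of two ways to get at the elements without the reinterpret_cast. The function names are made up for illustration, and the array size 8 is taken from the question:

```cpp
#include <cassert>

// Well-defined: &a really is a pointer to the array object, so
// dereferencing it and indexing the result is fine.
inline int first_via_array_pointer(int (&a)[8]) {
    int (*p)[8] = &a;
    return (*p)[0];
}

// If only a pointer to the first element is available, index through it
// directly instead of reinterpreting it as a pointer to an array.
inline int sum_via_element_pointer(const int* first, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += first[i];
    }
    return total;
}
```

Neither version ever forms an lvalue of array type that does not refer to an actual array object.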

Does the following reinterpret_cast lead to undefined behavior?

The cast itself doesn't have UB (see [expr.reinterpret.cast]), but accessing the referred pointer (rpb) through the reinterpreted reference (rpd) does:

[basic.lval] (standard draft)

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined (footnote 56: the intent of this list is to specify those circumstances in which an object may or may not be aliased):

  • (8.1)
    the dynamic type of the object,

Does not apply: the dynamic type is the static type, which is Base*, not Derived*, which is the type of the glvalue.

  • (8.2)
    a cv-qualified version of the dynamic type of the object,

There are no cv-qualifications, and the types still don't match.

  • (8.3)
    a type similar to the dynamic type of the object,

Does not apply. This is about cv-decompositions, see [conv.qual] (sorry, the many subscripts in those paragraphs are a pain to type in HTML and are necessary to keep the text readable).

  • (8.4)
    a type that is the signed or unsigned type corresponding to the dynamic type of the object,

Only relevant to integral types.

  • (8.5)
    a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,

Ditto.

  • (8.6)
    an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),

Derived* is neither an aggregate nor a union.

  • (8.7)
    a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,

Derived* is not a base of Base*.

  • (8.8)
    a char, unsigned char, or std::byte type.

Derived* is none of those.


As none of the exceptions apply, the behaviour of accessing a Base* through a glvalue of Derived* type is undefined.
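A sketch of the usual way around this, assuming the goal is simply to get a Derived* out of a Base*: convert the pointer value rather than reinterpreting the pointer object. Base and Derived here are hypothetical minimal classes, not the ones from the question:

```cpp
#include <cassert>

// Hypothetical class hierarchy for illustration only.
struct Base {
    virtual ~Base() = default;
    int b = 1;
};
struct Derived : Base {
    int d = 2;
};

// Produces a new Derived* value. The Base* object itself is only ever
// read as a Base*, so the aliasing rule is not involved.
// Precondition: pb actually points to the Base subobject of a Derived.
inline Derived* as_derived(Base* pb) {
    return static_cast<Derived*>(pb);
}

// Checked variant for when the precondition is not known to hold;
// returns nullptr if the object is not a Derived.
inline Derived* as_derived_checked(Base* pb) {
    return dynamic_cast<Derived*>(pb);
}
```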


I am experimenting with an adapter class that should allow vectors containing covariant pointer types to be used as covariant types themselves.

Your experiment would fail to uphold basic object oriented principles.

Base references are covariant with derived, because you cannot do anything with a derived object through a base reference that you couldn't do with the derived object itself.

Containers of base cannot be covariant with containers of derived, because through a container of base (one that actually refers to a container of derived) you can do something that you couldn't do with the container of derived itself: add objects of other derived types.

Although, if the containers are immutable... it might work conceptually. Actually implementing it in C++ is another matter.
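To make the problem concrete, here is a small sketch with hypothetical classes Animal, Cat and Dog. It copies the pointers into a separate vector of base pointers instead of viewing the original vector as one, which is the only reason the push_back at the end of the usage is harmless:

```cpp
#include <cassert>
#include <vector>

// Hypothetical hierarchy for illustration only.
struct Animal {
    virtual ~Animal() = default;
};
struct Cat : Animal {};
struct Dog : Animal {};

// Builds an independent vector of base pointers. Each Cat* converts to
// Animal* implicitly, so no cast is needed.
inline std::vector<Animal*> to_base_pointers(const std::vector<Cat*>& cats) {
    return std::vector<Animal*>(cats.begin(), cats.end());
}
```

With the copy, adding a Dog* to the Animal* vector leaves the Cat* vector untouched. If the Animal* vector were merely a reinterpreted view of the Cat* vector, the same push_back would put a Dog into a container of cats.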

Does reassignment of pointers acquired by reinterpret_cast from raw memory cause UB?

First, for your second line

Node* dummy = reinterpret_cast<Node*>(new int8_t[sizeof(Node)]);

by itself.

new returns a pointer to the first int8_t object in the array of int8_t objects it created.

reinterpret_cast's behavior depends on the alignment of the address represented by the pointer. If it is suitably aligned for an object of type Node, then it will leave the pointer value unchanged (since there is definitely no Node object at the location which is pointer-interconvertible with the int8_t object). If it is not suitably aligned, the returned pointer value will be unspecified.

Unspecified means that we won't know what the value will be, but it won't cause undefined behavior.

Therefore, in any case, the second line and the cast by itself do not have undefined behavior.


The line

dummy->prev = ... /* last node */;

requires that the object dummy points to is actually a Node object. Otherwise it has undefined behavior. As mentioned above, reinterpret_cast gives us either an unspecified value or a pointer to the int8_t object. This is already an issue that, I think, at least requires a std::launder call.

Even if the pointer returned from new is correctly aligned, we still need to check whether a Node object is present. We certainly did not create any such object explicitly in any of the shown operations, but there is implicit object creation which may help out (at least since C++20, but I believe this was intended as a defect report against older standard versions).

Specifically, objects may be created implicitly inside an array of type unsigned char, std::byte and, with some limitations, char (CWG 2489) when the lifetime of the array is started. int8_t is usually signed char and I think is not allowed to be any of the three previously mentioned types (see e.g. this question). This removes the only possible way out of UB.

So your third code line does have undefined behavior.


Even if you remedy this by changing the type from int8_t to std::byte, there are other constraints on the details of Node to make the implicit object creation possible. It may also be necessary to add a std::launder call.

All of this doesn't consider the alignment yet, because although new[] obtains memory with some alignment requirements, I think the standard mandates new[] itself to return a pointer with stronger alignment than required for the element type only for char, unsigned char and std::byte array new.

Many of these issues can probably be avoided by using e.g. operator new directly, possibly with provided alignment request, and making sure that Node is an aggregate.

In any case, writing code like this is very risky because it is difficult to be sure that it isn't UB. It should be avoided whenever possible.
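One way to sidestep most of these questions is to create the Node explicitly in the raw storage with placement new, rather than relying on implicit object creation. This is a sketch under assumptions: the layout of Node (in particular the next member) and the helper names are hypothetical, since the question's definition is not shown:

```cpp
#include <cassert>
#include <new>

// Hypothetical node layout; only prev appears in the question's code.
struct Node {
    Node* prev;
    Node* next;
};

// Allocates raw storage and explicitly starts the lifetime of a Node in it.
// ::operator new returns storage suitably aligned for any object of that
// size with fundamental alignment, which covers Node.
inline Node* make_dummy_node() {
    void* storage = ::operator new(sizeof(Node));
    return new (storage) Node{nullptr, nullptr};
}

// Ends the lifetime and releases the storage obtained above.
inline void destroy_dummy_node(Node* node) {
    node->~Node();
    ::operator delete(node);
}
```

The pointer returned by the placement new expression points to a real Node object, so dummy->prev = ...; is well-defined and no reinterpret_cast or std::launder is involved.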

Is `reinterpret_cast` actually good for anything?

There are two situations in which I’ve used reinterpret_cast:

  1. To cast to and from char* for the purpose of serialisation or when talking to a legacy API. In this case the cast from char* to an object pointer is still strictly speaking UB (even though done very frequently). And you don’t actually need reinterpret_cast here — you could use memcpy instead, but the cast might under specific circumstances avoid a copy (but in situations where reinterpreting the bytes is valid in the first place, memcpy usually doesn’t generate any redundant copies either, the compiler is smart enough for that).

  2. To cast pointers from/to std::uintptr_t to serialise them across a legacy API or to perform some non-pointer arithmetic on them. This is definitely an odd beast and doesn’t happen frequently (even in low-level code) but consider the situation where one wants to exploit the fact that pointers on a given platform don’t use the most significant bits, and these bits can thus be used to store some bit flags. Garbage collector implementations occasionally do this. The lower bits of a pointer can sometimes also be used, if the programmer knows that the pointer will always be aligned e.g. at an 8 byte boundary (so the lowest three bits must be 0).
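A sketch of the second case, storing a small tag in the low bits of a pointer. It assumes the pointee is at least 8-byte aligned and that the platform's pointer-to-integer mapping keeps those low bits zero, which is implementation-defined rather than guaranteed by the standard. The names kTagMask, pack, tag_of and pointer_of are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

// The three lowest bits are free when the pointee is 8-byte aligned.
constexpr std::uintptr_t kTagMask = 0x7;

// Stores a tag in the low bits of the integer form of the pointer.
template <class T>
std::uintptr_t pack(T* pointer, unsigned tag) {
    const auto bits = reinterpret_cast<std::uintptr_t>(pointer);
    assert((bits & kTagMask) == 0 && "pointer must be 8-byte aligned");
    assert(tag <= kTagMask && "tag must fit in three bits");
    return bits | tag;
}

// Extracts the tag again.
inline unsigned tag_of(std::uintptr_t packed) {
    return static_cast<unsigned>(packed & kTagMask);
}

// Clears the tag bits and converts back to the original pointer type.
template <class T>
T* pointer_of(std::uintptr_t packed) {
    return reinterpret_cast<T*>(packed & ~kTagMask);
}
```

Converting back to the same pointer type after restoring the original integer value is what makes the round trip reliable in practice.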

But to be honest I can’t remember the last concrete, legitimate situation where I’ve actually used reinterpret_cast. It’s definitely many years ago.

Dealing with undefined behavior when using reinterpret_cast in a memory mapping

std::launder solves the problem with strict aliasing, but not with object lifetime.

std::bit_cast makes a copy (it's basically a wrapper for std::memcpy) and doesn't work with copying from a range of bytes.

There is no tool in standard C++ to reinterpret mapped memory without copying. Such a tool has been proposed: std::bless. Until/unless such changes are adopted into the standard, you'll have to either hope that UB doesn't break anything, take the potential†† performance hit and copy, or write the program in C.

While not ideal, this is not necessarily as bad as it sounds. You're already restricting portability by using mmap, and if your target system / compiler promises that it is OK to reinterpret mmapped memory (perhaps with laundering), then there should be no problem. That said, I don't know whether, say, GCC on Linux gives such a guarantee.

†† The compiler may optimise std::memcpy away. There might not be any performance hit involved. There's a handy function in this SO answer which was observed to be optimised away, but does initiate object lifetime following the language rules. It does have a limitation: the mapped memory must be writable (as it creates objects in the memory, and in a non-optimised build it might do an actual copy).
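For the copying route, a minimal sketch might look like the following. This is not the function from the linked answer. Record and load_from_bytes are hypothetical names, and the byte pointer stands in for an address inside the mapping:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical record type stored in the mapped file.
struct Record {
    std::uint32_t id;
    float value;
};

// Copies sizeof(T) bytes out of the mapped region into a real T object.
// The source only needs to be readable and may be misaligned for T.
template <class T>
T load_from_bytes(const unsigned char* bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    T out;
    std::memcpy(&out, bytes, sizeof(T));
    return out;
}
```

The returned object is an ordinary T with its own lifetime, so neither aliasing nor object lifetime rules are in question. The cost is the copy, which the compiler may or may not remove.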

Does reinterpret_cast cause a strict aliasing violation?

Does reinterpret_cast cause a strict aliasing violation?

Not by itself, no. But its misuse can lead to such a violation.

It is not a good idea to use reinterpret cast unless you know that you need it (rare), know that there is no satisfactory alternative (rarer), and know that it won't lead to undefined behaviour.

why does it even exist?

As the name implies, to allow reinterpretation of types. The use cases are rare in C++ and not for beginner or intermediate programmers.

Some cases where an advanced programmer might find it useful:

  • Serialisation
  • C interfaces that use reinterpretation as a form of polymorphism.

In what situations does it not cause a strict aliasing violation, and in which ones does it cause one?

The cast itself never causes any violations.

Strict aliasing violations can only occur when you have cast to a pointer (or reference) of another type, and then indirect through that pointer and access the object. So, if you don't reinterpret cast a pointer (or reference), or you don't access the pointed object, then you aren't aliasing the type of the object, and therefore cannot violate the strict aliasing rules.

So, what is interesting is whether accessing the object with another (aliased) type is well defined or not. Here is a list from cppreference:

  • AliasedType and DynamicType are similar.
  • AliasedType is the (possibly cv-qualified) signed or unsigned variant of DynamicType.
  • AliasedType is std::byte (since C++17), char, or unsigned char: this permits examination of the object representation of any object as an array of bytes.

Missing from this list is:

  • The pointed object of DynamicType is pointer-interconvertible with another object of AliasedType.
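The byte-access case from the list is the one that comes up most often, so here is a small sketch of it. The helper name count_nonzero_bytes is made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Examines the object representation of `object` through unsigned char,
// which the aliasing rules allow for any object type.
template <class T>
std::size_t count_nonzero_bytes(const T& object) {
    const unsigned char* bytes =
        reinterpret_cast<const unsigned char*>(&object);
    std::size_t count = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        if (bytes[i] != 0) {
            ++count;
        }
    }
    return count;
}
```

Going the other way, from a buffer of unsigned char to some unrelated T*, is not covered by this exception.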


if you cast between two incompatible types, (...), is it safe to use it

Depends on what you mean by "use". If you mean indirect through the reinterpreted pointer and access the object through the "incompatible" type, then no, that is not safe in general.

and the result of that cast is the only pointer that points to the memory it uses in the whole program

This is irrelevant. It is in most cases practically impossible for a compiler to prove that this is true.

Is it safe then to cast it back to its original type and also use it assuming the variable where it is stored

Assuming the cast to the other type was well formed in the first place, then converting back to the original type is always safe.
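A sketch of such a round trip, assuming the intermediate type has alignment requirements no stricter than the original. Unrelated and round_trip are hypothetical names:

```cpp
#include <cassert>

// Hypothetical unrelated type whose alignment is no stricter than int's.
struct Unrelated {
    char c;
};

// Casts to an unrelated pointer type and back. The intermediate pointer
// is never dereferenced, so no object is accessed through the wrong type.
inline int* round_trip(int* p) {
    Unrelated* other = reinterpret_cast<Unrelated*>(p);
    return reinterpret_cast<int*>(other);
}
```

The pointer that comes back compares equal to the original and can be used to access the int as usual. Only dereferencing the intermediate Unrelated* would be a problem.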


