Safety of Casting Between Pointers of Two Identical Classes

C++11 added a concept called layout-compatible which applies here.

Two standard-layout struct (Clause 9) types are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in declaration order) have layout-compatible types (3.9).

where

A standard-layout class is a class that:

  • has no non-static data members of type non-standard-layout class (or array of such types) or reference,
  • has no virtual functions (10.3) and no virtual base classes (10.1),
  • has the same access control (Clause 11) for all non-static data members,
  • has no non-standard-layout base classes,
  • either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
  • has no base classes of the same type as the first non-static data member.

A standard-layout struct is a standard-layout class defined with the class-key struct or the class-key class.

A standard-layout union is a standard-layout class defined with the class-key union.

Finally

Pointers to cv-qualified and cv-unqualified versions (3.9.3) of layout-compatible
types shall have the same value representation and alignment requirements (3.11).

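This guarantees that reinterpret_cast can turn a pointer to one type into a pointer to any layout-compatible type without changing its value representation. Whether the object may then be accessed through the converted pointer is governed separately by the aliasing rules in [basic.lval].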
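As a rough illustration of the quoted rules, the standard-layout property can be checked at compile time. This is a minimal sketch assuming only C++11; the types First, Second and Mixed and the helper both_standard_layout are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical types: same member types in the same declaration order.
struct First  { int id;  double value; };
struct Second { int key; double amount; };

static_assert(std::is_standard_layout<First>::value,  "First must be standard-layout");
static_assert(std::is_standard_layout<Second>::value, "Second must be standard-layout");

// Counter-example: mixing access control for data members breaks standard layout.
class Mixed {
public:
    int a = 0;
    int b_value() const { return b; }
private:
    int b = 0;
};
static_assert(!std::is_standard_layout<Mixed>::value, "Mixed is not standard-layout");

// Returns true when both hypothetical types pass the standard-layout check.
constexpr bool both_standard_layout() {
    return std::is_standard_layout<First>::value &&
           std::is_standard_layout<Second>::value;
}
```

The trait only checks the standard-layout precondition. C++20 adds std::is_layout_compatible, which tests layout compatibility of two types directly.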

Can you reinterpret_cast between types which have the same representation?

[expr.reinterpret.cast]/11:

A glvalue expression of type T1 can be cast to the type “reference
to T2” if an expression of type “pointer to T1” can be explicitly
converted to the type “pointer to T2” using a reinterpret_cast.
The result refers to the same object as the source glvalue, but with
the specified type. [...]

Mary and Ashley are object types, so pointers to them can be converted to each other. Now, we use an lvalue of type Ashley to access the underlying Mary object.

[basic.lval]/8:

If a program attempts to access the stored value of an object through
a glvalue of other than one of the following types the behavior is
undefined:

  • the dynamic type of the object,

  • a cv-qualified version of the dynamic type of the object,

  • a type similar to the dynamic type of the object,

  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,

  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,

  • an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including,
    recursively, an element or non-static data member of a subaggregate or
    contained union),

  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,

  • a char, unsigned char, or std::byte type.

None of these covers the case in question. ("Similar" talks about cv-qualification.) Therefore, undefined behavior.
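One commonly used way to sidestep the aliasing problem is to copy the object representation instead of accessing it through a differently typed lvalue. The sketch below assumes member lists for Mary and Ashley (the question's actual members are not shown here), and to_ashley is a hypothetical helper.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Assumed member lists; only the type names come from the question.
struct Mary   { int m1; float m2; };
struct Ashley { int a1; float a2; };

static_assert(std::is_trivially_copyable<Mary>::value,   "byte copy needs a trivially copyable source");
static_assert(std::is_trivially_copyable<Ashley>::value, "byte copy needs a trivially copyable target");
static_assert(sizeof(Mary) == sizeof(Ashley), "sizes must match for the byte copy");

// Builds a real Ashley object from the bytes of a Mary, so no Mary is
// ever accessed through an Ashley lvalue.
Ashley to_ashley(const Mary& m) {
    Ashley a;
    std::memcpy(&a, &m, sizeof a);
    return a;
}
```

The copy costs a few bytes of traffic, which compilers typically optimize away, and in exchange the access rules of [basic.lval] are never involved.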

About conversion between pointers

Some time ago I saw the code that controlled the first Apollo mission. It was all assembly; C, let alone C++, did not exist in those days. We can imagine that in that assembly, all they had at their disposal was registers and raw memory that had to be managed manually.

In modern, non-assembly languages, including C/C++, the compiler frees you from the burden of manual memory management. The memory for named variables, like in

int x = 0;

is allocated and deallocated automatically. Also, you don't need to bother about memory that is needed temporarily, e.g., during some complex computations. In C/C++ these "usual" variables are accompanied by pointers that allow you to access this automatically or semi-automatically managed memory indirectly. Moreover, you need pointers to access heap-allocated ("nameless") memory (remember operator new?). And this is almost all you should know about them. The syntax is known: the asterisk *, the "arrow" -> and the ampersand &. It's the compiler's duty to protect you from doing probably stupid operations on pointers, like using them as if they were just integers (a popular trick in the 1970s) or using them as if they pointed to something other than what you've declared. Declaring a pointer means declaring the whole set of operations on it. To protect you from accidentally shooting yourself in the foot, the compiler forbids horrible things such as implicit assignment between pointers of different types.

However, you are the master of the compiler, not the other way round, so you can take full control. This is where pointer conversion steps in. On the one hand, by using pointer conversion, you can temporarily switch off all the safeties the creators of the compiler have prepared for you. On the other hand, with a suitable cast, you can guarantee that your program is correct!

The really important question is: which operations involving pointer conversions are guaranteed to be safe?

The first, a bit surprising thing is: there are several types of pointers, which need not be mutually compatible:

  • pointer to data
  • pointer to (free) function
  • pointer to data member
  • pointer to member function
  • void* pointer

Their internal representation, or even the number of bits they occupy, need not be the same!

So, what is allowed?

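The C language loves to use void* as the means of passing data of arbitrary type. So you can be sure that you can convert any data pointer to void* and then back to its original type and this will work. Similarly, a pointer to function can be converted to another pointer-to-function type and back; converting a function pointer to void* is only conditionally supported.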

Another place where pointer conversions are natural is low-level serialization. Many functions in both C and C++ work on buffers declared as char* or const char*. From this you can infer that you should be able to convert any pointer to data to char* and it will work as expected.
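The byte-buffer case might look like the following sketch. Sample, to_bytes and from_bytes are hypothetical names, and the record is assumed to be trivially copyable so that copying its bytes is meaningful.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical trivially copyable record.
struct Sample { int id; float weight; };

// Reads the object representation through unsigned char*, which the
// aliasing rules explicitly allow.
std::vector<unsigned char> to_bytes(const Sample& s) {
    const unsigned char* first = reinterpret_cast<const unsigned char*>(&s);
    return std::vector<unsigned char>(first, first + sizeof s);
}

// Rebuilds a Sample by copying the bytes back into a real object.
Sample from_bytes(const std::vector<unsigned char>& bytes) {
    assert(bytes.size() == sizeof(Sample));
    Sample s;
    std::memcpy(&s, bytes.data(), sizeof s);
    return s;
}
```

Note the asymmetry: viewing an object as bytes is fine, but the way back goes through memcpy into a real object rather than casting the buffer pointer to Sample*.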

The third common pointer conversion is when you deal with inheritance, especially in the context of classes with virtual functions. Here comes dynamic_cast to cast a pointer in the potentially "dangerous" way along the inheritance tree.

There are also other casts that are safe and others that may lead to undefined behavior or even a segmentation fault. For example, pointers to int are often cast to or from pointers to float and it usually works, e.g., in AVX or GPU programming. The correctness of these particular casts is guaranteed by hardware vendors rather than the C++ standard.
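When only the bit pattern of a float is needed as an integer, a portable alternative to casting the pointer is to copy the bytes. This is a sketch assuming a 32-bit float; float_bits and bits_to_float are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Returns the bit pattern of f without accessing a float through an integer lvalue.
std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Inverse operation: rebuilds the float from its bit pattern.
float bits_to_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

In C++20 the same effect is available as std::bit_cast, which additionally works in constant expressions.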

What dangers are there if you leave the land of safe casts? For example, you cast a pointer to function to void* and then this pointer to char*. If you try to dereference this pointer, let alone write through it, the behavior is undefined and your program will most likely crash. Conversions between pointers to data and pointers to members are similarly dangerous. Don't do that. Also, don't use casts to convert pointers to integers etc. This is generally the land of undefined behavior, unless you have carefully consulted the standard and are sure the thing you're doing is safe.

To sum this long story up: pointer conversion usually generates no code, but sometimes it does. See the examples below.

EXAMPLES

1. Inheritance. Static cast vs. dynamic cast.

Suppose we have XY derived from both X and Y:

struct X
{
    float x;
};

struct Y
{
    float y;
};

struct XY : public X, public Y
{
    float xy;
};

Then this code:

XY a;
std::cout << static_cast<X*>(&a) << "\n";
std::cout << static_cast<Y*>(&a) << "\n";
std::cout << reinterpret_cast<X*>(&a) << "\n";
std::cout << reinterpret_cast<Y*>(&a) << "\n";
std::cout << &a << "\n";

may produce these results:

0x7fffc7d75dcc
0x7fffc7d75dd0
0x7fffc7d75dcc
0x7fffc7d75dcc
0x7fffc7d75dcc

So, static_cast does change the "value" of the pointer! At the same time, reinterpret_cast only "reinterprets" the bits stored in a pointer variable / register, but does not modify them.

Interestingly, if we defined a virtual method in Y, the same code would reveal a reversed order of X and Y inside XY:

0x7ffc6a45764c
0x7ffc6a457640
0x7ffc6a457640
0x7ffc6a457640
0x7ffc6a457640

So, the order in which base class subobjects are laid out under multiple inheritance is unspecified by the standard and depends on the implementation.
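Whatever layout the implementation picks, some properties of static_cast hold regardless. The sketch below reuses X, Y and XY from this example; round_trip_ok and points_at_y_member are hypothetical helpers that check those properties without printing addresses.

```cpp
#include <cassert>

struct X { float x; };
struct Y { float y; };
struct XY : public X, public Y { float xy; };

// Converting to the Y base may adjust the address; converting back undoes it.
bool round_trip_ok(XY* p) {
    Y* as_y = static_cast<Y*>(p);
    return static_cast<XY*>(as_y) == p;
}

// The Y base pointer designates the Y subobject, whose first member is y.
bool points_at_y_member(XY* p) {
    return static_cast<void*>(static_cast<Y*>(p)) == static_cast<void*>(&p->y);
}
```

A reinterpret_cast to Y* offers no such guarantee, because it does not know where the Y subobject lives inside XY.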

2. Casts to integer types
The standard allows casting pointers to and from integer types of sufficient size.

Casts of "C-style pointers" to integer types work as expected:

XY a;
std::cout << std::hex << reinterpret_cast<uint64_t>(&a) << "\n";
std::cout << reinterpret_cast<void*>(&a) << "\n";

yields, for example,

7fff84f5a6cc
0x7fff84f5a6cc

as expected. But the standard does not guarantee that the two printed values match: the mapping between pointers and integers is implementation-defined.
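What the standard does guarantee is the round trip: a pointer converted to a sufficiently large integer and back compares equal to the original. A minimal sketch, assuming the platform provides the optional type std::uintptr_t; to_integer and from_integer are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// std::uintptr_t is optional in the standard; this sketch assumes it exists.
std::uintptr_t to_integer(int* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Converting the unmodified integer back yields the original pointer value.
int* from_integer(std::uintptr_t v) {
    return reinterpret_cast<int*>(v);
}
```

Using std::uintptr_t rather than a fixed-width type such as uint64_t avoids assuming a particular pointer size.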

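3. Pointers to data members
This case is more interesting: pointers to members cannot be cast to integers or to ordinary pointers. But there's a workaround, which inspects the bytes of the pointer-to-member object itself (strictly speaking, reading them through uint64_t* violates the aliasing rules quoted earlier, so treat this as a diagnostic trick only):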

XY a;
float XY::* mptr_xy = &XY::xy;
float XY::* mptr_x = &XY::x;
float XY::* mptr_y = &XY::y;
std::cout << *reinterpret_cast<uint64_t*>(&mptr_x) << "\n";
std::cout << *reinterpret_cast<uint64_t*>(&mptr_y) << "\n";
std::cout << *reinterpret_cast<uint64_t*>(&mptr_xy) << "\n";

This may give the following result:

0
4
8

This shows that pointers to members are offsets into an object's memory rather than pointers to RAM. If we add a virtual destructor to Y we may get this:

12
8
16

The offsets have changed, making room for vptr and moving Y to the front. If both X and Y have virtual destructors, a possible result is this:

8
24
28

Conclusion: pointers to members are completely different beasts from ordinary pointers.
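For completeness, here is a sketch of how pointers to data members are meant to be used, together with a way to look at their bytes that stays within the aliasing rules. The helpers get_member, set_member and raw_bytes, and the alias MemberPtr, are hypothetical names; the structs are the ones from example 1.

```cpp
#include <array>
#include <cassert>
#include <cstring>

struct X { float x; };
struct Y { float y; };
struct XY : public X, public Y { float xy; };

using MemberPtr = float XY::*;

// Reads whichever float member the pointer-to-member selects.
float get_member(const XY& obj, MemberPtr member) { return obj.*member; }

// Writes through a pointer-to-member.
void set_member(XY& obj, MemberPtr member, float v) { obj.*member = v; }

// Copies the representation of the pointer-to-member into a byte array.
// What the bytes mean is up to the implementation.
std::array<unsigned char, sizeof(MemberPtr)> raw_bytes(MemberPtr member) {
    std::array<unsigned char, sizeof(MemberPtr)> out{};
    std::memcpy(out.data(), &member, sizeof member);
    return out;
}
```

The byte array makes no assumption about the size of a pointer to member, which need not be 8 bytes.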

4. Dynamic casts

Suppose both X and Y have virtual destructors. This code:

XY xy;
Y y;
std::cout << "xy:\n";
std::cout << &xy << "\n";
std::cout << dynamic_cast<Y*>(&xy) << "\n";
std::cout << dynamic_cast<Y*>(dynamic_cast<X*>(&xy)) << "\n";
std::cout << dynamic_cast<X*>(&xy) << "\n";
std::cout << "y:\n";
std::cout << &y << "\n";
std::cout << dynamic_cast<XY*>(&y) << "\n";
std::cout << dynamic_cast<Y*>(dynamic_cast<XY*>(&y)) << "\n";
std::cout << dynamic_cast<Y*>(reinterpret_cast<XY*>(&y)) << "\n";
std::cout << dynamic_cast<Y*>(static_cast<XY*>(&y)) << "\n";

may produce this output:

xy:
0x7ffc0a570590
0x7ffc0a5705a0
0x7ffc0a5705a0
0x7ffc0a570590
y:
0x7ffc0a570580
0
0
0x7ffc0a570590
0x7ffc0a570580

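  • Casts of the address of xy may give 2 different results.
  • Casts of the address of y may give 3 different results. The last two come from invalid downcasts (reinterpret_cast and static_cast applied to a Y that is not part of an XY), so their behavior is undefined.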
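The well-defined part of this experiment can be condensed into two small helpers. This is a sketch with hypothetical function names as_xy and sibling_y; the classes mirror X, Y and XY with virtual destructors added, as assumed in this example.

```cpp
#include <cassert>

struct X { virtual ~X() = default; float x = 0; };
struct Y { virtual ~Y() = default; float y = 0; };
struct XY : public X, public Y { float xy = 0; };

// Checked downcast: yields nullptr when *p is not part of an XY.
XY* as_xy(Y* p) { return dynamic_cast<XY*>(p); }

// Cross-cast from the X base to the sibling Y base of the same complete object.
Y* sibling_y(X* p) { return dynamic_cast<Y*>(p); }
```

Both helpers rely on run-time type information, so a failed cast is reported as a null pointer instead of a bogus address.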

Summary:

Pointer casts in C++ are far more complex than in C. The standard guarantees only behavior, not the implementation. So, for example, a cast to an integer type might involve some bit manipulation; existing implementations don't do this because there's no point, not because it's forbidden. The main differences arise from C++ having multiple inheritance, virtual functions and pointers to members.

Caveat. This long reply did not cover all aspects of pointer casts, most notably data alignment. Also, the standard does not impose any limitation on the implementation of various pointers, and even the reinterpret_cast is not guaranteed to preserve the internal bit representation:

Unlike static_cast, but like const_cast, the reinterpret_cast expression does not compile to any CPU instructions (except when converting between integers and pointers or on obscure architectures where pointer representation depends on its type).

https://en.cppreference.com/w/cpp/language/reinterpret_cast

See also:

  • https://en.cppreference.com/w/cpp/language/explicit_cast
  • https://en.cppreference.com/w/cpp/language/const_cast
  • https://en.cppreference.com/w/cpp/language/dynamic_cast
  • https://en.cppreference.com/w/cpp/language/static_cast
  • https://en.cppreference.com/w/cpp/language/implicit_conversion

Is there a safe way to cast void* to class pointer in C++

Yes, static_cast is correct here.

Assuming that the void* value is only copied inside the C library, static_cast will return the original pointer value pointing to the passed object if you cast it to a pointer of the same type as it was originally.

Under some conditions you may also cast to a different type; the rules are those of reinterpret_cast (which, for object pointers, is equivalent to two static_casts through an intermediate void*).

If you cast it to any type not allowed under those rules, e.g. an unrelated class type, trying to access the object through the pointer will cause undefined behavior. There is no way of detecting this mistake. You need to take care of doing it correctly yourself.
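A typical shape of this pattern is a C-style callback that carries a void* user pointer. In the sketch below, library_invoke stands in for the C library and is purely hypothetical, as are callback_t, Counter and on_event.

```cpp
#include <cassert>

using callback_t = void (*)(void* user_data);

// Hypothetical stand-in for a C library function: it only copies the
// pointer and hands it back to the callback unchanged.
void library_invoke(callback_t cb, void* user_data) { cb(user_data); }

class Counter {
public:
    void bump() { ++count_; }
    int count() const { return count_; }
private:
    int count_ = 0;
};

// The callback restores the original type. This is valid only because
// the caller really passed a Counter*.
void on_event(void* user_data) {
    Counter* self = static_cast<Counter*>(user_data);
    self->bump();
}
```

Nothing in the type system ties on_event to Counter, so keeping the registration and the callback next to each other is the usual way to keep the two in sync.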

Is this use of reinterpret_cast on differently-qualified struct members safe?

My question is: is this use of reinterpret_cast well-defined, or does
it invoke undefined behavior?

reinterpret_cast is the wrong approach here, you're simply violating strict aliasing. It is somewhat perplexing that reinterpret_cast and union diverge here, but the wording is very clear about this scenario.

You might be better off simply defining a union thusly:

union elem_t {
    Element e{};
    KeyValuePair p;
    /* special member functions defined if necessary */
};

… and using that as your vector element type. Note that cv-qualification is ignored when determining layout-compatibility - [basic.types]/11:

Two types cv1 T1 and cv2 T2 are layout-compatible types if
T1 and T2 are the same type, […]

Hence Element and KeyValuePair do indeed share a common initial sequence, and accessing the corresponding members of p, provided e is alive, is well-defined.
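A compilable sketch of the union approach follows. The question's definitions of Element and KeyValuePair are not reproduced in this answer, so the member lists, the Key and Value aliases, the constructor and the two accessor functions are all assumptions made for illustration.

```cpp
#include <cassert>
#include <type_traits>

using Key = int;    // assumed stand-ins for the question's types
using Value = int;

// Assumed definitions: same leading members, differing only in const.
struct Element      { Key key; Value value; int flags; };
struct KeyValuePair { const Key key; Value value; };

static_assert(std::is_standard_layout<Element>::value,      "needed for the common initial sequence rule");
static_assert(std::is_standard_layout<KeyValuePair>::value, "needed for the common initial sequence rule");

union elem_t {
    Element e;
    KeyValuePair p;
    // A user-provided constructor makes e the active member.
    elem_t(Key k, Value v, int f) : e{k, v, f} {}
};

// Reads members of the common initial sequence through p while e is active.
Key key_via_pair(const elem_t& u) { return u.p.key; }
Value value_via_pair(const elem_t& u) { return u.p.value; }
```

The sketch only reads through p; the common initial sequence rule speaks of inspecting members of the inactive struct, so writes would need separate justification.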


Another approach: Define

struct KeyValuePair {
    Key key;
    mutable Value value;
};

struct Element : KeyValuePair {
    int flags;
};

Now provide an iterator that simply wraps a const_iterator from the vector and upcasts the references/pointers to be exposed. key won't be modifiable, but value will be.
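Such a wrapper might be sketched as follows. PairIterator is a hypothetical name, Key and Value are assumed to be int, and only the operations needed for a simple loop are provided.

```cpp
#include <cassert>
#include <vector>

using Key = int;    // assumed stand-ins for the question's types
using Value = int;

struct KeyValuePair {
    Key key;
    mutable Value value;
};

struct Element : KeyValuePair {
    int flags;
};

// Wraps a const_iterator and exposes each Element as a const KeyValuePair.
class PairIterator {
public:
    explicit PairIterator(std::vector<Element>::const_iterator it) : it_(it) {}
    const KeyValuePair& operator*() const { return *it_; }    // implicit upcast
    const KeyValuePair* operator->() const { return &*it_; }  // implicit upcast
    PairIterator& operator++() { ++it_; return *this; }
    bool operator!=(const PairIterator& other) const { return it_ != other.it_; }
private:
    std::vector<Element>::const_iterator it_;
};
```

Because the exposed reference is const, key is read-only, while value stays writable through the mutable specifier. No cast appears anywhere; the derived-to-base conversion is implicit.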

Valid use of reinterpret_cast?

In C++11, this is fully allowed if the two types are layout-compatible, which is true for structs that are identical and have standard layout. See this answer for more details.

You could also stick the two structs in the same union in previous versions of C++, which had some guarantees about being able to access identical data members (a "common initial sequence" of data members) in the same order for different structure types.

In this case, yes, the C-style cast is equivalent, but reinterpret_cast is probably more idiomatic.

Safety of invalid downcast using static_cast (or reinterpret_cast) for inheritance without added members

It's undefined behavior, regardless of whether there are virtual functions or not. The standard says clearly,

If the prvalue of type "pointer to cv1 B" points to a B that is
actually a subobject of an object of type D, the resulting pointer
points to the enclosing object of type D. Otherwise, the result of the
cast is undefined.


