C++ unions vs. reinterpret_cast

It is undefined because there is no guarantee about the value representations of int and float. The C++ standard doesn't require a float to be stored as an IEEE 754 single-precision floating point number. What exactly should the standard say about treating an int object with value 0xffff as a float? It says nothing beyond declaring the behavior undefined.

Practically, however, this is the purpose of reinterpret_cast - to tell the compiler to ignore everything it knows about the types of objects and trust you that this int is actually a float. It's almost always used for machine-specific bit-level jiggery-pokery. The C++ standard just doesn't guarantee you anything once you do it. At that point, it's up to you to understand exactly what your compiler and machine do in this situation.

This is true for both the union and reinterpret_cast approaches. I suggest that reinterpret_cast is "better" for this task, since it makes the intent clearer. However, keeping your code well-defined is always the best approach.
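One way to keep the code well-defined, sketched here under stated assumptions, is to copy the bytes with memcpy instead of aliasing the object. The helper names float_bits and bits_to_float are hypothetical, and the sketch assumes float is 32 bits wide. The bit pattern you get back still depends on your platform's float format, which the standard doesn't pin down.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Return the object representation of a float as a 32-bit integer.
// memcpy copies bytes, so no object is accessed through the wrong type.
inline std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Inverse direction: build a float from a 32-bit pattern.
inline float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Mainstream compilers typically turn a fixed-size memcpy like this into a plain register move, so the well-defined version usually costs nothing extra.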

What is the difference between a properly defined union and a reinterpret_cast?

Contrary to what the other answers state, from a practical point of view there is a huge difference, although there might not be such a difference in the standard.

From the standard's point of view, reinterpret_cast is only guaranteed to work for round-trip conversions, and only if the alignment requirements of the intermediate pointer type are not stronger than those of the source type. You are not allowed (*) to write through one pointer type and read through another.

At the same time, the standard imposes a similar restriction on unions: it is undefined behavior to read a union member other than the active one (the member that was last written to) (+).

Yet compilers often provide additional guarantees for the union case, and all compilers I know of (VS, g++, clang++, xlC_r, intel, Solaris CC) guarantee that you can read from a union through an inactive member and that it will produce a value with exactly the same bits set as those that were written through the active member.

This is particularly important at high optimization levels when reading from the network:

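#include <stdint.h>   // int64_t
#include <string.h>   // memcpy

// ntohll is not standard C++; it is assumed to be provided by the platform.

double ntohdouble(const char *buffer) {          // [1] union
    union {
        int64_t i;
        double f;
    } data;
    memcpy(&data.i, buffer, sizeof(int64_t));
    data.i = ntohll(data.i);
    return data.f;                               // reads the inactive member
}

double ntohdouble(const char *buffer) {          // [2] reinterpret_cast
    int64_t data;
    double dbl;
    memcpy(&data, buffer, sizeof(int64_t));
    data = ntohll(data);
    dbl = *reinterpret_cast<double*>(&data);     // reads an int64_t as a double
    return dbl;
}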

The implementation in [1] is accepted by all compilers I know (gcc, clang, VS, sun, ibm, hp), while the implementation in [2] is not and will fail horribly in some of them when aggressive optimizations are used. In particular, I have seen gcc reorder the instructions and read into the dbl variable before evaluating ntohll, thus producing the wrong results.
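For comparison, here is a minimal sketch of a third variant that relies on neither the union guarantee nor the cast. The name ntohdouble_portable is hypothetical. It assembles the integer from the bytes itself rather than calling ntohll, and it assumes a 64-bit double whose byte order matches the host's integer byte order.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit double");

// Decode an 8-byte big-endian (network order) buffer into a double.
inline double ntohdouble_portable(const char* buffer) {
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i) {
        // Most significant byte first; go through unsigned char so that
        // a negative char value is not sign-extended into the result.
        bits = (bits << 8) | static_cast<unsigned char>(buffer[i]);
    }
    double value;
    std::memcpy(&value, &bits, sizeof value);  // byte copy, no aliasing
    return value;
}
```

Because the shifts work on values rather than on memory layout, the same code runs unchanged on little-endian and big-endian hosts.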


(*) With the exception that you are always allowed to read through a char* or unsigned char*, regardless of what the real object (original pointer type) was.

(+) Again with some exceptions: if the active member shares a common prefix (a common initial sequence) with another member, you can read that prefix through the other, compatible member.
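The common-prefix exception in (+) can be sketched as follows. The type names IntMsg, FloatMsg, and Msg and the helper tag_of are hypothetical. Both structs are standard-layout and start with the same int member, so that shared prefix is readable through either union member.

```cpp
// Two standard-layout structs sharing a common initial sequence (int tag).
struct IntMsg   { int tag; int   value; };
struct FloatMsg { int tag; float value; };

union Msg {
    IntMsg   i;
    FloatMsg f;
};

// Read the shared prefix through f, whichever member is currently active.
inline int tag_of(const Msg& m) {
    return m.f.tag;
}
```

Only tag is covered by the exception; reading m.f.value while i is the active member would still be undefined.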

Is `reinterpret_cast` on `this` inside a union-like class undefined behavior?

This is undefined behavior. In short, the union contains either a uint32_t or a B.

  • If it's a B, then the cast is illegal (since it's not a uint32_t, you mustn't cast to it).
  • If it is a uint32_t, then calling the .c() member is illegal, since you can't access the b1 member (it isn't the active union member).

In this case (thanks to @StoryTeller's comment) the active union member is a (the uint32_t) since it's the only one with default initialization, thus calling a.b1.c() is UB.
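The question's code isn't reproduced here, so the following is only a guess at its shape, reusing the names a, b1, B, and c() from the answer. The member bits and the helper call_c_safely are hypothetical. It shows one well-defined alternative: make b1 the active member before calling anything on it.

```cpp
#include <cstdint>
#include <new>

struct B {
    std::uint32_t bits;
    std::uint32_t c() const { return bits; }
};

union U {
    std::uint32_t a = 0;  // default member initializer: a starts out active
    B b1;
};

// Switch the active member to b1, then call c() on it.
inline std::uint32_t call_c_safely(U& u, std::uint32_t v) {
    new (&u.b1) B{v};   // starts the lifetime of b1; a is no longer active
    return u.b1.c();    // fine: b1 is now the active member
}
```

After the placement new, reading u.a would in turn be the undefined access.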

Is using a union in place of a cast well defined?

Your code is not portable. It might work on some compilers or it might not.

You are right that the behaviour is undefined when you try to access the inactive member of the union (as is the case in the code given).

§9.5/1

In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time.

So foo.c[0] == 1 is incorrect because c is not active at that moment. Feel free to correct me if you think I am wrong.

Union vs. static_cast(void*)

Consider boost::any or boost::variant if you want to store objects of heterogeneous types.

And before deciding which one to use, have a look at the comparison:

  • Boost.Variant vs. Boost.Any

Hopefully, it will help you make the right decision. Choose one, then use any container from the standard library to store the objects: std::vector<boost::any>, std::vector<boost::variant>, or any other.
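As a rough illustration of the container idea, here is a sketch using std::variant, the C++17 standard counterpart of boost::variant, swapped in so the example needs no Boost. The alias Value and the helper count_strings are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// A value that holds exactly one of these types at a time,
// and always knows which one.
using Value = std::variant<int, double, std::string>;

// Count how many elements currently hold a std::string.
inline std::size_t count_strings(const std::vector<Value>& values) {
    std::size_t n = 0;
    for (const Value& v : values) {
        if (std::holds_alternative<std::string>(v)) ++n;
    }
    return n;
}
```

Unlike a raw union or a void*, the variant tracks its active alternative, and std::get throws on a mismatched type instead of invoking undefined behavior.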

How unions are used to avoid type conversion

It's called type punning and is explicitly allowed using unions in C but not in C++.

It works (in C) because all members of a union occupy the same memory, so there's really no conversion. You just interpret the bits of the memory differently.


