C++ unions vs. reinterpret_cast
It's undefined because there is no guarantee about what exactly the value representations of int and float are. The C++ standard doesn't say that a float is stored as an IEEE 754 single-precision floating point number. What exactly should the standard say about you treating an int object with value 0xffff as a float? It says nothing beyond the fact that the behavior is undefined.
Practically, however, this is the purpose of reinterpret_cast: to tell the compiler to ignore everything it knows about the types of objects and trust you that this int is actually a float. It's almost always used for machine-specific bit-level jiggery-pokery. The C++ standard just doesn't guarantee you anything once you do it. At that point, it's up to you to understand exactly what your compiler and machine do in this situation.
This is true for both the union and reinterpret_cast approaches. I suggest that reinterpret_cast is "better" for this task, since it makes the intent clearer. However, keeping your code well-defined is always the best approach.
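One well-defined way to reinterpret bits, offered here as a minimal sketch rather than as part of the original answer, is to copy the object representation with std::memcpy. The helper names float_bits and bits_to_float are hypothetical, and the code assumes float is 32 bits wide (checked at compile time). What the resulting bits mean still depends on the platform's floating-point format.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy the object representation of a float into an integer.
// Assumption: float is exactly 32 bits wide on this platform.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits wide");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // byte-wise copy, no aliasing violation
    return bits;
}

// Hypothetical helper: the inverse direction, integer bits back to a float.
float bits_to_float(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits wide");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Compilers typically optimize such a fixed-size memcpy into a plain register move. In C++20, std::bit_cast from the header bit expresses the same idea directly.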
What is the difference between a proper defined union and a reinterpret_cast?
Contrary to what the other answers state, from a practical point of view there is a huge difference, although there might not be such a difference in the standard.
From the standard point of view, reinterpret_cast
is only guaranteed to work for roundtrip conversions and only if the alignment requirements of the intermediate pointer type are not stronger than those of the source type. You are not allowed (*) to read through one pointer and read from another pointer type.
At the same time, the standard imposes a similar restriction on unions: it is undefined behavior to read from a union member other than the active one (the member that was last written to) (+).
Yet compilers often provide additional guarantees for the union case, and all compilers I know of (VS, g++, clang++, xlC_r, intel, Solaris CC) guarantee that you can read from a union through an inactive member and that it will produce a value with exactly the same bits set as those that were written through the active member.
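This is particularly important at high optimization levels when reading from the network:

    #include <cstdint>
    #include <cstring>
    // ntohll is platform-specific; it is assumed to be declared elsewhere.

    double ntohdouble(const char *buffer) { // [1]
        union {
            int64_t i;
            double f;
        } data;
        memcpy(&data.i, buffer, sizeof(int64_t));
        data.i = ntohll(data.i);   // convert from network to host byte order
        return data.f;             // read the inactive member (compiler extension)
    }

    double ntohdouble(const char *buffer) { // [2]
        int64_t data;
        double dbl;
        memcpy(&data, buffer, sizeof(int64_t));
        data = ntohll(data);
        dbl = *reinterpret_cast<double*>(&data);  // strict-aliasing violation
        return dbl;
    }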
The implementation in [1] is sanctioned by all compilers I know (gcc, clang, VS, sun, ibm, hp), while the implementation in [2] is not and will fail horribly in some of them when aggressive optimizations are used. In particular, I have seen gcc reorder the instructions and read into the dbl variable before evaluating ntohll, thus producing the wrong results.
(*) With the exception that you are always allowed to read through a [signed|unsigned] char* regardless of what the real object (original pointer type) was.
(+) Again with some exceptions: if the active member shares a common prefix with another member, you can read that prefix through the compatible member.
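For comparison, here is a sketch of a variant that relies on neither union punning nor pointer casts. It is not from the original answer: the name ntohdouble_portable is hypothetical, and it assembles the big-endian bytes by hand instead of calling ntohll. It assumes that double is 64 bits wide, that the sender used the same floating-point format as the host, and that the host stores integers and doubles in the same byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical alternative to the two versions above.
// Assumptions: 64-bit double, same floating-point format as the sender,
// and identical byte order for integers and doubles on the host.
double ntohdouble_portable(const char* buffer) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "double must be 64 bits wide");

    // Assemble the value from big-endian (network order) bytes.
    // This works regardless of the host's own byte order.
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i) {
        bits = (bits << 8) | static_cast<unsigned char>(buffer[i]);
    }

    // Copy the bit pattern into a double without violating aliasing rules.
    double result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

Because the only type-changing step is a memcpy, the optimizer has no license to reorder the read ahead of the byte-order conversion.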
Is `reinterpret_cast` on `this` inside union-like class an undefined behavior?
This is undefined behavior. As an overview, the union contains either a uint32_t or a B.
- If it's a B, then the cast is illegal (since it's not a uint32_t, you mustn't cast to it).
- If it is a uint32_t, then calling the .c() member is illegal, since you can't access the b1 member (it isn't the active union member).
In this case (thanks to @StoryTeller's comment) the active union member is a (the uint32_t), since it's the only one with default initialization, thus calling a.b1.c() is UB.
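The question's code is not reproduced here, so the following is only an illustrative sketch with hypothetical types. It shows one way to make b1 the active member before calling c(): construct it in place with placement new. The member names a, b1 and c() mirror the answer above. The struct layouts, activate_b1 and demo are assumptions.

```cpp
#include <cstdint>
#include <new>

// Hypothetical stand-in for the B type mentioned in the answer.
struct B {
    std::uint32_t value;
    std::uint32_t c() const { return value; }
};

// Hypothetical union-like class: `a` is active after construction.
struct A {
    union {
        std::uint32_t a;
        B b1;
    };

    A() : a(0) {}

    // Placement new ends the lifetime of `a` and makes `b1` the active member.
    void activate_b1(std::uint32_t v) {
        new (&b1) B{v};
    }
};

// Calling c() is fine here because b1 was activated first.
std::uint32_t demo() {
    A obj;
    obj.activate_b1(42);
    return obj.b1.c();
}
```

Without the call to activate_b1, the final obj.b1.c() would access an inactive member, which is the situation the answer describes.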
Is using a union in place of a cast well defined?
Your code is not portable. It might work on some compilers, or it might not.
You are right that the behaviour is undefined when you try to access the inactive member of the union (as is the case in the code given).
§9.5/1
In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time.
So foo.c[0] == 1 is incorrect because c is not active at that moment. Feel free to correct me if you think I am wrong.
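Assuming the code in question was inspecting the first byte of an integer (for example, as an endianness check), the same test can be written without a union, because reading any object through an unsigned char pointer is permitted. The function names first_byte and is_little_endian are hypothetical. This is a sketch, not the asker's code.

```cpp
#include <cstdint>

// Hypothetical helper: return the byte stored at the lowest address of `value`.
// Reading an object's bytes through unsigned char* is allowed by the aliasing rules.
unsigned char first_byte(std::uint32_t value) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    return bytes[0];
}

// On a little-endian machine the least significant byte comes first.
bool is_little_endian() {
    return first_byte(1u) == 1;
}
```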
Union vs. static_cast(void*)
Consider boost::any or boost::variant if you want to store objects of heterogeneous types.
Before deciding which one to use, have a look at the comparison:
- Boost.Variant vs. Boost.Any
Hopefully, it will help you make the right decision. Choose one, and store the objects in any container from the standard library: std::vector<boost::any>, std::vector<boost::variant>, or any other.
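As a rough illustration of the container idea, the sketch below uses std::variant, the C++17 standard-library counterpart of boost::variant, rather than Boost itself. The alias Value, its set of alternatives, and the function count_ints are assumptions made for the example.

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical value type: each element holds exactly one of these alternatives.
using Value = std::variant<int, double, std::string>;

// Count how many elements currently hold an int.
std::size_t count_ints(const std::vector<Value>& values) {
    std::size_t n = 0;
    for (const Value& v : values) {
        if (std::holds_alternative<int>(v)) {
            ++n;
        }
    }
    return n;
}
```

Unlike a raw union, the variant tracks which alternative is active, so reading the wrong one throws std::bad_variant_access from std::get instead of causing undefined behavior.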
How unions are used to avoid type conversion
It's called type punning and is explicitly allowed using unions in C but not in C++.
It works (in C) because all members of a union occupy the same memory, so there's really no conversion. You just interpret the bits of the memory differently.