Is Using a Union in Place of a Cast Well Defined

Is using a union in place of a cast well defined?

Your code is not portable. It might work on some compilers or it might not.

You are right that the behaviour is undefined when you access the inactive member of a union, as the code given does.

§9.5/1 of the C++ standard:

In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time.

So foo.c[0] == 1 is incorrect because c is not the active member at that point. Feel free to correct me if you think I am wrong.

What is the difference between a properly defined union and a reinterpret_cast?
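If the point of a test like foo.c[0] == 1 is just to look at the first byte of an integer (an assumption on my part, since the original code is not shown here), the bytes can be copied out instead of being read through an inactive member. A minimal sketch; byte_at and is_little_endian are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the byte stored at offset `index`
// (0 = lowest address) in the object representation of `value`.
unsigned char byte_at(std::uint32_t value, std::size_t index) {
    assert(index < sizeof value);
    unsigned char bytes[sizeof value];
    // Copying the object representation of a trivially copyable object
    // is well defined; no inactive union member is ever read.
    std::memcpy(bytes, &value, sizeof value);
    return bytes[index];
}

// True when the least significant byte is stored at the lowest address.
bool is_little_endian() {
    return byte_at(1u, 0) == 1;
}
```

Compilers typically reduce the memcpy to a plain load, so this should cost nothing compared with the union version.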

Contrary to what the other answers state, from a practical point of view there is a huge difference, although there might not be such a difference in the standard.

From the standard's point of view, reinterpret_cast is only guaranteed to work for round-trip conversions, and only if the alignment requirements of the intermediate pointer type are not stricter than those of the source type. You are not allowed (*) to write through one pointer type and read through another.

The standard requires similar behavior from unions: it is undefined behavior to read from a union member other than the active one (the member that was last written to) (+).

Yet compilers often provide additional guarantees for the union case, and all compilers I know of (VS, g++, clang++, xlC_r, intel, Solaris CC) guarantee that you can read from a union through an inactive member and that it will produce a value with exactly the same bits set as those that were written through the active member.

This is particularly important with high optimizations when reading from network:

double ntohdouble(const char *buffer) {          // [1]
    union {
        int64_t i;
        double f;
    } data;
    memcpy(&data.i, buffer, sizeof(int64_t));
    data.i = ntohll(data.i);
    return data.f;      // reads the inactive member: relies on the compiler guarantee
}

double ntohdouble(const char *buffer) {          // [2]
    int64_t data;
    double dbl;
    memcpy(&data, buffer, sizeof(int64_t));
    data = ntohll(data);
    dbl = *reinterpret_cast<double*>(&data);     // violates strict aliasing
    return dbl;
}

The implementation in [1] is sanctioned by all compilers I know (gcc, clang, VS, sun, ibm, hp), while the implementation in [2] is not and will fail horribly in some of them when aggressive optimizations are used. In particular, I have seen gcc reorder the instructions and read into the dbl variable before evaluating ntohll, thus producing the wrong results.
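For comparison, here is a sketch of a third variant that needs neither the union guarantee nor the cast. It is not part of the answer above: read_be_double is a hypothetical name, it takes unsigned char so that high bytes are not sign-extended, and it builds the integer with shifts instead of relying on ntohll, which is not a standard function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit double");

// Hypothetical helper: decodes 8 bytes in network (big-endian) order.
// Assumes the host stores integers and doubles in the same byte order,
// which holds on common platforms but is not required by the standard.
double read_be_double(const unsigned char* buffer) {
    assert(buffer != nullptr);
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i) {
        bits = (bits << 8) | buffer[i];   // most significant byte first
    }
    double value;
    std::memcpy(&value, &bits, sizeof value);  // well-defined bit copy
    return value;
}
```

Because the copy goes through memcpy, the optimizer has no aliasing assumption to exploit, so the reordering problem described for [2] should not arise.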


(*) With the exception that you are always allowed to read through a [signed|unsigned] char* regardless of what the real object (original pointer type) was.

(+) Again with some exceptions: if the active member shares a common initial sequence with another member, you can read that common prefix through the other, compatible member.
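The common-prefix exception in (+) can be sketched as follows. All type and function names here are hypothetical; the point is only that kind is read through as_int even when as_real is the active member.

```cpp
#include <cassert>

constexpr int kInt = 1;
constexpr int kReal = 2;

// Both structs are standard-layout and start with an int, so they
// share a common initial sequence consisting of `kind`.
struct IntMsg  { int kind; int value; };
struct RealMsg { int kind; double value; };

union Message {
    IntMsg  as_int;
    RealMsg as_real;
};

Message make_int(int v) {
    Message m;
    m.as_int = IntMsg{kInt, v};     // as_int becomes the active member
    return m;
}

Message make_real(double v) {
    Message m;
    m.as_real = RealMsg{kReal, v};  // as_real becomes the active member
    return m;
}

// Allowed: `kind` is part of the common initial sequence, so it may be
// read through as_int whichever member is active.
int kind_of(const Message& m) {
    return m.as_int.kind;
}

// Everything past the common prefix must be read through the active member.
double real_value(const Message& m) {
    assert(kind_of(m) == kReal);
    return m.as_real.value;
}
```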

Using unions to simplify casts

(Edited) Both gcc and MSVC allow 'anonymous' structs/unions, which might solve your problem. For example:

union Pixel {
    struct {unsigned char b,g,r,a;};
    uint32_t bits; // use 'unsigned' for MSVC
} foo;

foo.b = 1;
foo.g = 2;
foo.r = 3;
foo.a = 4;
printf ("%08x\n", foo.bits);

gives (on Intel):

04030201

This requires changing all your declarations of struct Pixel to union Pixel in your original code. But this defect can be fixed via:

struct Pixel {
    union {
        struct {unsigned char b,g,r,a;};
        uint32_t bits;
    };
} foo;

foo.b = 1;
foo.g = 2;
foo.r = 3;
foo.a = 4;
printf ("%08x\n", foo.bits);

This also works with VC9, with 'warning C4201: nonstandard extension used : nameless struct/union'. Microsoft uses this trick, for example, in:

typedef union {
    struct {
        DWORD LowPart;
        LONG HighPart;
    }; // <-- nameless member!
    struct {
        DWORD LowPart;
        LONG HighPart;
    } u;
    LONGLONG QuadPart;
} LARGE_INTEGER;

but they 'cheat' by suppressing the unwanted warning.
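If you would rather not depend on the anonymous struct extension at all, the same value can be obtained from the plain struct. A sketch under the assumption that Pixel has no padding; pixel_bits and pixel_packed are hypothetical helper names:

```cpp
#include <cstdint>
#include <cstring>

struct Pixel { unsigned char b, g, r, a; };

static_assert(sizeof(Pixel) == sizeof(std::uint32_t),
              "this sketch assumes Pixel has no padding");

// Byte-for-byte copy: the result depends on the host byte order,
// just like the union version does.
std::uint32_t pixel_bits(const Pixel& p) {
    std::uint32_t bits;
    std::memcpy(&bits, &p, sizeof bits);
    return bits;
}

// Explicit packing: b always ends up in the low byte, whatever the host is.
std::uint32_t pixel_packed(const Pixel& p) {
    return  static_cast<std::uint32_t>(p.b)
         | (static_cast<std::uint32_t>(p.g) << 8)
         | (static_cast<std::uint32_t>(p.r) << 16)
         | (static_cast<std::uint32_t>(p.a) << 24);
}
```

With b=1, g=2, r=3, a=4, pixel_packed yields 0x04030201 everywhere, while pixel_bits yields that value only on a little-endian machine such as Intel.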

While the above examples are ok, if you use this technique too often, you'll quickly end up with unmaintainable code. Five suggestions to make things clearer:

(1) Change the name bits to something uglier like union_bits, to clearly indicate something out-of-the-ordinary.

(2) Go back to the ugly cast the OP rejected, but hide its ugliness in a macro or in an inline function, as in:

#define BITS(x) (*(uint32_t*)&(x))

But this would break the strict aliasing rules. (See, for example, AndreyT's answer: C99 strict aliasing rules in C++ (GCC).)

(3) Keep the original definition of Pixel, but do a better cast:

struct Pixel {unsigned char b,g,r,a;} foo;
// ...
printf("%08x\n", ((union {struct Pixel dummy; uint32_t bits;})foo).bits);

(4) But that is even uglier. You can fix this by a typedef:

struct Pixel {unsigned char b,g,r,a;} foo;
typedef union {struct Pixel dummy; uint32_t bits;} CastPixelToBits;
// ...
printf("%08x\n", ((CastPixelToBits)foo).bits); // not VC9

With VC9, or with gcc using -pedantic, you'll need (don't use this with gcc--see note at end):

printf("%08x\n", ((CastPixelToBits*)&foo)->bits); // VC9 (not gcc)

(5) A macro may perhaps be preferred. In gcc, you can define a union cast to any given type very neatly:

#define CAST(type, x) (((union {typeof(x) src; type dst;})(x)).dst)   // gcc
// ...
printf("%08x\n", CAST(uint32_t, foo));

With VC9 and other compilers, there is no typeof, and pointers may be needed (don't use this with gcc--see note at end):

#define CAST(typeof_x, type, x) (((union {typeof_x src; type dst;}*)&(x))->dst)

Self-documenting, and safer. And not too ugly. All these suggestions are likely to compile to identical code, so efficiency is not an issue. See also my related answer: How to format a function pointer?.

Warning about gcc: The GCC Manual version 4.3.4 (but not version 4.3.0) states that this last example, with &(x), is undefined behaviour. See http://davmac.wordpress.com/2010/01/08/gcc-strict-aliasing-c99/ and http://gcc.gnu.org/ml/gcc/2010-01/msg00013.html.
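In C++ the strict aliasing problem can be sidestepped with a small function template in place of the macro. This is only a sketch, and bit_copy is a hypothetical name, not an existing library function:

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies the object representation of `src`
// into a new object of type To.
template <typename To, typename From>
To bit_copy(const From& src) {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
    To dst;
    std::memcpy(&dst, &src, sizeof dst);  // no pointer of the wrong type is dereferenced
    return dst;
}
```

Usage would look like bit_copy<std::uint32_t>(foo) for a 4-byte foo. A size mismatch is caught at compile time, which the macro versions do not do.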

C++ unions vs. reinterpret_cast

It's undefined because there's no guarantee what exactly the value representations of int and float are. The C++ standard doesn't say that a float is stored as an IEEE 754 single-precision floating point number. What exactly should the standard say about you treating an int object with value 0xffff as a float? It says nothing beyond the fact that it is undefined.

Practically, however, this is the purpose of reinterpret_cast - to tell the compiler to ignore everything it knows about the types of objects and trust you that this int is actually a float. It's almost always used for machine-specific bit-level jiggery-pokery. The C++ standard just doesn't guarantee you anything once you do it. At that point, it's up to you to understand exactly what your compiler and machine do in this situation.

This is true for both the union and reinterpret_cast approaches. I suggest that reinterpret_cast is "better" for this task, since it makes the intent clearer. However, keeping your code well-defined is always the best approach.
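One way to keep such machine-specific code honest is to state the representation assumptions where the compiler can check them. A sketch, assuming you want IEEE 754 single precision; float_bits and sign_bit are hypothetical names, and the bit copy uses memcpy so that the function itself stays well defined:

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Refuse to compile on a platform where the assumptions do not hold.
static_assert(std::numeric_limits<float>::is_iec559,
              "this sketch assumes IEEE 754 floats");
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Returns the raw bit pattern of f.
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// In IEEE 754 single precision the sign is the top bit.
bool sign_bit(float f) {
    return (float_bits(f) >> 31) != 0;
}
```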

Is it legal to cast a struct to an union containing it?

In general, no.

Suppose you had the following definitions:

struct foo {
    int flag;
    double foo;
};
struct bar {
    int flag;
    int bar;
};
union foobar {
    struct foo foo;
    struct bar bar;
};

The likely alignment of struct foo would be 8 while the likely alignment of struct bar would be 4. So if you did this:

struct bar b;
union foobar *fb = (union foobar *)&b;

You could run into an alignment issue. Section 6.3.2.3p7 of C11 states:

A pointer to an object type may be converted to a pointer to a
different object type. If the resulting pointer is not correctly
aligned for the referenced type, the behavior is undefined.
Otherwise, when converted back again, the result shall compare equal to the
original pointer. When a pointer to an object is converted to a
pointer to a character type, the result points to the lowest addressed
byte of the object. Successive increments of the result, up to the
size of the object, yield pointers to the remaining bytes of the
object.

So if b is not aligned on an 8 byte boundary, you have undefined behavior.
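The alignment mismatch can be made visible with alignof. The sketch below renames the inner members (value, count) to keep it simple, and is_aligned_for_foobar is a hypothetical helper; it also assumes that converting a pointer to uintptr_t yields a plain address, which is implementation-defined but holds on common platforms.

```cpp
#include <cstdint>

struct foo { int flag; double value; };
struct bar { int flag; int count; };

union foobar {
    struct foo foo;
    struct bar bar;
};

// The union is at least as strictly aligned as each of its members.
static_assert(alignof(union foobar) >= alignof(struct bar),
              "a union is aligned for all of its members");

// Hypothetical helper: true when p is suitably aligned to be
// treated as the address of a union foobar.
bool is_aligned_for_foobar(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(union foobar) == 0;
}
```

Passing this check only rules out the alignment problem; it does not make accesses through the converted pointer valid in other respects.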

Is casting to and from char pointer well defined in standard C?

malloc is intended to be used to reserve memory and create new objects in that memory. This code is fully defined by the C standard (supposing the appropriate definitions and #include lines precede it, of course):

void *test = malloc(sizeof (struct foo));
((struct foo *) test)->member = 10;

There is no aliasing problem here because the reserved memory initially has no object type associated with it. Once a value is stored into it using any type other than a character type, the type of that object becomes the effective type for the memory. The address returned by malloc is suitable for all the basic types (char, integer types, floating-point types), enumerated types, pointer types, arrays of these types, structures or unions of these types without an overriding alignment specifier, and all the complete object types in the standard library (like struct tm).

Pedantically, there are some inadequacies in the standard. The assignment above only writes to a member of a structure, not the whole structure, so is the effective type of the allocated memory set to be that of the whole structure? The language in the standard does not formally cover this and similar issues. However, it is entirely clear this code is intended to work, and all compilers support it.

Similarly, there are some inadequacies in the standard regarding pointer conversions, which is why I changed your code to use void *test instead of char *test. The standard does not explicitly say that converting the void * result of malloc to some other type, like struct foo *, actually produces a pointer to the same memory. (It just says that, if we convert the pointer back, we will get something equal to the original.) But it is clear memory reserved by malloc is intended to be used in this way. Inserting another conversion, first to char * and then to struct foo *, complicates the matter further. Again, it should work, and it will work in all compilers, but the standard does not say it explicitly.

You can use memcpy to read or write objects in the allocated memory, but it is not necessary, unless you actually want to do aliasing where you have written an object of one type and want to reinterpret its bytes as another type. (This can be done with memcpy or with a union.)
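The memcpy route mentioned above can be sketched like this. Sample and round_trip are hypothetical names; the only accesses to the allocated block go through memcpy, so no question about its effective type arises.

```cpp
#include <cstdlib>
#include <cstring>

struct Sample { int member; };  // hypothetical stand-in for struct foo

// Writes a Sample into malloc'ed storage and reads it back, using only
// memcpy to touch the allocated bytes.
// Returns false if the allocation fails.
bool round_trip(int value, int& result) {
    void* mem = std::malloc(sizeof(Sample));
    if (mem == nullptr) {
        return false;
    }
    Sample in{value};
    std::memcpy(mem, &in, sizeof in);    // write the object's bytes
    Sample out;
    std::memcpy(&out, mem, sizeof out);  // read them back
    std::free(mem);
    result = out.member;
    return true;
}
```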

Is const-casting via a union undefined behaviour?

It's implementation-defined, see C99 6.5.2.3/5:

if the value of a member of a union object is used when the most
recent store to the object was to a different member, the behavior is
implementation-defined.

Update: @AaronMcDaid commented that this might be well-defined after all.

The standard specifies the following in 6.2.5/27:

Similarly, pointers to qualified or unqualified versions of compatible
types shall have the same representation and alignment
requirements.27)

27) The same representation and alignment requirements are meant to
imply interchangeability as arguments to functions, return values from
functions, and members of unions.

And (6.7.2.1/14):

A pointer to a union object, suitably converted, points to each of its
members (or if a member is a bitfield, then to the unit in which it
resides), and vice versa.

One might conclude that, in this particular case, there is only room for exactly one way to access the elements in the union.


