Accessing Inactive Union Member and Undefined Behavior

Accessing inactive union member and undefined behavior?

The confusion is that C explicitly permits type-punning through a union, whereas C++ (C++11) has no such permission.

C11

6.5.2.3 Structure and union members


95) If the member used to read the contents of a union object is not the same as the member last used to
store a value in the object, the appropriate part of the object representation of the value is reinterpreted
as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type
punning’’). This might be a trap representation.

The situation with C++:

C++11

9.5 Unions [class.union]


In a union, at most one of the non-static data members can be active at any time, that is, the value of at
most one of the non-static data members can be stored in a union at any time.

C++ later has language permitting the use of unions containing structs with common initial sequences; this doesn't however permit type-punning.

To determine whether union type-punning is allowed in C++, we have to search further. Recall that C99 is a normative reference for C++11 (and C99 has similar language to C11 permitting union type-punning):

3.9 Types [basic.types]


4 - The object representation of an object of type T is the sequence of N unsigned char objects taken up by
the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that
hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object
representation that determines a value, which is one discrete element of an implementation-defined set of
values.42)

42) The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C.

It gets particularly interesting when we read

3.8 Object lifetime [basic.life]


The lifetime of an object of type T begins when:
— storage with the proper alignment and size for type T is obtained, and
— if the object has non-trivial initialization, its initialization is complete.

So for a primitive type (which ipso facto has trivial initialization) contained in a union, the lifetime of the object encompasses at least the lifetime of the union itself. This allows us to invoke

3.9.2 Compound types [basic.compound]


If an object of type T is located at an address A, a pointer of type cv T* whose value is the
address A is said to point to that object, regardless of how the value was obtained.

Assuming that the operation we are interested in is type-punning i.e. taking the value of a non-active union member, and given per the above that we have a valid reference to the object referred to by that member, that operation is lvalue-to-rvalue conversion:

4.1 Lvalue-to-rvalue conversion [conv.lval]


A glvalue of a non-function, non-array type T can be converted to a prvalue.
If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.

The question then is whether an object that is a non-active union member is initialized by a store to the active union member. As far as I can tell, it is not. Access to a union through a non-active member is therefore defined, and defined to follow the object and value representation, only if:

  • a union is copied into char array storage and back (3.9:2), or
  • a union is bytewise copied to another union of the same type (3.9:3), or
  • a union is accessed across language boundaries by a program element conforming to ISO/IEC 9899 (so far as that is defined) (3.9:4 note 42).

Access without one of the above interpositions is undefined behaviour. This has implications for the optimisations allowed to be performed on such a program, as the implementation may of course assume that undefined behaviour does not occur.

That is, although we can legitimately form an lvalue to a non-active union member (which is why assigning to a non-active member without construction is ok) it is considered to be uninitialized.
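The "copy through char storage" route from 3.9:2 is what std::memcpy gives you, so it is the usual well-defined substitute for union punning in C++. A minimal sketch follows; pun_cast and float_bits are hypothetical names introduced here for illustration, and the example assumes a 32-bit IEEE 754 float.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>
#include <type_traits>

// Hypothetical helper (not from the answer): reinterpret the bytes of `from`
// as a To by copying the object representation, never by reading an
// inactive union member.
template <typename To, typename From>
To pun_cast(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Bit pattern of a float. Assumption: float is a 32-bit IEEE 754 type.
inline std::uint32_t float_bits(float f) {
    static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE 754 float");
    return pun_cast<std::uint32_t>(f);
}
```

Under those assumptions, float_bits(1.0f) yields 0x3F800000. Compilers typically turn the memcpy into a plain register move, so the well-defined form need not cost anything over the union version.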

C++ Union Member Access And Undefined Behaviour

Is reading myHeader.wId in the line packetIdFlipped = myHeader.wId << 8 undefined behaviour?

Yes. You assigned to wMake and wMod, making the unnamed struct the active member, so wId is the inactive member and you are not allowed to read from it without first storing a value in it.

and is this what is meant by common initial sequence?

The common initial sequence is when two standard layout types share the same members in the same order. In

struct foo
{
    int a;
    int b;
};

struct bar
{
    int a;
    int b;
    int c;
};

a and b have the same types in foo and bar, so they form the common initial sequence of the two structs. If you put objects of foo and bar in a union, it would be safe to read a or b from either object after it is set in one of them.

This is not your case, though, since wId isn't a standard-layout struct.
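To make the foo/bar case concrete, here is a small sketch. The union name foo_or_bar and the function sum_via_other_member are made up for illustration. It stores through one struct member and inspects only the common initial sequence through the other.

```cpp
#include <type_traits>

struct foo { int a; int b; };
struct bar { int a; int b; int c; };

// Hypothetical union holding both structs.
union foo_or_bar {
    foo f;
    bar b;
};

static_assert(std::is_standard_layout<foo>::value, "foo must be standard-layout");
static_assert(std::is_standard_layout<bar>::value, "bar must be standard-layout");
static_assert(std::is_standard_layout<foo_or_bar>::value, "union must be standard-layout");

// Initializes the foo member, then reads a and b through the bar member.
inline int sum_via_other_member(int a, int b) {
    foo_or_bar u = { foo{a, b} };   // aggregate init: f is the active member
    // Permitted: a and b are in the common initial sequence of foo and bar.
    return u.b.a + u.b.b;
    // Reading u.b.c here would NOT be covered: c lies outside that sequence.
}
```

The static_asserts are there because the guarantee only applies to standard-layout types; adding a virtual function or mixed access specifiers to either struct would make the read undefined again.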

Accessing same-type inactive member in unions

Yes you can read the other member in this particular case.

This is what the C++11/14 standard has to say:

9.5 - Unions

In a union, at most one of the non-static data members can be active
at any time, that is, the value of at most one of the non-static data
members can be stored in a union at any time.

But the note immediately after the section makes your particular instance legal since one special guarantee is made in order to simplify the use of unions:

[ Note: If a standard-layout union contains several standard-layout
structs that share a common initial sequence (9.2), and if an object
of this standard-layout union type contains one of the standard-layout
structs, it is permitted to inspect the common initial sequence of any
of standard-layout struct members; see 9.2. —end note ]

And your structs do share a common initial sequence:

9.2.16 - Class members

The common initial sequence of two standard-layout
struct (Clause 9) types is the longest sequence of non-static data
members and bit-fields in declaration order, starting with the first
such entity in each of the structs, such that corresponding entities
have layout-compatible types and either neither entity is a bit-field
or both are bit-fields with the same width.

Accessing inactive union members

It's probably compiler-specific due to how it arranges the underlying data in the union. Essentially, accessing by an 'inactive' member is just interpreting the data differently. Interpreting a large int as a smaller one should work.

[FF|01] < a uint16

A uint8 just reads the first byte of that data (which byte comes first in memory depends on the machine's endianness):

[FF|01]
 ^ read
    ^ ignored

Interpreting a float as an int or vice versa is unlikely to work, since the underlying bits won't make sense:

[0x1|0xF|0x7FFFFF]
         ^ 23-bit mantissa
     ^ 8-bit exponent
 ^ sign bit
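The two diagrams above can be reproduced without touching an inactive union member. The sketch below is only an illustration: all names are hypothetical, and the float part assumes a 32-bit IEEE 754 float. It copies the bytes out with std::memcpy and then picks the fields apart with shifts and masks.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// First byte of a 16-bit value as it sits in memory.
// Which byte this is depends on the platform's endianness.
inline std::uint8_t first_byte_in_memory(std::uint16_t v) {
    unsigned char bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);
    return bytes[0];
}

// Low-order byte by value: the same on every platform.
inline std::uint8_t low_byte(std::uint16_t v) {
    return static_cast<std::uint8_t>(v & 0xFFu);
}

// The three fields of a single-precision float.
struct float_fields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t mantissa;  // 23 bits
};

// Assumption: float is a 32-bit IEEE 754 type.
inline float_fields split_float(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE 754 float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return float_fields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}
```

For 0x01FF, low_byte always returns 0xFF, while first_byte_in_memory returns 0xFF on a little-endian machine and 0x01 on a big-endian one. That difference is why the "read the first byte" picture is platform-dependent.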

Is it undefined behaviour to read a different member than was written in a Union?

Yes, the behaviour is undefined in C++.

When you write a value to a member of a union, think of that member becoming the active member.

The behaviour of reading any member of a union that is not the active member is undefined.

In C++, a union is often coupled with another variable that serves as a means of identifying the active member.
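A minimal sketch of that pairing follows, with hypothetical names (number, kind, make_int, make_real, as_double) chosen only for illustration. The tag is written whenever a member is stored and checked before any member is read, so only the active member is ever accessed.

```cpp
#include <cassert>

// Hypothetical tagged union: `active` records which member was last written.
struct number {
    enum class kind { integer, real };
    kind active;
    union {        // anonymous union: i and d share storage
        int i;
        double d;
    };
};

inline number make_int(int v) {
    number n;
    n.active = number::kind::integer;
    n.i = v;
    return n;
}

inline number make_real(double v) {
    number n;
    n.active = number::kind::real;
    n.d = v;
    return n;
}

// Reads only the member that the tag says is active.
inline double as_double(const number& n) {
    switch (n.active) {
        case number::kind::integer: return static_cast<double>(n.i);
        case number::kind::real:    return n.d;
    }
    assert(false && "tag does not name a member");
    return 0.0;
}
```

From C++17 onward, std::variant packages the same idea with the bookkeeping done for you.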

Call member function of non-active union member

Your code has undefined behavior. The common initial sequence rule won't help you here, since you are not accessing a common member; instead you are accessing the entire object in order to call the member function [1].

What I am trying to achieve is well-defined version of common approach to name array field like union a { int data[3]; int x, y, z};

The way to do this in C++ is to use a struct and operator overloading. Instead of having an array and trying to map it to individual members, you have individual members and then pretend that your class is an array. That looks like

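#include <cstddef>
#include <stdexcept>

struct vec
{
    int x, y, z;

    // Map an array-style index onto the named members.
    int& operator[](std::size_t index)
    {
        switch (index)
        {
            case 0: return x;
            case 1: return y;
            case 2: return z;
            // Without this, an out-of-range index would fall off the end of the function.
            default: throw std::out_of_range("vec index out of range");
        }
    }
};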

1: To call a member function is to pass that object to the function as if it were the function's first parameter.
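As a usage sketch of the struct-plus-operator[] idea, the following self-contained variant adds a const overload so the type can be indexed through a const reference. The name vec3, the sum helper and the choice to throw std::out_of_range on a bad index are assumptions made for this example, not part of the original answer.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical variant of the struct above, with a const overload.
struct vec3 {
    int x, y, z;

    int& operator[](std::size_t index) {
        switch (index) {
            case 0: return x;
            case 1: return y;
            case 2: return z;
            default: throw std::out_of_range("vec3 index out of range");
        }
    }

    const int& operator[](std::size_t index) const {
        switch (index) {
            case 0: return x;
            case 1: return y;
            case 2: return z;
            default: throw std::out_of_range("vec3 index out of range");
        }
    }
};

// Array-style iteration over the named members, with no union involved.
inline int sum(const vec3& v) {
    int total = 0;
    for (std::size_t i = 0; i < 3; ++i) {
        total += v[i];
    }
    return total;
}
```

Both spellings stay available: v.z and v[2] refer to the same int, and no inactive union member is ever read.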

Is the following C union access pattern undefined behavior?

Defect report 283: Accessing a non-current union member ("type punning") covers this and tells us there is undefined behavior if there is a trap representation.

The defect report asked:

In the paragraph corresponding to 6.5.2.3#5, C89 contained this
sentence:

With one exception, if a member of a union object is accessed after a value has been stored in a different member of the object, the
behavior is implementation-defined.


Associated with that sentence was this footnote:

The "byte orders" for scalar types are invisible to isolated programs that do not indulge in type punning (for example, by
assigning to one member of a union and inspecting the storage by
accessing another member that is an appropriately sixed array of
character type), but must be accounted for when conforming to
externally imposed storage layouts.


The only corresponding verbiage in C99 is 6.2.6.1#7:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that
member but do correspond to other members take unspecified values, but
the value of the union object shall not thereby become a trap
representation.


It is not perfectly clear that the C99 words have the same
implications as the C89 words.

The defect report added the following footnote:

Attach a new footnote 78a to the words "named member" in 6.5.2.3#3:

78a If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

C11 6.2.6.1 General tells us:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation.


