Is It a Good Practice to Use Unions in C++

Is it a good practice to use unions in C++?

Unions can be fine, as long as you use them carefully.

They can be used in two ways:

  1. To allow a single type of data to be accessed in several ways (as in your example, accessing a colour as an int or (as you probably intended) four chars)

  2. To make a polymorphic type (a single value that could hold an int or a float for example).

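Case (1) is fine because you're not changing the meaning of the type: you can read and write any of the members of the union without corrupting the data itself. This makes it a very convenient and efficient way of accessing the same data in slightly different forms. Bear in mind, though, that standard C++ formally treats reading a member other than the one most recently written as undefined behavior, so this use relies on compiler support.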
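As a rough illustration of case (1), here is a minimal sketch in C, where reading a union through a different member than the one last written is permitted. The type name Colour and the helper functions are made up for this example, and which byte ends up holding which colour channel depends on the machine's byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical colour type: the same four bytes seen as one integer
   or as four separate bytes. Byte order is platform dependent. */
typedef union Colour {
    uint32_t packed;      /* the whole colour as a single value */
    uint8_t  channel[4];  /* the same storage, one byte at a time */
} Colour;

/* Store a colour through the integer view. */
Colour colour_from_packed(uint32_t packed)
{
    Colour c;
    c.packed = packed;
    return c;
}

/* Read one byte back through the array view. */
uint8_t colour_channel(Colour c, int i)
{
    assert(i >= 0 && i < 4);
    return c.channel[i];
}

/* Returns 1 when all four bytes are equal, 0 otherwise. */
int colour_is_uniform(Colour c)
{
    return c.channel[0] == c.channel[1]
        && c.channel[1] == c.channel[2]
        && c.channel[2] == c.channel[3];
}
```

A call such as colour_is_uniform(colour_from_packed(0x7F7F7F7Fu)) writes through one view and reads through the other without any copying or shifting.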

Case (2) can be useful, but is extremely dangerous because you need to always access the right type of data from within the union. If you write an int and try to read it back as a float, you'll get a meaningless value. Unless memory usage is your primary consideration it might be better to use a simple struct with two members in it.

Unions used to be vital in C. In C++ there are usually much nicer ways to achieve the same ends (e.g. a class can be used to wrap a value and allow it to be accessed in different ways). However, if you need raw performance or have a critical memory situation, unions may still be a useful approach.

Purpose of Unions in C and C++

The purpose of unions is rather obvious, but for some reason people miss it quite often.

The purpose of a union is to save memory by using the same memory region for storing different objects at different times. That's it.

It is like a room in a hotel. Different people live in it for non-overlapping periods of time. These people never meet, and generally don't know anything about each other. By properly managing the time-sharing of the rooms (i.e. by making sure different people don't get assigned to one room at the same time), a relatively small hotel can provide accommodations to a relatively large number of people, which is what hotels are for.

That's exactly what a union does. If you know that several objects in your program hold values with non-overlapping value-lifetimes, then you can "merge" these objects into a union and thus save memory. Just like a hotel room has at most one "active" tenant at each moment of time, a union has at most one "active" member at each moment of program time. Only the "active" member can be read. By writing into another member you switch the "active" status to that other member.
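The hotel-room idea can be sketched in C roughly as follows. The union Scratch and the function mean_of are invented for this illustration; the point is only that the two members are used in separate phases and never at the same time.

```c
#include <assert.h>

/* Hypothetical scratch storage shared by two "tenants" whose
   lifetimes do not overlap. */
typedef union Scratch {
    long   count;    /* tenant during phase 1: a loop counter */
    double average;  /* tenant during phase 2: the result */
} Scratch;

/* Computes the mean of n integers, reusing one union for both phases. */
double mean_of(const int *values, long n)
{
    Scratch room;
    long sum = 0;

    assert(values != 0 && n > 0);

    /* Phase 1: count is the active member. */
    for (room.count = 0; room.count < n; room.count++)
        sum += values[room.count];

    /* Phase 2: writing average makes it the active member;
       count must not be read from here on. */
    room.average = (double)sum / (double)n;
    return room.average;
}
```

In a function this small the saving is negligible, of course; the pattern pays off when the merged objects are large or when there are many of them.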

For some reason, this original purpose of the union got "overridden" with something completely different: writing one member of a union and then inspecting it through another member. This kind of memory reinterpretation (aka "type punning") is not a valid use of unions. It is described as producing implementation-defined behavior in C89/90.

EDIT: Using unions for the purposes of type punning (i.e. writing one member and then reading another) was given a more detailed definition in one of the Technical Corrigenda to the C99 standard (see DR#257 and DR#283). However, keep in mind that formally this does not protect you from running into undefined behavior by attempting to read a trap representation.

What is the use of union in C?

Probably the two most common uses for a union are:

  • To implement your own Variant type, a union gives you the ability to represent all the varying types without wasting memory. This answer gives a good example.

  • Type punning, though I would read Understanding Strict Aliasing as well, since there are many cases where type punning is undefined behavior. In practice, most compilers support type punning through a union.
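To make the type punning bullet concrete, here is a small C sketch that reads the bit pattern of a float in two ways. The function names are made up. The first version uses a union, which is allowed in C but formally undefined in C++. The second uses memcpy, which is not a union technique at all but a commonly used alternative that avoids the strict aliasing problem in both languages. Both assume float and uint32_t have the same size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Punning through a union: write the float member, read the integer one. */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    assert(sizeof(float) == sizeof(uint32_t)); /* assumption of this sketch */
    pun.f = f;
    return pun.u;
}

/* Alternative without a union: copy the bytes into an integer. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    assert(sizeof(float) == sizeof(uint32_t)); /* assumption of this sketch */
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Both functions should return the same value for the same input; compilers typically reduce the memcpy version to a plain register move.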

What is the point behind unions in C?

Well, you almost answered your question: Memory.
Back in the day memory was scarce, and saving even a few kilobytes was useful.

But even today there are scenarios where unions would be useful. For example, if you'd like to implement some kind of variant datatype. The best way to do this is using a union.

This doesn't sound like much, but let's just assume you want to use a variable either storing a 4 character string (like an ID) or a 4 byte number (which could be some hash or indeed just a number).

If you use a classic struct, this would be at least 8 bytes long (more if you're unlucky and the compiler adds padding bytes). Using a union it's only 4 bytes. So you're saving 50% memory, which isn't a lot for one instance, but imagine having a million of these.
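The size difference can be checked with sizeof. The following is a minimal sketch; the type names IdAndNumber and IdOrNumber and the helper bytes_saved are invented here, and the 4-character ID is stored without a terminating '\0' so that it really fits in 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Struct version: both fields exist at once. */
struct IdAndNumber {
    char     id[4];   /* 4 characters, no terminating '\0' */
    uint32_t number;
};

/* Union version: the two fields share the same 4 bytes. */
union IdOrNumber {
    char     id[4];
    uint32_t number;
};

/* Bytes saved by storing n unions instead of n structs. */
size_t bytes_saved(size_t n)
{
    assert(sizeof(union IdOrNumber) <= sizeof(struct IdAndNumber));
    return n * (sizeof(struct IdAndNumber) - sizeof(union IdOrNumber));
}
```

On a typical platform sizeof(struct IdAndNumber) is 8 and sizeof(union IdOrNumber) is 4, so a million instances save about 4 MB. The union alone no longer says which of the two fields is in use; that has to be tracked somewhere else.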

While you can achieve similar things by casting or subclassing, a union is still the easiest way to do this.

Is this use of unions strictly conforming?

I believe that your code is conformant, and there is a flaw with the -fstrict-aliasing mode of GCC and Clang.

I cannot find the right part of the C standard, but the same problem happens when compiling your code in C++ mode for me, and I did find the relevant passages of the C++ Standard.

In the C++ standard, [class.union]/5 defines what happens when operator = is used on a union access expression. The C++ Standard states that when a union is involved in the member access expression of the built-in operator =, the active member of the union is changed to the member involved in the expression (if the type has a trivial constructor, but because this is C code, it does have a trivial constructor).

Note that write_s2x cannot change the active member of the union, because a union is not involved in the assignment expression. Your code does not assume that this happens, so it's OK.

Even if I use placement new to explicitly change which union member is active, which ought to be a hint to the compiler that the active member changed, GCC still generates code that outputs 4321.

This looks like a bug with GCC and Clang assuming that the switching of active union member cannot happen here, because they fail to recognize the possibility of p1, p2 and p3 all pointing to the same object.

GCC and Clang (and pretty much every other compiler) support an extension to C/C++ where you can read an inactive member of a union (getting whatever potentially garbage value as a result), but only if you do this access in a member access expression involving the union. If v1 were not the active member, read_s1x would not be defined behavior under this implementation-specific rule, because the union is not within the member access expression. But because v1 is the active member, that shouldn't matter.

This is a complicated case, and I hope that my analysis is correct, as someone who isn't a compiler maintainer or a member of one of the committees.

Actual usage of union in C

Where I use it constantly: when parsing a configuration file, I store all values in a union data type. E.g. when values can be int types or strings, I would use a data structure as follows:

struct cval_s {
    short type;
    union {
        int ival;
        char *cval;
    } val;
};

In more complex problems, I use them, too. E.g. I once wrote an interpreter for a simple scripting language, and a value in this language was represented by a struct containing a union.
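As a sketch of how such a configuration value might be filled in, the following C code reuses the struct cval_s layout from above. The tag constants CVAL_INT and CVAL_STR and the function cval_parse are assumptions made for this example, not part of the original code. The string case simply keeps the caller's pointer rather than copying the text.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical tag values for the type field. */
enum { CVAL_INT, CVAL_STR };

struct cval_s {
    short type;        /* CVAL_INT or CVAL_STR: which member is active */
    union {
        int   ival;
        char *cval;
    } val;
};

/* Treats text as an int if the whole string is a decimal number that
   fits in an int; otherwise stores it as a string (pointer not copied). */
struct cval_s cval_parse(char *text)
{
    struct cval_s v;
    char *end;
    long n;

    assert(text != NULL);
    errno = 0;
    n = strtol(text, &end, 10);

    if (end != text && *end == '\0' && errno == 0
            && n >= INT_MIN && n <= INT_MAX) {
        v.type = CVAL_INT;
        v.val.ival = (int)n;
    } else {
        v.type = CVAL_STR;
        v.val.cval = text;
    }
    return v;
}
```

Code that reads the value later checks type first and only then touches the matching member of val.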

C: Where is union practically used?

I usually use unions when parsing text. I use something like this:

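#include <stdint.h>
#include <stdio.h>

typedef enum DataType { INTEGER, FLOAT_POINT, STRING } DataType;

typedef union DataValue
{
    int v_int;
    float v_float;
    char *v_string;
} DataValue;

typedef struct DataNode
{
    DataType type;      /* says which member of value is active */
    DataValue value;
} DataNode;

/* Assumed to be defined elsewhere: stores the raw input in *out
   and returns its type. */
DataType read_some_input(long long *out);

void myfunct(void)
{
    long long temp;
    DataNode inputData;

    inputData.type = read_some_input(&temp);

    /* Write only the member that matches the tag. */
    switch (inputData.type)
    {
    case INTEGER:     inputData.value.v_int = (int)temp; break;
    case FLOAT_POINT: inputData.value.v_float = (float)temp; break;
    /* Only meaningful if read_some_input stored a pointer value in temp. */
    case STRING:      inputData.value.v_string = (char *)(intptr_t)temp; break;
    }
}

void printDataNode(const DataNode *ptr)
{
    printf("I am a ");
    /* Read only the member that matches the tag. */
    switch (ptr->type)
    {
    case INTEGER:     printf("Integer with value %d", ptr->value.v_int); break;
    case FLOAT_POINT: printf("Float with value %f", ptr->value.v_float); break;
    case STRING:      printf("String with value %s", ptr->value.v_string); break;
    }
}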

If you want to see how unions are used HEAVILY, check any code using flex/bison. For example see splint, it contains TONS of unions.


