Float Bits and Strict Aliasing

float bits and strict aliasing

About the only way to truly avoid any issues is to memcpy.

#include <cstring> // std::memcpy

unsigned int FloatToInt( float f )
{
    static_assert( sizeof( float ) == sizeof( unsigned int ), "Sizes must match" );
    unsigned int ret;
    std::memcpy( &ret, &f, sizeof( float ) );
    return ret;
}

Because you are memcpying a small, fixed amount, the compiler will optimise the call out.

That said the union method is VERY widely supported.

How to get float bit representation without UB in C++?

The standard way to do it, as pointed out by Jason Turner, is to use memcpy:

float f = 1.0f;
std::byte c[sizeof(f)];        // std::byte needs <cstddef>
std::memcpy(c, &f, sizeof(f)); // std::memcpy needs <cstring>

You may be thinking that you do not want to copy anything, that you just want to see the bits/bytes. Compilers are smart, though, and will in fact optimize the copy away, as Jason demonstrates. So do not worry: use memcpy for this kind of thing, and never reinterpret_cast.
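As a minimal sketch of the memcpy approach in generic form, the helper below copies the object representation of one trivially copyable value into another of the same size. The name bit_copy is hypothetical and not taken from any of the answers here; it assumes the destination type is default-constructible.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: reinterpret the bytes of `from` as a value of type To
// by copying them, which avoids any aliasing violation.
template <class To, class From>
To bit_copy(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To)); // fixed-size copy, typically optimized away
    return to;
}
```

For example, bit_copy<std::uint32_t>(1.5f) yields the 32 bits of the float, and bit_copy<float> of that result gives 1.5f back.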

Compare 2 floats by their bitwise representation in C

Presumably the assignment assumes float uses IEEE-754 binary32 and unsigned is 32 bits.

It is not proper to alias float objects with an unsigned type, although some C implementations support it. Instead, you can create a compound literal union, initialize its float member with the float value, and access its unsigned member. (This is supported by the C standard but not by C++.)

After that, it is simply a matter of dividing the comparison into cases depending on the sign bits:

#include <stdbool.h>

bool func(float x, float y) {
    unsigned* uxp = & (union { float f; unsigned u; }) {x} .u;
    unsigned* uyp = & (union { float f; unsigned u; }) {y} .u;
    unsigned ux = *uxp;
    unsigned uy = *uyp;
    unsigned sx = (ux>>31);
    unsigned sy = (uy>>31);
    return
        sx && sy ? uy < ux : // Negative values are in "reverse" order.
        sx && !sy ? (uy | ux) & 0x7fffffffu : // Negative x is always less than positive y except for x = -0 and y = +0.
        !sx && sy ? 0 : // Positive x is never less than negative y.
        ux < uy ; // Positive values are in "normal" order.
}

#include <stdio.h>

int main(void)
{
    // Print expected values and function values for comparison.
    printf("1, %d\n", func(+3, +4));
    printf("1, %d\n", func(-3, +4));
    printf("0, %d\n", func(+3, -4));
    printf("0, %d\n", func(-3, -4));
    printf("0, %d\n", func(+4, +3));
    printf("1, %d\n", func(-4, +3));
    printf("0, %d\n", func(+4, -3));
    printf("1, %d\n", func(-4, -3));
}

Sample output:


1, 1
1, 1
0, 0
0, 0
0, 0
1, 1
0, 0
1, 1
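
Since the compound-literal union is C only, here is a rough C++ sketch of the same case analysis that obtains the bits with memcpy instead. The names float_bits and less_by_bits are made up for illustration. It assumes IEEE-754 binary32 floats and, like the C version, does not treat NaNs specially.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Hypothetical helper: fetch the bit pattern of a float without aliasing it.
std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Returns true when x < y, decided purely from the bit patterns.
bool less_by_bits(float x, float y) {
    const std::uint32_t ux = float_bits(x);
    const std::uint32_t uy = float_bits(y);
    const bool sx = (ux >> 31) != 0; // sign bit of x
    const bool sy = (uy >> 31) != 0; // sign bit of y
    if (sx && sy)  return uy < ux;   // both negative: reverse order
    if (sx && !sy) return ((ux | uy) & 0x7fffffffu) != 0; // false only for -0 vs +0
    if (!sx && sy) return false;     // non-negative x is never less than negative y
    return ux < uy;                  // both non-negative: normal order
}
```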

What is the strict aliasing rule?

A typical situation where you encounter strict aliasing problems is when overlaying a struct (like a device/network msg) onto a buffer of the word size of your system (like a pointer to uint32_ts or uint16_ts). When you overlay a struct onto such a buffer, or a buffer onto such a struct through pointer casting you can easily violate strict aliasing rules.

So in this kind of setup, if I want to send a message to something I'd have to have two incompatible pointers pointing to the same chunk of memory. I might then naively code something like this:

#include <stdint.h> // uint32_t
#include <stdlib.h> // malloc

typedef struct Msg
{
    unsigned int a;
    unsigned int b;
} Msg;

void SendWord(uint32_t);

int main(void)
{
    // Get a 32-bit buffer from the system
    uint32_t* buff = malloc(sizeof(Msg));

    // Alias that buffer through message
    Msg* msg = (Msg*)(buff);

    // Send a bunch of messages
    for (int i = 0; i < 10; ++i)
    {
        msg->a = i;
        msg->b = i+1;
        SendWord(buff[0]);
        SendWord(buff[1]);
    }
}

The strict aliasing rule makes this setup illegal: dereferencing a pointer that aliases an object that is not of a compatible type or one of the other types allowed by C 2011 6.5 paragraph 7 (see footnote 1) is undefined behavior. Unfortunately, you can still code this way, maybe get some warnings, have it compile fine, only to have weird unexpected behavior when you run the code.

(GCC appears somewhat inconsistent in its ability to give aliasing warnings, sometimes giving us a friendly warning and sometimes not.)

To see why this behavior is undefined, we have to think about what the strict aliasing rule buys the compiler. Basically, with this rule, it doesn't have to think about inserting instructions to refresh the contents of buff every run of the loop. Instead, when optimizing, with some annoyingly unenforced assumptions about aliasing, it can omit those instructions, load buff[0] and buff[1] into CPU registers once before the loop is run, and speed up the body of the loop. Before strict aliasing was introduced, the compiler had to live in a state of paranoia that the contents of buff could change by any preceding memory stores. So to get an extra performance edge, and assuming most people don't type-pun pointers, the strict aliasing rule was introduced.

Keep in mind, if you think the example is contrived, that this might even happen if you're passing a buffer to another function that does the sending for you. Suppose instead you have:

void SendMessage(uint32_t* buff, size_t size32)
{
    for (size_t i = 0; i < size32; ++i)
    {
        SendWord(buff[i]);
    }
}

And rewrote our earlier loop to take advantage of this convenient function:

for (int i = 0; i < 10; ++i)
{
    msg->a = i;
    msg->b = i+1;
    SendMessage(buff, 2);
}

The compiler may or may not be able (or smart enough) to inline SendMessage, and it may or may not decide to load buff again. If SendMessage is part of another API that is compiled separately, it probably has instructions to load buff's contents. Then again, maybe you're in C++ and this is some templated header-only implementation that the compiler thinks it can inline. Or maybe it's just something you wrote in your .c file for your own convenience. Either way, undefined behavior might still ensue. Even when we know some of what's happening under the hood, it's still a violation of the rule, so no well-defined behavior is guaranteed. Wrapping the send in a function that takes our word-delimited buffer doesn't necessarily help.

So how do I get around this?

  • Use a union. Most compilers support this without complaining about strict aliasing. This is allowed in C99 and explicitly allowed in C11.

      union {
          Msg msg;
          unsigned int asBuffer[sizeof(Msg)/sizeof(unsigned int)];
      };

  • You can disable strict aliasing in your compiler (-f[no-]strict-aliasing in GCC).

  • You can use char* for aliasing instead of your system's word. The rules allow an exception for char* (including signed char and unsigned char). It's always assumed that char* aliases other types. However this won't work the other way: there's no assumption that your struct aliases a buffer of chars.
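
Another way around the problem, in line with the memcpy advice elsewhere on this page, is to copy the message into a word buffer rather than alias it. The following is only a sketch: SendMsg and the recording SendWord stub are hypothetical stand-ins, and Msg is declared with std::uint32_t members here (an assumption; the original uses unsigned int) so the word count is explicit.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

struct Msg {
    std::uint32_t a;
    std::uint32_t b;
};

// Stand-in for the real SendWord: it just records every word it is given.
std::vector<std::uint32_t> g_sent;
void SendWord(std::uint32_t w) { g_sent.push_back(w); }

// Hypothetical wrapper: copy the message into words, then send the words.
// No Msg object is ever read through a uint32_t lvalue.
void SendMsg(const Msg& msg) {
    static_assert(sizeof(Msg) % sizeof(std::uint32_t) == 0,
                  "Msg must be a whole number of words");
    std::uint32_t words[sizeof(Msg) / sizeof(std::uint32_t)];
    std::memcpy(words, &msg, sizeof(Msg));
    for (std::uint32_t w : words) {
        SendWord(w);
    }
}
```

The copy costs nothing in practice for a message this small, and the compiler no longer has to guess whether msg and the word buffer overlap.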

Beginner beware

This is only one potential minefield when overlaying two types onto each other. You should also learn about endianness, word alignment, and how to deal with alignment issues through packing structs correctly.

Footnote

1 The types that C 2011 6.5 paragraph 7 allows an lvalue to access are:

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

Buffer filled with different types of data, and strict aliasing

Even though I have always wished there were a nice way, currently there is none. You will have to use the no-strict-aliasing flag of the compiler of your choice (-fno-strict-aliasing in GCC).

For std::bit_cast you will have to wait until C++20. There is no standard-conforming way without using memcpy as far as I know.
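
For illustration, a small sketch that uses std::bit_cast when the standard library provides it and falls back to memcpy otherwise. The function name float_to_bits is hypothetical, and the feature check relies on the __cpp_lib_bit_cast macro that <bit> defines when std::bit_cast is available.

```cpp
#include <cstdint>
#include <cstring>
#if __cplusplus >= 202002L
#include <bit> // defines __cpp_lib_bit_cast when std::bit_cast is available
#endif

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Hypothetical helper: return the bit pattern of a float.
std::uint32_t float_to_bits(float f) {
#if defined(__cpp_lib_bit_cast)
    return std::bit_cast<std::uint32_t>(f); // C++20 path
#else
    std::uint32_t u;                        // pre-C++20 path
    std::memcpy(&u, &f, sizeof u);
    return u;
#endif
}
```

On an IEEE-754 platform both paths give 0x3f800000 for 1.0f.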

Also have a look at this bit_cast proposal and this website.

Does accessing the 4 bytes of a float break C++ aliasing rules

"Is this actually valid C++ code?"

Potentially yes. It has some pre-conditions:

  • std::uint8_t must be an alias of unsigned char
  • sizeof(float) must be 4
  • bytes + 3 mustn't overflow a buffer.

You can add checks to ensure a safe failure to compile if the first two don't hold:

static_assert(std::is_same_v<unsigned char, std::uint8_t>);
static_assert(sizeof(float) == 4);

"I'm not sure whether it violates any aliasing rules."

unsigned char is exempt from such restrictions. std::uint8_t, if it is defined, is in practice an alias of unsigned char, in which case the shown program is well defined. Technically that's not guaranteed by the rules, but the above check will handle the theoretical case where that doesn't apply.

"float is guaranteed to be at least 32 bits long."

It must be exactly 32 bits long for the code to work. It must also have exactly the same bit-level format as on the system where the float was serialised. If it's standard IEEE-754 single precision on both ends then you're good; otherwise all bets are off.
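
A minimal sketch of reading and writing the four bytes with memcpy, with the preconditions above turned into compile-time checks. The names read_float and write_float are made up for this example, and it assumes both ends use the same byte order, which the checks cannot verify.

```cpp
#include <cstring>
#include <limits>

static_assert(sizeof(float) == 4, "float must be exactly 4 bytes");
static_assert(std::numeric_limits<float>::is_iec559, "float must be IEEE-754");

// Hypothetical helper: rebuild a float from 4 bytes in native byte order.
// The caller must guarantee that bytes[0..3] are readable.
float read_float(const unsigned char* bytes) {
    float f;
    std::memcpy(&f, bytes, sizeof f);
    return f;
}

// Hypothetical helper: store the 4 bytes of a float in native byte order.
// The caller must guarantee that bytes[0..3] are writable.
void write_float(float f, unsigned char* bytes) {
    std::memcpy(bytes, &f, sizeof f);
}
```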

Is this strict aliasing violation? Can any type pointer alias a char pointer?

Strict aliasing means that to dereference a T* ptr, there must be a T object at that address, alive obviously. Effectively this means you cannot naively bit-cast between two incompatible types and also that a compiler can assume that no two pointers of incompatible types point to the same location.

The exception is unsigned char, char and std::byte, meaning you can reinterpret_cast any object pointer to a pointer to one of these three types and dereference it.
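
To make the exception concrete, here is a short sketch that reads the bytes of an arbitrary object through an unsigned char pointer. The helper name object_bytes is hypothetical.

```cpp
#include <vector>

// Hypothetical helper: return a copy of the bytes that make up `obj`.
// Reading through unsigned char* is allowed for any object type.
template <class T>
std::vector<unsigned char> object_bytes(const T& obj) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    return std::vector<unsigned char>(p, p + sizeof(T));
}
```

Going the other way, from a char buffer to a T*, is not covered by this exception unless a T object actually lives in the buffer.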

(T*)ptr; is valid because at ptr there exists a T object. That is all that is required, it does not matter how you got that pointer*, through how many casts it went. There are some more requirements when T has constant members but that has to do more with placement new and object resurrection - see this answer if you are interested.

* It probably does matter even when there are no const members, though I am not sure (see the relevant question). @eerorika's answer is more correct in suggesting std::launder or assigning from the placement new expression.

"For the record, a void* can alias any other type pointer, and any type pointer can alias a void*."

That is not true; void is not one of the three allowed types. But I assume you are just misinterpreting the word "alias": strict aliasing only applies when a pointer is dereferenced, and you are of course free to have as many pointers pointing wherever you want as long as you do not dereference them. Since void* cannot be dereferenced, it's a moot point.

Addressing your second example:

char* buffer = (char*)malloc(16); // OK

// Assigning pointers is always defined; the rules only say when
// it is safe to dereference such a pointer.
// You are missing a cast here: pointers cannot be converted implicitly in C++ (C only produces a warning).
float* pFloat = buffer;
// -> float* pFloat = reinterpret_cast<float*>(buffer);

// NOT OK, there is no float at `buffer` - violates strict aliasing.
*pFloat = 6;
// Now there is a float
new (pFloat) float;
// Yes, now it is OK.
*pFloat = 7;
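
A compilable sketch of the corrected sequence, using the pointer returned by the placement new expression rather than a cast of the original buffer pointer. The function name store_and_load is made up for illustration.

```cpp
#include <cstdlib>
#include <new>

// Hypothetical example: create a float in malloc'ed storage, then use it.
float store_and_load(float value) {
    void* raw = std::malloc(sizeof(float)); // malloc'ed storage is suitably aligned
    if (raw == nullptr) {
        throw std::bad_alloc();
    }
    float* p = ::new (raw) float(value);    // a float object now lives at raw
    const float result = *p;                // OK: p points to a live float
    std::free(raw);                         // float is trivially destructible
    return result;
}
```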

Strict aliasing and memory locations

I think that yes, it is legal.

To illustrate my point, let's see this code:

#include <stddef.h> // offsetof
#include <stdlib.h> // malloc

struct S
{
    int i;
    float f;
};
char *p = malloc(sizeof(struct S));

int *i = (int *)(p + offsetof(struct S, i)); // this offset is 0 by definition
*i = 456;
float *f = (float *)(p + offsetof(struct S, f));
*f = 2.71f;

This code is, IMO, clearly legal, and it is equivalent to yours from a compiler point of view, for appropriate values of PaddingBytesFloat() and MAX_PAD.

Note that my code does not use any l-value of type struct S, it is only used to ease the calculation of the paddings.

As I read the standard, malloc'ed memory has no effective type until something is written there. The effective type then becomes the type of whatever was written. Thus the effective type of such memory can be changed at any time by overwriting the memory with a value of a different type, much like a union.

TL;DR: My conclusion is that with dynamic memory you are safe with regard to strict aliasing as long as you read the memory using the same type (or a compatible one) that you last used to write to that memory.


