Handling Overflow When Casting Doubles to Integers in C

Proper way to fix int overflow in C when we cast int to float to int

I know that a float to int conversion is safe, but an int to float conversion is not.

Each conversion has issues.

(Assume 32-bit int and 32-bit float for discussion.)

Large int to float risks loss of precision, as float cannot exactly encode every int. With OP's int num = 2147483647; (float)num, the value 2147483647 was converted to one of the two nearest float values. With the round-to-nearest rounding mode, the float result was certainly 2147483648.0.

float to int truncates any fraction. Conversions from infinity and Not-a-Number pose additional concerns.

float to int risks undefined behavior (UB) when the floating point value is not inside the -2,147,483,648.9999... to 2,147,483,647.9999... range. This is the case with OP's int num = 2147483647; (int)(float)num, which attempts to convert an out-of-range 2147483648.0 to int. In OP's case, the value apparently wrapped around (2^32 subtracted) to end up as -2147483648.
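The rounding described above for large int values can be observed without ever performing an out-of-range conversion. The following is a minimal sketch, assuming a 32-bit int, an IEEE-754 binary32 float, and a long long wider than int; the helper name int_survives_float is hypothetical and not from the original answer.

```c
#include <limits.h>
#include <stdbool.h>

// Hypothetical helper: returns true when n converts to float without rounding.
// Assumes 32-bit int, IEEE-754 binary32 float, and long long wider than int,
// so converting the float back to long long is always in range.
bool int_survives_float(int n) {
    volatile float f = (float)n;          // volatile: force storage at float precision
    return (long long)f == (long long)n;  // compare in a type that holds both values
}
```

Under those assumptions, int_survives_float(16777216) is true (2^24 is exactly representable), while int_survives_float(16777217) and int_survives_float(INT_MAX) are false because both values are rounded to a neighboring float.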



Which way is best to handle it? Also, what other exceptions should I consider?

With conversion int to float, expect rounding for large int.

With conversion float to int, expect truncation and perhaps test if value is in range.

With 2's complement integer encoding: a test to prevent out of range conversion.

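#include <limits.h>
#include <stdbool.h>

// (INT_MAX + 1) as a float, computed without int overflow or rounding
#define FLT_INT_MAX_PLUS1 ((INT_MAX/2 + 1)*2.0f)

// return true when OK to convert
bool float_to_int_test(float x) {
  // both comparisons are false for NaN, so NaN is rejected too
  return (x < FLT_INT_MAX_PLUS1) && (x - INT_MIN > -1.0f);
}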

Other tests could be added to detect whether rounding or truncation occurred.
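The same range-check idea carries over to double, which is what the title of this page is about. This is a sketch rather than part of the original answer: it assumes a 32-bit two's complement int and an IEEE-754 binary64 double, so that INT_MIN - 1.0 and INT_MAX + 1.0 are exactly representable. The names double_fits_int and double_to_int_checked are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

// Hypothetical helper: true when (int)x is well defined.
// Assumes 32-bit two's complement int and IEEE-754 binary64 double, so the
// two bounds below are computed exactly.
bool double_fits_int(double x) {
    // Both comparisons are false for NaN, so NaN is rejected as well.
    return (x > INT_MIN - 1.0) && (x < INT_MAX + 1.0);
}

// Converts with truncation toward zero. Returns false and leaves *out
// untouched when x is NaN, infinite, or out of range.
bool double_to_int_checked(double x, int *out) {
    if (!double_fits_int(x)) {
        return false;
    }
    *out = (int)x;
    return true;
}
```

A caller would check the return value before using the result, for example to report an error or clamp to INT_MIN or INT_MAX instead of performing the undefined conversion.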

Floating point exception is thrown when casting double to int

As this is UB by the C++ standard, this is of course not specified by the language itself.

However, your implementation follows IEEE-754 – the standard most implementations base their floating point behavior on – in this regard, which states:

When a NaN or infinite operand cannot be represented in the destination format and this cannot otherwise
be indicated, the invalid operation exception shall be signaled. When a numeric operand would convert to
an integer outside the range of the destination format, the invalid operation exception shall be signaled if
this situation cannot otherwise be indicated.

(5.8 "Details of conversions from floating-point to integer formats", emphasis mine)

How those exceptions can be handled when signaled is left to the implementation; enabling a trap for them is one of the possibilities.

Further reading: gcc's documentation on FP exceptions

Errors in Casting Doubles to Integers

It is important to understand that not all rational numbers are representable in finite precision. Also, it is important to understand that set of numbers which are representable in finite precision in decimal base, is different from the set of numbers that are representable in finite precision in binary base. Finally, it is important to understand that your CPU probably represents floating point numbers in binary.

2029.00012 in particular happens to be a number that is not representable in a double precision IEEE 754 floating point (and it indeed is a double precision literal; you may have intended to use long double instead). It so happens that the closest number that is representable is 2029.000119999999924402800388634204864501953125. So, you're counting the significant digits of that number, not the digits of the literal that you used.

If the intention of 0.00001 was to stop counting digits when the number is close to a whole number, it is not sufficient to check whether the value is less than the threshold, but also whether it is greater than 1 - threshold, as the representation error can go either way:

 if(test <= 0.00001 || test >= 1 - 0.00001)

After all, you can multiply 0.99999999999999999999999999 by 10 many times before the result becomes close to zero, even though that number is very close to a whole number.
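The two-sided check can be wrapped in a small helper. This is only a sketch: the name frac_near_whole is hypothetical, and it assumes a non-negative x smaller than 2^63 so that the cast to long long is defined.

```c
#include <stdbool.h>

// Hypothetical helper: true when the fractional part of x lies within eps
// of 0 or within eps of 1, i.e. x is close to a whole number from either side.
// Assumes 0 <= x < 2^63 so the cast to long long is defined.
bool frac_near_whole(double x, double eps) {
    double frac = x - (double)(long long)x;   // fractional part in [0, 1)
    return frac <= eps || frac >= 1.0 - eps;
}
```

With eps = 0.00001, both 3.0000001 and 2.9999999 count as close to a whole number, whereas a one-sided test (frac <= eps) would miss the second case.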


