How Does Excel Successfully Round Floating-Point Numbers Even Though They Are Imprecise?

Why does rounding the floating-point number 1.4999999999999999 produce 2?

In x2 = 1.4999999999999999 and print(round(x2)), there are two operations that affect the value. The round function cannot operate directly on the number 1.4999999999999999 or the numeral “1.4999999999999999”. Its operand must be in the floating-point format that the Python implementation uses.

So, first, 1.4999999999999999 is converted to the floating-point format. Python is not strict about which floating-point format a Python implementation uses, but the IEEE-754 basic 64-bit binary format is common. In this format, the closest representable values to 1.4999999999999999 are 1.5 and 1.4999999999999997779553950749686919152736663818359375. The former is closer to 1.4999999999999999 than the latter is, so the former is used.

Thus, converting 1.4999999999999999 to the floating-point format produces 1.5. Then round(1.5) produces 2.
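
The same two steps can be observed outside Python. Here is a minimal C sketch, assuming the common case where double is the IEEE-754 64-bit binary format (C's round rounds halfway cases away from zero while Python's round rounds them to even, but 1.5 becomes 2 either way):

#include <stdio.h>
#include <math.h>

int main(void) {
    /* The decimal literal is converted to the nearest representable double,
       which is exactly 1.5, not 1.4999999999999997779... */
    double x = 1.4999999999999999;

    printf("%.25f\n", x);      /* prints 1.5 followed by zeros: the stored value is exactly 1.5 */
    printf("%g\n", round(x));  /* prints 2 */
    return 0;
}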

How to round the display of a float so that the last two digits are always 00?

printf("%.5f\n",(int)(number*1000+0.5)/1000.0);

Different floating point result with optimization enabled - compiler bug?

Intel x86 processors use 80-bit extended precision internally in the x87 FPU, whereas double is normally 64 bits wide. Different optimization levels affect how often floating-point values are spilled from CPU registers into memory and thus rounded from 80-bit to 64-bit precision.

Use the -ffloat-store gcc option to get the same floating-point results across optimization levels.

Alternatively, use the long double type, which is normally 80 bits wide on gcc for x86, to avoid the rounding from 80-bit to 64-bit precision.
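
As a minimal sketch of how this shows up, assume a 32-bit x86 build that uses the x87 FPU (for example, gcc -m32 -mfpmath=387). Whether prod below is rounded to 64 bits before the subtraction depends on whether the compiler keeps it in an 80-bit register, which is exactly what the optimization level and -ffloat-store influence; note that the intermediate is deliberately stored in a named variable, as the man page excerpt below recommends:

#include <stdio.h>

int main(void) {
    double third = 1.0 / 3.0;    /* nearest double to one third */
    double prod  = third * 3.0;  /* intermediate kept in a variable on purpose */
    double diff  = prod - 1.0;

    /* With x87 excess precision, prod may stay in an 80-bit register and
       diff can come out as about -5.6e-17 at some optimization levels,
       but as exactly 0 when prod is rounded to 64 bits first (e.g. with
       -ffloat-store, or with SSE math). */
    printf("%.20g\n", diff);
    return 0;
}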

man gcc says it all:

   -ffloat-store
       Do not store floating point variables in registers, and inhibit
       other options that might change whether a floating point value is
       taken from a register or memory.

       This option prevents undesirable excess precision on machines such
       as the 68000 where the floating registers (of the 68881) keep more
       precision than a "double" is supposed to have. Similarly for the
       x86 architecture. For most programs, the excess precision does
       only good, but a few programs rely on the precise definition of
       IEEE floating point. Use -ffloat-store for such programs, after
       modifying them to store all pertinent intermediate computations
       into variables.

In x86_64 builds, compilers use SSE registers for float and double by default, so no extended precision is used and this issue does not occur.

The gcc option -mfpmath controls this (-mfpmath=387 selects the x87 FPU, -mfpmath=sse selects SSE instructions).
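
For example, building the sketch above with gcc -m32 -O2 -mfpmath=387 versus gcc -m32 -O2 -mfpmath=sse -msse2 (or simply as a regular 64-bit build) may print different values for diff, since only the x87 build carries 80-bit intermediates.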


