Why does rounding the floating-point number 1.4999999999999999 produce 2?
In x2 = 1.4999999999999999 and print(round(x2)), there are two operations that affect the value. The round function cannot operate directly on the number 1.4999999999999999 or on the numeral "1.4999999999999999". Its operand must be in the floating-point format that the Python implementation uses.
So, first, 1.4999999999999999 is converted to the floating-point format. Python is not strict about which floating-point format a Python implementation uses, but the IEEE-754 basic 64-bit binary format is common. In this format, the closest representable values to 1.4999999999999999 are 1.5 and 1.4999999999999997779553950749686919152736663818359375. The former is closer to 1.4999999999999999 than the latter is, so the former is used.
Thus, converting 1.4999999999999999 to the floating-point format produces 1.5. Then round(1.5) produces 2.
How to round the display of a float so that the last two digits are always 00?
printf("%.5f\n", (int)(number * 1000 + 0.5) / 1000.0);  /* round to 3 decimals, print 5 */
Different floating point result with optimization enabled - compiler bug?
Intel x86 processors use 80-bit extended precision internally, whereas double is normally 64 bits wide. Different optimization levels affect how often floating-point values are saved from CPU registers to memory, and thus how often they are rounded from 80-bit to 64-bit precision.
Use the -ffloat-store gcc option to get the same floating-point results across optimization levels. Alternatively, use the long double type, which is normally 80 bits wide on gcc, to avoid the rounding from 80-bit to 64-bit precision.
man gcc says it all:
-ffloat-store
Do not store floating point variables in registers, and inhibit
other options that might change whether a floating point value is
taken from a register or memory.
This option prevents undesirable excess precision on machines such
as the 68000 where the floating registers (of the 68881) keep more
precision than a "double" is supposed to have. Similarly for the
x86 architecture. For most programs, the excess precision does
only good, but a few programs rely on the precise definition of
IEEE floating point. Use -ffloat-store for such programs, after
modifying them to store all pertinent intermediate computations
into variables.
In x86_64 builds, compilers use SSE registers for float and double by default, so no extended precision is used and this issue does not occur. The gcc option -mfpmath controls this.