Why Does the Floating-Point Value of 4*0.1 Look Nice in Python 3 But 3*0.1 Doesn't?

Why does the floating-point value of 4*0.1 look nice in Python 3 but 3*0.1 doesn't?

The simple answer is because 3*0.1 != 0.3 due to quantization (roundoff) error (whereas 4*0.1 == 0.4 because multiplying by a power of two is usually an "exact" operation). Python tries to find the shortest string that would round to the desired value, so it can display 4*0.1 as 0.4 as these are equal, but it cannot display 3*0.1 as 0.3 because these are not equal.
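
For instance, you can check this directly in a Python 3 interpreter:

>>> 3*0.1 == 0.3
False
>>> 4*0.1 == 0.4
True
>>> 3*0.1
0.30000000000000004
>>> 4*0.1
0.4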

You can use the .hex() method of a float in Python to view the internal representation of a number (that is, the exact binary floating-point value, rather than the base-10 approximation). This can help explain what's going on under the hood.

>>> (0.1).hex()
'0x1.999999999999ap-4'
>>> (0.3).hex()
'0x1.3333333333333p-2'
>>> (0.1*3).hex()
'0x1.3333333333334p-2'
>>> (0.4).hex()
'0x1.999999999999ap-2'
>>> (0.1*4).hex()
'0x1.999999999999ap-2'

0.1 is 0x1.999999999999a times 2^-4. The "a" at the end means the digit 10 - in other words, 0.1 in binary floating point is very slightly larger than the "exact" value of 0.1 (because the trailing, infinitely repeating 9s are rounded up, making the last hex digit an "a"). When you multiply this by 4, a power of two, the exponent shifts up (from 2^-4 to 2^-2) but the number is otherwise unchanged, so 4*0.1 == 0.4.

However, when you multiply by 3, the tiny little difference between 0x0.99 and 0x0.a0 (0x0.07) magnifies into a 0x0.15 error, which shows up as a one-digit error in the last position. This causes 0.1*3 to be very slightly larger than the rounded value of 0.3.
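
You can confirm that the difference is exactly one unit in the last place (ulp); on Python 3.9 or newer, math.ulp makes this easy to check:

>>> import math
>>> 0.1*3 - 0.3 == math.ulp(0.3)
True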

Python 3's float repr is designed to be round-trippable, that is, the value shown should be exactly convertible into the original value (float(repr(f)) == f for all floats f). Therefore, it cannot display 0.3 and 0.1*3 exactly the same way, or the two different numbers would end up the same after round-tripping. Consequently, Python 3's repr engine chooses to display one with a slight apparent error.
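
You can see the round-trip guarantee (and why the two values must print differently) like this:

>>> float(repr(0.1*3)) == 0.1*3
True
>>> float(repr(0.3)) == 0.3
True
>>> repr(0.1*3), repr(0.3)
('0.30000000000000004', '0.3')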

Why is 0.1 sometimes printed exactly and sometimes not?

Because the numbers are not equal. Consider the following, where passing a float directly to Decimal gives a better picture of what is going on:

>>> from decimal import Decimal
>>> Decimal(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')
>>> Decimal(0.5 - 0.4)
Decimal('0.09999999999999997779553950749686919152736663818359375')

Notice that the actual floating-point value you get from the literal 0.1 is different from the result of subtracting the floats created from the literals 0.5 and 0.4. This is because those two literals can also carry an error (note that 0.5 can be represented exactly, because it is a power of two).

>>> Decimal(0.5)
Decimal('0.5')
>>> Decimal(0.4)
Decimal('0.40000000000000002220446049250313080847263336181640625')

Note that the string Python prints when you print a number is even more of an approximation, although Python uses an algorithm that produces the shortest representation that will reliably round-trip back to the actual floating-point value:

>>> 0.5 - 0.4
0.09999999999999998
>>> Decimal(0.09999999999999998)
Decimal('0.09999999999999997779553950749686919152736663818359375')
>>> Decimal(0.5 - 0.4)
Decimal('0.09999999999999997779553950749686919152736663818359375')

Note that pretty much any language I can think of prints a truncated form of the floating-point number that is actually represented in hardware. If you want to see more digits, you generally have to use string formatting.
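
In Python, for example, ordinary string formatting will show as many digits as you ask for:

>>> format(0.1, '.20f')
'0.10000000000000000555'
>>> format(0.5 - 0.4, '.20f')
'0.09999999999999997780'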

Why is floating point slicing (slice(0, 1, 0.1)) allowed in Python, but calling the indices method (slice(0, 1, 0.1).indices(10)) raises TypeError?

tl;dr: slice objects are generated by the interpreter when we use square-bracket notation, and allowing their start/stop/step to be arbitrary lets us design our own code that uses them. However, because the built-in list relies on indices() to behave properly, that method has to return list-compatible values (i.e. integers), and if it can't, it raises an error.


When you do

my_obj[1:3:2]

the interpreter essentially translates it* as

my_obj.__getitem__(slice(1, 3, 2))

This is most obvious when using lists, which have special behavior when slices are given, but the same mechanism is also used by other datatypes in various popular libraries (e.g. numpy.ndarray and pandas.DataFrame). These classes implement their own __getitem__() methods, which have their own special ways to handle slices.
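
You can see the translation for yourself with a tiny class whose __getitem__ just hands back whatever key it receives (purely an illustration, not something any library defines):

class Show:
    def __getitem__(self, key):
        # whatever appears in the square brackets arrives here as `key`
        return key

s = Show()
print(s[1:3:2])
# slice(1, 3, 2)
print(s[0:1:0.1])
# slice(0, 1, 0.1)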

Now, the built-in list presumably uses slice.indices() to turn the slice into a concrete (start, stop, step) triple of integers from which it can read the individual elements and return them as a new list. List indices can only be integers, and the developers don't want this functionality to break, so the most consistent approach is to have slice.indices() raise an error when it can't produce integers.
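
You can see both sides of this directly (the exact wording of the TypeError message may differ slightly between Python versions):

>>> slice(0, 10, 2).indices(5)
(0, 5, 2)
>>> slice(0, 1, 0.1).indices(10)
Traceback (most recent call last):
  ...
TypeError: slice indices must be integers or None or have an __index__ method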

They can't restrict slice to having only those values, though, because it's an interpreter-generated object that other user-defined classes might want to use. If you design an object like this:

class myGenerator:
    def __getitem__(self, s):  # s is a slice
        def gen():
            i = s.start
            while i < s.stop:
                yield i
                i += s.step
        return list(gen())

h = myGenerator()
print(h[1:4:.25])
# [1, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0, 3.25, 3.5, 3.75]
print(h[0:1:0.1])
# [0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7, 0.7999999999999999, 0.8999999999999999, 0.9999999999999999]

then it can co-opt slice notation to work however it wants, so we can implement custom behavior for it. But if slice.indices() itself were changed to accept non-integer values, it would break the built-in list - so Python doesn't allow it.


*Technically, for a lot of built-ins, the Python interpreter may take shortcuts and execute hardcoded routines instead of actually translating the notation into method calls. But for our purposes the analogy works well enough, since it does do exactly that for user-defined objects.

Is floating point math broken?

Binary floating point math is like this. In most programming languages, it is based on the IEEE 754 standard. The crux of the problem is that numbers are represented in this format as a whole number times a power of two; rational numbers (such as 0.1, which is 1/10) whose denominator is not a power of two cannot be exactly represented.
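
You can ask Python for that exact whole-number-times-power-of-two form; for example, the double closest to 0.1 is the fraction below, whose denominator is 2**55:

>>> (0.1).as_integer_ratio()
(3602879701896397, 36028797018963968)
>>> 36028797018963968 == 2**55
True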

For 0.1 in the standard binary64 format, the representation can be written exactly as

  • 0.1000000000000000055511151231257827021181583404541015625 in decimal, or
  • 0x1.999999999999ap-4 in C99 hexfloat notation.

In contrast, the rational number 0.1, which is 1/10, can be written exactly as

  • 0.1 in decimal, or
  • 0x1.99999999999999...p-4 in an analogue of C99 hexfloat notation, where the ... represents an unending sequence of 9's.

The constants 0.2 and 0.3 in your program will also be approximations to their true values. It happens that the closest double to 0.2 is larger than the rational number 0.2 but that the closest double to 0.3 is smaller than the rational number 0.3. The sum of 0.1 and 0.2 winds up being larger than the rational number 0.3 and hence disagreeing with the constant in your code.
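
You can see the mismatch directly, and Decimal (as above) shows the exact values involved:

>>> 0.1 + 0.2 == 0.3
False
>>> 0.1 + 0.2
0.30000000000000004
>>> from decimal import Decimal
>>> Decimal(0.1 + 0.2)
Decimal('0.3000000000000000444089209850062616169452667236328125')
>>> Decimal(0.3)
Decimal('0.299999999999999988897769753748434595763683319091796875')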

A fairly comprehensive treatment of floating-point arithmetic issues is What Every Computer Scientist Should Know About Floating-Point Arithmetic. For an easier-to-digest explanation, see floating-point-gui.de.

Side Note: All positional (base-N) number systems share this problem with precision

Plain old decimal (base 10) numbers have the same issues, which is why numbers like 1/3 end up as 0.333333333...

You've just stumbled on a number (3/10) that happens to be easy to represent with the decimal system, but doesn't fit the binary system. It goes both ways (to some small degree) as well: 1/16 is an ugly number in decimal (0.0625), but in binary it looks as neat as a 10,000th does in decimal (0.0001) - if we were in the habit of using a base-2 number system in our daily lives, you'd even look at that number and instinctively understand you could arrive there by halving something, halving it again, and again and again.
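
A quick check in Python shows that 1/16 really is exact in binary floating point:

>>> (0.0625).hex()
'0x1.0000000000000p-4'
>>> from decimal import Decimal
>>> Decimal(0.0625)
Decimal('0.0625')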

Of course, that's not exactly how floating-point numbers are stored in memory (they use a form of scientific notation). However, it does illustrate the point that binary floating-point precision errors tend to crop up because the "real world" numbers we are usually interested in working with are so often powers of ten - but only because we use a decimal number system day-to-day. This is also why we'll say things like 71% instead of "5 out of every 7" (71% is an approximation, since 5/7 can't be represented exactly with any decimal number).

So no: binary floating point numbers are not broken, they just happen to be as imperfect as every other base-N number system :)

Side Side Note: Working with Floats in Programming

In practice, this problem of precision means you need to use rounding functions to round your floating point numbers off to however many decimal places you're interested in before you display them.
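
In Python, for example, you might round only when displaying a result (the variable name here is just illustrative):

>>> x = 0.1 + 0.2
>>> x
0.30000000000000004
>>> round(x, 2)
0.3
>>> print(f"{x:.2f}")
0.30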

You also need to replace equality tests with comparisons that allow some amount of tolerance, which means:

Do not do if (x == y) { ... }

Instead do if (abs(x - y) < myToleranceValue) { ... }.

where abs is the absolute value. myToleranceValue needs to be chosen for your particular application - and it will have a lot to do with how much "wiggle room" you are prepared to allow, and what the largest number you are going to be comparing may be (due to loss of precision issues). Beware of "epsilon" style constants in your language of choice. These can be used as tolerance values but their effectiveness depends on the magnitude (size) of the numbers you're working with, since calculations with large numbers may exceed the epsilon threshold.
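
In Python specifically, math.isclose() (available since Python 3.5) packages up exactly this kind of tolerance-based comparison, with both relative and absolute tolerances:

>>> import math
>>> 0.1 + 0.2 == 0.3
False
>>> math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9)
True
>>> math.isclose(0.1 + 0.2, 0.3, abs_tol=1e-12)
True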

Why does `0.4/2` equal `0.2` while `0.6/3` equals `0.19999999999999998` in Python?

Because the exact values of the floating point results are slightly different.

>>> '%.56f' % 0.4
'0.40000000000000002220446049250313080847263336181640625000'
>>> '%.56f' % (0.4/2)
'0.20000000000000001110223024625156540423631668090820312500'
>>> '%.56f' % 0.6
'0.59999999999999997779553950749686919152736663818359375000'
>>> '%.56f' % (0.6/3)
'0.19999999999999998334665463062265189364552497863769531250'
>>> '%.56f' % 0.2
'0.20000000000000001110223024625156540423631668090820312500'
>>> (0.2 - 0.6/3) == 2.0**-55
True

As you can see, the result that is printed as "0.2" (the value of 0.4/2) is indeed slightly closer to the real number 0.2 than 0.6/3 is. I added the comparison at the end to show the exact value of the difference between the two results. (In case you're curious, the representations above are exact - adding more digits beyond this point just adds more zeroes.)
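
Reusing the .hex() trick from the top of this page, the one-bit difference is visible directly:

>>> (0.4/2).hex()
'0x1.999999999999ap-3'
>>> (0.2).hex()
'0x1.999999999999ap-3'
>>> (0.6/3).hex()
'0x1.9999999999999p-3'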


