How to Represent 0.1 in Floating Point Arithmetic and Decimal

I've always pointed people towards Harald Schmidt's online converter, along with the Wikipedia IEEE754-1985 article with its nice pictures.

For the two values in question (0.1 and 0.5), you get (for 0.1):

s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm    1/n
0 01111011 10011001100110011001101
           |  ||  ||  ||  ||  || +- 8388608
           |  ||  ||  ||  ||  |+--- 2097152
           |  ||  ||  ||  ||  +---- 1048576
           |  ||  ||  ||  |+------- 131072
           |  ||  ||  ||  +-------- 65536
           |  ||  ||  |+----------- 8192
           |  ||  ||  +------------ 4096
           |  ||  |+--------------- 512
           |  ||  +---------------- 256
           |  |+------------------- 32
           |  +-------------------- 16
           +----------------------- 2

The sign is positive, that's pretty easy.

The exponent is 64+32+16+8+2+1 = 123 - 127 bias = -4, so the multiplier is 2^-4 or 1/16.

The mantissa is chunky. It consists of 1 (the implicit base) plus (for all those bits with each being worth 1/(2^n) as n starts at 1 and increases to the right), {1/2, 1/16, 1/32, 1/256, 1/512, 1/4096, 1/8192, 1/65536, 1/131072, 1/1048576, 1/2097152, 1/8388608}.

When you add all these up, you get 1.60000002384185791015625.

When you multiply that by the multiplier, you get 0.100000001490116119384765625, which is why they say you cannot represent 0.1 exactly as an IEEE754 float, and why there's so much opportunity on SO for people answering "why doesn't 0.1 + 0.1 + 0.1 == 0.3?"-type questions :-)
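
If you want to check that decoding without doing the arithmetic by hand, here is a small Python sketch (my addition, not part of the original explanation) that pulls the single-precision bit fields apart with the standard struct module:

>>> import struct
>>> bits = struct.unpack('>I', struct.pack('>f', 0.1))[0]
>>> bits >> 31                       # sign bit
0
>>> (bits >> 23) & 0xff              # biased exponent: 123, i.e. 123 - 127 = -4
123
>>> m = bits & 0x7fffff              # the 23 mantissa bits
>>> (1 + m / 2**23) * 2**-4          # implicit 1 plus the fractions, times 2^-4
0.10000000149011612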


The 0.5 example is substantially easier. It's represented as:

s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
0 01111110 00000000000000000000000

which means it's the implicit base, 1, plus no other additives (all the mantissa bits are zero).

The sign is again positive. The exponent is 64+32+16+8+4+2 = 126 - 127 bias = -1. Hence the multiplier is 2^-1, which is 1/2 or 0.5.

So the final value is 1 multiplied by 0.5, or 0.5. Voila!
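
If you'd rather not decode bits by hand, Python's float.hex shows the exact power-of-two form directly (a quick check, not part of the original answer):

>>> (0.5).hex()
'0x1.0000000000000p-1'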


I've sometimes found it easier to think of it in terms of decimal.

The number 1.345 is equivalent to

1 + 3/10   + 4/100 + 5/1000

or:

1 + 3*10^-1 + 4*10^-2 + 5*10^-3

Similarly, the IEEE754 representation for decimal 0.8125 is:

s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
0 01111110 10100000000000000000000

With the implicit base of 1, that's equivalent to the binary:

1.101 * 2^(01111110 - 01111111)

or:

(1 + 1/2 + 1/8) * 2^-1    (no 1/4 since that bit is 0)

which becomes:

(8/8 + 4/8 + 1/8) * 1/2

and then becomes:

13/8 * 1/2 = 0.8125
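
Again, float.hex confirms this: 0x1.a is binary 1.101, and p-1 is the 2^-1 multiplier (a quick check rather than anything authoritative):

>>> (0.8125).hex()
'0x1.a000000000000p-1'
>>> (1 + 1/2 + 1/8) * 2**-1
0.8125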

Why is 0.1 represented correctly in float? (I know why it isn't in the result of 2.0 - 1.9)

System.out.println performs some rounding for floats and doubles. It uses Float.toString(), which itself (in the Oracle JDK) delegates to the FloatingDecimal class - you can have a look at the source of FloatingDecimal#toJavaFormatString() for the gory details.

If you try:

BigDecimal bd = new BigDecimal(0.1f);
System.out.println(bd);

You will see the real value of 0.1f: 0.100000001490116119384765625.

Why is 0.1 * 10 equal to 1.0 in Python?

The exact value of decimal 0.1 can't be represented in 64-bit binary floating-point, so it gets rounded to the nearest representable value, which is 0.1000000000000000055511151231257827021181583404541015625.

However, while the exact value of 0.1000000000000000055511151231257827021181583404541015625 * 10 can be represented in binary, it would take more bits of precision than 64-bit binary floating-point has. The result also gets rounded to the nearest representable value, and it turns out the nearest representable value is exactly 1.0.

Basically, you have two rounding errors, and they happen to cancel.
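
You can watch both roundings happen with decimal.Decimal, which shows a float's exact value (a small demonstration, assuming CPython's usual 64-bit floats):

>>> from decimal import Decimal
>>> Decimal(0.1)                 # the double nearest to 1/10
Decimal('0.1000000000000000055511151231257827021181583404541015625')
>>> Decimal(0.1 * 10)            # after the second rounding: exactly 1
Decimal('1')
>>> 0.1 * 10 == 1.0
True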

Extreme floating point precision: 1.0 - (0.1^200) in Python

The standard fractions.Fraction class provides unlimited-precision exact rational arithmetic.

>>> from fractions import Fraction
>>> x = Fraction(1) - Fraction(1, 10**200)

decimal.Decimal might also suit your needs if you're looking for high-precision rather than exact arithmetic, but you'll have to tweak the context settings to allow more than the default 28 significant digits.

>>> from decimal import Decimal, Context, setcontext
>>> setcontext(Context(prec=1000))
>>> x = Decimal(1) - 1 / Decimal(10**200)
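
For scale, plain floats simply cannot see a difference this small, which is why the exact types above are needed (a quick illustration):

>>> 1.0 - 0.1 ** 200 == 1.0      # the subtraction rounds straight back to 1.0
True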

Is floating point math broken?

Binary floating point math is like this. In most programming languages, it is based on the IEEE 754 standard. The crux of the problem is that numbers are represented in this format as a whole number times a power of two; rational numbers (such as 0.1, which is 1/10) whose denominator is not a power of two cannot be exactly represented.

For 0.1 in the standard binary64 format, the representation can be written exactly as

  • 0.1000000000000000055511151231257827021181583404541015625 in decimal, or
  • 0x1.999999999999ap-4 in C99 hexfloat notation.

In contrast, the rational number 0.1, which is 1/10, can be written exactly as

  • 0.1 in decimal, or
  • 0x1.99999999999999...p-4 in an analogue of C99 hexfloat notation, where the ... represents an unending sequence of 9's.

The constants 0.2 and 0.3 in your program will also be approximations to their true values. It happens that the closest double to 0.2 is larger than the rational number 0.2 but that the closest double to 0.3 is smaller than the rational number 0.3. The sum of 0.1 and 0.2 winds up being larger than the rational number 0.3 and hence disagreeing with the constant in your code.
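
Here's that effect made visible with decimal.Decimal, which prints each double's exact value (a demonstration sketch, not part of the quoted answer):

>>> from decimal import Decimal
>>> Decimal(0.2)    # slightly above the rational 0.2
Decimal('0.200000000000000011102230246251565404236316680908203125')
>>> Decimal(0.3)    # slightly below the rational 0.3
Decimal('0.299999999999999988897769753748434595763683319091796875')
>>> 0.1 + 0.2 == 0.3
False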

A fairly comprehensive treatment of floating-point arithmetic issues is What Every Computer Scientist Should Know About Floating-Point Arithmetic. For an easier-to-digest explanation, see floating-point-gui.de.

Side Note: All positional (base-N) number systems share this problem with precision

Plain old decimal (base 10) numbers have the same issues, which is why numbers like 1/3 end up as 0.333333333...

You've just stumbled on a number (3/10) that happens to be easy to represent with the decimal system, but doesn't fit the binary system. It goes both ways (to some small degree) as well: 1/16 is an ugly number in decimal (0.0625), but in binary it looks as neat as a 10,000th does in decimal (0.0001)** - if we were in the habit of using a base-2 number system in our daily lives, you'd even look at that number and instinctively understand you could arrive there by halving something, halving it again, and again and again.

** Of course, that's not exactly how floating-point numbers are stored in memory (they use a form of scientific notation). However, it does illustrate the point that binary floating-point precision errors tend to crop up because the "real world" numbers we are usually interested in working with are so often powers of ten - but only because we use a decimal number system day-to-day. This is also why we'll say things like 71% instead of "5 out of every 7" (71% is an approximation, since 5/7 can't be represented exactly with any decimal number).

So no: binary floating point numbers are not broken, they just happen to be as imperfect as every other base-N number system :)

Side Side Note: Working with Floats in Programming

In practice, this problem of precision means you need to use rounding functions to round your floating point numbers off to however many decimal places you're interested in before you display them.

You also need to replace equality tests with comparisons that allow some amount of tolerance, which means:

Do not do if (x == y) { ... }

Instead do if (abs(x - y) < myToleranceValue) { ... }.

where abs is the absolute value. myToleranceValue needs to be chosen for your particular application - and it will have a lot to do with how much "wiggle room" you are prepared to allow, and what the largest number you are going to be comparing may be (due to loss of precision issues). Beware of "epsilon" style constants in your language of choice. These are not to be used as tolerance values.
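
In Python terms, that pattern looks like the sketch below; the tolerance 1e-9 is just an example value, and math.isclose offers a relative-tolerance variant (both shown as illustrations, not one-size-fits-all answers):

>>> x, y = 0.1 + 0.2, 0.3
>>> x == y
False
>>> abs(x - y) < 1e-9            # absolute tolerance picked for this scale
True
>>> import math
>>> math.isclose(x, y)           # relative tolerance, default rel_tol=1e-09
True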

Why does the floating-point value of 4*0.1 look nice in Python 3 but 3*0.1 doesn't?

The simple answer is because 3*0.1 != 0.3 due to quantization (roundoff) error (whereas 4*0.1 == 0.4 because multiplying by a power of two is usually an "exact" operation). Python tries to find the shortest string that would round to the desired value, so it can display 4*0.1 as 0.4 as these are equal, but it cannot display 3*0.1 as 0.3 because these are not equal.

You can use the .hex method in Python to view the internal representation of a number (basically, the exact binary floating point value, rather than the base-10 approximation). This can help to explain what's going on under the hood.

>>> (0.1).hex()
'0x1.999999999999ap-4'
>>> (0.3).hex()
'0x1.3333333333333p-2'
>>> (0.1*3).hex()
'0x1.3333333333334p-2'
>>> (0.4).hex()
'0x1.999999999999ap-2'
>>> (0.1*4).hex()
'0x1.999999999999ap-2'

0.1 is 0x1.999999999999a times 2^-4. The "a" at the end means the digit 10 - in other words, 0.1 in binary floating point is very slightly larger than the "exact" value of 0.1 (because the final 0x0.99 is rounded up to 0x0.a). When you multiply this by 4, a power of two, the exponent shifts up (from 2^-4 to 2^-2) but the number is otherwise unchanged, so 4*0.1 == 0.4.

However, when you multiply by 3, the tiny little difference between 0x0.99 and 0x0.a0 (0x0.07) magnifies into a 0x0.15 error, which shows up as a one-digit error in the last position. This causes 0.1*3 to be very slightly larger than the rounded value of 0.3.

Python 3's float repr is designed to be round-trippable, that is, the value shown should be exactly convertible into the original value (float(repr(f)) == f for all floats f). Therefore, it cannot display 0.3 and 0.1*3 exactly the same way, or the two different numbers would end up the same after round-tripping. Consequently, Python 3's repr engine chooses to display one with a slight apparent error.
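
A quick round-trip check (a sketch of the guarantee, using ordinary floats):

>>> 0.1 * 3
0.30000000000000004
>>> float(repr(0.1 * 3)) == 0.1 * 3    # repr shows enough digits to round-trip
True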

Why can't decimal numbers be represented exactly in binary?

Decimal numbers can be represented exactly, if you have enough space - just not by floating binary point numbers. If you use a floating decimal point type (e.g. System.Decimal in .NET) then plenty of values which can't be represented exactly in binary floating point can be exactly represented.

Let's look at it another way - in base 10 which you're likely to be comfortable with, you can't express 1/3 exactly. It's 0.3333333... (recurring). The reason you can't represent 0.1 as a binary floating point number is for exactly the same reason. You can represent 3, and 9, and 27 exactly - but not 1/3, 1/9 or 1/27.

The problem is that 3 is a prime number which isn't a factor of 10. That's not an issue when you want to multiply a number by 3: you can always multiply by an integer without running into problems. But when you divide by a number which is prime and isn't a factor of your base, you can run into trouble (and will do so if you try to divide 1 by that number).

Although 0.1 is usually used as the simplest example of an exact decimal number which can't be represented exactly in binary floating point, arguably 0.2 is a simpler example as it's 1/5 - and 5 is the prime that causes problems between decimal and binary.
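
fractions.Fraction makes the 1/5 example concrete: the double written as 0.2 is really a nearby dyadic rational, not 1/5 itself (a short demonstration, assuming 64-bit floats):

>>> from fractions import Fraction
>>> Fraction(0.2)                # the exact value of the double nearest 1/5
Fraction(3602879701896397, 18014398509481984)
>>> Fraction(0.2) == Fraction(1, 5)
False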



Side note on the problem of finite representations:

Some floating decimal point types have a fixed size, like System.Decimal; others, like java.math.BigDecimal, are "arbitrarily large" - but they'll hit a limit at some point, whether it's system memory or the theoretical maximum size of an array. This is an entirely separate point to the main one of this answer, however. Even if you had a genuinely arbitrarily large number of bits to play with, you still couldn't represent decimal 0.1 exactly in a floating binary point representation. Compare that with the other way round: given an arbitrary number of decimal digits, you can exactly represent any number which is exactly representable as a floating binary point.
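
That last claim is easy to demonstrate: any value of the form n/2^k has a finite decimal expansion, so converting an exact binary fraction to decimal always terminates (a small illustration):

>>> from decimal import Decimal
>>> Decimal(2**-20)              # a pure binary fraction, exact in decimal
Decimal('9.5367431640625E-7')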


