How to Avoid Floating Point Errors

How to avoid floating point errors?

This really has nothing to do with Python - you'd see the same behavior in any language using your hardware's binary floating-point arithmetic. First, read the "Floating-Point Arithmetic: Issues and Limitations" section of the Python tutorial.

After you read that, you'll better understand that you're not adding one one-hundredth in your code. This is exactly what you're adding:

>>> from decimal import Decimal
>>> Decimal(.01)
Decimal('0.01000000000000000020816681711721685132943093776702880859375')

That string shows the exact decimal value of the binary floating-point ("double precision" in C) approximation to the exact decimal value 0.01. The thing you're really adding is a little bigger than 1/100.

Controlling floating-point numeric errors is the field called "numerical analysis", and is a very large and complex topic. So long as you're startled by the fact that floats are just approximations to decimal values, use the decimal module. That will take away a world of "shallow" problems for you. For example, given this small modification to your function:

from decimal import Decimal as D

def sqrt(num):
    root = D(0)
    while root * root < num:
        root += D("0.01")
    return root

then:

>>> sqrt(4)
Decimal('2.00')
>>> sqrt(9)
Decimal('3.00')

It's not really more accurate, but may be less surprising in simple examples because now it's adding exactly one one-hundredth.
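
To see that it's no more accurate, try a number whose square root isn't a multiple of 0.01 (my example, not from the answer above):

>>> sqrt(2)
Decimal('1.42')

The true value is 1.41421..., so the result is still only as good as the 0.01 step size allows.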

An alternative is to stick to floats and add something that is exactly representable as a binary float: values of the form I/2**J. For example, instead of adding 0.01, add 0.125 (1/8) or 0.0625 (1/16).
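
For instance, stepping by 1/16 keeps every intermediate sum exact even in plain floats (a quick sketch of the idea, not code from the answer):

step = 0.0625          # 1/16 = 1/2**4, exactly representable as a binary float
total = 0.0
for _ in range(16):
    total += step      # every partial sum k/16 is also exactly representable
print(total == 1.0)    # True - no rounding error accumulates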

Then look up "Newton's method" for computing square roots ;-)
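
For reference, here is a bare-bones Newton's-method square root (my sketch, not part of the answer); it converges far faster than stepping by a fixed increment:

def newton_sqrt(num):
    # assumes num > 0; the starting guess is always >= sqrt(num)
    x = num if num >= 1 else 1.0
    while True:
        better = (x + num / x) / 2    # Newton step for f(x) = x*x - num
        if better >= x:               # estimate stopped shrinking: converged
            return x
        x = better

then:

>>> newton_sqrt(4)
2.0
>>> newton_sqrt(9)
3.0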

What is the best way to avoid floating point calculation error here?

The result appears to be error-free, in binary. You're assuming that 48/10 can be expressed exactly ("simple number"). That's true in decimal digits, but not in bits. No surprise, really: 10 is not a power of 2, so dividing by 10 is not a simple matter of moving the point in a floating-point number.

If you need exact decimal numbers, don't use a binary floating-point format like Float. A Decimal type would be more appropriate.
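
To make the difference concrete, here is a quick sketch in Python (my illustration, not from the original answer):

from decimal import Decimal

print(Decimal(48 / 10))           # 4.79999999999999982236... - the exact value of the binary result
print(Decimal(48) / Decimal(10))  # 4.8 - exact, because the division is done in decimal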

C++ How to avoid floating-point arithmetic error

This is because floating-point numbers have only finite precision.

The 0.2 is not really a 0.2, but is internally represented as a slightly different number.

That is why you are seeing a difference.

This is common in all floating point calculations, and you really can't avoid it.
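
That "slightly different number" is easy to see in any language by printing 0.2 with more digits than the default display shows - for example, in Python (my illustration, not part of this answer):

print(f"{0.2:.20f}")   # 0.20000000000000001110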

What is the best way to deal with floating point errors?

There is no simple method of avoiding rounding errors in general-purpose floating-point arithmetic. The number 0.3 does not have an exact binary floating-point representation.

I would suggest reading What Every Computer Scientist Should Know About Floating-Point Arithmetic to familiarize yourself with the trade-offs inherent to floating-point representation of numbers.

To actually solve your issue, you should ask yourself a few questions:

  • How strict is your requirement for precision? Why is 0.30000000000000004 outside your margin for error? Is it acceptable to round your results?
  • Is there any way you could represent your numbers and perform most of your arithmetic with integers? E.g. if you know that you'll only encounter rational numbers, they can be represented using an integer numerator and an integer denominator (see the sketch after this list). From there, you can attempt to defer casting to float for as long as possible to prevent cumulative rounding errors.
  • If you cannot perform your calculations on integers, is there an alternate datatype you can use, such as BigDecimal?
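
As a minimal sketch of the integer-numerator/denominator idea from the list above, here is Python's standard fractions module in action (my illustration; the answer names no specific library):

from fractions import Fraction

total = Fraction(1, 10) + Fraction(2, 10)   # exact rational arithmetic
print(total)                     # 3/10
print(total == Fraction(3, 10))  # True - no rounding error
print(float(total))              # 0.3 - cast to float only at the very end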

Ultimately, when it comes to issues with floating-point precision, you'll often have to tailor the solution to the requirements posed by your specific issue.

How to fix the floating point error in python

You can try whichever of the following methods is most convenient for you:

#for two decimal places use .2f, for three use .3f
val = 0.1 + 0.2
print(f"val = {val:.2f}")

#output | val = 0.30

#or round the value instead
print(round(val, 2))

#output | 0.3

How to actually avoid floating point errors when you need to use float?

I would use a Rational class. There are many out there - this one looks like it should work.

One significant cost will be incurred when the Rational is rendered into a float, and another when the numerator and denominator are reduced by their gcd. The one I posted keeps the numerator and denominator in a fully reduced state at all times, which should be quite efficient if you are always adding or subtracting 1/10.

This implementation holds the values normalised (i.e. with consistent sign) but unreduced.

You should choose your implementation to best fit your usage.
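
As an illustration of the general idea (Python's standard fractions.Fraction is one ready-made Rational class; the implementations discussed above are analogous):

from fractions import Fraction

tenth = Fraction(1, 10)
exact = sum(tenth for _ in range(1000))   # add 1/10 a thousand times, exactly
print(exact)                              # 100
print(sum(0.1 for _ in range(1000)))      # roughly 99.9999999999986 - float drift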

Is floating point math broken?

Binary floating point math is like this. In most programming languages, it is based on the IEEE 754 standard. The crux of the problem is that numbers are represented in this format as a whole number times a power of two; rational numbers (such as 0.1, which is 1/10) whose denominator is not a power of two cannot be exactly represented.

For 0.1 in the standard binary64 format, the representation can be written exactly as

  • 0.1000000000000000055511151231257827021181583404541015625 in decimal, or
  • 0x1.999999999999ap-4 in C99 hexfloat notation.

In contrast, the rational number 0.1, which is 1/10, can be written exactly as

  • 0.1 in decimal, or
  • 0x1.99999999999999...p-4 in an analogue of C99 hexfloat notation, where the ... represents an unending sequence of 9's.

The constants 0.2 and 0.3 in your program will also be approximations to their true values. It happens that the closest double to 0.2 is larger than the rational number 0.2 but that the closest double to 0.3 is smaller than the rational number 0.3. The sum of 0.1 and 0.2 winds up being larger than the rational number 0.3 and hence disagreeing with the constant in your code.
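
You can check this directly (my addition, not part of the quoted answer) by asking for the exact values involved, for example in Python:

from decimal import Decimal

print(0.1 + 0.2)           # 0.30000000000000004
print(Decimal(0.1 + 0.2))  # 0.30000000000000004440... (the computed sum, above 3/10)
print(Decimal(0.3))        # 0.29999999999999998889... (the stored constant, below 3/10)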

A fairly comprehensive treatment of floating-point arithmetic issues is What Every Computer Scientist Should Know About Floating-Point Arithmetic. For an easier-to-digest explanation, see floating-point-gui.de.

Side Note: All positional (base-N) number systems share this problem with precision

Plain old decimal (base 10) numbers have the same issues, which is why numbers like 1/3 end up as 0.333333333...

You've just stumbled on a number (3/10) that happens to be easy to represent with the decimal system, but doesn't fit the binary system. It goes both ways (to some small degree) as well: 1/16 is an ugly number in decimal (0.0625), but in binary it looks as neat as a 10,000th does in decimal (0.0001) - if we were in the habit of using a base-2 number system in our daily lives, you'd even look at that number and instinctively understand you could arrive there by halving something, halving it again, and again and again.

Of course, that's not exactly how floating-point numbers are stored in memory (they use a form of scientific notation). However, it does illustrate the point that binary floating-point precision errors tend to crop up because the "real world" numbers we are usually interested in working with are so often powers of ten - but only because we use a decimal number system day-to-day. This is also why we'll say things like 71% instead of "5 out of every 7" (71% is an approximation, since 5/7 can't be represented exactly with any decimal number).

So no: binary floating point numbers are not broken, they just happen to be as imperfect as every other base-N number system :)

Side Side Note: Working with Floats in Programming

In practice, this problem of precision means you need to use rounding functions to round your floating point numbers off to however many decimal places you're interested in before you display them.

You also need to replace equality tests with comparisons that allow some amount of tolerance, which means:

Do not do if (x == y) { ... }

Instead do if (abs(x - y) < myToleranceValue) { ... }.

where abs is the absolute value. myToleranceValue needs to be chosen for your particular application - and it will have a lot to do with how much "wiggle room" you are prepared to allow, and what the largest number you are going to be comparing may be (due to loss of precision issues). Beware of "epsilon" style constants in your language of choice. These are not to be used as tolerance values.
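
In Python, for example, this tolerance-based comparison is exactly what math.isclose provides (a sketch; the same pattern applies in any language):

import math

x = 0.1 + 0.2
y = 0.3

print(x == y)                             # False - exact equality fails
print(abs(x - y) < 1e-9)                  # True - manual tolerance test
print(math.isclose(x, y, rel_tol=1e-9))   # True - built-in relative-tolerance test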

How to avoid floating point precision errors with floats or doubles in Java?

There is no exact representation of 0.1 as a float or double. Because of this representation error, the results are slightly different from what you expected.

A couple of approaches you can use:

  • When using the double type, only display as many digits as you need. When checking for equality allow for a small tolerance either way.
  • Alternatively use a type that allows you to store the numbers you are trying to represent exactly, for example BigDecimal can represent 0.1 exactly.

Example code for BigDecimal:

BigDecimal step = new BigDecimal("0.1");
for (BigDecimal value = BigDecimal.ZERO;
     value.compareTo(BigDecimal.ONE) < 0;
     value = value.add(step)) {
    System.out.println(value);
}

See it online: ideone


