How to Avoid Floating Point Precision Errors with Floats or Doubles in Java

How to avoid floating point precision errors with floats or doubles in Java?

There is a no exact representation of 0.1 as a float or double. Because of this representation error the results are slightly different from what you expected.

A couple of approaches you can use:

  • When using the double type, only display as many digits as you need. When checking for equality allow for a small tolerance either way.
  • Alternatively use a type that allows you to store the numbers you are trying to represent exactly, for example BigDecimal can represent 0.1 exactly.

Example code for BigDecimal:

BigDecimal step = new BigDecimal("0.1");
for (BigDecimal value = BigDecimal.ZERO;
value.compareTo(BigDecimal.ONE) < 0;
value = value.add(step)) {
System.out.println(value);
}

See it online: ideone

How to actually avoid floating point errors when you need to use float?

I would use a Rational class. There are many out there - this one looks like it should work.

One significant cost will be when the Rational is rendered into a float and one when the denominator is reduced to the gcd. The one I posted keeps the numerator and denominator in fully reduced state at all times which should be quite efficient if you are always adding or subtracting 1/10.

This implementation holds the values normalised (i.e. with consistent sign) but unreduced.

You should choose your implementation to best fit your usage.

Avoid floating-point precision error in algorithm

BigDecimal could help, but I would suggest reading the whole answer.

Floating point errors are 'normal' in a sense, that you cannot store every floating point number exact within a variable. There are many resources out there how to deal with this problem, a few links here:

  1. If you do not know what the actual problem is check this out.
  2. What Every Computer Scientist Should Know About Floating-Poit Arithmetic
  3. IEEE floating point
  4. To give you an idead how to work: Quantity Pattern

Retain precision with double in Java

As others have mentioned, you'll probably want to use the BigDecimal class, if you want to have an exact representation of 11.4.

Now, a little explanation into why this is happening:

The float and double primitive types in Java are floating point numbers, where the number is stored as a binary representation of a fraction and a exponent.

More specifically, a double-precision floating point value such as the double type is a 64-bit value, where:

  • 1 bit denotes the sign (positive or negative).
  • 11 bits for the exponent.
  • 52 bits for the significant digits (the fractional part as a binary).

These parts are combined to produce a double representation of a value.

(Source: Wikipedia: Double precision)

For a detailed description of how floating point values are handled in Java, see the Section 4.2.3: Floating-Point Types, Formats, and Values of the Java Language Specification.

The byte, char, int, long types are fixed-point numbers, which are exact representions of numbers. Unlike fixed point numbers, floating point numbers will some times (safe to assume "most of the time") not be able to return an exact representation of a number. This is the reason why you end up with 11.399999999999 as the result of 5.6 + 5.8.

When requiring a value that is exact, such as 1.5 or 150.1005, you'll want to use one of the fixed-point types, which will be able to represent the number exactly.

As has been mentioned several times already, Java has a BigDecimal class which will handle very large numbers and very small numbers.

From the Java API Reference for the BigDecimal class:

Immutable,
arbitrary-precision signed decimal
numbers. A BigDecimal consists of an
arbitrary precision integer unscaled
value and a 32-bit integer scale. If
zero or positive, the scale is the
number of digits to the right of the
decimal point. If negative, the
unscaled value of the number is
multiplied by ten to the power of the
negation of the scale. The value of
the number represented by the
BigDecimal is therefore (unscaledValue
× 10^-scale).

There has been many questions on Stack Overflow relating to the matter of floating point numbers and its precision. Here is a list of related questions that may be of interest:

  • Why do I see a double variable initialized to some value like 21.4 as 21.399999618530273?
  • How to print really big numbers in C++
  • How is floating point stored? When does it matter?
  • Use Float or Decimal for Accounting Application Dollar Amount?

If you really want to get down to the nitty gritty details of floating point numbers, take a look at What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Why not use Double or Float to represent currency?

Because floats and doubles cannot accurately represent the base 10 multiples that we use for money. This issue isn't just for Java, it's for any programming language that uses base 2 floating-point types.

In base 10, you can write 10.25 as 1025 * 10-2 (an integer times a power of 10). IEEE-754 floating-point numbers are different, but a very simple way to think about them is to multiply by a power of two instead. For instance, you could be looking at 164 * 2-4 (an integer times a power of two), which is also equal to 10.25. That's not how the numbers are represented in memory, but the math implications are the same.

Even in base 10, this notation cannot accurately represent most simple fractions. For instance, you can't represent 1/3: the decimal representation is repeating (0.3333...), so there is no finite integer that you can multiply by a power of 10 to get 1/3. You could settle on a long sequence of 3's and a small exponent, like 333333333 * 10-10, but it is not accurate: if you multiply that by 3, you won't get 1.

However, for the purpose of counting money, at least for countries whose money is valued within an order of magnitude of the US dollar, usually all you need is to be able to store multiples of 10-2, so it doesn't really matter that 1/3 can't be represented.

The problem with floats and doubles is that the vast majority of money-like numbers don't have an exact representation as an integer times a power of 2. In fact, the only multiples of 0.01 between 0 and 1 (which are significant when dealing with money because they're integer cents) that can be represented exactly as an IEEE-754 binary floating-point number are 0, 0.25, 0.5, 0.75 and 1. All the others are off by a small amount. As an analogy to the 0.333333 example, if you take the floating-point value for 0.01 and you multiply it by 10, you won't get 0.1. Instead you will get something like 0.099999999786...

Representing money as a double or float will probably look good at first as the software rounds off the tiny errors, but as you perform more additions, subtractions, multiplications and divisions on inexact numbers, errors will compound and you'll end up with values that are visibly not accurate. This makes floats and doubles inadequate for dealing with money, where perfect accuracy for multiples of base 10 powers is required.

A solution that works in just about any language is to use integers instead, and count cents. For instance, 1025 would be $10.25. Several languages also have built-in types to deal with money. Among others, Java has the BigDecimal class, and Rust has the rust_decimal crate, and C# has the decimal type.

Arithmetic comparison to avoid floating point errors

I think that's a mistake we (Java developers) often make, but in general it can be observed in almost all programming languages — at least the ones I use these days.

The entire thing boils down to how those values are stored in memory — if I'm not mistaken, in Java, floats and doubles are stored using the IEEE 754 standard format.

Anyway, instead of using the equality operator (==), we should rely on relational operators: less than (<) or greater than (>) to compare float and double values, because in the end, floating point numbers are just an approximation, they are not exact.

Another approach (I personally like) is to use Float.compare / Double.compare, if you are more familiar with the semantics of compareTo-kind of comparisons.

Alternatively, I sometimes use also BigDecimal for exact calculations.

How to deal with float rounding errors

Inaccurate Method

When you are using numbers that require Precise calculations you need to be sure that you aren't doing something like: (and this is what it seems like you are currently doing)

error accumulation

This will result in the accumulation of rounding errors as the process continues; giving you extremely innacurate data long-term. In the above example, you are actually rounding off the starting float 4 times, each time it becomes more and more inaccurate!



Accurate Method

A better and more accurate way of obtaining numbers is to do this:
avoid accumulation of rounding errors

This will help you to avoid the accumulation of rounding errors because each calculation is based off of only 1 conversion and the results from that conversion are not compounded into the next calculation.

The best method of attack would be to start at the highest precision that is necessary, then convert on an as-needed basis, but leave the original intact. I would suggest you to follow the process from the second picture that I posted.

I started with integers as the vector coordinates because the game needs nothing more precise for the coordinates, but for all calculations I still would have to change to double vectors to get a clear result (eg. intersection between two lines).

It's important to note that you should not attempt to perform any type of rounding of your values if there is not noticeable impact on your end result; you will simply be doing more work for little to no gain, and may even suffer a performance decrease if done often enough.

How can I handle precision error with float in Java?

If you really care about precision, you should use BigDecimal

https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html

https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/math/BigDecimal.html

How to overcome inaccuracy in Java

You have to take a bit of a zen* approach to floating-point numbers: rather than eliminating the error, learn to live with it.

In practice this usually means doing things like:

  • when displaying the number, use String.format to specify the amount of precision to display (it'll do the appropriate rounding for you)
  • when comparing against an expected value, don't look for equality (==). Instead, look for a small-enough delta: Math.abs(myValue - expectedValue) <= someSmallError

EDIT: For infinity, the same principle applies, but with a tweak: you have to pick some number to be "large enough" to treat as infinity. This is again because you have to learn to live with, rather than solve, imprecise values. In the case of something like tan(90 degrees), a double can't store π/2 with infinite precision, so your input is something very close to, but not exactly, 90 degrees -- and thus, the result is something very big, but not quite infinity. You may ask "why don't they just return Double.POSITIVE_INFINITY when you pass in the closest double to π/2," but that could lead to ambiguity: what if you really wanted the tan of that number, and not 90 degrees? Or, what if (due to previous floating-point error) you had something that was slightly farther from π/2 than the closest possible value, but for your needs it's still π/2? Rather than make arbitrary decisions for you, the JDK treats your close-to-but-not-exactly π/2 number at face value, and thus gives you a big-but-not-infinity result.

For some operations, especially those relating to money, you can use BigDecimal to eliminate floating-point errors: you can really represent values like 0.1 (instead of a value really really close to 0.1, which is the best a float or double can do). But this is much slower, and doesn't help you for things like sin/cos (at least with the built-in libraries).

* this probably isn't actually zen, but in the colloquial sense



Related Topics



Leave a reply



Submit