How to Handle Precision Error with Float in Java

How can I handle precision error with float in Java?

If you really care about precision, you should use BigDecimal

https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html

https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/math/BigDecimal.html

How to avoid floating point precision errors with floats or doubles in Java?

There is a no exact representation of 0.1 as a float or double. Because of this representation error the results are slightly different from what you expected.

A couple of approaches you can use:

  • When using the double type, only display as many digits as you need. When checking for equality allow for a small tolerance either way.
  • Alternatively use a type that allows you to store the numbers you are trying to represent exactly, for example BigDecimal can represent 0.1 exactly.

Example code for BigDecimal:

BigDecimal step = new BigDecimal("0.1");
for (BigDecimal value = BigDecimal.ZERO;
value.compareTo(BigDecimal.ONE) < 0;
value = value.add(step)) {
System.out.println(value);
}

See it online: ideone

Java Floating Point Precision Issue

Pass a String step (and a String value) to the BigDecimal constructor. You can't precisely represent 0.005 as a double (or a float).

 BigDecimal stepBD = new BigDecimal("0.005"); // <-- works as a `String`.

Edit

Or as noted below, use BigDecimal.valueOf(double)

 BigDecimal stepBD = BigDecimal.valueOf(0.005);

Precision lost in float value using java

I love these. But I'll make it quick and painless.

Floats and decimals (aka floating points) aren't kept in the memory precisely either. But I don't want to get into float accuracy vs precision issues here, i'll just point you to a link - http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm

As far as your other question, lemme put it this way, since floats are stored in scientific notation.

floatVal = 6.765432E6 = 6.7 * 10^6
MAX_VALUE = 3.4E38 = 3.4 * 10^38

MAX_VALUE is 32 orders of magnitude bigger than your float number. Feel free to take a look at here as well http://steve.hollasch.net/cgindex/coding/ieeefloat.html

I've spent a great deal of time comparing and fixing some FP issues a few months ago...

Try to use a small delta when comparing floats.
Maybe this link will help you http://introcs.cs.princeton.edu/java/91float/

Retain precision with double in Java

As others have mentioned, you'll probably want to use the BigDecimal class, if you want to have an exact representation of 11.4.

Now, a little explanation into why this is happening:

The float and double primitive types in Java are floating point numbers, where the number is stored as a binary representation of a fraction and a exponent.

More specifically, a double-precision floating point value such as the double type is a 64-bit value, where:

  • 1 bit denotes the sign (positive or negative).
  • 11 bits for the exponent.
  • 52 bits for the significant digits (the fractional part as a binary).

These parts are combined to produce a double representation of a value.

(Source: Wikipedia: Double precision)

For a detailed description of how floating point values are handled in Java, see the Section 4.2.3: Floating-Point Types, Formats, and Values of the Java Language Specification.

The byte, char, int, long types are fixed-point numbers, which are exact representions of numbers. Unlike fixed point numbers, floating point numbers will some times (safe to assume "most of the time") not be able to return an exact representation of a number. This is the reason why you end up with 11.399999999999 as the result of 5.6 + 5.8.

When requiring a value that is exact, such as 1.5 or 150.1005, you'll want to use one of the fixed-point types, which will be able to represent the number exactly.

As has been mentioned several times already, Java has a BigDecimal class which will handle very large numbers and very small numbers.

From the Java API Reference for the BigDecimal class:

Immutable,
arbitrary-precision signed decimal
numbers. A BigDecimal consists of an
arbitrary precision integer unscaled
value and a 32-bit integer scale. If
zero or positive, the scale is the
number of digits to the right of the
decimal point. If negative, the
unscaled value of the number is
multiplied by ten to the power of the
negation of the scale. The value of
the number represented by the
BigDecimal is therefore (unscaledValue
× 10^-scale).

There has been many questions on Stack Overflow relating to the matter of floating point numbers and its precision. Here is a list of related questions that may be of interest:

  • Why do I see a double variable initialized to some value like 21.4 as 21.399999618530273?
  • How to print really big numbers in C++
  • How is floating point stored? When does it matter?
  • Use Float or Decimal for Accounting Application Dollar Amount?

If you really want to get down to the nitty gritty details of floating point numbers, take a look at What Every Computer Scientist Should Know About Floating-Point Arithmetic.

How to deal with float rounding errors

Inaccurate Method

When you are using numbers that require Precise calculations you need to be sure that you aren't doing something like: (and this is what it seems like you are currently doing)

error accumulation

This will result in the accumulation of rounding errors as the process continues; giving you extremely innacurate data long-term. In the above example, you are actually rounding off the starting float 4 times, each time it becomes more and more inaccurate!



Accurate Method

A better and more accurate way of obtaining numbers is to do this:
avoid accumulation of rounding errors

This will help you to avoid the accumulation of rounding errors because each calculation is based off of only 1 conversion and the results from that conversion are not compounded into the next calculation.

The best method of attack would be to start at the highest precision that is necessary, then convert on an as-needed basis, but leave the original intact. I would suggest you to follow the process from the second picture that I posted.

I started with integers as the vector coordinates because the game needs nothing more precise for the coordinates, but for all calculations I still would have to change to double vectors to get a clear result (eg. intersection between two lines).

It's important to note that you should not attempt to perform any type of rounding of your values if there is not noticeable impact on your end result; you will simply be doing more work for little to no gain, and may even suffer a performance decrease if done often enough.

Displayed precision of Java floating-point

The number of digits you see when a float or a double is printed is a consequence of Java’s rules for default conversion of float and double to decimal.

Java’s default formatting for floating-point numbers uses the fewest significant decimal digits needed to distinguish the number from nearby representable numbers.1

In your example, 1.2345678990922222f in source text is converted to the float value 1.2345678806304931640625, because, of all the values representable in the float type, that one is closest to 1.2345678990922222. The next lower and next higher values are 1.23456776142120361328125 and 1.23456799983978271484375.

When printing this value, Java only needs to print “1.2345679”, because that is enough that we can pick out the float value 1.2345678806304931640625 from its neighbors 1.23456776142120361328125 and 1.23456799983978271484375.

For your double example, 1.22222222222222222222d is converted to 1.22222222222222232090871330001391470432281494140625. The next lower and next higher values representable in double are 1.2222222222222220988641083749826066195964813232421875 and 1.2222222222222225429533182250452227890491485595703125. As you can see, to distinguish 1.22222222222222232090871330001391470432281494140625 from its neighbors, Java needs to print “1.2222222222222223”.

Footnote

1 The rule for Java SE 10 can be found in the documentation for java.lang.float, in the toString(float d) section. The double documentation is similar. The passage, with the most relevant part in bold, is:

Returns a string representation of the float argument. All characters mentioned below are ASCII characters.

  • If the argument is NaN, the result is the string "NaN".

  • Otherwise, the result is a string that represents the sign and magnitude (absolute value) of the argument. If the sign is negative, the first character of the result is '-' ('\u002D'); if the sign is positive, no sign character appears in the result. As for the magnitude m:

    • If m is infinity, it is represented by the characters "Infinity"; thus, positive infinity produces the result "Infinity" and negative infinity produces the result "-Infinity".

    • If m is zero, it is represented by the characters "0.0"; thus, negative zero produces the result "-0.0" and positive zero produces the result "0.0".

    • If m is greater than or equal to 10-3 but less than 107, then it is represented as the integer part of m, in decimal form with no leading zeroes, followed by '.' ('\u002E'), followed by one or more decimal digits representing the fractional part of m.

    • If m is less than 10-3 or greater than or equal to 107, then it is represented in so-called "computerized scientific notation." Let n be the unique integer such that 10nm < 10n+1; then let a be the mathematically exact quotient of m and 10n so that 1 ≤ a < 10. The magnitude is then represented as the integer part of a, as a single decimal digit, followed by '.' ('\u002E'), followed by decimal digits representing the fractional part of a, followed by the letter 'E' ('\u0045'), followed by a representation of n as a decimal integer, as produced by the method Integer.toString(int).

How many digits must be printed for the fractional part of m or a? There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type float. That is, suppose that x is the exact mathematical value represented by the decimal representation produced by this method for a finite nonzero argument f. Then f must be the float value nearest to x; or, if two float values are equally close to x, then f must be one of them and the least significant bit of the significand of f must be 0.

possible loss of precision error in java

Floating point literals are by default of double type. And assigning a double value to a float type will result in some precision error. You can either change the type of c to double, or append an f at the end of it to make it float like so:

float c = 30.123f;

Why do floats seem to add incorrectly in Java?

Floats are an approximation of the actual number in Java, due to the way they're stored. If you need exact values, use a BigDecimal instead.



Related Topics



Leave a reply



Submit