How Many Significant Digits Do Floats and Doubles Have in Java

How many significant digits should I use for double literals in Java?

A double has 15 to 17 significant decimal digits of precision.

I can't answer why those constants appear with 20 decimals, but you'll find that those digits are dropped:

2.7182818284590452354d == 2.718281828459045d
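This can be verified directly; the following is a minimal sketch (class name is illustrative) showing that the compiler rounds a long literal to the nearest double:

```java
public class DoubleLiteralPrecision {
    public static void main(String[] args) {
        // 20 significant digits in the source, but a double holds only ~15-17;
        // the compiler rounds the literal to the nearest representable double.
        double e20 = 2.7182818284590452354d;
        double e16 = 2.718281828459045d;
        System.out.println(e20 == e16);    // true: the extra digits were dropped
        System.out.println(e20 == Math.E); // true: both are the double nearest to e
        System.out.println(e20);           // 2.718281828459045 (shortest round-trip form)
    }
}
```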

Large float and double numbers in java printing/persisting incorrectly. Is this behavior due to number of significant digits?

Java’s Float type (IEEE-754 binary32) effectively has two components:

  • an integer number of units from −16,777,215 to +16,777,215 (2^24 − 1) and
  • a unit that is a power of two from 2^104 to 2^−149.

The smallest unit (within range) that keeps the number of units within range is used.

For example, with 50,000,000,115, we cannot use a unit size of 2048 (2^11), because 50,000,000,115 is about 24,414,062 units of 2048, which is more than 16,777,215 units. So we use a unit size of 4096 (2^12).

50,000,000,115 is exactly 12,207,031.278076171875 units of 4096, but we can only use an integer number of units, so the Float value closest to 50,000,000,115 is 12,207,031 units of 4096, which is 49,999,998,976.
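The arithmetic above can be reproduced in Java; this sketch (illustrative class name) recovers the unit size with Math.ulp:

```java
public class FloatUnits {
    public static void main(String[] args) {
        float f = (float) 50_000_000_115L; // rounded to a whole number of 4096-sized units
        long stored = (long) f;            // the integral value actually stored
        System.out.println(stored);        // 49999998976
        System.out.println(stored / 4096); // 12207031 units
        System.out.println(stored % 4096); // 0: an exact multiple of the unit
        System.out.println(Math.ulp(f));   // 4096.0: the unit (float spacing) at this magnitude
    }
}
```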

The other values in your question are represented similarly, but Java’s rules for formatting numbers with %,f result in limited numbers of decimal digits being used to show the value. So, in some of your examples, we see trailing zeros where the actual mathematical value of the internal number is different.

For Double (IEEE-754 binary64), the two components are:

  • an integer number of units from −9,007,199,254,740,991 to +9,007,199,254,740,991 (2^53 − 1) and
  • a unit that is a power of two from 2^971 to 2^−1074.
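The same effect can be seen for Double once the unit count would exceed 2^53 − 1; a small sketch:

```java
public class DoubleUnits {
    public static void main(String[] args) {
        long max = 9_007_199_254_740_991L;          // 2^53 - 1: largest exact unit count
        System.out.println((long) (double) max);    // survives the round trip unchanged
        long odd = max + 2L;                        // 2^53 + 1: would need a unit of 1
        System.out.println((long) (double) odd);    // 9007199254740992: rounded to 2^53
        System.out.println(Math.ulp((double) odd)); // 2.0: the unit size has doubled
    }
}
```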

Number of significant digits for a floating point type

According to the standard, not all decimal numbers can be stored exactly in memory. Depending on the size of the representation, the error can reach a certain maximum. For float this is 0.0001% (6 significant digits, i.e. a relative error on the order of 10^-6, or 10^-4 %).

In your case the error is (12345.6 − 12345.599609) / 12345.6 ≈ 3.16e-08, far lower than the maximum error for floats.
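The error computation can be checked in Java (sketch, illustrative class name):

```java
public class FloatRelativeError {
    public static void main(String[] args) {
        float f = 12345.6f;         // nearest float to 12345.6
        double stored = f;          // its exact value: 12345.599609375
        double relErr = (12345.6 - stored) / 12345.6;
        System.out.println(stored); // 12345.599609375
        System.out.println(relErr); // about 3.16e-8, well below the 1e-6 bound
    }
}
```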

Retain precision with double in Java

As others have mentioned, you'll probably want to use the BigDecimal class, if you want to have an exact representation of 11.4.

Now, a little explanation into why this is happening:

The float and double primitive types in Java are floating point numbers, where the number is stored as a binary representation of a fraction and an exponent.

More specifically, the double type is a double-precision floating point value stored in 64 bits, where:

  • 1 bit denotes the sign (positive or negative).
  • 11 bits for the exponent.
  • 52 bits for the significand (the fractional part, stored as a binary fraction).

These parts are combined to produce a double representation of a value.

(Source: Wikipedia: Double precision)

For a detailed description of how floating point values are handled in Java, see the Section 4.2.3: Floating-Point Types, Formats, and Values of the Java Language Specification.
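The three fields can be inspected with Double.doubleToLongBits; a minimal sketch (illustrative class name):

```java
public class DoubleBits {
    public static void main(String[] args) {
        long bits = Double.doubleToLongBits(-2.0);
        long sign     = bits >>> 63;             // 1 sign bit
        long exponent = (bits >>> 52) & 0x7FFL;  // 11 exponent bits, biased by 1023
        long fraction = bits & 0xFFFFFFFFFFFFFL; // 52 fraction bits
        // -2.0 = (-1)^1 * 1.0 * 2^1, so the biased exponent is 1023 + 1 = 1024
        System.out.println(sign);     // 1
        System.out.println(exponent); // 1024
        System.out.println(fraction); // 0
    }
}
```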

The byte, char, int, and long types are fixed-point numbers, which are exact representations of numbers. Unlike fixed-point numbers, floating point numbers will sometimes (it is safe to assume "most of the time") not be able to represent a number exactly. This is the reason why you end up with 11.399999999999 as the result of 5.6 + 5.8.
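A minimal demonstration of that result (sketch):

```java
public class InexactSum {
    public static void main(String[] args) {
        double sum = 5.6 + 5.8;          // neither operand is exactly representable in binary
        System.out.println(sum);         // 11.399999999999999, not 11.4
        System.out.println(sum == 11.4); // false
    }
}
```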

When requiring a value that is exact, such as 1.5 or 150.1005, you'll want to use a type such as BigDecimal, which will be able to represent the number exactly.

As has been mentioned several times already, Java has a BigDecimal class which will handle very large numbers and very small numbers.

From the Java API Reference for the BigDecimal class:

    Immutable, arbitrary-precision signed decimal numbers. A BigDecimal consists of an arbitrary precision integer unscaled value and a 32-bit integer scale. If zero or positive, the scale is the number of digits to the right of the decimal point. If negative, the unscaled value of the number is multiplied by ten to the power of the negation of the scale. The value of the number represented by the BigDecimal is therefore (unscaledValue × 10^-scale).
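The unscaledValue × 10^-scale relationship, and the exact arithmetic it enables, can be seen in a short sketch:

```java
import java.math.BigDecimal;

public class BigDecimalScale {
    public static void main(String[] args) {
        BigDecimal d = new BigDecimal("11.4");
        System.out.println(d.unscaledValue()); // 114
        System.out.println(d.scale());         // 1, so the value is 114 * 10^-1
        // Exact decimal arithmetic, unlike double:
        System.out.println(new BigDecimal("5.6").add(new BigDecimal("5.8"))); // 11.4
    }
}
```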

There have been many questions on Stack Overflow relating to floating point numbers and their precision. Here is a list of related questions that may be of interest:

  • Why do I see a double variable initialized to some value like 21.4 as 21.399999618530273?
  • How to print really big numbers in C++
  • How is floating point stored? When does it matter?
  • Use Float or Decimal for Accounting Application Dollar Amount?

If you really want to get down to the nitty gritty details of floating point numbers, take a look at What Every Computer Scientist Should Know About Floating-Point Arithmetic.

What is the minimum number of significant decimal digits in a floating point literal to represent the value as correct as possible?

What about double or even arbitrary precision? Is there a simple formula to derive the number of digits needed?

From C17 § 5.2.4.2.2 11, FLT_DECIMAL_DIG, DBL_DECIMAL_DIG, LDBL_DECIMAL_DIG:

    number of decimal digits, n, such that any floating-point number with p radix b digits can be rounded to a floating-point number with n decimal digits and back again without change to the value,

    p_max log10 b         if b is a power of 10
    ⌈1 + p_max log10 b⌉   otherwise



But I'm interested in the math behind it. How can one be sure that 9 digits is enough in this case?

Each range of binary floating point like [1.0 ... 2.0), [128.0 ... 256.0), [0.125 ... 0.25) contains 2^(p−1) values uniformly distributed. E.g. with float, p = 24.

Each decade of decimal text with n significant digits in exponential notation, like [1.0 ... 9.999...), [100.0 ... 999.999...), [0.001 ... 0.009999...), contains 9·10^(n−1) values uniformly distributed.

Example: common float:

When p is 24, with 2^24 combinations, n must be at least 8 to form the 16,777,216 combinations needed to round-trip float to decimal text and back distinctly. As the end-points of the decimal decades above may fall well within that set of 2^24 values, the larger decimal values are spaced out further apart. This necessitates a +1 decimal digit.

Example:

Consider the 2 adjacent float values

10.000009_5367431640625
10.000010_49041748046875

Both convert to 8 significant digits decimal text "10.000010". 8 is not enough.

9 is always enough as we do not need more than 167,772,160 to distinguish 16,777,216 float values.
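The adjacent pair above can be checked in Java; this sketch (illustrative class name) formats both with 8 and then with 9 significant digits:

```java
import java.util.Locale;

public class RoundTripDigits {
    public static void main(String[] args) {
        float a = 10.0000095367431640625f; // one float
        float b = Math.nextUp(a);          // its upper neighbor, 10.00001049041748046875
        // 8 significant digits (6 decimals here) collapse both to the same text:
        System.out.println(String.format(Locale.ROOT, "%.6f", a)); // 10.000010
        System.out.println(String.format(Locale.ROOT, "%.6f", b)); // 10.000010
        // 9 significant digits keep them distinct:
        System.out.println(String.format(Locale.ROOT, "%.7f", a)); // 10.0000095
        System.out.println(String.format(Locale.ROOT, "%.7f", b)); // 10.0000105
    }
}
```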


OP also asks about 8388609.499. (Let us only consider float for simplicity.)

That value is nearly half-way between 2 float values.

8388609.0f  // Nearest lower float value
8388609.499 // OP's constant as code
8388610.0f // Nearest upper float value

OP reports: "You can see 8388609.499 needs more than 9 digits to be most accurately converted to float."

And let us review the title "What is the minimum number of significant decimal digits in a floating point literal*1 to represent the value as correct as possible?"

This new question part emphasizes that the value in question is the value of the source code 8388609.499 and not the floating point constant it becomes in emitted code: 8388609.0f.

If we consider the value to be the value of the floating point constant, only up to 9 significant decimal digits are needed to define the floating point constant 8388609.0f. 8388609.49, as source code, is sufficient.

But to get the closest floating point constant based on some number as code yes indeed could take many digits.
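For the constant discussed here, a quick Java check (sketch):

```java
public class NearHalfway {
    public static void main(String[] args) {
        // 8388609.499 sits just below the midpoint (8388609.5) of the two adjacent
        // floats 8388609.0f and 8388610.0f, so the literal rounds down:
        System.out.println(8388609.499f == 8388609.0f); // true
        System.out.println(Math.ulp(8388609.0f));       // 1.0: float spacing just above 2^23
        // 9 significant digits already select the same constant:
        System.out.println(8388609.49f == 8388609.0f);  // true
    }
}
```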

Consider the typical smallest float, FLT_TRUE_MIN, with the exact decimal value of:

0.00000000000000000000000000000000000000000000140129846432481707092372958328991613128026194187651577175706828388979108268586060148663818836212158203125

Half way between that and 0.0 is 0.000..(~39 more zeroes)..0007006..(~ 100 more digits)..15625.

If that last digit were 6 or 4, the closest float would be FLT_TRUE_MIN or 0.0f respectively. So now we have a case where 109 significant digits are "needed" to select between 2 possible floats.
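In Java, Float.MIN_VALUE plays the role of FLT_TRUE_MIN; decimal inputs just above or below the halfway point (about 7.0065e-46) round in opposite directions. A sketch:

```java
public class SubnormalRounding {
    public static void main(String[] args) {
        System.out.println(Float.MIN_VALUE); // 1.4E-45, i.e. 2^-149
        // Halfway between 0 and Float.MIN_VALUE is about 7.0065e-46:
        System.out.println(Float.parseFloat("7.1E-46") == Float.MIN_VALUE); // true
        System.out.println(Float.parseFloat("7.0E-46") == 0.0f);            // true
    }
}
```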

To forgo going over the cliffs of insanity, IEEE-754 has already addressed this.

The number of significant decimal digits a translation (compiler) must examine to be compliant with that spec (not necessarily the C spec) is far more limited, even if the extra digits could translate to another FP value.

IIRC, it is in effect FLT_DECIMAL_DIG + 3. So for a common float, as little as 9 + 3 significant decimal digits may be examined.

[Edit]

correct rounding is only guaranteed for the number of decimal digits required plus 3 for the largest supported binary format.


*1 C does not define "floating point literal", but it does define "floating point constant", so that term is used here.

How Many Decimal Places in a Double (Java)

No.

1.100 and 1.1 are exactly the same value (they are represented exactly the same bit-for-bit in a double).

Therefore you can't ever get that kind of information from a double.

The only thing you can do is to get the minimum number of decimal digits necessary for a decimal number to be parsed into the same double value. And that is as easy as calling Double.toString() and checking how many decimal digits there are.
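A sketch of that approach (illustrative class name):

```java
public class MinimalDigits {
    public static void main(String[] args) {
        double a = 1.100;
        double b = 1.1;
        System.out.println(a == b);    // true: bit-for-bit the same double
        String s = Double.toString(a); // shortest decimal string that parses back to a
        System.out.println(s);         // 1.1
        System.out.println(s.length() - s.indexOf('.') - 1); // 1 decimal digit needed
    }
}
```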


