Float and Double Datatype in Java

The Wikipedia page on it is a good place to start.

To sum up:

  • float is represented in 32 bits, with 1 sign bit, 8 bits of exponent, and 23 bits of significand (the digits part of a number in scientific notation: in 2.33728 × 10^12, the significand is 2.33728).

  • double is represented in 64 bits, with 1 sign bit, 11 bits of exponent, and 52 bits of significand.

By default, Java uses double to represent its floating-point literals (so a literal 3.14 is typed double, and a float literal needs the f suffix, as in 3.14f). double also gives you a much larger range and more precision, so I would strongly encourage its use over float.
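To make the literal rules concrete, here is a small sketch; the class name LiteralTypes is just a hypothetical name for illustration. It shows that an unsuffixed literal is a double, that a float literal needs the f suffix, and how far apart the two MAX_VALUE constants are.

```java
public class LiteralTypes {
    public static void main(String[] args) {
        double d = 3.14;      // an unsuffixed floating-point literal is a double
        float f = 3.14f;      // the f (or F) suffix is required for a float literal
        // float bad = 3.14;  // would not compile: possible lossy conversion from double to float

        // The largest finite values show how much wider double's range is
        System.out.println("Float.MAX_VALUE  = " + Float.MAX_VALUE);   // 3.4028235E38
        System.out.println("Double.MAX_VALUE = " + Double.MAX_VALUE);  // 1.7976931348623157E308

        // Widening the float back to double exposes its coarser rounding of 3.14
        System.out.println("3.14f widened    = " + (double) f);
        System.out.println("3.14 as double   = " + d);
    }
}
```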

Some libraries may force you to use float, but in general, unless you can guarantee that your results will fit within float's range and precision, it's best to opt for double.

If you require exact decimal results (for instance, you can't tolerate an inexact sum like 1/10 + 2/10), or you're doing anything with currency (for example, representing $10.33 in the system), use BigDecimal, which supports an arbitrary amount of precision and handles situations like these elegantly.
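A minimal sketch of the 1/10 + 2/10 case and a currency amount, assuming the BigDecimal values are built from strings (new BigDecimal(0.1) would copy the binary error instead). The three-way split of $10.33 and the class name are made up for illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalAccuracy {
    public static void main(String[] args) {
        // Binary doubles cannot store 0.1 or 0.2 exactly, so the sum is slightly off
        double binarySum = 0.1 + 0.2;
        System.out.println(binarySum);          // 0.30000000000000004
        System.out.println(binarySum == 0.3);   // false

        // BigDecimal built from strings keeps the decimal digits exactly
        BigDecimal decimalSum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        System.out.println(decimalSum);                                        // 0.3
        System.out.println(decimalSum.compareTo(new BigDecimal("0.3")) == 0);  // true

        // A currency amount of $10.33, split three ways and rounded to cents
        BigDecimal price = new BigDecimal("10.33");
        BigDecimal share = price.divide(BigDecimal.valueOf(3), 2, RoundingMode.HALF_EVEN);
        System.out.println(share);  // 3.44
    }
}
```

Note that compareTo is used rather than equals, because BigDecimal.equals also compares the scale (0.3 and 0.30 are not equal under equals).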

What is the inclusive range of float and double in Java?

Java's Double class has members containing the Min and Max value for the type.

2^-1074 <= x <= (2 - 2^-52) · 2^1023, where x is a positive, finite double.

Check out the MIN_VALUE and MAX_VALUE static final fields of Double (and their counterparts in Float).
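As a quick check of the formula above, this sketch rebuilds both bounds with Math.scalb (which multiplies by a power of two) and compares them with the constants; the class name is hypothetical.

```java
public class DoubleRange {
    public static void main(String[] args) {
        // Smallest positive (subnormal) double: 2^-1074
        double smallest = Math.scalb(1.0, -1074);
        // Largest finite double: (2 - 2^-52) * 2^1023
        double largest = Math.scalb(2.0 - Math.scalb(1.0, -52), 1023);

        System.out.println(Double.MIN_VALUE);              // 4.9E-324
        System.out.println(Double.MAX_VALUE);              // 1.7976931348623157E308
        System.out.println(smallest == Double.MIN_VALUE);  // true
        System.out.println(largest == Double.MAX_VALUE);   // true

        // Float has the analogous constants with a narrower range
        System.out.println(Float.MIN_VALUE);  // 1.4E-45
        System.out.println(Float.MAX_VALUE);  // 3.4028235E38
    }
}
```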

Some people advise against using floating point types where accuracy and precision are critical, because rounding errors can throw off calculations by small but measurable amounts.

Why different results? float vs double

Why are the results different?

In a general sense:

  • Because the binary representations for float and double are different.
  • Therefore the differences (the errors) between decimal and binary floating point representations are liable to be different in float versus double.
  • When the representation errors are different for the respective numbers, the errors after calculation are liable to be different.

Errors can creep in and compound when converting the decimal numbers to binary, when doing the arithmetic, and when converting the binary result back to decimal to print it. They are inherent in, and unavoidable for, any computation that represents real numbers with a finite number of bits on a practical computer.
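A small illustration of the general points above: the same sum computed in float and in double. With these particular inputs the float rounding errors happen to cancel and the double ones don't; other inputs can go the other way, so treat it as an example rather than a rule. The class name is hypothetical.

```java
public class FloatVsDouble {
    public static void main(String[] args) {
        // The same decimal computation carried out in two binary formats
        float  fSum = 0.1f + 0.2f;
        double dSum = 0.1 + 0.2;

        System.out.println(fSum);          // 0.3
        System.out.println(dSum);          // 0.30000000000000004
        System.out.println(fSum == 0.3f);  // true
        System.out.println(dSum == 0.3);   // false
    }
}
```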

For a broader treatment, read: Is floating point math broken?


Now, if you were so inclined, you could examine the binary representations of the numbers and work out precisely where the errors occur:

  • in the decimal -> binary floating point conversion
  • in the floating point arithmetic
  • in the binary floating point -> decimal conversion,
  • or in more than one of the above.

If you really want to dig into it, I suggest that you take a look at the Float.floatToRawIntBits method and its double analog, Double.doubleToRawLongBits. These let you examine the binary floating point representations. You can then manually convert them to exact real numbers and work out the errors compared with the "ideal" decimal representations.

It is tedious.
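A minimal sketch of the first steps, using the methods named above plus the BigDecimal(double) constructor, which converts the stored binary value to its exact decimal expansion (the class name is hypothetical):

```java
import java.math.BigDecimal;

public class ExactValues {
    public static void main(String[] args) {
        // Raw IEEE 754 bit patterns (sign, exponent and fraction packed together)
        System.out.println(Integer.toHexString(Float.floatToRawIntBits(0.1f)));  // 3dcccccd
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(0.1)));   // 3fb999999999999a

        // new BigDecimal(double) converts the stored binary value exactly,
        // which exposes the representation error of each format
        System.out.println(new BigDecimal(0.1f));  // 0.100000001490116119384765625
        System.out.println(new BigDecimal(0.1));   // 0.1000000000000000055511151231257827021181583404541015625
    }
}
```

Passing 0.1f to the BigDecimal constructor widens it to double first, which is lossless, so the printed value is exactly the float's stored value.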

Float and double types range in Java

The book states MIN_VALUE and MAX_VALUE for the floating point types. These constants give the smallest positive value and the largest finite value, but it is certainly not the case that all values must fall between MIN_VALUE and MAX_VALUE, as you can easily confirm by assigning zero or a negative number to a float variable.

Floating point values (float and double) can be one of the following:

  • NaN (not a number)
  • negative infinity
  • a negative number between -MAX_VALUE and -MIN_VALUE
  • negative zero
  • positive zero
  • a positive number between MIN_VALUE and MAX_VALUE
  • positive infinity
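A short sketch that produces several of these categories with ordinary double arithmetic (the class name is just for illustration). Note that NaN is not equal to itself, and that the two zeros compare equal with == even though they are distinct values:

```java
public class SpecialValues {
    public static void main(String[] args) {
        double nan = 0.0 / 0.0;      // NaN
        double negInf = -1.0 / 0.0;  // negative infinity
        double posInf = 1.0 / 0.0;   // positive infinity
        double negZero = -0.0;       // negative zero

        System.out.println(nan + " " + negInf + " " + posInf + " " + negZero); // NaN -Infinity Infinity -0.0

        // NaN compares unequal to everything, including itself
        System.out.println(nan == nan);         // false
        System.out.println(Double.isNaN(nan));  // true

        // The two zeros are equal under ==, but still distinguishable
        System.out.println(negZero == 0.0);                    // true
        System.out.println(Double.compare(negZero, 0.0) < 0);  // true
        System.out.println(1.0 / negZero);                     // -Infinity
    }
}
```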

Overflow and Underflow in Java Float and Double Data Types

These "weird" results are not really specific to Java. Floating point numbers as defined by the relevant IEEE standard (IEEE 754) are much more complicated than most people suspect. But on to your specific results: Float.MIN_VALUE is the smallest positive float (about 1.4E-45), so it's very close to 0. Hence Float.MIN_VALUE - 1 will be very close to -1. But since the gap between adjacent floats near -1 (on the order of 1E-7) is vastly larger than that difference, the result rounds to -1. As for Float.MAX_VALUE, the gap between adjacent floats around this value is much greater than 1 (2^104, about 2E31), so adding one doesn't change the result.
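The behaviour described above can be reproduced with a few lines; the class name is hypothetical. The last two lines show genuine overflow to infinity and underflow to zero for contrast:

```java
public class OverflowUnderflow {
    public static void main(String[] args) {
        // MIN_VALUE is far smaller than the spacing of floats near -1, so it vanishes
        System.out.println(Float.MIN_VALUE - 1);  // -1.0

        // Adjacent floats near MAX_VALUE are 2^104 apart, so adding 1 changes nothing
        System.out.println(Float.MAX_VALUE + 1 == Float.MAX_VALUE);              // true
        System.out.println(Math.ulp(Float.MAX_VALUE) == Math.scalb(1.0f, 104));  // true

        // For contrast: genuine overflow and underflow
        System.out.println(Float.MAX_VALUE * 2);  // Infinity
        System.out.println(Float.MIN_VALUE / 2);  // 0.0 (an exact tie, rounded to even)
    }
}
```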

Retain precision with double in Java

As others have mentioned, you'll probably want to use the BigDecimal class if you want an exact representation of 11.4.

Now, a little explanation into why this is happening:

The float and double primitive types in Java are floating point numbers, where the number is stored as a binary representation of a fraction and an exponent.

More specifically, the double type is a 64-bit IEEE 754 double-precision floating point value, where:

  • 1 bit denotes the sign (positive or negative).
  • 11 bits for the exponent.
  • 52 bits for the significand (the fractional part, in binary).

These parts are combined to produce a double representation of a value.
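To see the three fields for an actual value, you can pull them out of the raw bits. This is a sketch that assumes a normal number (not subnormal, infinite or NaN); the class name and the choice of 11.4 are just for illustration:

```java
public class DoubleFields {
    public static void main(String[] args) {
        double value = 11.4;
        long bits = Double.doubleToRawLongBits(value);

        long sign     = bits >>> 63;              // 1 sign bit
        long exponent = (bits >>> 52) & 0x7FFL;   // 11 exponent bits, stored with a bias of 1023
        long fraction = bits & 0xFFFFFFFFFFFFFL;  // 52 fraction bits

        System.out.println("sign     = " + sign);               // 0
        System.out.println("exponent = " + (exponent - 1023));  // 3, since 11.4 is about 1.425 * 2^3
        System.out.println("fraction = 0x" + Long.toHexString(fraction));

        // Reassemble: (-1)^sign * 1.fraction * 2^exponent (valid for normal numbers only)
        double rebuilt = (sign == 0 ? 1.0 : -1.0)
                * Math.scalb(1.0 + fraction / (double) (1L << 52), (int) (exponent - 1023));
        System.out.println(rebuilt == value);  // true
    }
}
```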

(Source: Wikipedia: Double precision)

For a detailed description of how floating point values are handled in Java, see Section 4.2.3, "Floating-Point Types, Formats, and Values", of the Java Language Specification.

The byte, char, int and long types are integer (fixed-point) types, which represent every whole number in their range exactly. Unlike them, floating point numbers will sometimes (it's safe to assume "most of the time" for arbitrary decimal values) be unable to represent a number exactly. This is why you end up with something like 11.399999999999999 instead of 11.4 as the result of 5.6 + 5.8.
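The 5.6 + 5.8 case, as a sketch (the class name is hypothetical). Printing new BigDecimal(5.6) and new BigDecimal(5.8) shows the exact binary values that actually get added:

```java
import java.math.BigDecimal;

public class RetainPrecision {
    public static void main(String[] args) {
        double total = 0;
        total += 5.6;
        total += 5.8;
        System.out.println(total);          // slightly below 11.4
        System.out.println(total == 11.4);  // false

        // Exact decimal expansions of the binary values being added
        System.out.println(new BigDecimal(5.6));
        System.out.println(new BigDecimal(5.8));
    }
}
```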

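When you need a decimal value to be exact, such as 150.1005, you'll want a fixed-point representation (for example BigDecimal, or an integer count of the smallest unit), which will be able to represent the number exactly. A value such as 1.5, being a sum of powers of two, is stored exactly even in a double.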
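Two ways of keeping such values exact, sketched under the assumption that amounts are positive and that cents are the smallest unit you care about (both are illustrative choices, as is the class name):

```java
import java.math.BigDecimal;

public class ExactDecimal {
    public static void main(String[] args) {
        // Option 1: BigDecimal built from strings keeps every decimal digit
        BigDecimal sum = new BigDecimal("5.6").add(new BigDecimal("5.8"));
        System.out.println(sum);                                  // 11.4
        System.out.println(new BigDecimal("150.1005").scale());  // 4 decimal places kept

        // Option 2: store an integer count of the smallest unit (here, cents)
        long cents = 560 + 580;                            // $5.60 + $5.80
        System.out.println(BigDecimal.valueOf(cents, 2));  // 11.40, converted only for display

        // 1.5 is a sum of powers of two, so even a double stores it exactly
        System.out.println(new BigDecimal(1.5));  // 1.5
    }
}
```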

As has been mentioned several times already, Java has a BigDecimal class, which can represent very large and very small decimal numbers with arbitrary precision.

From the Java API Reference for the BigDecimal class:

Immutable, arbitrary-precision signed decimal numbers. A BigDecimal consists of an arbitrary precision integer unscaled value and a 32-bit integer scale. If zero or positive, the scale is the number of digits to the right of the decimal point. If negative, the unscaled value of the number is multiplied by ten to the power of the negation of the scale. The value of the number represented by the BigDecimal is therefore (unscaledValue × 10^-scale).
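The unscaled value and scale described in that quote can be inspected directly; this sketch (hypothetical class name) shows a positive scale, a negative scale, and building a value from the two parts:

```java
import java.math.BigDecimal;

public class UnscaledAndScale {
    public static void main(String[] args) {
        BigDecimal positiveScale = new BigDecimal("11.4");
        System.out.println(positiveScale.unscaledValue());  // 114
        System.out.println(positiveScale.scale());          // 1, i.e. 114 × 10^-1

        BigDecimal negativeScale = new BigDecimal("1E+3");
        System.out.println(negativeScale.unscaledValue());  // 1
        System.out.println(negativeScale.scale());          // -3, i.e. 1 × 10^3

        // Building a value directly from an unscaled value and a scale
        System.out.println(BigDecimal.valueOf(1501005, 4));  // 150.1005
    }
}
```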

There have been many questions on Stack Overflow about floating point numbers and their precision. Here is a list of related questions that may be of interest:

  • Why do I see a double variable initialized to some value like 21.4 as 21.399999618530273?
  • How to print really big numbers in C++
  • How is floating point stored? When does it matter?
  • Use Float or Decimal for Accounting Application Dollar Amount?

If you really want to get down to the nitty gritty details of floating point numbers, take a look at What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Are float and double good for storing latitude and longitude?

It's not a matter of safety; it's just a matter of precision.
I wouldn't consider float, but I think double is ideal here.
You just need to check the finest precision you can get out of a double and see whether it fits a regular latitude/longitude value. I think it's more than enough.

Otherwise, BigDecimal is a simple way out of your problem: use it if you need more precision.
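To do the precision check suggested above, you can compare the gap between adjacent representable values at 180 degrees, the largest longitude. The conversion factor of roughly 111,320 m per degree of longitude at the equator is an approximation assumed for this sketch, and the class name is hypothetical:

```java
public class CoordinatePrecision {
    // Approximate length of one degree of longitude at the equator (assumed value)
    static final double METERS_PER_DEGREE = 111_320.0;

    public static void main(String[] args) {
        // Gap between adjacent representable values at 180 degrees
        float  floatStep  = Math.ulp(180.0f);  // 2^-16 degrees
        double doubleStep = Math.ulp(180.0);   // 2^-45 degrees

        // float resolves longitude to roughly 1.7 m, double to a few nanometres
        System.out.println("float  step: " + floatStep  + " deg, about " + floatStep  * METERS_PER_DEGREE + " m");
        System.out.println("double step: " + doubleStep + " deg, about " + doubleStep * METERS_PER_DEGREE + " m");
    }
}
```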


