Why Are Floating Point Numbers Printed So Differently

Floating-point numbers are printed differently because printing is done for different purposes, so different choices are made about how to do it.

Printing a floating-point number is a conversion operation: A value encoded in an internal format is converted to a decimal numeral. However, there are choices about the details of the conversion.

(A) If you are doing precise mathematics and want to see the actual value represented by the internal format, then the conversion must be exact: It must produce a decimal numeral that has exactly the same value as the input. (Each floating-point number represents exactly one number. A floating-point number, as defined in the IEEE 754 standard, does not represent an interval.) At times, this may require producing a very large number of digits.

(B) If you do not need the exact value but do need to convert back and forth between the internal format and decimal, then you need to convert it to a decimal numeral precisely (and accurately) enough to distinguish it from any other result. That is, you must produce enough digits that the result is different from what you would get by converting numbers that are adjacent in the internal format. This may require producing a large number of digits, but not so many as to be unmanageable.

(C) If you only want to give the reader a sense of the number, and do not need to produce the exact value in order for your application to function as desired, then you only need to produce as many digits as are needed for your particular application.
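
As a concrete (and admittedly C-specific) sketch that is not part of the original discussion, the three styles can all be produced from the same stored value. This assumes IEEE 754 binary64 doubles and a C library whose printf converts with full precision:

#include <stdio.h>

int main(void) {
    double x = 0.1;  /* really the binary64 value nearest to 1/10 */

    /* (A) exact: 55 digits after the point suffice for this particular value */
    printf("%.55f\n", x);

    /* (B) round-trippable: 17 significant digits distinguish any binary64 value */
    printf("%.17g\n", x);

    /* (C) "a sense of the number": C's default of 6 digits after the point */
    printf("%f\n", x);

    return 0;
}

On a typical implementation this prints 0.1000000000000000055511151231257827021181583404541015625, 0.10000000000000001, and 0.100000, respectively.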

Which of these should a conversion do?

Different languages have different defaults because they were developed for different purposes, or because it was not expedient during development to do all the work necessary to produce exact results, or for various other reasons.

(A) requires careful code, and some languages or implementations of them do not provide, or do not guarantee to provide, this behavior.

(B) is required by Java, I believe. However, as we saw in a recent question, it can have some unexpected behavior. (65.12 is printed as “65.12” because that string has enough digits to distinguish it from nearby values, but 65.12 - 2 is printed as “63.120000000000005” because there is another floating-point value between the result and 63.12, so the extra digits are needed to distinguish them.)

(C) is what some languages use by default. It is, in essence, wrong, since no single choice of how many digits to print can be suitable for all applications. Indeed, we have seen over decades that it fosters continuing misconceptions about floating-point arithmetic, largely by concealing the true values involved. It is, however, easy to implement, and hence attractive to some implementors. Ideally, a language should by default print the correct value of a floating-point number. If fewer digits are to be displayed, the number of digits should be selected only by the application implementor, ideally after considering how many digits are needed to produce the desired results.

Worse, some languages, in addition to not displaying the actual value or enough digits to distinguish it, do not even guarantee that the digits produced are correct in some sense (such as being the value you would get by rounding the exact value to the number of digits shown). When programming in an implementation that does not provide a guarantee about this behavior, you are not doing engineering.

Is floating point math broken?

Binary floating point math is like this. In most programming languages, it is based on the IEEE 754 standard. The crux of the problem is that numbers are represented in this format as a whole number times a power of two; rational numbers (such as 0.1, which is 1/10) whose denominator is not a power of two cannot be exactly represented.

For 0.1 in the standard binary64 format, the representation can be written exactly as

  • 0.1000000000000000055511151231257827021181583404541015625 in decimal, or
  • 0x1.999999999999ap-4 in C99 hexfloat notation.

In contrast, the rational number 0.1, which is 1/10, can be written exactly as

  • 0.1 in decimal, or
  • 0x1.99999999999999...p-4 in an analogue of C99 hexfloat notation, where the ... represents an unending sequence of 9's.

The constants 0.2 and 0.3 in your program will also be approximations to their true values. It happens that the closest double to 0.2 is larger than the rational number 0.2 but that the closest double to 0.3 is smaller than the rational number 0.3. The sum of 0.1 and 0.2 winds up being larger than the rational number 0.3 and hence disagreeing with the constant in your code.
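
A small C sketch (not from the original answer, assuming IEEE 754 binary64 doubles) makes the mismatch visible by printing the values with enough digits to distinguish them:

#include <stdio.h>

int main(void) {
    double sum = 0.1 + 0.2;

    printf("%.17g\n", sum);      /* 0.30000000000000004 */
    printf("%.17g\n", 0.3);      /* 0.29999999999999999 */
    printf("%d\n", sum == 0.3);  /* 0: the two doubles are not equal */

    return 0;
}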

A fairly comprehensive treatment of floating-point arithmetic issues is What Every Computer Scientist Should Know About Floating-Point Arithmetic. For an easier-to-digest explanation, see floating-point-gui.de.

Side Note: All positional (base-N) number systems share this precision problem

Plain old decimal (base 10) numbers have the same issues, which is why numbers like 1/3 end up as 0.333333333...

You've just stumbled on a number (3/10) that happens to be easy to represent in the decimal system but doesn't fit the binary system. It goes both ways (to some small degree) as well: 1/16 is an ugly number in decimal (0.0625), but in binary it looks as neat as a 10,000th does in decimal (0.0001). If we were in the habit of using a base-2 number system in our daily lives, you'd even look at that number and instinctively understand you could arrive at it by halving something, halving it again, and again and again.

Of course, that's not exactly how floating-point numbers are stored in memory (they use a form of scientific notation). However, it does illustrate the point that binary floating-point precision errors tend to crop up because the "real world" numbers we are usually interested in working with are so often powers of ten - but only because we use a decimal number system day-to-day. This is also why we'll say things like 71% instead of "5 out of every 7" (71% is an approximation, since 5/7 can't be represented exactly with any finite decimal number).

So no: binary floating point numbers are not broken, they just happen to be as imperfect as every other base-N number system :)

Side Side Note: Working with Floats in Programming

In practice, this problem of precision means you need to use rounding functions to round your floating point numbers off to however many decimal places you're interested in before you display them.

You also need to replace equality tests with comparisons that allow some amount of tolerance, which means:

Do not do if (x == y) { ... }

Instead do if (abs(x - y) < myToleranceValue) { ... }.

where abs is the absolute value. myToleranceValue needs to be chosen for your particular application - and it will have a lot to do with how much "wiggle room" you are prepared to allow, and what the largest number you are going to be comparing may be (due to loss of precision issues). Beware of "epsilon" style constants in your language of choice. These are not to be used as tolerance values.
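
As a hedged illustration (the helper name and the 1e-9 tolerance below are made up for this sketch and must be chosen for your own application), a C version of such a comparison might look like this:

#include <math.h>
#include <stdio.h>

/* Illustrative helper: "close enough" for one particular application's needs. */
static int nearly_equal(double x, double y, double tolerance)
{
    return fabs(x - y) < tolerance;
}

int main(void) {
    double x = 0.1 + 0.2;
    double y = 0.3;

    printf("%d\n", x == y);                    /* 0: exact equality fails        */
    printf("%d\n", nearly_equal(x, y, 1e-9));  /* 1: within the chosen tolerance */

    return 0;
}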

Why does C print float values after the decimal point different from the input value?

Your computer uses binary floating point internally. Type float has 24 bits of precision, which translates to approximately 7 decimal digits of precision.

Your number, 2118850.132, has 10 decimal digits of precision. So right away we can see that it probably won't be possible to represent this number exactly as a float.

Furthermore, due to the properties of binary numbers, no decimal fraction that ends in 1, 2, 3, 4, 6, 7, 8, or 9 (that is, numbers like 0.1 or 0.2 or 0.132) can be exactly represented in binary. So those numbers are always going to experience some conversion or roundoff error.

When you enter the number 2118850.132 as a float, it is converted internally into the binary fraction 1000000101010011000010.01. That's equivalent to the decimal fraction 2118850.25. So that's why the .132 seems to get converted to 0.25.

As I mentioned, float has only 24 bits of precision. You'll notice that 1000000101010011000010.01 is exactly 24 bits long. So we can't, for example, get closer to your original number by using something like 1000000101010011000010.001, which would be equivalent to 2118850.125, which would be closer to your 2118850.132. No, the next lower 24-bit fraction is 1000000101010011000010.00 which is equivalent to 2118850.00, and the next higher one is 1000000101010011000010.10 which is equivalent to 2118850.50, and both of those are farther away from your 2118850.132. So 2118850.25 is as close as you can get with a float.

If you used type double you could get closer. Type double has 53 bits of precision, which translates to approximately 16 decimal digits. But you still have the problem that .132 ends in 2 and so can never be exactly represented in binary. As type double, your number would be represented internally as the binary number 1000000101010011000010.0010000111001010110000001000010 (note 53 bits), which is equivalent to 2118850.132000000216066837310791015625, which is much closer to your 2118850.132, but is still not exact. (Also notice that 2118850.132000000216066837310791015625 begins to diverge from your 2118850.1320000000 after 16 digits.)
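
If you want to check these values yourself, a short C sketch (assuming IEEE 754 float and double, and a C library that converts exactly) prints both representations:

#include <stdio.h>

int main(void) {
    float  f = 2118850.132f;  /* rounds to the nearest float  */
    double d = 2118850.132;   /* rounds to the nearest double */

    printf("%.3f\n", f);      /* 2118850.250 */
    printf("%.30f\n", d);     /* 2118850.132000000216066837310791015625 */

    return 0;
}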

So how do you avoid this? At one level, you can't. It's a fundamental limitation of finite-precision floating-point numbers that they cannot represent all real numbers with perfect accuracy. Also, the fact that computers typically use binary floating-point internally means that they can almost never represent "exact-looking" decimal fractions like .132 exactly.

There are two things you can do:

  1. If you need more than about 7 digits worth of precision, definitely use type double, don't try to use type float.
  2. If you believe your data is accurate to three places past the decimal, print it out using %.3f. If you take 2118850.132 as a double and printf it using %.3f, you'll get 2118850.132, like you want. (But if you printed it with %.12f, you'd get the misleading 2118850.132000000216.) See the sketch just below this list.
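
Here is a minimal C sketch of that second suggestion (not from the original answer, assuming IEEE 754 doubles):

#include <stdio.h>

int main(void) {
    double d = 2118850.132;

    printf("%.3f\n", d);   /* 2118850.132          : what you want        */
    printf("%.12f\n", d);  /* 2118850.132000000216 : misleading precision */

    return 0;
}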

Why is the same floating-point constant printed differently in Fortran and in C?

By default, Fortran's REAL constants are single precision; in C, however, unsuffixed floating-point constants have type double.

When you convert 0.8804418 to single precision and then print it with ten significant digits in C, you get 0.8804417849:

float x = 0.8804418f;  /* nearest single-precision value to 0.8804418 */
printf("%.10g\n", x);  /* prints 0.8804417849 */

Fortran's printout appears to be the same number rounded up.

Fortran's syntax for double-precision REAL constants uses the d exponent letter:

print *, 0.8804418d+0

This prints 0.88044180000000005.

Why is the last number (1) printed?

Because not only is float math flawed; sometimes its representation is flawed too - and that's the case here.

You don't actually get 0.1, 0.2, ... - and that's quite easy to check:

$start = 0;
$stop = 1;
$step = ($stop - $start) / 10;
$i = $start + $step;
while ($i < $stop) {
    print(number_format($i, 32) . "<br />");
    $i += $step;
}

The only difference here, as you see, is that the echo has been replaced with a number_format() call. But the results are drastically different:

0.10000000000000000555111512312578
0.20000000000000001110223024625157
0.30000000000000004440892098500626
0.40000000000000002220446049250313
0.50000000000000000000000000000000
0.59999999999999997779553950749687
0.69999999999999995559107901499374
0.79999999999999993338661852249061
0.89999999999999991118215802998748
0.99999999999999988897769753748435

See? Only once did you actually get an exact value (0.5), because that number can be stored exactly in a float. All the others were only approximations.

How to solve this? Well, one radical approach is to use integers instead of floats in situations like this. It's easy to notice that, had you done it this way...

$start = 0;
$stop = 10;
$step = (int)(($stop - $start) / 10);
$i = $start + $step;
while ($i < $stop) {
    print(number_format($i, 32) . "<br />");
    $i += $step;
}

... it would work fine, since every value in the loop is now an exact integer.

Alternatively, you can use number_format to convert each float to a string and then compare the formatted strings instead of the raw floats. Like this:

$start = 0;
$stop = 1;
$step = ($stop - $start) / 10;
$i = $start + $step;
while (number_format($i, 1) !== number_format($stop, 1)) {
    print(number_format($i, 32) . "\n");
    $i += $step;
}

Why are doubles printed differently in dictionaries?

As already mentioned in the comments, a Double cannot store the value 1.1 exactly. Swift uses (like many other languages) binary floating point numbers according to the IEEE 754 standard.

The closest number to 1.1 that can be represented as a Double is

1.100000000000000088817841970012523233890533447265625

and the closest number to 2.3 that can be represented as a Double is

2.29999999999999982236431605997495353221893310546875

Printing that number means that it is converted to a string with a decimal representation again, and that is done with different precision, depending on how you print the number.

From the source code at HashedCollections.swift.gyb one can see that the description method of Dictionary uses debugPrint() for both keys and values, and debugPrint(x) prints the value of x.debugDescription (if x conforms to CustomDebugStringConvertible).

On the other hand, print(x) calls x.description if x conforms to CustomStringConvertible.

So what you see is the different output of description and debugDescription of Double:

print(1.1.description) // 1.1
print(1.1.debugDescription) // 1.1000000000000001

From the Swift source code one can see that both use the swift_floatingPointToString() function in Stubs.cpp, with the Debug parameter set to false and true, respectively. This parameter controls the precision of the number to string conversion:

int Precision = std::numeric_limits<T>::digits10;
if (Debug) {
    Precision = std::numeric_limits<T>::max_digits10;
}

For the meaning of those constants, see std::numeric_limits:

  • digits10 – number of decimal digits that can be represented without change,
  • max_digits10 – number of decimal digits necessary to differentiate all values of this type.

So description creates a string with fewer decimal digits; that string can be converted to a Double and back to a string giving the same result. debugDescription creates a string with more decimal digits, so that any two different floating-point values will produce a different output.
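
The same two precisions exist in C's <float.h> as DBL_DIG and (since C11) DBL_DECIMAL_DIG, so the effect can be reproduced with a small sketch; this is an analogy, not the Swift implementation itself, and assumes IEEE 754 doubles:

#include <float.h>
#include <stdio.h>

int main(void) {
    double x = 1.1;

    printf("%.*g\n", DBL_DIG, x);          /* 15 significant digits: 1.1                */
    printf("%.*g\n", DBL_DECIMAL_DIG, x);  /* 17 significant digits: 1.1000000000000001 */

    return 0;
}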

Why do floating point operations display different in some languages?

@ouah establishes that the languages are all behaving the same. My answer aims to explain why they appear different. The only two languages that have "different" output are C and Python.

Clearly, every language besides C and Python is just printing out the float value to as many decimal places as it can.

C is easy to explain. You use printf("%f", result) without specifying an explicit precision. Per the C standard, the precision of the %f conversion defaults to 6, so exactly six decimal places are printed, which is what you see. As @ouah notes, setting the precision to 18 will yield the "expected" output. This is lossy: doubles that differ only beyond the sixth decimal place will be printed identically, so the output of %f cannot be relied on to exactly reconstruct the original float.
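
As a quick sketch of that default (the original expression isn't shown here, so 0.1 + 0.2 stands in for it, assuming IEEE 754 doubles):

#include <stdio.h>

int main(void) {
    double result = 0.1 + 0.2;

    printf("%f\n", result);    /* 0.300000             : default precision of 6 */
    printf("%.18f\n", result); /* 0.300000000000000044 : the "expected" output   */

    return 0;
}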

Python is a bit trickier. Python 3.1 introduced a new floating-point repr algorithm, based on work by David Gay. The Python issue corresponding to the feature is here: http://bugs.python.org/issue1580. This feature was backported to Python 2.7 as well.

The intention of this new feature was both to reduce confusion over floating point (though how much it helps is debatable) and, more importantly, to provide more human-readable, shorter representations of floating-point numbers without affecting round-trip behaviour; that is, float(repr(x)) is always equal to x, even if repr(x) is shortened by this algorithm. So, the algorithm manages to produce a shorter floating-point representation while remaining "lossless": win-win!

The official description says this much:

The new algorithm for repr(1.1) is smarter and returns '1.1'. Effectively, it searches all equivalent string representations (ones that get stored with the same underlying float value) and returns the shortest representation.


