What Is a Simple Example of Floating Point/Rounding Error

Avoid rounding error (floating point specifically) in C++

The canonical advice for this topic is to read "What Every Computer Scientist Should Know About Floating-Point Arithmetic", by David Goldberg.

How to deal with float rounding errors

Inaccurate Method

When you are using numbers that require precise calculations, you need to be sure that you aren't doing something like this (and it seems like this is what you are currently doing):

[image from the original answer: error accumulation]

This will result in the accumulation of rounding errors as the process continues, giving you extremely inaccurate data in the long term. In the pictured example, the starting float is actually rounded off 4 times; each time it becomes more and more inaccurate!
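
The images from the original answer are not available here, but the pattern being warned against looks something like the following C++ sketch (my illustration; round2 is a hypothetical helper): each intermediate result is rounded, and the rounded value feeds the next step.

    #include <cmath>
    #include <cstdio>

    // Hypothetical helper: round to 2 decimal places
    double round2(double x) { return std::round(x * 100.0) / 100.0; }

    int main() {
        double price = 10.0 / 3.0;    // 3.333...
        double a = round2(price);     // 1st rounding: 3.33
        double b = round2(a * 3.0);   // 2nd rounding, applied to already-rounded data
        double c = round2(b / 3.0);   // 3rd rounding, the error keeps compounding
        double d = round2(c * 3.0);   // 4th rounding
        std::printf("%.2f\n", d);     // 9.99 - has drifted away from 10.00
    }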



Accurate Method

A better and more accurate way of obtaining numbers is to do this:
[image from the original answer: avoid accumulation of rounding errors]

This will help you avoid the accumulation of rounding errors, because each calculation is based on only one conversion and the results from that conversion are not compounded into the next calculation.
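
A matching sketch of the accurate pattern (again my illustration, using the same hypothetical round2 helper): every calculation works from the original full-precision value, and rounding happens exactly once, for output.

    #include <cmath>
    #include <cstdio>

    // Hypothetical helper: round to 2 decimal places
    double round2(double x) { return std::round(x * 100.0) / 100.0; }

    int main() {
        double price  = 10.0 / 3.0;               // keep the original intact
        double result = price * 3.0 / 3.0 * 3.0;  // same chain, no intermediate rounding
        std::printf("%.2f\n", round2(result));    // 10.00 - rounded only for display
    }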

The best method of attack would be to start at the highest precision that is necessary, then convert on an as-needed basis, but leave the original intact. I would suggest following the process from the second picture that I posted.

I started with integers as the vector coordinates because the game needs nothing more precise for the coordinates, but for all calculations I would still have to change to double vectors to get a clear result (e.g. the intersection between two lines).
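
A sketch of what that might look like (the IVec type and function here are my illustration, not the commenter's code): the coordinates stay as exact integers, and doubles are used only inside the intersection calculation.

    #include <cstdio>

    struct IVec { int x, y; };  // exact integer game coordinates (illustrative)

    // Intersection of the infinite lines through a-b and c-d, computed in double.
    // Returns false when the lines are parallel.
    bool intersect(IVec a, IVec b, IVec c, IVec d, double& ix, double& iy) {
        double d1x = b.x - a.x, d1y = b.y - a.y;
        double d2x = d.x - c.x, d2y = d.y - c.y;
        double denom = d1x * d2y - d1y * d2x;
        if (denom == 0.0) return false;  // parallel lines, no intersection
        double t = ((c.x - a.x) * d2y - (c.y - a.y) * d2x) / denom;
        ix = a.x + t * d1x;
        iy = a.y + t * d1y;
        return true;
    }

    int main() {
        double x, y;
        if (intersect({0, 0}, {4, 4}, {0, 4}, {4, 0}, x, y))
            std::printf("(%g, %g)\n", x, y);  // (2, 2)
    }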

It's important to note that you should not attempt to perform any type of rounding of your values if there is no noticeable impact on your end result; you will simply be doing more work for little to no gain, and may even suffer a performance decrease if done often enough.

Floating point and the avoidance of rounding errors

My guess is that they probably just got lucky for the particular example you happened to choose. For example, the statement 0.1 + 0.1 + 0.1 == 0.3 is true in IEEE754 binary32 arithmetic:

>>> import numpy as np
>>> np.float32(0.1) + np.float32(0.1) + np.float32(0.1) == np.float32(0.3)
True

Based on this posting, the Apple II didn't provide hardware floating point, so the exact details were dependent on whatever the software provided (and it sounds as though different software provided different implementations). If they happened to use the same 24-bit significand (or another that gave similar results), then you would see the same answer.

UPDATE: this document seems to indicate that Applesoft Basic did use a 24-bit significand (not 25, i.e. 24 plus an implicit 1, as the earlier link seemed to suggest), which would explain why you saw the same result as binary32 arithmetic.
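
For comparison, binary32's 24-bit significand (23 stored bits plus the implicit leading 1) is exactly what C++ reports as FLT_MANT_DIG:

    #include <cfloat>
    #include <cstdio>

    int main() {
        // 23 stored significand bits + 1 implicit leading bit = 24
        std::printf("binary32 significand bits: %d\n", FLT_MANT_DIG);  // 24
    }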

Find a real life example of floating point error

The question makes little sense because that isn't how floating point errors work.

Inaccuracies are tiny. They happen in remote decimal places and are only noticeable when you require very high precision levels. After all, IEEE 754 powers the vast majority of computer systems and offers excellent precision. To put it in context, 0.1 kilometres expressed as a float is 0.100000001490116119384765625, which makes it accurate to within about 1.5 µm, if I didn't get my maths wrong.
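
That exact decimal expansion is easy to verify; for instance, this C++ snippet prints it (the float widens to double exactly, so the print introduces no further error):

    #include <cstdio>

    int main() {
        float tenth = 0.1f;  // the nearest binary32 to 0.1
        std::printf("%.27f\n", tenth);  // 0.100000001490116119384765625
    }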

There probably isn't a set of carefully chosen figures and a real-life calculation you'd be expected to use PHP for (an invoice, a stock exchange index...) that renders incorrect results no matter how careful you are with precision levels. Because that's not the problem.

The problem with floating point maths is that it forces you to be extremely careful at every step, and it makes it very easy for bugs to slip in.

For applications where accuracy matters, you can write correct software using floats, but it won't be as easy, maintainable or robust.


Original answer:

This is the best I've got so far (thanks to chtz for the hint):

// Set-up and display settings (shouldn't affect internal calculations or final result)
set_time_limit(0);
ini_set('precision', -1);

// Expected accuracy: 2 decimal positions
$total = 0;
for ($i = 0; $i < 1e9; $i++) {
    $total += 0.01;
    // It's important to NOT round inside the loop, e.g.: $total = round($total + 0.01, 2);
}
var_dump($total, number_format($total, 2));

Output:

float(9999999.825158669)
string(12) "9,999,999.83" // Correct value would be "10,000,000.00"

Unfortunately, it relies on the accumulation of a very large number of precision errors (it needs around 1,000,000,000 of them to happen, and takes more than 4 minutes to run on my PC), so it isn't as real-life as I would have liked, but it certainly illustrates the underlying issue.

Real life example of Floating Point error

These examples from the embedded world (Ariane 5, Patriot) are not floating-point rounding errors stricto sensu. The Ariane 5 bug was a bug in a conversion. The Patriot bug was introduced during adaptations of the software; it involves computations in different precisions with an inherently unrepresentable constant (which happens to be the innocuous-looking 0.10).

There are two problems I foresee with binary floats for monetary values:

  • decimal values as common as 0.10 cannot be represented exactly.

  • If the precision is too small, what could have been a clean overflow raising an exception becomes a hard-to-track loss of precision.

Note that base-10 floating-point formats have been standardized precisely for monetary values: some currencies are worth 1/1000000 of a dollar, are never exchanged in less than thousands, and the maximum amount you may want to be able to represent is proportionally big, so a scalable representation makes sense. The intent is that the mantissa is large enough for the largest sums with the official resolution.
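
The first of those two pitfalls is easy to demonstrate; a minimal C++ sketch (my example, not from the original answer) sums a hundred binary 0.10s and misses the exact 10.00, while integer cents stay exact:

    #include <cstdio>

    int main() {
        double total = 0.0;
        for (int i = 0; i < 100; ++i)
            total += 0.10;                   // 0.10 is not exactly representable
        std::printf("%.17f\n", total);       // 9.99999999999999822..., not 10
        std::printf("%d\n", total == 10.0);  // 0 (false)

        // A common workaround: keep monetary amounts in integer cents.
        long cents = 0;
        for (int i = 0; i < 100; ++i)
            cents += 10;                     // exact integer arithmetic
        std::printf("%ld.%02ld\n", cents / 100, cents % 100);  // 10.00
    }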

Floating point inaccuracy examples

There are basically two major pitfalls people stumble in with floating-point numbers.

  1. The problem of scale. Each FP number has an exponent which determines the overall “scale” of the number, so you can represent either really small values or really large ones, though the number of digits you can devote to that is limited. Adding two numbers of different scale will sometimes result in the smaller one being “eaten”, since there is no way to fit it into the larger scale.

    PS> $a = 1; $b = 0.0000000000000000000000001
    PS> Write-Host a=$a b=$b
    a=1 b=1E-25
    PS> $a + $b
    1

    As an analogy for this case you could picture a large swimming pool and a teaspoon of water. Both are of very different sizes, but individually you can easily grasp how much they roughly are. Pouring the teaspoon into the swimming pool, however, will leave you still with roughly a swimming pool full of water.

    (If the people learning this have trouble with exponential notation, one can also use the values 1 and 100000000000000000000 or so.)

  2. Then there is the problem of binary vs. decimal representation. A number like 0.1 can't be represented exactly with a limited number of binary digits. Some languages mask this, though:

    PS> "{0:N50}" -f 0.1
    0.10000000000000000000000000000000000000000000000000

    But you can “amplify” the representation error by repeatedly adding the numbers together:

    PS> $sum = 0; for ($i = 0; $i -lt 100; $i++) { $sum += 0.1 }; $sum
    9,99999999999998

    I can't think of a nice analogy to properly explain this, though. It's basically the same reason you can represent 1/3 only approximately in decimal: to get the exact value you would need to repeat the 3 indefinitely at the end of the decimal fraction.

    Similarly, binary fractions are good for representing halves, quarters, eighths, etc. but things like a tenth will yield an infinitely repeating stream of binary digits.

  3. Then there is another problem, though most people don't stumble into it unless they're doing huge amounts of numerical work (and those who do already know about it). Since many floating-point numbers are merely approximations of the exact value, for a given approximation f of a real number r there can be infinitely many more real numbers r1, r2, ... which map to exactly the same approximation. Those numbers lie in a certain interval. Let's say rmin is the minimum possible value of r that results in f and rmax the maximum possible value of r for which this holds; then you have an interval [rmin, rmax] in which any number could be your actual number r.

    Now, if you perform calculations on that number—adding, subtracting, multiplying, etc.—you lose precision. Every number is just an approximation, therefore you're actually performing calculations with intervals. The result is an interval too and the approximation error only ever gets larger, thereby widening the interval. You may get back a single number from that calculation. But that's merely one number from the interval of possible results, taking into account precision of your original operands and the precision loss due to the calculation.

    That sort of thing is called interval arithmetic, and at least for me it was part of our maths course at university.
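
    A toy C++ sketch of the idea (my illustration, not part of the original answer; real interval libraries control the FPU rounding mode rather than nudging bounds by one ulp):

    #include <cmath>
    #include <cstdio>

    // Toy interval: a lower and an upper bound enclosing the true real value.
    struct Interval { double lo, hi; };

    // Widen each bound outward by one ulp to cover the rounding of the
    // addition itself (real libraries switch the rounding mode instead).
    Interval add(Interval a, Interval b) {
        return { std::nextafter(a.lo + b.lo, -INFINITY),
                 std::nextafter(a.hi + b.hi, +INFINITY) };
    }

    int main() {
        Interval x   = { 0.1, 0.1 };   // 0.1 is itself already an approximation
        Interval sum = { 0.0, 0.0 };
        for (int i = 0; i < 100; ++i)
            sum = add(sum, x);         // the enclosing interval widens each step
        std::printf("[%.17g, %.17g]\n", sum.lo, sum.hi);
    }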

Floating-point error mess

You can almost always ignore the floating point "errors" while you're performing calculations - they won't make any difference to the end result unless you really care about the 17th significant digit or so.

You normally only need to worry about rounding when you display those values, for which .toFixed(1) would do perfectly well.

Whatever happens, you simply cannot coerce the number 0.6 into exactly that value. The closest IEEE 754 double precision value is exactly 0.59999999999999997779553950749686919152736663818359375, which within typical precision limits in JS is displayed as 0.5999999999999999778.

Indeed JS can't even tell that 0.5999999999999999778 !== (e.g.) 0.5999999999999999300, since their binary representations are the same.
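
If you want to check that exact value yourself, printing the double with enough digits shows it; e.g. in C++:

    #include <cstdio>

    int main() {
        // The exact decimal expansion of the double closest to 0.6:
        std::printf("%.53f\n", 0.6);
        // 0.59999999999999997779553950749686919152736663818359375
    }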


