(.1f+.2f==.3f) != (.1f+.2f).Equals(.3f) Why?

The question is confusingly worded. Let's break it down into many smaller questions:

Why is it that one tenth plus two tenths does not always equal three tenths in floating point arithmetic?

Let me give you an analogy. Suppose we have a math system where all numbers are rounded off to exactly five decimal places. Suppose you say:

x = 1.00000 / 3.00000;

You would expect x to be 0.33333, right? Because that is the closest number in our system to the real answer. Now suppose you said

y = 2.00000 / 3.00000;

You'd expect y to be 0.66667, right? Because again, that is the closest number in our system to the real answer. 0.66666 is farther from two thirds than 0.66667 is.

Notice that in the first case we rounded down and in the second case we rounded up.

Now when we say

q = x + x + x + x;
r = y + x + x;
s = y + y;

what do we get? If we did exact arithmetic then each of these would obviously be four thirds and they would all be equal. But they are not equal. Even though 1.33333 is the closest number in our system to four thirds, only r has that value.

q is 1.33332 -- because x was a little bit small, every addition accumulated that error and the end result is quite a bit too small. Similarly, s is too big; it is 1.33334, because y was a little bit too big. r gets the right answer because the too-big-ness of y is cancelled out by the too-small-ness of x and the result ends up correct.
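
If you want to play with this analogy yourself, here is a minimal sketch that uses C#'s decimal type to simulate five-decimal-place arithmetic; the rounding calls are my addition, not part of the original analogy:

decimal x = Math.Round(1.00000m / 3.00000m, 5);   // 0.33333 -- rounded down
decimal y = Math.Round(2.00000m / 3.00000m, 5);   // 0.66667 -- rounded up

decimal q = x + x + x + x;   // 1.33332 -- the too-small error accumulates
decimal r = y + x + x;       // 1.33333 -- the errors cancel out
decimal s = y + y;           // 1.33334 -- the too-big error accumulates

Console.WriteLine($"{q} {r} {s}");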

Does the number of places of precision have an effect on the magnitude and direction of the error?

Yes; more precision makes the magnitude of the error smaller, but can change whether a calculation accrues a loss or a gain due to the error. For example:

b = 4.00000 / 7.00000;

b would be 0.57143, which rounds up from the true value of 0.571428571... Had we gone to eight places that would be 0.57142857, which has a far, far smaller magnitude of error but in the opposite direction; it rounded down.
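
Continuing that decimal sketch, rounding four sevenths to five and then to eight places shows the error flipping direction:

Console.WriteLine(Math.Round(4.0m / 7.0m, 5));   // 0.57143    -- rounded up from 0.571428571...
Console.WriteLine(Math.Round(4.0m / 7.0m, 8));   // 0.57142857 -- rounded down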

Because changing the precision can change whether an error is a gain or a loss in each individual calculation, this can change whether a given aggregate calculation's errors reinforce each other or cancel each other out. The net result is that sometimes a lower-precision computation is closer to the "true" result than a higher-precision computation because in the lower-precision computation you get lucky and the errors are in different directions.

We would expect that doing a calculation in higher precision always gives an answer closer to the true answer, but this argument shows otherwise. This explains why sometimes a computation in floats gives the "right" answer but a computation in doubles -- which have twice the precision -- gives the "wrong" answer, correct?

Yes, this is exactly what is happening in your examples, except that instead of five digits of decimal precision we have a certain number of digits of binary precision. Just as one-third cannot be accurately represented in five -- or any finite number -- of decimal digits, 0.1, 0.2 and 0.3 cannot be accurately represented in any finite number of binary digits. Some of those will be rounded up, some of them will be rounded down, and whether or not additions of them increase the error or cancel out the error depends on the specific details of how many binary digits are in each system. That is, changes in precision can change the answer for better or worse. Generally the higher the precision, the closer the answer is to the true answer, but not always.
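
You can see the binary representation error directly by printing the float values to nine significant digits; the digits in the comments are what a typical IEEE 754 implementation produces, so treat them as illustrative rather than guaranteed:

float a = 0.1f, b = 0.2f, c = 0.3f;
Console.WriteLine(a.ToString("G9"));         // typically 0.100000001 -- a little more than one tenth
Console.WriteLine(b.ToString("G9"));         // typically 0.200000003 -- a little more than two tenths
Console.WriteLine(c.ToString("G9"));         // typically 0.300000012 -- a little more than three tenths
Console.WriteLine((a + b).ToString("G9"));   // whether this matches c depends on the precision actually used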

How can I get accurate decimal arithmetic computations then, if float and double use binary digits?

If you require accurate decimal math then use the decimal type; it uses decimal fractions, not binary fractions. The price you pay is that it is considerably larger and slower. And of course, as we've already seen, fractions like one third or four sevenths are not going to be represented accurately. Any fraction that is actually a decimal fraction, however, will be represented with zero error, up to about 29 significant digits.
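
For instance, a quick check -- my example, not part of the original answer -- shows decimal behaving exactly for decimal fractions and inexactly for one third:

Console.WriteLine(0.1m + 0.2m == 0.3m);   // True  -- decimal fractions are represented exactly
Console.WriteLine(1m / 3m * 3m == 1m);    // False -- one third still cannot be represented exactly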

OK, I accept that all floating point schemes introduce inaccuracies due to representation error, and that those inaccuracies can sometimes accumulate or cancel each other out based on the number of bits of precision used in the calculation. Do we at least have the guarantee that those inaccuracies will be consistent?

No, you have no such guarantee for floats or doubles. The compiler and the runtime are both permitted to perform floating point calculations in higher precision than is required by the specification. In particular, the compiler and the runtime are permitted to do single-precision (32 bit) arithmetic in 64 bit or 80 bit or 128 bit or whatever bitness greater than 32 they like.

The compiler and the runtime are permitted to do so however they feel like it at the time. They need not be consistent from machine to machine, from run to run, and so on. Since this can only make calculations more accurate this is not considered a bug. It's a feature. A feature that makes it incredibly difficult to write programs that behave predictably, but a feature nevertheless.

So that means that calculations performed at compile time, like the literals 0.1 + 0.2, can give different results than the same calculation performed at runtime with variables?

Yep.

What about comparing the results of 0.1 + 0.2 == 0.3 to (0.1 + 0.2).Equals(0.3)?

Since the first one is computed by the compiler and the second one is computed by the runtime, and I just said that they are permitted to arbitrarily use more precision than required by the specification at their whim, yes, those can give different results. Maybe one of them chooses to do the calculation only in 64 bit precision whereas the other picks 80 bit or 128 bit precision for part or all of the calculation and gets a different answer.
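
For reference, the two expressions in question look like this; as described above, either line may print True or False, and the two need not agree with each other:

Console.WriteLine(.1f + .2f == .3f);          // the comparison is folded by the compiler at compile time
Console.WriteLine((.1f + .2f).Equals(.3f));   // the Equals call is evaluated by the runtime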

So hold up a minute here. You're saying not only that 0.1 + 0.2 == 0.3 can be different than (0.1 + 0.2).Equals(0.3). You're saying that 0.1 + 0.2 == 0.3 can be computed to be true or false entirely at the whim of the compiler. It could produce true on Tuesdays and false on Thursdays, it could produce true on one machine and false on another, it could produce both true and false if the expression appeared twice in the same program. This expression can have either value for any reason whatsoever; the compiler is permitted to be completely unreliable here.

Correct.

The way this is usually reported to the C# compiler team is that someone has some expression that produces true when they compile in debug and false when they compile in release mode. That's the most common situation in which this crops up because the debug and release code generation changes register allocation schemes. But the compiler is permitted to do anything it likes with this expression, so long as it chooses true or false. (It cannot, say, produce a compile-time error.)

This is craziness.

Correct.

Who should I blame for this mess?

Not me, that's for darn sure.

Intel decided to make a floating point math chip in which it was far, far more expensive to produce consistent results. Small choices in the compiler about what operations to enregister vs what operations to keep on the stack can add up to big differences in results.

How do I ensure consistent results?

Use the decimal type, as I said before. Or do all your math in integers.

I have to use doubles or floats; can I do anything to encourage consistent results?

Yes. If you store any result into any static field, any instance field of a class, or any array element of type float or double, then it is guaranteed to be truncated back to 32 or 64 bit precision. (This guarantee is expressly not made for stores to locals or formal parameters.) Also, if you do a runtime cast to (float) or (double) on an expression that is already of that type, then the compiler will emit special code that forces the result to truncate as though it had been assigned to a field or array element. (Casts which execute at compile time -- that is, casts on constant expressions -- are not guaranteed to do so.)
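
Here is a sketch of both techniques; the class and method names are mine, invented purely for illustration:

class FloatConsistency
{
    static float[] scratch = new float[1];

    public static bool CompareWithCast(float x, float y, float z)
    {
        // A redundant runtime cast to (float) forces the sum back to 32 bit precision.
        return (float)(x + y) == z;
    }

    public static bool CompareWithStore(float x, float y, float z)
    {
        // Storing into a float array element (or a float field) also forces truncation.
        scratch[0] = x + y;
        return scratch[0] == z;
    }
}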

To clarify that last point: does the C# language specification make those guarantees?

No. The runtime guarantees that stores into an array or field truncate. The C# specification does not guarantee that an identity cast truncates but the Microsoft implementation has regression tests that ensure that every new version of the compiler has this behaviour.

All the language spec has to say on the subject is that floating point operations may be performed in higher precision at the discretion of the implementation.

Why do these two code variants produce different floating-point results?

You should read my long, long answer about why the same thing happens in C#:

(.1f+.2f==.3f) != (.1f+.2f).Equals(.3f) Why?

Summing up: first of all, you only get about seven decimal places of accuracy with float. The correct answer, were you to do it with exact arithmetic throughout the entire calculation, is about 439702.51239669..., so you are getting darn close to the correct answer considering the limitations of a float, in either case.

But that doesn't explain why you are getting different results with what looks like exactly the same calculations. The answer is: the compiler is permitted wide latitude to make your math more accurate, and apparently you have hit upon two cases where the optimizer takes what are logically the same expressions and does not optimize them down to the same code.

Anyway, read my answer regarding C# carefully; everything in there applies to C++ just as well.

Curious Behavior When Doing Addition on Nullable Floats

See comments by @EricLippert.

ANYTHING is permitted to change the result -- let me emphasize that again: ANYTHING WHATSOEVER, including the phase of the moon, is permitted to change whether floats are computed in 32 bit accuracy or higher accuracy. The processor is always allowed, for any reason whatsoever, to decide to suddenly start doing floating point arithmetic in 80 bits or 128 bits or whatever it chooses, so long as it is more than or equal to 32 bit precision. See (.1f+.2f==.3f) != (.1f+.2f).Equals(.3f) Why? for more details.

Asking what in particular in this case caused the processor to decide to use higher precision in one case and not in another is a losing game. It could be anything. If you require accurate computations in decimal figures then use the aptly named decimal type. If you require repeatable computations in floats then C# has two mechanisms for forcing the processor back to 32 bits: (1) explicitly cast to (float) unnecessarily, or (2) store the result in a float array element or float field of a reference type.

The behavior here has nothing to do with the Nullable type. It's a matter of floats not being exact and being calculated at different precisions at the whim of the processor.

In general, this comes down to the advice that if accuracy is important, your best bet is to use something other than float (or use the techniques described by @EricLippert to force the processor to use 32 bit precision).

The answer from Eric Lippert on linked question is also helpful in understanding what's going on.

Which numeric type conversion is better for simple math operations?

EDIT

In response to your totally changed question:

The first line double double1 = integer1 / (5 * integer2); does an integer division, so don't do that.

Also the line var double8 = Convert.ToDouble(integer1 / (5 * integer2)); is doing integer division before converting the result to a double, so don't do that either.

Other than that, all the different approaches you list will end up calling the IL instruction Conv.R8 once for each line in your sample code.

The only real difference is that Convert.ToDouble() will make a method call to do so, so you should avoid that.

The results for every line other than double1 and double8 will be identical.

So you should probably go for the simplest: var double2 = integer1 / (5.0 * integer2);
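
To make the pitfalls above concrete, here is a small sketch; the sample values are my own:

int integer1 = 7, integer2 = 3;

double double1 = integer1 / (5 * integer2);   // integer division: 7 / 15 is 0, then converted to 0.0
var double2 = integer1 / (5.0 * integer2);    // 5.0 promotes the whole expression to double

Console.WriteLine(double1);   // 0
Console.WriteLine(double2);   // approximately 0.4666666666666667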

In a more complicated situation, time your code to see if there are any differences.

Understanding the given calculation (cast + multiplication)

That is exactly the case. It's just that (float)10.9 is slightly less than 10.9, due to the way floating point numbers are represented (10.9 cannot be represented exactly as a float, so you get an approximation, which in this case is something like 10.8999996...). The cast to int then truncates any digits following the decimal point. So you get 108.

Exact values are as follows here (obtained with Jon Skeet's DoubleConverter class):

10.9         -> 10.9000000000000003552713678800500929355621337890625
(float) 10.9 -> 10.8999996185302734375

Multiplying that float by 10 and then cutting off all decimal places will obviously result in 108.
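
Here is a small sketch (not the code from the original question) that makes the truncation visible; the multiplication below is done in double precision so the product stays just below 109:

float f = (float)10.9;            // 10.8999996185302734375, slightly below 10.9
double product = f * 10.0;        // 108.999996185302734375 when the multiply is done in double precision
Console.WriteLine((int)product);  // 108 -- the cast to int simply drops the fractional part

If the multiplication were instead carried out purely in 32 bit precision, the rounding could come out differently, which is exactly the kind of precision dependence discussed in the answers above.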
