Floating Point Comparison Revisited

Floating-point comparison of constant assignment

Actually, it depends on the value and the implementation. The C++ standard (draft n3126) has this to say in 2.14.4 Floating literals:

If the scaled value is in the range of representable values for its type, the result is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.

In other words, if the value is exactly representable (and 1 is, in IEEE754, as is 100 in your static cast), you get the value. Otherwise (such as with 0.1) you get an implementation-defined close match (a). Now I'd be very worried about an implementation that chose a different close match based on the same input token but it is possible.


(a) Actually, that paragraph can be read in two ways: either the implementation is free to choose the closest higher or the closest lower value regardless of which is actually closer, or it must choose whichever is closest to the desired value.

If the latter, however, it doesn't change this answer, since all you have to do is hard-code a floating-point value exactly at the midpoint of two representable values and the implementation is once again free to choose either.

For example, it might alternate between the next higher and next lower for the same reason banker's rounding is applied - to reduce the cumulative errors.

Why does comparison of floating-point to infinity work?

As anyone knowledgeable enough will know, you cannot compare two floating-point numbers with simple logic operators and expect a logical result.

This has no basis in the IEEE 754 standard or any other specification of floating-point behavior I am aware of. It is an unfortunately common misstatement of floating-point arithmetic.

The fact is that comparison for equality is a perfect operation in floating-point: It produces true if and only if the two operands represent the same number. There is never any error in a comparison for equality.

Another misstatement is that floating-point numbers approximate real numbers. Per IEEE 754, each floating-point value other than NaN represents one number, and it represents that number exactly.

The fact is that floating-point numbers are exact while floating-point operations approximate real arithmetic; correctly-rounded operations produce the nearest representable value (nearest in any direction or in a selected direction, with various rules for ties).

This distinction is critical for understanding, analyzing, designing, and writing proofs about floating-point arithmetic.

Why then, does a logical EQUAL TO comparison to INFINITY always return true when the number is, in fact, INFINITY?

As stated above, comparison for equality produces true if and only if its operands represent the same number. If x is infinity, then x == INFINITY returns true. If x is three, then x == 3 returns true.

People sometimes run into trouble when they do not understand what value is in a number. For example, in float x = 3.3;, people sometimes do not realize that C converts the double 3.3 to float, and therefore x does not contain the same value as 3.3. This is because the conversion operation approximates its results, not because the value of x is anything other than its specific assigned value.

I tried the same comparison with NAN,…

A NaN is Not a Number, so, in a comparison for equality, it never satisfies “the two operands represent the same number”, so the comparison produces false.

.NET - Floating Point Comparison

Whether you believe it or not, this is intended behaviour, and it conforms to the IEEE 754 standard.

It's not possible to represent every analogue, everyday value, such as a massive number or a small fraction, with complete fidelity in a single binary representation. The floating-point types in .NET, such as float and double, do their best to minimize error when you assign numbers to them, so when you assigned 0.2 to the variable, the language chose the representation with the smallest error.

It's not that the number somehow degrades in memory; the rounding is a deliberate step. If you are comparing floating-point numbers, you should always allow an acceptable region on either side of your comparison. Your representation of 0.2 is accurate to a very large number of decimal places. Is that good enough for your application? The discrepancy looks glaring to your eyes, but it is actually a very small error. When comparing doubles and floats (to integers or to each other), you should always consider what precision is acceptable and accept a range on either side of your expected result.

You can also choose to use other types, like decimal, which has extremely good precision for decimal fractions - but is also much larger (and slower) than float and double.

efficient floating point comparison

This looks like a contrived way of attempting to circumvent the "should never compare floating-point values for equality" rule. Comparing for inequality is not very different from comparing for equality, since you are implicitly relying on floating-point precision in both cases. Your final else branch is an implicit A == B.

The normal idiom is if (::fabs(A - B) < e) where e is some tolerance, although in your case you don't need the ::fabs.

If you want different results for positive, negative and equality (within limits of computational precision), then do something like

if (A - B > e) {
    return 0;
} else if (A - B < -e) {
    return 1;
} else {
    return -1;
}

The tightest tolerance you can hope for is setting e to std::numeric_limits<double>::epsilon(). The appropriate value depends on the number of computational steps executed in order to arrive at A and B; 1e-08 is probably realistic.

As for speed, it is what it is unfortunately: I can't see this being either the bottleneck or running any faster.

Comparison to 0.0 with floating point values

It's perfectly correct in your case to use floating point equality == 0.0.

It perfectly fits the intention of the function (return some value, or 0.0 if it fails). Using any other epsilon would be somewhat arbitrary and would require knowledge of the range of correct values. If anything were ever to change, it could well be the range of values rather than the 0, so testing == 0.0 is no less future-proof than the other solutions, IMO.

The only problem I see is that some compilers will warn about suspicious use of equality (-Wfloat-equal)... That's about as useful as warning about int a,b,c; ...; c=a+b; because such an instruction might possibly lead to problems (integer overflow and undefined behaviour). Curiously, I have never seen that second warning.

So if you want to make usage of the -Wall -Werror compiler options future-proof, you might encode failure differently (with a negative value, for example) and test for foo < 0.0 - until someone discovers that floating-point inequality might require a tolerance too and declares that construct suspicious as well.

Modern practice to compare double/float for equality in modern C++

Is this code still the way we should compare floats and doubles in modern C++11/14/17/20, or is it now OK just to write if (double1 == double2) and let the compiler handle the epsilon issue for us?

Both approaches function the same in modern C++ as they did in early C++.

Both approaches are also flawed.

  • Using == assumes that your code has accounted for any floating point rounding errors, and it's very rare/difficult for code to do that.

  • Comparing against epsilon assumes that a reasonable amount of rounding error will be less than the constant epsilon, and that is very likely a wrong assumption!

    • If your numbers have magnitude greater than 2.0, your epsilon trick will be no different from direct comparison and will have the same flaws, regardless of whether you use < or <=.
    • If your numbers have the same sign and a magnitude smaller than epsilon, your epsilon trick will say they are always equal, even if one is hundreds of times larger than the other. They would both be equal to zero, too.

A wise approach may be to avoid writing code that depends on whether floating point numbers are equal. Instead test if they are relatively close, by some factor.

The code below tests whether two numbers are within about 0.01% of each other, regardless of their scale.

const auto relative_difference_factor = 0.0001;    // 0.01%
const auto greater_magnitude = std::max(std::abs(double1), std::abs(double2));

if (std::abs(double1 - double2) < relative_difference_factor * greater_magnitude)
    std::cout << "Relatively close";
else
    std::cout << "Not relatively close";

Difference in floating point comparison C

If float and double are IEEE-754 32 bit and 64 bit floating point formats respectively, then the closest float to exactly 1.7 is ~1.7000000477, and the closest double is ~1.6999999999999999556. In this case the closest float just happens to be numerically greater than the closest double.


