Why Does This Floating-Point Calculation Give Different Results on Different Machines?

Why does this floating-point calculation give different results on different machines?

Here's an interesting bit of the C# specification, from section 4.1.6:

Floating-point operations may be performed with higher precision than the result type of the operation. For example, some hardware architectures support an "extended" or "long double" floating-point type with greater range and precision than the double type, and implicitly perform all floating-point operations using this higher precision type. Only at excessive cost in performance can such hardware architectures be made to perform floating-point operations with less precision, and rather than require an implementation to forfeit both performance and precision, C# allows a higher precision type to be used for all floating-point operations. Other than delivering more precise results, this rarely has any measurable effects.

It is possible that this is one of those "measurable effects", thanks to that call to Ceiling. Taking the ceiling of a floating-point number, as others have noted, magnifies a tiny difference by many orders of magnitude, because it turns 15.99999999 into 16 and 16.00000001 into 17. Two numbers that differ by 0.00000002 before the operation differ by a full 1.0 afterwards; that tiny initial difference might be accounted for by the fact that different machines can have more or less "extra precision" in their floating-point operations.
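Here is a minimal sketch of that magnification effect (C++ for illustration, with std::ceil standing in for Math.Ceiling; the two constants are just examples):

#include <cmath>
#include <cstdio>

int main()
{
    // Two inputs that differ by only 0.00000002...
    double a = 15.99999999;
    double b = 16.00000001;
    // ...land a whole integer apart after taking the ceiling.
    std::printf("ceil(a) = %.0f\n", std::ceil(a)); // 16
    std::printf("ceil(b) = %.0f\n", std::ceil(b)); // 17
}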

Some related issues:

  • C# XNA Visual Studio: Difference between "release" and "debug" modes?

  • CLR JIT optimizations violates causality?

To address your specific problem of how to compute an aspect ratio from a float: I'd solve this in a completely different way. I'd build a table like this:

struct Ratio
{
    public int X { get; private set; }
    public int Y { get; private set; }
    public Ratio(int x, int y) : this()
    {
        this.X = x;
        this.Y = y;
    }
    public double AsDouble() { return (double)X / (double)Y; }
}

Ratio[] commonRatios = {
    new Ratio(16, 9),
    new Ratio(4, 3),
    // ... and so on, maybe the few hundred most common ratios here.
    // Since you are pinning results to be less than 20, there cannot possibly
    // be more than a few hundred.
};

and now your implementation is

public string AspectRatioAsString(double ratio)
{
    var results = from commonRatio in commonRatios
                  select new {
                      Ratio = commonRatio,
                      Diff = Math.Abs(ratio - commonRatio.AsDouble())
                  };

    // Min(x => x.Diff) would return only the smallest difference (a double),
    // not the entry it came from, so order by the difference and take the first.
    var smallestResult = results.OrderBy(x => x.Diff).First();

    return String.Format("{0}:{1}", smallestResult.Ratio.X, smallestResult.Ratio.Y);
}

Notice how the code now reads very much like the operation you are trying to perform: from this list of common ratios, choose the one where the difference between the given ratio and the common ratio is minimized.

Why does this same code produce two different floating-point results on different machines?

Even without -Ofast, the C++ standard does not require implementations to be exact with log (or sin, or exp, etc.), only that they be within a few ulps (units in the last place; i.e. there may be small inaccuracies in the last few binary digits). This allows faster hardware (or software) approximations, which each platform/compiler may implement differently.

(The only floating point math function that you will always get perfect results from on all platforms is sqrt.)

More annoyingly, you may even get different results between compilation (the compiler may use some internal library to be as precise as float/double allows for constant expressions) and runtime (e.g. hardware-supported approximations).
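A small sketch of that compile-time/runtime split (whether the two values actually differ depends entirely on your compiler and the libm you link against):

#include <cmath>
#include <cstdio>

int main()
{
    // Likely folded at compile time using the compiler's internal math code.
    double at_compile_time = std::log(0.1);

    // volatile blocks constant folding, so this goes through the runtime libm.
    volatile double x = 0.1;
    double at_run_time = std::log(x);

    // On some platform/compiler combinations these print different last digits.
    std::printf("%.17g\n%.17g\n", at_compile_time, at_run_time);
}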

If you want log to give the exact same result across platforms and compilers, you will have to implement it yourself using only +, -, *, / and sqrt (or find a library with this guarantee). And avoid a whole host of pitfalls along the way.

If you need floating-point determinism in general, I strongly recommend reading this article to understand how big a problem you have ahead of you: https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/

Same code using floats on two computers gives two different results

This is not uncommon and it will depend on your compiler, optimisation settings, math libraries, CPU, and of course the numerical stability of the algorithms that you are using.

You need to have a good idea of your accuracy requirements and if you are not meeting these then you may need to look at your algorithms and e.g. consider using double rather than float where needed.
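As a sketch of why the float/double choice matters, here is the classic accumulation experiment (C++; the exact totals will vary by platform, which is rather the point):

#include <cstdio>

int main()
{
    float  fsum = 0.0f;
    double dsum = 0.0;
    // Add 0.1 ten million times; the exact answer would be 1000000.
    for (int i = 0; i < 10000000; ++i) {
        fsum += 0.1f;
        dsum += 0.1;
    }
    std::printf("float : %f\n", fsum); // drifts visibly from 1000000
    std::printf("double: %f\n", dsum); // much closer to 1000000
}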

Floating point math in python / numpy not reproducible across machines

Floating point calculations are not always reproducible.

You may get reproducible results for floating-point calculations across different machines if you use the same executable image, the same inputs, and libraries built with the same compiler and identical compiler settings (switches).

However, if you use a dynamically linked library, you may get different results for numerous reasons. First of all, as Veedrac pointed out in the comments, it might use different algorithms for its routines on different architectures. Second, a compiler might produce different code depending on switches (various optimizations, control settings). Even a + b + c can yield different results across machines and compilers, because we cannot be sure about the order of evaluation or the precision of intermediate calculations.
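For instance, floating-point addition is not associative, so whichever grouping the compiler happens to choose changes the result (C++ here, but the same holds in any IEEE 754 language):

#include <cstdio>

int main()
{
    double a = 0.1, b = 0.2, c = 0.3;
    // The two groupings round differently in the last bit.
    std::printf("(a + b) + c = %.17g\n", (a + b) + c); // 0.60000000000000009
    std::printf("a + (b + c) = %.17g\n", a + (b + c)); // 0.59999999999999998
}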

Read here why it is not guaranteed to get identical results on different IEEE 754-1985 implementations. The newer standard (IEEE 754-2008) tries to go further, but it still doesn't guarantee identical results among different implementations, because, for example, it allows implementers to choose when tininess (the underflow exception) is detected.

More information about floating point determinism can be found in this article.

Floating point calculation gives different results with float than with double

What's going on? Why isn't the integer result 18 in both cases?

The problem is that the result of the floating point expression is rounded towards zero when being converted to an integer value (in both cases).

0.1 can't be represented exactly as a binary floating-point value (in either case). The compiler converts the decimal literal to a binary IEEE 754 floating-point number, deciding whether to round up or down to a representable value. The processor then multiplies this value at runtime, and the result is truncated when converted to an integer value.

Ok, but since both double and float behave like that, why do I get 18 in one of the two cases, but 17 in the other case? I'm confused.

Your code takes the result of the function, 0.1f (a float), and then calculates 20 * (1.0 - 0.1f), which is a double expression, while 20 * (1.0f - 0.1f) is a float expression. Now the float version happens to come out at 18.0 (or just above), so truncation gives 18, while the double expression is slightly less than 18.0 and truncates to 17.
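A minimal reproduction, assuming strict IEEE 754 single/double evaluation (e.g. a typical x86-64 build; under x87 excess precision you may see something else, which is the theme of this whole page):

#include <cstdio>

int main()
{
    // double expression: 0.1f is widened, the product stays just below 18.
    double d = 20 * (1.0 - 0.1f);   // ~17.999999970197678
    // float expression: the float rounding happens to land on 18.0 exactly.
    float  f = 20 * (1.0f - 0.1f);
    std::printf("%d %d\n", (int)d, (int)f); // typically prints: 17 18
}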

If you don't know exactly how IEEE 754 binary floating-point numbers are constructed from decimal numbers, it's pretty much random whether the stored value will be slightly less or slightly greater than the decimal number you've entered in your code. So you shouldn't count on this. Don't try to fix such an issue by appending f to one of the numbers and saying "now it works, so I'll leave this f there", because another value may behave differently again.

Why does the type of the expression depend on the presence of this f?

This is because a floating-point literal in C and C++ is of type double by default. If you add the f suffix, it's a float. The result of a floating-point expression is of the "larger" type: combining a double with an int still yields a double, just as combining an int with a float yields a float. So the result of your expression is either a float or a double.
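You can check those promotion rules at compile time; a small sketch (C++17, for std::is_same_v):

#include <type_traits>

// Usual arithmetic conversions: double beats float, float beats int.
static_assert(std::is_same_v<decltype(20 * (1.0  - 0.1f)), double>);
static_assert(std::is_same_v<decltype(20 * (1.0f - 0.1f)), float>);

int main() {}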

Ok, but I don't want to round to zero. I want to round to the nearest number.

To fix this issue, add one half to the result before converting it to an integer:

hero->onBeingHit(ENEMY_ATTACK_POINT * (1.0 - hero->getDefensePercent()) + 0.5);

In C++11 and later, there is std::round() in <cmath> for that. In previous versions of the standard, there was no such function to round to the nearest integer.

If you don't have std::round, you can write it yourself. Take care when dealing with negative numbers. When converting to an integer, the number will be truncated (rounded towards zero), which means that negative values will be rounded up, not down. So we have to subtract one half if the number is negative:

int round(double x) {
    // Shifting by one half before truncation gives round-half-away-from-zero.
    return (x < 0.0) ? (int)(x - 0.5) : (int)(x + 0.5);
}

BigFloat calculations produce different results on various machines

It could be the culture settings of those machines.

Set the culture explicitly in your source code and try again:

using System.Globalization;
// ...
CultureInfo.CurrentCulture = CultureInfo.GetCultureInfo("en-US"); // your culture

Why do Matlab and Octave give slightly different results for the same equation? How can I fix that?

Floating-point calculations are inherently imprecise. Changing the order of operations will often cause rounding errors to change, which you will see in the last digit (if you are lucky; if you are unlucky, the differences will be much larger!). You cannot expect two different programs, or the same program running on two different computers, to generate the exact same floating-point values.

If the difference between these two numbers is a problem to your computations, you should probably find out why this difference gets amplified, and change the order of your computations so that rounding errors do not cause this much harm.


Two additional suggestions:

  • Don't redefine pi. It is a built-in function; when you assign to it, you create a variable that shadows the built-in.

  • Use 1e-7, not 10^(-7). It is more readable and easier to type.

Why would the same code yield different numeric results on 32 vs 64-bit machines?

You are encountering what is often called the 'x87 excess-precision "bug"'.

In short: historically, (nearly) all floating-point computation on x86 processors was done using the x87 instruction set, which by default operates on an 80-bit floating-point type, but can be set to operate in either single- or double-precision (almost) by some bits in a control register.

If single-precision operations are performed while the precision of the x87 control register is set to double- or extended-precision, then the results will differ from what would be produced if the same operations were performed in single-precision (unless the compiler is extraordinarily careful and stores the result of every computation and reloads it to force rounding to occur in the correct place.)

Your code running on 32-bit is using the x87 unit for floating-point computation (apparently with the control register set for double precision), and is thus encountering the issue described above. Your code running on 64-bit is using the SSE[2,3,...] instructions for floating-point computation, which provide native single- and double-precision operations and therefore do not carry excess precision. This is why your results differ.

You can work around this (to a point) by telling your compiler to use SSE for floating-point computation even on 32-bit (-mfpmath=sse with GCC). Even then, bit-exact results are not guaranteed because the various libraries that you link against may use x87, or simply use different algorithms depending on the architecture.
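A classic way to see the excess precision directly (a sketch assuming GCC on x86: compiled 32-bit with -mfpmath=387 it may print 1e+20, while an SSE build, e.g. -mfpmath=sse -msse2 or any normal 64-bit build, prints inf):

#include <cstdio>

int main()
{
    float a = 1e20f, b = 1e-20f;
    // In strict single precision a * a overflows to +inf, and inf * b stays inf.
    // With x87 excess precision the intermediate can live in an 80-bit register,
    // so a * a survives as 1e40 and the final result comes out near 1e20.
    float c = (a * a) * b;
    std::printf("%g\n", c);
}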


