Std::Pow Produce Different Result in 32 Bit and 64 Bit Application

acos(double) gives different result on x64 and x32 Visual Studio

TL:DR: this is normal and you can't reasonably change it.


The 32-bit library may be using 80-bit FP values in x87 registers for its temporaries, avoiding rounding off to 64-bit double after every operation. (Unless there's a whole separate library, compiling your own code to use SSE doesn't change what's inside the library, or even the calling convention for passing data to the library. But since 32-bit passes double and float in memory on the stack, a library is free to load it with SSE2 or with x87. Still, you don't get the performance advantage of passing FP values in xmm registers unless it's impossible for non-SSE code to use the library.)

It's also possible that they're different simply because they use a different order of operations, producing different temporaries along the way. That's less plausible, unless they're separately hand-written in asm. If they're built from the same C source (without "unsafe" FP optimizations), then the compiler isn't allowed to reorder things, because of this non-associative behaviour of FP math.


glibc's libm (used on Linux) typically favours precision over speed, so its giving you the correctly-rounded result out to the last bit of the mantissa for both 32 and 64-bit. The IEEE FP standard only requires the basic operations (+ - * / FMA and FP remainder) to be "correctly rounded" out to the last bit of the mantissa. (i.e. rounding error of at most 0.5 ulp). (The exact result, according to calc, is 1.047304076386807714.... Keep in mind that double (on x86 with normal compilers) is IEEE754 binary64, so internally the mantissa and exponent are in base2. If you print enough extra decimal digits, though, you can tell that ...7714 should round up to ...78, although really you should print more digits in case they're not zero beyond that. I'm just assuming it's ...78000.)

So Microsoft's 64-bit library implementation produces 1.0473040763868076 and there's pretty much nothing you can do about it, other than not use it. (e.g. find your own acos() implementation and use it.) But FP determinism is hard, even if you limit yourself to just x86 with SSE. See Does any floating point-intensive code produce bit-exact results in any x86-based architecture?. If you limit yourself to a single compiler, it can be possible if you avoid complicated library functions like acos().

You might be able to get the 32-bit library version to produce the same value as the 64-bit version, if it uses x87 and changing the x87 precision setting affects it. But the other way around is not possible: SSE2 has separate instructions for 64-bit double and 32-bit float, and always rounds after every instruction, so you can't change any setting that will increase precision result. (You could change the SSE rounding mode, and that will change the result, but not in a good way!)

See also:

  • Intermediate Floating-Point Precision and the rest of Bruce Dawson's excellent series of articles about floating point. (table of contents.

    The linked article describes how some versions of VC++'s CRT runtime startup set the x87 FP register precision to 53-bit mantissa instead of 80-bit full precision. Also that D3D9 will set it to 24, so even double only has the precision of float if done with x87.

  • https://en.wikipedia.org/wiki/Rounding#Table-maker.27s_dilemma

  • What Every Computer Scientist Should Know About Floating-Point Arithmetic

Why does Math.Exp give different results between 32-bit and 64-bit, with same input, same hardware

Yes rounding errors, and it is effectively NOT the same hardware. The 32 bit version is targeting a different set of instructions and register sizes.

Why does this calculation give different result in boost::thread and std::thread?

The difference seems to be the fact that the std::thread implementation does an _fpreset(), while boost::thread obviously doesn't. If you change the line

namespace boost { void tss_cleanup_implemented() { } }

to (formatted a little for clarity):

namespace boost 
{
void tss_cleanup_implemented()
{
_fpreset();
}
}

You will see that all values are exactly the same now (3EAABB3194A6E99A). That tells me that Boost doesn't do an _fpreset(). This call is necessary because some Windows API calls mess up the standard FPU settings C++Builder (32 bit) uses and don't set them back to what they were (this is a problem you can encounter in Delphi as well).

both std::thread and boost:thread use Win32 API calls to handle threads.

Something tells me that you expected this already, hence the test with print_number_2() which does an _fpreset().

Difference in floating point arithmetics between x86 and x64

The issue hinges on this expression:

bool bLarger2 = (a*c)<b;

I looked at the code generated under VS2008, not having VS2010 to hand. For 64 bit the code is:


000000013FD51100 movss xmm1,dword ptr [a]
000000013FD51106 mulss xmm1,dword ptr [c]
000000013FD5110C movss xmm0,dword ptr [b]
000000013FD51112 comiss xmm0,xmm1

For 32 bit the code is:


00FC14DC fld dword ptr [a]
00FC14DF fmul dword ptr [c]
00FC14E2 fld dword ptr [b]
00FC14E5 fcompp

So under 32 bit the calculation is performed in the x87 unit, and under 64 bit it is performed by the x64 unit.

And the difference here is that the x87 operations are all performed to higher than single precision. By default the calculations are performed to double precision. On the other hand the SSE unit operations are pure single precision calculations.

You can persuade the 32 bit unit to perform all calculations to single precision accuracy like this:

_controlfp(_PC_24, _MCW_PC);

When you add that to your 32 bit program you will find that the booleans are both set to false.

There is a fundamental difference in the way that the x87 and SSE floating point units work. The x87 unit uses the same instructions for both single and double precision types. Data is loaded into registers in the x87 FPU stack, and those registers are always 10 byte Intel extended. You can control the precision using the floating point control word. But the instructions that the compiler writes are ignorant of that state.

On the other hand, the SSE unit uses different instructions for operations on single and double precision. Which means that the compiler can emit code that is in full control of the precision of the calculation.

So, the x87 unit is the bad guy here. You can maybe try to persuade your compiler to emit SSE instructions even for 32 bit targets. And certainly when I compiled your code under VS2013 I found that both 32 and 64 bit targets emitted SSE instructions.

Differences in rounded result when calling pow()

The result of the pow operation is 25.0000 plus or minus some bit of rounding error. If the rounding error is positive or zero, 25 will result from the conversion to an integer. If the rounding error is negative, 24 will result. Both answers are correct.

What is most likely happening internally is that in one case a higher-precision, 80-bit FPU value is being used directly and in the other case, the result is being written from the FPU to memory (as a 64-bit double) and then read back in (converting it to a slightly different 80-bit value). This can make a microscopic difference in the final result, which is all it takes to change a 25.0000000001 to a 24.999999997

Another possibility is that your compiler recognizes the constants passed to pow and does the calculation itself, substituting the result for the call to pow. Your compiler may use an internal arbitrary-precision math library or it may just use one that's different.

Apparently identical math expressions with different output

FLT_EVAL_METHOD.

C allows intermediate FP calculations to occur at higher/wider types depending on FLT_EVAL_METHOD. So when wider types are used and code flow differs, though mathematically equal, slightly different results may occur.


Except for assignment and cast (which remove all extra range and precision), the values yielded by operators with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type. The use of evaluation formats is characterized
by the implementation-defined value of FLT_EVAL_METHOD:

-1. indeterminable;

0. evaluate all operations and constants just to the range and precision of the type;

1. evaluate operations and constants of type float and double to the range and precision of the double type, evaluate long double operations and constants to the range and precision of the long double type;

2. evaluate all operations and constants to the range and precision of the
long double type.

C11dr §5.2.4.2.2 9

[Edit]

@Pascal Cuoq has a useful comment on the veracity on FLT_EVAL_METHOD. In any case, FP code, optimized different along various code paths, may present different results. This may occur when FLT_EVAL_METHOD != 0 or compiler is not strictly conforming.

Concerning a detail of the post: the operation X*Y + Z done in 2 operations of * and then + could be contrasted with fma() which "compute (x × y) + z, rounded as one ternary operation: they compute the value (as if) to infinite precision and round once to the result format, according to the current rounding mode." C11 §7.12.13.1 2. Another candidate for the difference in results could be due to the application "fma" to the line e=(c/d)*(b-a)+a;



Related Topics



Leave a reply



Submit