How to Make Sure the Floating Point Arithmetic Result Is the Same in Both Linux and Windows

Is there any way to make sure that floating point arithmetic results are the same on both Linux and Windows?

Use /fp:strict on Windows to tell the compiler to produce code that strictly follows IEEE 754, and gcc -msse2 -mfpmath=sse on Linux to obtain the same behavior there.
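
For a quick sanity check of those flags, a minimal program along the lines of the sketch below (the file name and variable names are illustrative) can be built on both systems and its output compared bit for bit; with the settings above in effect, the printed hex values should match exactly.

    /* sketch.c -- a hedged, minimal test for bit-for-bit reproducibility.
     * Suggested builds, using the flags discussed above:
     *   Windows (MSVC):  cl /fp:strict sketch.c
     *   Linux (GCC):     gcc -msse2 -mfpmath=sse sketch.c -o sketch
     */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        float  f = 1.0f / 3.0f;        /* single-precision division */
        double d = (double)f * 3.0;    /* widen, then a double-precision multiply */

        unsigned int       fb;
        unsigned long long db;
        memcpy(&fb, &f, sizeof fb);    /* extract the bit patterns portably */
        memcpy(&db, &d, sizeof db);

        /* If both compilers follow IEEE 754 strictly, this line is identical
         * on Windows and Linux. */
        printf("%08x %016llx\n", fb, db);
        return 0;
    }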

The reasons for the differences you are seeing have been discussed in various places on Stack Overflow, but the best survey is David Monniaux's article.


The assembly instructions I obtain when compiling with gcc -msse2 -mfpmath=sse are as follows. The instructions cvtsi2ssq, divss, mulss, and addss are the correct instructions to use, and they result in a program in which p_value at one point contains 42d5d1ec.

        .globl  _main
        .align  4, 0x90
    _main:                                  ## @main
        .cfi_startproc
    ## BB#0:
        pushq   %rbp
    Ltmp2:
        .cfi_def_cfa_offset 16
    Ltmp3:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
    Ltmp4:
        .cfi_def_cfa_register %rbp
        subq    $32, %rsp
        movl    $0, -4(%rbp)
        movl    $0, -8(%rbp)
    LBB0_1:                                 ## =>This Inner Loop Header: Depth=1
        cmpl    $100000, -8(%rbp)           ## imm = 0x186A0
        jge     LBB0_4
    ## BB#2:                                ## in Loop: Header=BB0_1 Depth=1
        movq    _p_value@GOTPCREL(%rip), %rax
        movabsq $100, %rcx
        cvtsi2ssq   %rcx, %xmm0
        movss   LCPI0_0(%rip), %xmm1
        movabsq $10, %rcx
        cvtsi2ssq   %rcx, %xmm2
        cvtsi2ss    -8(%rbp), %xmm3
        divss   %xmm3, %xmm2
        movss   %xmm2, -12(%rbp)
        cvtsi2ss    -8(%rbp), %xmm2
        mulss   %xmm2, %xmm1
        addss   %xmm0, %xmm1
        movss   %xmm1, (%rax)
        movl    (%rax), %edx
        movl    %edx, -16(%rbp)
        leaq    L_.str(%rip), %rdi
        movl    -16(%rbp), %esi
        movb    $0, %al
        callq   _printf
        movl    %eax, -20(%rbp)             ## 4-byte Spill
    ## BB#3:                                ## in Loop: Header=BB0_1 Depth=1
        movl    -8(%rbp), %eax
        addl    $1, %eax
        movl    %eax, -8(%rbp)
        jmp     LBB0_1
    LBB0_4:
        movl    -4(%rbp), %eax
        addq    $32, %rsp
        popq    %rbp
        ret

Wrong double initialization on Windows/MSYS2 compared to Linux

You cannot expect to get bitwise identical results for floating-point functions other than sqrt across systems.

More specifically, cos is not guaranteed to be correctly rounded, so the result from the run-time library (your second result, d) is not necessarily identical to the result of the constant folding that happens during compilation (your result c).

For instance, the page on GCC FP Math states that during constant folding all operations are correctly rounded (even the transcendental functions), but this is not generally the case for the run-time library, for efficiency reasons. (Also, not all compilers give you this correctly rounded guarantee for constant folding, so with cross compilers, or compilers that use a different libm than the generated executable, you may see this issue.)
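
One way to observe this difference directly (a minimal sketch; the argument 1.0e6 is arbitrary and the variable names simply mirror the c and d from the question) is to let the compiler fold one call to cos and force the other through the run-time libm, then compare the bit patterns:

    /* Constant-folded cos() versus run-time libm cos().
     * On Linux, link with -lm. For many arguments the two results agree,
     * but nothing guarantees that they always will. */
    #include <math.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        double c = cos(1.0e6);        /* likely folded at compile time (GCC uses MPFR) */

        volatile double x = 1.0e6;    /* volatile prevents constant folding */
        double d = cos(x);            /* evaluated by the run-time library */

        unsigned long long cb, db;
        memcpy(&cb, &c, sizeof cb);
        memcpy(&db, &d, sizeof db);
        printf("folded:   %016llx\nrun-time: %016llx\n", cb, db);
        return 0;
    }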

GCC internally uses the MPFR library, which implements correctly rounded operations using intermediate arbitrary-precision arithmetic, but doing this in the run-time library would be too slow.

Even on Linux you can probably find arguments to the cos function for which your c and d disagree (unless crlibm is used under the hood). So the fact that your c and d results are identical on Linux is, in my opinion, a coincidence.

If you are looking for more predictable FP math results at run time, you can check out crlibm, which is an efficient implementation of many floating-point functions that gives correctly rounded results. It is much faster than MPFR, but not as fast as the standard libm functions.

Cross Platform Floating Point Consistency

Cross-platform and cross-compiler consistency is of course possible. Anything is possible given enough knowledge and time! But it might be very hard, or very time-consuming, or indeed impractical.

Here are the problems I can foresee, in no particular order:

  1. Remember that even an extremely small error of plus-or-minus 1/10^15 can blow up to become significant (multiply a number carrying that error margin by one billion, and now you have a plus-or-minus 0.000001 error, which might be significant.) These errors can accumulate over time, over many frames, until you have a desynchronized simulation. Or they can manifest when you compare values (even naively using "epsilons" in floating-point comparisons might not help; it might only displace or delay the manifestation.)

  2. Such problems are not unique to distributed deterministic simulations (like yours.) They touch on the issue of "numerical stability", which is a difficult and often neglected subject.

  3. Different compiler optimization switches, and different floating-point behavior switches, might lead the compiler to generate slightly different sequences of CPU instructions for the same statements. Obviously these must be the same across compilations, using the exact same compilers, or the generated code must be rigorously compared and verified.

  4. 32-bit and 64-bit programs (note: I'm saying programs and not CPUs) will probably exhibit slightly different floating-point behaviors. By default, 32-bit programs cannot rely on anything more advanced than the x87 instruction set (no SSE, SSE2, AVX, etc.) unless you specify otherwise on the compiler command line (or use intrinsics/inline assembly in your code.) On the other hand, a 64-bit program is guaranteed to run on a CPU with SSE2 support, so the compiler will use those instructions by default (again, unless overridden by the user.) While x87 and SSE2 float data types and the operations on them are similar, they are, AFAIK, not identical, which will lead to inconsistencies in the simulation if one program uses one instruction set and another program uses the other.

  5. The x87 instruction set includes a "control word" register, which contains flags that control some aspects of floating-point operations (e.g. exact rounding behavior, etc.) This is a runtime thing: your program can do one set of calculations, then change this register, and after that do the exact same calculations and get a different result. Obviously, this register must be checked and handled and kept identical on the different machines. It is possible for the compiler (or the libraries you use in your program) to generate code that changes these flags at runtime inconsistently across the programs. (A small sketch of this effect follows this list.)

  6. Again, in the case of the x87 instruction set, Intel and AMD have historically implemented things a little differently. For example, one vendor's CPU might internally do some calculations using more bits (and therefore arrive at a more accurate result) than the other, which means that if you happen to run on two different CPUs (both x86) from two different vendors, the results of simple calculations might not be the same. I don't know how and under what circumstances these higher-accuracy calculations are enabled, and whether they happen under normal operating conditions or you have to ask for them specifically, but I do know these discrepancies exist.

  7. Random numbers and generating them consistently and deterministically across programs have nothing to do with floating-point consistency. It's important and a source of many bugs, but in the end it's just a few more bits of state that you have to keep synced.
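
To illustrate point 5 (a hedged sketch using the portable C99 fenv.h interface rather than poking the x87 control word or MXCSR directly): changing the floating-point rounding mode at run time changes the result of otherwise identical calculations.

    /* The same division evaluated under two rounding modes.
     * On glibc, link with -lm. fesetround() updates the x87 control word
     * and/or the SSE MXCSR register, depending on how the code was compiled. */
    #include <fenv.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON

    int main(void)
    {
        volatile double a = 1.0, b = 3.0;   /* volatile keeps the divisions at run time */

        fesetround(FE_TONEAREST);
        double r1 = a / b;

        fesetround(FE_UPWARD);              /* a library could do this behind your back */
        double r2 = a / b;

        fesetround(FE_TONEAREST);           /* restore the default mode */

        printf("%.17g\n%.17g\n", r1, r2);   /* the two results differ in the last bit */
        return 0;
    }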

And here are a couple of techniques that might help:


  1. Some projects use "fixed-point" numbers and fixed-point arithmetic to avoid rounding errors and the general unpredictability of floating-point numbers. Read the Wikipedia article for more information and external links. (A minimal fixed-point sketch appears after this list.)

  2. In one of my own projects, during development, I used to hash all the relevant state (including a lot of floating-point numbers) in all the instances of the game and send the hash across the network each frame, to make sure not even one bit of that state was different on different machines. This also helped with debugging: instead of trusting my eyes to see when and where inconsistencies existed (which wouldn't tell me where they originated, anyway), I would know the instant some part of the state of the game on one machine started diverging from the others, and know exactly what it was (if the hash check failed, I would stop the simulation and start comparing the whole state.) A sketch of this kind of state hashing also follows the list.

    This feature was implemented in that codebase from the beginning, and was used only during the development process to help with debugging (because it had performance and memory costs.)
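
Here is the fixed-point sketch referred to in technique 1 (the 16.16 split and the helper names are illustrative, not taken from any particular project): values are stored as scaled integers, so every machine computes exactly the same bits.

    /* 16.16 fixed-point numbers stored in 32-bit integers.
     * All arithmetic is integer arithmetic, so it is bit-exact everywhere. */
    #include <stdint.h>
    #include <stdio.h>

    typedef int32_t fix16_t;                      /* 16 integer bits, 16 fraction bits */
    #define FIX16_ONE (1 << 16)

    static fix16_t fix16_from_double(double d) { return (fix16_t)(d * FIX16_ONE); }
    static double  fix16_to_double(fix16_t a)  { return (double)a / FIX16_ONE; }

    static fix16_t fix16_mul(fix16_t a, fix16_t b)
    {
        return (fix16_t)(((int64_t)a * b) >> 16); /* widen to avoid overflow */
    }

    int main(void)
    {
        fix16_t speed = fix16_from_double(1.5);   /* 1.5 units per frame */
        fix16_t dt    = fix16_from_double(0.25);  /* fixed timestep */
        fix16_t step  = fix16_mul(speed, dt);     /* deterministic on every machine */
        printf("step = %f\n", fix16_to_double(step));
        return 0;
    }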
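
And here is the state-hashing sketch referred to in technique 2 (the SimState struct is made up, and FNV-1a is used only as an example of a simple, well-known hash; beware of struct padding bytes in real code):

    /* Hash the raw bytes of the simulation state each frame and compare the
     * hash across machines to detect divergence the moment it happens. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        float    position[3];
        float    velocity[3];
        uint32_t rng_state;      /* the RNG state is part of the synced state too */
    } SimState;

    static uint64_t fnv1a(const void *data, size_t len)
    {
        const unsigned char *p = data;
        uint64_t h = 14695981039346656037ull;   /* FNV-1a 64-bit offset basis */
        for (size_t i = 0; i < len; i++) {
            h ^= p[i];
            h *= 1099511628211ull;              /* FNV-1a 64-bit prime */
        }
        return h;
    }

    int main(void)
    {
        SimState state = { {1.0f, 2.0f, 3.0f}, {0.0f, -9.8f, 0.0f}, 12345u };
        /* Send this value over the network each frame; if it differs between
         * machines, stop the simulation and compare the full state. */
        printf("frame hash: %016llx\n", (unsigned long long)fnv1a(&state, sizeof state));
        return 0;
    }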

Update (in answer to first comment below): As I said in point 1, and others have said in other answers, that doesn't guarantee anything. If you do that, you might decrease the probability and frequency of an inconsistency occurring, but the likelihood doesn't become zero. If you don't analyze what's happening in your code and the possible sources of problems carefully and systematically, it is still possible to run into errors no matter how much you "round off" your numbers.

For example, if you have two numbers (e.g. as results of two calculations that were supposed to produce identical results) that are 1.111499999 and 1.111500001 and you round them to three decimal places, they become 1.111 and 1.112 respectively. The original numbers' difference was only 2E-9, but it has now become 1E-3. In fact, you have increased your error 500'000 times. And still they are not equal even with the rounding. You've exacerbated the problem.
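
To see this concretely (a tiny sketch using the two unlucky values from the paragraph above):

    /* Rounding two nearly-equal results to three decimal places can push
     * them further apart instead of making them agree. Link with -lm. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double a = 1.111499999;
        double b = 1.111500001;

        double ra = round(a * 1000.0) / 1000.0;   /* becomes 1.111 */
        double rb = round(b * 1000.0) / 1000.0;   /* becomes 1.112 */

        printf("before rounding: |a - b| = %g\n", fabs(a - b));    /* about 2e-9 */
        printf("after rounding:  |a - b| = %g\n", fabs(ra - rb));  /* about 1e-3 */
        return 0;
    }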

True, this doesn't happen much, and the examples I gave are two unlucky numbers to get in this situation, but it is still possible to find yourself with these kinds of numbers. And when you do, you're in trouble. The only sure-fire solution, even if you use fixed-point arithmetic or whatever, is to do rigorous and systematic mathematical analysis of all your possible problem areas and prove that they will remain consistent across programs.

Short of that, for us mere mortals, you need to have a water-tight way to monitor the situation and find exactly when and how the slightest discrepancies occur, to be able to solve the problem after the fact (instead of relying on your eyes to see problems in game animation or object movement or physical behavior.)

How to use Floating Point extended precision in a macOS or Windows system

Simply using extended precision on OS X is easy:

    x = 11.L*x - 10.L*d;

The L suffix causes the two literals to be long doubles instead of doubles, which forces the entire expression to be evaluated in 80-bit extended per C's expression evaluation rules.
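
A small self-contained sketch of the difference (the starting value 0.1 and the loop count are made up; the example assumes a compiler where long double is the 80-bit x87 type, as with GCC and Clang on x86, whereas MSVC treats long double as double and would show no difference):

    #include <stdio.h>

    int main(void)
    {
        double d = 0.1;                 /* not exactly representable in binary */
        double x_dbl = d, x_ext = d;

        for (int i = 0; i < 30; i++) {
            /* Double-precision evaluation: the tiny representation error of 0.1
             * is amplified by roughly a factor of 11 on every iteration. */
            x_dbl = 11.0 * x_dbl - 10.0 * d;

            /* The L suffix promotes the whole expression to long double, so the
             * intermediates are computed in 80-bit extended precision and the
             * value stays put. */
            x_ext = 11.L * x_ext - 10.L * d;
        }

        printf("double evaluation:   %g\n", x_dbl);    /* blows up */
        printf("extended evaluation: %g\n", x_ext);    /* stays at 0.1 */
        return 0;
    }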

That aside, there seems to be some confusion in your question; you say "... on a Linux the code will run with no problem." A couple points:

  • Both the OS X result and the Linux result conform to IEEE-754 and to the C standard. There is no "problem" with either one of them.
  • The OS X result is reproducible on hardware that does not support the (non-standard) 80-bit floating point type. The Linux result is not.
  • Computations that depend on intermediate results being kept in 80-bit extended are fragile; changing compiler options, optimization settings, or even program flow may cause the result to change. The OS X result will be stable across such changes.

Ultimately, you must keep in mind that floating-point arithmetic is not real arithmetic. The fact that the result obtained on Linux is closer to the result obtained when evaluating the expression with real numbers does not make that approach better (or worse).

For every case where automatic usage of extended precision saved a naive user of floating-point, I can show you a case where the unpredictability of that evaluation mode introduces a subtle and hard-to-diagnose bug. These are commonly called "excess-precision" bugs; one of the most famous recent examples was a bug that allowed users to put 2.2250738585072011e-308 into a web form and crash the server. The ultimate cause is precisely the compiler going behind the programmer's back and maintaining more precision than it was instructed to. OS X was not affected by this bug because double-precision expressions are evaluated in double precision there, not extended.

People can be educated about the gotchas of floating-point arithmetic, so long as the system is both reproducible and portable. Evaluating double-precision expressions in double and single-precision in single provides those attributes. Using extended-precision evaluation undermines them. You cannot do serious engineering in an environment where your tools are unpredictable.


