Cross Platform Floating Point Consistency

Cross-platform and cross-compiler consistency is of course possible. Anything is possible given enough knowledge and time! But it might be very hard, or very time-consuming, or indeed impractical.

Here are the problems I can foresee, in no particular order:

  1. Remember that even an extremely small error of plus-or-minus 1/10^15 can blow up to become significant: multiply a number carrying that error margin by one billion, and you now have a plus-or-minus 0.000001 error, which might matter. These errors can accumulate over time, over many frames, until you have a desynchronized simulation. Or they can manifest when you compare values (even naively using "epsilons" in floating-point comparisons might not help; it may only displace or delay the manifestation.) See the error-amplification sketch after this list.

  2. The above problem is not unique to distributed deterministic simulations (like yours.) It touches on the issue of "numerical stability", which is a difficult and often neglected subject.

  3. Different compiler optimization switches, and different switches that control floating-point behavior, might lead the compiler to generate slightly different sequences of CPU instructions for the same statements. Obviously these must be the same across compilations, using the exact same compilers, or the generated code must be rigorously compared and verified.

  4. 32-bit and 64-bit programs (note: I'm saying programs and not CPUs) will probably exhibit slightly different floating-point behaviors. By default, 32-bit programs cannot rely on anything more advanced than the x87 instruction set (no SSE, SSE2, AVX, etc.) unless you specify otherwise on the compiler command line (or use intrinsics/inline assembly in your code.) On the other hand, a 64-bit program is guaranteed to run on a CPU with SSE2 support, so the compiler will use those instructions by default (again, unless overridden by the user.) While x87 and SSE2 float datatypes and the operations on them are similar, they are - AFAIK - not identical, which will lead to inconsistencies in the simulation if one program uses one instruction set and another uses the other.

  5. The x87 instruction set includes a "control word" register, which contains flags that control some aspects of floating-point operations (e.g. the exact rounding behavior.) This is a runtime thing: your program can do one set of calculations, then change this register, and after that do the exact same calculations and get a different result. Obviously, this register must be checked, handled, and kept identical on the different machines. It is possible for the compiler (or the libraries you use in your program) to generate code that changes these flags at runtime, inconsistently across the programs. See the rounding-mode sketch after this list.

  6. Again, in the case of the x87 instruction set, Intel and AMD have historically implemented things a little differently. For example, one vendor's CPU might internally do some calculations using more bits (and therefore arrive at a more accurate result) than the other, which means that if you happen to run on two different CPUs (both x86) from two different vendors, the results of simple calculations might not be the same. I don't know how and under what circumstances these higher-accuracy calculations are enabled, and whether they happen under normal operating conditions or you have to ask for them specifically, but I do know these discrepancies exist.

  7. Generating random numbers consistently and deterministically across programs has nothing to do with floating-point consistency. It's important and a source of many bugs, but in the end it's just a few more bits of state that you have to keep synched.
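
To make point 1 concrete, here is a small stand-alone demonstration (the values are chosen purely for illustration) of how a one-part-in-10^15 discrepancy sails through an epsilon comparison until a later scaling step amplifies it:

    #include <cmath>
    #include <cstdio>

    int main()
    {
        double a = 1.0;
        double b = 1.0 + 1e-15;      // differs by roughly one part in 10^15

        // At this scale a typical epsilon test happily calls them "equal".
        const double eps = 1e-9;
        std::printf("before scaling: equal? %d\n", std::fabs(a - b) < eps);

        // Scale both by one billion, as some simulation step might.
        a *= 1e9;
        b *= 1e9;

        // The same epsilon now reports them as different: the error grew
        // from about 1e-15 to about 1e-6 and crossed the threshold.
        std::printf("after scaling: equal? %d, diff = %g\n",
                    std::fabs(a - b) < eps, b - a);
    }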
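
For point 5, here is a minimal sketch of pinning and re-checking the rounding mode through the standard <cfenv> interface. Note that this covers only the rounding-mode portion of the control word; the x87 precision-control bits need platform-specific calls (for example _controlfp_s on MSVC):

    #include <cfenv>
    #include <cstdio>

    // Strictly speaking, FENV_ACCESS should be enabled when the program
    // reads or writes the floating-point environment.

    void pin_float_environment()
    {
        // Force round-to-nearest so all peers agree, regardless of what a
        // library may have left behind.
        std::fesetround(FE_TONEAREST);
    }

    bool float_environment_unchanged()
    {
        // Call periodically (e.g. once per frame) to catch a library that
        // silently changed the rounding mode at runtime.
        return std::fegetround() == FE_TONEAREST;
    }

    int main()
    {
        pin_float_environment();
        // ... run the simulation ...
        if (!float_environment_unchanged())
            std::puts("warning: rounding mode changed at runtime");
    }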

And here are a couple of techniques that might help:


  1. Some projects use "fixed-point" numbers and fixed-point arithmetic to avoid the rounding errors and general unpredictability of floating-point numbers. Read the Wikipedia article for more information and external links. A minimal sketch follows after this list.

  2. In one of my own projects, during development, I hashed all the relevant state (including a lot of floating-point numbers) in every instance of the game and sent the hash across the network each frame, to make sure not even one bit of that state differed between machines. This also helped with debugging: instead of trusting my eyes to see when and where inconsistencies existed (which wouldn't tell me where they originated, anyway) I would know the instant some part of the game state on one machine started diverging from the others, and know exactly what it was (if the hash check failed, I would stop the simulation and start comparing the whole state.) A sketch of this idea also follows the list.

    This feature was implemented in that codebase from the beginning, and was used only during the development process to help with debugging (because it had performance and memory costs.)
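
As a rough illustration of technique 1, here is a minimal 16.16 fixed-point sketch (the type name and layout are illustrative only; overflow handling, rounding, and saturation are deliberately left out, and arithmetic right shifts are assumed). Because every operation is plain integer arithmetic, identical inputs give bit-identical results on every conforming platform:

    #include <cstdint>

    struct Fixed
    {
        int64_t raw;                       // value scaled by 65536 (16.16)

        static Fixed from_int(int v)   { return Fixed{ int64_t(v) << 16 }; }
        double to_double() const       { return double(raw) / 65536.0; }

        Fixed operator+(Fixed o) const { return Fixed{ raw + o.raw }; }
        Fixed operator-(Fixed o) const { return Fixed{ raw - o.raw }; }
        Fixed operator*(Fixed o) const { return Fixed{ (raw * o.raw) >> 16 }; }
        Fixed operator/(Fixed o) const { return Fixed{ (raw << 16) / o.raw }; }
    };

The price is a much smaller usable range, so you have to know the bounds of your values in advance; that trade-off is exactly what the Wikipedia article discusses.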
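
And a sketch of technique 2, assuming the relevant state can be flattened into a list of doubles. The important detail is hashing the exact bit patterns, never rounded or printed values; FNV-1a is used here only because it is tiny, and any hash the peers agree on works:

    #include <cstdint>
    #include <cstring>
    #include <vector>

    uint64_t hash_state(const std::vector<double>& state)
    {
        uint64_t h = 0xcbf29ce484222325ull;           // FNV-1a offset basis
        for (double d : state)
        {
            uint64_t bits;
            std::memcpy(&bits, &d, sizeof bits);      // exact bit pattern
            for (int i = 0; i < 8; ++i)
            {
                h ^= (bits >> (i * 8)) & 0xff;
                h *= 0x100000001b3ull;                // FNV-1a prime
            }
        }
        return h;
    }

    // Each frame: send hash_state(...) to the other peers. The moment the
    // values disagree, stop the simulation and diff the full state.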

Update (in answer to the first comment below): As I said in point 1, and as others have said in other answers, rounding doesn't guarantee anything. If you do that, you might decrease the probability and frequency of an inconsistency occurring, but the likelihood doesn't become zero. If you don't analyze what's happening in your code and the possible sources of problems carefully and systematically, it is still possible to run into errors no matter how much you "round off" your numbers.

For example, if you have two numbers (e.g. the results of two calculations that were supposed to produce identical results) that are 1.111499999 and 1.111500001, and you round them to three decimal places, they become 1.111 and 1.112 respectively. The original numbers differed by only 2E-9, but the difference has now become 1E-3; you have increased the error 500,000-fold. And they are still not equal, even with the rounding. You've exacerbated the problem.

True, this doesn't happen often, and the numbers I chose are an unlucky pair, but it is still possible to find yourself with these kinds of numbers. And when you do, you're in trouble. The only sure-fire solution, even if you use fixed-point arithmetic or whatever, is to do a rigorous and systematic mathematical analysis of all your potential problem areas and prove that they will remain consistent across programs.

Short of that, for us mere mortals, you need a watertight way to monitor the situation and find exactly when and how the slightest discrepancies occur, so that you can solve the problem after the fact (instead of relying on your eyes to spot problems in game animation, object movement, or physical behavior.)

Cross platform/compiler consistent sprintf of floating point numbers

A year later, this is how we solved it.

We downloaded a custom printf implementation (trio) and forced Lua (and our own sources) to use it instead of the system one.

We also had to change

long double trio_long_double_t;

to

double trio_long_double_t;

in triodef.h to ensure that Visual Studio and Linux/macOS give the same results.
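
The call sites then look something like the following; this is only a sketch, assuming trio's trio_snprintf entry point, and the helper function and format string are made up for illustration:

    #include <cstddef>
    #include "trio.h"

    // Route user-visible float formatting through trio so every platform
    // uses the same conversion code instead of its own C library's.
    void format_position(char* buf, size_t len, double x, double y)
    {
        trio_snprintf(buf, len, "%.17g %.17g", x, y);
    }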

Is Double Math Consistent across Multiple Platforms?

The C# standard, ECMA-334, says in section 11.1.6 Floating point types "Floating-point operations can be performed with higher precision than the result type of the operation.".

Given that sort of rule, you cannot be sure of getting exactly the same answer everywhere. "can" means some implementations may use higher precision than others, for the same calculation.

Even if testing shows that all current implementations get the same answers for all your calculations across a range of inputs, the next compiler release on any platform might change that. If you really need identical results, you need to pick a library or similar approach that promises them.

Calculations vary in how sensitive they are to small changes in inputs. At the extreme are simulations of some physical systems such as weather, where tiny changes in inputs can expand to big changes in results.

One way to get a feeling for the behavior of your simulation is to deliberately perturb some values by e.g. one part in 1e15. Does that change the results enough to matter? However, there will always be a risk that some inputs will lead to less stable conditions than others.
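
A rough sketch of that perturbation test; simulate() here is only a toy stand-in for whatever computation you actually run:

    #include <cmath>
    #include <cstdio>

    // Toy stand-in for the real computation; replace with your own.
    double simulate(double x)
    {
        for (int i = 0; i < 1000; ++i)
            x = x * 1.000001 - std::sin(x) * 1e-6;
        return x;
    }

    int main()
    {
        double input     = 1.0;
        double base      = simulate(input);
        double perturbed = simulate(input * (1.0 + 1e-15));

        // If this relative divergence is large enough to matter, the
        // calculation is sensitive to tiny input differences.
        std::printf("relative divergence: %g\n",
                    std::fabs(perturbed - base) / std::fabs(base));
    }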

Is the output of a C++ program with floating point predictable?

Is the output guaranteed to be the same in all the above combinations?

No.

Very little about floating point representation is guaranteed by the C++ standard.

can I guarantee identical output by lowering the precision, e.g:

cout << setprecision(4) << x;

No.

If the exact result of the calculation falls right at a rounding boundary, then an arbitrarily small difference between the two calculations can change which way the value rounds. Rounding would then magnify the difference between the calculations rather than hide it.

If not, what more can I do to have consistent output across machines?

  • Use something other than hardware floating point. Either software floating point or fixed point arithmetic.
  • Either use fixed-width integers to implement that arithmetic, so that one system doesn't have more precision than another, or use arbitrary precision.
  • If not using arbitrary precision, then only use expressions whose order of evaluation is defined by the language, so that differences between compilers don't result in different amounts of error. A small illustration follows below.
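
To illustrate the last point: floating-point addition is not associative, so any freedom a compiler (or a fast-math style option) has to regroup an expression, or any grouping you leave unspecified yourself, can change the result:

    #include <cstdio>

    int main()
    {
        double a = 1e16, b = -1e16, c = 1.0;

        double left  = (a + b) + c;   // exactly 1.0
        double right = a + (b + c);   // 0.0: c is lost when added to b first

        std::printf("%g %g\n", left, right);
    }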

Is floating-point math consistent in C#? Can it be?

I know of no way to make normal floating point deterministic in .NET. The JITter is allowed to create code that behaves differently on different platforms (or between different versions of .NET). So using normal floats in deterministic .NET code is not possible.

The workarounds I considered:

  1. Implement FixedPoint32 in C#. While this is not too hard (I have a half-finished implementation), the very small range of values makes it annoying to use. You have to be careful at all times so that you neither overflow nor lose too much precision. In the end I found this no easier than using integers directly.
  2. Implement FixedPoint64 in C#. I found this rather hard to do. For some operations, 128-bit intermediate integers would be useful, but .NET doesn't offer such a type.
  3. Implement a custom 32-bit floating-point type. The lack of a BitScanReverse intrinsic causes a few annoyances when implementing this, but currently I think this is the most promising path.
  4. Use native code for the math operations. This incurs the overhead of a delegate call on every math operation.

I've just started a software implementation of 32-bit floating-point math. It can do about 70 million additions/multiplications per second on my 2.66 GHz i3.
https://github.com/CodesInChaos/SoftFloat . Obviously it's still very incomplete and buggy.


