How to force GCC to assume that a floating-point expression is non-negative?
You can write assert(x*x >= 0.f) as a compile-time promise instead of a runtime check as follows in GNU C:
#include <cmath>

float test1 (float x)
{
    float tmp = x*x;
    if (!(tmp >= 0.0f))
        __builtin_unreachable();
    return std::sqrt(tmp);
}
(Related: What optimizations does __builtin_unreachable facilitate? You could also wrap if(!x) __builtin_unreachable() in a macro and call it promise() or something.)
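A minimal sketch of such a macro (the promise() name and test1b wrapper are hypothetical, not from any library):

```cpp
#include <cmath>

// Hypothetical promise() macro: tells the optimizer that cond is always true.
// If cond is ever false at runtime, the behavior is undefined.
#define promise(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

float test1b(float x)
{
    float tmp = x * x;
    promise(tmp >= 0.0f);   // compile-time promise: tmp is non-negative, non-NaN
    return std::sqrt(tmp);
}
```

This is GNU C / GCC-and-clang specific; in C++23 you could express the same thing portably with [[assume(tmp >= 0.0f)]].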
But gcc doesn't know how to take advantage of that promise that tmp is non-NaN and non-negative. We still get (Godbolt) the same canned asm sequence that checks for x>=0 and otherwise calls sqrtf to set errno. Presumably that expansion into a compare-and-branch happens after other optimization passes, so it doesn't help for the compiler to know more.
This is a missed optimization in the logic that speculatively inlines sqrt when -fmath-errno is enabled (on by default, unfortunately). What you want instead is -fno-math-errno, which is safe globally.
This is 100% safe if you don't rely on math functions ever setting errno. Nobody wants that; that's what NaN propagation and/or sticky flags that record masked FP exceptions are for, e.g. C99/C++11 fenv access via #pragma STDC FENV_ACCESS ON and then functions like fetestexcept(). See the example in feclearexcept which shows using it to detect division by zero. The FP environment is part of thread context while errno is global.
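A minimal sketch of that fenv approach (feclearexcept/fetestexcept are the real <cfenv> functions; the safe_div wrapper is illustrative, and strictly conforming code would also need the FENV_ACCESS pragma, which GCC has historically not implemented):

```cpp
#include <cfenv>

// Check a sticky FP exception flag after a computation, instead of errno.
// The flags are per-thread FPU state and stay set until explicitly cleared.
double safe_div(double a, double b, bool* divided_by_zero)
{
    std::feclearexcept(FE_DIVBYZERO);   // clear the sticky flag first
    double r = a / b;
    *divided_by_zero = (std::fetestexcept(FE_DIVBYZERO) != 0);
    return r;
}
```

The same pattern works with FE_INVALID, FE_OVERFLOW, etc., and unlike errno it reflects exactly what the hardware FPU recorded.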
Support for this obsolete misfeature is not free; you should just turn it off unless you have old code that was written to use it. Don't use it in new code: use fenv. Ideally support for -fmath-errno would be as cheap as possible, but the rarity of anyone actually using __builtin_unreachable() or other things to rule out a NaN input presumably made it not worth developers' time to implement the optimization. Still, you could report a missed-optimization bug if you wanted.
Real-world FPU hardware does in fact have these sticky flags that stay set until cleared, e.g. x86's mxcsr status/control register for SSE/AVX math, or hardware FPUs in other ISAs. On hardware where the FPU can detect exceptions, a quality C++ implementation will support stuff like fetestexcept(). And if not, then math-errno probably doesn't work either.
errno for math was an old, obsolete design that C / C++ is still stuck with by default, and it is now widely considered a bad idea. It makes it harder for compilers to inline math functions efficiently. Or maybe we're not as stuck with it as I thought: Why errno is not set to EDOM even sqrt takes out of domain arguement? explains that setting errno in math functions is optional in ISO C11, and an implementation can indicate whether or not it does so. Presumably the same applies in C++.
It's a big mistake to lump -fno-math-errno in with value-changing optimizations like -ffast-math or -ffinite-math-only. You should strongly consider enabling it globally, or at least for the whole file containing this function.
float test2 (float x)
{
    return std::sqrt(x*x);
}

# g++ -fno-math-errno -std=gnu++17 -O3
test2(float):            # and test1 is the same
    mulss   xmm0, xmm0
    sqrtss  xmm0, xmm0
    ret
You might as well use -fno-trapping-math as well, if you aren't ever going to unmask any FP exceptions with feenableexcept(). (Although that option isn't required for this optimization; it's only the errno-setting crap that's a problem here.)
-fno-trapping-math doesn't assume no-NaN or anything; it only assumes that FP exceptions like Invalid or Inexact won't ever actually invoke a signal handler instead of producing NaN or a rounded result. -ftrapping-math is the default, but it's broken and "never worked" according to GCC dev Marc Glisse. (Even with it on, GCC does some optimizations which can change the number of exceptions that would be raised from zero to non-zero or vice versa. And it blocks some safe optimizations.) But unfortunately, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54192 (make it off by default) is still open.
If you actually ever did unmask exceptions, it might be better to have -ftrapping-math, but again it's very rare that you'd ever want that instead of just checking flags after some math operations, or checking for NaN. And it doesn't actually preserve exact exception semantics anyway.
See SIMD for float threshold operation for a case where the -ftrapping-math default incorrectly blocks a safe optimization. (Even after hoisting a potentially-trapping operation so the C does it unconditionally, gcc makes non-vectorized asm that does it conditionally! So not only does GCC block vectorization, it changes the exception semantics vs. the C abstract machine.) -fno-trapping-math enables the expected optimization.
Make C floating point literals float (rather than double)
The -fsingle-precision-constant flag can be used. It causes floating-point constants to be loaded in single precision even when this is not exact. Note: this will also use single-precision constants in operations on double-precision variables.
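The precision loss this causes can be seen even without the flag, by comparing a literal rounded to float against the same literal rounded to double (the function name below is just for illustration):

```cpp
// 0.1 has no exact binary representation, so the nearest float and the
// nearest double to 0.1 are different values. With -fsingle-precision-constant,
// the literal 0.1 would be parsed as if it were 0.1f, so double-precision
// expressions using it would silently get the less accurate value.
bool single_precision_constant_changes_value()
{
    double from_double = 0.1;    // nearest double to 0.1
    double from_float  = 0.1f;   // nearest float to 0.1, widened to double
    return from_double != from_float;
}
```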
Argument order to std::min changes compiler output for floating-point
minsd a,b is not commutative for some special FP values, and neither is std::min, unless you use -ffast-math.
minsd a,b exactly implements (a<b) ? a : b, including everything that implies about signed zero and NaN in strict IEEE-754 semantics (i.e. it keeps the source operand, b, on unordered¹ or equal). As Artyer points out, -0.0 and +0.0 compare equal (i.e. -0. < 0. is false), but they are distinct.
std::min is defined in terms of an (a<b) comparison expression (cppreference), with (a<b) ? a : b as a possible implementation, unlike std::fmin which guarantees NaN propagation from either operand, among other things. (fmin originally came from the C math library, not a C++ template.)
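A sketch of the practical difference. Formally, a NaN operand violates std::min's strict-weak-ordering requirement, but real implementations (libstdc++, libc++) compile it to the comparison described above, which returns the first argument in the unordered case, so the result depends on argument order; std::fmin does not:

```cpp
#include <algorithm>
#include <cmath>

// std::min returns its first argument when the comparison is false,
// which includes the unordered (NaN) case, so a NaN survives only as
// the first argument. std::fmin propagates the number, not the NaN,
// regardless of operand order.
bool min_is_order_sensitive()
{
    double nan = std::nan("");
    bool first_arg_nan_kept    = std::isnan(std::min(nan, 1.0));
    bool second_arg_nan_dropped = !std::isnan(std::min(1.0, nan));
    return first_arg_nan_kept && second_arg_nan_dropped;
}

bool fmin_is_symmetric()
{
    double nan = std::nan("");
    return !std::isnan(std::fmin(nan, 1.0)) && !std::isnan(std::fmin(1.0, nan));
}
```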
See What is the instruction that gives branchless FP min and max on x86? for much more detail about minss/minsd / maxss/maxsd (and the corresponding intrinsics, which follow the same non-commutative rules except in some GCC versions.)
Footnote 1: Remember that NaN < b is false for any b, and for any comparison predicate. e.g. NaN == b is false, and so is NaN > b. Even NaN == NaN is false. When one or both of a pair are NaN, they are "unordered" wrt. each other.
With -ffast-math (to tell the compiler to assume no NaNs, among other assumptions and approximations), compilers will optimize either function to a single minsd. https://godbolt.org/z/a7oK91
For GCC, see https://gcc.gnu.org/wiki/FloatingPointMath. clang supports similar options, including -ffast-math as a catch-all.
Some of those options should be enabled by almost everyone, except for weird legacy codebases, e.g. -fno-math-errno. (See this Q&A for more about recommended math optimizations.) And gcc -fno-trapping-math is a good idea because the default doesn't fully work anyway: some optimizations can still change the number of FP exceptions that would be raised if exceptions were unmasked, sometimes even from 1 to 0 or from 0 to non-zero, IIRC. gcc -ftrapping-math also blocks some optimizations that are 100% safe even wrt. exception semantics, so it's pretty bad. In code that doesn't use fenv.h, you'll never know the difference.
But treating std::min as commutative can only be accomplished with options that assume no NaNs, and stuff like that, so it definitely can't be called "safe" for code that cares about exactly what happens with NaNs. e.g. -ffinite-math-only assumes no NaNs (and no infinities).
clang -funsafe-math-optimizations -ffinite-math-only will do the optimization you're looking for. (-funsafe-math-optimizations implies a bunch of more specific options, including not caring about signed-zero semantics.)
Unordered floating point comparisons in C
However, apart from that I did not find other unordered comparison operations. Are there any GNU extensions (or other standard library functions) that offer unordered floating point comparisons?
As @user2357112 observed in comments, "unordered floating-point comparisons" aren't a thing. The term doesn't even make sense. What you seem to want to evaluate are predicates of the form "x is less than y or the two are unordered".
Conforming C implementations are not at liberty to add operators, even as extensions. They can, in principle, define additional meanings for existing operators, but I don't know any implementation that provides the specific operations you are looking for that way. As you've already discovered, it's pretty trivial to use C's existing operators for the purpose:
So far, I could only implement them by checking for the opposite condition. For example, to implement an unordered a >= b comparison, I instead wrote an ordered !(a < b).
Update: The problem with that is that these comparisons will raise a floating-point exception when one of the operands is NaN (and FP exceptions are not disabled). But you're in luck! Since C99, there are standard macros in <math.h> implementing the comparisons you seek: isgreater, isgreaterequal, isless, islessequal, islessgreater, and isunordered. These are guaranteed to evaluate their arguments only once, and they do not raise floating-point exceptions.
And, of course, if you want to be able to express more clearly in your code that you are explicitly accommodating NaNs, then you could always write macros for it:
#define GE_OR_UNORDERED(x, y) (!((x) < (y)))
// ...
if (GE_OR_UNORDERED(a, b)) // ...
Note also that all of this relies heavily on the implementation details. Although C recognizes the possibility that real types can accommodate values, such as NaNs, that do not represent floating-point numbers, it does not require them to do so, nor does it define the behavior of relational or arithmetic operations on such values. Though most implementations these days use IEEE-754 floating point formats and operations, they are not required to do so, and historically, some have not.
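For comparison with the macro above, a sketch using the C99 quiet-comparison macros (std::isgreaterequal and friends from <cmath> in C++), which give the same predicate without raising FE_INVALID on a quiet NaN (the wrapper name is illustrative):

```cpp
#include <cmath>

// "a < b, or a and b are unordered" expressed with a C99 comparison macro.
// !(a >= b) computes the same truth value, but the >= operator may raise
// the Invalid exception when an operand is NaN; isgreaterequal does not.
bool less_or_unordered(double a, double b)
{
    return !std::isgreaterequal(a, b);
}
```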
Which functions are affected by -fno-math-errno?
You can find the list of functions affected by -fno-math-errno in GCC's builtins.def (search for "ERRNO"). It seems that only some functions from the math.h header (cos, sin, exp, etc.) are affected. Treatment of other standard functions that use errno (strtol, etc.) will not change under this flag.
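The classic affected case is sqrt of a negative argument, sketched below (the wrapper name is illustrative). The NaN result is guaranteed either way; the errno side effect is what the flag controls, so it is deliberately not relied on here:

```cpp
#include <cmath>
#include <cerrno>

// sqrt(-1.0) is a domain error: the result is a quiet NaN regardless of flags.
// Compiled with the default -fmath-errno, glibc's sqrt additionally sets
// errno = EDOM; with -fno-math-errno, the call can be inlined to a bare
// hardware sqrt instruction and errno is left untouched.
double sqrt_domain_error()
{
    errno = 0;
    return std::sqrt(-1.0);
}
```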
gcc '-m32' option changes floating-point rounding when not running valgrind
Edit: It would seem that, at least a long time back, valgrind's floating-point calculations weren't quite as accurate as the "real" calculations. In other words, this MAY explain why you get different results. See this question and answer on the valgrind mailing list.
Edit2: And the current valgrind.org documentation has it in its "core limitations" section here, so I would expect that it is indeed "still valid". In other words, the documentation for valgrind says to expect differences between valgrind and x87 FPU calculations. "You have been warned!" (And as we can see, using SSE instructions to do the same math produces the same result as valgrind, confirming that it's a "rounding from 80 bits to 64 bits" difference.)
Floating point calculations WILL differ slightly depending on exactly how the calculation is performed. I'm not sure exactly what you want to have an answer to, so here's a long rambling "answer of a sort".
Valgrind DOES indeed change the exact behaviour of your program in various ways (it emulates certain instructions, rather than actually executing the real instructions - which may include saving the intermediate results of calculations). Also, floating point calculations are well known to "not be precise" - it's just a matter of luck/bad luck if the calculation comes out precise or not. 0.98 is one of many, many numbers that can't be described precisely in floating point format [at least not the common IEEE formats].
By adding:
cerr<<"c="<<std::setprecision(30)<<c <<"\n";
we see that the output is c=0.979999999999999982236431605997 (yes, the actual value is 0.979999...9982 or some such; the remaining digits are just the residual, since 0.98 is not an "even" binary number, there's always going to be something left over).
This is the n = 2550, c = 0.98 and q = n * c part of the code, as generated by gcc:
    movl    $2550, -28(%ebp)  ; n
    fldl    .LC0
    fstpl   -40(%ebp)         ; c
    fildl   -28(%ebp)
    fmull   -40(%ebp)
    fstpl   -48(%ebp)         ; q - note that this is stored as a rounded 64-bit value.
This is the int(q) and int(n*c) part of the code:
    fildl   -28(%ebp)         ; n
    fmull   -40(%ebp)         ; c
    fnstcw  -58(%ebp)         ; Save control word
    movzwl  -58(%ebp), %eax
    movb    $12, %ah
    movw    %ax, -60(%ebp)    ; Save float control word.
    fldcw   -60(%ebp)
    fistpl  -64(%ebp)         ; Store as integer (directly from 80-bit result)
    fldcw   -58(%ebp)         ; Restore float control word.
    movl    -64(%ebp), %ebx   ; result of int(n * c)
    fldl    -48(%ebp)         ; q
    fldcw   -60(%ebp)         ; Load float control word as saved above.
    fistpl  -64(%ebp)         ; Store as integer.
    fldcw   -58(%ebp)         ; Restore control word.
    movl    -64(%ebp), %esi   ; result of int(q)
Now, if the intermediate result is stored (and thus rounded) from the internal 80-bit precision in the middle of one of those calculations, the result may be subtly different from the result if the calculation happens without saving intermediate values.
I get identical results from both g++ 4.9.2 and clang++ -mno-sse, but if I enable SSE in the clang case, it gives the same result as the 64-bit build. Using gcc -msse2 -m32 gives the 2499 answer everywhere. This indicates that the difference is caused by "storing intermediate results" in some way or another.
Likewise, optimising with gcc at -O1 will give 2499 in all places, but this is a coincidence, not a result of some "clever thinking". If you want correctly rounded integer values from your calculations, you're much better off rounding yourself, because sooner or later int(someDoubleValue) will come up "one short".
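A sketch of rounding yourself with std::lround instead of truncating, using an example value just below an integer (like the 80-bit x87 product of 2550 * 0.98 above):

```cpp
#include <cmath>

// int()-style conversion truncates toward zero, so a product that lands
// just below the intended integer comes up one short; explicit rounding
// to nearest recovers it.
long by_truncation(double q) { return static_cast<long>(q); }
long by_rounding(double q)   { return std::lround(q); }
```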
Edit3: And finally, using g++ -mno-sse -m64 will also produce the same 2498 answer, where using valgrind on the same binary produces the 2499 answer.
Can I force g++ to pass floating number arguments on the stack?
From the comments, I think you're basically trying to do what can be achieved using the type-safe functionality offered by std::function and std::bind (from C++11).
#include <functional>
#include <iostream>

using namespace std;

void foo(int x, double y, float z)
{
    cout << x << ", " << y << ", " << z << endl;
}

void call_agent(void* pointer)
{
    function<void()>* wrapper = reinterpret_cast<function<void()>*>(pointer);
    (*wrapper)();
    delete wrapper;
}

int main()
{
    function<void()>* callback = new function<void()>(bind(&foo, 1, 7.5, 8.4f));
    call_agent(callback);
}
Simply pass pointers to std::function<void()> objects to the awake function (and delete them after you've called them). void() as a template parameter means that the wrapped function should have no return value and accept no parameters; you can tweak that however you like (int() would be "int return value and no parameters", char(short, float) would be "char return value and accepts a short and a float", etc.).
std::bind(Function, args...) returns a callable object of an unspecified type. The parameters you passed to bind are then forwarded to the function. std::function wraps any kind of callable object, including function pointers and the result of an std::bind call, so you don't need to worry about the type of the thing you're calling.
What's more, you'll get compile-time errors if you provide incorrect parameters to bind, instead of having to wade through disassembly to figure out what went wrong. You can also pass composite types to bind (structures, unions), and even non-POD types.
Floating point limits (double) defined with long double suffix L
Specifying binary floating point numbers in decimal has subtle issues.
Why are the values defined as long double with suffix L and then casted back to a double?
With typical binary64, the maximum finite value is about 1.795e+308, or exactly:
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368
The number of digits needed to convert to a unique double may be as many as DBL_DECIMAL_DIG (typically 17, and at least 10). In any case, using exponential notation is certainly clear without being overly precise.
/*
  1 2345678901234567 */           // Sorted
  1.79769313486231550856124...    // DBL_MAX next smallest, for reference
  1.79769313486231570814527...    // Exact
  1.79769313486231570815e+308L    // gcc
  1.7976931348623158e+308         // VS (just a hair closer to exact than "next largest")
  1.7976931348623159077293...     // DBL_MAX next largest, if not limited by range
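The round-trip claim is easy to check: printing any double with DBL_DECIMAL_DIG significant digits and parsing the string back must recover the exact original value (the helper name below is illustrative):

```cpp
#include <cfloat>
#include <cstdio>
#include <cstdlib>

// Print x with DBL_DECIMAL_DIG (typically 17) significant digits, then
// parse the string back: a correct conversion pair must round-trip exactly.
bool round_trips(double x)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", DBL_DECIMAL_DIG, x);
    return std::strtod(buf, nullptr) == x;
}
```

With fewer digits (e.g. %.15g) the round trip can land on a neighboring double, which is exactly the subtlety the digit table above illustrates.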
Various compilers may not convert this string exactly as hoped, sometimes ignoring the least-significant digits (although this is controlled by the compiler).
Another source of subtle conversion differences, and I expect this is why the 'L' is added: the double computation is affected by the processor's floating-point unit, which might not adhere exactly to the IEEE standard. The worst result would be that the 1.797...e+308 constant converts to infinity due to minute errors when the text is converted to a double using double math. By converting it as a long double instead, those conversion errors are very small, and then converting the long double result to double rounds to the hoped-for number.
In short, forcing long double (L) math ensures the constant is not inadvertently turned into an infinity.
I would expect the following which matches neither gcc nor VS to be sufficient with a compliant IEEE 754 standard FPU.
#define __DBL_MAX__ 1.7976931348623157e+308
The cast back to double is to make DBL_MAX a double. This meets the expectation in much code that DBL_MAX is a double and not a long double. I see no specification that requires this, though.
Why is the DBL_MIN_10_EXP defined with -307 but the minimum exponent is -308?
That is to comply with the definition of DBL_MIN_10_EXP: "... minimum negative integer such that 10 raised to that power is in the range of normalized floating-point numbers". The non-integer answer is between -307 and -308, so the minimum integer in range is -307.
Observation
Although VS treats long double as a distinct type, the same encoding as double is used, thus there is no numeric advantage in using L.