How to force GCC to assume that a floating-point expression is non-negative?
You can write assert(x*x >= 0.f) as a compile-time promise instead of a runtime check as follows in GNU C:
#include <cmath>

float test1 (float x)
{
    float tmp = x*x;
    if (!(tmp >= 0.0f))
        __builtin_unreachable();
    return std::sqrt(tmp);
}
(Related: What optimizations does __builtin_unreachable facilitate? You could also wrap if(!x) __builtin_unreachable() in a macro and call it promise() or something.)
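A minimal sketch of such a macro (the promise() name and test1b wrapper are hypothetical, not from any library):

```cpp
#include <cmath>

// Hypothetical promise() macro: tells the optimizer that cond is always true.
// If cond is ever false at runtime, the behavior is undefined.
#define promise(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

float test1b(float x)
{
    float tmp = x * x;
    promise(tmp >= 0.0f);   // compile-time promise: tmp is non-negative, non-NaN
    return std::sqrt(tmp);
}
```

This is GNU C / GCC-and-clang specific; in C++23 you could express the same thing portably with [[assume(tmp >= 0.0f)]].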
But gcc doesn't know how to take advantage of that promise that tmp is non-NaN and non-negative. We still get (Godbolt) the same canned asm sequence that checks for x>=0 and otherwise calls sqrtf to set errno. Presumably that expansion into a compare-and-branch happens after other optimization passes, so it doesn't help for the compiler to know more.
This is a missed optimization in the logic that speculatively inlines sqrt when -fmath-errno is enabled (on by default, unfortunately). What you want instead is -fno-math-errno, which is safe globally.
This is 100% safe if you don't rely on math functions ever setting errno. Nobody wants that; that's what NaN propagation and/or sticky flags that record masked FP exceptions are for, e.g. C99/C++11 fenv access via #pragma STDC FENV_ACCESS ON and then functions like fetestexcept(). See the example in feclearexcept which shows using it to detect division by zero. The FP environment is part of thread context while errno is global.
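A minimal sketch of that fenv approach (feclearexcept/fetestexcept are the real <cfenv> functions; the safe_div wrapper is illustrative, and strictly conforming code would also need the FENV_ACCESS pragma, which GCC has historically not implemented):

```cpp
#include <cfenv>

// Check a sticky FP exception flag after a computation, instead of errno.
// The flags are per-thread FPU state and stay set until explicitly cleared.
double safe_div(double a, double b, bool* divided_by_zero)
{
    std::feclearexcept(FE_DIVBYZERO);   // clear the sticky flag first
    double r = a / b;
    *divided_by_zero = (std::fetestexcept(FE_DIVBYZERO) != 0);
    return r;
}
```

The same pattern works with FE_INVALID, FE_OVERFLOW, etc., and unlike errno it reflects exactly what the hardware FPU recorded.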
Support for this obsolete misfeature is not free; you should just turn it off unless you have old code that was written to use it. Don't use it in new code: use fenv. Ideally support for -fmath-errno would be as cheap as possible, but the rarity of anyone actually using __builtin_unreachable() or other things to rule out a NaN input presumably made it not worth developers' time to implement the optimization. Still, you could report a missed-optimization bug if you wanted.
Real-world FPU hardware does in fact have these sticky flags that stay set until cleared, e.g. x86's mxcsr status/control register for SSE/AVX math, or hardware FPUs in other ISAs. On hardware where the FPU can detect exceptions, a quality C++ implementation will support stuff like fetestexcept(). And if not, then math-errno probably doesn't work either.
errno for math was an old, obsolete design that C / C++ is still stuck with by default, and it is now widely considered a bad idea. It makes it harder for compilers to inline math functions efficiently. Or maybe we're not as stuck with it as I thought: Why errno is not set to EDOM even sqrt takes out of domain arguement? explains that setting errno in math functions is optional in ISO C11, and an implementation can indicate whether or not it does so. Presumably the same applies in C++.
It's a big mistake to lump -fno-math-errno in with value-changing optimizations like -ffast-math or -ffinite-math-only. You should strongly consider enabling it globally, or at least for the whole file containing this function.
float test2 (float x)
{
    return std::sqrt(x*x);
}

# g++ -fno-math-errno -std=gnu++17 -O3
test2(float):            # and test1 is the same
    mulss   xmm0, xmm0
    sqrtss  xmm0, xmm0
    ret
You might as well use -fno-trapping-math as well, if you aren't ever going to unmask any FP exceptions with feenableexcept(). (Although that option isn't required for this optimization; it's only the errno-setting crap that's a problem here.)
-fno-trapping-math doesn't assume no-NaN or anything; it only assumes that FP exceptions like Invalid or Inexact won't ever actually invoke a signal handler instead of producing NaN or a rounded result. -ftrapping-math is the default, but it's broken and "never worked" according to GCC dev Marc Glisse. (Even with it on, GCC does some optimizations which can change the number of exceptions that would be raised from zero to non-zero or vice versa. And it blocks some safe optimizations.) But unfortunately, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54192 (make it off by default) is still open.
If you actually ever did unmask exceptions, it might be better to have -ftrapping-math, but again it's very rare that you'd ever want that instead of just checking flags after some math operations, or checking for NaN. And it doesn't actually preserve exact exception semantics anyway.
See SIMD for float threshold operation for a case where the -ftrapping-math default incorrectly blocks a safe optimization. (Even after hoisting a potentially-trapping operation so the C does it unconditionally, gcc makes non-vectorized asm that does it conditionally! So not only does GCC block vectorization, it changes the exception semantics vs. the C abstract machine.) -fno-trapping-math enables the expected optimization.
Make C floating point literals float (rather than double)
The -fsingle-precision-constant flag can be used. It causes floating-point constants to be loaded in single precision even when this is not exact. Note: this will also use single-precision constants in operations on double-precision variables.
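The precision loss this causes can be seen even without the flag, by comparing a literal rounded to float against the same literal rounded to double (the function name below is just for illustration):

```cpp
// 0.1 has no exact binary representation, so the nearest float and the
// nearest double to 0.1 are different values. With -fsingle-precision-constant,
// the literal 0.1 would be parsed as if it were 0.1f, so double-precision
// expressions using it would silently get the less accurate value.
bool single_precision_constant_changes_value()
{
    double from_double = 0.1;    // nearest double to 0.1
    double from_float  = 0.1f;   // nearest float to 0.1, widened to double
    return from_double != from_float;
}
```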
Argument order to std::min changes compiler output for floating-point
minsd a,b is not commutative for some special FP values, and neither is std::min, unless you use -ffast-math.
minsd a,b exactly implements (a<b) ? a : b, including everything that implies about signed zero and NaN in strict IEEE-754 semantics (i.e. it keeps the source operand, b, on unordered¹ or equal). As Artyer points out, -0.0 and +0.0 compare equal (i.e. -0. < 0. is false), but they are distinct.
std::min is defined in terms of an (a<b) comparison expression (cppreference), with (a<b) ? a : b as a possible implementation, unlike std::fmin which guarantees NaN propagation from either operand, among other things. (fmin originally came from the C math library, not a C++ template.)
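A sketch of the practical difference. Formally, a NaN operand violates std::min's strict-weak-ordering requirement, but real implementations (libstdc++, libc++) compile it to the comparison described above, which returns the first argument in the unordered case, so the result depends on argument order; std::fmin does not:

```cpp
#include <algorithm>
#include <cmath>

// std::min returns its first argument when the comparison is false,
// which includes the unordered (NaN) case, so a NaN survives only as
// the first argument. std::fmin propagates the number, not the NaN,
// regardless of operand order.
bool min_is_order_sensitive()
{
    double nan = std::nan("");
    bool first_arg_nan_kept    = std::isnan(std::min(nan, 1.0));
    bool second_arg_nan_dropped = !std::isnan(std::min(1.0, nan));
    return first_arg_nan_kept && second_arg_nan_dropped;
}

bool fmin_is_symmetric()
{
    double nan = std::nan("");
    return !std::isnan(std::fmin(nan, 1.0)) && !std::isnan(std::fmin(1.0, nan));
}
```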
See What is the instruction that gives branchless FP min and max on x86? for much more detail about minss/minsd / maxss/maxsd (and the corresponding intrinsics, which follow the same non-commutative rules except in some GCC versions.)
Footnote 1: Remember that NaN < b is false for any b, and for any comparison predicate. e.g. NaN == b is false, and so is NaN > b. Even NaN == NaN is false. When one or both of a pair are NaN, they are "unordered" wrt. each other.
With -ffast-math (to tell the compiler to assume no NaNs, among other assumptions and approximations), compilers will optimize either function to a single minsd. https://godbolt.org/z/a7oK91
For GCC, see https://gcc.gnu.org/wiki/FloatingPointMath. clang supports similar options, including -ffast-math as a catch-all.
Some of those options should be enabled by almost everyone, except for weird legacy codebases, e.g. -fno-math-errno. (See this Q&A for more about recommended math optimizations.) And gcc -fno-trapping-math is a good idea because the default doesn't fully work anyway: some optimizations can still change the number of FP exceptions that would be raised if exceptions were unmasked, sometimes even from 1 to 0 or from 0 to non-zero, IIRC. gcc -ftrapping-math also blocks some optimizations that are 100% safe even wrt. exception semantics, so it's pretty bad. In code that doesn't use fenv.h, you'll never know the difference.
But treating std::min as commutative can only be accomplished with options that assume no NaNs, and stuff like that, so it definitely can't be called "safe" for code that cares about exactly what happens with NaNs. e.g. -ffinite-math-only assumes no NaNs (and no infinities).
clang -funsafe-math-optimizations -ffinite-math-only will do the optimization you're looking for. (-funsafe-math-optimizations implies a bunch of more specific options, including not caring about signed-zero semantics.)
Unordered floating point comparisons in C
However, apart from that I did not find other unordered comparison operations. Are there any GNU extensions (or other standard library functions) that offer unordered floating point comparisons?
As @user2357112 observed in comments, "unordered floating-point comparisons" aren't a thing. The term doesn't even make sense. What you seem to want to evaluate are predicates of the form "x is less than y or the two are unordered".
Conforming C implementations are not at liberty to add operators, even as extensions. They can, in principle, define additional meanings for existing operators, but I don't know any implementation that provides the specific operations you are looking for that way. As you've already discovered, it's pretty trivial to use C's existing operators for the purpose:
So far, I could only implement them by checking for the opposite condition. For example, to implement an unordered a >= b comparison, I instead wrote an ordered !(a < b).
Update: The problem with that is that these comparisons will raise a floating-point exception when one of the operands is NaN (and FP exceptions are not disabled). But you're in luck! Since C99, there are standard macros in <math.h> implementing the comparisons you seek: isgreater, isgreaterequal, isless, islessequal, islessgreater, and isunordered. These are guaranteed to evaluate their arguments only once, and they do not raise floating-point exceptions.
And, of course, if you want to be able to express more clearly in your code that you are explicitly accommodating NaNs, then you could always write macros for it:
#define GE_OR_UNORDERED(x, y) (!((x) < (y)))
// ...
if (GE_OR_UNORDERED(a, b)) // ...
Note also that all of this relies heavily on the implementation details. Although C recognizes the possibility that real types can accommodate values, such as NaNs, that do not represent floating-point numbers, it does not require them to do so, nor does it define the behavior of relational or arithmetic operations on such values. Though most implementations these days use IEEE-754 floating point formats and operations, they are not required to do so, and historically, some have not.
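For comparison with the macro above, a sketch using the C99 quiet-comparison macros (std::isgreaterequal and friends from <cmath> in C++), which give the same predicate without raising FE_INVALID on a quiet NaN (the wrapper name is illustrative):

```cpp
#include <cmath>

// "a < b, or a and b are unordered" expressed with a C99 comparison macro.
// !(a >= b) computes the same truth value, but the >= operator may raise
// the Invalid exception when an operand is NaN; isgreaterequal does not.
bool less_or_unordered(double a, double b)
{
    return !std::isgreaterequal(a, b);
}
```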
Which functions are affected by -fno-math-errno?
You can find the list of functions affected by -fno-math-errno in GCC's builtins.def (search for "ERRNO"). It seems that only some functions from the math.h header (cos, sin, exp, etc.) are affected. Treatment of other standard functions that use errno (strtol, etc.) will not change under this flag.
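The classic affected case is sqrt of a negative argument, sketched below (the wrapper name is illustrative). The NaN result is guaranteed either way; the errno side effect is what the flag controls, so it is deliberately not relied on here:

```cpp
#include <cmath>
#include <cerrno>

// sqrt(-1.0) is a domain error: the result is a quiet NaN regardless of flags.
// Compiled with the default -fmath-errno, glibc's sqrt additionally sets
// errno = EDOM; with -fno-math-errno, the call can be inlined to a bare
// hardware sqrt instruction and errno is left untouched.
double sqrt_domain_error()
{
    errno = 0;
    return std::sqrt(-1.0);
}
```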
gcc '-m32' option changes floating-point rounding when not running valgrind
Edit: It would seem that, at least a long time back, valgrind's floating-point calculations weren't quite as accurate as the "real" calculations. In other words, this MAY explain why you get different results. See this question and answer on the valgrind mailing list.
Edit2: And the current valgrind.org documentation has it in its "core limitations" section here, so I would expect that it is indeed "still valid". In other words, the documentation for valgrind says to expect differences between valgrind and x87 FPU calculations. "You have been warned!" (And as we can see, using SSE instructions to do the same math produces the same result as valgrind, confirming that it's a "rounding from 80 bits to 64 bits" difference.)
Floating point calculations WILL differ slightly depending on exactly how the calculation is performed. I'm not sure exactly what you want to have an answer to, so here's a long rambling "answer of a sort".
Valgrind DOES indeed change the exact behaviour of your program in various ways (it emulates certain instructions, rather than actually executing the real instructions - which may include saving the intermediate results of calculations). Also, floating point calculations are well known to "not be precise" - it's just a matter of luck/bad luck if the calculation comes out precise or not. 0.98 is one of many, many numbers that can't be described precisely in floating point format [at least not the common IEEE formats].
By adding:
cerr<<"c="<<std::setprecision(30)<<c <<"\n";
we see that the output is c=0.979999999999999982236431605997 (yes, the actual value is 0.979999...9982 or some such; the remaining digits are just the residual, since 0.98 is not an "even" binary number, there's always going to be something left over).
This is the n = 2550, c = 0.98 and q = n * c part of the code, as generated by gcc:
    movl    $2550, -28(%ebp)  ; n
    fldl    .LC0
    fstpl   -40(%ebp)         ; c
    fildl   -28(%ebp)
    fmull   -40(%ebp)
    fstpl   -48(%ebp)         ; q - note that this is stored as a rounded 64-bit value.
This is the int(q) and int(n*c) part of the code:
    fildl   -28(%ebp)         ; n
    fmull   -40(%ebp)         ; c
    fnstcw  -58(%ebp)         ; Save control word
    movzwl  -58(%ebp), %eax
    movb    $12, %ah
    movw    %ax, -60(%ebp)    ; Save float control word.
    fldcw   -60(%ebp)
    fistpl  -64(%ebp)         ; Store as integer (directly from 80-bit result)
    fldcw   -58(%ebp)         ; Restore float control word.
    movl    -64(%ebp), %ebx   ; result of int(n * c)
    fldl    -48(%ebp)         ; q
    fldcw   -60(%ebp)         ; Load float control word as saved above.
    fistpl  -64(%ebp)         ; Store as integer.
    fldcw   -58(%ebp)         ; Restore control word.
    movl    -64(%ebp), %esi   ; result of int(q)
Now, if the intermediate result is stored (and thus rounded) from the internal 80-bit precision in the middle of one of those calculations, the result may be subtly different from the result if the calculation happens without saving intermediate values.
I get identical results from both g++ 4.9.2 and clang++ -mno-sse, but if I enable SSE in the clang case, it gives the same result as the 64-bit build. Using gcc -msse2 -m32 gives the 2499 answer everywhere. This indicates that the difference is caused by "storing intermediate results" in some way or another.
Likewise, optimising with gcc at -O1 will give 2499 in all places, but this is a coincidence, not a result of some "clever thinking". If you want correctly rounded integer values from your calculations, you're much better off rounding yourself, because sooner or later int(someDoubleValue) will come up "one short".
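A sketch of rounding yourself with std::lround instead of truncating, using an example value just below an integer (like the 80-bit x87 product of 2550 * 0.98 above):

```cpp
#include <cmath>

// int()-style conversion truncates toward zero, so a product that lands
// just below the intended integer comes up one short; explicit rounding
// to nearest recovers it.
long by_truncation(double q) { return static_cast<long>(q); }
long by_rounding(double q)   { return std::lround(q); }
```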
Edit3: And finally, using g++ -mno-sse -m64 will also produce the same 2498 answer, where using valgrind on the same binary produces the 2499 answer.
Can I force g++ to pass floating number arguments on the stack?
From the comments, I think you're basically trying to do what can be achieved using the type-safe functionality offered by std::function and std::bind (from C++11).
#include <functional>
#include <iostream>

using namespace std;

void foo(int x, double y, float z)
{
    cout << x << ", " << y << ", " << z << endl;
}

void call_agent(void* pointer)
{
    function<void()>* wrapper = reinterpret_cast<function<void()>*>(pointer);
    (*wrapper)();
    delete wrapper;
}

int main()
{
    function<void()>* callback = new function<void()>(bind(&foo, 1, 7.5, 8.4f));
    call_agent(callback);
}
Simply pass pointers to std::function<void()> objects to the awake function (and delete them after you've called them). void() as a template parameter means that the wrapped function should have no return value and accept no parameters; you can tweak that however you like (int() would be "int return value and no parameters", char(short, float) would be "char return value and accepts a short and a float", etc.).
std::bind(Function, args...) returns a callable object of an unspecified type. The parameters you passed to bind are then forwarded to the function. std::function wraps any kind of callable object, including function pointers and the result of an std::bind call, so you don't need to worry about the type of the thing you're calling.
What's more, you'll get compile-time errors if you provide incorrect parameters to bind, instead of having to wade through disassembly to figure out what went wrong. You can also pass composite types to bind (structures, unions), and even non-POD types.
Floating point limits (double) defined with long double suffix L
Specifying binary floating point numbers in decimal has subtle issues.
Why are the values defined as long double with suffix L and then casted back to a double?
With typical binary64, the maximum finite value is about 1.795e+308, or exactly:
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368
The number of digits needed to convert to a unique double may be as many as DBL_DECIMAL_DIG (typically 17, and at least 10). In any case, using exponential notation is certainly clear without being overly precise.
/*
  1 2345678901234567 */           // Sorted
  1.79769313486231550856124...    // DBL_MAX next smallest, for reference
  1.79769313486231570814527...    // Exact
  1.79769313486231570815e+308L    // gcc
  1.7976931348623158e+308         // VS (just a hair closer to exact than "next largest")
  1.7976931348623159077293...     // DBL_MAX next largest, if not limited by range
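The round-trip claim is easy to check: printing any double with DBL_DECIMAL_DIG significant digits and parsing the string back must recover the exact original value (the helper name below is illustrative):

```cpp
#include <cfloat>
#include <cstdio>
#include <cstdlib>

// Print x with DBL_DECIMAL_DIG (typically 17) significant digits, then
// parse the string back: a correct conversion pair must round-trip exactly.
bool round_trips(double x)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", DBL_DECIMAL_DIG, x);
    return std::strtod(buf, nullptr) == x;
}
```

With fewer digits (e.g. %.15g) the round trip can land on a neighboring double, which is exactly the subtlety the digit table above illustrates.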
Various compilers may not convert this string exactly as hoped, sometimes ignoring the least-significant digits (although this is controlled by the compiler).
Another source of subtle conversion differences, and I expect this is why the 'L' is added: the double computation is affected by the processor's floating-point unit, which might not adhere exactly to the IEEE standard. The worst result would be that the 1.797...e+308 constant converts to infinity due to minute errors when the text is converted to a double using double math. By converting it as a long double instead, those conversion errors are very small, and then converting the long double result to double rounds to the hoped-for number.
In short, forcing long double (L) math ensures the constant is not inadvertently turned into an infinity.
I would expect the following which matches neither gcc nor VS to be sufficient with a compliant IEEE 754 standard FPU.
#define __DBL_MAX__ 1.7976931348623157e+308
The cast back to double is to make DBL_MAX a double. This meets the expectation in much code that DBL_MAX is a double and not a long double. I see no specification that requires this, though.
Why is the DBL_MIN_10_EXP defined with -307 but the minimum exponent is -308?
That is to comply with the definition of DBL_MIN_10_EXP: "... minimum negative integer such that 10 raised to that power is in the range of normalized floating-point numbers". The non-integer answer is between -307 and -308, so the minimum integer in range is -307.
Observation
Although VS treats long double as a distinct type, the same encoding as double is used, thus there is no numeric advantage in using L.