Why Does Clang Optimize Away x * 1.0 But Not x + 0.0?

Why does Clang optimize away x * 1.0 but NOT x + 0.0?

The IEEE 754-2008 Standard for Floating-Point Arithmetic and the ISO/IEC 10967 Language Independent Arithmetic (LIA) Standard, Part 1, explain why this is so.

IEEE 754 § 6.3 The sign bit


When either an input or result is NaN, this standard does not interpret the sign of a NaN. Note, however, that operations on bit strings — copy, negate, abs, copySign — specify the sign bit of a NaN result, sometimes based upon the sign bit of a NaN operand. The logical predicate totalOrder is also affected by the sign bit of a NaN operand. For all other operations, this standard does not specify the sign bit of a NaN result, even when there is only one input NaN, or when the NaN is produced from an invalid operation.

When neither the inputs nor result are NaN, the sign of a product or quotient is the exclusive OR of the operands’ signs; the sign of a sum, or of a difference x − y regarded as a sum x + (−y), differs from at most one of the addends’ signs; and the sign of the result of conversions, the quantize operation, the roundToIntegral operations, and the roundToIntegralExact (see 5.3.1) is the sign of the first or only operand. These rules shall apply even when operands or results are zero or infinite.

When the sum of two operands with opposite signs (or the difference of two operands with like signs) is exactly zero, the sign of that sum (or difference) shall be +0 in all rounding-direction attributes except roundTowardNegative; under that attribute, the sign of an exact zero sum (or difference) shall be −0. However, x + x = x − (−x) retains the same sign as x even when x is zero.

The Case of Addition

Under the default rounding mode (Round-to-Nearest, Ties-to-Even), we see that x+0.0 produces x, EXCEPT when x is -0.0: in that case we have a sum of two operands with opposite signs whose sum is zero, and §6.3 paragraph 3 rules that this addition produces +0.0.

Since +0.0 is not bitwise identical to the original -0.0, and -0.0 is a legitimate value that may occur as input, the compiler is obliged to emit code that transforms potential negative zeros into +0.0 (the short sketch after the summary below demonstrates this case).

In summary: under the default rounding mode, in x+0.0, if x

  • is not -0.0, then x itself is an acceptable output value.
  • is -0.0, then the output value must be +0.0, which is not bitwise identical to -0.0.
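
A minimal C sketch of that special case (an illustration only: it assumes the default rounding mode and a build without -ffast-math; the volatile merely keeps the compiler from folding the addition away):

#include <math.h>
#include <stdio.h>

int main(void) {
    volatile double x = -0.0;   /* volatile keeps the compiler from folding the addition */
    double y = x + 0.0;         /* (-0.0) + (+0.0): exact zero sum, rounds to +0.0 */
    printf("signbit(x) = %d\n", signbit(x) != 0);  /* expected: 1 */
    printf("signbit(y) = %d\n", signbit(y) != 0);  /* expected: 0 */
    return 0;
}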

The Case of Multiplication

Under the default rounding mode, no such problem occurs with x*1.0. If x:

  • is a (sub)normal number, x*1.0 == x always.
  • is +/- infinity, then the result is +/- infinity of the same sign.
  • is NaN, then according to

    IEEE 754 § 6.2.3 NaN Propagation


    An operation that propagates a NaN operand to its result and has a single NaN as an input should produce a NaN with the payload of the input NaN if representable in the destination format.

    which means that the exponent and mantissa (though not the sign) of NaN*1.0 are recommended to be unchanged from the input NaN. The sign is unspecified in accordance with §6.3p1 above, but an implementation may specify it to be identical to the source NaN.

  • is +/- 0.0, then the result is a 0 with its sign bit XORed with the sign bit of 1.0, in agreement with §6.3p2. Since the sign bit of 1.0 is 0, the output value is unchanged from the input. Thus, x*1.0 == x even when x is a (negative) zero (the check after this list illustrates this at the bit level).
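
A small C check of the bit-level identity for zeros, (sub)normals, and infinities (NaN is left out because its sign bit is unspecified by §6.3p1). This is a sketch assuming IEEE-754 binary64 doubles and a build without -ffast-math:

#include <math.h>
#include <stdio.h>
#include <string.h>

/* Bit-level comparison: returns 1 if a and b have identical representations. */
static int bits_equal(double a, double b) {
    return memcmp(&a, &b, sizeof a) == 0;
}

int main(void) {
    volatile double one = 1.0;   /* volatile keeps the compiler from folding the multiply */
    double samples[] = { +0.0, -0.0, 1.5, -2.25, 5e-324, INFINITY, -INFINITY };
    for (int i = 0; i < 7; ++i) {
        double x = samples[i];
        printf("%-6g : %s\n", x, bits_equal(x, x * one) ? "bit-identical" : "changed");
    }
    return 0;
}

Every line is expected to print "bit-identical".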

The Case of Subtraction

Under the default rounding mode, the subtraction x-0.0 is also a no-op, because it is equivalent to x + (-0.0). If x:

  • is NaN, then §6.3p1 and §6.2.3 apply in much the same way as for addition and multiplication.
  • is +/- infinity, then the result is +/- infinity of the same sign.
  • is a (sub)normal number, x-0.0 == x always.
  • is -0.0, then by §6.3p2 we have "[...] the sign of a sum, or of a difference x − y regarded as a sum x + (−y), differs from at most one of the addends’ signs;". This forces us to assign -0.0 as the result of (-0.0) + (-0.0), because -0.0 differs in sign from none of the addends, whereas +0.0 would differ in sign from both addends, in violation of this clause.
  • is +0.0, then this reduces to the addition case (+0.0) + (-0.0) considered above in The Case of Addition, which by §6.3p3 is ruled to give +0.0.

Since for all cases the input value is legal as the output, it is permissible to consider x-0.0 a no-op, and x == x-0.0 a tautology.
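
A small check of the two zero cases (again a sketch: default rounding mode, built without -ffast-math; the volatile only prevents constant folding of the subtraction):

#include <math.h>
#include <stdio.h>

int main(void) {
    volatile double zero = 0.0;
    double a = -0.0 - zero;   /* (-0.0) + (-0.0): keeps the sign, result is -0.0 */
    double b = +0.0 - zero;   /* (+0.0) + (-0.0): exact zero sum, result is +0.0 */
    printf("signbit(-0.0 - 0.0) = %d\n", signbit(a) != 0);  /* expected: 1 */
    printf("signbit(+0.0 - 0.0) = %d\n", signbit(b) != 0);  /* expected: 0 */
    return 0;
}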

Value-Changing Optimizations

The IEEE 754-2008 Standard has the following interesting quote:

IEEE 754 § 10.4 Literal meaning and value-changing optimizations


[...]

The following value-changing transformations, among others, preserve the literal meaning of the source code:

  • Applying the identity property 0 + x when x is not zero and is not a signaling NaN and the result has the same exponent as x.
  • Applying the identity property 1 × x when x is not a signaling NaN and the result has the same exponent as x.
  • Changing the payload or sign bit of a quiet NaN.
  • [...]

Since all NaNs and all infinities share the same exponent, and the correctly rounded result of x+0.0 and x*1.0 for finite x has exactly the same magnitude as x, the exponent of the result always equals the exponent of x, so the "same exponent" condition above is satisfied.
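
A tiny check of that exponent claim for a few finite values, using ilogb to read the exponent back (nothing standard-specific here; link with -lm on glibc and build without -ffast-math):

#include <math.h>
#include <stdio.h>

int main(void) {
    volatile double zero = 0.0, one = 1.0;    /* volatile: keep the arithmetic at run time */
    double samples[] = { 1.0, 0.75, -3.0e10, 5e-324 };
    for (int i = 0; i < 4; ++i) {
        double x = samples[i];
        printf("x=%g  ilogb(x)=%d  ilogb(x+0.0)=%d  ilogb(x*1.0)=%d\n",
               x, ilogb(x), ilogb(x + zero), ilogb(x * one));
    }
    return 0;
}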

sNaNs

Signaling NaNs are floating-point trap values; they are special NaN values whose use as a floating-point operand raises an invalid-operation exception (surfacing as SIGFPE when trapping is enabled). If a loop that triggers an exception were optimized out, the software would no longer behave the same.

However, as user2357112 points out in the comments, the C11 Standard explicitly leaves undefined the behaviour of signaling NaNs (sNaN), so the compiler is allowed to assume they do not occur, and thus that the exceptions that they raise also do not occur. The C++11 standard omits describing a behaviour for signaling NaNs, and thus also leaves it undefined.
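
As a rough illustration, one can craft an sNaN by its bit pattern and observe whether arithmetic on it raises the invalid-operation flag. This is an assumption-laden sketch: it presumes IEEE-754 binary64 layout, available <fenv.h> facilities (link with -lm on glibc), and a build without -ffast-math:

#include <fenv.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    /* binary64 sNaN: exponent all ones, quiet bit clear, non-zero payload */
    uint64_t bits = 0x7ff0000000000001ULL;
    double snan;
    memcpy(&snan, &bits, sizeof snan);

    volatile double one = 1.0;        /* volatile keeps the multiply at run time */
    feclearexcept(FE_ALL_EXCEPT);
    volatile double r = snan * one;   /* on typical hardware this quiets the NaN */
    (void)r;

    printf("FE_INVALID raised: %d\n", fetestexcept(FE_INVALID) != 0); /* usually 1 */
    return 0;
}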

Rounding Modes

In alternate rounding modes, the permissible optimizations may change. For instance, under Round-to-Negative-Infinity mode, the optimization x+0.0 -> x becomes permissible, but x-0.0 -> x becomes forbidden.

To prevent GCC from assuming default rounding modes and behaviours, the experimental flag -frounding-math can be passed to GCC.
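
A sketch of the rounding-mode dependence, assuming the C99 <fenv.h> facilities are available and the program is built with -frounding-math (and linked with -lm) so the compiler does not fold the additions under the default mode:

#include <fenv.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    volatile double x = -0.0;                  /* volatile discourages constant folding */

    fesetround(FE_DOWNWARD);                   /* roundTowardNegative */
    double a = x + 0.0;                        /* exact zero sum: -0.0, so x+0.0 -> x would hold */

    fesetround(FE_TONEAREST);                  /* default mode */
    double b = x + 0.0;                        /* exact zero sum: +0.0, so x+0.0 -> x does not hold */

    printf("downward: signbit=%d, default: signbit=%d\n",
           signbit(a) != 0, signbit(b) != 0);  /* expected: 1, 0 */
    return 0;
}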

Conclusion

Clang and GCC, even at -O3, remain IEEE-754 compliant. This means they must keep to the above rules of the IEEE-754 standard. Under those rules x+0.0 is not bit-identical to x for all x, but x*1.0 may be chosen to be so: namely, when we

  1. Obey the recommendation to pass unchanged the payload of x when it is a NaN.
  2. Leave the sign bit of a NaN result unchanged by * 1.0.
  3. Obey the rule that the sign bit of a product or quotient is the XOR of the operands’ sign bits, when x is not a NaN.

To enable the IEEE-754-unsafe optimization (x+0.0) -> x, the flag -ffast-math needs to be passed to Clang or GCC.
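
A quick way to observe the difference yourself is to compare the code emitted for the two identities (an illustration assuming an x86-64 target; the exact mnemonics may vary):

double add_zero(double x) { return x + 0.0; }  /* clang -O2: an addsd against +0.0 remains        */
double mul_one (double x) { return x * 1.0; }  /* clang -O2: folded away, the function returns x  */

With -ffast-math, both functions compile down to a plain return.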

Why is clang unwilling or unable to eliminate duplicate loads here

The answer seems to be it's an open LLVM issue: [TBAA] Emit distinct TBAA tags for pointers with different depths,types.

Jérôme's answer tipped me off that this might have something to do with Type Based Alias Analysis (TBAA) when I noticed all loads use the same TBAA metadata.

Right now clang only emits* the following TBAA:

; Descriptors
!15 = !{!"Simple C/C++ TBAA"}
!14 = !{!"omnipotent char", !15, i64 0}
!13 = !{!"any pointer", !14, i64 0}
!21 = !{!"int", !14, i64 0}
!20 = !{!"", !21, i64 0}
; Tags
!12 = !{!13, !13, i64 0}
!19 = !{!20, !21, i64 0}

Looking at the LLVM revision, I figured that eventually clang might be able to emit something along the lines of:

; Type descriptors
!0 = !{!"TBAA Root"}
!1 = !{!"omnipotent char", !0, i64 0}
!3 = !{!"int", !0, i64 0}
!2 = !{!"any pointer", !1, i64 0}
!11 = !{!"p1 foo", !2, i64 0} ; Foo*
!12 = !{!"p2 foo", !2, i64 0} ; Foo**
!13 = !{!"p3 foo", !2, i64 0} ; Foo***
!14 = !{!"p4 foo", !2, i64 0} ; Foo****
!10 = !{!"foo", !3, i64 0} ; struct {int x}

; Access tags
!20 = !{!14, !14, i64 0} ; Foo****
!21 = !{!13, !13, i64 0} ; Foo***
!22 = !{!12, !12, i64 0} ; Foo**
!23 = !{!11, !11, i64 0} ; Foo*
!24 = !{!10, !3, i64 0} ; Foo.x

(I'm still not sure I fully grok the TBAA metadata format so please excuse any mistakes)

Together with the code below, LLVM produces the expected assembly.

define void @original(ptr %0, ptr %1) {
%3 = load ptr, ptr %0, !tbaa !20
%4 = getelementptr ptr, ptr %3, i64 1
%5 = load ptr, ptr %4, !tbaa !21
%6 = getelementptr ptr, ptr %5, i64 2
%7 = load ptr, ptr %6, !tbaa !22
%8 = getelementptr ptr, ptr %7, i64 3
store ptr %1, ptr %8, !tbaa !23

%9 = load ptr, ptr %0, !tbaa !20
%10 = getelementptr ptr, ptr %9, i64 1
%11 = load ptr, ptr %10, !tbaa !21
%12 = getelementptr ptr, ptr %11, i64 2
%13 = load ptr, ptr %12, !tbaa !22
%14 = getelementptr ptr, ptr %13, i64 3
%15 = load ptr, ptr %14, !tbaa !23 ; Foo*
store i32 42, ptr %15, !tbaa !24

ret void
}
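
For reference, here is a C-level sketch of the kind of source this IR corresponds to; the struct name, field, and indices are my reconstruction from the access tags, so treat them as assumptions:

struct Foo { int x; };

void original(struct Foo *****p, struct Foo *v) {
    p[0][1][2][3] = v;       /* the first chain of loads, then the Foo* store        */
    p[0][1][2][3]->x = 42;   /* the whole chain is re-walked before the store to x   */
}

With per-depth pointer tags, the optimizer could prove the Foo* store cannot alias the Foo****/Foo***/Foo** loads, so the second walk of the chain could reuse the first.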

Compiler Explorer Playground

* Compiler Explorer's LLVM IR view filters these out by default, but you can see them by using -emit-llvm and disabling "Directives" filtering

Why does MSVS not optimize away +0?

The compiler cannot eliminate the addition of a floating-point positive zero because it is not an identity operation. By IEEE 754 rules, the result of adding +0. to −0. is not −0.; it is +0.

The compiler may eliminate the subtraction of +0. or the addition of −0. because those are identity operations.

For example, when I compile this:

double foo(double x) { return x + 0.; }

with Apple GNU C 4.2.1 using -O3 on an Intel Mac, the resulting assembly code contains addsd LC0(%rip), %xmm0. When I compile this:

double foo(double x) { return x - 0.; }

there is no add instruction; the assembly merely returns its input.

So, it is likely the code in the original question contained an add instruction for this statement:

y[i] = y[i] + 0;

but contained no instruction for this statement:

y[i] = y[i] - 0;

However, the first statement involved arithmetic with subnormal values in y[i], which was enough to slow the program down.


