Uses for Negative Zero Floating Point Value

Does float have a negative zero? (-0f)

According to the standard, negative zero exists, but it is equal to positive zero. For almost all purposes the two behave the same way, and many consider the existence of a negative zero to be an implementation detail. There are, however, some functions that behave quite differently, namely division and atan2:

#include <math.h>
#include <stdio.h>

int main() {
    double x = 0.0;
    double y = -0.0;
    printf("%.08f == %.08f: %d\n", x, y, x == y);
    printf("%.08f == %.08f: %d\n", 1 / x, 1 / y, 1 / x == 1 / y);
    printf("%.08f == %.08f: %d\n", atan2(x, y), atan2(y, y), atan2(x, y) == atan2(y, y));
}

The result from this code (here built with a Microsoft C runtime, which prints infinity as 1.#INF0000) is:

0.00000000 == -0.00000000: 1
1.#INF0000 == -1.#INF0000: 0
3.14159265 == -3.14159265: 0

This means such code can handle certain limits correctly without any explicit special-casing. Relying on this behavior for values close to the limits is not necessarily a good idea, since a simple calculation error can change the sign and make the value far from correct, but you can still take advantage of it if you avoid calculations that would change the sign.

In IEEE 754, why does adding negative zero result in a no-op but adding positive zero does not?

In the default round-to-nearest mode, which most high-level languages support exclusively (they provide no way to disable the floating-point optimizations that become invalid in other rounding modes; I assume Rust falls in this category), adding -0.0 happens to have no effect on any floating-point value, omitting small details about NaNs. Adding +0.0, on the other hand, does have an effect on -0.0: the result of -0.0 + (+0.0) is +0.0.
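
A minimal C sketch of this asymmetry, assuming IEEE 754 doubles and the default round-to-nearest mode:

#include <math.h>
#include <stdio.h>

int main(void) {
    double n = -0.0;
    printf("%g\n", n + -0.0);   /* -0: adding -0.0 changed nothing */
    printf("%g\n", n + 0.0);    /*  0: adding +0.0 flipped the sign */
    printf("%d %d\n", signbit(n + -0.0) != 0, signbit(n + 0.0) != 0);  /* 1 0 */
}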

What is negative zero and what is its use?

Each possible floating point value actually represents a small range of possible real-world numbers (because there are only finitely many possible floating point numbers but infinitely many actual values). So 0.0 represents a value anywhere between 0.0 and a very small positive number, whereas -0.0 represents a value anywhere between 0.0 and a very small negative value.

Note, however, that when we compare 0.0 and -0.0 they are considered to be equal, even though their actual representations in bits differ.
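
You can see both facts at once in C (a small sketch, assuming IEEE 754 doubles):

#include <stdio.h>
#include <string.h>

int main(void) {
    double p = 0.0, n = -0.0;
    printf("%d\n", p == n);                        /* 1: they compare equal */
    printf("%d\n", memcmp(&p, &n, sizeof p) == 0); /* 0: the bytes differ */
}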

Negative integer zero

Historically, there were integer formats that could represent both −0 and +0. Both sign-and-magnitude and one’s complement can represent −0 and +0. These proved to be less useful than two’s complement, which won favor and is ubiquitous today.

Two’s complement has some numerical properties that make it a little nicer to implement in hardware, and having two zeros caused some nuisance for programmers. (I heard of bugs such as an account balance being −0 instead of +0 resulting in a person being sent a bill when they should not have been.)

Floating-point uses sign-and-magnitude, so it can represent both −0 and +0. Due to the nature of floating-point, the arithmetic properties of two’s complement would not aid a floating-point implementation as much, and having two zeros allows a programmer to use a little extra information in some circumstances.

So the choices for integer and floating-point formats are motivated by utility, not mathematical necessity.

A Look At Integer Arithmetic

Let’s consider implementing some integer arithmetic in computer hardware using four bits for study. Essentially the first thing we would do is implement unsigned binary arithmetic, so we design some logic gates to make adders and other arithmetic units. So the inputs 0101 and 0011 to the adder produce output 1000.

Next, we want to handle negative numbers. In writing, we handle negative numbers by putting a sign in front, so our first thought might be to do the same thing with bits: use a bit in front to indicate negative. Now we have a sign-and-magnitude representation. 0001 represents +1, and 1001 represents −1. 0010 represents +2, and 1010 represents −2. 0111 represents +7, and 1111 represents −7. And, of course, 0000 represents +0, and 1000 represents −0.

That is an idea, and then we must implement it. We have already got an adder, and, if we feed it 0010 (2) and 0011 (3), it correctly outputs 0101 (5). But, if we feed it 0011 (3) and 1001 (−1), it outputs 1100 (−4). So we have to modify it. Well, that is not too bad; we have a subtraction unit for unsigned binary, so we can look at the first bit, and, if we are adding a negative number, we subtract instead of adding.

That works for some operations; for 0011 and 1001, observing the leading 1 on the second operand and feeding 011 and 001 to the subtraction unit would produce 010 (2), which is correct. But, if we have 0010 and 1011, feeding 010 and 011 to the subtraction unit might produce some error indication (it was originally designed for unsigned binary) or it might “wrap” and produce 111 (because such wrapping, along with a “borrow out” bit in the output, makes the subtraction unit work as part of a design for subtracting wider numbers). Either way, that is wrong for our signed numbers; we want the output of 0010 (2) plus 1011 (−3) to be 1001 (−1).

So we have to design new arithmetic units that handle this. Perhaps, when adding numbers of mixed signs, they figure out which one is larger in magnitude, subtract the smaller from the larger, and then apply the sign bit of the larger. In any case, we have a fair amount of work to do just to design the addition and subtraction units.
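
To make the amount of work concrete, here is that mixed-sign rule as a toy 4-bit sign-and-magnitude adder in C. This is an illustrative sketch, not any real hardware design; overflow of the 3-bit magnitude is simply ignored:

#include <stdio.h>

/* Toy 4-bit sign-and-magnitude add: bit 3 is the sign, bits 0-2 the magnitude. */
unsigned sm_add(unsigned a, unsigned b) {
    unsigned sa = a >> 3, sb = b >> 3;       /* sign bits */
    unsigned ma = a & 7,  mb = b & 7;        /* magnitudes */
    if (sa == sb)                            /* same sign: add magnitudes */
        return (sa << 3) | ((ma + mb) & 7);
    if (ma >= mb)                            /* mixed signs: subtract the smaller */
        return (sa << 3) | (ma - mb);        /* magnitude and keep the sign of the */
    return (sb << 3) | (mb - ma);            /* operand with the larger magnitude */
}

int main(void) {
    printf("%X\n", sm_add(0x3, 0x9));  /* 3 + (-1) -> 0x2 (+2) */
    printf("%X\n", sm_add(0x2, 0xB));  /* 2 + (-3) -> 0x9 (-1) */
    printf("%X\n", sm_add(0x9, 0x1));  /* (-1) + 1 -> 0x8: negative zero! */
}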

Another suggestion is: to make a number negative, invert every bit. This is called one’s complement. It is easy to understand and fits the notion of negation: just negate everything. Let’s consider how it affects our arithmetic units. For the combinations of +3 or −3 with +2 or −2, we would want these results: 0011 (3) + 0010 (2) = 0101 (5), 0011 (3) + 1101 (−2) = 0001 (1), 1100 (−3) + 0010 (2) = 1110 (−1), and 1100 (−3) + 1101 (−2) = 1010 (−5).

Upon examination, there is a simple way to adapt our binary adder to make this work: do the addition on all four bits as if they were unsigned binary, and, if there is a carry out of the leading bit, add it back to the low bit. In unsigned binary, 0011 + 0010 = 0101 with no carry, so the final output is 0101. 0011 + 1101 = 0000 with a carry, so the final result is 0001. 1100 + 0010 = 1110 with no carry, so the final result is 1110. 1100 + 1101 = 1001 with a carry, so the final result is 1010.
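
The end-around-carry trick is easy to express in code. A minimal sketch of the 4-bit one’s complement adder just described:

#include <stdio.h>

/* Toy 4-bit one's complement adder with end-around carry. */
unsigned oc_add(unsigned a, unsigned b) {
    unsigned s = a + b;        /* add all four bits as unsigned binary */
    if (s > 0xF)               /* carry out of the leading bit? */
        s = (s & 0xF) + 1;     /* add it back into the low bit */
    return s & 0xF;
}

int main(void) {
    printf("%X %X %X\n", oc_add(0x3, 0xD), oc_add(0xC, 0x2), oc_add(0xC, 0xD));
    /* prints 1 E A, matching the hand calculations above */
}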

This is nice; our one’s complement adder is simpler than the sign-and-magnitude adder. It does not need to compare magnitudes and does not need to do a subtraction to handle negative numbers. We can make it cheaper and make more profit.

Then somebody comes up with the idea of two’s complement. Instead of inverting every bit, we will conceptually subtract the number from 2^n, where n is the number of bits. So 10000 − 0001 = 1111 represents −1, and 1110 is −2, 1101 is −3, and so on. What does this do to our adder?

In unsigned binary, 0010 (2) + 1101 (13) = 1111 (15). In two’s complement, 0010 (2) + 1101 (−3) = 1111 (−1). The bits are the same! This actually works for all two’s complement numbers; adding the bit patterns for unsigned numbers produces the same results we want for adding two’s complement numbers. We can use the exact same logic gates for unsigned binary and two’s complement. That is brilliant, give that employee a raise. That is what modern hardware does; the same arithmetic units are used for adding or subtracting two’s complement numbers as are used for adding or subtracting unsigned numbers.
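
A quick C illustration at 8 bits instead of 4 (the principle is identical; this assumes the usual two’s complement int8_t):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t u = (uint8_t)(2 + 253);   /* unsigned:          2 + 253  = 255 */
    int8_t  s = (int8_t)(2 + -3);     /* two's complement:  2 + (-3) = -1  */
    /* Same bit pattern for both results: prints 0xFF 0xFF */
    printf("0x%02X 0x%02X\n", (unsigned)u, (unsigned)(uint8_t)s);
}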

This is a large part of why two’s complement won out for representing negative integers. It results in simpler, easier, cheaper, faster, and more efficient computers.

(There is a difference between unsigned addition and two’s complement addition: how overflow is detected. In unsigned addition, an overflow occurs if there is a carry out of the high bit. In two’s complement addition, an overflow occurs if the carry into the sign bit differs from the carry out of it. Adder units commonly handle this by reporting both indications, in one form or another. That information, if desired, is tested in later instructions; it does not affect the addition itself.)
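
Compilers expose this distinction too. For instance, GCC and Clang provide __builtin_add_overflow, which reports exactly the signed-overflow condition described above; a sketch assuming one of those compilers:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int8_t r;
    printf("%d\n", __builtin_add_overflow((int8_t)100, (int8_t)100, &r)); /* 1: 200 does not fit in int8_t */
    printf("%d\n", __builtin_add_overflow((int8_t)-1,  (int8_t)1,  &r)); /* 0: a carry out, but no overflow */
}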

Why do floating-point numbers have signed zeros?

-0 is (generally) treated as 0*. It can result when a negative floating-point number is so close to zero that it can be considered 0 (to be clear, I'm referring to arithmetic underflow, and the results of the following computations are interpreted as being exactly ±0, not just really small numbers). e.g.

System.out.println(-1 / Float.POSITIVE_INFINITY);

-0.0

If we consider the same case with a positive number, we will receive our good old 0:

System.out.println(1 / Float.POSITIVE_INFINITY);

0.0

* Here's a case where using -0.0 results in something different than when using 0.0:

System.out.println(1 / 0.0);
System.out.println(1 / -0.0);

Infinity
-Infinity

This makes sense if we consider the function 1 / x. As x approaches 0 from the positive side, we should get positive infinity, but as it approaches from the negative side, we should get negative infinity. The graph of the function should make this clear:

[Graph of y = 1/x: the curve falls toward −∞ as x approaches 0 from the left and rises toward +∞ as x approaches 0 from the right.]

In math terms:

lim(x→0⁺) 1/x = +∞

lim(x→0⁻) 1/x = −∞

This illustrates one significant difference between 0 and -0 in the computational sense.


Here are some relevant resources, some of which have been brought up already. I've included them for the sake of completeness:

  • Wikipedia article on signed zero
  • "What Every Computer Scientist Should Know About Floating-Point Arithmetic" (See Signed Zero section)
  • (PDF) "Much Ado About Nothing's Sign Bit" - an interesting paper by W. Kahan.

What is (+0)+(-0) by IEEE floating point standard?

The IEEE 754 rules of arithmetic for signed zeros state that +0.0 + -0.0 depends on the rounding mode. In the default rounding mode, it will be +0.0. When rounding towards -∞, it will be -0.0.

You can check this in C++ like so:

#include <iostream>

int main() {
    std::cout << "+0.0 + +0.0 == " << +0.0 + +0.0 << std::endl;
    std::cout << "+0.0 + -0.0 == " << +0.0 + -0.0 << std::endl;
    std::cout << "-0.0 + +0.0 == " << -0.0 + +0.0 << std::endl;
    std::cout << "-0.0 + -0.0 == " << -0.0 + -0.0 << std::endl;
    return 0;
}

Output:

+0.0 + +0.0 == 0
+0.0 + -0.0 == 0
-0.0 + +0.0 == 0
-0.0 + -0.0 == -0
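
The mode dependence itself can be observed from C with fesetround. A sketch, with the caveat that results can depend on compiler settings (strictly conforming code also needs #pragma STDC FENV_ACCESS ON, and GCC wants -frounding-math):

#include <fenv.h>
#include <stdio.h>

int main(void) {
    volatile double p = +0.0, n = -0.0;  /* volatile discourages constant folding */
    printf("%g\n", p + n);               /* default round-to-nearest: 0 */
    fesetround(FE_DOWNWARD);             /* switch to rounding toward -infinity */
    printf("%g\n", p + n);               /* now: -0 */
    return 0;
}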

How can I differentiate between zero and negative zero?

This is similar to a Codewars kata that I completed. The trick is to divide one by the number you want to check for negative zero, like this:

int isPositiveZero(double a) {
    return 1 / a == 1 / 0.0;
}

The division gives -infinity if a is negative zero, and +infinity if a is positive zero.

If you want to do it your way, try this:

(*((long *)&a) & 0x8000000000000000)

Note that this violates the strict aliasing rule, which causes undefined behavior (it also assumes long is 64 bits wide, which is not guaranteed). This means that your program could do anything from printing the right result to making your computer sprout wings and fly away.
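
A way to inspect the same bit without undefined behavior is to copy the representation into an integer with memcpy. A sketch, assuming IEEE 754 doubles (the helper name is just for illustration):

#include <stdint.h>
#include <string.h>

int isNegativeZero(double a) {
    uint64_t bits;
    memcpy(&bits, &a, sizeof bits);      /* well-defined, unlike the pointer cast */
    return bits == 0x8000000000000000u;  /* only the sign bit set */
}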

If you have math.h available and don't have an aversion to using macros from the standard C library, then this solution (using signbit) will also work (and be much more portable):

#include <math.h>

int isPositiveZero(double a) {
    return a == 0.0 && !signbit(a);
}
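
For example, a self-contained check (repeating the function so it compiles on its own):

#include <math.h>
#include <stdio.h>

int isPositiveZero(double a) {
    return a == 0.0 && !signbit(a);
}

int main(void) {
    printf("%d %d %d\n", isPositiveZero(0.0), isPositiveZero(-0.0), isPositiveZero(1.0));
    /* prints 1 0 0 */
}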

How can a primitive float value be -0.0? What does that mean?

Because Java uses the IEEE Standard for Floating-Point Arithmetic (IEEE 754), which defines -0.0 and when it should be used.

A zero is encoded with no 1 bits in the significand and the minimum (subnormal) exponent, and it is called positive or negative zero as determined by the sign bit. It actually represents the rounding to zero of numbers in the range between zero and the smallest representable non-zero number of the same sign, which is why it has a sign, and why its reciprocal, +Inf or -Inf, also has a sign.

You can get around your specific problem by adding 0.0, e.g.:

Double.toString(value + 0.0);

See: Java Floating-Point Number Intricacies

Operations Involving Negative Zero
...

(-0.0) + 0.0 -> 0.0


"-0.0" is produced when a floating-point operation results in a negative floating-point number so close to 0 that it cannot be represented normally.


