Does float have a negative zero? (-0f)
According to the standard, negative zero exists, but it compares equal to positive zero. For almost all purposes the two behave the same way, and many consider the existence of a negative zero to be an implementation detail. There are, however, some operations that behave quite differently, namely division and atan2:
#include <math.h>
#include <stdio.h>

int main() {
    double x = 0.0;
    double y = -0.0;
    printf("%.08f == %.08f: %d\n", x, y, x == y);
    printf("%.08f == %.08f: %d\n", 1 / x, 1 / y, 1 / x == 1 / y);
    printf("%.08f == %.08f: %d\n", atan2(x, y), atan2(y, y), atan2(x, y) == atan2(y, y));
}
The result from this code is:
0.00000000 == -0.00000000: 1
1.#INF0000 == -1.#INF0000: 0
3.14159265 == -3.14159265: 0
This means that code can correctly handle certain limits without explicit special-casing. Relying on this behavior for values close to the limits is not necessarily a good idea, since a simple calculation error can flip the sign and make the value far from correct, but you can still take advantage of it if you avoid calculations that would change the sign.
In IEEE 754, why does adding negative zero result in a no-op but adding positive zero does not?
In the default round-to-nearest mode (which most high-level languages support exclusively, since they provide no options to disable the floating-point optimizations that become invalid in other modes; I assume that Rust falls in this category), adding -0.0 happens to have no effect on any floating-point value (omitting small details about NaNs), whereas adding +0.0 does have an effect on -0.0: the result of -0.0 + (+0.0) is +0.0.
What is negative zero and what is its use?
Each possible floating point value actually represents a small range of possible real-world numbers (because there are only a finite number of possible floating point numbers but an infinite number of actual values). So +0.0 represents a value anywhere between zero and a very small positive number, whereas -0.0 represents a value anywhere between zero and a very small negative number.
Note, however, that when we compare +0.0 and -0.0 they are considered to be equal, even though the actual representation in bits is different.
Negative integer zero
Historically, there were integer formats that could represent both −0 and +0. Both sign-and-magnitude and one’s complement can represent −0 and +0. These proved to be less useful than two’s complement, which won favor and is ubiquitous today.
Two’s complement has some numerical properties that make it a little nicer to implement in hardware, and having two zeros caused some nuisance for programmers. (I heard of bugs such as an account balance being −0 instead of +0 resulting in a person being sent a bill when they should not have been.)
Floating-point uses sign-and-magnitude, so it can represent both −0 and +0. Due to the nature of floating-point, the arithmetic properties of two’s complement would not aid a floating-point implementation as much, and having two zeros allows a programmer to use a little extra information in some circumstances.
So the choices for integer and floatingpoint formats are motivated by utility, not mathematical necessity.
A Look At Integer Arithmetic
Let’s consider implementing some integer arithmetic in computer hardware using four bits for study. Essentially the first thing we would do is implement unsigned binary arithmetic, so we design some logic gates to make adders and other arithmetic units. So the inputs 0101 and 0011 to the adder produce output 1000.
Next, we want to handle negative numbers. In writing, we handle negative numbers by putting a sign in front, so our first thought might be to do the same thing with bits: use a bit in front to indicate negative. Now we have a sign-and-magnitude representation. 0001 represents +1, and 1001 represents −1. 0010 represents +2, and 1010 represents −2. 0111 represents +7, and 1111 represents −7. And, of course, 0000 represents +0, and 1000 represents −0.

That is an idea, and then we must implement it. We have already got an adder, and, if we feed it 0010 (2) and 0011 (3), it correctly outputs 0101 (5). But, if we feed it 0011 (3) and 1001 (−1), it outputs 1100 (−4). So we have to modify it. Well, that is not too bad: we have a subtraction unit for unsigned binary, so we can look at the first bit, and, if we are adding a negative number, we subtract instead of adding. That works for some operations; for 0011 and 1001, observing the leading 1 on the second operand and feeding 011 and 001 to the subtraction unit would produce 010 (2), which is correct.

But, if we have 0010 and 1011, feeding 010 and 011 to the subtraction unit might produce some error indication (it was originally designed for unsigned binary) or it might “wrap” and produce 111 (because such wrapping, along with a “borrow out” bit in the output, makes the subtraction unit work as part of a design for subtracting wider numbers). Either way, that is wrong for our signed numbers; we want the output of 0010 (2) plus 1011 (−3) to be 1001 (−1). So we have to design new arithmetic units that handle this. Perhaps, when adding numbers of mixed signs, they figure out which one is larger in magnitude, subtract the smaller from the larger, and then apply the sign bit of the larger. In any case, we have a fair amount of work to do just to design the addition and subtraction units.
Another suggestion is, to make a number negative, invert every bit. This is called one’s complement. It is easy to understand and fits the notion of negation—just negate everything. Let’s consider how it affects our arithmetic units. For the combinations of +3 or −3 with +2 or −2, we would want these results: 0011 (3) + 0010 (2) = 0101 (5), 0011 (3) + 1101 (−2) = 0001 (1), 1100 (−3) + 0010 (2) = 1110 (−1), and 1100 (−3) + 1101 (−2) = 1010 (−5). Upon examination, there is a simple way to adapt our binary adder to make this work: Do the addition on all four bits as if they were unsigned binary, and, if there is a carry out of the leading bit, add it back to the low bit. In unsigned binary 0011 + 0010 = 0101 with no carry, so the final output is 0101. 0011 + 1101 = 0000 with a carry, so the final result is 0001. 1100 + 0010 = 1110 with no carry, so the final result is 1110. 1100 + 1101 = 1001 with a carry, so the final result is 1010.
This is nice; our one’s complement adder is simpler than the signandmagnitude adder. It does not need to compare magnitudes and does not need to do a subtraction to handle negative numbers. We can make it cheaper and make more profit.
Then somebody comes up with the idea of two’s complement. Instead of inverting every bit, we will conceptually subtract the number from 2^n, where n is the number of bits. So 10000 − 0001 = 1111 represents −1, and 1110 is −2, 1101 is −3, and so on. What does this do to our adder?
In unsigned binary, 0010 (2) + 1101 (13) = 1111 (15). In two’s complement, 0010 (2) + 1101 (−3) = 1111 (−1). The bits are the same! This actually works for all two’s complement numbers; adding the bit patterns for unsigned numbers produces the same results we want for adding two’s complement numbers. We can use the exact same logic gates for unsigned binary and two’s complement. That is brilliant, give that employee a raise. That is what modern hardware does; the same arithmetic units are used for adding or subtracting two’s complement numbers as are used for adding or subtracting unsigned numbers.
This is a large part of why two’s complement won out for representing negative integers. It results in simpler, easier, cheaper, faster, and more efficient computers.
(There is a difference between unsigned addition and two’s complement addition: How overflow is detected. In unsigned addition, an overflow occurs if there is a carry out of the high bit. In two’s complement addition, an overflow occurs if there is a carry out of the highest of the magnitude bits, hence a carry into the sign. Adder units commonly handle this by reporting both indications, in one form or another. That information, if desired, is tested in later instructions; it does not affect the addition itself.)
Why do floating-point numbers have signed zeros?
-0.0 is (generally) treated as 0.0*. It can result when a negative floating-point number is so close to zero that it can be considered 0 (to be clear, I'm referring to arithmetic underflow, and the results of the following computations are interpreted as being exactly ±0, not just really small numbers), e.g.
System.out.println(1 / Float.NEGATIVE_INFINITY);
-0.0
If we consider the same case with a positive number, we will receive our good old 0:
System.out.println(1 / Float.POSITIVE_INFINITY);
0.0
* Here's a case where using -0.0 results in something different than when using 0.0:
System.out.println(1 / 0.0);
System.out.println(1 / -0.0);
Infinity
-Infinity
This makes sense if we consider the function 1 / x. As x approaches 0 from the positive side, we should get positive infinity, but as it approaches 0 from the negative side, we should get negative infinity.
In math terms: lim(x→0+) 1/x = +∞, while lim(x→0−) 1/x = −∞.
This illustrates one significant difference between 0 and −0 in the computational sense.
Here are some relevant resources, some of which have been brought up already. I've included them for the sake of completeness:
- Wikipedia article on signed zero
- "What Every Computer Scientist Should Know About Floating-Point Arithmetic" (see the Signed Zero section)
- "Much Ado About Nothing's Sign Bit" (PDF), an interesting paper by W. Kahan
What is (+0)+(−0) by the IEEE floating-point standard?
The IEEE 754 rules of arithmetic for signed zeros state that +0.0 + (-0.0) depends on the rounding mode. In the default rounding mode, it will be +0.0. When rounding towards −∞, it will be -0.0.
You can check this in C++ like so:
#include <iostream>
int main() {
    std::cout << "+0.0 + +0.0 == " << +0.0 + +0.0 << std::endl;
    std::cout << "+0.0 + -0.0 == " << +0.0 + -0.0 << std::endl;
    std::cout << "-0.0 + +0.0 == " << -0.0 + +0.0 << std::endl;
    std::cout << "-0.0 + -0.0 == " << -0.0 + -0.0 << std::endl;
    return 0;
}
Output:
+0.0 + +0.0 == 0
+0.0 + -0.0 == 0
-0.0 + +0.0 == 0
-0.0 + -0.0 == -0
How can I differentiate between zero and negative zero?
This is similar to a Codewars kata that I completed. The trick is to divide by the number whose sign you want to check, like this:
int isPositiveZero(double a) {
return 1/a == 1/0.0;
}
The division gives -infinity if a is negative zero, and +infinity if a is positive zero.
If you want to do it your way, try this:
(*((long *)&a) & 0x8000000000000000)
Note that this violates the strict aliasing rule, which causes undefined behavior. This means that your program could do anything from printing the right result to making your computer sprout wings and fly away.
If you have math.h available and don't have an aversion to using macros from the standard C library, then this solution (using signbit) will also work (and be much more portable):
#include <math.h>
int isPositiveZero(double a) {
return a == 0.0 && !signbit(a);
}
How can a primitive float value be -0.0? What does that mean?
Because Java uses the IEEE Standard for Floating-Point Arithmetic (IEEE 754), which defines -0.0 and when it should be used.
The smallest number representable has no 1 bit in the subnormal significand and is called the positive or negative zero as determined by the sign. It actually represents a rounding to zero of numbers in the range between zero and the smallest representable nonzero number of the same sign, which is why it has a sign, and why its reciprocal +Inf or Inf also has a sign.
You can get around your specific problem by adding 0.0, e.g.
Double.toString(value + 0.0);
See: Java Floating-Point Number Intricacies
Operations Involving Negative Zero
...
(-0.0) + 0.0 → 0.0
...
"-0.0" is produced when a floating-point operation results in a negative floating-point number so close to 0 that it cannot be represented normally.