Why are these numbers not equal?
General (language agnostic) reason
Since not all numbers can be represented exactly in IEEE floating point arithmetic (the standard that almost all computers use to represent decimal numbers and do math with them), you will not always get what you expected. This is especially true because some values which are simple, finite decimals (such as 0.1 and 0.05) are not represented exactly in the computer and so the results of arithmetic on them may not give a result that is identical to a direct representation of the "known" answer.
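A minimal demonstration (shown in Python here only because the point is language-agnostic; R and virtually every other language using IEEE 754 doubles behave the same way):

```python
from decimal import Decimal

# Neither 0.1 nor 0.05 has an exact binary representation, so their
# sum is not the same double as the one nearest to 0.15.
print(0.1 + 0.05 == 0.15)   # False
print(Decimal(0.1 + 0.05))  # the value actually stored for 0.1 + 0.05
print(Decimal(0.15))        # the value actually stored for 0.15
```

Printing through `Decimal` exposes the exact binary value held in the double, which the default display rounds away.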
This is a well known limitation of computer arithmetic and is discussed in several places:
- The R FAQ has a question devoted to it: R FAQ 7.31
- The R Inferno by Patrick Burns devotes the first "Circle" to this problem (starting on page 9)
- David Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic," ACM Computing Surveys 23, 1 (March 1991), 5-48, doi:10.1145/103162.103163 (a revised edition is also available)
- The Floating-Point Guide - What Every Programmer Should Know About Floating-Point Arithmetic
- 0.30000000000000004.com compares floating point arithmetic across programming languages
- Several Stack Overflow questions including
- Why are floating point numbers inaccurate?
- Why can't decimal numbers be represented exactly in binary?
- Is floating point math broken?
- Canonical duplicate for "floating point is inaccurate" (a meta discussion about a canonical answer for this issue)
Comparing scalars
The standard solution to this in R is not to use ==, but rather the all.equal function. Or rather, since all.equal gives lots of detail about the differences if there are any, isTRUE(all.equal(...)).
if(isTRUE(all.equal(i,0.15))) cat("i equals 0.15") else cat("i does not equal 0.15")
yields
i equals 0.15
Some more examples of using all.equal instead of == (the last example is supposed to show that this will correctly show differences):
0.1+0.05==0.15
#[1] FALSE
isTRUE(all.equal(0.1+0.05, 0.15))
#[1] TRUE
1-0.1-0.1-0.1==0.7
#[1] FALSE
isTRUE(all.equal(1-0.1-0.1-0.1, 0.7))
#[1] TRUE
0.3/0.1 == 3
#[1] FALSE
isTRUE(all.equal(0.3/0.1, 3))
#[1] TRUE
0.1+0.1==0.15
#[1] FALSE
isTRUE(all.equal(0.1+0.1, 0.15))
#[1] FALSE
Some more detail, directly copied from an answer to a similar question:
The problem you have encountered is that floating point cannot represent decimal fractions exactly in most cases, which means you will frequently find that exact matches fail.
R lies slightly when you say:
1.1-0.2
#[1] 0.9
0.9
#[1] 0.9
You can find out what it really thinks in decimal:
sprintf("%.54f",1.1-0.2)
#[1] "0.900000000000000133226762955018784850835800170898437500"
sprintf("%.54f",0.9)
#[1] "0.900000000000000022204460492503130808472633361816406250"
You can see these numbers are different, but the representation is a bit unwieldy. If we look at them in binary (well, hex, which is equivalent) we get a clearer picture:
sprintf("%a",0.9)
#[1] "0x1.ccccccccccccdp-1"
sprintf("%a",1.1-0.2)
#[1] "0x1.ccccccccccccep-1"
sprintf("%a",1.1-0.2-0.9)
#[1] "0x1p-53"
You can see that they differ by 2^-53, which is important because this number is the smallest representable difference between two numbers whose value is close to 1, as these are.
We can find out for any given computer what this smallest representable number is by looking at R's .Machine variable:
?.Machine
#....
#double.eps the smallest positive floating-point number x
#such that 1 + x != 1. It equals base^ulp.digits if either
#base is 2 or rounding is 0; otherwise, it is
#(base^ulp.digits) / 2. Normally 2.220446e-16.
#....
.Machine$double.eps
#[1] 2.220446e-16
sprintf("%a",.Machine$double.eps)
#[1] "0x1p-52"
You can use this fact to create a 'nearly equals' function which checks that the difference is close to the smallest representable number in floating point. In fact this already exists: all.equal.
?all.equal
#....
#all.equal(x,y) is a utility to compare R objects x and y testing ‘near equality’.
#....
#all.equal(target, current,
# tolerance = .Machine$double.eps ^ 0.5,
# scale = NULL, check.attributes = TRUE, ...)
#....
So the all.equal function is actually checking that the difference between the numbers is within the square root of the smallest representable difference between two mantissas.
This algorithm goes a bit funny near extremely small numbers called denormals, but you don't need to worry about that.
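The same relative-tolerance idea exists in other languages too. As a hedged sketch (names here are my own, not from any library), Python's math.isclose can mimic all.equal's default by using the square root of machine epsilon as the relative tolerance:

```python
import math

# sqrt of machine epsilon (~1.5e-8), matching all.equal's default tolerance
TOL = math.sqrt(2.0 ** -52)

def nearly_equal(x, y, rel_tol=TOL):
    """Relative-tolerance comparison, analogous to isTRUE(all.equal(x, y))."""
    return math.isclose(x, y, rel_tol=rel_tol)

print(0.1 + 0.05 == 0.15)              # False: exact comparison fails
print(nearly_equal(0.1 + 0.05, 0.15))  # True
print(nearly_equal(0.1 + 0.1, 0.15))   # False: genuinely different values
```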
Comparing vectors
The above discussion assumed a comparison of two single values. In R, there are no scalars, just vectors, and implicit vectorization is a strength of the language. For comparing vectors element-wise, the previous principles hold, but the implementation is slightly different. == is vectorized (does an element-wise comparison) while all.equal compares the whole vectors as a single entity.
Using the previous examples
a <- c(0.1+0.05, 1-0.1-0.1-0.1, 0.3/0.1, 0.1+0.1)
b <- c(0.15, 0.7, 3, 0.15)
== does not give the "expected" result and all.equal does not perform an element-wise comparison:
a==b
#[1] FALSE FALSE FALSE FALSE
all.equal(a,b)
#[1] "Mean relative difference: 0.01234568"
isTRUE(all.equal(a,b))
#[1] FALSE
Rather, a version which loops over the two vectors must be used
mapply(function(x, y) {isTRUE(all.equal(x, y))}, a, b)
#[1] TRUE TRUE TRUE FALSE
If a functional version of this is desired, it can be written
elementwise.all.equal <- Vectorize(function(x, y) {isTRUE(all.equal(x, y))})
which can be called as just
elementwise.all.equal(a, b)
#[1] TRUE TRUE TRUE FALSE
Alternatively, instead of wrapping all.equal in even more function calls, you can just replicate the relevant internals of all.equal.numeric and use implicit vectorization:
tolerance = .Machine$double.eps^0.5
# this is the default tolerance used in all.equal,
# but you can pick a different tolerance to match your needs
abs(a - b) < tolerance
#[1] TRUE TRUE TRUE FALSE
This is the approach taken by dplyr::near, which documents itself as:
This is a safe way of comparing if two vectors of floating point numbers are (pairwise) equal. This is safer than using ==, because it has a built in tolerance
dplyr::near(a, b)
#[1] TRUE TRUE TRUE FALSE
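The abs(a - b) < tolerance idea translates directly to other languages; here is a Python sketch of the same element-wise comparison:

```python
import math

a = [0.1 + 0.05, 1 - 0.1 - 0.1 - 0.1, 0.3 / 0.1, 0.1 + 0.1]
b = [0.15, 0.7, 3.0, 0.15]

tolerance = math.sqrt(2.0 ** -52)  # sqrt of machine epsilon, as in all.equal
print([abs(x - y) < tolerance for x, y in zip(a, b)])
# [True, True, True, False]
```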
Testing for occurrence of a value within a vector
The standard R function %in% can also suffer from the same issue if applied to floating point values. For example:
x = seq(0.85, 0.95, 0.01)
# [1] 0.85 0.86 0.87 0.88 0.89 0.90 0.91 0.92 0.93 0.94 0.95
0.92 %in% x
# [1] FALSE
We can define a new infix operator to allow for a tolerance in the comparison as follows:
`%.in%` = function(a, b, eps = sqrt(.Machine$double.eps)) {
  any(abs(b - a) <= eps)
}
0.92 %.in% x
# [1] TRUE
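The same tolerance-based membership test is easy to sketch in other languages; here is a hypothetical Python helper mirroring %.in% (the name approx_in is my own):

```python
import math

def approx_in(value, values, eps=math.sqrt(2.0 ** -52)):
    """Membership test with a tolerance, analogous to the %.in% operator."""
    return any(abs(v - value) <= eps for v in values)

x = [0.85 + 0.01 * i for i in range(11)]  # like seq(0.85, 0.95, 0.01)
print(0.92 in x)           # False: exact membership fails here too
print(approx_in(0.92, x))  # True
```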
Why are these 2 big.Float values not equal?
value1 and value3 are pointers, so value1 == value3 compares those pointers, not the pointed values. It may be possible that 2 pointed objects are equal yet their addresses are not.
To compare big.Float values (or *big.Float), use the Float.Cmp() method. It returns 0 if the 2 values (the numbers they represent) are equal.
if value1.Cmp(value3) == 0 {
    fmt.Println("values are equal", value1, value3)
} else {
    fmt.Println("values are not equal", value1, value3)
}
With this change output will be (try it on the Go Playground):
values are equal 1.3719531185501585e+11 1.3719531185501585e+11
The difference between the two serialized (gob-encoded) forms is in the second byte:
[1 2 0 0 0 53 0 0 0 37 255 139 210 151 120 32 120 0]
[1 10 0 0 0 53 0 0 0 37 255 139 210 151 120 32 120 0]
So the represented numbers are equal.
The serialized binary form returned by Float.GobEncode() is not the same, but that doesn't mean the represented numbers are not equal. As its documentation states:
GobEncode implements the gob.GobEncoder interface. The Float value and all its attributes (precision, rounding mode, accuracy) are marshaled.
The output is different because the internals of big.Float are not the same (in this case the Accuracy). Even if you could compare the pointed objects, those would not be the same, but the represented numbers are. Again, always use the provided methods to compare complex objects, and certainly not the addresses.
The difference in this example comes from the stored accuracy field:
fmt.Println(value1.Acc())
fmt.Println(value3.Acc())
Which outputs (try it on the Go Playground):
Below
Exact
The accuracy returned by Float.Acc() is the "accuracy of x produced by the most recent operation". Since the last operations performed on value1 and value3 are not the same (value1.Sub() and value3.Set()), the accuracy field is not necessarily the same (and in this example they do differ). And since the accuracy property is also included in the gob-serialized form, that's why their serialized forms are different.
Equal numbers shown as false when compared in R
Because they aren't exactly the same number. They differ by a very small amount due to the computer's representation of numbers, also known as floating point errors:
> a - b
[1] -8.881784e-16
Jon Skeet has an excellent blog post on this issue, which pops up on Stack Overflow with some regularity.
As @mrdwab suggests in the comments, you should use all.equal(a, b) to test for near equality.
Ceiling function apparently rounds incorrectly
@rawr links to Why are these numbers not equal?, which explains the fundamental issues with floating-point computation. In this case:
print((1-0.995) - 0.005)
## [1] 4.336809e-18
Because of this, (1-0.995)*5e4 is slightly greater than 250 (to see this you have to use print((1-0.995)*5e4, digits=22), because R prints a rounded representation by default), so ceiling() pushes the answer up to 251.
In this particular case it looks like you can get the desired answer by rounding (1-p) to three decimal places, ceiling(N*round(1-p, 3)), but you should definitely read the linked answer and think about whether this solution will be robust for all of your needs.
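The arithmetic above can be reproduced outside R as well; a Python sketch with p = 0.995 and N = 5e4 as in the question:

```python
import math

p, N = 0.995, 5e4
print((1 - p) * N > 250)               # True: the product is slightly above 250
print(math.ceil((1 - p) * N))          # 251
print(math.ceil(N * round(1 - p, 3)))  # 250 once (1 - p) is rounded to 3 decimals
```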