Numeric comparison difficulty in R
I've never been a fan of all.equal for such things. It seems to me the tolerance sometimes works in mysterious ways. Why not just check whether the difference is at least 0.05 minus a small tolerance?
tol = 1e-5
(a-b) >= (0.05-tol)
In general, without rounding and with just conventional logic, I find a plain comparison better than all.equal. If x == y then x-y == 0. Perhaps x-y is not exactly 0, so for such cases I use
abs(x-y) <= tol
all.equal works with a tolerance anyway (its default is roughly 1.5e-8), and this is more compact and straightforward than all.equal.
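As a minimal sketch of the absolute-tolerance check described above: the helper name approx_equal and its default tol are my own assumptions, not something from base R, and an absolute tolerance like this only makes sense when you know the scale of your numbers.

```r
# Hypothetical helper (not part of base R): absolute-tolerance comparison.
# The default tol is an arbitrary choice for illustration.
approx_equal <- function(x, y, tol = 1e-8) {
  abs(x - y) <= tol
}

a <- 0.1 + 0.2
b <- 0.3

exact  <- a == b                # FALSE: the two doubles differ in the last bit
approx <- approx_equal(a, b)    # TRUE: the difference is far below tol

# The check is vectorised, so it works element-wise on vectors too.
vec <- approx_equal(c(1, 2, 3), c(1, 2, 3.1))   # TRUE TRUE FALSE
```

Because tol is absolute, the same value that is sensible for numbers near 1 can be far too strict or far too loose for very large or very small numbers.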
Using R: How to compare two numbers with floating-point issues related to the options(digits=n) and how the numbers were introduced?
all.equal might be useful here:
n1 <- 0.14999999999999999
n2 <- .15
n3 <- 0.15000000000000002
all.equal(n1,n2)
# [1] TRUE
all.equal(n1,n3)
# [1] TRUE
You can manually specify a tolerance if you like, e.g.,
all.equal(n1, n3, tolerance = 1.5e-16)
# [1] "Mean relative difference: 1.850372e-16"
Finally, as the help page for all.equal says, if you need a single logical value returned, wrap it in isTRUE(all.equal(...)), or use identical if an exact comparison is appropriate.
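To illustrate why the wrapper matters, here is a short sketch reusing n1 and n3 from the example above. The variable names res, same and exact are just for illustration.

```r
n1 <- 0.14999999999999999
n3 <- 0.15000000000000002

# When the values differ beyond the tolerance, all.equal() returns a
# character message rather than FALSE, so it is unsafe directly inside if().
res <- all.equal(n1, n3, tolerance = 1.5e-16)
is.character(res)   # TRUE

# isTRUE() maps anything that is not a single TRUE to FALSE.
same <- isTRUE(all.equal(n1, n3, tolerance = 1.5e-16))
if (same) {
  message("equal within tolerance")
} else {
  message("different at this tolerance")
}

# identical() is the exact comparison; these two doubles are not the same.
exact <- identical(n1, n3)   # FALSE
```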
Less or equal for floats in R
Interesting question. I am sure there are better ways, but this simple function takes two vectors of doubles and returns whether they are nearly equal element-wise (mode = "ae"), given the specified tolerance. It can also return whether they are less than (mode = "lt"), or nearly equal or less than (mode = "ne.lt"), along with the "gt" equivalents ("gt" and "ne.gt").
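near_equal <- function( x , y , tol = 1.5e-8 , mode = "ae" ){
  # Element-wise near-equality; all.equal() compares relative differences by default
  ae <- mapply( function(x,y) isTRUE( all.equal( x , y , tolerance = tol ) ) , x , y )
  gt <- x > y
  lt <- x < y
  if( mode == "ae" )
    return( ae )
  if( mode == "gt" )
    return( gt )
  if( mode == "lt" )
    return( lt )
  if( mode == "ne.gt" )
    return( ae | gt )
  if( mode == "ne.lt" )
    return( ae | lt )
  # Fail loudly on an unrecognised mode instead of silently returning NULL
  stop( "unknown mode: " , mode )
}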
# And in action....
set.seed(1)
x <- 1:5
# [1] 1 2 3 4 5
y <- 1:5 + rnorm(5,sd=0.1)
# [1] 0.9373546 2.0183643 2.9164371 4.1595281 5.0329508
near_equal( x , y , tol = 0.05 , mode = "ae" )
#[1] FALSE TRUE TRUE TRUE TRUE
near_equal( x , y , tol = 0.05 , mode = "ne.gt" )
#[1] TRUE TRUE TRUE TRUE TRUE
Hope that helps.
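If the mapply call turns out to be slow on long vectors, a fully vectorised relative comparison is one possible alternative. This is only a sketch: near_equal_rel is a name I made up, and unlike all.equal it never falls back to an absolute comparison when the target is close to zero.

```r
# Hypothetical vectorised variant (not from the answer above).
# Treats x as the target and compares the difference relative to abs(x).
near_equal_rel <- function(x, y, tol = 1.5e-8) {
  abs(x - y) <= tol * abs(x)
}

x <- 1:5
y <- c(0.9373546, 2.0183643, 2.9164371, 4.1595281, 5.0329508)

res <- near_equal_rel(x, y, tol = 0.05)
res
# [1] FALSE  TRUE  TRUE  TRUE  TRUE
```

For these inputs it agrees with the mode = "ae" result shown above, but note that with x equal to 0 only an exact match passes.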
First circle of R hell. 0.1 != 0.3/3
See these questions:
- In R, what is the difference between these two?
- Numeric comparison difficulty in R
Generally speaking, you can deal with this by including a tolerance level as per the second link above.
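A quick sketch of the case in the title, assuming the default tolerance of all.equal is acceptable for your data:

```r
x <- 0.1
y <- 0.3 / 3

exact <- x == y                 # FALSE: the two doubles are not identical

# print() rounds to 7 significant digits by default, which hides the gap;
# sprintf() with more digits shows it.
digits <- sprintf("%.20f", c(x, y))

# Comparing with a tolerance treats them as equal.
close_enough <- isTRUE(all.equal(x, y))   # TRUE
```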
Rounding issue in all.equal
This has to do with floating point accuracy. The manual isn't entirely clear at first glance, but in your example the mean absolute difference of 2-1.981 is 0.019, which is > 0.01, the tolerance. scale is also NULL. Therefore the comparison is made on the relative difference rather than the absolute difference. Eh?!
Using tolerance implies that you care about the magnitude of the numbers involved. Relative difference measures not how big the difference is in absolute terms, but how big it is relative to the numbers being compared. Given the example in the link, the difference between 5 and 6 is more significant (I use the term loosely) than that between 1,000,000,000 and 1,000,000,001.
So if the relative difference between the two numbers is less than tolerance, the numbers are considered equal. For two single numbers (as in this example) the relative difference is given by:
abs( target - current ) / abs( target )
Which is
( 2 - 1.981 ) / 2 = 0.0095
The tolerance you specified is 0.01, so the numbers are considered equal because the relative difference is less than this. The difference between these numbers ± the relative difference also just happens to be exactly .Machine$double.eps, the machine epsilon (the smallest positive x for which 1 + x != 1)!
identical( abs( ( 2 - 0.0095 ) - ( 1.981 + 0.0095 ) ) , .Machine$double.eps )
# [1] TRUE
Now try:
all.equal( 2 , 1.981 , 0.00949999999999 )
# [1] "Mean relative difference: 0.0095"
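The arithmetic above can be checked by hand. This is only a sketch of the single-number case; rel_diff is my own variable name, and all.equal does more than this for vectors and for targets near zero.

```r
target  <- 2
current <- 1.981

# Relative difference for two single numbers, scaled by the target.
rel_diff <- abs(target - current) / abs(target)
rel_diff   # about 0.0095, with floating point noise in the last digits

# With tolerance 0.01 the relative difference is within bounds...
ok_loose <- rel_diff <= 0.01
ae_loose <- isTRUE(all.equal(target, current, tolerance = 0.01))

# ...but a tolerance just below 0.0095 is exceeded.
ok_tight <- rel_diff <= 0.00949999999999
ae_tight <- isTRUE(all.equal(target, current, tolerance = 0.00949999999999))
```

In both cases the manual check and all.equal agree: equal at 0.01, not equal at 0.00949999999999.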