Why does "one" < 2 evaluate to FALSE in R?
From help("<"):
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
So in this case, the numeric is of lower precedence than the character, so 2 is coerced to the character "2". Comparison of strings in character vectors is lexicographic which, as I understand it, is alphabetic but locale-dependent.
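A quick illustration of this coercion (the exact ordering is locale-dependent, but digits sort before letters in essentially all collations):

```r
# The numeric 2 is coerced to the character "2" before comparing,
# so the comparison is lexicographic, not numeric.
"one" < 2    # same as "one" < "2"; "o" sorts after "2", so FALSE
2 < "one"    # same as "2" < "one", so TRUE
```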
Why does as.numeric(1) == (3 | 4) evaluate to TRUE?
1 == 2 | 4
Operator precedence tells us it is equivalent to (1 == 2) | 4. Now 1 == 2 is FALSE, and 4 is coerced to logical (because | is a logical operator): as.logical(4) is TRUE, so you have FALSE | TRUE, which is TRUE.
Indeed, the coercion rules for logical operators (?Logic) tell us that:
Numeric and complex vectors will be coerced to logical values, with
zero being false and all non-zero values being true.
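The walkthrough above can be checked step by step at the console:

```r
# Step-by-step evaluation of 1 == 2 | 4:
(1 == 2) | 4     # precedence: == binds tighter than |, so this is the full expression
as.logical(4)    # non-zero numeric -> TRUE, per the rule quoted above
FALSE | TRUE     # the final comparison, which is TRUE
```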
3 == 2 | 4
Same thing
1 == (2 | 4)
2 | 4 is coerced to TRUE | TRUE, which is TRUE. Then 1 == TRUE is coerced to 1 == 1, which is TRUE.
Indeed, the coercion rules for comparison operators (?Comparison) tell us that:
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
as.numeric(1) == (2 | 4)
Same thing
1L == (2 | 4)
Same again
Note that "1 is equal to 2 or 4" in plain English is actually (1 is equal to 2) or (1 is equal to 4), which is:
(1 == 2) | (1 == 4)
which is FALSE | FALSE, which is FALSE.
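Written out in R, the intended test "is 1 equal to 2 or to 4?" looks like this:

```r
# Each comparison must be spelled out explicitly:
(1 == 2) | (1 == 4)   # FALSE | FALSE -> FALSE
1 %in% c(2, 4)        # a more idiomatic spelling of the same test
```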
How come as.character(1) == as.numeric(1) is TRUE?
According to ?`==`:
For numerical and complex values, remember == and != do not allow for the finite representation of fractions, nor for rounding error. Using all.equal with identical is almost always preferable.
In another paragraph, it is also written:
x, y
atomic vectors, symbols, calls, or other objects for which methods have been written. If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
identical(as.character(1), as.numeric(1))
#[1] FALSE
Why do I get TRUE when checking if a character value is greater than a number?
The hierarchy for coercion is: logical < integer < numeric < character. So in both cases, the numeric is coerced to character. Characters get "sorted" position by position in ASCII order. So "9"
is greater than "2"
but "10"
is less than "2"
because "1"
is less than "2"
.
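A few examples of how position-by-position character comparison defies numeric intuition:

```r
# Comparisons happen character by character, left to right:
"9" > "2"      # TRUE: "9" sorts after "2"
"10" < "2"     # TRUE: the first characters decide, and "1" sorts before "2"
"100" < "2"    # TRUE for the same reason, despite 100 > 2 numerically
```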
Why is the expression 1 == "1" evaluating to TRUE?
From help("=="):
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
So 1 should be converted to "1".
R: why is identical(c(1:3), c(1, 2, 3)) false?
R> class(1:3)
[1] "integer"
R> class(c(1,2,3))
[1] "numeric"
R>
In a nutshell, the sequence operator : returns integer values "because that is what folks really want". Hence:
R> identical(1:3, c(1L,2L,3L))
[1] TRUE
R> identical(1*(1:3), c(1,2,3))
[1] TRUE
R>
Why does TRUE == "TRUE" evaluate to TRUE in R?
According to the help file ?`==`:
If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
So TRUE is coerced to "TRUE" (i.e. as.character(TRUE)), hence the equality.
The equivalent of an operator === in some other languages (i.e. are the two objects equal and of the same type) would be the function identical:
identical(TRUE, "TRUE")
[1] FALSE
Why are these numbers not equal?
General (language agnostic) reason
Since not all numbers can be represented exactly in IEEE floating point arithmetic (the standard that almost all computers use to represent decimal numbers and do math with them), you will not always get what you expected. This is especially true because some values which are simple, finite decimals (such as 0.1 and 0.05) are not represented exactly in the computer and so the results of arithmetic on them may not give a result that is identical to a direct representation of the "known" answer.
This is a well known limitation of computer arithmetic and is discussed in several places:
- The R FAQ has a question devoted to it: R FAQ 7.31
- The R Inferno by Patrick Burns devotes the first "Circle" to this problem (starting on page 9)
- David Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic," ACM Computing Surveys 23, 1 (1991-03), 5-48, doi:10.1145/103162.103163 (revision also available)
- The Floating-Point Guide - What Every Programmer Should Know About Floating-Point Arithmetic
- 0.30000000000000004.com compares floating point arithmetic across programming languages
- Several Stack Overflow questions including
- Why are floating point numbers inaccurate?
- Why can't decimal numbers be represented exactly in binary?
- Is floating point math broken?
- Canonical duplicate for "floating point is inaccurate" (a meta discussion about a canonical answer for this issue)
Comparing scalars
The standard solution to this in R is not to use ==, but rather the all.equal function; or rather, since all.equal gives lots of detail about the differences if there are any, isTRUE(all.equal(...)).
i <- 0.1 + 0.05
if (isTRUE(all.equal(i, 0.15))) cat("i equals 0.15") else cat("i does not equal 0.15")
yields
i equals 0.15
Some more examples of using all.equal instead of == (the last example is supposed to show that it will correctly report differences):
0.1+0.05==0.15
#[1] FALSE
isTRUE(all.equal(0.1+0.05, 0.15))
#[1] TRUE
1-0.1-0.1-0.1==0.7
#[1] FALSE
isTRUE(all.equal(1-0.1-0.1-0.1, 0.7))
#[1] TRUE
0.3/0.1 == 3
#[1] FALSE
isTRUE(all.equal(0.3/0.1, 3))
#[1] TRUE
0.1+0.1==0.15
#[1] FALSE
isTRUE(all.equal(0.1+0.1, 0.15))
#[1] FALSE
Some more detail, directly copied from an answer to a similar question:
The problem you have encountered is that floating point cannot represent decimal fractions exactly in most cases, which means you will frequently find that exact matches fail. Meanwhile, R lies slightly when you say:
1.1-0.2
#[1] 0.9
0.9
#[1] 0.9
You can find out what it really thinks in decimal:
sprintf("%.54f",1.1-0.2)
#[1] "0.900000000000000133226762955018784850835800170898437500"
sprintf("%.54f",0.9)
#[1] "0.900000000000000022204460492503130808472633361816406250"
You can see these numbers are different, but the representation is a bit unwieldy. If we look at them in binary (well, hex, which is equivalent) we get a clearer picture:
sprintf("%a",0.9)
#[1] "0x1.ccccccccccccdp-1"
sprintf("%a",1.1-0.2)
#[1] "0x1.ccccccccccccep-1"
sprintf("%a",1.1-0.2-0.9)
#[1] "0x1p-53"
You can see that they differ by 2^-53
, which is important because this number is the smallest representable difference between two numbers whose value is close to 1, as this is.
We can find out for any given computer what this smallest representable number is by looking in R's machine field:
?.Machine
#....
#double.eps the smallest positive floating-point number x
#such that 1 + x != 1. It equals base^ulp.digits if either
#base is 2 or rounding is 0; otherwise, it is
#(base^ulp.digits) / 2. Normally 2.220446e-16.
#....
.Machine$double.eps
#[1] 2.220446e-16
sprintf("%a",.Machine$double.eps)
#[1] "0x1p-52"
You can use this fact to create a 'nearly equals' function which checks that the difference is close to the smallest representable number in floating point. In fact this already exists: all.equal
.
?all.equal
#....
#all.equal(x,y) is a utility to compare R objects x and y testing ‘near equality’.
#....
#all.equal(target, current,
# tolerance = .Machine$double.eps ^ 0.5,
# scale = NULL, check.attributes = TRUE, ...)
#....
So the all.equal function is actually checking that the difference between the numbers is the square root of the smallest difference between two mantissas.
This algorithm goes a bit funny near extremely small numbers called denormals, but you don't need to worry about that.
Comparing vectors
The above discussion assumed a comparison of two single values. In R, there are no scalars, just vectors, and implicit vectorization is a strength of the language. For comparing the value of vectors element-wise, the previous principles hold, but the implementation is slightly different. == is vectorized (does an element-wise comparison) while all.equal compares the whole vectors as a single entity.
Using the previous examples
a <- c(0.1+0.05, 1-0.1-0.1-0.1, 0.3/0.1, 0.1+0.1)
b <- c(0.15, 0.7, 3, 0.15)
== does not give the "expected" result, and all.equal does not perform an element-wise comparison:
a==b
#[1] FALSE FALSE FALSE FALSE
all.equal(a,b)
#[1] "Mean relative difference: 0.01234568"
isTRUE(all.equal(a,b))
#[1] FALSE
Rather, a version which loops over the two vectors must be used
mapply(function(x, y) {isTRUE(all.equal(x, y))}, a, b)
#[1] TRUE TRUE TRUE FALSE
If a functional version of this is desired, it can be written
elementwise.all.equal <- Vectorize(function(x, y) {isTRUE(all.equal(x, y))})
which can be called as just
elementwise.all.equal(a, b)
#[1] TRUE TRUE TRUE FALSE
Alternatively, instead of wrapping all.equal in even more function calls, you can just replicate the relevant internals of all.equal.numeric and use implicit vectorization:
tolerance = .Machine$double.eps^0.5
# this is the default tolerance used in all.equal,
# but you can pick a different tolerance to match your needs
abs(a - b) < tolerance
#[1] TRUE TRUE TRUE FALSE
This is the approach taken by dplyr::near, which documents itself as:
This is a safe way of comparing if two vectors of floating point numbers are (pairwise) equal. This is safer than using ==, because it has a built in tolerance.
dplyr::near(a, b)
#[1] TRUE TRUE TRUE FALSE
Testing for occurrence of a value within a vector
The standard R function %in% can also suffer from the same issue if applied to floating point values. For example:
x = seq(0.85, 0.95, 0.01)
# [1] 0.85 0.86 0.87 0.88 0.89 0.90 0.91 0.92 0.93 0.94 0.95
0.92 %in% x
# [1] FALSE
We can define a new infix operator to allow for a tolerance in the comparison as follows:
`%.in%` = function(a, b, eps = sqrt(.Machine$double.eps)) {
any(abs(b-a) <= eps)
}
0.92 %.in% x
# [1] TRUE
TRUE FALSE identical comparison indicator wrong in R
Your problem is that
sapply(df$TotalAnimals, identical, df$TotalFemales + df$TotalMales)
does not match TotalAnimals with TotalFemales+TotalMales element by element; rather, it takes each element of TotalAnimals and compares it to the entire TotalFemales+TotalMales vector, i.e., it does the equivalent of
identical(df$TotalAnimals[1], df$TotalFemales + df$TotalMales)
identical(df$TotalAnimals[2], df$TotalFemales + df$TotalMales)
...
Each of these comparisons gives FALSE because it is comparing a length-1 numeric vector to a length-N numeric vector (where N is the number of rows of df).
with(df, identical(TotalAnimals, TotalFemales + TotalMales))
should work fine. Another alternative, if you don't need to worry about NA values, is
with(df, TotalAnimals == TotalFemales + TotalMales)
Doing it this way (vectorized, element by element) will help if you want to check which elements differ.
(I would typically include the line
stopifnot(identical(df$TotalAnimals,df$TotalFemales+df$TotalMales))
in my code to stop with an error if there's a problem.)
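A minimal reproducible sketch of the two working alternatives (the data values here are invented; only the column names are taken from the question):

```r
# Toy data frame with the column names from the question
df <- data.frame(TotalMales   = c(2, 3),
                 TotalFemales = c(1, 1),
                 TotalAnimals = c(3, 4))

# Whole-vector comparison: one TRUE/FALSE for the entire column
with(df, identical(TotalAnimals, TotalFemales + TotalMales))  # TRUE

# Element-wise comparison: one TRUE/FALSE per row
with(df, TotalAnimals == TotalFemales + TotalMales)           # TRUE TRUE
```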