Dealing with TRUE, FALSE, NA and NaN
To answer your questions in order:
1) The == operator indeed does not treat NAs the way you might expect: comparing anything with NA returns NA rather than TRUE or FALSE. A very useful function is this compareNA function from r-cookbook.com:
compareNA <- function(v1, v2) {
  # This function returns TRUE wherever elements are the same, including NAs,
  # and FALSE everywhere else.
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  # Where only one side is NA, the comparison is still NA; count that as FALSE.
  same[is.na(same)] <- FALSE
  return(same)
}
2) NA stands for "Not Available" and is not the same as the more general NaN ("Not a Number"). NA is generally used as a placeholder for missing data, whereas NaNs are normally produced by an invalid numerical operation (taking the log of -1, computing 0/0, or similar). Note that is.na(NaN) is TRUE but is.nan(NA) is FALSE, so is.na() catches both.
3) I'm not really sure what you mean by "logical things": many different data types, including numeric vectors, can be used as input to logical operators (non-zero numbers count as TRUE and zero as FALSE). You might want to read the R logical operators page: http://stat.ethz.ch/R-manual/R-patched/library/base/html/Logic.html.
Hope this helps!
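For readers coming from Python, here is a rough analogue of points 1 and 2: a hypothetical compare_na helper mirroring compareNA, followed by the NA/NaN distinction. This is a sketch under stated assumptions: float('nan') stands in for NA in the comparison, and using None as a missing-value marker is a common Python convention, not an exact equivalent of R's NA.

```python
import math


def compare_na(v1, v2):
    """Element-wise equality that also treats two NaNs in the same slot as equal.

    Hypothetical Python analogue of compareNA: assumes equal-length
    sequences of floats, with float('nan') standing in for R's NA.
    """
    if len(v1) != len(v2):
        raise ValueError("v1 and v2 must have the same length")
    result = []
    for a, b in zip(v1, v2):
        both_missing = math.isnan(a) and math.isnan(b)
        # In Python, a == b is simply False when either side is NaN,
        # so no clean-up step like same[is.na(same)] <- FALSE is needed.
        result.append(a == b or both_missing)
    return result


nan = float("nan")
print(compare_na([1.0, 2.0, nan, nan], [1.0, 3.0, nan, 4.0]))
# [True, False, True, False]

# Python has no built-in NA; None is often used as the "missing" marker
# (an assumption for this sketch), while NaN comes from invalid arithmetic.
missing = None
not_a_number = math.inf - math.inf
print(type(missing).__name__, type(not_a_number).__name__)  # NoneType float
print(math.isnan(not_a_number))  # True

try:
    math.log(-1)  # R's log(-1) returns NaN with a warning; Python raises instead
except ValueError as err:
    print("math.log(-1) raised ValueError:", err)
```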
How to process NA as False in R
I think the most useful approach is to use dplyr's case_when function and state explicitly how the NA cases you mention should be handled.
Replicating your example (note that I'm setting the NAs explicitly here; your NAs were the result of R not being able to handle a character string ("NA") within a numeric vector):
col1 = as.numeric(c(10, 2, 15, 2, NA_real_, 15))
col2 = as.numeric(c(15, 15, 2, 2, 15, NA_real_))
test <- data.frame(col1, col2)
Both mutate and case_when come from dplyr, so I load it first. If you're not familiar with case_when, it works like an ifelse with multiple conditions. Each condition is followed by a tilde (~), and the value after the tilde is assigned when that condition is met. Conditions are checked in order and the first match wins; a condition that evaluates to NA is treated like FALSE. To give "everything else" some value X, add TRUE ~ X as the last clause: TRUE holds for every case that none of the earlier conditions matched.
This should do what you want:
library(dplyr)
test <- mutate(.data = test,
               G5 = case_when(col1 > 5 & col2 > 5 ~ "Yes",  # Original condition
                              (is.na(col1) & col2 > 5) | (col1 > 5 & is.na(col2)) ~ "Yes",
                              TRUE ~ "No"))  # Everything else gets the value "No"
test
#> col1 col2 G5
#> 1 10 15 Yes
#> 2 2 15 No
#> 3 15 2 No
#> 4 2 2 No
#> 5 NA 15 Yes
#> 6 15 NA Yes
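To make the evaluation order explicit, here is the same first-match logic sketched in plain Python. It only illustrates how case_when picks a value and is not dplyr itself; None stands in for NA, and gt5, and3 and g5 are hypothetical helper names for this example.

```python
def gt5(x):
    """R-style x > 5: returns None (standing in for NA) when x is missing."""
    return None if x is None else x > 5


def and3(a, b):
    """Three-valued AND like R's &: FALSE wins, otherwise a missing value gives None."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def g5(col1, col2):
    """First-match evaluation in the spirit of case_when (hypothetical helper)."""
    clauses = [
        (and3(gt5(col1), gt5(col2)), "Yes"),               # original condition
        ((col1 is None and gt5(col2) is True)
         or (gt5(col1) is True and col2 is None), "Yes"),  # one side missing, other > 5
        (True, "No"),                                      # TRUE ~ "No" fallback
    ]
    for condition, value in clauses:
        # Only a condition that is exactly True matches; None (NA) does not.
        if condition is True:
            return value


col1 = [10, 2, 15, 2, None, 15]
col2 = [15, 15, 2, 2, 15, None]
G5 = [g5(a, b) for a, b in zip(col1, col2)]
print(G5)  # ['Yes', 'No', 'No', 'No', 'Yes', 'Yes']
```

Treating an undecidable (None) condition as "no match" is what lets rows 5 and 6 fall through to the second clause instead of producing a missing result.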
Is NaN falsy? Why NaN === false returns false
Being falsy and being strictly equal to false are very different things; that's why one has a "y" instead of an "e". ;) NaN is falsy (Boolean(NaN) is false), but NaN is specified never to be equal to anything, not even itself. The second part of your question compares false === false, which, funnily enough, is true. :)
If you really want to know whether something is NaN, you can use Object.is(): Object.is(NaN, NaN) returns true. Number.isNaN() works as well.
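The "never equal to anything" rule comes from IEEE 754 floating point rather than from JavaScript itself, so the same behavior shows up in Python. In this short sketch, math.isnan plays roughly the role of JavaScript's Number.isNaN, and the hypothetical is_nan helper relies on NaN being the only float that is unequal to itself.

```python
import math

nan = float("nan")

print(nan == nan)            # False: NaN is never equal to anything, itself included
print(nan != nan)            # True
print(nan < 1.0, nan > 1.0)  # False False: ordered comparisons are all False
print(math.isnan(nan))       # True: the reliable way to test for NaN


def is_nan(x):
    """NaN is the only float value that is unequal to itself."""
    return x != x
```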
Preserve NaN values in pandas boolean comparisons
Let's use np.logical_and:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A':[True, True, False, True, np.nan, np.nan],
'B':[True, False, True, np.nan, np.nan, False]})
s = np.logical_and(df['A'],df['B'])
print(s)
Output:
0 True
1 False
2 False
3 NaN
4 NaN
5 False
Name: A, dtype: object
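As far as I know, on object-dtype columns like these np.logical_and applies the same rule as Python's own and operator (return the first operand if it is falsy, otherwise the second). Because NaN is truthy, the result then depends on operand order: NaN and True gives True, an ordering the example above happens not to contain. If NaN should consistently mean "unknown", three-valued (Kleene) logic is a swapped-in alternative; pandas' nullable "boolean" dtype implements that logic with pd.NA. Below is a pure-Python sketch of it, where kleene_and and is_missing are hypothetical helpers, not NumPy or pandas functions.

```python
import math


def is_missing(x):
    """Treat None or a float NaN as 'unknown'."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def kleene_and(a, b):
    """Three-valued AND: a known False dominates, otherwise any unknown gives NaN."""
    if (not is_missing(a) and not a) or (not is_missing(b) and not b):
        return False
    if is_missing(a) or is_missing(b):
        return math.nan
    return True


nan = math.nan
A = [True, True, False, True, nan, nan]
B = [True, False, True, nan, nan, False]
print([kleene_and(a, b) for a, b in zip(A, B)])
# [True, False, False, nan, nan, False]

# Plain Python `and` is order-dependent because NaN is truthy:
print(nan and True)            # True (NaN is truthy, so the second operand is returned)
print(kleene_and(nan, True))   # nan
```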
Why do Not a Number values equal True when cast as boolean in Python/Numpy?
This is in no way NumPy-specific, but is consistent with how Python treats NaNs:
In [1]: bool(float('nan'))
Out[1]: True
The rules are spelled out in the Python documentation on truth value testing: a number is false only if it is zero, and NaN is not zero.
I think it could be reasonably argued that the truth value of NaN should be False. However, this is not how the language works right now.
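If you need NaN to count as False, you have to check for it explicitly. Here is a minimal sketch for plain Python scalars; truthy_nan_false is a hypothetical helper name, not a standard function.

```python
import math


def truthy_nan_false(x):
    """Like bool(x), except that a float NaN counts as False."""
    if isinstance(x, float) and math.isnan(x):
        return False
    return bool(x)


nan = float("nan")
print(bool(nan))                                     # True: NaN is a non-zero float
print(truthy_nan_false(nan))                         # False
print(truthy_nan_false(0.0), truthy_nan_false(2.5))  # False True
```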
Why is NAN unequal to everything except true, in PHP?
NAN (quiet NAN or signaling NAN) is a non-zero floating-point value, and that is why it converts to true.
For example:
sqrt(-1.0) -> NAN
There are both -NAN and +NAN, although since about the 80286 it is usually just recognized as NAN when tested.
Check your FPU's floating-point instruction set if you need to.
+INF and -INF are also non-zero floating-point values:
-log(0.0) -> +INF
log(0.0) -> -INF
Here is a dump of the Intel floating-point stack, listing only the few values I was talking about (don't forget that internally each FPU register is 10 bytes):
      <exp>  <mantissa>
0.0   00 00  00 00 00 00 00 00 00 00
-INF  FF FF  80 00 00 00 00 00 00 00
+INF  7F FF  80 00 00 00 00 00 00 00
-NAN  FF FF  C0 00 00 00 00 00 00 00
+NAN  7F FF  C0 00 00 00 00 00 00 00
So, as you can see, only 0.0 is actually ZERO!
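The same point can be checked on ordinary 64-bit IEEE 754 doubles (rather than the 10-byte x87 registers shown above) with Python's struct module. This sketch prints the raw bits of each value; because the exact NaN bit pattern can vary by platform, it only checks the NaN layout (exponent all ones, non-zero fraction), not its exact bytes. The double_bits helper is a hypothetical name for this example.

```python
import math
import struct


def double_bits(x):
    """Return the 64-bit IEEE 754 representation of x as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


for name, value in [("0.0", 0.0), ("+INF", math.inf), ("-INF", -math.inf), ("NAN", math.nan)]:
    bits = double_bits(value)
    print(f"{name:>5}  {bits:016x}  bool={bool(value)}")

# NaN: exponent field (bits 52-62) all ones, fraction field non-zero.
nan_bits = double_bits(math.nan)
exponent = (nan_bits >> 52) & 0x7FF
fraction = nan_bits & ((1 << 52) - 1)
print(exponent == 0x7FF and fraction != 0)  # True
```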