How to include NA in ifelse?
You can't really compare NA with another value, so using == would not work. Consider the following:
NA == NA
# [1] NA
You can just change your comparison from == to %in%:
ifelse(is.na(test$time) | test$type %in% "A", NA, "1")
# [1] NA "1" NA "1"
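The difference between the two operators can be checked side by side. This is a minimal sketch: the original `test` data frame is not shown, so the values below are assumptions chosen to reproduce the printed result.

```r
# Hypothetical data; the original `test` data frame is not shown.
test <- data.frame(
  time = c(10, 20, NA, 30),
  type = c("A", "B", NA, NA),
  stringsAsFactors = FALSE
)

# `==` propagates NA, so the combined condition can itself be NA (row 4)
eq_cond <- is.na(test$time) | test$type == "A"
eq_cond
# [1]  TRUE FALSE  TRUE    NA

# `%in%` never returns NA: a missing value simply does not match
in_cond <- is.na(test$time) | test$type %in% "A"
in_cond
# [1]  TRUE FALSE  TRUE FALSE

res <- ifelse(in_cond, NA, "1")
res
# [1] NA  "1" NA  "1"
```

With `==`, row 4 would come back as NA from ifelse() because its condition is NA; with `%in%` it falls through to the "no" branch.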
Regarding your other question,
"I could get this to work with my existing code if I could somehow change the result of is.na(test$type) to return FALSE instead of TRUE, but I'm not sure how to do that."
just use ! to negate the results:
!is.na(test$time)
# [1] TRUE TRUE FALSE TRUE
Ifelse statement order with is.na using R Dplyr Mutate
When comparing with ==, NA values return NA. When the first condition returns NA for an element, ifelse returns NA for that element and never checks the next ifelse statement. To go on to the next ifelse statement, the condition needs to be FALSE.
p1$Value == 1
#[1] TRUE TRUE FALSE NA NA
A workaround would be to use %in% instead of ==, which returns FALSE for NA values.
p1$Value %in% 1
#[1] TRUE TRUE FALSE FALSE FALSE
library(dplyr)
p1 %>% mutate(NewCol = ifelse(Value %in% 1, "Test1Yes",
                       ifelse(is.na(Value), "TestYes",
                       ifelse(Value %in% 0, "Test0Yes", "No"))))
#   col1 Value   NewCol
#1  var1     1 Test1Yes
#2  var2     1 Test1Yes
#3  var3     0 Test0Yes
#4  var4    NA  TestYes
#5  var5    NA  TestYes
You can also get the desired behaviour using a case_when statement instead of nested ifelse.
p1 %>%
  mutate(NewCol = case_when(Value == 1 ~ "Test1Yes",
                            is.na(Value) ~ "TestYes",
                            Value == 0 ~ "Test0Yes",
                            TRUE ~ "No"))
#   col1 Value   NewCol
#1  var1     1 Test1Yes
#2  var2     1 Test1Yes
#3  var3     0 Test0Yes
#4  var4    NA  TestYes
#5  var5    NA  TestYes
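To see the NA propagation in nested ifelse directly, here is a base R sketch. The `p1` values are reconstructed from the printed output above, and testing is.na() first is shown as one more possible ordering, not as the approach used in the answer.

```r
# Values reconstructed from the printed output above
p1 <- data.frame(col1 = paste0("var", 1:5), Value = c(1, 1, 0, NA, NA))

# With `==`, the outer condition is NA for rows 4 and 5, so ifelse() returns
# NA there and the is.na() branch is never used for those rows.
with_eq <- ifelse(p1$Value == 1, "Test1Yes",
           ifelse(is.na(p1$Value), "TestYes",
           ifelse(p1$Value == 0, "Test0Yes", "No")))
with_eq
# [1] "Test1Yes" "Test1Yes" "Test0Yes" NA         NA

# Testing is.na() first gives a definite TRUE/FALSE before any `==` runs
na_first <- ifelse(is.na(p1$Value), "TestYes",
            ifelse(p1$Value == 1, "Test1Yes",
            ifelse(p1$Value == 0, "Test0Yes", "No")))
na_first
# [1] "Test1Yes" "Test1Yes" "Test0Yes" "TestYes"  "TestYes"
```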
How to handle NAs in ifelse when creating new column
Just to explain why your version does not work: NA == NA is not TRUE, it's NA. Conceptually this makes sense: usually we want to know if two values are the same, and if we don't know one or both of them, we don't know if they are the same or not. To test if a value is NA you need to use the function is.na(). Here's a simple version:
df_addvar3 <- df %>%
mutate(var3 = ifelse(is.na(var1), var2, var1))
Your question was not quite clear about what should happen if the values fall outside -1:1, or if var1 and var2 are both not NA but differ from one another. All of these cases should be relatively simple to add if necessary.
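The same fallback can be tried without dplyr. This is a small sketch on made-up data: the column names follow the answer, but the values are assumptions.

```r
# Hypothetical data; var1 and var2 are assumed to be numeric columns
df <- data.frame(var1 = c(1, NA, -1, NA), var2 = c(NA, 0, 1, NA))

# Take var1 where it is known, otherwise fall back to var2
df$var3 <- ifelse(is.na(df$var1), df$var2, df$var1)
df$var3
# [1]  1  0 -1 NA
```

The last row stays NA because both inputs are missing there.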
R handling NA values while doing a comparison ifelse
Please see the following SO post: How to ignore NA in ifelse statement
With respect to your question:
df$counting <- ifelse(df$age > 5 & df$age < 8 & !is.na(df$age), 1, 0) + ifelse(df$marks > 60 & df$marks < 70, 1, 0)
> df
  sex occupation age marks counting
1   M    Student  NA    34        0
2   F    Analyst   6    65        2
3   M    Analyst   9    21        0
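If both columns may contain NA, the guard can be factored out so it applies to every range check. A sketch, with the data reconstructed from the output above; `in_open_range` is a hypothetical helper name, not part of the original answer.

```r
# Data reconstructed from the printed output above
df <- data.frame(
  sex = c("M", "F", "M"),
  occupation = c("Student", "Analyst", "Analyst"),
  age = c(NA, 6, 9),
  marks = c(34, 65, 21)
)

# Hypothetical helper: TRUE only when x is known and strictly inside (lo, hi)
in_open_range <- function(x, lo, hi) !is.na(x) & x > lo & x < hi

# Adding two logical vectors counts how many conditions hold per row
df$counting <- in_open_range(df$age, 5, 8) + in_open_range(df$marks, 60, 70)
df$counting
# [1] 0 2 0
```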
How to ignore NA in ifelse statement
This syntax is easier to read:
x <- c(NA, 1, 0, -1)
(x > 0) & (!is.na(x))
# [1] FALSE TRUE FALSE FALSE
(The outer parentheses aren't necessary, but will make the statement easier to read for almost anyone other than the machine.)
Edit:
## If you want 0s and 1s
((x > 0) & (!is.na(x))) * 1
# [1] 0 1 0 0
Finally, you can make the whole thing into a function:
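isPos <- function(x) {
  # Parenthesise the whole logical test before multiplying: `*` binds
  # tighter than `&`, so without the outer parentheses the result stays logical
  ((x > 0) & (!is.na(x))) * 1
}
isPos(x)
# [1] 0 1 0 0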
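The placement of the parentheses matters when converting to 0s and 1s, because `*` is evaluated before `&`. A short sketch of the difference, with as.integer() shown as one alternative that avoids the multiplication altogether:

```r
x <- c(NA, 1, 0, -1)

# Here only the right-hand side is multiplied, and `&` then returns a logical
loose <- (x > 0) & (!is.na(x)) * 1
loose
# [1] FALSE  TRUE FALSE FALSE

# Wrapping the whole test first gives the numeric 0/1 result
tight <- ((x > 0) & (!is.na(x))) * 1
tight
# [1] 0 1 0 0

# as.integer() makes the conversion explicit
as_int <- as.integer(x > 0 & !is.na(x))
as_int
# [1] 0 1 0 0
```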
Direct way of telling ifelse to ignore NA
You can use %in% instead of == to sort-of ignore NAs.
ifelse(df$a %in% 1, "a==1",
ifelse(df$b %in% 1, "b==1",
ifelse(df$c %in% 1, "c==1", NA)))
Unfortunately, this does not give any performance gain compared to the original, while @akrun's solution is about 3 times faster.
solution_original <- function(){
  ifelse(df$a==1 & !is.na(df$a), "a==1",
  ifelse(df$b==1 & !is.na(df$b), "b==1",
  ifelse(df$c==1 & !is.na(df$c), "c==1", NA)))
}
solution_akrun <- function(){
  v1 <- names(df)[max.col(!is.na(df)) * NA^!rowSums(!is.na(df))]
  i1 <- !is.na(v1)
  v1[i1] <- paste0(v1[i1], "==1")
  v1  # return the full vector, not just the assigned values
}
solution_mine <- function(){
  ifelse(df$a %in% 1, "a==1",
  ifelse(df$b %in% 1, "b==1",
  ifelse(df$c %in% 1, "c==1", NA)))
}
set.seed(1)
df <- data.frame(a = sample(c(1, rep(NA, 4)), 1e6, TRUE),
                 b = sample(c(1, rep(NA, 4)), 1e6, TRUE),
                 c = sample(c(1, rep(NA, 4)), 1e6, TRUE))
microbenchmark::microbenchmark(
solution_original(),
solution_akrun(),
solution_mine()
)
## Unit: milliseconds
##                 expr      min       lq     mean   median       uq       max neval
##  solution_original() 701.9413 839.3715 845.0720 853.1960 875.6151 1051.6659   100
##     solution_akrun() 217.4129 242.5113 293.2987 253.2144 387.1598  564.3981   100
##      solution_mine() 698.7628 845.0822 848.6717 858.7892 877.9676 1006.2872   100
Was inspired by this: R: Dealing with TRUE, FALSE, NA and NaN
Edit
Following the comment by @akrun, I redid the benchmark and revised the statement.
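Independently of the timings, it may be worth confirming that the %in% variant and the `== ... & !is.na(...)` variant agree. A small sketch on a reduced data frame; `small` and its size are assumptions made only to keep the check quick.

```r
set.seed(1)
# Hypothetical small data frame with the same column layout as the benchmark
small <- data.frame(a = sample(c(1, NA), 20, TRUE),
                    b = sample(c(1, NA), 20, TRUE),
                    c = sample(c(1, NA), 20, TRUE))

with_eq <- ifelse(small$a == 1 & !is.na(small$a), "a==1",
           ifelse(small$b == 1 & !is.na(small$b), "b==1",
           ifelse(small$c == 1 & !is.na(small$c), "c==1", NA)))

with_in <- ifelse(small$a %in% 1, "a==1",
           ifelse(small$b %in% 1, "b==1",
           ifelse(small$c %in% 1, "c==1", NA)))

# Both conditions are never NA, so the two results match element for element
identical(with_eq, with_in)
# [1] TRUE
```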
apply with ifelse statement and is.na does not 'sum' but outputs matrix - where is my logical mistake?
Here's a working version:
apply(dat[,2:3], MARGIN=1, function(x) {
  if(all(is.na(x))) {
    NA
  } else {
    sum(x==1, na.rm=TRUE)
  }
})
#[1] 1 NA 0 2
Issues with yours:
- Inside your function(x), x is the var1 and var2 values for a particular row. You don't want to go back and reference dat$var1 and dat$var2, which is the whole column! Just use x.
- x == is.na(dat$var1) & is.na(dat$var2) is strange. It's trying to check whether x is the same as is.na(dat$var1)?
- For a given row, we want to check whether all the values are NA. ifelse is vectorized and will return a vector, but we don't want a vector, we want a single TRUE or FALSE indicating whether all values are NA. So we use all(is.na()). And if() instead of ifelse.
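The row-wise logic can also be written without apply. The sketch below uses hypothetical values for `dat`, chosen to reproduce the output shown earlier, and compares the if()/all(is.na()) version with a rowSums-based alternative; the latter is an assumption about what might suit larger data, not part of the original answer.

```r
# Hypothetical data chosen to reproduce the output shown earlier
dat <- data.frame(id = 1:4, var1 = c(1, NA, 0, 1), var2 = c(NA, NA, 0, 1))

# Row-wise version: one scalar test per row, so if() is the right tool
by_row <- apply(dat[, 2:3], MARGIN = 1, function(x) {
  if (all(is.na(x))) NA else sum(x == 1, na.rm = TRUE)
})

# Vectorized alternative: count the 1s per row, then blank out all-NA rows
m <- as.matrix(dat[, 2:3])
ones <- rowSums(m == 1, na.rm = TRUE)
ones[rowSums(!is.na(m)) == 0] <- NA

as.numeric(ones)
# [1]  1 NA  0  2
```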