if_else() `false` must be type double, not integer - in R
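if_else from dplyr is type-stable: it checks whether the "true" and "false" values are of the same type and throws an error if they aren't. ifelse in base R performs no such check. (Since dplyr 1.1.0, if_else casts both values to a common type, so an integer/double mix like the one below no longer errors; incompatible types such as character and double still do.)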
When writing:
mutate(n = if_else(FiscalYear == "FY2018" & Candy == "SNICKERS", n - 3, n))
I assume n was originally an integer, so the "false" value is an integer. n - 3 coerces the "true" value to a double, because 3 is a double. The "true" and "false" values are of different types, so if_else throws an error.
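One way to avoid the mismatch is to subtract an integer literal (3L), so both branches stay integer. A minimal sketch, assuming a hypothetical candy data frame whose n column is stored as integer:

```r
library(dplyr)

# Hypothetical sample data; n is deliberately an integer column
candy <- data.frame(
  FiscalYear = c("FY2018", "FY2018", "FY2019"),
  Candy      = c("SNICKERS", "MARS", "SNICKERS"),
  n          = c(10L, 7L, 5L),
  stringsAsFactors = FALSE
)

# n - 3L stays integer, so "true" and "false" have the same type
fixed <- candy %>%
  mutate(n = if_else(FiscalYear == "FY2018" & Candy == "SNICKERS", n - 3L, n))

typeof(fixed$n)
```

If a double result is acceptable instead, converting the column first with as.numeric(n) achieves the same agreement between the two branches.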
When writing:
mutate(qty = if_else(name == "Bob" & fruit == "apple", qty / 2, qty))
qty is likely already a double, so dividing it by 2 (a double) still yields a double. "true" and "false" are the same type, hence no error.
This is easy to check with the following typeof calls:
> typeof(6)
[1] "double"
> typeof(6L)
[1] "integer"
> typeof(6L-3)
[1] "double"
> typeof(6L-3L)
[1] "integer"
> typeof(6/2)
[1] "double"
ifelse from base R coerces implicitly, converting everything to a common type. This means it doesn't throw an error when "true" and "false" are of different types. That is more convenient but also more dangerous, as implicit coercion can produce unexpected results.
I recommend using ifelse for one-off/ad hoc programs, and if_else when you want to take advantage of the built-in type check.
Convert integer to numeric/double for dplyr::if_else()
But you didn't convert the data frame to numeric, at least not in any of the code you provided. I'll do it for you:
# Read in sample data
chromSizes <- read.table(header = TRUE, text = '
Length
chrIV 1531933
chrXV 1091291
chrVII 1090940
chrXII 1078177
chrXVI 948066
chrXIII 924431
chrII 813184
chrXIV 784333
chrX 745751
chrXI 666816
chrV 576874
chrVIII 562643
chrIX 439888
chrIII 316620
chrVI 270161
chrI 230218
chrM 85779')
# Convert to numeric
chromSizes$Length <- as.numeric(chromSizes$Length)
# Check and see that it is numeric
is.numeric(chromSizes$Length)
# [1] TRUE
Then the if_else() should work:
library(dplyr)
# Sample data
df.b <- read.table(header = TRUE, text = '
Chromosome_Strand Chromosome
x chrIV
y chrXV
z chrVII
- chrXII
a chrXVI
b chrXIII
c chrII
- chrXIV')
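# Run if_else with a numeric 0 as the FALSE value, so both branches are double.
# Look up lengths by row name; as.character() guards against factor indexing.
leading4 <- if_else(df.b$Chromosome_Strand == "-", chromSizes[as.character(df.b$Chromosome), "Length"], 0)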
# View results
leading4
# [1] 0 0 0 1078177 0 0 0 784333
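If the lengths are kept as integer rather than converted, an alternative is to write the FALSE value as an integer literal, 0L. A small sketch, with hypothetical vectors lengths_int and strand standing in for the data above:

```r
library(dplyr)

# Hypothetical stand-ins for the chromosome lengths and strand markers
lengths_int <- c(1078177L, 784333L, 1531933L)
strand      <- c("-", "-", "x")

# 0L is an integer, matching the type of lengths_int
res <- if_else(strand == "-", lengths_int, 0L)

typeof(res)
```

Which side to adjust is a design choice: converting the data to double is simpler when later arithmetic would produce doubles anyway, while 0L keeps the result integer.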
Using if_else, I can't return the column used as the conditional if the conditional is false
You seem to be mixing dplyr and base R frameworks.
In base R, you would use
mydata$new.vary <- ifelse(mydata$color == 'E', mydata$position, mydata$color)
This works, but you should take note that mydata$position is a character object, and mydata$color is a factor, which is represented internally as an integer. If you try to run the same code with if_else from the dplyr package, you'll get the following:
mydata$new.vary <- if_else(mydata$color == 'E', mydata$position, mydata$color)
Error: `false` must be type character, not integer
if_else is a little stricter than ifelse, requiring that both the true and false arguments have the same type.
If you want to use the dplyr approach, you can use
mydata %>%
mutate(new.vary = ifelse(color == 'E', position, color))
or, if you want to use dplyr's if_else, try
mydata %>%
mutate(color = as.character(color),
new.vary = if_else(color == 'E', position, color))
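A self-contained sketch of the factor-to-character conversion, assuming a hypothetical mydata with a factor color column and a character position column:

```r
library(dplyr)

# Hypothetical sample data: color is a factor, position is character
mydata <- data.frame(
  color    = factor(c("E", "F", "E")),
  position = c("top", "bottom", "left"),
  stringsAsFactors = FALSE
)

# Convert the factor first so both branches of if_else are character
result <- mydata %>%
  mutate(color    = as.character(color),
         new.vary = if_else(color == "E", position, color))

result$new.vary
```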
How to mutate some values of a dataframe based on values from another dataframe column with R
if_else performs type checks. According to ?if_else:
Compared to the base ifelse(), this function is more strict. It checks that true and false are the same type. This strictness makes the output type more predictable, and makes it somewhat faster.
and a bare NA is of type logical by default.
typeof(NA)
#[1] "logical"
According to ?NA
NA is a logical constant of length 1 which contains a missing value indicator. NA can be coerced to any other vector type except raw. There are also constants NA_integer_, NA_real_, NA_complex_ and NA_character_ of the other atomic vector types which support missing values: all of these are reserved words in the R language.
We need NA_character_ specifically, because if_else does not coerce NA to the appropriate type (which base R ifelse would normally do).
typeof(NA_character_)
#[1] "character"
Therefore, it is better to use the type-matched NA:
library(dplyr)
library(stringr)  # provides str_sub()
df1 %>%
  mutate(x = if_else(str_sub(x, 3, 4) %in% df2$x &
                       year == 2020, NA_character_, x))
ifelse doesn't have that issue, as the NA is automatically converted to NA_character_:
df1 %>%
mutate(x = ifelse(str_sub(x,3,4) %in% df2$x & year == 2020, NA, x))
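A runnable sketch of the type-matched NA approach. The contents of df1 and df2 are not shown in the original, so the data below is purely hypothetical:

```r
library(dplyr)
library(stringr)

# Hypothetical data frames standing in for df1 and df2
df1 <- data.frame(
  x    = c("AB12", "CD34", "EF56"),
  year = c(2020, 2020, 2019),
  stringsAsFactors = FALSE
)
df2 <- data.frame(x = c("12", "56"), stringsAsFactors = FALSE)

# NA_character_ matches the character type of x
out <- df1 %>%
  mutate(x = if_else(str_sub(x, 3, 4) %in% df2$x & year == 2020,
                     NA_character_, x))

out$x
```

Only the first row satisfies both conditions, so only that value is replaced by a missing character value.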
if_else does not return NA as expected (returns false condition instead)
The %in% operator returns FALSE, not NA, for an NA value:
test_vector %in% c("1dose", "2dose", "yes")
[1] TRUE TRUE TRUE FALSE FALSE FALSE
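I believe str_detect is going to give you the behavior you're looking for, since it returns NA for an NA input. Pass the alternatives as a single regular expression; a vector of patterns would be recycled element by element rather than matched as a set:
> if_else(str_detect(test_vector, "^(1dose|2dose|yes)$"), "yes", "no")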
[1] "yes" "yes" "yes" "no" "no" NA
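The difference between the two approaches can be seen side by side. This is a sketch only: the original test_vector is not shown, so the one below is an assumed stand-in with an NA in the last position:

```r
library(dplyr)
library(stringr)

# Hypothetical input; the real test_vector is not given in the question
test_vector <- c("1dose", "2dose", "yes", "no", "unknown", NA)

# %in% never returns NA: a missing value simply counts as "not in the set"
in_set <- test_vector %in% c("1dose", "2dose", "yes")

# str_detect propagates NA, so if_else can return NA for that element
detected <- str_detect(test_vector, "^(1dose|2dose|yes)$")
labels   <- if_else(detected, "yes", "no")

labels
```

if_else returns NA wherever the condition itself is NA, unless a replacement is supplied through its missing argument.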
dplyr::if_else - check for condition and insert NA as part of the evaluation
You can make the NA a Date too, e.g. with ymd() from lubridate:
df %>% mutate(sus_date = if_else(status == "Suspended", date, ymd(NA)))
date status sus_date
1 2019-01-01 Active <NA>
2 2019-01-02 Suspended 2019-01-02
3 2019-01-03 Active <NA>
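The same idea works without lubridate, since base R's as.Date(NA) also yields a missing value of class Date. A minimal sketch, with a data frame reconstructed from the output above:

```r
library(dplyr)

# Sample data reconstructed from the printed result above
df <- data.frame(
  date   = as.Date(c("2019-01-01", "2019-01-02", "2019-01-03")),
  status = c("Active", "Suspended", "Active"),
  stringsAsFactors = FALSE
)

# as.Date(NA) is a Date, so it matches the type of the date column
out <- df %>%
  mutate(sus_date = if_else(status == "Suspended", date, as.Date(NA)))

class(out$sus_date)
```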
Specify class of NA in R (for if_else, dplyr)
You can use NA_real_:
if_else(mtcars$cyl > 5, NA_real_, 1)
Avoiding type conflicts with dplyr::case_when
As said in ?case_when:
All RHSs must evaluate to the same type of vector.
You actually have two possibilities:
1) Create new as a numeric vector
df <- df %>% mutate(new = case_when(old == 1 ~ 5,
old == 2 ~ NA_real_,
TRUE ~ as.numeric(old)))
Note that NA_real_ is the numeric version of NA, and that you must convert old to numeric because you created it as an integer in your original dataframe.
You get:
str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ old: int 1 2 3
# $ new: num 5 NA 3
2) Create new as an integer vector
df <- df %>% mutate(new = case_when(old == 1 ~ 5L,
old == 2 ~ NA_integer_,
TRUE ~ old))
Here, 5L forces 5 into the integer type, and NA_integer_ is the integer version of NA.
So this time new is integer:
str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ old: int 1 2 3
# $ new: int 5 NA 3
dplyr if_else() vs base R ifelse()
if_else is stricter. It checks that both alternatives are of the same type and throws an error otherwise, while ifelse will promote types as necessary. The strictness can be a benefit, because it catches type mistakes early, but it can also break scripts if you don't check for errors or explicitly force type conversion. For example:
ifelse(c(TRUE,TRUE,FALSE),"a",3)
[1] "a" "a" "3"
if_else(c(TRUE,TRUE,FALSE),"a",3)
Error: `false` must be type character, not double
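Making the conversion explicit satisfies if_else and documents the intent. A minimal sketch based on the example above:

```r
library(dplyr)

cond <- c(TRUE, TRUE, FALSE)

# Convert the number to character so both alternatives share a type
res <- if_else(cond, "a", as.character(3))

res
```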