Avoiding type conflicts with dplyr::case_when
As stated in ?case_when: all RHSs must evaluate to the same type of vector.
You have two options:
1) Create new as a numeric vector
df <- df %>% mutate(new = case_when(old == 1 ~ 5,
old == 2 ~ NA_real_,
TRUE ~ as.numeric(old)))
Note that NA_real_ is the numeric version of NA, and that you must convert old to numeric because you created it as an integer in your original data frame.
You get:
str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ old: int 1 2 3
# $ new: num 5 NA 3
2) Create new as an integer vector
df <- df %>% mutate(new = case_when(old == 1 ~ 5L,
old == 2 ~ NA_integer_,
TRUE ~ old))
Here, 5L forces 5 into the integer type, and NA_integer_ is the integer version of NA.
So this time new is an integer:
str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ old: int 1 2 3
# $ new: int 5 NA 3
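The two options differ only in the type of their right-hand sides. A minimal base R sketch can confirm this by checking the storage type of each NA constant and of the two ways of writing 5. The variable names are illustrative only.

```r
# Storage type of the plain NA and of each typed NA constant
na_types <- c(
  plain     = typeof(NA),
  integer   = typeof(NA_integer_),
  double    = typeof(NA_real_),
  character = typeof(NA_character_)
)
print(na_types)

# 5 is stored as a double, 5L as an integer
five_types <- c(typeof(5), typeof(5L))
print(five_types)
```

A plain NA is logical, which is why it clashes with numeric or character RHSs. As far as I know, dplyr 1.1.0 and later are more lenient and accept a plain NA, but the typed constants work in every version.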
Trouble using case_when in dplyr
The error occurs because case_when expects all RHSs to evaluate to the same type.
In OP's attempt, TRUE is of type "logical" and x is of type "integer", hence the error. You could try:
x <- 1:5
dplyr::case_when(x == 1 ~ NA_integer_, x != 1 ~ x)
#[1] NA 2 3 4 5
Or, another way:
dplyr::case_when(x != 1 ~ x, TRUE ~ NA_integer_)
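When there are only two branches, as here, dplyr::if_else() is a possible alternative. This is a sketch that assumes a single condition is enough. Like case_when, if_else wants both branches to have compatible types, so NA_integer_ is still used.

```r
library(dplyr)

x <- 1:5

# Replace the value 1 with a missing integer and keep everything else
res <- if_else(x == 1L, NA_integer_, x)
print(res)
print(typeof(res))
```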
Type conflict setting multiple variables to NA with mutate, across, case_when
Another option would be to use an if statement:
library(dplyr)
mtcars$carb <- as.integer(mtcars$carb)
mtcars <- mtcars %>%
mutate(across(c(gear:carb), ~ case_when(
vs == 1 ~ if (is.integer(.)) NA_integer_ else NA_real_,
TRUE ~ .
)))
But the much more clever approach, which I learned thanks to the comment by @r2evans, is to use .[NA], which "will always give the appropriate NA type":
mtcars <- mtcars %>%
mutate(across(c(gear:carb), ~ case_when(
vs == 1 ~ .[NA],
TRUE ~ .
)))
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 NA NA
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 NA NA
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 NA NA
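Why .[NA] works can be checked in base R: indexing a vector with the logical NA returns missing values of the vector's own type, recycled to its length. A small sketch with made-up vectors (all names here are hypothetical):

```r
int_col <- c(4L, 3L, 2L)
dbl_col <- c(4.5, 3.2, 2.1)
chr_col <- c("a", "b", "c")

# Indexing with NA keeps the type of the vector being indexed
na_int <- int_col[NA]
na_dbl <- dbl_col[NA]
na_chr <- chr_col[NA]

print(c(typeof(na_int), typeof(na_dbl), typeof(na_chr)))

# The result has the same length as the original vector
print(length(na_int))
```

Because the result already has the right type and length, it can sit on the RHS of case_when next to the column itself, whatever type that column has.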
Issue with case_when statement using & in dplyr?
The column percentile is a factor. We need to convert it to character first and then to numeric:
library(dplyr)
df1 %>%
mutate(percentile = as.numeric(as.character(percentile))) %>%
...
What happens is that when we coerce a factor directly to numeric/integer, we get its underlying integer codes instead of the actual values:
v1 <- factor(c(81.9, 82.7, 81.9, 82.5))
as.numeric(v1)
#[1] 1 3 1 2
which is different from the following:
as.numeric(as.character(v1))
#[1] 81.9 82.7 81.9 82.5
Or, probably faster, by converting the levels once and indexing them with the factor:
as.numeric(levels(v1))[v1]
#[1] 81.9 82.7 81.9 82.5
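To make the difference concrete, the sketch below looks at the two pieces a factor is made of: the levels (stored as strings) and the integer codes that point into them. The helper factor_to_numeric is a hypothetical name used only for illustration.

```r
v1 <- factor(c(81.9, 82.7, 81.9, 82.5))

# The levels are the sorted unique values, stored as character strings
lv <- levels(v1)
print(lv)

# The codes are the position of each element within levels(v1)
codes <- as.integer(v1)
print(codes)

# Hypothetical helper: convert the levels once, then index by the codes
factor_to_numeric <- function(f) as.numeric(levels(f))[f]

vals <- factor_to_numeric(v1)
print(vals)
```

Calling as.numeric() directly on the factor returns the codes, not the values the levels represent.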
dplyr::case_when() inexplicably returns names(message) - `*vtmp*` error
You get this error because you are trying to mix a logical and a numeric vector.
In your case_when statement:
case_when(
is.na(tmode) ~ NA,
durationI > 180 ~ 180,
TRUE ~ durationI
)
Your first case evaluates to NA, which makes R think that you want a logical vector. When the next case evaluates to a numeric, you get the error.
You can fix this by replacing NA with NA_real_, the missing value of type numeric:
raw %>%
mutate(
distanceI = ifelse(is.na(tmode), NA, distanceI),
durationI = case_when(
is.na(tmode) ~ NA_real_,
durationI > 180 ~ 180,
TRUE ~ durationI
)
)
#> # A tibble: 3 × 4
#> activity_ID durationI distanceI tmode
#> <dbl> <dbl> <dbl> <chr>
#> 1 1 180 57 auto
#> 2 2 NA NA <NA>
#> 3 3 91 58 rail
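The distanceI line in the answer gets away with a plain NA because base ifelse() does not check types; it silently coerces the result. A minimal sketch with made-up values (the vectors below are hypothetical, not the OP's data):

```r
tmode <- c("auto", NA, "rail")
distanceI <- c(57, 10, 58)

# The plain (logical) NA is coerced to double when combined with distanceI
res <- ifelse(is.na(tmode), NA, distanceI)
print(res)
print(typeof(res))
```

This leniency is convenient here, but it also means ifelse() will not warn when two branches are mixed by accident, which is the kind of mistake the stricter case_when is meant to catch.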
How to get case_when in dplyr accept conditions from character
How about using parse_exprs?
library(dplyr)
library(rlang)
cond <- "Age > 40 ~ 1, TRUE ~ 0"
cond <- gsub(",",";",cond)
repdata %>% mutate(result = case_when(!!!rlang::parse_exprs(cond)))
## A tibble: 10 x 2
# Age result
# <dbl> <dbl>
# 1 23 0
# 2 26 0
# 3 32 0
# 4 50 1
# 5 51 1
# 6 52 1
# 7 25 0
# 8 49 1
# 9 34 0
#10 54 1
This is required because parse_expr returns a single expression, whereas case_when needs two or more expressions (separated by commas in code) to have two cases. Meanwhile, parse_exprs returns two or more expressions, but it splits them on ; rather than on commas, which is why the commas in cond are replaced with semicolons first.
Data
repdata <- tibble::tribble(~Age,23,26,32,50,51,52,25,49,34,54)
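The parsing step can be inspected on its own, without a data frame. The sketch below also shows one caveat of the gsub approach: it replaces every comma, including commas inside function calls. The second condition string is a made-up example.

```r
library(rlang)

cond <- "Age > 40 ~ 1, TRUE ~ 0"

# After swapping commas for semicolons, each case becomes its own expression
exprs <- parse_exprs(gsub(",", ";", cond))
n_cases <- length(exprs)
print(n_cases)

# Caveat (hypothetical condition): the comma inside c(40, 50) is replaced too,
# so the resulting string is no longer valid R and parsing fails
broken <- gsub(",", ";", "Age %in% c(40, 50) ~ 1, TRUE ~ 0")
parses <- !inherits(try(parse_exprs(broken), silent = TRUE), "try-error")
print(parses)
```

If the conditions may contain commas of their own, storing each case as a separate string (or separating them with semicolons from the start) avoids the problem.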
How to use case_when() on a list with apply or map
Instead of case_when, an easier option is a join after converting the named list to a two-column tibble (with tibble::enframe):
library(dplyr)
library(tidyr)
library(tibble)
enframe(annotation, name = 'type', value = 'markers') %>%
unnest(markers) %>%
right_join(tibble(markers = colnames(df))) %>%
relocate(type, .after = 'markers')
-output
# A tibble: 4 × 2
markers type
<chr> <chr>
1 L marker_1
2 D marker_1
3 C marker_2
4 R marker_2
Or another option is to loop over the list, get the intersecting elements, and convert the named list to a tibble:
library(purrr)
map(annotation, ~ intersect(names(df), .x)) %>%
keep(lengths(.) > 0) %>%
enframe(name = 'markers', value = 'type') %>%
unnest(type)
Or using base R with lapply and stack:
lapply(annotation, \(x) intersect(names(df), x)) |>
Filter(length, x = _) |>
stack() |>
setNames(c("markers", "type")) |>
subset(select = 2:1)
-output
type markers
1 marker_1 L
2 marker_1 D
3 marker_2 C
4 marker_2 R
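The same lookup idea can also be sketched with stack and match in base R. The original annotation and df are not shown in the answer, so the data below is a hypothetical stand-in chosen to be consistent with the output above.

```r
# Hypothetical inputs: a named list of markers and a data frame
# whose column names are some of those markers
annotation <- list(marker_1 = c("L", "D", "X"), marker_2 = c("C", "R"))
df <- data.frame(L = 1, D = 2, C = 3, R = 4)

# Turn the named list into a two-column lookup table (values, ind)
lookup <- stack(annotation)

# For each column name, find the list element it belongs to
type <- as.character(lookup$ind[match(names(df), lookup$values)])

result <- data.frame(markers = names(df), type = type)
print(result)
```

Unlike the intersect-based versions, match keeps the column order of df and returns NA for columns that appear in no list element.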
dplyr `case_when()` trouble with NA
Try the following code. It tells case_when that you are expecting the NA to be a character, like the rest of your column. I think you are also missing a bracket above.
df %>%
mutate(col3 = case_when(ID == "ABC" & Date == as.Date("2019-01-03") ~ "fizz",
ID == "EFG" & Date == as.Date("2019-01-08") ~ "buzz",
TRUE ~ as.character(NA)))
# A tibble: 6 x 3
ID Date col3
<chr> <date> <chr>
1 ABC 2019-01-03 fizz
2 EFG 2019-01-08 buzz
3 HIJ 2019-06-09 NA
4 KLM 2019-06-11 NA
5 NOP 2019-08-12 NA
6 QRS 2019-08-21 NA
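An equivalent spelling uses the NA_character_ constant in place of as.character(NA). The sketch below rebuilds the input from the ID and Date columns printed above, which is an assumption about what the OP's df looked like.

```r
library(dplyr)

# Input reconstructed from the printed output (assumed, not the OP's object)
df <- tibble::tibble(
  ID = c("ABC", "EFG", "HIJ", "KLM", "NOP", "QRS"),
  Date = as.Date(c("2019-01-03", "2019-01-08", "2019-06-09",
                   "2019-06-11", "2019-08-12", "2019-08-21"))
)

out <- df %>%
  mutate(col3 = case_when(
    ID == "ABC" & Date == as.Date("2019-01-03") ~ "fizz",
    ID == "EFG" & Date == as.Date("2019-01-08") ~ "buzz",
    TRUE ~ NA_character_  # typed missing value, same as as.character(NA)
  ))

print(out)
```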