Avoiding type conflicts with dplyr::case_when

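As stated in ?case_when (dplyr versions before 1.1.0 enforce this strictly; later versions cast the RHS values to a common type):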

All RHSs must evaluate to the same type of vector.

You actually have two possibilities:

1) Create new as a numeric vector

df <- df %>% mutate(new = case_when(old == 1 ~ 5,
                                    old == 2 ~ NA_real_,
                                    TRUE ~ as.numeric(old)))

Note that NA_real_ is the numeric version of NA, and that you must convert old to numeric because you created it as an integer in your original dataframe.

You get:

str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ old: int 1 2 3
# $ new: num 5 NA 3

2) Create new as an integer vector

df <- df %>% mutate(new = case_when(old == 1 ~ 5L,
                                    old == 2 ~ NA_integer_,
                                    TRUE ~ old))

Here, 5L forces 5 into the integer type, and NA_integer_ is the integer version of NA.

So this time new is integer:

str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ old: int 1 2 3
# $ new: int 5 NA 3
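
A quick way to see which right-hand sides would clash is to check their types with typeof(). The sketch below uses only base R; the vector old is an assumed stand-in for the integer column from the question.

```r
# Assumed stand-in for the integer column from the question
old <- 1:3

# Candidate right-hand sides and their storage types
rhs <- list(
  five       = 5,                # double
  five_int   = 5L,               # integer
  na         = NA,               # logical
  na_real    = NA_real_,         # double
  na_integer = NA_integer_,      # integer
  old        = old,              # integer
  old_num    = as.numeric(old)   # double
)

rhs_types <- vapply(rhs, typeof, character(1))
print(rhs_types)
```

Every RHS used in option 1 is a double and every RHS used in option 2 is an integer, which is why both versions run without a type error.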

Trouble using case_when in dplyr

The error occurs because case_when expects every RHS to evaluate to the same type.

In the OP's attempt, TRUE is of class "logical" and x is of type "integer", hence the error. You could try:

x <- 1:5
dplyr::case_when(x == 1 ~ NA_integer_, x != 1 ~ x)
#[1] NA 2 3 4 5

Or another way:

dplyr::case_when(x != 1 ~ x, TRUE ~ NA_integer_)
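
When the goal is only to blank out one value, case_when is not strictly needed. As an alternative sketch (not the approach from the answer above), base R's replace() keeps the integer type, because the assigned NA is coerced to the type of x.

```r
x <- 1:5

# replace() assigns NA into a copy of x; the result stays an integer vector
res <- replace(x, x == 1L, NA)

print(res)
print(typeof(res))
```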

Type conflict setting multiple variables to NA with mutate, across, case_when

Another option would be to use an if statement:

library(dplyr)

mtcars$carb <- as.integer(mtcars$carb)

mtcars <- mtcars %>%
  mutate(across(c(gear:carb), ~ case_when(
    vs == 1 ~ if (is.integer(.)) NA_integer_ else NA_real_,
    TRUE ~ .
  )))

But the much more clever approach, which I learned thanks to the comment by @r2evans, is to use .[NA], which "will always give the appropriate NA type":

mtcars <- mtcars %>%
  mutate(across(c(gear:carb), ~ case_when(
    vs == 1 ~ .[NA],
    TRUE ~ .
  )))

head(mtcars)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1   NA   NA
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0   NA   NA
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0   NA   NA
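
The reason .[NA] works can be checked in base R: indexing a vector with a logical NA returns a vector of the same type and length, filled with missing values. A small sketch with made-up vectors:

```r
ints <- c(4L, 4L, 3L)
dbls <- c(4, 1, 2)
chrs <- c("a", "b", "c")

# The logical NA index is recycled over the whole vector,
# so each result has the same length as its input
na_int <- ints[NA]
na_dbl <- dbls[NA]
na_chr <- chrs[NA]

print(typeof(na_int))
print(typeof(na_dbl))
print(typeof(na_chr))
print(length(na_int))
```

Since the result has both the right type and the right length, it can stand in for NA_integer_ or NA_real_ without knowing the column type in advance.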

Issue with case_when statement using & in dplyr?

The column percentile is a factor. We need to convert it to character first and then to numeric:

library(dplyr)
df1 %>%
  mutate(percentile = as.numeric(as.character(percentile))) %>%
  ...

When a factor is coerced directly to numeric/integer, the result is the underlying integer codes of the levels rather than the values the labels represent:

v1 <- factor(c(81.9, 82.7, 81.9, 82.5))
as.numeric(v1)
#[1] 1 3 1 2

is different than the following

as.numeric(as.character(v1))
#[1] 81.9 82.7 81.9 82.5

Or, slightly more efficiently, convert the levels once and index them by the factor (the form recommended in ?factor):

as.numeric(levels(v1))[v1]
#[1] 81.9 82.7 81.9 82.5
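
To make the pitfall harder to hit, the conversion can be wrapped in a small helper. The name factor_to_numeric is hypothetical and used only for this sketch; it follows the as.numeric(levels(f))[f] idiom described in ?factor.

```r
# Hypothetical helper: convert a factor with numeric-looking labels to numeric
factor_to_numeric <- function(f) {
  stopifnot(is.factor(f))
  # Convert the unique levels once, then index them by the integer codes
  as.numeric(levels(f))[f]
}

v1 <- factor(c(81.9, 82.7, 81.9, 82.5))

codes     <- as.integer(v1)         # the internal level codes
converted <- factor_to_numeric(v1)  # the values the labels represent

print(codes)
print(converted)
```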

dplyr::case_when() inexplicably returns names(message) - `*vtmp*` error

You get the problem because you are trying to mix a logical and a numeric vector.

In your case_when statement:

case_when(
  is.na(tmode) ~ NA,
  durationI > 180 ~ 180,
  TRUE ~ durationI
)

Your first case evaluates to NA, which on its own is a logical value. This makes R think that you want a logical vector, so when the next case evaluates to a numeric value, you get the error.

You can fix this by replacing NA with the numeric missing value, NA_real_:

raw %>%
  mutate(
    distanceI = ifelse(is.na(tmode), NA, distanceI),
    durationI = case_when(
      is.na(tmode) ~ NA_real_,
      durationI > 180 ~ 180,
      TRUE ~ durationI
    )
  )
#> # A tibble: 3 × 4
#>   activity_ID durationI distanceI tmode
#>         <dbl>     <dbl>     <dbl> <chr>
#> 1           1       180        57 auto
#> 2           2        NA        NA <NA>
#> 3           3        91        58 rail
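
Because the middle case only caps the value at 180, the same result can be sketched without case_when, using base R's pmin() and ifelse(). The data frame raw below is a hypothetical reconstruction; its input values are assumptions chosen to match the output shown above.

```r
# Hypothetical input data (the original values are not given in the question)
raw <- data.frame(
  activity_ID = c(1, 2, 3),
  durationI   = c(200, 45, 91),
  distanceI   = c(57, 10, 58),
  tmode       = c("auto", NA, "rail"),
  stringsAsFactors = FALSE
)

# pmin() caps the duration at 180; NA_real_ keeps both ifelse() branches numeric
raw$durationI <- ifelse(is.na(raw$tmode), NA_real_, pmin(raw$durationI, 180))
raw$distanceI <- ifelse(is.na(raw$tmode), NA_real_, raw$distanceI)

print(raw)
```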

How to get case_when in dplyr accept conditions from character

How about with parse_exprs?

library(dplyr)
library(rlang)
cond <- "Age > 40 ~ 1, TRUE ~ 0"
cond <- gsub(",",";",cond)
repdata %>% mutate(result = case_when(!!!rlang::parse_exprs(cond)))
## A tibble: 10 x 2
#     Age result
#   <dbl>  <dbl>
# 1    23      0
# 2    26      0
# 3    32      0
# 4    50      1
# 5    51      1
# 6    52      1
# 7    25      0
# 8    49      1
# 9    34      0
#10    54      1

The gsub step is required because parse_expr returns a single expression, whereas case_when needs two or more expressions (separated by commas in code) to express two cases. parse_exprs does return multiple expressions, but it splits them on ; rather than on commas.

Data

repdata <- tibble::tribble(~Age,23,26,32,50,51,52,25,49,34,54)
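
One caveat: gsub(",", ";", cond) replaces every comma, including commas inside function calls such as c(40, 50). If the conditions can be stored as one string per case instead, the substitution can be avoided. The sketch below assumes that storage format (the vector conds is hypothetical) and parses each string with rlang::parse_expr.

```r
library(dplyr)
library(rlang)

repdata <- tibble::tribble(~Age, 23, 26, 32, 50, 51, 52, 25, 49, 34, 54)

# Assumed format: one string per case, so no comma-to-semicolon substitution
conds <- c("Age > 40 ~ 1", "TRUE ~ 0")

# parse_expr() turns each string into one expression;
# !!! splices the resulting list into the case_when() call inside mutate()
cases <- lapply(conds, rlang::parse_expr)
res <- repdata %>% mutate(result = case_when(!!!cases))

print(res)
```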

How to use case_when() on a list with apply or map

Instead of case_when, an easier option is a join after converting the named list to a two-column tibble with tibble::enframe:

library(dplyr)
library(tidyr)
library(tibble)
enframe(annotation, name = 'type', value = 'markers') %>%
  unnest(markers) %>%
  right_join(tibble(markers = colnames(df))) %>%
  relocate(type, .after = 'markers')

-output

# A tibble: 4 × 2
  markers type
  <chr>   <chr>
1 L       marker_1
2 D       marker_1
3 C       marker_2
4 R       marker_2

Another option is to loop over the list, get the intersecting elements, and convert the named list to a tibble:

library(purrr)
# The list names are the types and the list elements are the markers
map(annotation, ~ intersect(names(df), .x)) %>%
  keep(lengths(.) > 0) %>%
  enframe(name = 'type', value = 'markers') %>%
  unnest(markers)

Or using base R with lapply and stack

lapply(annotation, \(x) intersect(names(df), x)) |>
Filter(length, x = _) |>
stack() |>
setNames(c("markers", "type")) |>
subset(select = 2:1)

-output

      type markers
1 marker_1 L
2 marker_1 D
3 marker_2 C
4 marker_2 R
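
The same mapping can also be sketched as a named lookup vector, which is a different technique from the joins above. The objects annotation and df are hypothetical reconstructions, assumed to match the output shown.

```r
# Hypothetical inputs, assumed from the output above
annotation <- list(marker_1 = c("L", "D"), marker_2 = c("C", "R"))
df <- data.frame(L = 1, D = 2, C = 3, R = 4)

# Lookup vector: names are the markers, values are the annotation types
lookup <- setNames(
  rep(names(annotation), lengths(annotation)),
  unlist(annotation, use.names = FALSE)
)

# Columns of df that appear in no annotation entry would get NA here
result <- data.frame(
  markers = colnames(df),
  type    = unname(lookup[colnames(df)]),
  stringsAsFactors = FALSE
)

print(result)
```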

dplyr `case_when()` trouble with NA

Try the following code - it tells case_when that you are expecting the NA to be a character, like the rest of your column. I think you are also missing a bracket above.

df %>%
  mutate(col3 = case_when(ID == "ABC" & Date == as.Date("2019-01-03") ~ "fizz",
                          ID == "EFG" & Date == as.Date("2019-01-08") ~ "buzz",
                          TRUE ~ as.character(NA)))

# A tibble: 6 x 3
  ID    Date       col3
  <chr> <date>     <chr>
1 ABC   2019-01-03 fizz
2 EFG   2019-01-08 buzz
3 HIJ   2019-06-09 NA
4 KLM   2019-06-11 NA
5 NOP   2019-08-12 NA
6 QRS   2019-08-21 NA
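
as.character(NA) and the built-in constant NA_character_ are the same value, so either can serve as the default case. A short base R check:

```r
default_a <- as.character(NA)
default_b <- NA_character_

# Both are character-typed missing values; a bare NA is logical
same_value <- identical(default_a, default_b)

print(same_value)
print(typeof(NA))
print(typeof(default_b))
```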

