Avoiding Type Conflicts with dplyr::case_when

Avoiding type conflicts with dplyr::case_when

As stated in ?case_when:

All RHSs must evaluate to the same type of vector.

You have two options:

1) Create new as a numeric vector

df <- df %>% mutate(new = case_when(old == 1 ~ 5,
                                    old == 2 ~ NA_real_,
                                    TRUE ~ as.numeric(old)))

Note that NA_real_ is the numeric version of NA, and that you must convert old to numeric because you created it as an integer in your original dataframe.

You get:

str(df)
# 'data.frame': 3 obs. of  2 variables:
# $ old: int  1 2 3
# $ new: num  5 NA 3

2) Create new as an integer vector

df <- df %>% mutate(new = case_when(old == 1 ~ 5L,
                                    old == 2 ~ NA_integer_,
                                    TRUE ~ old))

Here, 5L forces 5 into the integer type, and NA_integer_ is the integer version of NA.

So this time new is integer:

str(df)
# 'data.frame': 3 obs. of  2 variables:
# $ old: int  1 2 3
# $ new: int  5 NA 3
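
To see why the two variants above produce different column types, it can help to inspect the types of the literals involved. The following base R sketch is not part of the original answer; it only checks types with typeof(). Recent dplyr releases (1.1.0 and later, as far as I know) cast the RHS values to a common type, so the strict rule quoted above applies mainly to older versions.

```r
# Plain NA is logical, which is why older case_when() versions reject it
# next to numeric or integer results.
typeof(NA)           # "logical"

# Typed missing values match the type of the other RHS values.
typeof(NA_real_)     # "double"
typeof(NA_integer_)  # "integer"

# A bare 5 is a double; the L suffix makes it an integer.
typeof(5)            # "double"
typeof(5L)           # "integer"

# 1:3 is an integer vector, so it needs as.numeric() to sit next to a double 5.
old <- 1:3
typeof(old)              # "integer"
typeof(as.numeric(old))  # "double"
```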

Trouble using case_when in dplyr

The error occurs because case_when expects all RHSs to evaluate to the same type.

Here, in the OP's attempt, TRUE is of class "logical" and x is of type "integer", hence the error. You could try:

x <- 1:5
dplyr::case_when(x == 1 ~ NA_integer_, x != 1 ~ x)
#[1] NA  2  3  4  5

Or another way:

dplyr::case_when(x != 1 ~ x, TRUE ~ NA_integer_)
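
As a quick check, and assuming dplyr is installed, the two forms above can be compared directly. This sketch is only an illustration: both calls should return the same integer vector, because every RHS is an integer.

```r
library(dplyr)

x <- 1:5

# Form 1: two explicit, mutually exclusive conditions.
a <- case_when(x == 1 ~ NA_integer_, x != 1 ~ x)

# Form 2: one condition plus a TRUE catch-all.
b <- case_when(x != 1 ~ x, TRUE ~ NA_integer_)

identical(a, b)  # TRUE
typeof(a)        # "integer"
```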

Type conflict setting multiple variables to NA with mutate, across, case_when

Another option would be to use an if statement:

library(dplyr)

mtcars$carb <- as.integer(mtcars$carb)

mtcars <- mtcars %>%
  mutate(across(c(gear:carb), ~ case_when(
    vs == 1 ~ if (is.integer(.)) NA_integer_ else NA_real_,
    TRUE ~ .
  )))

But a much cleverer approach, which I learned from the comment by @r2evans, is to use .[NA], which "will always give the appropriate NA type":

mtcars <- mtcars %>%
  mutate(across(c(gear:carb), ~ case_when(
    vs == 1 ~ .[NA],
    TRUE ~ .
  )))

head(mtcars)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1   NA   NA
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0   NA   NA
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0   NA   NA
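
The reason .[NA] works can be checked in base R, outside of any dplyr verb. The vectors below are made-up stand-ins for columns; the sketch only shows that indexing with a logical NA returns missing values of the same type as the vector being indexed.

```r
# Hypothetical stand-ins for an integer, a double, and a character column.
int_col <- c(4L, 3L, 5L)
dbl_col <- c(2.5, 1.0, 4.0)
chr_col <- c("a", "b", "c")

# A logical NA index is recycled to the length of the vector, so the
# result has one NA per element and keeps the type of the vector.
int_col[NA]          # NA NA NA
length(int_col[NA])  # 3
typeof(int_col[NA])  # "integer"
typeof(dbl_col[NA])  # "double"
typeof(chr_col[NA])  # "character"
```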

Issue with case_when statement using & in dplyr?

The column percentile is a factor. We need to convert it to character first and then to numeric:

library(dplyr)
df1 %>%
  mutate(percentile = as.numeric(as.character(percentile))) %>%
  ...

What happens is that when we coerce a factor directly to numeric/integer, we get the integer storage codes instead of the actual values:

v1 <- factor(c(81.9, 82.7, 81.9, 82.5))
as.numeric(v1)
#[1] 1 3 1 2

which is different from the following:

as.numeric(as.character(v1))
#[1] 81.9 82.7 81.9 82.5

Or, probably faster, with levels:

as.numeric(levels(v1)[v1])
#[1] 81.9 82.7 81.9 82.5
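
The conversion can also be wrapped in a small helper. The name factor_to_numeric is hypothetical. The indexing form as.numeric(levels(f))[f] is the one that ?factor recommends; it differs slightly from the variant above in that it converts only the unique levels before indexing.

```r
# Hypothetical helper: convert a factor whose labels are numbers
# back to the numeric values those labels represent.
factor_to_numeric <- function(f) {
  stopifnot(is.factor(f))
  # Convert each unique level once, then index by the factor's integer codes.
  as.numeric(levels(f))[f]
}

v1 <- factor(c(81.9, 82.7, 81.9, 82.5))

as.numeric(v1)         # 1 3 1 2 (integer codes, not the values)
factor_to_numeric(v1)  # 81.9 82.7 81.9 82.5
```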

dplyr::case_when() inexplicably returns names(message) - `vtmp` error

You get the problem because you are trying to mix a logical and a numeric vector.

In your case_when statement:

case_when(
  is.na(tmode) ~ NA,
  durationI > 180 ~ 180,
  TRUE ~ durationI
)

Your first case evaluates to NA, which is of type logical, so case_when assumes that you want a logical vector. When the next case evaluates to a numeric, you get the error.

You can fix this by replacing NA with the numeric missing value NA_real_:

raw %>% 
  mutate(
    distanceI = ifelse(is.na(tmode), NA, distanceI),
    durationI = case_when(
      is.na(tmode) ~ NA_real_,
      durationI > 180 ~ 180,
      TRUE ~ durationI
    )
  )
#> # A tibble: 3 × 4
#>   activity_ID durationI distanceI tmode
#>         <dbl>     <dbl>     <dbl> <chr>
#> 1           1       180        57 auto 
#> 2           2        NA        NA <NA> 
#> 3           3        91        58 rail
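
For comparison, the same capping logic can be sketched in base R with ifelse() and pmin(). This is an alternative, not the approach from the answer above, and the input values are made up for illustration. ifelse() coerces its branches to a common type, which is also why the distanceI line above works with a plain NA.

```r
# Hypothetical input values, chosen only to exercise each branch.
durationI <- c(200, 35, 91)
tmode     <- c("auto", NA, "rail")

# pmin() caps each duration at 180; ifelse() blanks rows with a missing tmode.
capped <- ifelse(is.na(tmode), NA_real_, pmin(durationI, 180))

capped          # 180  NA  91
typeof(capped)  # "double"
```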

How to get case_when in dplyr accept conditions from character

How about with parse_exprs?

library(dplyr)
library(rlang)
cond <- "Age > 40 ~ 1, TRUE ~ 0"
cond <- gsub(",",";",cond)
repdata %>% mutate(result = case_when(!!!rlang::parse_exprs(cond)))
## A tibble: 10 x 2
#     Age result
#   <dbl>  <dbl>
# 1    23      0
# 2    26      0
# 3    32      0
# 4    50      1
# 5    51      1
# 6    52      1
# 7    25      0
# 8    49      1
# 9    34      0
#10    54      1

The gsub step is required because parse_expr returns a single expression, whereas case_when needs two or more expressions (separated by commas in code) to handle two cases. parse_exprs does return multiple expressions, but it splits them on ; rather than on commas.
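
The splitting behaviour can be inspected on its own, assuming rlang is installed. This sketch only parses the string and does not evaluate it. Note that the blanket gsub would also rewrite commas inside function calls such as c(1, 2), so it assumes the conditions contain no other commas.

```r
library(rlang)

cond <- "Age > 40 ~ 1, TRUE ~ 0"

# After swapping commas for semicolons, parse_exprs() yields one
# unevaluated two-sided formula per case.
cases <- parse_exprs(gsub(",", ";", cond))

length(cases)             # 2
is_call(cases[[1]], "~")  # TRUE
is_call(cases[[2]], "~")  # TRUE
```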

Data

repdata <- tibble::tribble(~Age,23,26,32,50,51,52,25,49,34,54)

How to use case_when() on a list with apply or map

Instead of case_when, an easier option is a join after converting the named list to a two-column tibble with tibble::enframe:

library(dplyr)
library(tidyr)
library(tibble)
enframe(annotation, name = 'type', value = 'markers') %>%
  unnest(markers) %>%
  right_join(tibble(markers = colnames(df))) %>%
  relocate(type, .after = 'markers')

-output

# A tibble: 4 × 2
  markers type    
  <chr>   <chr>   
1 L       marker_1
2 D       marker_1
3 C       marker_2
4 R       marker_2

Or another option is to loop over the list, get the intersecting elements, and convert the named list to a tibble:

library(purrr)
map(annotation, ~ intersect(names(df), .x)) %>%
  keep(lengths(.) > 0) %>%
  # the list names are the types and the values are the markers
  enframe(name = 'type', value = 'markers') %>%
  unnest(markers)

Or using base R with lapply and stack

lapply(annotation, \(x) intersect(names(df), x)) |>
    Filter(length, x = _) |>
    stack() |> 
    setNames(c("markers", "type")) |>
    subset(select = 2:1)

-output

      type markers
1 marker_1       L
2 marker_1       D
3 marker_2       C
4 marker_2       R
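
The snippets above assume that annotation and df already exist. A minimal, hypothetical pair of inputs that should reproduce the output shown is sketched below, together with a step-by-step version of the base R approach that avoids the native pipe so it also runs on older R versions. The marker_3 entry and the column values are invented for illustration.

```r
# Hypothetical inputs: a named list of marker sets and a data frame whose
# column names are the markers to look up.
annotation <- list(
  marker_1 = c("L", "D"),
  marker_2 = c("C", "R"),
  marker_3 = c("Z")        # matches no column, so it is dropped below
)
df <- data.frame(L = 1:2, D = 3:4, C = 5:6, R = 7:8)

# Keep only the markers that are present as columns of df.
hits <- lapply(annotation, function(x) intersect(names(df), x))

# Drop list elements that ended up empty.
hits <- Filter(length, hits)

# stack() turns the named list into a two-column data frame (values, ind).
res <- stack(hits)
names(res) <- c("markers", "type")
res <- res[, c("type", "markers")]
res
```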

dplyr `case_when()` trouble with NA

Try the following code. It tells case_when that you expect the NA to be a character, like the rest of your column. I think you are also missing a bracket in your code above.

df %>% 
  mutate(col3 = case_when(ID == "ABC" & Date == as.Date("2019-01-03") ~ "fizz",
                          ID == "EFG" & Date == as.Date("2019-01-08") ~ "buzz",
                          TRUE ~ as.character(NA)))

# A tibble: 6 x 3
  ID    Date       col3 
  <chr> <date>     <chr>
1 ABC   2019-01-03 fizz 
2 EFG   2019-01-08 buzz 
3 HIJ   2019-06-09 NA   
4 KLM   2019-06-11 NA   
5 NOP   2019-08-12 NA   
6 QRS   2019-08-21 NA
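
As a side note, as.character(NA) is the same value as the built-in constant NA_character_, so either spelling can be used in the TRUE branch. The short base R check below is only an illustration of that equivalence.

```r
# Both spellings give a character-typed missing value.
identical(as.character(NA), NA_character_)  # TRUE
typeof(NA_character_)                       # "character"

# Combining it with strings keeps the vector character.
labels <- c("fizz", "buzz", NA_character_)
typeof(labels)                              # "character"
```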
