Recode Multiple Columns Using Dplyr

According to the current dplyr documentation:

across() supersedes the family of "scoped variants" like summarise_at(), summarise_if(), and summarise_all().

So the recommended approach is now mutate() combined with across(), rather than mutate_at(), mutate_if(), or mutate_all().
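Here is a minimal sketch of the change, using a small hypothetical data frame df (not from the original answers). The superseded mutate_at() call and its across() replacement should give the same result.

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Superseded scoped variant: apply the function to columns a and b
old <- df %>% mutate_at(vars(a, b), ~ .x * 10)

# Current equivalent: across() inside mutate()
new <- df %>% mutate(across(c(a, b), ~ .x * 10))

all.equal(old, new)
```

Column c is not selected, so it passes through unchanged in both versions.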

As Chris LeBoa said, if you only want to convert a placeholder value such as 999 to NA, na_if() is probably the best choice:

library(dplyr)

y <- data.frame(y1=c(1,2,999,3,4), y2=c(1L, 2L, 999L, 3L, 4L), y3=c(TRUE,TRUE,FALSE,FALSE,TRUE))

y
   y1  y2    y3
1   1   1  TRUE
2   2   2  TRUE
3 999 999 FALSE
4   3   3 FALSE
5   4   4  TRUE
 
z <- y %>%
    mutate(across(
        y1:y2,
        ~na_if(., 999)
    ))

z
  y1 y2    y3
1  1  1  TRUE
2  2  2  TRUE
3 NA NA FALSE
4  3  3 FALSE
5  4  4  TRUE
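When the columns to clean are not next to each other, across() can also select them by type. The sketch below rebuilds the same y data and uses where(is.numeric). That picks the double column y1 and the integer column y2 and skips the logical y3.

```r
library(dplyr)

y <- data.frame(y1 = c(1, 2, 999, 3, 4),
                y2 = c(1L, 2L, 999L, 3L, 4L),
                y3 = c(TRUE, TRUE, FALSE, FALSE, TRUE))

# where(is.numeric) selects y1 and y2; the logical y3 is left alone
z <- y %>%
  mutate(across(where(is.numeric), ~ na_if(.x, 999)))
```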

If instead you really want to recode values (not just set them to NA) in multiple columns, you can follow the example from bcarothers:

df1 <- tibble(Q7_1=1:5,
              Q7_1_TEXT=c("let's","see","grogu","this","week"),
              Q8_1=6:10,
              Q8_1_TEXT=rep("grogu",5),
              Q8_2=11:15,
              Q8_2_TEXT=c("grogu","is","the","absolute","best"))

df2 <- df1 %>%
    mutate(across(
        starts_with("Q8") & ends_with("TEXT"),
        ~recode(., "grogu"="mando")
    ))
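The & in the selection keeps only columns that satisfy both helpers. The sketch below, using the same df1, shows one way to check which columns those are before recoding.

```r
library(dplyr)

df1 <- tibble(Q7_1 = 1:5,
              Q7_1_TEXT = c("let's", "see", "grogu", "this", "week"),
              Q8_1 = 6:10,
              Q8_1_TEXT = rep("grogu", 5),
              Q8_2 = 11:15,
              Q8_2_TEXT = c("grogu", "is", "the", "absolute", "best"))

# Columns matching both helpers (expected: Q8_1_TEXT and Q8_2_TEXT)
selected <- names(select(df1, starts_with("Q8") & ends_with("TEXT")))

df2 <- df1 %>%
  mutate(across(starts_with("Q8") & ends_with("TEXT"),
                ~ recode(.x, "grogu" = "mando")))
```

Q7_1_TEXT is not selected, so its "grogu" is left untouched.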

Use recode to mutate across multiple columns using named list of named vectors

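Below are three approaches. All of them use the test tibble and the dicts list defined under Original data at the end of this answer: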

First, we can use dplyr::across() with a custom function that calls dplyr::cur_column() to look up the dictionary for the column currently being processed.

library(tidyverse)

myfun <- function(x) {
  mycol <- cur_column()
  dplyr::recode(x, !!! dicts[[mycol]])
}

test %>% 
  mutate(across(c("A", "B", "C"), myfun))

#> # A tibble: 3 x 4
#>   Names A       B     C      
#>   <chr> <chr>   <chr> <chr>  
#> 1 Alice charlie yes   delta  
#> 2 Bob   delta   no    epsilon
#> 3 Cindy bravo   bad   beta
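myfun() reads dicts from the global environment. A variant that receives the lookup list as an argument may be easier to reuse. The following is a self-contained sketch built around a hypothetical helper, recode_with(), that returns the function across() should apply. The `!!!` sits inside that function's body, so it is evaluated when recode() runs, not when mutate() captures its arguments.

```r
library(dplyr)

test <- tibble(Names = c("Alice", "Bob", "Cindy"),
               A = c("3", "q", "7"),
               B = c("1", "2", "b"),
               C = c("a", "g", "9"))

dicts <- list(A = c("5" = "alpha", "7" = "bravo", "3" = "charlie", "q" = "delta"),
              B = c("1" = "yes", "2" = "no", "b" = "bad", "c" = "missing"),
              C = c("9" = "beta", "8" = "gamma", "a" = "delta", "g" = "epsilon"))

# Hypothetical helper: builds a function that recodes a column using the
# dictionary whose name matches the column across() is processing
recode_with <- function(dicts) {
  function(x) recode(x, !!!dicts[[cur_column()]])
}

out <- test %>%
  mutate(across(all_of(names(dicts)), recode_with(dicts)))
```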

A second option is to transform dicts into a list of expressions (quosures), one per column, and then splice that list into mutate() using the !!! operator:

expr_ls <-  imap(dicts, ~ quo(recode(!!sym(.y), !!!.x)))

test %>% 
  mutate(!!! expr_ls)

#> # A tibble: 3 x 4
#>   Names A       B     C      
#>   <chr> <chr>   <chr> <chr>  
#> 1 Alice charlie yes   delta  
#> 2 Bob   delta   no    epsilon
#> 3 Cindy bravo   bad   beta
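The following minimal sketch shows what mutate(!!! expr_ls) does. It uses a hand-built list of quosures; the data and the column names a2 and total are hypothetical. Each list name becomes an output column, as if the arguments had been typed out.

```r
library(dplyr)
library(rlang)

# Hypothetical data
df <- tibble(a = 1:3, b = 4:6)

# A named list of quosures; each name becomes an output column
expr_ls <- list(a2 = quo(a * 2), total = quo(a + b))

# Equivalent to: mutate(a2 = a * 2, total = a + b)
out <- df %>% mutate(!!!expr_ls)
```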

Finally, elsewhere in the tidyverse we could use purrr::lmap_at(), but it makes the underlying function more complex than it needs to be:

myfun2 <- function(x) {
  x_nm <- names(x)
  mutate(x, !! x_nm := recode(!! sym(x_nm), !!! dicts[[x_nm]]))
}

lmap_at(test, 
        names(dicts),
        myfun2)
#> # A tibble: 3 x 4
#>   Names A       B     C      
#>   <chr> <chr>   <chr> <chr>  
#> 1 Alice charlie yes   delta  
#> 2 Bob   delta   no    epsilon
#> 3 Cindy bravo   bad   beta

Original data

# Starting tibble
test <- tibble(Names = c("Alice","Bob","Cindy"),
               A = c(3,"q",7),
               B = c(1,2,"b"),
               C = c("a","g",9))

# Named vectors: each name is an old value, each element its replacement
A <- c("5" = "alpha", "7" = "bravo", "3" = "charlie", "q" = "delta")
B <- c("1" = "yes", "2" = "no", "b" = "bad", "c" = "missing")
C <- c("9" = "beta", "8" = "gamma", "a" = "delta", "g" = "epsilon")

# Named list of named vectors
dicts <- list("A" = A, "B" = B, "C" = C) # Same names as columns

Created on 2021-12-15 by the reprex package (v2.0.1)

R - How to recode multiple columns

dplyr has the na_if() function for precisely this task (turning a specific value, here 6, into NA). You were almost there with your code and can use:

mutate_at(df, VectorOfNames, ~na_if(.x, 6))

   ID Score1 Score2 Score3
1   1      1      2      3
2   2      2      2      2
3   3      3      3      3
4   4      2     NA      4
5   5      5      5      5
6   6     NA     NA      5
7   7     NA     NA     NA
8   8      2      2      2
9   9      5      3     NA
10 10      4      4      4
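The question's df is not shown, so the sketch below uses hypothetical data to show the current across() form of the same call. all_of() takes the character vector of column names. The ID column deliberately contains a 6 to show that unselected columns are left alone.

```r
library(dplyr)

# Hypothetical stand-in for the question's data
df <- data.frame(ID = 5:8,
                 Score1 = c(1, 6, 3, 6),
                 Score2 = c(6, 2, 6, 4),
                 Score3 = c(3, 3, 6, 6))
VectorOfNames <- c("Score1", "Score2", "Score3")

# Current equivalent of mutate_at(df, VectorOfNames, ~na_if(.x, 6))
out <- df %>%
  mutate(across(all_of(VectorOfNames), ~ na_if(.x, 6)))
```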

How to recode a range of values into a new column

Figured it out! Use case_when and between

data %>%
  mutate(diab_bin = case_when(
    diab==1 ~ 0,
    between(diab, 2,5) ~ 1
  ))
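Here is a sketch of how that case_when() behaves on a few hypothetical values of diab. between() is inclusive at both ends, so 2 and 5 map to 1. A value that matches no condition, such as 6 or a missing value, comes out as NA.

```r
library(dplyr)

# Hypothetical data; diab stands in for the question's column
data <- data.frame(diab = c(1, 2, 3, 5, 6, NA))

out <- data %>%
  mutate(diab_bin = case_when(
    diab == 1 ~ 0,
    between(diab, 2, 5) ~ 1
  ))
```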

R: How to recode multiple variables at once

This is neater with dplyr, I think, and using recode() correctly is the key. mutate_all() operates on the whole data frame and mutate_at() on just selected variables (both are now superseded by across() but still work). There are lots of ways to specify variables in dplyr.

library(dplyr)

mydata <- data.frame(arg1=c(1,2,4,5),arg2=c(1,1,2,0),arg3=c(1,1,1,1))

mydata

  arg1 arg2 arg3
1    1    1    1
2    2    1    1
3    4    2    1
4    5    0    1

mydata <- mydata %>% 
     mutate_at(c("arg1","arg2"), ~recode(., `1`=-1, `2`=1, .default = NaN))

mydata

  arg1 arg2 arg3
1   -1   -1    1
2    1   -1    1
3  NaN    1    1
4  NaN  NaN    1

I use NaN instead of NA because NaN is numeric, which makes it simpler to manage within a column of other numbers (NA_real_, the numeric NA, would also work).
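For comparison, here is a sketch of the same recoding written with across() instead of the superseded mutate_at(). It uses the same mydata. Values matched by neither rule get the .default of NaN, and arg3 is not selected.

```r
library(dplyr)

mydata <- data.frame(arg1 = c(1, 2, 4, 5), arg2 = c(1, 1, 2, 0), arg3 = c(1, 1, 1, 1))

out <- mydata %>%
  mutate(across(c(arg1, arg2),
                ~ recode(.x, `1` = -1, `2` = 1, .default = NaN)))
```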

How do I recode multiple variables from string to numeric?

The following method seems to have worked for my issue (recoding string variables to numeric in multiple columns):

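# mapvalues() comes from plyr; calling it as plyr::mapvalues avoids
# attaching plyr, which would mask several dplyr functions
For_Analysis <- data.frame(Q11_1=c("Never", "Often", "Sometimes"),
 Q11_2=c("Sometimes", "Often", "Never"), Q11_3=c("Never", "Never", "Often"))

Old_Values <- unique(For_Analysis$Q11_1)
# from and to must have the same length (3 labels here, so 3 numbers)
New_Values <- seq_along(Old_Values)

# mapvalues() keeps the character type, so convert each column to numeric
For_Analysis[1:3] <- lapply(For_Analysis[1:3], function(col)
  as.numeric(plyr::mapvalues(col, from = Old_Values, to = New_Values)))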

Thanks for the help!
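For comparison with the dplyr answers on this page, here is a sketch of the same string-to-number recoding with across() and recode(). The mapping Never = 1, Sometimes = 2, Often = 3 is an assumption; the answer above numbers the labels in order of first appearance instead.

```r
library(dplyr)

For_Analysis <- data.frame(Q11_1 = c("Never", "Often", "Sometimes"),
                           Q11_2 = c("Sometimes", "Often", "Never"),
                           Q11_3 = c("Never", "Never", "Often"))

# Assumed ordinal mapping. Every label must be listed: because the
# replacements are numeric, recode() would turn unmatched strings into NA
out <- For_Analysis %>%
  mutate(across(starts_with("Q11"),
                ~ recode(.x, Never = 1, Sometimes = 2, Often = 3)))
```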

What is the shortest and cleanest way to recode multiple variables in a dataframe using R?

I think if used correctly, dplyr has the "cleanest" syntax in this case:

library(dplyr)
tib <- tibble(v1 = 1:4, 
              v2 = 1:4,
              v3 = sample(1:5, 4, replace = FALSE))

tib %>% 
  mutate_at(vars(v1:v3), recode, `1` = 5, `2` = 4, `3` = 3, `4` = 2, `5` = 1)
#> # A tibble: 4 x 3
#>      v1    v2    v3
#>   <dbl> <dbl> <dbl>
#> 1     5     5     2
#> 2     4     4     5
#> 3     3     3     4
#> 4     2     2     1

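Note that I had to add `3` = 3. The columns are integers while the replacements (5, 4, ...) are doubles, and when the types differ recode() replaces any unmatched value with NA (with a warning) instead of leaving it unchanged.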
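Here is a sketch of the type rule behind that note, assuming integer columns like v1 and v2 above. With integer replacements (5L, 4L, ...) the types match, so recode() should leave the unmatched value 3 unchanged and no `3` = 3 entry is needed.

```r
library(dplyr)

tib <- tibble(v1 = 1:4, v2 = 1:4)

# Integer replacements match the integer columns, so 3 passes through as-is
out <- tib %>%
  mutate(across(v1:v2, ~ recode(.x, `1` = 5L, `2` = 4L, `4` = 2L, `5` = 1L)))
```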

I often find it easier to write things more explicitly with functions that are new to me, so this more explicit version might help:

tib %>% 
  mutate_at(.vars = vars(v1:v3), 
            .funs = function(x) recode(x, 
                                       `1` = 5, 
                                       `2` = 4, 
                                       `3` = 3, 
                                       `4` = 2, 
                                       `5` = 1))

If you prefer the recode() function from car, don't attach car with library(); call it with the car:: prefix instead:

tib %>% 
  mutate_at(vars(v1:v3), car::recode, "1=5; 2=4; 4=2; 5=1")

That way you don't run into trouble mixing dplyr with car (as long as you don't need car for anything else). Both packages export a function called recode(), and whichever is attached last masks the other.

Selecting large number of columns for recoding the variable values

library(dplyr)
df %>%
  mutate(across(
    EDU1:EDU150,
    ~ recode(
      .x,
      `0` = 91L,
      `1` = 92L,
      `2` = 93L,
      `3` = 94L,
      `4` = 94L,
      `5` = 94L
    )
  ))
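Here is a scaled-down sketch with a hypothetical df that has only three EDU columns. Two assumptions are worth noting. First, EDU1:EDU150 selects every column positioned between EDU1 and EDU150, so the EDU columns should be contiguous. Second, the integer replacements (91L, ...) assume integer columns. With integer data, a value that has no rule (7 below) is left unchanged. If the columns were doubles, recode() would turn such values into NA because the types differ.

```r
library(dplyr)

# Hypothetical stand-in for the real data: EDU1:EDU3 instead of EDU1:EDU150
df <- tibble(ID = 1:3,
             EDU1 = c(0L, 1L, 5L),
             EDU2 = c(2L, 3L, 4L),
             EDU3 = c(1L, 0L, 7L))

out <- df %>%
  mutate(across(EDU1:EDU3,
                ~ recode(.x, `0` = 91L, `1` = 92L, `2` = 93L,
                         `3` = 94L, `4` = 94L, `5` = 94L)))
```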
