Recode multiple columns using dplyr
Currently, based on the dplyr documentation:
across() supersedes the family of "scoped variants" like summarise_at(), summarise_if(), and summarise_all().
So using mutate() together with across() is now the recommended approach.
Like Chris LeBoa said, if you only want to convert an annoying value to NA, the function na_if() is probably the best choice:
library(dplyr)
y <- data.frame(y1=c(1,2,999,3,4), y2=c(1L, 2L, 999L, 3L, 4L), y3=c(TRUE,TRUE,FALSE,FALSE,TRUE))
y
y1 y2 y3
1 1 1 TRUE
2 2 2 TRUE
3 999 999 FALSE
4 3 3 FALSE
5 4 4 TRUE
z <- y %>%
  mutate(across(
    y1:y2,
    ~na_if(., 999)
  ))
z
y1 y2 y3
1 1 1 TRUE
2 2 2 TRUE
3 NA NA FALSE
4 3 3 FALSE
5 4 4 TRUE
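If the columns to clean are easier to describe by type than by name, the same idea can be sketched with a where() selector. This is only an illustration built on the data shape shown above; treating 999 as the missing-value code in every numeric column is an assumption.

```r
library(dplyr)

# Same shape as the example above; 999 is assumed to be a missing-value code
y <- data.frame(y1 = c(1, 2, 999, 3, 4),
                y2 = c(1L, 2L, 999L, 3L, 4L),
                y3 = c(TRUE, TRUE, FALSE, FALSE, TRUE))

# Select every numeric column by type instead of by name;
# the logical column y3 is not numeric and is left alone
z <- y %>%
  mutate(across(where(is.numeric), ~ na_if(.x, 999)))

z
```

Selecting by type saves listing names, but it also touches numeric columns where 999 might be a legitimate value, so a name-based selection is safer when that is possible.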
Similarly, if you really want to recode values in multiple columns, you can follow the example from bcarothers:
df1 <- tibble(Q7_1=1:5,
              Q7_1_TEXT=c("let's","see","grogu","this","week"),
              Q8_1=6:10,
              Q8_1_TEXT=rep("grogu",5),
              Q8_2=11:15,
              Q8_2_TEXT=c("grogu","is","the","absolute","best"))
df2 <- df1 %>%
  mutate(across(
    starts_with("Q8") & ends_with("TEXT"),
    ~recode(., "grogu"="mando")
  ))
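When there are several values to replace, it may be tidier to keep the old/new pairs in a named vector and splice it into recode() with !!!. The sketch below assumes a made-up lookup vector and a small helper called swap(); neither comes from the original answer.

```r
library(dplyr)

df1 <- tibble(Q8_1 = 6:10,
              Q8_1_TEXT = rep("grogu", 5),
              Q8_2_TEXT = c("grogu", "is", "the", "absolute", "best"))

# Hypothetical lookup: names are the old values, elements are the replacements
lookup <- c(grogu = "mando", best = "greatest")

# Hypothetical helper: recode() accepts a named vector spliced with !!!
swap <- function(x) recode(x, !!!lookup)

df2 <- df1 %>%
  mutate(across(starts_with("Q8") & ends_with("TEXT"), swap))

df2
```

Values that do not appear in the lookup are left unchanged here because the column and the replacements are both character.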
Use recode to mutate across multiple columns using named list of named vectors
Below are three approaches:
First, we can make it work with dplyr::across in a custom function using dplyr::cur_column().
library(tidyverse)
myfun <- function(x) {
  mycol <- cur_column()
  dplyr::recode(x, !!! dicts[[mycol]])
}
test %>%
  mutate(across(c("A", "B", "C"), myfun))
#> # A tibble: 3 x 4
#> Names A B C
#> <chr> <chr> <chr> <chr>
#> 1 Alice charlie yes delta
#> 2 Bob delta no epsilon
#> 3 Cindy bravo bad beta
A second option is to transform dicts into a list of expressions and then splice it into mutate using the !!! operator:
expr_ls <- imap(dicts, ~ quo(recode(!!sym(.y), !!!.x)))
test %>%
mutate(!!! expr_ls)
#> # A tibble: 3 x 4
#> Names A B C
#> <chr> <chr> <chr> <chr>
#> 1 Alice charlie yes delta
#> 2 Bob delta no epsilon
#> 3 Cindy bravo bad beta
Finally, in the larger tidyverse we could use purrr::lmap_at, but it makes the underlying function more complex than it needs to be:
myfun2 <- function(x) {
  x_nm <- names(x)
  mutate(x, !! x_nm := recode(!! sym(x_nm), !!! dicts[[x_nm]]))
}
lmap_at(test,
        names(dicts),
        myfun2)
#> # A tibble: 3 x 4
#> Names A B C
#> <chr> <chr> <chr> <chr>
#> 1 Alice charlie yes delta
#> 2 Bob delta no epsilon
#> 3 Cindy bravo bad beta
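For comparison, the same column-by-column lookup can be sketched as a plain loop over the dictionary names. This is not one of the three approaches above, just a minimal alternative that reuses the data from the "Original data" section below; the name `out` is introduced here for illustration.

```r
library(dplyr)

test <- tibble(Names = c("Alice", "Bob", "Cindy"),
               A = c(3, "q", 7),
               B = c(1, 2, "b"),
               C = c("a", "g", 9))

dicts <- list(
  A = c("5" = "alpha", "7" = "bravo", "3" = "charlie", "q" = "delta"),
  B = c("1" = "yes", "2" = "no", "b" = "bad", "c" = "missing"),
  C = c("9" = "beta", "8" = "gamma", "a" = "delta", "g" = "epsilon")
)

# Recode each column with the dictionary that shares its name
out <- test
for (nm in names(dicts)) {
  out[[nm]] <- recode(out[[nm]], !!!dicts[[nm]])
}

out
```

The loop avoids tidy evaluation inside mutate() entirely, at the cost of stepping outside the pipe.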
Original data
# Starting tibble
test <- tibble(Names = c("Alice","Bob","Cindy"),
A = c(3,"q",7),
B = c(1,2,"b"),
C = c("a","g",9))
# Named vector
A <- c("5" = "alpha", "7" = "bravo", "3" = "charlie", "q" = "delta")
B <- c("1" = "yes", "2" = "no", "b" = "bad", "c" = "missing")
C <- c("9" = "beta", "8" = "gamma", "a" = "delta", "g" = "epsilon")
# Named list of named vectors
dicts <- list("A" = A, "B" = B, "C" = C) # Same names as columns
Created on 2021-12-15 by the reprex package (v2.0.1)
R - How to recode multiple columns
dplyr has the na_if() function for precisely this task. You were almost there with your code and can use:
mutate_at(df, VectorOfNames, ~na_if(.x, 6))
ID Score1 Score2 Score3
1 1 1 2 3
2 2 2 2 2
3 3 3 3 3
4 4 2 NA 4
5 5 5 5 5
6 6 NA NA 5
7 7 NA NA NA
8 8 2 2 2
9 9 5 3 NA
10 10 4 4 4
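Since the scoped variants are superseded, the same call can also be written with across(). The sketch below uses a small made-up data frame, because the question's df is not shown here; only the name VectorOfNames is taken from the answer.

```r
library(dplyr)

# Hypothetical data: 6 is assumed to be the code for a missing score
df <- data.frame(ID = c(5L, 6L, 7L, 8L),
                 Score1 = c(1, 6, 3, 6),
                 Score2 = c(6, 2, 6, 4))
VectorOfNames <- c("Score1", "Score2")

# all_of() selects columns from a character vector of names
out <- df %>%
  mutate(across(all_of(VectorOfNames), ~ na_if(.x, 6)))

out
```

The ID column keeps its value 6 because it is not listed in VectorOfNames.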
How to recode a range of values into a new column
Figured it out! Use case_when and between
data %>%
  mutate(diab_bin = case_when(
    diab==1 ~ 0,
    between(diab, 2,5) ~ 1
  ))
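To see how this behaves on values that match neither condition, here is a small self-contained run. The data values are made up for illustration; only the column name diab comes from the answer.

```r
library(dplyr)

# Hypothetical values, including one outside 1-5 and one missing
data <- tibble(diab = c(1, 2, 5, 7, NA))

out <- data %>%
  mutate(diab_bin = case_when(
    diab == 1 ~ 0,
    between(diab, 2, 5) ~ 1
  ))

out
```

Rows where no condition is TRUE (here 7 and NA) get NA in diab_bin, because case_when() falls back to a missing value when nothing matches.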
R: How to recode multiple variables at once
This is neater, I think, with dplyr. Using recode correctly is a good idea. mutate_all() can be used to operate on the whole data frame, mutate_at() on just selected variables. There are lots of ways to specify variables in dplyr.
mydata <- data.frame(arg1=c(1,2,4,5),arg2=c(1,1,2,0),arg3=c(1,1,1,1))
mydata
arg1 arg2 arg3
1 1 1 1
2 2 1 1
3 4 2 1
4 5 0 1
mydata <- mydata %>%
  mutate_at(c("arg1","arg2"), ~ recode(., `1`=-1, `2`=1, .default = NaN))
mydata
arg1 arg2 arg3
1 -1 -1 1
2 1 -1 1
3 NaN 1 1
4 NaN NaN 1
I use NaN instead of NA because it is numeric, which makes it simpler to manage within a column of other numbers.
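A quick base R check of how NaN and NA differ may help when choosing between them. This is only a sketch of standard behaviour, not part of the original answer.

```r
x <- c(-1, 1, NaN, NA)

is.na(x)   # flags both NaN and NA
is.nan(x)  # flags only NaN

typeof(NA)        # plain NA is logical
typeof(NA_real_)  # NA_real_ is the double-typed missing value

# na.rm = TRUE drops NaN as well as NA
mean(x, na.rm = TRUE)
```

Note that a plain NA placed in a numeric vector is coerced to NA_real_, so NA_real_ is another option when the default has to be numeric.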
How do I recode multiple variables from string to numeric?
The following method seems to have worked for my issue (recoding string variables to numeric in multiple columns):
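library(plyr)  # provides mapvalues()
For_Analysis <- data.frame(Q11_1=c("Never", "Often", "Sometimes"),
                           Q11_2=c("Sometimes", "Often", "Never"), Q11_3=c("Never", "Never", "Often"),
                           stringsAsFactors = FALSE)
Old_Values <- unique(For_Analysis$Q11_1)
New_Values <- seq_along(Old_Values)  # from and to must have the same length
# mapvalues() returns character here, so convert the result to numeric
For_Analysis[1:3] <- lapply(For_Analysis[1:3],
                            function(x) as.numeric(mapvalues(x, from = Old_Values, to = New_Values)))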
Thanks for the help!
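The same string-to-number mapping can also be sketched in base R with a named lookup vector, which avoids the plyr dependency. The scoring below (Never = 1, Sometimes = 2, Often = 3) is an assumption for illustration, not the asker's actual scale.

```r
For_Analysis <- data.frame(Q11_1 = c("Never", "Often", "Sometimes"),
                           Q11_2 = c("Sometimes", "Often", "Never"),
                           Q11_3 = c("Never", "Never", "Often"),
                           stringsAsFactors = FALSE)

# Hypothetical scoring: names are the labels, elements are the numeric codes
scores <- c(Never = 1, Sometimes = 2, Often = 3)

# Index the lookup by each column's labels; unname() drops the label names
For_Analysis[] <- lapply(For_Analysis, function(x) unname(scores[x]))

For_Analysis
```

Spelling out each label's code makes the mapping explicit rather than dependent on the order in which labels happen to appear in the first column.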
What is the shortest and cleanest way to recode multiple variables in a dataframe using R?
I think if used correctly, dplyr has the "cleanest" syntax in this case:
library(dplyr)
tib <- tibble(v1 = 1:4,
v2 = 1:4,
v3 = sample(1:5, 4, replace = FALSE))
tib %>%
mutate_at(vars(v1:v3), recode, `1` = 5, `2` = 4, `3` = 3, `4` = 2, `5` = 1)
#> # A tibble: 4 x 3
#> v1 v2 v3
#> <dbl> <dbl> <dbl>
#> 1 5 5 2
#> 2 4 4 5
#> 3 3 3 4
#> 4 2 2 1
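Note that I had to add `3` = 3 because the columns are integer while the replacements are doubles; in that situation recode turns every value without a replacement into NA, so all values need one.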
I often find it easier to write things more explicitly with functions that are new to me, so maybe this might help:
tib %>%
  mutate_at(.vars = vars(v1:v3),
            .funs = function(x) recode(x,
                                       `1` = 5,
                                       `2` = 4,
                                       `3` = 3,
                                       `4` = 2,
                                       `5` = 1))
If you prefer the recode function from car, you should not load car but use:
tib %>%
mutate_at(vars(v1:v3), car::recode, "1=5; 2=4; 4=2; 5=1")
That way you don't run into trouble mixing dplyr with car (as long as you don't need car for anything else).
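Because this particular recoding simply reverses a 1 to 5 scale, it can also be sketched as arithmetic inside across(), with no recode at all. This swaps in a different technique from the answer above and only applies when the mapping is a plain reversal; the fixed v3 values are chosen here for reproducibility.

```r
library(dplyr)

# Fixed values instead of sample() so the result is reproducible
tib <- tibble(v1 = 1:4,
              v2 = 1:4,
              v3 = c(4L, 1L, 2L, 5L))

# Reverse a 1-5 scale: 1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1
reversed <- tib %>%
  mutate(across(v1:v3, ~ 6L - .x))

reversed
```

Unlike the recode version, this keeps the columns as integers and needs no entry for the unchanged middle value.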
Selecting large number of columns for recoding the variable values
library(dplyr)
df %>%
mutate(across(
EDU1:EDU150,
~ recode(
.x,
`0` = 91L,
`1` = 92L,
`2` = 93L,
`3` = 94L,
`4` = 94L,
`5` = 94L
)
))