Apply a Function to Several Columns at Once with Mutate

apply a function to several columns at once with mutate

I think it was nearly there. I've used mutate_at (I think mutate_each is deprecated) and included the variable names inside vars:


library(dplyr)
x %>% mutate_at(vars(ADR:star_rating), funs(stringr::str_replace_all(., ",", "")))
#> X. ADR hotel_id city_id star_rating accommodation_type_name
#> 1 4,084 1099.69 2313076 9395 5 Hotel
#> 2 4,084 68.66 583666 17193 2 Bungalow
#> 3 4,084 232.72 1251372 5085 3 Hotel
#> 4 4,084 195.66 1545890 16808 4 Hotel
#> 5 4,084 98 298160 8584 4 Hotel
#> chain_hotel booking_date checkin_date checkout_date city
#> 1 chain 10/5/2016 10/27/2016 10/30/2016 A
#> 2 non-chain 12/4/2016 12/9/2016 12/12/2016 B
#> 3 non-chain 11/6/2016 11/18/2016 11/20/2016 C
#> 4 non-chain 10/22/2016 11/3/2016 11/4/2016 D
#> 5 non-chain 12/11/2016 12/11/2016 12/12/2016 E

How to mutate multiple columns as function of multiple columns systematically?

You can do something like this with mutate(across..., however, for renaming columns there must be a shortcut.

df %>% 
mutate(across(.cols = c(varA.t1, varA.t2),
.fns = ~ .x / get(glue::glue(str_replace(cur_column(), "varA", "varB"))),
.names = "V_{.col}")) %>%
rename_with(~str_replace(., "V_varA", "varC"), starts_with("V_"))

# A tibble: 2 x 7
id varA.t1 varA.t2 varB.t1 varB.t2 varC.t1 varC.t2
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 row_1 5 10 2 4 2.5 2.5
2 row_2 20 50 4 6 5 8.33

If there is a long time series you can also create a vector for .cols beforehand.

Use mutate_at with contains to apply function on multiple columns

Use matches

library(dplyr)
df %>%
mutate_at(vars(matches('a', 'b')), sqrt)

Or specify the match as a character vector as the documentation says

match - A character vector. If length > 1, the union of the matches is taken.

df %>%
mutate_at(vars(contains(match = c('a', 'b'))), sqrt)
ab ba c dc
1 1.000000 2.449490 11 16
2 1.414214 2.645751 12 17
3 1.732051 2.828427 13 18
4 2.000000 3.000000 14 19
5 2.236068 3.162278 15 20

_at/_all are deprecated in favor of across

df %>% 
mutate(across(matches('a', 'b'), sqrt))

-output

        ab       ba  c dc
1 1.000000 2.449490 11 16
2 1.414214 2.645751 12 17
3 1.732051 2.828427 13 18
4 2.000000 3.000000 14 19
5 2.236068 3.162278 15 20

Using functions of multiple columns in a dplyr mutate_at call

This was answered by @eipi10 in @eipi10's comment on the question, but I'm writing it here for posterity.

The solution here is to use:

df %>%
mutate_at(.vars = vars(y, z),
.funs = list(~ ifelse(x, ., NA)))

You can also use the new across() function with mutate(), like so:

df %>%
mutate(across(c(y, z), ~ ifelse(x, ., NA)))

The use of the formula operator (as in ~ ifelse(...)) here indicates that ifelse(x, ., NA) is an anonymous function that is being defined within the call to mutate_at().

This works similarly to defining the function outside of the call to mutate_at(), like so:

temp_fn <- function(input) ifelse(test = df[["x"]],
yes = input,
no = NA)

df %>%
mutate_at(.vars = vars(y, z),
.funs = temp_fn)

Note on syntax changes in dplyr: Prior to dplyr version 0.8.0, you would simply write .funs = funs(ifelse(x, . , NA)), but the funs() function is being deprecated and will soon be removed from dplyr.

Mutate multiple columns using the dplyr framework

You can use pivot_longer to have just one column to modify, which is an alternative to mutate(across()).

You can use case_when to have multiple conditions, so you do not need to nest multiple if statements. The value will be the one of the first true statement.

library(tidyverse)

apcd_hud_ex <- structure(list(studyid = 1:5, SMOKEFREE_DATE = structure(c(
16283,
16283, 16071, 16071, 16648
), class = "Date"), x2014_03_15 = c(
1,
1, 1, 0, 1
), x2014_04_15 = c(1, 1, 1, 1, 1), x2014_05_15 = c(
1,
1, 1, 1, 1
), x2014_06_15 = c(1, 1, 1, 1, 1), x2014_07_15 = c(
1,
1, 1, 1, 1
), x2014_08_15 = c(1, 1, 1, 1, 1), x2014_09_15 = c(
1,
1, 1, 1, 1
), x2014_10_15 = c(1, 1, 1, 1, 1), x2014_11_15 = c(
1,
1, 1, 1, 1
), x2014_12_15 = c(1, 1, 1, 1, 1), x2015_01_15 = c(
1,
1, 1, 1, 1
)), row.names = c(NA, -5L), class = c(
"tbl_df", "tbl",
"data.frame"
))

apcd_hud_ex %>%
pivot_longer(starts_with("x")) %>%
mutate(
insDate = name %>% str_remove("^x") %>% str_replace_all("_", "-") %>% as.Date(),
value = case_when(
value == 0 ~ 0,
insDate < SMOKEFREE_DATE ~ 1,
insDate >= SMOKEFREE_DATE ~ 2
)
) %>%
select(-insDate) %>%
pivot_wider()
#> # A tibble: 5 × 13
#> studyid SMOKEFREE_DATE x2014_03_15 x2014_04_15 x2014_05_15 x2014_06_15
#> <int> <date> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2014-08-01 1 1 1 1
#> 2 2 2014-08-01 1 1 1 1
#> 3 3 2014-01-01 2 2 2 2
#> 4 4 2014-01-01 0 2 2 2
#> 5 5 2015-08-01 1 1 1 1
#> # … with 7 more variables: x2014_07_15 <dbl>, x2014_08_15 <dbl>,
#> # x2014_09_15 <dbl>, x2014_10_15 <dbl>, x2014_11_15 <dbl>, x2014_12_15 <dbl>,
#> # x2015_01_15 <dbl>

Created on 2022-05-05 by the reprex package (v2.0.0)

Dplyr: add multiple columns with mutate/across from character vector

The !! works for a single element

for(nm in add_cols) test <- test %>% 
mutate(!! nm := NA)

-output

> test
a col_1 col_2
1 1 NA NA
2 2 NA NA
3 3 NA NA

Or another option is

test %>% 
bind_cols(setNames(rep(list(NA), length(add_cols)), add_cols))
a col_1 col_2
1 1 NA NA
2 2 NA NA
3 3 NA NA

In base R, this is easier

test[add_cols] <- NA

Which can be used in a pipe

test %>%
`[<-`(., add_cols, value = NA)
a col_1 col_2
1 1 NA NA
2 2 NA NA
3 3 NA NA

across works only if the columns are already present i.e. it is suggesting to loop across the columns present in the data and do some modification/create new columns with .names modification


We could make use add_column from tibble

library(tibble)
library(janitor)
add_column(test, !!! add_cols) %>%
clean_names %>%
mutate(across(all_of(add_cols), ~ NA))
a col_1 col_2
1 1 NA NA
2 2 NA NA
3 3 NA NA

Mutate across multiple columns to create new variable sets

This might be easier in long format, but here's an option you can pursue as wide data.

Using the latest version of dplyr you can mutate across and include .names argument to define how your want your new columns to look.

library(tidyverse)

my_col <- c("var1", "var2", "var3", "var4")

df %>%
group_by(year) %>%
mutate(across(my_col, mean, .names = "mean_{col}")) %>%
mutate(across(my_col, .names = "relmean_{col}") / across(paste0("mean_", my_col)))

Output

   year country  var1  var2  var3  var4 mean_var1 mean_var2 mean_var3 mean_var4 relmean_var1 relmean_var2 relmean_var3 relmean_var4
<int> <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1910 GER 1 4 10 6 3 5 9 7.5 0.333 0.8 1.11 0.8
2 1911 GER 2 3 11 7 1.5 3.5 10.5 8 1.33 0.857 1.05 0.875
3 1910 FRA 5 6 8 9 3 5 9 7.5 1.67 1.2 0.889 1.2
4 1911 FRA 1 4 10 9 1.5 3.5 10.5 8 0.667 1.14 0.952 1.12

Put multiple functions into a single dplyr mutate across everything e.g. change mulitple different strings

I guess the tidyverse way would be to chain multiple functions together using the pipe operator. This means you only need to call across once.

df1 %>% 
mutate(across(everything(), ~ str_replace(.,"(?i)bar", "open") %>%
str_replace("(?i)foo", "done")))
x1 x2
1 done done
2 open fix
3 done broke
4 open fix
5 done done
6 open fix
7 done broke
8 open fix

mutate(across) to generate multiple new columns in tidyverse

In this case, you can use cur_data() and cur_column() to take advantage that we are wanting to sum together columns that have the same suffix but just need to swap out the numbers.

library(dplyr)

df <- data.frame(
oldvar1_a = 1:3,
oldvar2_a = 4:6,
oldvar1_i = 7:9,
oldvar2_i = 10:12,
z = c(1,10,20)
)

mutate(
df,
across(
starts_with("oldvar1"),
~ (.x + cur_data()[gsub("1", "2", cur_column())]) - z,
.names = "{col}_new"
)
)
#> oldvar1_a oldvar2_a oldvar1_i oldvar2_i z oldvar2_a oldvar2_i
#> 1 1 4 7 10 1 4 16
#> 2 2 5 8 11 10 -3 9
#> 3 3 6 9 12 20 -11 1

If you want to use with case_when, just make sure to index using [[, you can read more here.

df <- data.frame(
oldvar1_a = 1:3,
oldvar2_a = 4:6,
oldvar1_i = 7:9,
oldvar2_i = 10:12,
z = c(1,2,0)
)

mutate(
df,
across(
starts_with("oldvar1"),
~ case_when(
z == 1 ~ .x,
z == 2 ~ cur_data()[[gsub("1", "2", cur_column())]],
TRUE ~ NA_integer_
),
.names = "{col}_new"
)
)
#> oldvar1_a oldvar2_a oldvar1_i oldvar2_i z oldvar1_a_new oldvar1_i_new
#> 1 1 4 7 10 1 1 7
#> 2 2 5 8 11 2 5 11
#> 3 3 6 9 12 0 NA NA


Related Topics



Leave a reply



Submit