Apply a Function to Several Columns at Once with Mutate

apply a function to several columns at once with mutate

I think it was nearly there. I've used mutate_at (I think mutate_each is deprecated) and included the variable names inside vars:

library(dplyr)
x %>% mutate_at(vars(ADR:star_rating), funs(stringr::str_replace_all(., ",", "")))
#>      X.     ADR hotel_id city_id star_rating accommodation_type_name
#> 1 4,084 1099.69  2313076    9395           5                   Hotel
#> 2 4,084   68.66   583666   17193           2                Bungalow
#> 3 4,084  232.72  1251372    5085           3                   Hotel
#> 4 4,084  195.66  1545890   16808           4                   Hotel
#> 5 4,084      98   298160    8584           4                   Hotel
#>   chain_hotel booking_date checkin_date checkout_date city
#> 1       chain    10/5/2016   10/27/2016    10/30/2016    A
#> 2   non-chain    12/4/2016    12/9/2016    12/12/2016    B
#> 3   non-chain    11/6/2016   11/18/2016    11/20/2016    C
#> 4   non-chain   10/22/2016    11/3/2016     11/4/2016    D
#> 5   non-chain   12/11/2016   12/11/2016    12/12/2016    E

How to mutate multiple columns as function of multiple columns systematically?

You can do something like this with mutate(across..., however, for renaming columns there must be a shortcut.

df %>% 
  mutate(across(.cols = c(varA.t1, varA.t2),
                .fns = ~ .x / get(glue::glue(str_replace(cur_column(), "varA", "varB"))),
                .names = "V_{.col}")) %>%
  rename_with(~str_replace(., "V_varA", "varC"), starts_with("V_"))

# A tibble: 2 x 7
  id    varA.t1 varA.t2 varB.t1 varB.t2 varC.t1 varC.t2
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 row_1       5      10       2       4     2.5    2.5 
2 row_2      20      50       4       6     5      8.33

If there is a long time series you can also create a vector for .cols beforehand.

Use mutate_at with contains to apply function on multiple columns

Use matches

library(dplyr)
df %>%
   mutate_at(vars(matches('a', 'b')), sqrt)

Or specify the match as a character vector as the documentation says

match - A character vector. If length > 1, the union of the matches is taken.

df %>%
    mutate_at(vars(contains(match = c('a', 'b'))), sqrt)
        ab       ba  c dc
1 1.000000 2.449490 11 16
2 1.414214 2.645751 12 17
3 1.732051 2.828427 13 18
4 2.000000 3.000000 14 19
5 2.236068 3.162278 15 20

_at/_all are deprecated in favor of across

df %>% 
   mutate(across(matches('a', 'b'), sqrt))

-output

        ab       ba  c dc
1 1.000000 2.449490 11 16
2 1.414214 2.645751 12 17
3 1.732051 2.828427 13 18
4 2.000000 3.000000 14 19
5 2.236068 3.162278 15 20

Using functions of multiple columns in a dplyr mutate_at call

This was answered by @eipi10 in @eipi10's comment on the question, but I'm writing it here for posterity.

The solution here is to use:

df %>%
   mutate_at(.vars = vars(y, z),
             .funs = list(~ ifelse(x, ., NA)))

You can also use the new across() function with mutate(), like so:

df %>%
   mutate(across(c(y, z), ~ ifelse(x, ., NA)))

The use of the formula operator (as in ~ ifelse(...)) here indicates that ifelse(x, ., NA) is an anonymous function that is being defined within the call to mutate_at().

This works similarly to defining the function outside of the call to mutate_at(), like so:

temp_fn <- function(input) ifelse(test = df[["x"]],
                                  yes = input,
                                  no = NA)

df %>%
   mutate_at(.vars = vars(y, z),
             .funs = temp_fn)

Note on syntax changes in dplyr: Prior to dplyr version 0.8.0, you would simply write .funs = funs(ifelse(x, . , NA)), but the funs() function is being deprecated and will soon be removed from dplyr.

Mutate multiple columns using the dplyr framework

You can use pivot_longer to have just one column to modify, which is an alternative to mutate(across()).

You can use case_when to have multiple conditions, so you do not need to nest multiple if statements. The value will be the one of the first true statement.

library(tidyverse)

apcd_hud_ex <- structure(list(studyid = 1:5, SMOKEFREE_DATE = structure(c(
  16283,
  16283, 16071, 16071, 16648
), class = "Date"), x2014_03_15 = c(
  1,
  1, 1, 0, 1
), x2014_04_15 = c(1, 1, 1, 1, 1), x2014_05_15 = c(
  1,
  1, 1, 1, 1
), x2014_06_15 = c(1, 1, 1, 1, 1), x2014_07_15 = c(
  1,
  1, 1, 1, 1
), x2014_08_15 = c(1, 1, 1, 1, 1), x2014_09_15 = c(
  1,
  1, 1, 1, 1
), x2014_10_15 = c(1, 1, 1, 1, 1), x2014_11_15 = c(
  1,
  1, 1, 1, 1
), x2014_12_15 = c(1, 1, 1, 1, 1), x2015_01_15 = c(
  1,
  1, 1, 1, 1
)), row.names = c(NA, -5L), class = c(
  "tbl_df", "tbl",
  "data.frame"
))

apcd_hud_ex %>%
  pivot_longer(starts_with("x")) %>%
  mutate(
    insDate = name %>% str_remove("^x") %>% str_replace_all("_", "-") %>% as.Date(),
    value = case_when(
      value == 0 ~ 0,
      insDate < SMOKEFREE_DATE ~ 1,
      insDate >= SMOKEFREE_DATE ~ 2
    )
  ) %>%
  select(-insDate) %>%
  pivot_wider()
#> # A tibble: 5 × 13
#>   studyid SMOKEFREE_DATE x2014_03_15 x2014_04_15 x2014_05_15 x2014_06_15
#>     <int> <date>               <dbl>       <dbl>       <dbl>       <dbl>
#> 1       1 2014-08-01               1           1           1           1
#> 2       2 2014-08-01               1           1           1           1
#> 3       3 2014-01-01               2           2           2           2
#> 4       4 2014-01-01               0           2           2           2
#> 5       5 2015-08-01               1           1           1           1
#> # … with 7 more variables: x2014_07_15 <dbl>, x2014_08_15 <dbl>,
#> #   x2014_09_15 <dbl>, x2014_10_15 <dbl>, x2014_11_15 <dbl>, x2014_12_15 <dbl>,
#> #   x2015_01_15 <dbl>

^{Created on 2022-05-05 by the reprex package (v2.0.0)}

Dplyr: add multiple columns with mutate/across from character vector

The !! works for a single element

for(nm in add_cols) test <- test %>% 
         mutate(!! nm := NA)

-output

> test
  a col_1 col_2
1 1    NA    NA
2 2    NA    NA
3 3    NA    NA

Or another option is

test %>% 
   bind_cols(setNames(rep(list(NA), length(add_cols)), add_cols))
  a col_1 col_2
1 1    NA    NA
2 2    NA    NA
3 3    NA    NA

In base R, this is easier

test[add_cols] <- NA

Which can be used in a pipe

test %>%
  `[<-`(., add_cols, value = NA)
  a col_1 col_2
1 1    NA    NA
2 2    NA    NA
3 3    NA    NA

across works only if the columns are already present i.e. it is suggesting to loop across the columns present in the data and do some modification/create new columns with .names modification

We could make use add_column from tibble

library(tibble)
library(janitor)
add_column(test, !!! add_cols) %>% 
   clean_names %>% 
   mutate(across(all_of(add_cols), ~ NA))
  a col_1 col_2
1 1    NA    NA
2 2    NA    NA
3 3    NA    NA

Mutate across multiple columns to create new variable sets

This might be easier in long format, but here's an option you can pursue as wide data.

Using the latest version of dplyr you can mutate across and include .names argument to define how your want your new columns to look.

library(tidyverse)

my_col <- c("var1", "var2", "var3", "var4")

df %>%
  group_by(year) %>%
  mutate(across(my_col, mean, .names = "mean_{col}")) %>%
  mutate(across(my_col, .names = "relmean_{col}") / across(paste0("mean_", my_col)))

Output

   year country  var1  var2  var3  var4 mean_var1 mean_var2 mean_var3 mean_var4 relmean_var1 relmean_var2 relmean_var3 relmean_var4
  <int> <chr>   <int> <int> <int> <int>     <dbl>     <dbl>     <dbl>     <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
1  1910 GER         1     4    10     6       3         5         9         7.5        0.333        0.8          1.11         0.8  
2  1911 GER         2     3    11     7       1.5       3.5      10.5       8          1.33         0.857        1.05         0.875
3  1910 FRA         5     6     8     9       3         5         9         7.5        1.67         1.2          0.889        1.2  
4  1911 FRA         1     4    10     9       1.5       3.5      10.5       8          0.667        1.14         0.952        1.12

Put multiple functions into a single dplyr mutate across everything e.g. change mulitple different strings

I guess the tidyverse way would be to chain multiple functions together using the pipe operator. This means you only need to call across once.

df1 %>% 
   mutate(across(everything(), ~ str_replace(.,"(?i)bar", "open") %>% 
                                 str_replace("(?i)foo", "done")))
    x1    x2
1 done  done
2 open   fix
3 done broke
4 open   fix
5 done  done
6 open   fix
7 done broke
8 open   fix

mutate(across) to generate multiple new columns in tidyverse

In this case, you can use cur_data() and cur_column() to take advantage that we are wanting to sum together columns that have the same suffix but just need to swap out the numbers.

library(dplyr)

df <- data.frame(
  oldvar1_a = 1:3,
  oldvar2_a = 4:6,
  oldvar1_i = 7:9,
  oldvar2_i = 10:12,
  z = c(1,10,20)
)

mutate(
  df,
  across(
    starts_with("oldvar1"),
    ~ (.x + cur_data()[gsub("1", "2", cur_column())]) - z,
    .names = "{col}_new"
  )
)
#>   oldvar1_a oldvar2_a oldvar1_i oldvar2_i  z oldvar2_a oldvar2_i
#> 1         1         4         7        10  1         4        16
#> 2         2         5         8        11 10        -3         9
#> 3         3         6         9        12 20       -11         1

If you want to use with case_when, just make sure to index using [[, you can read more here.

df <- data.frame(
  oldvar1_a = 1:3,
  oldvar2_a = 4:6,
  oldvar1_i = 7:9,
  oldvar2_i = 10:12,
  z = c(1,2,0)
)

mutate(
  df,
  across(
    starts_with("oldvar1"),
    ~ case_when(
      z == 1 ~ .x,
      z == 2 ~ cur_data()[[gsub("1", "2", cur_column())]],
      TRUE ~ NA_integer_
    ),
    .names = "{col}_new"
  )
)
#>   oldvar1_a oldvar2_a oldvar1_i oldvar2_i z oldvar1_a_new oldvar1_i_new
#> 1         1         4         7        10 1             1             7
#> 2         2         5         8        11 2             5            11
#> 3         3         6         9        12 0            NA            NA

Apply a Function to Several Columns at Once with Mutate