Using Functions of Multiple Columns in a Dplyr Mutate_At Call

Using functions of multiple columns in a dplyr mutate_at call

This was answered by @eipi10 in @eipi10's comment on the question, but I'm writing it here for posterity.

The solution here is to use:

df %>%
   mutate_at(.vars = vars(y, z),
             .funs = list(~ ifelse(x, ., NA)))

You can also use the new across() function with mutate(), like so:

df %>%
   mutate(across(c(y, z), ~ ifelse(x, ., NA)))

The use of the formula operator (as in ~ ifelse(...)) here indicates that ifelse(x, ., NA) is an anonymous function that is being defined within the call to mutate_at().

This works similarly to defining the function outside of the call to mutate_at(), like so:

temp_fn <- function(input) ifelse(test = df[["x"]],
                                  yes = input,
                                  no = NA)

df %>%
   mutate_at(.vars = vars(y, z),
             .funs = temp_fn)

Note on syntax changes in dplyr: Prior to dplyr version 0.8.0, you would simply write .funs = funs(ifelse(x, . , NA)), but the funs() function is being deprecated and will soon be removed from dplyr.

How to mutate_at multiple columns on a condition on each value?

The FUN is not an argument in mutate_at. In the new version, the earlier used fun is deprecated with list(~ or simply ~. Also, wrap the columns to select in vars. It can also be unquoted or use vars(starts_with("H")) or vars(matches("^H\\d+$")). Also, replace the 'Hour' with .

library(dplyr)
df %>%
    mutate_at(vars(c("H1", "H2", "H3")), ~ifelse( . < 0, 
           .*total_neg/10, .*total_pos/10))
#. ID Date total_neg total_pos H1 H2 H3
#1  1    1         1         4  2  2 -1
#2  1    2         1         5  2 -1  3
#3  1    3         0         2  0  1  1
#4  2    1         0         4  2  2  0
#5  2    2         2         5 -1 -1  5

Use mutate_at with contains to apply function on multiple columns

Use matches

library(dplyr)
df %>%
   mutate_at(vars(matches('a', 'b')), sqrt)

Or specify the match as a character vector as the documentation says

match - A character vector. If length > 1, the union of the matches is taken.

df %>%
    mutate_at(vars(contains(match = c('a', 'b'))), sqrt)
        ab       ba  c dc
1 1.000000 2.449490 11 16
2 1.414214 2.645751 12 17
3 1.732051 2.828427 13 18
4 2.000000 3.000000 14 19
5 2.236068 3.162278 15 20

_at/_all are deprecated in favor of across

df %>% 
   mutate(across(matches('a', 'b'), sqrt))

-output

        ab       ba  c dc
1 1.000000 2.449490 11 16
2 1.414214 2.645751 12 17
3 1.732051 2.828427 13 18
4 2.000000 3.000000 14 19
5 2.236068 3.162278 15 20

dplyr mutate_at function applied to multiple columns - using dynamic column names

You can try using as.name before applying !!:

x %>% mutate_at(vars(one_of(numerator_vars)), funs(share = 
  ifelse(!!(as.name(denominator_var)), round(./!!(as.name(denominator_var))) * 100, 2)))

#    a b c  d b_share c_share d_share
# 1 10 1 1  8       0       0     100
# 2 20 2 2 16       0       0     100
# 3 30 3 3 24       0       0     100
# 4  0 0 0  0       2       2       2

Using mutate_at, starts_with and case_when with list to change values of multiple columns

You didn't give an expected output, but I believe this is what you want.

We should use across() now like this, as mutate_at is deprecated. across takes two arguments; one to select the columns and the second is the function which should be applied to those columns. Also, no need to wrap the case_when statement in list.

df %>%
  mutate(
    across(starts_with('q11'), 
    function(x) case_when(
      x == 1 ~ "Yes", 
      x == 2 ~ "No", 
      x == 3 ~ "Maybe", 
      x == 4 ~ "Don't know"
    )
  )
)

        q11_1      q11_2 q12_1 q12_2
1          No         No     4     1
2          No        Yes     4     2
3         Yes Don't know     1     2
4  Don't know Don't know     1     2
5  Don't know        Yes     3     1
6          No      Maybe     4     1
7         Yes         No     3     3
8  Don't know      Maybe     3     3
9       Maybe      Maybe     3     4
10 Don't know Don't know     4     1
11      Maybe         No     1     2
12      Maybe      Maybe     1     2
13      Maybe Don't know     4     1
14        Yes         No     3     4
15 Don't know        Yes     2     3
16      Maybe         No     3     4
17        Yes         No     4     1
18        Yes Don't know     3     4
19      Maybe         No     3     2
20      Maybe      Maybe     1     3
21        Yes      Maybe     2     2
22        Yes      Maybe     1     3
23 Don't know      Maybe     1     2
24      Maybe        Yes     2     4
25         No        Yes     1     3

Using dplyr mutate_at when a function takes multiple arguments which are different columns

Old question, but I agree with Jesse that you need to tidy your data a bit. gather would be the way to go, but it lacks somehow the possibility of stats::reshape where you can specify groups of columns to gather. So here's a solution with reshape:

df %>% 
   reshape(varying   = list(c("x_1", "y_1"), c("x_2", "y_2")), 
           times     = c("x", "y"),
           direction = "long") %>% 
   mutate(x = ifelse(is.na(x_1), x_2, x_1)) %>% 
   reshape(idvar     = "id", 
           timevar   = "time",
           direction = "wide") %>% 
   rename_all(funs(gsub("[a-zA-Z]+(_*)([0-9]*)\\.([a-zA-Z]+)", "\\3\\1\\2", .)))
#   id x_1 x_2 x y_1 y_2 y
# 1  1   1   1 1  NA   5 5
# 2  2  NA   2 2   2   6 2
# 3  3   3   4 3   1   7 1

In order to do that with any number of column pairs, you could do something like:

df2 <- setNames(cbind(df, df), c(t(outer(letters[23:26], 1:2, paste, sep = "_"))))
v <- split(names(df2), purrr::map_chr(names(df2), ~ gsub(".*_(.*)", "\\1", .)))
n <- unique(purrr::map_chr(names(df2), ~ gsub("_[0-9]+", "", .) ))
df2 %>% 
    reshape(varying   = v, 
            times     = n,
            direction = "long") %>% 
     mutate(x = ifelse(is.na(!!sym(v[[1]][1])), !!sym(v[[2]][1]), !!sym(v[[1]][1]))) %>% 
     reshape(idvar     = "id", 
             timevar   = "time",
             direction = "wide") %>% 
     rename_all(funs(gsub("[a-zA-Z]+(_*)([0-9]*)\\.([a-zA-Z]+)", "\\3\\1\\2", .)))
#   id w_1 w_2 w x_1 x_2 x y_1 y_2 y z_1 z_2 z
# 1  1   1   1 1  NA   5 5   1   1 1  NA   5 5
# 2  2  NA   2 2   2   6 2  NA   2 2   2   6 2
# 3  3   3   4 3   1   7 1   3   4 3   1   7 1

This assumes that columns which should be compared are next to each other and that all columns for with possible NA values are in columns suffixed by _1 and the replacement value columns are sufficed by _2.

dplyr::mutate_at() relying on multiple columns with a given prefix/suffix

After adopting @akrun's elegant solution, I noticed it was unfortunately very inefficient (since it has to recreate two dataframes), taking almost a second on a dataset with 20,000 rows and 11 "groups".

So a while ago I developed the following function (with a bit of help from @user12728748... sorry for not posting here sooner), which takes the names of the groups ("data1", "data2", etc) and a formula using the prefixes, allowing for bquote-style quoting for constant names:

suppressPackageStartupMessages(library(dplyr))

mutateSet <- function(df, colNames, formula,
                      isPrefix = TRUE,
                      separator = "_") {
  vars <- all.vars(formula)

  # extracts names wrapped in `.()`
  escapedNames <- function (expr)
  {
    unquote <- function(e) {
      if (is.pairlist(e) || length(e) <= 1L) NULL
      else if (e[[1L]] == as.name("."))      deparse(e[[2L]])
      else                                   unlist(sapply(e, unquote))
    }
    unquote(substitute(expr))
  }

  escapedVars <- eval(rlang::expr(escapedNames(!!formula)))

  # remove escaped names from mapping variables
  vars <- setdiff(vars, escapedVars)

  # get output prefix/suffix as string
  lhs <- rlang::f_lhs(formula) %>%
    all.vars()

  # get operation as string
  # deparse() can have line breaks; paste0() brings it back to one line
  rhs <- rlang::f_rhs(formula) %>%
    deparse() %>%
    paste0(collapse = "")

  # dummy function to cover for bquote escaping
  . <- function(x) x

  for (i in colNames) {
    if (isPrefix) {
      aliases <- paste0(vars, separator, i)
      newCol  <- paste0(lhs,  separator, i)
    } else {
      aliases <- paste0(i, separator, vars)
      newCol  <- paste0(i, separator, lhs)
    }

    if (length(lhs) == 0) newCol <- i

    mapping <- rlang::list2(!!!aliases)
    names(mapping) <- vars
    mapping <- do.call(wrapr::qc, mapping)

    df <- rlang::expr(wrapr::let(
      mapping,
      df %>% dplyr::mutate(!!newCol := ...RHS...)
    )) %>%
      deparse() %>%
      gsub(
        pattern = "...RHS...",
        replacement = rhs
      ) %>%
      {eval(parse(text = .))}
  }

  return(df)
}

df <- data.frame(a_data1 = 1:3, b_data1 = 2:4,
                a_data2 = 3:5, b_data2 = 4:6,
                static = 5:7)

mutateSet(df, "data1", ~ a + b)
#>   a_data1 b_data1 a_data2 b_data2 static data1
#> 1       1       2       3       4      5     3
#> 2       2       3       4       5      6     5
#> 3       3       4       5       6      7     7
mutateSet(df, c("data1", "data2"), x ~ sqrt(a) + b)
#>   a_data1 b_data1 a_data2 b_data2 static  x_data1  x_data2
#> 1       1       2       3       4      5 3.000000 5.732051
#> 2       2       3       4       5      6 4.414214 7.000000
#> 3       3       4       5       6      7 5.732051 8.236068
mutateSet(df, c("data1", "data2"), ~ a + b + .(static))
#>   a_data1 b_data1 a_data2 b_data2 static data1 data2
#> 1       1       2       3       4      5     8    12
#> 2       2       3       4       5      6    11    15
#> 3       3       4       5       6      7    14    18

^{Created on 2020-04-28 by the reprex package (v0.3.0)}

This can probably be cleaned up (especially that heinous for-loop), but it works for now.

Repeating @user12728748's performance test, we see this is ~100x faster:

suppressPackageStartupMessages({
    invisible(lapply(c("dplyr", "tidyr", "rlang", "wrapr", "microbenchmark"),
                     require, character.only = TRUE))
})

polymutate <- function(df, formula) {
    form <- rlang::f_rhs(formula)

    df %>% 
        mutate(rn = row_number()) %>%
        pivot_longer(cols = -rn, names_to = c('.value', 'grp'),
                     names_sep = "_") %>%
        group_by(grp) %>%
        transmute(rn, new = eval(form)) %>%
        pivot_wider(names_from = grp, values_from = new) %>%
        select(-rn) %>%
        bind_cols(df, .)
}

set.seed(1)                 
df <- setNames(data.frame(matrix(sample(1:12, 6E6, replace=TRUE), ncol=6)),
               c("a_data1", "b_data1", "a_data2", "b_data2", "a_data3", "b_data3"))

pd <- polymutate(df, ~ a + b)
pd2 <- mutateSet(df, c("data1", "data2", "data3"), ~ a + b)

all.equal(pd, pd2)
#> [1] TRUE

microbenchmark(polymutate(df, ~ a + b), 
               mutateSet(df, c("data1", "data2", "data3"), ~ a + b),
               times=10L)
#> Unit: milliseconds
#>        expr      min        lq       mean     median        uq       max neval
#>  polymutate 1612.306 1628.9776 1690.78586 1670.15600 1741.3490 1806.1412    10
#>  mutateSet     8.757    9.6302   13.27135   10.45965   19.2976   20.4657    10

Using Functions of Multiple Columns in a Dplyr Mutate_At Call