Using Functions of Multiple Columns in a Dplyr Mutate_At Call

Using functions of multiple columns in a dplyr mutate_at call

This was answered by @eipi10 in @eipi10's comment on the question, but I'm writing it here for posterity.

The solution here is to use:

df %>%
mutate_at(.vars = vars(y, z),
.funs = list(~ ifelse(x, ., NA)))

You can also use the new across() function with mutate(), like so:

df %>%
mutate(across(c(y, z), ~ ifelse(x, ., NA)))

The use of the formula operator (as in ~ ifelse(...)) here indicates that ifelse(x, ., NA) is an anonymous function that is being defined within the call to mutate_at().

This works similarly to defining the function outside of the call to mutate_at(), like so:

temp_fn <- function(input) ifelse(test = df[["x"]],
yes = input,
no = NA)

df %>%
mutate_at(.vars = vars(y, z),
.funs = temp_fn)

Note on syntax changes in dplyr: Prior to dplyr version 0.8.0, you would simply write .funs = funs(ifelse(x, . , NA)), but the funs() function is being deprecated and will soon be removed from dplyr.

How to mutate_at multiple columns on a condition on each value?

The FUN is not an argument in mutate_at. In the new version, the earlier used fun is deprecated with list(~ or simply ~. Also, wrap the columns to select in vars. It can also be unquoted or use vars(starts_with("H")) or vars(matches("^H\\d+$")). Also, replace the 'Hour' with .

library(dplyr)
df %>%
mutate_at(vars(c("H1", "H2", "H3")), ~ifelse( . < 0,
.*total_neg/10, .*total_pos/10))
#. ID Date total_neg total_pos H1 H2 H3
#1 1 1 1 4 2 2 -1
#2 1 2 1 5 2 -1 3
#3 1 3 0 2 0 1 1
#4 2 1 0 4 2 2 0
#5 2 2 2 5 -1 -1 5

Use mutate_at with contains to apply function on multiple columns

Use matches

library(dplyr)
df %>%
mutate_at(vars(matches('a', 'b')), sqrt)

Or specify the match as a character vector as the documentation says

match - A character vector. If length > 1, the union of the matches is taken.

df %>%
mutate_at(vars(contains(match = c('a', 'b'))), sqrt)
ab ba c dc
1 1.000000 2.449490 11 16
2 1.414214 2.645751 12 17
3 1.732051 2.828427 13 18
4 2.000000 3.000000 14 19
5 2.236068 3.162278 15 20

_at/_all are deprecated in favor of across

df %>% 
mutate(across(matches('a', 'b'), sqrt))

-output

        ab       ba  c dc
1 1.000000 2.449490 11 16
2 1.414214 2.645751 12 17
3 1.732051 2.828427 13 18
4 2.000000 3.000000 14 19
5 2.236068 3.162278 15 20

dplyr mutate_at function applied to multiple columns - using dynamic column names

You can try using as.name before applying !!:

x %>% mutate_at(vars(one_of(numerator_vars)), funs(share = 
ifelse(!!(as.name(denominator_var)), round(./!!(as.name(denominator_var))) * 100, 2)))

# a b c d b_share c_share d_share
# 1 10 1 1 8 0 0 100
# 2 20 2 2 16 0 0 100
# 3 30 3 3 24 0 0 100
# 4 0 0 0 0 2 2 2

Using mutate_at, starts_with and case_when with list to change values of multiple columns

You didn't give an expected output, but I believe this is what you want.

We should use across() now like this, as mutate_at is deprecated. across takes two arguments; one to select the columns and the second is the function which should be applied to those columns. Also, no need to wrap the case_when statement in list.

df %>%
mutate(
across(starts_with('q11'),
function(x) case_when(
x == 1 ~ "Yes",
x == 2 ~ "No",
x == 3 ~ "Maybe",
x == 4 ~ "Don't know"
)
)
)
        q11_1      q11_2 q12_1 q12_2
1 No No 4 1
2 No Yes 4 2
3 Yes Don't know 1 2
4 Don't know Don't know 1 2
5 Don't know Yes 3 1
6 No Maybe 4 1
7 Yes No 3 3
8 Don't know Maybe 3 3
9 Maybe Maybe 3 4
10 Don't know Don't know 4 1
11 Maybe No 1 2
12 Maybe Maybe 1 2
13 Maybe Don't know 4 1
14 Yes No 3 4
15 Don't know Yes 2 3
16 Maybe No 3 4
17 Yes No 4 1
18 Yes Don't know 3 4
19 Maybe No 3 2
20 Maybe Maybe 1 3
21 Yes Maybe 2 2
22 Yes Maybe 1 3
23 Don't know Maybe 1 2
24 Maybe Yes 2 4
25 No Yes 1 3

Using dplyr mutate_at when a function takes multiple arguments which are different columns

Old question, but I agree with Jesse that you need to tidy your data a bit. gather would be the way to go, but it lacks somehow the possibility of stats::reshape where you can specify groups of columns to gather. So here's a solution with reshape:

df %>% 
reshape(varying = list(c("x_1", "y_1"), c("x_2", "y_2")),
times = c("x", "y"),
direction = "long") %>%
mutate(x = ifelse(is.na(x_1), x_2, x_1)) %>%
reshape(idvar = "id",
timevar = "time",
direction = "wide") %>%
rename_all(funs(gsub("[a-zA-Z]+(_*)([0-9]*)\\.([a-zA-Z]+)", "\\3\\1\\2", .)))
# id x_1 x_2 x y_1 y_2 y
# 1 1 1 1 1 NA 5 5
# 2 2 NA 2 2 2 6 2
# 3 3 3 4 3 1 7 1

In order to do that with any number of column pairs, you could do something like:

df2 <- setNames(cbind(df, df), c(t(outer(letters[23:26], 1:2, paste, sep = "_"))))
v <- split(names(df2), purrr::map_chr(names(df2), ~ gsub(".*_(.*)", "\\1", .)))
n <- unique(purrr::map_chr(names(df2), ~ gsub("_[0-9]+", "", .) ))
df2 %>%
reshape(varying = v,
times = n,
direction = "long") %>%
mutate(x = ifelse(is.na(!!sym(v[[1]][1])), !!sym(v[[2]][1]), !!sym(v[[1]][1]))) %>%
reshape(idvar = "id",
timevar = "time",
direction = "wide") %>%
rename_all(funs(gsub("[a-zA-Z]+(_*)([0-9]*)\\.([a-zA-Z]+)", "\\3\\1\\2", .)))
# id w_1 w_2 w x_1 x_2 x y_1 y_2 y z_1 z_2 z
# 1 1 1 1 1 NA 5 5 1 1 1 NA 5 5
# 2 2 NA 2 2 2 6 2 NA 2 2 2 6 2
# 3 3 3 4 3 1 7 1 3 4 3 1 7 1

This assumes that columns which should be compared are next to each other and that all columns for with possible NA values are in columns suffixed by _1 and the replacement value columns are sufficed by _2.

dplyr::mutate_at() relying on multiple columns with a given prefix/suffix

After adopting @akrun's elegant solution, I noticed it was unfortunately very inefficient (since it has to recreate two dataframes), taking almost a second on a dataset with 20,000 rows and 11 "groups".

So a while ago I developed the following function (with a bit of help from @user12728748... sorry for not posting here sooner), which takes the names of the groups ("data1", "data2", etc) and a formula using the prefixes, allowing for bquote-style quoting for constant names:

suppressPackageStartupMessages(library(dplyr))

mutateSet <- function(df, colNames, formula,
isPrefix = TRUE,
separator = "_") {
vars <- all.vars(formula)

# extracts names wrapped in `.()`
escapedNames <- function (expr)
{
unquote <- function(e) {
if (is.pairlist(e) || length(e) <= 1L) NULL
else if (e[[1L]] == as.name(".")) deparse(e[[2L]])
else unlist(sapply(e, unquote))
}
unquote(substitute(expr))
}

escapedVars <- eval(rlang::expr(escapedNames(!!formula)))

# remove escaped names from mapping variables
vars <- setdiff(vars, escapedVars)

# get output prefix/suffix as string
lhs <- rlang::f_lhs(formula) %>%
all.vars()

# get operation as string
# deparse() can have line breaks; paste0() brings it back to one line
rhs <- rlang::f_rhs(formula) %>%
deparse() %>%
paste0(collapse = "")

# dummy function to cover for bquote escaping
. <- function(x) x

for (i in colNames) {
if (isPrefix) {
aliases <- paste0(vars, separator, i)
newCol <- paste0(lhs, separator, i)
} else {
aliases <- paste0(i, separator, vars)
newCol <- paste0(i, separator, lhs)
}

if (length(lhs) == 0) newCol <- i

mapping <- rlang::list2(!!!aliases)
names(mapping) <- vars
mapping <- do.call(wrapr::qc, mapping)

df <- rlang::expr(wrapr::let(
mapping,
df %>% dplyr::mutate(!!newCol := ...RHS...)
)) %>%
deparse() %>%
gsub(
pattern = "...RHS...",
replacement = rhs
) %>%
{eval(parse(text = .))}
}

return(df)
}

df <- data.frame(a_data1 = 1:3, b_data1 = 2:4,
a_data2 = 3:5, b_data2 = 4:6,
static = 5:7)

mutateSet(df, "data1", ~ a + b)
#> a_data1 b_data1 a_data2 b_data2 static data1
#> 1 1 2 3 4 5 3
#> 2 2 3 4 5 6 5
#> 3 3 4 5 6 7 7
mutateSet(df, c("data1", "data2"), x ~ sqrt(a) + b)
#> a_data1 b_data1 a_data2 b_data2 static x_data1 x_data2
#> 1 1 2 3 4 5 3.000000 5.732051
#> 2 2 3 4 5 6 4.414214 7.000000
#> 3 3 4 5 6 7 5.732051 8.236068
mutateSet(df, c("data1", "data2"), ~ a + b + .(static))
#> a_data1 b_data1 a_data2 b_data2 static data1 data2
#> 1 1 2 3 4 5 8 12
#> 2 2 3 4 5 6 11 15
#> 3 3 4 5 6 7 14 18

Created on 2020-04-28 by the reprex package (v0.3.0)

This can probably be cleaned up (especially that heinous for-loop), but it works for now.

Repeating @user12728748's performance test, we see this is ~100x faster:

suppressPackageStartupMessages({
invisible(lapply(c("dplyr", "tidyr", "rlang", "wrapr", "microbenchmark"),
require, character.only = TRUE))
})

polymutate <- function(df, formula) {
form <- rlang::f_rhs(formula)

df %>%
mutate(rn = row_number()) %>%
pivot_longer(cols = -rn, names_to = c('.value', 'grp'),
names_sep = "_") %>%
group_by(grp) %>%
transmute(rn, new = eval(form)) %>%
pivot_wider(names_from = grp, values_from = new) %>%
select(-rn) %>%
bind_cols(df, .)
}

set.seed(1)
df <- setNames(data.frame(matrix(sample(1:12, 6E6, replace=TRUE), ncol=6)),
c("a_data1", "b_data1", "a_data2", "b_data2", "a_data3", "b_data3"))

pd <- polymutate(df, ~ a + b)
pd2 <- mutateSet(df, c("data1", "data2", "data3"), ~ a + b)

all.equal(pd, pd2)
#> [1] TRUE

microbenchmark(polymutate(df, ~ a + b),
mutateSet(df, c("data1", "data2", "data3"), ~ a + b),
times=10L)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> polymutate 1612.306 1628.9776 1690.78586 1670.15600 1741.3490 1806.1412 10
#> mutateSet 8.757 9.6302 13.27135 10.45965 19.2976 20.4657 10


Related Topics



Leave a reply



Submit