Using functions of multiple columns in a dplyr mutate_at call
This was answered by @eipi10 in @eipi10's comment on the question, but I'm writing it here for posterity.
The solution here is to use:
df %>%
mutate_at(.vars = vars(y, z),
.funs = list(~ ifelse(x, ., NA)))
You can also use the new across()
function with mutate()
, like so:
df %>%
mutate(across(c(y, z), ~ ifelse(x, ., NA)))
The use of the formula operator (as in ~ ifelse(...)
) here indicates that ifelse(x, ., NA)
is an anonymous function that is being defined within the call to mutate_at()
.
This works similarly to defining the function outside of the call to mutate_at()
, like so:
temp_fn <- function(input) ifelse(test = df[["x"]],
yes = input,
no = NA)
df %>%
mutate_at(.vars = vars(y, z),
.funs = temp_fn)
Note on syntax changes in dplyr: Prior to dplyr version 0.8.0, you would simply write .funs = funs(ifelse(x, . , NA))
, but the funs()
function is being deprecated and will soon be removed from dplyr.
How to mutate_at multiple columns on a condition on each value?
The FUN
is not an argument in mutate_at
. In the new version, the earlier used fun
is deprecated with list(~
or simply ~
. Also, wrap the columns to select in vars
. It can also be unquoted or use vars(starts_with("H"))
or vars(matches("^H\\d+$"))
. Also, replace the 'Hour' with .
library(dplyr)
df %>%
mutate_at(vars(c("H1", "H2", "H3")), ~ifelse( . < 0,
.*total_neg/10, .*total_pos/10))
#. ID Date total_neg total_pos H1 H2 H3
#1 1 1 1 4 2 2 -1
#2 1 2 1 5 2 -1 3
#3 1 3 0 2 0 1 1
#4 2 1 0 4 2 2 0
#5 2 2 2 5 -1 -1 5
Use mutate_at with contains to apply function on multiple columns
Use matches
library(dplyr)
df %>%
mutate_at(vars(matches('a', 'b')), sqrt)
Or specify the match
as a character vector as the documentation says
match - A character vector. If length > 1, the union of the matches is taken.
df %>%
mutate_at(vars(contains(match = c('a', 'b'))), sqrt)
ab ba c dc
1 1.000000 2.449490 11 16
2 1.414214 2.645751 12 17
3 1.732051 2.828427 13 18
4 2.000000 3.000000 14 19
5 2.236068 3.162278 15 20
_at/_all
are deprecated in favor of across
df %>%
mutate(across(matches('a', 'b'), sqrt))
-output
ab ba c dc
1 1.000000 2.449490 11 16
2 1.414214 2.645751 12 17
3 1.732051 2.828427 13 18
4 2.000000 3.000000 14 19
5 2.236068 3.162278 15 20
dplyr mutate_at function applied to multiple columns - using dynamic column names
You can try using as.name
before applying !!
:
x %>% mutate_at(vars(one_of(numerator_vars)), funs(share =
ifelse(!!(as.name(denominator_var)), round(./!!(as.name(denominator_var))) * 100, 2)))
# a b c d b_share c_share d_share
# 1 10 1 1 8 0 0 100
# 2 20 2 2 16 0 0 100
# 3 30 3 3 24 0 0 100
# 4 0 0 0 0 2 2 2
Using mutate_at, starts_with and case_when with list to change values of multiple columns
You didn't give an expected output, but I believe this is what you want.
We should use across()
now like this, as mutate_at
is deprecated. across
takes two arguments; one to select the columns and the second is the function which should be applied to those columns. Also, no need to wrap the case_when
statement in list
.
df %>%
mutate(
across(starts_with('q11'),
function(x) case_when(
x == 1 ~ "Yes",
x == 2 ~ "No",
x == 3 ~ "Maybe",
x == 4 ~ "Don't know"
)
)
)
q11_1 q11_2 q12_1 q12_2
1 No No 4 1
2 No Yes 4 2
3 Yes Don't know 1 2
4 Don't know Don't know 1 2
5 Don't know Yes 3 1
6 No Maybe 4 1
7 Yes No 3 3
8 Don't know Maybe 3 3
9 Maybe Maybe 3 4
10 Don't know Don't know 4 1
11 Maybe No 1 2
12 Maybe Maybe 1 2
13 Maybe Don't know 4 1
14 Yes No 3 4
15 Don't know Yes 2 3
16 Maybe No 3 4
17 Yes No 4 1
18 Yes Don't know 3 4
19 Maybe No 3 2
20 Maybe Maybe 1 3
21 Yes Maybe 2 2
22 Yes Maybe 1 3
23 Don't know Maybe 1 2
24 Maybe Yes 2 4
25 No Yes 1 3
Using dplyr mutate_at when a function takes multiple arguments which are different columns
Old question, but I agree with Jesse that you need to tidy your data a bit. gather
would be the way to go, but it lacks somehow the possibility of stats::reshape
where you can specify groups of columns to gather. So here's a solution with reshape
:
df %>%
reshape(varying = list(c("x_1", "y_1"), c("x_2", "y_2")),
times = c("x", "y"),
direction = "long") %>%
mutate(x = ifelse(is.na(x_1), x_2, x_1)) %>%
reshape(idvar = "id",
timevar = "time",
direction = "wide") %>%
rename_all(funs(gsub("[a-zA-Z]+(_*)([0-9]*)\\.([a-zA-Z]+)", "\\3\\1\\2", .)))
# id x_1 x_2 x y_1 y_2 y
# 1 1 1 1 1 NA 5 5
# 2 2 NA 2 2 2 6 2
# 3 3 3 4 3 1 7 1
In order to do that with any number of column pairs, you could do something like:
df2 <- setNames(cbind(df, df), c(t(outer(letters[23:26], 1:2, paste, sep = "_"))))
v <- split(names(df2), purrr::map_chr(names(df2), ~ gsub(".*_(.*)", "\\1", .)))
n <- unique(purrr::map_chr(names(df2), ~ gsub("_[0-9]+", "", .) ))
df2 %>%
reshape(varying = v,
times = n,
direction = "long") %>%
mutate(x = ifelse(is.na(!!sym(v[[1]][1])), !!sym(v[[2]][1]), !!sym(v[[1]][1]))) %>%
reshape(idvar = "id",
timevar = "time",
direction = "wide") %>%
rename_all(funs(gsub("[a-zA-Z]+(_*)([0-9]*)\\.([a-zA-Z]+)", "\\3\\1\\2", .)))
# id w_1 w_2 w x_1 x_2 x y_1 y_2 y z_1 z_2 z
# 1 1 1 1 1 NA 5 5 1 1 1 NA 5 5
# 2 2 NA 2 2 2 6 2 NA 2 2 2 6 2
# 3 3 3 4 3 1 7 1 3 4 3 1 7 1
This assumes that columns which should be compared are next to each other and that all columns for with possible NA values are in columns suffixed by _1
and the replacement value columns are sufficed by _2
.
dplyr::mutate_at() relying on multiple columns with a given prefix/suffix
After adopting @akrun's elegant solution, I noticed it was unfortunately very inefficient (since it has to recreate two dataframes), taking almost a second on a dataset with 20,000 rows and 11 "groups".
So a while ago I developed the following function (with a bit of help from @user12728748... sorry for not posting here sooner), which takes the names of the groups ("data1", "data2", etc) and a formula using the prefixes, allowing for bquote
-style quoting for constant names:
suppressPackageStartupMessages(library(dplyr))
mutateSet <- function(df, colNames, formula,
isPrefix = TRUE,
separator = "_") {
vars <- all.vars(formula)
# extracts names wrapped in `.()`
escapedNames <- function (expr)
{
unquote <- function(e) {
if (is.pairlist(e) || length(e) <= 1L) NULL
else if (e[[1L]] == as.name(".")) deparse(e[[2L]])
else unlist(sapply(e, unquote))
}
unquote(substitute(expr))
}
escapedVars <- eval(rlang::expr(escapedNames(!!formula)))
# remove escaped names from mapping variables
vars <- setdiff(vars, escapedVars)
# get output prefix/suffix as string
lhs <- rlang::f_lhs(formula) %>%
all.vars()
# get operation as string
# deparse() can have line breaks; paste0() brings it back to one line
rhs <- rlang::f_rhs(formula) %>%
deparse() %>%
paste0(collapse = "")
# dummy function to cover for bquote escaping
. <- function(x) x
for (i in colNames) {
if (isPrefix) {
aliases <- paste0(vars, separator, i)
newCol <- paste0(lhs, separator, i)
} else {
aliases <- paste0(i, separator, vars)
newCol <- paste0(i, separator, lhs)
}
if (length(lhs) == 0) newCol <- i
mapping <- rlang::list2(!!!aliases)
names(mapping) <- vars
mapping <- do.call(wrapr::qc, mapping)
df <- rlang::expr(wrapr::let(
mapping,
df %>% dplyr::mutate(!!newCol := ...RHS...)
)) %>%
deparse() %>%
gsub(
pattern = "...RHS...",
replacement = rhs
) %>%
{eval(parse(text = .))}
}
return(df)
}
df <- data.frame(a_data1 = 1:3, b_data1 = 2:4,
a_data2 = 3:5, b_data2 = 4:6,
static = 5:7)
mutateSet(df, "data1", ~ a + b)
#> a_data1 b_data1 a_data2 b_data2 static data1
#> 1 1 2 3 4 5 3
#> 2 2 3 4 5 6 5
#> 3 3 4 5 6 7 7
mutateSet(df, c("data1", "data2"), x ~ sqrt(a) + b)
#> a_data1 b_data1 a_data2 b_data2 static x_data1 x_data2
#> 1 1 2 3 4 5 3.000000 5.732051
#> 2 2 3 4 5 6 4.414214 7.000000
#> 3 3 4 5 6 7 5.732051 8.236068
mutateSet(df, c("data1", "data2"), ~ a + b + .(static))
#> a_data1 b_data1 a_data2 b_data2 static data1 data2
#> 1 1 2 3 4 5 8 12
#> 2 2 3 4 5 6 11 15
#> 3 3 4 5 6 7 14 18
Created on 2020-04-28 by the reprex package (v0.3.0)
This can probably be cleaned up (especially that heinous for-loop), but it works for now.
Repeating @user12728748's performance test, we see this is ~100x faster:
suppressPackageStartupMessages({
invisible(lapply(c("dplyr", "tidyr", "rlang", "wrapr", "microbenchmark"),
require, character.only = TRUE))
})
polymutate <- function(df, formula) {
form <- rlang::f_rhs(formula)
df %>%
mutate(rn = row_number()) %>%
pivot_longer(cols = -rn, names_to = c('.value', 'grp'),
names_sep = "_") %>%
group_by(grp) %>%
transmute(rn, new = eval(form)) %>%
pivot_wider(names_from = grp, values_from = new) %>%
select(-rn) %>%
bind_cols(df, .)
}
set.seed(1)
df <- setNames(data.frame(matrix(sample(1:12, 6E6, replace=TRUE), ncol=6)),
c("a_data1", "b_data1", "a_data2", "b_data2", "a_data3", "b_data3"))
pd <- polymutate(df, ~ a + b)
pd2 <- mutateSet(df, c("data1", "data2", "data3"), ~ a + b)
all.equal(pd, pd2)
#> [1] TRUE
microbenchmark(polymutate(df, ~ a + b),
mutateSet(df, c("data1", "data2", "data3"), ~ a + b),
times=10L)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> polymutate 1612.306 1628.9776 1690.78586 1670.15600 1741.3490 1806.1412 10
#> mutateSet 8.757 9.6302 13.27135 10.45965 19.2976 20.4657 10
Related Topics
Table of Interactions - Case With Pets and Houses
How to Count Runs in a Sequence
How to Remove Outliers from a Dataset
All Levels of a Factor in a Model Matrix in R
How to Install Packages in Latest Version of Rstudio and R Version.3.1.1
Check If the Number Is Integer
Putting Mathematical Symbols and Subscripts Mixed With Regular Letters
Windows 7, Update.Packages Problem: "Unable to Move Temporary Installation"
Ggplot Bar Plot With Facet-Dependent Order of Categories
Formatting Dates on X Axis in Ggplot2
Dummify Character Column and Find Unique Values
R Conditional Evaluation When Using the Pipe Operator %≫%
R Install.Packages Returns "Failed to Create Lock Directory"
Extract Regression Coefficient Values