Using Anonymous Functions with summarize_each or mutate_each

It's a matter of using a lot of parentheses so everything gets evaluated:

library(dplyr)

# Apply an immediately invoked anonymous function to every column
df_foo %>% 
  summarize_each(funs(((function(bar){sum(bar/10)})(.))))
#
# Source: local data frame [1 x 2]
#
#          x          y
#      (dbl)      (dbl)
# 1 1.113599 -0.4766853

where you need

  • parentheses around the function definition so it gets defined,
  • a set of parentheses containing . to tell funs which argument should receive the data passed to it (seemingly redundant with single-parameter functions, but not so with multi-parameter ones; see ?funs for more examples), and
  • parentheses around the whole thing to actually evaluate it,

which is kind of ridiculous, but that seems to be the most concise funs can handle. It makes some sense if you look at what you'd have to write to evaluate a similar anonymous function on its own line, e.g.

(function(bar){sum(bar/10)})(df_foo$x)

though the pair wrapping the whole thing is extra for funs. You can use braces {} instead for the outer pair if you prefer, which might make more syntactic sense.
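As a quick check outside dplyr, the same immediately invoked pattern runs in plain R. This is a minimal sketch: the vector x is hypothetical data standing in for df_foo$x, and it plays the role that . plays inside funs.

```r
# Hypothetical stand-in for df_foo$x
x <- c(10, 20, 30)

# Define the anonymous function inside parentheses, then call it
# immediately by appending an argument list
res <- (function(bar) { sum(bar / 10) })(x)

# Braces also work as the outer grouping, because `{` evaluates to its last value
res_braces <- { (function(bar) { sum(bar / 10) })(x) }

print(res)
# [1] 6
```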

How to use anonymous functions for mutate_each (and summarise_each)?

We can wrap the function definition, and then the call to it, in parentheses:

library(dplyr)

# Halve every column with an immediately invoked anonymous function
df %>%
  mutate_each(funs(((function(x){x/2})(.))))
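In dplyr 1.0.0 and later, mutate_each (and its replacement mutate_all) is superseded by across(), which accepts an anonymous function directly, so the extra parentheses are no longer needed. A minimal sketch, assuming a recent dplyr; the data frame df below is hypothetical:

```r
library(dplyr)

# Hypothetical data frame standing in for `df`
df <- data.frame(a = c(2, 4), b = c(10, 20))

# across() takes the anonymous function as-is; no funs() or extra parentheses
halved <- df %>% mutate(across(everything(), function(x) x / 2))

# A formula lambda works the same way; `.x` stands for the current column
halved_formula <- df %>% mutate(across(everything(), ~ .x / 2))

halved
```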

Using an anonymous function in mutate

Apparently what you need is a whole bunch of parentheses. See https://stackoverflow.com/a/36906989/3277050

In your situation it looks like:

files.split.df <- files.paths.df %>% 
  mutate(
    # `.` here is the whole data frame piped in, not the current row
    no.ext = (function(x) {sub(paste0(".", x["extension"], "$"), "", x["file"])})(.)
  )

So it seems that if you wrap the whole function definition in parentheses, you can then treat it like a regular function and supply arguments to it.

New Answer

Really, this is not the right way to use mutate at all, though. I focused on the anonymous function part first without looking at what you are actually doing. What you need is a version of sub that is vectorized over the pattern as well as the input: sub only uses the first element of pattern (with a warning). So I used str_replace from the stringr package, which is vectorized over both. Then you can just refer to columns by name, because that is the beauty of dplyr:

library(tidyr)
library(dplyr)
library(stringr)

# "\\." escapes the dot so it matches a literal "." rather than any character
files.split.df <- files.paths.df %>%
  mutate(
    no.ext = str_replace(file, paste0("\\.", extension, "$"), ""))
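A self-contained sketch of the str_replace approach, using a small hypothetical files.paths.df. The third row shows why the dot is escaped: "notes_txt" has no ".txt" suffix, so it is left alone, whereas an unescaped . would match the underscore and strip "_txt".

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for files.paths.df
files.paths.df <- data.frame(
  file = c("report.txt", "data.backup.csv", "notes_txt"),
  extension = c("txt", "csv", "txt"),
  stringsAsFactors = FALSE
)

# str_replace() is vectorized over both `string` and `pattern`,
# so each row is matched against its own extension
files.split.df <- files.paths.df %>%
  mutate(no.ext = str_replace(file, paste0("\\.", extension, "$"), ""))

files.split.df$no.ext
# -> "report", "data.backup", "notes_txt"
```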

Edit to Answer Comment

To use a user-defined function where there isn't an existing vectorized function, you could use Vectorize like this:

# Vectorize() wraps the function in mapply(), so it runs once per (extension, file) pair
string_fun <- Vectorize(function(x, y) {sub(paste0("\\.", x, "$"), "", y)})
files.split.df <- files.paths.df %>%
  mutate(
    no.ext = string_fun(extension, file))
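A base-R sketch of what Vectorize buys you, with hypothetical vectors ext and path in place of the data frame columns; strip_ext is an illustrative name. Called directly, sub() applies only the first pattern to every element (and warns), while the vectorized wrapper pairs each extension with its own path. USE.NAMES = FALSE is an optional addition: without it, mapply() names the result after the values of the first argument.

```r
# Hypothetical helper built on sub(); sub() is vectorized over its input
# but uses only the first element of `pattern`
strip_ext <- function(ext, file) sub(paste0("\\.", ext, "$"), "", file)

# Vectorize() calls strip_ext once per (ext, file) pair via mapply();
# USE.NAMES = FALSE keeps the result unnamed
string_fun <- Vectorize(strip_ext, USE.NAMES = FALSE)

ext  <- c("txt", "csv")
path <- c("report.txt", "data.csv")

# Unvectorized: only the "txt" pattern is used, so "data.csv" is unchanged
unvectorized <- suppressWarnings(strip_ext(ext, path))

# Vectorized: each path loses its own extension
vectorized <- string_fun(ext, path)
# -> c("report", "data")
```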

Or if you really don't want to name the function, which I do not recommend as it is much harder to read:

files.split.df <- files.paths.df %>% 
  mutate(
    no.ext = (Vectorize(function(x, y) {sub(paste0("\\.", x, "$"), "", y)}))(extension, file))

Update a subset of a df with mutate_each

As @alistaire commented, you can use mutate_at to convert only the date columns and keep the rest of the data frame unchanged, so you avoid having to bind the original data frame back together with the converted subset:

library(dplyr)
muX <- x %>% mutate_at(vars(contains('date')), funs(as.Date(., origin="1900-01-01")))

head(muX)
#        date1      date2       var1      var2
# 1 2021-11-09 2038-10-20  44.524710  86.15957
# 2 2020-06-04 2037-08-04  31.402905  94.74633
# 3 2023-12-22 2038-03-06  31.600929  85.90605
# 4 2020-05-08 2037-01-02   7.140777  82.80565
# 5 2025-03-25 2038-07-30 -54.913577 100.83949
# 6 2021-02-18 2034-06-20  28.616670  93.92246

And also according to ?mutate_at:

summarise_each() and mutate_each() are older variants that will be
deprecated in the future.

Better get used to these new APIs.
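Since dplyr 1.0.0, mutate_at() is itself superseded by across(). A sketch of both forms without funs(), assuming a small hypothetical x whose date columns hold day counts since 1900-01-01:

```r
library(dplyr)

# Hypothetical data: day counts in the date columns, plus a numeric column
x <- data.frame(date1 = c(0, 1), date2 = c(31, 59), var1 = c(44.5, 31.4))

# mutate_at() with the formula wrapped in list() instead of funs()
mu_at <- x %>%
  mutate_at(vars(contains("date")), list(~ as.Date(., origin = "1900-01-01")))

# The across() form; `.x` is the current column
mu_across <- x %>%
  mutate(across(contains("date"), ~ as.Date(.x, origin = "1900-01-01")))

mu_across
```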

How to change the now deprecated dplyr::funs() which includes an ifelse argument?

As of dplyr 0.8.0, the documentation states that we should use list instead of funs, giving the example:

Before:

funs(name = f(.))

After:

list(name = ~f(.))

So here, the call funs(ifelse(is.character(.), trimws(.),.)) can instead become list(~ifelse(is.character(.), trimws(.),.)). This uses the tidyverse formula notation for anonymous functions: a one-sided formula (an expression beginning with ~) is interpreted as a function of one argument, and wherever that argument would go in the body you write . (or .x). You can still use full functions inside list.
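A minimal sketch of the list() replacement on a hypothetical data frame df, showing that the formula and an equivalent full function give the same result:

```r
library(dplyr)

# Hypothetical data: one character column with stray whitespace
df <- data.frame(id = 1:2, label = c("  apple ", "pear  "),
                 stringsAsFactors = FALSE)

# Formula inside an unnamed list: `.` stands for the column being transformed
out_formula <- df %>% mutate_if(is.character, list(~ trimws(.)))

# The same thing written as a full anonymous function
out_function <- df %>% mutate_if(is.character, list(function(col) trimws(col)))

out_formula$label
# -> "apple", "pear"
```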

Note the difference between the .funs argument of mutate_if and the funs() function, which wrapped other functions to pass to .funs; i.e. .funs = gsub still works. You only needed funs() if you wanted to apply multiple functions to the selected columns, or to name the results by passing the functions as named arguments. You can do all the same things with list().

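You are also duplicating work by putting ifelse inside mutate_if: mutate_if has already selected only the character columns, so there is no need to check again with ifelse. Worse, is.character(.) returns a single TRUE, and ifelse returns a result the same length as its test, so you get only the first trimmed value, recycled down the whole column. The line can be simplified to mutate_if(is.character, trimws), and since you apply only one function, you need neither funs nor list.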
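A sketch of the simplified call on hypothetical data, alongside the across() form that supersedes mutate_if() in dplyr 1.0.0 and later. The last step runs the ifelse() version so you can see what it actually returns:

```r
library(dplyr)

# Hypothetical data with one character and one integer column
df <- data.frame(id = 1:3, label = c(" a", "b ", " c "),
                 stringsAsFactors = FALSE)

# One function, so no funs() or list() wrapper is needed
trimmed_if <- df %>% mutate_if(is.character, trimws)

# The across() equivalent; where() selects columns by predicate
trimmed_across <- df %>% mutate(across(where(is.character), trimws))

# The ifelse() version: the length-one test yields a single value,
# which mutate() recycles to every row
pitfall <- df %>%
  mutate_if(is.character, list(~ ifelse(is.character(.), trimws(.), .)))

trimmed_if$label  # -> "a", "b", "c"
pitfall$label     # -> "a", "a", "a"
```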


