How to Pass Multiple group_by Arguments and a Dynamic Variable Argument to a dplyr Function

You haven't provided sample data, but your function works when modified to use the mtcars data frame.

library(tidyverse)
library(formattable)

quantileMaker3 <- function(data, calcCol, ...) {
  # Capture the grouping columns passed through ... and the column to summarise
  groupCol <- quos(...)
  calcCol <- enquo(calcCol)

  data %>%
    group_by(!!!groupCol) %>%
    summarise('25%' = currency(quantile(!!calcCol, probs = 0.25), digits = 2L),
              '50%' = currency(quantile(!!calcCol, probs = 0.50), digits = 2L),
              '75%' = currency(quantile(!!calcCol, probs = 0.75), digits = 2L),
              avg = currency(mean(!!calcCol), digits = 2L),
              nAgencies = n_distinct(cyl),
              nFTEs = sum(hp)
    )
}

quantileMaker3(mtcars, mpg, cyl)
# A tibble: 3 x 7
cyl `25%` `50%` `75%` avg nAgencies nFTEs
<dbl> <S3: formattable> <S3: formattable> <S3: formattable> <S3: formattable> <int> <dbl>
1 4. $22.80 $26.00 $30.40 $26.66 1 909.
2 6. $18.65 $19.70 $21.00 $19.74 1 856.
3 8. $14.40 $15.20 $16.25 $15.10 1 2929.

With multiple grouping arguments:

quantileMaker3(mtcars, mpg, cyl, vs)
# A tibble: 5 x 8
# Groups: cyl [?]
cyl vs `25%` `50%` `75%` avg nAgencies nFTEs
<dbl> <dbl> <S3: formattable> <S3: formattable> <S3: formattable> <S3: formattable> <int> <dbl>
1 4. 0. $26.00 $26.00 $26.00 $26.00 1 91.
2 4. 1. $22.80 $25.85 $30.40 $26.73 1 818.
3 6. 0. $20.35 $21.00 $21.00 $20.57 1 395.
4 6. 1. $18.03 $18.65 $19.75 $19.12 1 461.
5 8. 0. $14.40 $15.20 $16.25 $15.10 1 2929.
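
With dplyr 1.0.0 or later, the same pattern can probably be written without quos() and enquo(), by forwarding ... straight to group_by and embracing the summary column with {{ }}. The sketch below is only an illustration: the name quantile_summary and the plain numeric output (no currency formatting, no nAgencies or nFTEs columns) are assumptions, not part of the original function.

```r
library(dplyr)

# Hypothetical simplified variant of quantileMaker3 (name and output columns are assumptions)
quantile_summary <- function(data, calc_col, ...) {
  data %>%
    group_by(...) %>%                      # dots are forwarded unchanged
    summarise(
      q25 = quantile({{ calc_col }}, probs = 0.25),
      q50 = quantile({{ calc_col }}, probs = 0.50),
      q75 = quantile({{ calc_col }}, probs = 0.75),
      avg = mean({{ calc_col }}),
      .groups = "drop"                     # return an ungrouped tibble
    )
}

res <- quantile_summary(mtcars, mpg, cyl, vs)
```

Dropping the groups at the end avoids the "Groups: cyl" attribute seen in the output above, which can surprise later steps in a pipeline.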

Incidentally, you can avoid multiple calls to quantile by using nesting. This won't work if any of the output columns are of class formattable (which is what the currency function returns), so I've changed the function to create strings for the currency-format columns.

quantileMaker3 <- function(data, calcCol, ..., quantiles=c(0.25,0.5,0.75)) {

  groupCol <- quos(...)
  calcCol <- enquo(calcCol)

  data %>%
    group_by(!!!groupCol) %>%
    # One quantile() call per group; values and their labels are stored as list columns
    summarise(values = list(paste0("$", sprintf("%1.2f", quantile(!!calcCol, probs=quantiles)))),
              qnames = list(sprintf("%1.0f%%", quantiles*100)),
              nAgencies = n_distinct(cyl),
              nFTEs = sum(hp),
              avg = paste0("$", sprintf("%1.2f", mean(!!calcCol)))
    ) %>%
    # Expand the list columns to long form, then spread to one column per quantile
    unnest %>%
    spread(qnames, values)
}

quantileMaker3(mtcars, mpg, cyl, vs)
# A tibble: 5 x 8
# Groups: cyl [3]
cyl vs nAgencies nFTEs avg `25%` `50%` `75%`
<dbl> <dbl> <int> <dbl> <chr> <chr> <chr> <chr>
1 4. 0. 1 91. $26.00 $26.00 $26.00 $26.00
2 4. 1. 1 818. $26.73 $22.80 $25.85 $30.40
3 6. 0. 1 395. $20.57 $20.35 $21.00 $21.00
4 6. 1. 1 461. $19.12 $18.03 $18.65 $19.75
5 8. 0. 1 2929. $15.10 $14.40 $15.20 $16.25
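
For readers on tidyr 1.0.0 or later, the single-quantile-call idea above might be sketched with unnest(cols = ...) and pivot_wider in place of the bare unnest and spread. This is an assumption-laden variant, not the original answer: the name quantile_wide is hypothetical, and the quantiles are kept numeric rather than converted to currency strings.

```r
library(dplyr)
library(tidyr)

# Hypothetical variant: one quantile() call per group, reshaped to wide form
quantile_wide <- function(data, calc_col, ..., probs = c(0.25, 0.5, 0.75)) {
  data %>%
    group_by(...) %>%
    summarise(
      qname = list(sprintf("%1.0f%%", probs * 100)),               # column labels
      value = list(unname(quantile({{ calc_col }}, probs = probs))), # numeric quantiles
      .groups = "drop"
    ) %>%
    unnest(cols = c(qname, value)) %>%                             # long form: one row per quantile
    pivot_wider(names_from = qname, values_from = value)           # one column per quantile
}

res <- quantile_wide(mtcars, mpg, cyl)
```

Keeping the values numeric means formatting can be applied once at the end, after any further calculations.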

dplyr: Handing over multiple variables to group_by in a function

To pass multiple grouping variables, supply them as a character vector and use group_by_at:

myfunction <- function(mydf, grp, xvar) {  
mydf %>%
group_by_at(grp) %>%
summarise(sum = sum({{xvar}}))
}

myfunction(mtcars, "am", mpg)
# A tibble: 2 x 2
# am sum
# <dbl> <dbl>
#1 0 326.
#2 1 317.
myfunction(mtcars, c("am", "gear"), mpg)
# A tibble: 4 x 3
# Groups: am [2]
# am gear sum
# <dbl> <dbl> <dbl>
#1 0 3 242.
#2 0 4 84.2
#3 1 4 210.
#4 1 5 107.

If we want to pass the groups unquoted, as shown in the OP's post, one way is to capture the argument with enexpr, convert it to a list, and splice it into group_by with !!!:

myfunction <- function(mydf, grp, xvar) {  
grp <- as.list(rlang::enexpr(grp))
grp <- if(length(grp) > 1) grp[-1] else grp

mydf %>%
group_by(!!! grp) %>%
summarise(sum = sum({{xvar}}))

}

myfunction(mtcars, am, mpg)
# A tibble: 2 x 2
# am sum
# <dbl> <dbl>
#1 0 326.
#2 1 317.
myfunction(mtcars, c(am, gear), mpg)
# A tibble: 4 x 3
# Groups: am [2]
# am gear sum
# <dbl> <dbl> <dbl>
#1 0 3 242.
#2 0 4 84.2
#3 1 4 210.
#4 1 5 107.

Non standard evaluation in dplyr: how do you indirect a function's multiple arguments?

You can define an argument for the data frame and use ... for the variables to group by:

testfunc <- function(df,...) {
df %>%
group_by(...) %>%
summarise(mpg = mean(mpg))
}
testfunc(mtcars,cyl,gear)
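
Because the dots are handed to group_by unchanged, they can also carry named grouping expressions, not just bare column names. A small sketch, assuming dplyr 1.0.0 or later; the function name mean_mpg_by and the column name heavy are made up for this example:

```r
library(dplyr)

# Same idea as testfunc, with the groups dropped after summarising
mean_mpg_by <- function(df, ...) {
  df %>%
    group_by(...) %>%
    summarise(mpg = mean(mpg), .groups = "drop")
}

# Group by an existing column and by a computed, named expression
res <- mean_mpg_by(mtcars, cyl, heavy = wt > 3)
```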

Dynamically construct function calls with varying arguments using dplyr and NSE

I'm not sure what the "standard" tidyverse approach is here, as I never really have a sense of whether I'm "doing it right" when I try to write generalized tidyverse functions for my typical workflows, but here's another approach.*

First, we can generate a list of combinations of grouping columns, rather than hard-coding them. In this case, the list includes all possible combinations of 1, 2, or 3 grouping columns, but that can be pared back as needed.

library(tidyverse)

# Generate a list of combinations of grouping variables.
groups.list = map(1:3, ~combn(names(df)[map_lgl(df, ~!is.numeric(.))], .x, simplify=FALSE)) %>%
flatten

Below is a summary function that uses group_by_at, which can take strings, so there's no need for non-standard evaluation. In addition, we get the group.ids values from group_vars itself, so we don't need a separate parameter or argument (though this may need to be tweaked, depending on what you expect for the names of the grouping columns).

# Summarise for each combination of groups
# Generate group.ids from group_vars itself
f2 <- function(data, group_vars) {

data %>%
group_by_at(group_vars) %>%
summarise(values=sum(values)) %>%
mutate(group.ids=paste0("var_", paste(str_extract(group_vars, "[0-9]"), collapse="_")))

}
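
The data frame df is not shown in this answer, so the following is a self-contained sketch built on a small hypothetical stand-in (its column names mimic the output below, but its values are invented). It also swaps group_by_at for group_by(across(all_of(...))), which is the form dplyr 1.0.0 and later recommend; the name sum_by is an assumption.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for the unseen `df`
df <- data.frame(
  grouping_var1 = factor(c("a", "a", "b", "b")),
  grouping_var2 = factor(c("x", "y", "x", "y")),
  values = 1:4
)

# All combinations of 1..k non-numeric columns, as a flat list of character vectors
group_cols <- names(df)[!map_lgl(df, is.numeric)]
groups_list <- map(seq_along(group_cols), ~ combn(group_cols, .x, simplify = FALSE)) %>%
  unlist(recursive = FALSE)

# Sum `values` within each combination of grouping columns given as strings
sum_by <- function(data, group_vars) {
  data %>%
    group_by(across(all_of(group_vars))) %>%
    summarise(values = sum(values), .groups = "drop")
}

results <- map(groups_list, ~ sum_by(df, .x))
```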

Now we can run the function on every element of groups.list:

map(groups.list, ~f2(df, .x))
[[1]]
# A tibble: 2 x 3
grouping_var1 values group.ids
<fct> <int> <chr>
1 a 31 var_1
2 b 24 var_1

[[2]]
# A tibble: 3 x 3
grouping_var2 values group.ids
<fct> <int> <chr>
1 x 40 var_2
2 y 11 var_2
3 z 4 var_2

[[3]]
# A tibble: 2 x 3
grouping_var3 values group.ids
<fct> <int> <chr>
1 A 24 var_3
2 B 31 var_3

[[4]]
# A tibble: 5 x 4
# Groups: grouping_var1 [2]
grouping_var1 grouping_var2 values group.ids
<fct> <fct> <int> <chr>
1 a x 19 var_1_2
2 a y 8 var_1_2
3 a z 4 var_1_2
4 b x 21 var_1_2
5 b y 3 var_1_2

[[5]]
# A tibble: 4 x 4
# Groups: grouping_var1 [2]
grouping_var1 grouping_var3 values group.ids
<fct> <fct> <int> <chr>
1 a A 9 var_1_3
2 a B 22 var_1_3
3 b A 15 var_1_3
4 b B 9 var_1_3

[[6]]
# A tibble: 4 x 4
# Groups: grouping_var2 [3]
grouping_var2 grouping_var3 values group.ids
<fct> <fct> <int> <chr>
1 x A 24 var_2_3
2 x B 16 var_2_3
3 y B 11 var_2_3
4 z B 4 var_2_3

[[7]]
# A tibble: 7 x 5
# Groups: grouping_var1, grouping_var2 [5]
grouping_var1 grouping_var2 grouping_var3 values group.ids
<fct> <fct> <fct> <int> <chr>
1 a x A 9 var_1_2_3
2 a x B 10 var_1_2_3
3 a y B 8 var_1_2_3
4 a z B 4 var_1_2_3
5 b x A 15 var_1_2_3
6 b x B 6 var_1_2_3
7 b y B 3 var_1_2_3

Or, if you want to combine all of the results, you could do something like this:

map(groups.list, ~f2(df, .x)) %>% 
bind_rows() %>%
mutate_if(is.factor, fct_explicit_na, na_level="All") %>%
select(group.ids, matches("grouping"), values)
   group.ids grouping_var1 grouping_var2 grouping_var3 values
<chr> <fct> <fct> <fct> <int>
1 var_1 a All All 31
2 var_1 b All All 24
3 var_2 All x All 40
4 var_2 All y All 11
5 var_2 All z All 4
6 var_3 All All A 24
7 var_3 All All B 31
8 var_1_2 a x All 19
9 var_1_2 a y All 8
10 var_1_2 a z All 4
11 var_1_2 b x All 21
12 var_1_2 b y All 3
13 var_1_3 a All A 9
14 var_1_3 a All B 22
15 var_1_3 b All A 15
16 var_1_3 b All B 9
17 var_2_3 All x A 24
18 var_2_3 All x B 16
19 var_2_3 All y B 11
20 var_2_3 All z B 4
21 var_1_2_3 a x A 9
22 var_1_2_3 a x B 10
23 var_1_2_3 a y B 8
24 var_1_2_3 a z B 4
25 var_1_2_3 b x A 15
26 var_1_2_3 b x B 6
27 var_1_2_3 b y B 3
* This question was cross-posted to RStudio Community and I've added this answer there as well.

How to use dplyr::group_by in a function

You can use group_by_at with column indices, for example:

countString <- function(things) {
index <- which(colnames(theTibble) %in% things)
theTibble %>%
group_by_at(index) %>%
count()
}

countString(c("animal", "sex"))

## A tibble: 4 x 3
## Groups: animal, sex [4]
# animal sex nn
# <chr> <chr> <int>
#1 cat f 2
#2 dog f 1
#3 dog m 2
#4 fish unknown 1
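
The function above depends on a global object called theTibble. A sketch that takes the data as an argument instead, assuming dplyr 1.0.0 or later, could look like this. The sample tibble is hypothetical, reconstructed only to match the counts printed above, and count_by is an invented name.

```r
library(dplyr)

# Hypothetical data chosen to reproduce the counts shown above
the_tibble <- tibble(
  animal = c("cat", "cat", "dog", "dog", "dog", "fish"),
  sex    = c("f", "f", "f", "m", "m", "unknown")
)

# Count rows per combination of the columns named in `cols` (a character vector)
count_by <- function(data, cols) {
  data %>% count(across(all_of(cols)))
}

res <- count_by(the_tibble, c("animal", "sex"))
```

Using all_of() makes the function fail loudly if a requested column is missing, whereas the %in% lookup silently ignores unknown names.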

How to pass column name as argument to function for dplyr verbs?

Here is another way of making it work. You can use .data[[var]] construct for a column name which is stored as a string:

foo <- function(data, colName) {

result <- data %>%
group_by(.data[[colName]]) %>%
summarise(count = n())

return(result)
}

foo(quakes, "stations")

# A tibble: 102 x 2
stations count
<int> <int>
1 10 20
2 11 28
3 12 25
4 13 21
5 14 39
6 15 34
7 16 35
8 17 38
9 18 33
10 19 29
# ... with 92 more rows

If you would rather not pass colName as a string, you can wrap it in double curly braces inside your function to get the same result:

foo <- function(data, colName) {

result <- data %>%
group_by({{ colName }}) %>%
summarise(count = n())

return(result)
}

foo(quakes, stations)

# A tibble: 102 x 2
stations count
<int> <int>
1 10 20
2 11 28
3 12 25
4 13 21
5 14 39
6 15 34
7 16 35
8 17 38
9 18 33
10 19 29
# ... with 92 more rows
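
The .data pronoun is not limited to group_by; it can be used the same way inside summarise when the value column also arrives as a string. A minimal sketch, where mean_by and its argument names are assumptions introduced for illustration:

```r
library(dplyr)

# Both the grouping column and the value column are passed as strings
mean_by <- function(data, group_col, value_col) {
  data %>%
    group_by(.data[[group_col]]) %>%
    summarise(mean = mean(.data[[value_col]]), .groups = "drop")
}

res <- mean_by(mtcars, "cyl", "mpg")
```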

dplyr - groupby on multiple columns using variable names

dplyr version >= 1.0

With more recent versions of dplyr, you should use across along with a tidyselect helper function. See help("language", "tidyselect") for a list of all the helper functions. In this case, if you want all columns named in a character vector, use all_of():

cols <- c("mpg","hp","wt")
mtcars %>%
group_by(across(all_of(cols))) %>%
summarize(x=mean(gear))

original answer (older versions of dplyr)

If you have a vector of variable names, you should pass them to the .dots= parameter of group_by_. For example:

mtcars %>% 
group_by_(.dots=c("mpg","hp","wt")) %>%
summarize(x=mean(gear))

Passing a list of arguments to a function with quasiquotation

You can rewrite the function using a combination of dplyr::group_by(), dplyr::across(), and curly curly embracing {{. This works with dplyr version 1.0.0 and greater.

I've edited the original example and code for clarity.

library(tidyverse)

my_data <- tribble(
~foo, ~bar, ~baz,
"A", "B", 3,
"A", "C", 5,
"D", "E", 6,
"D", "E", 1
)

sum_fun <- function(.data, group, sum_var) {
.data %>%
group_by(across({{ group }})) %>%
summarize("sum_{{sum_var}}" := sum({{ sum_var }}))
}

sum_fun(my_data, group = c(foo, bar), sum_var = baz)
#> `summarise()` has grouped output by 'foo'. You can override using the `.groups` argument.
#> # A tibble: 3 x 3
#> # Groups: foo [2]
#> foo bar sum_baz
#> <chr> <chr> <dbl>
#> 1 A B 3
#> 2 A C 5
#> 3 D E 7

Created on 2021-09-06 by the reprex package (v2.0.0)

Cannot pass additional arguments to group_map

I am not quite sure what you need, but you can get group_map working like this:

library(dplyr)
data(mtcars)
myFunction2 <- function(data, sumFirst) {
sumFirst
}
mtcars %>% group_by(carb) %>% group_map(~myFunction2(.x,2))

Based on my limited knowledge of the inner workings of group_map, the vignette says:

If a function, it is used as is. It should have at least 2 formal
arguments.

If I read the source code correctly, the first argument is the data and the second is the keys. A minimal skeleton that works, using a different function so we can see it really does something, is:

by_carb <- mtcars %>% group_by(carb)
group_map(by_carb,.f=function(data,keys)colMeans(data))

Now if you want to pass a custom function, then it will be:

group_map(by_carb,.f=function(data,keys,func)func(data),func=colMeans)

You can check the results, which I will not print here. It's the same as the following, which is easier to write (I think):

group_map(by_carb,~colMeans(.x))
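
To tie this back to the question, here is a short sketch of passing an extra argument through group_map. It relies on group_map forwarding its ... to .f; the function scaled_means and its multiplier argument are hypothetical names used only for this example.

```r
library(dplyr)

by_carb <- mtcars %>% group_by(carb)

# .f takes the group data, the group keys, and then any extra arguments
scaled_means <- function(data, keys, multiplier) {
  colMeans(data) * multiplier
}

# `multiplier = 2` is forwarded to scaled_means through group_map's dots
res <- group_map(by_carb, scaled_means, multiplier = 2)
```

With the formula shorthand the extra value can simply be written inline, for example ~ colMeans(.x) * 2, which is why the named-function form is mainly useful when the helper is reused elsewhere.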