How to pass multiple group_by arguments and a dynamic variable argument to a dplyr function
You haven't provided sample data, but your function works when modified to use the mtcars
data frame.
library(tidyverse)
library(formattable)
quantileMaker3 <- function(data, calcCol, ...) {
  groupCol <- quos(...)
  calcCol <- enquo(calcCol)
  data %>%
    group_by(!!!groupCol) %>%
    summarise('25%' = currency(quantile(!!calcCol, probs = 0.25), digits = 2L),
              '50%' = currency(quantile(!!calcCol, probs = 0.50), digits = 2L),
              '75%' = currency(quantile(!!calcCol, probs = 0.75), digits = 2L),
              avg = currency(mean(!!calcCol), digits = 2L),
              nAgencies = n_distinct(cyl),
              nFTEs = sum(hp)
    )
}
quantileMaker3(mtcars, mpg, cyl)
# A tibble: 3 x 7
cyl `25%` `50%` `75%` avg nAgencies nFTEs
<dbl> <S3: formattable> <S3: formattable> <S3: formattable> <S3: formattable> <int> <dbl>
1 4. $22.80 $26.00 $30.40 $26.66 1 909.
2 6. $18.65 $19.70 $21.00 $19.74 1 856.
3 8. $14.40 $15.20 $16.25 $15.10 1 2929.
With multiple grouping arguments:
quantileMaker3(mtcars, mpg, cyl, vs)
# A tibble: 5 x 8
# Groups: cyl [?]
cyl vs `25%` `50%` `75%` avg nAgencies nFTEs
<dbl> <dbl> <S3: formattable> <S3: formattable> <S3: formattable> <S3: formattable> <int> <dbl>
1 4. 0. $26.00 $26.00 $26.00 $26.00 1 91.
2 4. 1. $22.80 $25.85 $30.40 $26.73 1 818.
3 6. 0. $20.35 $21.00 $21.00 $20.57 1 395.
4 6. 1. $18.03 $18.65 $19.75 $19.12 1 461.
5 8. 0. $14.40 $15.20 $16.25 $15.10 1 2929.
Incidentally, you can avoid multiple calls to quantile by using nesting. This won't work if any of the output columns are of class formattable (which is what the currency function returns), so I've changed the function to create strings for the currency-format columns.
quantileMaker3 <- function(data, calcCol, ..., quantiles=c(0.25,0.5,0.75)) {
  groupCol <- quos(...)
  calcCol <- enquo(calcCol)
  data %>%
    group_by(!!!groupCol) %>%
    summarise(values = list(paste0("$", sprintf("%1.2f", quantile(!!calcCol, probs=quantiles)))),
              qnames = list(sprintf("%1.0f%%", quantiles*100)),
              nAgencies = n_distinct(cyl),
              nFTEs = sum(hp),
              avg = paste0("$", sprintf("%1.2f", mean(!!calcCol)))
    ) %>%
    # tidyr >= 1.0.0 expects the list-columns to be named explicitly
    unnest(cols = c(values, qnames)) %>%
    spread(qnames, values)
}
quantileMaker3(mtcars, mpg, cyl, vs)
# A tibble: 5 x 8
# Groups: cyl [3]
cyl vs nAgencies nFTEs avg `25%` `50%` `75%`
<dbl> <dbl> <int> <dbl> <chr> <chr> <chr> <chr>
1 4. 0. 1 91. $26.00 $26.00 $26.00 $26.00
2 4. 1. 1 818. $26.73 $22.80 $25.85 $30.40
3 6. 0. 1 395. $20.57 $20.35 $21.00 $21.00
4 6. 1. 1 461. $19.12 $18.03 $18.65 $19.75
5 8. 0. 1 2929. $15.10 $14.40 $15.20 $16.25
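With dplyr 1.0.0 or later, the same idea can be written without quos()/enquo(): the dots can be forwarded straight to group_by(), and the column to summarise can be embraced with {{ }}. The following is only a sketch under that assumption. The function name quantile_summary and the output column names are hypothetical, and the currency formatting is left out to keep it short.

```r
library(dplyr)

# Hypothetical helper, not the original answer's function.
quantile_summary <- function(data, calc_col, ...) {
  data %>%
    group_by(...) %>%   # grouping columns are forwarded unchanged
    summarise(
      q25 = quantile({{ calc_col }}, probs = 0.25, names = FALSE),
      q50 = quantile({{ calc_col }}, probs = 0.50, names = FALSE),
      q75 = quantile({{ calc_col }}, probs = 0.75, names = FALSE),
      avg = mean({{ calc_col }}),
      .groups = "drop"  # return an ungrouped tibble
    )
}

res <- quantile_summary(mtcars, mpg, cyl, vs)
res
```

The numeric columns can be formatted afterwards (for example with sprintf) if currency-style strings are needed.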
dplyr: Handing over multiple variables to group_by in a function
To pass multiple grouping variables, supply them as a character vector and use group_by_at:
myfunction <- function(mydf, grp, xvar) {
  mydf %>%
    group_by_at(grp) %>%
    summarise(sum = sum({{xvar}}))
}
myfunction(mtcars, "am", mpg)
# A tibble: 2 x 2
# am sum
# <dbl> <dbl>
#1 0 326.
#2 1 317.
myfunction(mtcars, c("am", "gear"), mpg)
# A tibble: 4 x 3
# Groups: am [2]
# am gear sum
# <dbl> <dbl> <dbl>
#1 0 3 242.
#2 0 4 84.2
#3 1 4 210.
#4 1 5 107.
If we want to pass the groups unquoted, as shown in the OP's post, one way is to capture them with enexpr and splice them in with !!!:
myfunction <- function(mydf, grp, xvar) {
  grp <- as.list(rlang::enexpr(grp))
  grp <- if(length(grp) > 1) grp[-1] else grp
  mydf %>%
    group_by(!!! grp) %>%
    summarise(sum = sum({{xvar}}))
}
myfunction(mtcars, am, mpg)
# A tibble: 2 x 2
# am sum
# <dbl> <dbl>
#1 0 326.
#2 1 317.
myfunction(mtcars, c(am, gear), mpg)
# A tibble: 4 x 3
# Groups: am [2]
# am gear sum
# <dbl> <dbl> <dbl>
#1 0 3 242.
#2 0 4 84.2
#3 1 4 210.
#4 1 5 107.
Non standard evaluation in dplyr: how do you indirect a function's multiple arguments?
You can define an argument for the data frame and use ... for the other variables to group by:
testfunc <- function(df,...) {
  df %>%
    group_by(...) %>%
    summarise(mpg = mean(mpg))
}
testfunc(mtcars,cyl,gear)
Dynamically construct function calls with varying arguments using dplyr and NSE
I'm not sure what the "standard" tidyverse approach is here, as I never really have a sense of whether I'm "doing it right" when I try to write generalized tidyverse functions for my typical workflows, but here's another approach.*
First, we can generate a list of combinations of grouping columns, rather than hard-coding them. In this case, the list includes all possible combinations of 1, 2, or 3 grouping columns, but that can be pared back as needed.
library(tidyverse)
# Generate a list of combinations of grouping variables.
groups.list = map(1:3, ~combn(names(df)[map_lgl(df, ~!is.numeric(.))], .x, simplify=FALSE)) %>%
flatten
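The data frame df comes from the question and is not reproduced here. To see what the list of combinations looks like, here is a small sketch with a made-up df. The column names mirror those in the output further down, but the values are hypothetical, so the sums will not match. It uses unlist(recursive = FALSE) in place of flatten, which gives the same one-level flattening.

```r
library(purrr)

# Hypothetical stand-in for the question's data frame.
df <- data.frame(
  grouping_var1 = factor(c("a", "a", "b", "b")),
  grouping_var2 = factor(c("x", "y", "x", "z")),
  grouping_var3 = factor(c("A", "B", "A", "B")),
  values = 1:4
)

# Names of the non-numeric (grouping) columns.
non_numeric <- names(df)[map_lgl(df, ~ !is.numeric(.x))]

# All combinations of 1, 2, or 3 grouping columns, flattened one level.
groups.list <- unlist(
  map(1:3, ~ combn(non_numeric, .x, simplify = FALSE)),
  recursive = FALSE
)

length(groups.list)  # 3 singles + 3 pairs + 1 triple
groups.list[[4]]     # the first pair of grouping columns
```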
Below is a summary function that uses group_by_at, which can take strings, so there's no need for non-standard evaluation. In addition, we get the group.ids values from group_vars itself, so we don't need a separate parameter or argument (though this may need to be tweaked, depending on what you expect for the names of the grouping columns).
# Summarise for each combination of groups
# Generate group.ids from group_vars itself
f2 <- function(data, group_vars) {
  data %>%
    group_by_at(group_vars) %>%
    summarise(values=sum(values)) %>%
    mutate(group.ids=paste0("var_", paste(str_extract(group_vars, "[0-9]"), collapse="_")))
}
Now we can run the function on every element of groups.list:
map(groups.list, ~f2(df, .x))
[[1]]
# A tibble: 2 x 3
grouping_var1 values group.ids
<fct> <int> <chr>
1 a 31 var_1
2 b 24 var_1
[[2]]
# A tibble: 3 x 3
grouping_var2 values group.ids
<fct> <int> <chr>
1 x 40 var_2
2 y 11 var_2
3 z 4 var_2
[[3]]
# A tibble: 2 x 3
grouping_var3 values group.ids
<fct> <int> <chr>
1 A 24 var_3
2 B 31 var_3
[[4]]
# A tibble: 5 x 4
# Groups: grouping_var1 [2]
grouping_var1 grouping_var2 values group.ids
<fct> <fct> <int> <chr>
1 a x 19 var_1_2
2 a y 8 var_1_2
3 a z 4 var_1_2
4 b x 21 var_1_2
5 b y 3 var_1_2
[[5]]
# A tibble: 4 x 4
# Groups: grouping_var1 [2]
grouping_var1 grouping_var3 values group.ids
<fct> <fct> <int> <chr>
1 a A 9 var_1_3
2 a B 22 var_1_3
3 b A 15 var_1_3
4 b B 9 var_1_3
[[6]]
# A tibble: 4 x 4
# Groups: grouping_var2 [3]
grouping_var2 grouping_var3 values group.ids
<fct> <fct> <int> <chr>
1 x A 24 var_2_3
2 x B 16 var_2_3
3 y B 11 var_2_3
4 z B 4 var_2_3
[[7]]
# A tibble: 7 x 5
# Groups: grouping_var1, grouping_var2 [5]
grouping_var1 grouping_var2 grouping_var3 values group.ids
<fct> <fct> <fct> <int> <chr>
1 a x A 9 var_1_2_3
2 a x B 10 var_1_2_3
3 a y B 8 var_1_2_3
4 a z B 4 var_1_2_3
5 b x A 15 var_1_2_3
6 b x B 6 var_1_2_3
7 b y B 3 var_1_2_3
Or, if you want to combine all of the results, you could do something like this:
map(groups.list, ~f2(df, .x)) %>%
bind_rows() %>%
mutate_if(is.factor, fct_explicit_na, na_level="All") %>%
select(group.ids, matches("grouping"), values)
group.ids grouping_var1 grouping_var2 grouping_var3 values
<chr> <fct> <fct> <fct> <int>
1 var_1 a All All 31
2 var_1 b All All 24
3 var_2 All x All 40
4 var_2 All y All 11
5 var_2 All z All 4
6 var_3 All All A 24
7 var_3 All All B 31
8 var_1_2 a x All 19
9 var_1_2 a y All 8
10 var_1_2 a z All 4
11 var_1_2 b x All 21
12 var_1_2 b y All 3
13 var_1_3 a All A 9
14 var_1_3 a All B 22
15 var_1_3 b All A 15
16 var_1_3 b All B 9
17 var_2_3 All x A 24
18 var_2_3 All x B 16
19 var_2_3 All y B 11
20 var_2_3 All z B 4
21 var_1_2_3 a x A 9
22 var_1_2_3 a x B 10
23 var_1_2_3 a y B 8
24 var_1_2_3 a z B 4
25 var_1_2_3 b x A 15
26 var_1_2_3 b x B 6
27 var_1_2_3 b y B 3
* This question was cross-posted to RStudio Community, and I've added this answer there as well.
How to use dplyr::group_by in a function
You can use group_by_at with column indices, such as:
countString <- function(things) {
  index <- which(colnames(theTibble) %in% things)
  theTibble %>%
    group_by_at(index) %>%
    count()
}
countString(c("animal", "sex"))
## A tibble: 4 x 3
## Groups: animal, sex [4]
# animal sex nn
# <chr> <chr> <int>
#1 cat f 2
#2 dog f 1
#3 dog m 2
#4 fish unknown 1
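An alternative to column indices is to turn the strings into symbols with rlang::syms() and splice them into group_by(). The sketch below also passes the data as an argument rather than reading a global. The pets tibble is hypothetical, made up to be consistent with the counts printed above, and count_by is an illustrative name.

```r
library(dplyr)

# Hypothetical data standing in for theTibble.
pets <- tibble(
  animal = c("cat", "cat", "dog", "dog", "dog", "fish"),
  sex    = c("f", "f", "f", "m", "m", "unknown")
)

# Hypothetical helper: group by the columns named in `things`, then count rows.
count_by <- function(data, things) {
  data %>%
    group_by(!!!rlang::syms(things)) %>%  # strings -> symbols, spliced in
    summarise(n = n(), .groups = "drop")
}

res <- count_by(pets, c("animal", "sex"))
res
```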
How to pass column name as argument to function for dplyr verbs?
Here is another way of making it work. You can use the .data[[var]] construct for a column name that is stored as a string:
foo <- function(data, colName) {
  result <- data %>%
    group_by(.data[[colName]]) %>%
    summarise(count = n())
  return(result)
}
foo(quakes, "stations")
# A tibble: 102 x 2
stations count
<int> <int>
1 10 20
2 11 28
3 12 25
4 13 21
5 14 39
6 15 34
7 16 35
8 17 38
9 18 33
10 19 29
# ... with 92 more rows
If you decide not to pass colName as a string, you can wrap it in a pair of curly braces ({{ }}) inside your function to get the same result:
foo <- function(data, colName) {
  result <- data %>%
    group_by({{ colName }}) %>%
    summarise(count = n())
  return(result)
}
foo(quakes, stations)
# A tibble: 102 x 2
stations count
<int> <int>
1 10 20
2 11 28
3 12 25
4 13 21
5 14 39
6 15 34
7 16 35
8 17 38
9 18 33
10 19 29
# ... with 92 more rows
dplyr - groupby on multiple columns using variable names
dplyr version >= 1.0
With more recent versions of dplyr, you should use across along with a tidyselect helper function. See help("language", "tidyselect") for a list of all the helper functions. In this case, if you want all columns in a character vector, use all_of():
cols <- c("mpg","hp","wt")
mtcars %>%
group_by(across(all_of(cols))) %>%
summarize(x=mean(gear))
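The same pattern can be wrapped in a function that takes every column name as a string. This is a sketch only: mean_by and its argument names are hypothetical, and it combines across(all_of()) for the grouping columns with the .data pronoun for the summarised column.

```r
library(dplyr)

# Hypothetical helper: group by a character vector of column names and
# average a column that is also given as a string.
mean_by <- function(data, group_cols, value_col) {
  data %>%
    group_by(across(all_of(group_cols))) %>%
    summarise(x = mean(.data[[value_col]]), .groups = "drop")
}

res <- mean_by(mtcars, c("cyl", "am"), "gear")
res
```

all_of() raises an error if any requested column is missing, which is usually what you want inside a function.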
original answer (older versions of dplyr)
If you have a vector of variable names, you should pass them to the .dots= parameter of group_by_. For example:
mtcars %>%
group_by_(.dots=c("mpg","hp","wt")) %>%
summarize(x=mean(gear))
Passing a list of arguments to a function with quasiquotation
You can rewrite the function using a combination of dplyr::group_by(), dplyr::across(), and curly-curly embracing ({{ }}). This works with dplyr version 1.0.0 and greater.
I've edited the original example and code for clarity.
library(tidyverse)
my_data <- tribble(
~foo, ~bar, ~baz,
"A", "B", 3,
"A", "C", 5,
"D", "E", 6,
"D", "E", 1
)
sum_fun <- function(.data, group, sum_var) {
  .data %>%
    group_by(across({{ group }})) %>%
    summarize("sum_{{sum_var}}" := sum({{ sum_var }}))
}
sum_fun(my_data, group = c(foo, bar), sum_var = baz)
#> `summarise()` has grouped output by 'foo'. You can override using the `.groups` argument.
#> # A tibble: 3 x 3
#> # Groups: foo [2]
#> foo bar sum_baz
#> <chr> <chr> <dbl>
#> 1 A B 3
#> 2 A C 5
#> 3 D E 7
Created on 2021-09-06 by the reprex package (v2.0.0)
Cannot pass additional arguments to group_map
I am not quite sure what you need, but you can get group_map working like this:
library(dplyr)
data(mtcars)
myFunction2 <- function(data, sumFirst) {
sumFirst
}
mtcars %>% group_by(carb) %>% group_map(~myFunction2(.x,2))
OK, based on my limited knowledge of the inner workings of group_map, the vignette says:
If a function, it is used as is. It should have at least 2 formal arguments.
If I read the source code correctly, the first argument is the data and the second is the keys, so a skeleton that gets it working, using a different function (so we can see it really works), is:
by_carb <- mtcars %>% group_by(carb)
group_map(by_carb, .f = function(data, keys) colMeans(data))
Now if you want to pass a custom function, then it will be:
group_map(by_carb,.f=function(data,keys,func)func(data),func=colMeans)
You can check the results, which I will not print here. They are the same as what the following gives, which is easier to write (I think):
group_map(by_carb, ~colMeans(.x))
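To connect this back to the original goal of passing an extra argument, here is a small sketch where the extra argument is a plain number. The helper sum_first and its parameter n are hypothetical. The only requirement is that the function's first two formals accept the group data and the keys; anything after that can be supplied by name in the group_map() call.

```r
library(dplyr)

# Hypothetical helper: sums the first `n` mpg values within each group.
# `data` and `keys` are the two arguments group_map() always supplies.
sum_first <- function(data, keys, n) {
  sum(head(data$mpg, n))
}

res <- mtcars %>%
  group_by(carb) %>%
  group_map(sum_first, n = 2)  # `n = 2` is forwarded to sum_first via `...`

length(res)  # one list element per carb group
```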