Pass arguments to dplyr functions
You need to use the standard evaluation versions of the dplyr functions (just append '_' to the function names, i.e. group_by_ and summarise_) and pass strings to your function, which you then need to turn into symbols. To parameterise the argument of summarise_, you will need to use interp(), which is defined in the lazyeval package. Concretely:
library(dplyr)
library(lazyeval)
not.uniq.per.group <- function(df, grp.var, uniq.var) {
  df %>%
    group_by_(grp.var) %>%
    summarise_(n_uniq = interp(~n_distinct(v), v = as.name(uniq.var))) %>%
    filter(n_uniq > 1)
}
not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")
Note that in recent versions of dplyr the standard evaluation versions of the dplyr functions have been "soft deprecated" in favor of non-standard evaluation. See the Programming with dplyr vignette for more information on working with non-standard evaluation.
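Because the underscore verbs are deprecated, the same function can be sketched without lazyeval by using the .data pronoun for column names held in strings. This is a minimal sketch, assuming dplyr 1.0 or later; the name not_uniq_per_group is only an illustrative rename of the function above.

```r
library(dplyr)

# Same idea as not.uniq.per.group(), but with the .data pronoun
# instead of group_by_()/summarise_() and lazyeval::interp().
# grp_var and uniq_var are column names passed as strings.
not_uniq_per_group <- function(df, grp_var, uniq_var) {
  df %>%
    group_by(.data[[grp_var]]) %>%
    summarise(n_uniq = n_distinct(.data[[uniq_var]])) %>%
    filter(n_uniq > 1)
}

res <- not_uniq_per_group(iris, "Sepal.Length", "Sepal.Width")
```

The result keeps only the Sepal.Length values that occur with more than one distinct Sepal.Width.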
Pass variable from dataset into a function that calls dplyr
You just need to use the {{ }} operator; here is a reference for more details.
test <- function(var) {
  iris %>% group_by(Species) %>% summarise(mean({{ var }}, na.rm = TRUE))
}
test(Sepal.Width)
# A tibble: 3 x 2
  Species    `mean(Sepal.Width, na.rm = TRUE)`
  <fct>                                  <dbl>
1 setosa                                  3.43
2 versicolor                              2.77
3 virginica                               2.97
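The summary column above ends up with the whole expression as its name. One way to get a cleaner name is the glue-style name injection that rlang supports on the left-hand side of :=. The sketch below assumes a reasonably recent dplyr and rlang (0.4.3 or later for the "{{ }}" name syntax); test_named is a hypothetical name, not part of the original answer.

```r
library(dplyr)

# Embrace the argument on both sides: once to build the column name,
# once to compute the summary itself.
test_named <- function(var) {
  iris %>%
    group_by(Species) %>%
    summarise("mean_{{ var }}" := mean({{ var }}, na.rm = TRUE))
}

out <- test_named(Sepal.Width)
names(out)
```

With this version the second column should be called mean_Sepal.Width rather than carrying the full call as its name.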
How to pass a column argument in a dplyr function in select?
We can use enquo to convert it to a quosure and then evaluate with !!:
slicedata <- function(df, column_name) {
  # capture the bare column name as a quosure
  column_name <- enquo(column_name)
  df %>%
    select(!!column_name, C, D, E) %>%
    group_by(!!column_name) %>%
    summarise(C = sum(C), D = sum(D), E = sum(E))
}
slicedata(df, B)
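The call above assumes a data frame df with columns B, C, D and E, which the answer does not show. Here is a small self-contained sketch with a made-up data frame, written with {{ }} as shorthand for the enquo() and !! pair; the data values and the name slicedata2 are purely illustrative.

```r
library(dplyr)

# Hypothetical example data; only the column names B, C, D, E
# come from the answer above.
df <- data.frame(
  B = c("x", "x", "y"),
  C = 1:3,
  D = 4:6,
  E = 7:9
)

# {{ column_name }} captures and unquotes the bare column name in one step.
slicedata2 <- function(df, column_name) {
  df %>%
    select({{ column_name }}, C, D, E) %>%
    group_by({{ column_name }}) %>%
    summarise(C = sum(C), D = sum(D), E = sum(E))
}

out <- slicedata2(df, B)
```

For this toy input, group x sums to C = 3, D = 9, E = 15 and group y to C = 3, D = 6, E = 9.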
How to pass column name as argument to function for dplyr verbs?
Here is another way of making it work. You can use the .data[[var]] construct for a column name which is stored as a string:
foo <- function(data, colName) {
  result <- data %>%
    group_by(.data[[colName]]) %>%
    summarise(count = n())
  return(result)
}
foo(quakes, "stations")
# A tibble: 102 x 2
   stations count
      <int> <int>
 1       10    20
 2       11    28
 3       12    25
 4       13    21
 5       14    39
 6       15    34
 7       16    35
 8       17    38
 9       18    33
10       19    29
# ... with 92 more rows
If you decide not to pass colName as a string, you can wrap it in a pair of double curly braces inside your function to get a similar result:
foo <- function(data, colName) {
  result <- data %>%
    group_by({{ colName }}) %>%
    summarise(count = n())
  return(result)
}
foo(quakes, stations)
# A tibble: 102 x 2
   stations count
      <int> <int>
 1       10    20
 2       11    28
 3       12    25
 4       13    21
 5       14    39
 6       15    34
 7       16    35
 8       17    38
 9       18    33
10       19    29
# ... with 92 more rows
Passing arguments to dplyr summarize function
You need to use Non-Standard Evaluation (NSE) to use dplyr functions programmatically alongside lazyeval. The dplyr NSE vignette covers it fairly well.
library(dplyr)
library(lazyeval)
data <- group_by(iris, Species)
SummaryStatistics <- function(table, field) {
  # field is a column name given as a string; as.name() turns it into a symbol
  table %>%
    summarise_(count = ~n(),
               min = interp(~min(var, na.rm = TRUE), var = as.name(field)),
               mean = interp(~mean(var, na.rm = TRUE, trim = 0.05), var = as.name(field)),
               median = interp(~median(var, na.rm = TRUE), var = as.name(field)))
}
SummaryStatistics(data, "Sepal.Length")
# A tibble: 3 × 5
     Species count   min     mean median
      <fctr> <int> <dbl>    <dbl>  <dbl>
1     setosa    50   4.3 5.002174    5.0
2 versicolor    50   4.9 5.934783    5.9
3  virginica    50   4.9 6.593478    6.5
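On current dplyr versions, where summarise_() is deprecated, roughly the same summary can be written without lazyeval. The following is a sketch rather than a drop-in replacement, assuming dplyr 1.0 or later; summary_statistics is an illustrative name, and field is still passed as a string.

```r
library(dplyr)

# .data[[field]] looks up the column whose name is stored in `field`,
# so no interp()/as.name() step is needed.
summary_statistics <- function(tbl, field) {
  tbl %>%
    summarise(
      count  = n(),
      min    = min(.data[[field]], na.rm = TRUE),
      mean   = mean(.data[[field]], na.rm = TRUE, trim = 0.05),
      median = median(.data[[field]], na.rm = TRUE)
    )
}

res <- summary_statistics(group_by(iris, Species), "Sepal.Length")
```

The grouping is still done by the caller, as in the answer above, so the function returns one row per Species.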
passing arguments for summaries dplyr package in R
Are you after something like this?
library(tidyverse)
summary_fn <- function(data, ..., select_var, fun) {
  group <- enquos(...)
  var <- enquo(select_var)
  funs <- map(setNames(fun, fun), ~.x)
  data %>%
    group_by(!!!group) %>%
    summarise(across(!!var, funs), .groups = "drop")
}
summary_fn(mtcars, cyl, am, select_var = mpg, fun = c("mean", "max"))
## A tibble: 6 x 4
#    cyl    am mpg_mean mpg_max
#  <dbl> <dbl>    <dbl>   <dbl>
#1     4     0     22.9    24.4
#2     4     1     28.1    33.9
#3     6     0     19.1    21.4
#4     6     1     20.6    21
#5     8     0     15.0    19.2
#6     8     1     15.4    15.8
If you provide fun as a named list you can skip the funs <- map(...) step.
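As a sketch of the named-list variant, the function below passes fun straight to across(). It assumes dplyr 1.0 or later; summary_fn2 is a hypothetical name, and it uses {{ }} and plain ... forwarding in place of enquo()/enquos(), which is a simplification rather than the answer's exact code.

```r
library(dplyr)

# `fun` is expected to be a named list of functions; across() uses the
# list names as suffixes for the output columns (mpg_mean, mpg_max, ...).
summary_fn2 <- function(data, ..., select_var, fun) {
  data %>%
    group_by(...) %>%
    summarise(across({{ select_var }}, fun), .groups = "drop")
}

res <- summary_fn2(mtcars, cyl, am,
                   select_var = mpg,
                   fun = list(mean = mean, max = max))
```

Passing actual functions in the list, rather than their names as strings, avoids relying on how across() resolves character input.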
PS. Replacing enquo with ensym and enquos with ensyms also works.
Passing column name as argument in function within pipes
You need to make use of non-standard evaluation, which is worth a quick read. In this case, assuming var holds the column name as a string, you most likely need to put !!sym() around var in the mutate line.
Here's the line:
mutate(new_variable = !!sym(var) * 100)
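To show that line in context, here is a minimal sketch of a function that uses it inside a pipe. The function name scale_column and the use of mtcars are assumptions for illustration only; the original question's function is not shown here.

```r
library(dplyr)
library(rlang)

# `var` is a column name given as a string; sym() converts it to a symbol
# and !! unquotes that symbol inside mutate().
scale_column <- function(df, var) {
  df %>%
    mutate(new_variable = !!sym(var) * 100)
}

out <- scale_column(mtcars, "mpg")
```

An equivalent alternative for string input is .data[[var]] * 100, which avoids the explicit sym() call.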