Pass Arguments to Dplyr Functions

Pass arguments to dplyr functions

You need to use the standard evaluation versions of the dplyr functions (just append '_' to the function names, i.e. group_by_ and summarise_) and pass strings to your function, which you then turn into symbols. To parameterise the argument of summarise_, use interp(), which is defined in the lazyeval package. Concretely:

library(dplyr)
library(lazyeval)

not.uniq.per.group <- function(df, grp.var, uniq.var) {
    df %>%
        group_by_(grp.var) %>%
        summarise_( n_uniq=interp(~n_distinct(v), v=as.name(uniq.var)) ) %>%
        filter(n_uniq > 1)
}

not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")

Note that in recent versions of dplyr the standard evaluation versions of the dplyr functions have been "soft deprecated" in favor of non-standard evaluation.

See the Programming with dplyr vignette for more information on working with non-standard evaluation.
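For comparison, here is a minimal sketch of how the same function might look without the deprecated underscore verbs, assuming dplyr 1.0 or later. The name not_uniq_per_group is just an illustrative rename, and the .data pronoun is used because the column names still arrive as strings:

```r
library(dplyr)

# Hypothetical rewrite of not.uniq.per.group(); column names are passed as strings
not_uniq_per_group <- function(df, grp_var, uniq_var) {
  df %>%
    group_by(.data[[grp_var]]) %>%                       # look up the grouping column by name
    summarise(n_uniq = n_distinct(.data[[uniq_var]]),    # count distinct values per group
              .groups = "drop") %>%
    filter(n_uniq > 1)                                   # keep groups with more than one distinct value
}

res <- not_uniq_per_group(iris, "Sepal.Length", "Sepal.Width")
```

This avoids the lazyeval dependency entirely, at the cost of requiring a reasonably recent dplyr.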

Pass variable from dataset into a function that calls dplyr

You just need to use the {{ }} ("curly-curly") operator, which forwards an unquoted column name to the dplyr verb; see the rlang documentation for more details.

library(dplyr)

# var is an unquoted column name, forwarded with {{ }}
test <- function(var) {
  iris %>% group_by(Species) %>% summarise(mean({{ var }}, na.rm = TRUE))
}

test(Sepal.Width)

# A tibble: 3 x 2
  Species    `mean(Sepal.Width, na.rm = TRUE)`
  <fct>                                  <dbl>
1 setosa                                  3.43
2 versicolor                              2.77
3 virginica                               2.97
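The auto-generated column name in that output is a bit unwieldy. One possible way to tidy it up, assuming a version of rlang recent enough to support glue-style names with := (roughly 0.4.3 or later), is sketched below; test_named is a made-up name for illustration:

```r
library(dplyr)

# Hypothetical variant of test() that also builds the output column name from var
test_named <- function(var) {
  iris %>%
    group_by(Species) %>%
    summarise("mean_{{ var }}" := mean({{ var }}, na.rm = TRUE))  # name becomes mean_<column>
}

res <- test_named(Sepal.Width)
```

The result should then have a column called mean_Sepal.Width instead of the backticked expression.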

How to pass a column argument in a dplyr function in select?

We can use enquo() to convert the argument to a quosure and then unquote it with !! inside the dplyr verbs.

library(dplyr)

slicedata <- function(df, column_name) {
  # capture the unquoted column name as a quosure
  column_name <- enquo(column_name)
  df %>%
    select(!!column_name, C, D, E) %>%
    group_by(!!column_name) %>%
    summarise(C = sum(C), D = sum(D), E = sum(E))
}

slicedata(df, B)
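Since df is not shown in the question, here is a small self-contained sketch with made-up data (the values of B, C, D and E are assumptions purely for illustration). It also uses {{ }} as a shorthand for the enquo() plus !! pair, assuming rlang 0.4.0 or later; slicedata2 is a hypothetical name:

```r
library(dplyr)

# Toy data frame; the real df from the question is unknown
df <- data.frame(
  B = c("x", "x", "y"),
  C = 1:3,
  D = 4:6,
  E = 7:9
)

# Same idea as slicedata(), with {{ }} replacing enquo() and !!
slicedata2 <- function(df, column_name) {
  df %>%
    select({{ column_name }}, C, D, E) %>%
    group_by({{ column_name }}) %>%
    summarise(C = sum(C), D = sum(D), E = sum(E))
}

res <- slicedata2(df, B)
```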

How to pass column name as argument to function for dplyr verbs?

Here is another way of making it work. You can use the .data[[var]] construct for a column name that is stored as a string:

foo <- function(data, colName) {
  
  result <- data %>%
    group_by(.data[[colName]]) %>%
    summarise(count = n()) 
  
  return(result)
}

foo(quakes, "stations")

# A tibble: 102 x 2
   stations count
      <int> <int>
 1       10    20
 2       11    28
 3       12    25
 4       13    21
 5       14    39
 6       15    34
 7       16    35
 8       17    38
 9       18    33
10       19    29
# ... with 92 more rows

If you decide not to pass colName as a string, you can wrap it in double curly braces inside your function to get the same result:

foo <- function(data, colName) {
  
  result <- data %>%
    group_by({{ colName }}) %>%
    summarise(count = n()) 
  
  return(result)
}

foo(quakes, stations)

# A tibble: 102 x 2
   stations count
      <int> <int>
 1       10    20
 2       11    28
 3       12    25
 4       13    21
 5       14    39
 6       15    34
 7       16    35
 8       17    38
 9       18    33
10       19    29
# ... with 92 more rows
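If you have several column names stored in a character vector, the .data[[ ]] form only handles one at a time. A possible alternative, assuming dplyr 1.0 or later, is across() with all_of(); count_by is a hypothetical helper and mtcars is used only as convenient example data:

```r
library(dplyr)

# Hypothetical helper: group by any number of columns given as strings
count_by <- function(data, cols) {
  data %>%
    group_by(across(all_of(cols))) %>%      # all_of() errors if a name is missing
    summarise(count = n(), .groups = "drop")
}

res <- count_by(mtcars, c("cyl", "am"))
```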

Passing arguments to dplyr summarize function

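To use dplyr functions programmatically here, use the standard evaluation variants (such as summarise_) together with interp() from lazyeval. The dplyr NSE vignette covers it fairly well. Note that the underscore variants are deprecated in recent versions of dplyr.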

library(dplyr)
library(lazyeval)

data <- group_by(iris, Species)

SummaryStatistics <- function(table, field){
  table %>%
    summarise_(count = ~n(),
               min = interp(~min(var, na.rm = TRUE), var = as.name(field)),
               mean = interp(~mean(var, na.rm = TRUE, trim = 0.05), var = as.name(field)),
               median = interp(~median(var, na.rm = TRUE), var = as.name(field)))
}

SummaryStatistics(data, "Sepal.Length")

# A tibble: 3 × 5
     Species count   min     mean median
      <fctr> <int> <dbl>    <dbl>  <dbl>
1     setosa    50   4.3 5.002174    5.0
2 versicolor    50   4.9 5.934783    5.9
3  virginica    50   4.9 6.593478    6.5
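On a current dplyr the same summary can probably be written without lazyeval. The following is a sketch under that assumption, using the .data pronoun for the string column name; summary_statistics is an illustrative rename, not the original function:

```r
library(dplyr)

# Hypothetical modern equivalent of SummaryStatistics(); field is a string
summary_statistics <- function(table, field) {
  table %>%
    summarise(count  = n(),
              min    = min(.data[[field]], na.rm = TRUE),
              mean   = mean(.data[[field]], na.rm = TRUE, trim = 0.05),
              median = median(.data[[field]], na.rm = TRUE))
}

res <- summary_statistics(group_by(iris, Species), "Sepal.Length")
```

The figures should match the table above, since only the way the column is looked up has changed.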

passing arguments for summaries dplyr package in R

Are you after something like this?

library(tidyverse)

summary_fn <- function(data, ..., select_var, fun) {   
    group <- enquos(...)
    var <- enquo(select_var)   
    funs <- map(setNames(fun, fun), ~.x)   
    data %>% 
        group_by(!!!group) %>% 
        summarise(across(!!var, funs), .groups = "drop")  
}

summary_fn(mtcars, cyl, am, select_var = mpg, fun = c("mean", "max"))
## A tibble: 6 x 4
#    cyl    am mpg_mean mpg_max
#  <dbl> <dbl>    <dbl>   <dbl>
#1     4     0     22.9    24.4
#2     4     1     28.1    33.9
#3     6     0     19.1    21.4
#4     6     1     20.6    21  
#5     8     0     15.0    19.2
#6     8     1     15.4    15.8

If you provide fun as a named list you can skip the funs <- map(...) step.

PS. Replacing enquo with ensym and enquos with ensyms also works.
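To illustrate the named-list remark, here is one way the function might be simplified, assuming dplyr 1.0 or later. summary_fn2 is a hypothetical name; the grouping columns are forwarded through ... directly and select_var is embraced with {{ }}:

```r
library(dplyr)

# Hypothetical variant: fun is a named list of functions, so no map() step is needed
summary_fn2 <- function(data, ..., select_var, fun) {
  data %>%
    group_by(...) %>%                                        # forward grouping columns as-is
    summarise(across({{ select_var }}, fun), .groups = "drop")
}

res <- summary_fn2(mtcars, cyl, am,
                   select_var = mpg,
                   fun = list(mean = mean, max = max))
```

The list names (mean, max) become the suffixes of the output columns, as in the result shown above.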

Passing column name as argument in function within pipes

You need to make use of non-standard evaluation, which is worth a quick read. In this case, since var holds the column name as a string, you most likely need to convert it to a symbol with sym() and unquote it with !! in the mutate line.

Here's the line:

mutate(new_variable = !!sym(var) * 100)
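Put into a complete function, that line could look roughly like the sketch below. The function name add_scaled and the use of iris are assumptions for illustration, since the original question's code is not shown here:

```r
library(dplyr)
library(rlang)  # provides sym()

# Hypothetical wrapper: var is a column name given as a string
add_scaled <- function(data, var) {
  data %>%
    mutate(new_variable = !!sym(var) * 100)  # string -> symbol -> unquoted column
}

res <- add_scaled(head(iris), "Sepal.Width")
```

Writing .data[[var]] * 100 instead of !!sym(var) * 100 should give the same result if you prefer to avoid the unquoting syntax.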
