Use Variable Names in dplyr Functions

Using variable names for dplyr inside a function

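sum1 <- function(df, group_var, x, y) {

  # group_var is passed unquoted, so capture it as a quosure
  group_var <- enquo(group_var)

  # x and y are strings holding column names; convert them to symbols
  x <- as.name(x)
  y <- as.name(y)

  # x and y are already symbols here, so they can be unquoted directly
  df %>%
    group_by(!!group_var) %>%
    mutate(sum = !!x + !!y)
}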

sum1(df, g1, A.Key, B.Key)
# A tibble: 5 x 4
# Groups:   g1 [5]
     g1     a     b   sum
  <dbl> <int> <int> <int>
1    1.     3     2     5
2    2.     2     1     3
3    3.     1     3     4
4    4.     4     4     8
5    5.     5     5    10
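
If x and y are always passed as strings, the as.name() step can be avoided with the .data pronoun. This is only a sketch: the name sum2 and the toy data frame are made up for illustration, and {{ }} assumes rlang 0.4.0 or later.

```r
library(dplyr)

# Hypothetical variant of sum1(): group_var is unquoted, x and y are strings
sum2 <- function(df, group_var, x, y) {
  df %>%
    group_by({{ group_var }}) %>%
    mutate(sum = .data[[x]] + .data[[y]])  # look the columns up by name
}

# Toy data, invented for this example
toy <- tibble(g1 = c(1, 2, 3), a = c(3L, 2L, 1L), b = c(2L, 1L, 3L))

out <- sum2(toy, g1, "a", "b")
print(out)
```

The .data[[x]] lookup raises an error when the named column does not exist, which tends to make typos in column names easier to spot.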

Providing data and variable names in a function in R

This seems like a very unusual way to write an R function, but you could do:

my_func <- function(data, var_mileage, var_volume, var_weight){
  
  eval(substitute({
    var_mileage_km_l <- 0.43 * var_mileage
    var_volume_l <- 0.016 * var_volume
    var_weight_kg <- 0.45 * var_weight    
    
    m <- lm(var_mileage_km_l ~ var_volume_l + var_weight_kg)
    
    summary(m)
  }), envir = data)
}

substitute() replaces the argument names in the expression with the symbols you pass as column names. eval() then evaluates the resulting expression with the data.frame as its environment, so those symbols resolve to columns.
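
To see what substitute() produces before it is evaluated, here is a small base R sketch. The function name show_expr is made up for illustration, and the built-in mtcars data set stands in for your own data.

```r
# Hypothetical helper: return both the substituted expression and its value
show_expr <- function(data, var) {
  # `var` is replaced by whatever symbol the caller supplied
  expr <- substitute(2 * var)
  list(expr = expr, value = eval(expr, envir = data))
}

res <- show_expr(mtcars, mpg)
print(res$expr)         # the unevaluated call, with mpg in place of var
print(head(res$value))  # first values of 2 * mtcars$mpg
```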

Alternatively, you could evaluate each argument against the data.frame first and then work with the resulting vectors:

my_func <- function(data, var_mileage, var_volume, var_weight){
  
  var_mileage <- eval(substitute(var_mileage), data)
  var_volume <- eval(substitute(var_volume), data)
  var_weight <- eval(substitute(var_weight), data)
  
  var_mileage_km_l <- 0.43 * var_mileage
  var_volume_l <- 0.016 * var_volume
  var_weight_kg <- 0.45 * var_weight
    
  m <- lm(var_mileage_km_l ~ var_volume_l + var_weight_kg)
  
  summary(m)
}

Another common approach is to pass the column names as strings:

my_func <- function(data, var_mileage, var_volume, var_weight){
   
  var_mileage_km_l <- 0.43 * data[[var_mileage]]
  var_volume_l <- 0.016 * data[[var_volume]]
  var_weight_kg <- 0.45 * data[[var_weight]]    
    
  m <- lm(var_mileage_km_l ~ var_volume_l + var_weight_kg)
  
  summary(m)
}
my_func(dataset1, "mpg", "disp", "wt")
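
When the names arrive as strings, another option is to build the model formula from them with base R's reformulate(), so the fitted model keeps the original column names. This is a sketch only: fit_by_name is a hypothetical name, it uses the built-in mtcars data set, and it leaves out the unit conversions from the functions above.

```r
# Hypothetical helper: build `response ~ predictors` from strings and fit it
fit_by_name <- function(data, response, predictors) {
  f <- reformulate(predictors, response = response)
  lm(f, data = data)
}

m <- fit_by_name(mtcars, "mpg", c("disp", "wt"))
print(formula(m))
print(names(coef(m)))
```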

Use a variable name as function argument

Allan Cameron's answer is correct and only requires base R. For posterity's sake, here's the tidy version.

example_db <- data.frame(name=c("A","B","C"), 
                         value_1=c(1,2,3), 
                         value_2=c(2,3,1))



advanced_filter <- function(data,variable,limit){
  require(dplyr)
  vbl <- enquo(variable)
  data %>% 
    dplyr::filter(!!vbl > limit) 
}

advanced_filter(example_db,value_1,2)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#>   name value_1 value_2
#> 1    C       3       1

Created on 2022-01-28 by the reprex package (v2.0.1)

Or, following @TimTeaFan's comment below:

example_db <- data.frame(name=c("A","B","C"), 
                         value_1=c(1,2,3), 
                         value_2=c(2,3,1))



advanced_filter <- function(data,variable,limit){
  require(dplyr)
  data %>% 
    dplyr::filter({{variable}} > limit) 
}

advanced_filter(example_db,value_1,2)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#>   name value_1 value_2
#> 1    C       3       1


dplyr - using column names as function arguments

This works with current dplyr syntax (as can be seen on GitHub): convert the string to a symbol with rlang's sym(), then unquote it with !!:

library(dplyr)
library(rlang)
sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!! sym(colName)))
}

sumByColumn(data, "b")
## A tibble: 2 x 2
#      a   tot
#  <int> <int>
#1     1    24
#2     2    27

Alternatively, pass b as an unquoted column name and capture it with enquo():

library(dplyr)
sumByColumn <- function(df, colName) {
  myenc <- enquo(colName)
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!myenc))
}

sumByColumn(data, b)
## A tibble: 2 x 2
#      a   tot
#  <int> <int>
#1     1    24
#2     2    27
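
With rlang 0.4.0 or later, the enquo() and !! pair can be collapsed into the {{ }} operator. The sketch below assumes that version; sum_by_column and the toy data frame are made up for illustration and are not the data from the question.

```r
library(dplyr)

# Hypothetical variant: col_name is passed unquoted and embraced with {{ }}
sum_by_column <- function(df, col_name) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum({{ col_name }}))
}

# Toy data, invented for this example
toy <- data.frame(a = c(1L, 1L, 2L, 2L), b = c(1L, 2L, 3L, 4L))

res <- sum_by_column(toy, b)
print(res)
```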

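Dynamically name a new variable / column within a custom function with dplyr mutate and paste

We can pass the arguments unquoted and use {{ }} to evaluate them inside the function. The same {{ }} syntax inside the string on the left-hand side of := builds the new column name: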

my_fun <- function(dataf, V1, V2) {
  dataf %>%
    dplyr::mutate(
      "{{V1}}_{{V2}}" := paste0(
        format({{V1}}, big.mark = ","), "\n(",
        format({{V2}}, big.mark = ","), ")"
      )
    )
}

Testing:

my_fun(df, speed1, n1)
  string   speed1   speed2 n1 n2       speed1_n1
1    car 7886.962 3218.585 37 83 7,886.962\n(37)
2  train 9534.978 5524.649 98 34 9,534.978\n(98)
3   bike 6984.790 9476.838 60 55 6,984.790\n(60)
4  plain 6543.198 2638.609  9 53 6,543.198\n( 9)
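
The name string can also mix {{ }} (for embraced column arguments) with single braces (for ordinary string variables). A minimal sketch, assuming rlang 0.4.3 or later; add_ratio, its suffix argument, and the toy data are hypothetical:

```r
library(dplyr)

# Hypothetical helper: the new column is named "<num>_<suffix>"
add_ratio <- function(dataf, num, den, suffix = "ratio") {
  dataf %>%
    mutate("{{ num }}_{suffix}" := {{ num }} / {{ den }})
}

# Toy data, invented for this example
toy <- data.frame(dist = c(10, 20), time = c(2, 5))

res <- add_ratio(toy, dist, time)
print(names(res))
```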

Pass a string as variable name in dplyr::filter

!! (or UQ()) unquotes var, that is, it inserts the value of var, the string 'cyl', into the expression. So mtcars %>% filter(!!var == 4) is the same as mtcars %>% filter('cyl' == 4), where the condition always evaluates to FALSE. You can verify this by printing !!var inside filter():

mtcars %>% filter({ print(!!var); (!!var) == 4 })
# [1] "cyl"
#  [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb
# <0 rows> (or 0-length row.names)
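
The difference between comparing the string and comparing the column it names can be checked in base R alone. A small sketch using the built-in mtcars data set (the names string_cond and column_cond are made up for illustration):

```r
var <- "cyl"

# Comparing the string itself: 4 is coerced to "4", and "cyl" == "4" is FALSE
string_cond <- var == 4

# Comparing the column the string names: one logical value per row
column_cond <- mtcars[[var]] == 4

print(string_cond)       # a single FALSE, so filter() keeps no rows
print(sum(column_cond))  # number of rows where cyl equals 4
```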

To make var refer to the cyl column, first convert the string to the symbol cyl, then unquote that symbol so it is evaluated as a column:

Using rlang:

library(rlang)
var <- 'cyl'
mtcars %>% filter((!!sym(var)) == 4)

#    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#1  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#2  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#3  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
# ...

Or use as.symbol()/as.name() from base R:

mtcars %>% filter((!!as.symbol(var)) == 4)

mtcars %>% filter((!!as.name(var)) == 4)

How to print variable name from function argument using {{}} in R?

{{ works exclusively in tidyverse data-masking functions. print() and glue() are not such functions.

You can do print(enquo(var)). This (1) defuses var and prevents it from being evaluated; (2) prints the defused expression.

You could also create your own function to print a variable by wrapping this pattern:

print_arg <- function(arg) print(enquo(arg))

Since it uses the tidy eval operator enquo(), it automatically supports {{. You can call it like this:

print_arg({{ some_arg }})
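
If you need the name as a string rather than a printed quosure, rlang's as_label() can convert the defused argument. A sketch under that assumption; arg_label and describe are hypothetical names:

```r
library(rlang)

# Hypothetical helper: return the argument's expression as a string
arg_label <- function(arg) as_label(enquo(arg))

# Hypothetical wrapper: forward its own argument with {{ }}
describe <- function(some_arg) {
  paste("column:", arg_label({{ some_arg }}))
}

res <- describe(mpg)
print(res)
```

Because arg_label() defuses its argument with enquo(), mpg is never evaluated and does not need to exist as an object.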