Use Variable Names in Functions of Dplyr

Using Variable names for dplyr inside function

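library(dplyr)

# group_var is a bare column name; x and y are column names passed as strings.
sum1 <- function(df, group_var, x, y) {

  group_var <- enquo(group_var)

  # Turn the strings into symbols so they can be unquoted with !!
  x <- as.name(x)
  y <- as.name(y)

  df.temp <- df %>%
    group_by(!!group_var) %>%
    mutate(
      sum = !!x + !!y
    )

  return(df.temp)
}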

# A.Key and B.Key are presumably variables holding the column names "a" and "b"
sum1(df, g1, A.Key, B.Key)
# A tibble: 5 x 4
# Groups:   g1 [5]
#      g1     a     b   sum
#   <dbl> <int> <int> <int>
# 1    1.     3     2     5
# 2    2.     2     1     3
# 3    3.     1     3     4
# 4    4.     4     4     8
# 5    5.     5     5    10
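
If all three columns can be passed as bare names instead, the same idea can be sketched with the embrace operator {{ }}. This is only an illustration: the function name sum2 and the toy data are made up here, since the original post does not show how df was built.

```r
library(dplyr)

# Hypothetical toy data (not from the original post).
df <- tibble(g1 = c(1, 2, 3), a = c(3L, 2L, 1L), b = c(2L, 1L, 3L))

# sum2 is a hypothetical name; all three column arguments are bare names.
sum2 <- function(df, group_var, x, y) {
  df %>%
    group_by({{ group_var }}) %>%
    mutate(sum = {{ x }} + {{ y }})
}

res <- sum2(df, g1, a, b)
res
```

Here {{ x }} does the work of enquo() followed by !! in a single step.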

Providing data and variable names in a function in R

This seems like a very unusual way to write an R function, but you could do

my_func <- function(data, var_mileage, var_volume, var_weight){

  eval(substitute({
    var_mileage_km_l <- 0.43 * var_mileage
    var_volume_l <- 0.016 * var_volume
    var_weight_kg <- 0.45 * var_weight

    m <- lm(var_mileage_km_l ~ var_volume_l + var_weight_kg)

    summary(m)
  }), envir = data)
}

substitute() replaces the argument names in the expression with the symbols you passed as column names. You can then evaluate the resulting expression in the context of the data.frame.
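
The same mechanism in miniature, as a sketch: col_mean is a hypothetical helper (not part of the original answer) that takes one bare column name, and mtcars stands in for the data.

```r
# Hypothetical helper: mean of one column, named without quotes.
col_mean <- function(data, col) {
  expr <- substitute(mean(col))  # with col = mpg this becomes the call mean(mpg)
  eval(expr, envir = data)       # mpg is then looked up among the columns of data
}

col_mean(mtcars, mpg)
```

Because the data.frame is used as the evaluation environment, mpg resolves to the column rather than to a variable in the calling scope.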

Alternatively you could do something like

my_func <- function(data, var_mileage, var_volume, var_weight){

  # Evaluate each bare column name inside the data.frame
  var_mileage <- eval(substitute(var_mileage), data)
  var_volume <- eval(substitute(var_volume), data)
  var_weight <- eval(substitute(var_weight), data)

  var_mileage_km_l <- 0.43 * var_mileage
  var_volume_l <- 0.016 * var_volume
  var_weight_kg <- 0.45 * var_weight

  m <- lm(var_mileage_km_l ~ var_volume_l + var_weight_kg)

  summary(m)
}

Another common trick is to pass the column names as strings:

my_func <- function(data, var_mileage, var_volume, var_weight){

  # [[ ]] extracts a column by its name given as a string
  var_mileage_km_l <- 0.43 * data[[var_mileage]]
  var_volume_l <- 0.016 * data[[var_volume]]
  var_weight_kg <- 0.45 * data[[var_weight]]

  m <- lm(var_mileage_km_l ~ var_volume_l + var_weight_kg)

  summary(m)
}
my_func(dataset1, "mpg", "disp", "wt")
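
A variation on the string approach, not taken from the original answer: when no unit conversion is needed, the formula itself can be built from the strings with reformulate(), so the fitted model reports the real column names. The function name fit_by_name is hypothetical and mtcars stands in for dataset1.

```r
# Hypothetical helper: fit a linear model from column names given as strings.
fit_by_name <- function(data, response, predictors) {
  f <- reformulate(predictors, response = response)  # e.g. mpg ~ disp + wt
  lm(f, data = data)
}

m <- fit_by_name(mtcars, "mpg", c("disp", "wt"))
summary(m)
```

Unlike the versions above, this sketch skips the conversion to metric units, so the coefficients are on the original scale.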

Use a variable name as function argument

Allan Cameron's answer is obviously correct and only requires base R. Just for posterity's sake, here's the tidy version.

example_db <- data.frame(name = c("A", "B", "C"),
                         value_1 = c(1, 2, 3),
                         value_2 = c(2, 3, 1))

advanced_filter <- function(data, variable, limit) {
  require(dplyr)
  vbl <- enquo(variable)  # capture the bare column name
  data %>%
    dplyr::filter(!!vbl > limit)
}

advanced_filter(example_db,value_1,2)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> name value_1 value_2
#> 1 C 3 1

Created on 2022-01-28 by the reprex package (v2.0.1)

Or, following @TimTeaFan's comment below:

example_db <- data.frame(name = c("A", "B", "C"),
                         value_1 = c(1, 2, 3),
                         value_2 = c(2, 3, 1))

advanced_filter <- function(data, variable, limit) {
  require(dplyr)
  data %>%
    dplyr::filter({{ variable }} > limit)
}

advanced_filter(example_db,value_1,2)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> name value_1 value_2
#> 1 C 3 1


dplyr - using column names as function arguments

This works with the tidy evaluation syntax introduced in dplyr 0.7.0 (as can be seen on GitHub):

library(dplyr)
library(rlang)
sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!! sym(colName)))  # colName is a string
}

sumByColumn(data, "b")
## A tibble: 2 x 2
#      a   tot
#  <int> <int>
#1     1    24
#2     2    27

Alternatively, pass b as a bare (unquoted) column name:

library(dplyr)
sumByColumn <- function(df, colName) {
  myenc <- enquo(colName)  # capture the bare column name
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!myenc))
}

sumByColumn(data, b)
## A tibble: 2 x 2
#      a   tot
#  <int> <int>
#1     1    24
#2     2    27
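
For the string case, a third option that avoids building a symbol is the .data pronoun. The sketch below uses made-up data (the original post does not show how data was built) and a hypothetical function name, sumByColumn2.

```r
library(dplyr)

# Hypothetical data (not from the original post).
data <- tibble(a = c(1L, 1L, 2L, 2L), b = c(10L, 14L, 20L, 7L))

# colName is a string; .data[[colName]] looks the column up by name.
sumByColumn2 <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(.data[[colName]]))
}

res <- sumByColumn2(data, "b")
res
```

The .data pronoun only ever refers to columns of the data frame, so a variable of the same name in the calling environment cannot be picked up by accident.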

Dynamically name a new variable / column within a custom function using dplyr mutate and paste

We can pass the arguments unquoted and use {{ }} to evaluate them, both in the expression and in the name of the new column:

my_fun <- function(dataf, V1, V2){
  dataf %>%
    dplyr::mutate("{{V1}}_{{V2}}" := paste0(format({{V1}}, big.mark = ","),
                                            '\n(', format({{V2}}, big.mark = ","), ')'))
}

Testing:

my_fun(df, speed1, n1)
  string   speed1   speed2 n1 n2       speed1_n1
1    car 7886.962 3218.585 37 83 7,886.962\n(37)
2  train 9534.978 5524.649 98 34 9,534.978\n(98)
3   bike 6984.790 9476.838 60 55 6,984.790\n(60)
4  plain 6543.198 2638.609  9 53 6,543.198\n( 9)
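
The name template can also mix embraced arguments with ordinary strings written in single braces. A small sketch, assuming a hypothetical function add_ratio and using mtcars as stand-in data:

```r
library(dplyr)

# Hypothetical helper: num and den are bare column names, suffix is a string.
add_ratio <- function(dataf, num, den, suffix = "ratio") {
  dataf %>%
    mutate("{{ num }}_{suffix}" := {{ num }} / {{ den }})
}

res <- add_ratio(mtcars, mpg, wt)
head(res[, c("mpg", "wt", "mpg_ratio")])
```

In the name string, {{ num }} is replaced by the column name that was passed in, while {suffix} is replaced by the value of the string variable.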

Pass a string as variable name in dplyr::filter

!! (or UQ) unquotes var, which inserts its value, the string 'cyl', into the expression. So mtcars %>% filter(!!var == 4) is the same as mtcars %>% filter('cyl' == 4), where the condition always evaluates to FALSE. You can verify this by printing !!var inside the filter call:

mtcars %>% filter({ print(!!var); (!!var) == 4 })
# [1] "cyl"
# [1] mpg cyl disp hp drat wt qsec vs am gear carb
# <0 rows> (or 0-length row.names)
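
The difference can also be inspected without dplyr by building both expressions with rlang::expr() and evaluating them against the data. This is only a sketch; the names wrong and right are made up for illustration.

```r
library(rlang)

var <- "cyl"

# Unquoting the string inserts the string itself into the expression.
wrong <- expr((!!var) == 4)
# Converting to a symbol first inserts the column name cyl.
right <- expr((!!sym(var)) == 4)

eval_tidy(wrong, data = mtcars)       # a single FALSE: the string "cyl" is not equal to 4
sum(eval_tidy(right, data = mtcars))  # number of rows where the cyl column equals 4
```

The first expression never touches the data, which is why filter() returns zero rows for it.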

To make var refer to the cyl column, first convert the string to the symbol cyl, then unquote that symbol so it is evaluated as a column:

Using rlang:

library(rlang)
var <- 'cyl'
mtcars %>% filter((!!sym(var)) == 4)

# mpg cyl disp hp drat wt qsec vs am gear carb
#1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#2 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#3 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# ...

Or use as.symbol/as.name from base R:

mtcars %>% filter((!!as.symbol(var)) == 4)

mtcars %>% filter((!!as.name(var)) == 4)

How to print variable name from function argument using {{}} in R?

{{ works exclusively in tidyverse data-masking functions. print() and glue() are not such functions.

You can do print(enquo(var)). This (1) defuses var and prevents it from being evaluated; (2) prints the defused expression.

You could also create your own function to print a variable by wrapping this pattern:

print_arg <- function(arg) print(enquo(arg))

Since it uses the tidy eval operator enquo(), it automatically supports {{. You can call it like this:

print_arg({{ some_arg }})
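
If the goal is to get the name as a string rather than print the quosure, rlang's as_label() can be applied to the captured argument. A minimal sketch; arg_name and wrapper are hypothetical names used only for illustration.

```r
library(rlang)

# Hypothetical helper: return the argument's expression as a string.
arg_name <- function(arg) as_label(enquo(arg))

# A wrapper forwards its own argument with {{ }}, as in print_arg() above.
wrapper <- function(some_arg) arg_name({{ some_arg }})

wrapper(cyl)
```

Because the argument is only captured and never evaluated, cyl does not need to exist as an object for this to work.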

