Using Dplyr Within a Function, Non-Standard Evaluation

how to use non-standard evaluation in R

The symbol returned by sym must be evaluated with eval or rlang::eval_tidy before it can be used in plot. For example:

library(rlang)  # provides sym() and eval_tidy()

a <- 1:10

x <- sym('a')

plot(eval(x))
plot(rlang::eval_tidy(x))

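!! and !!! are unquoting operators: they force early evaluation of their operand inside tidyverse functions that quote their arguments. In base functions such as plot they have no special meaning, which is why the symbol has to be evaluated explicitly above.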
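As a minimal sketch of what the two operators do, rlang::expr can be used to build an expression without evaluating it. The object names below are only illustrative.

```r
library(rlang)

a <- 1:10
x <- sym("a")

# !! injects a single object into the quoted expression
e1 <- expr(mean(!!x))
print(e1)   # shows the call mean(a)
eval(e1)

# !!! splices each element of a list in as a separate argument
args <- list(1, 2, 3)
e2 <- expr(sum(!!!args))
print(e2)   # shows the call sum(1, 2, 3)
eval(e2)
```

Printing the expressions before evaluating them makes it visible that the operators change the code that is built, not the value that is computed afterwards.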

Functions and non-standard evaluation in dplyr

You could do the following:

library(tidyverse)
xy <- data.frame(xvar = 1:10, yvar = 11:20)

plotfunc <- function(data, x, y){
  x <- enquo(x)
  y <- enquo(y)
  print(
    ggplot(data, aes(x = !!x, y = (!!y)^2)) +
      geom_line()
  )
}
plotfunc(xy, xvar, yvar)

Non-standard evaluation basically means that you're passing the argument as an expression rather than a value. quo and enquo also associate an evaluation environment with this expression, so that it can later be evaluated in the context where it was written.
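The role of the attached environment can be illustrated with a small sketch. The helper names capture and make_quo are hypothetical and exist only for this example.

```r
library(rlang)

# capture() returns its argument as a quosure: expression + environment
capture <- function(arg) enquo(arg)

make_quo <- function() {
  local_value <- 10          # exists only inside make_quo()
  capture(local_value * 2)
}

q <- make_quo()
quo_get_expr(q)   # the captured expression, local_value * 2
eval_tidy(q)      # evaluated in the environment where it was written
```

The last call succeeds even though local_value is not defined in the global environment, because the quosure remembers where the expression came from.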

Hadley Wickham introduces it like this in his book:

In most programming languages, you can only access the values of a
function’s arguments. In R, you can also access the code used to
compute them. This makes it possible to evaluate code in non-standard
ways: to use what is known as non-standard evaluation, or NSE for
short. NSE is particularly useful for functions when doing interactive
data analysis because it can dramatically reduce the amount of typing.

Non standard evaluation in dplyr: how do you indirect a function's multiple arguments?

You can define one argument for the data frame and use ... for the variables to group by:

library(dplyr)

testfunc <- function(df, ...) {
  df %>%
    group_by(...) %>%
    summarise(mpg = mean(mpg))
}
testfunc(mtcars, cyl, gear)
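If the summarised column should also be chosen by the caller, one possible extension is to combine ... with a single embraced argument. This is a sketch, not part of the original answer; mean_by and its value argument are hypothetical names.

```r
library(dplyr)

# `value` is embraced with {{ }}, the grouping columns travel through ...
mean_by <- function(df, value, ...) {
  df %>%
    group_by(...) %>%
    summarise(mean_value = mean({{ value }}), .groups = "drop")
}

res <- mean_by(mtcars, mpg, cyl, gear)
res
```

Putting the single named argument before ... keeps the call unambiguous: the first bare name is the value, everything after it is a grouping column.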

dplyr group_by and summarize with non-standard evaluation

In this case it is better to use ensym, because we are passing a string. ensym also works with an unquoted argument:

foo2 <- function(df, var) {
  var <- ensym(var)
  df %>%
    group_by(a) %>%
    summarize(trues = sum(!!var),
              falses = sum(!(!!var)))
}
foo2(df, 'b')
# A tibble: 2 x 3
# a trues falses
#* <dbl> <int> <int>
#1 1 2 1
#2 2 1 2

foo2(df, b)
# A tibble: 2 x 3
# a trues falses
#* <dbl> <int> <int>
#1 1 2 1
#2 2 1 2

If the column name is stored in a variable, unquote it with !! when calling the function, so that ensym receives the stored name instead of taking the variable's own name literally:

foo2(df, !!var)
# A tibble: 2 x 3
# a trues falses
#* <dbl> <int> <int>
#1 1 2 1
#2 2 1 2
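The three calling styles can be tried on a self-contained example. The data frame toy below is made up for illustration and is not the df from the original question; count_lgl is a hypothetical name for the same pattern.

```r
library(dplyr)
library(rlang)

# Hypothetical data: a grouping column and a logical column
toy <- data.frame(a = c(1, 1, 2, 2),
                  b = c(TRUE, FALSE, TRUE, TRUE))

count_lgl <- function(df, var) {
  var <- ensym(var)   # accepts "b" as well as b
  df %>%
    group_by(a) %>%
    summarise(trues = sum(!!var),
              falses = sum(!(!!var)))
}

r_string <- count_lgl(toy, "b")    # column given as a string
r_bare   <- count_lgl(toy, b)      # column given as a bare name
col      <- "b"
r_stored <- count_lgl(toy, !!col)  # column name stored in a variable
```

All three calls should return the same summary, since ensym turns each form into the symbol b before it is unquoted inside summarise.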

Non-standard evaluation in dplyr when using dots for variable number of arguments

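Inside your function, wrap the dots in c(): write across(c(...), <function>) instead of across(..., <function>). The first argument of across is a single column selection, so the names passed through ... must be combined into one.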

library(dplyr, warn.conflicts = FALSE)
sessionInfo()$otherPkgs$dplyr$Version
#> [1] "1.0.7"

tib <- tibble(
  x = c("cats and dogs", "foxes and hounds"),
  y = c("whales and dolphins", "cats and foxes"),
  z = c("dogs and geese", "cats and mice")
)

filter_words <- function(.data, ...) {
  words_to_filter <- c("cat", "dog")

  .data %>% mutate(
    across(c(...), ~ gsub(
      paste0(words_to_filter, collapse = "|"),
      "#@!*", ., perl = TRUE
    )
    )
  )
}

tib %>%
  filter_words(x, y)
#> # A tibble: 2 × 3
#>   x                y                   z
#>   <chr>            <chr>               <chr>
#> 1 #@!*s and #@!*s  whales and dolphins dogs and geese
#> 2 foxes and hounds #@!*s and foxes     cats and mice

Created on 2022-01-17 by the reprex package (v2.0.1)
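The same c(...) pattern can be reduced to a smaller sketch, which may make the mechanism easier to see. The function shout and the tibble letters_tbl are hypothetical and not part of the original answer.

```r
library(dplyr, warn.conflicts = FALSE)

# Upper-case only the columns named in ...; c(...) turns them into one selection
shout <- function(.data, ...) {
  .data %>% mutate(across(c(...), toupper))
}

letters_tbl <- tibble(x = "a", y = "b", z = "c")
res <- shout(letters_tbl, x, y)
res
```

Column z is left alone because it was not passed through the dots.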

What is non-standard evaluation and how can you pass an undefined variable to a function in R?

For the second question: you can pass x as a bare name rather than a string because of non-standard evaluation. The function's arguments are captured as expressions instead of being evaluated immediately, and are then evaluated in a scope where the names exist. For example, quote() captures its input as-is, rather than looking up a value for mpg. We can then evaluate the captured expression in another environment, such as the mtcars data frame.

var <- quote(mpg)
var
mpg

eval(var, envir = mtcars)
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4

We can make a similar use of NSE within functions:

f <- function(x) {
input <- substitute(x)
print(input)
eval(input, envir = mtcars)
}

Here, we capture whatever was passed to the argument, and then execute it in the scope of the mtcars data frame.

f(cyl)
cyl
[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
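A slightly more general version of the same idea takes the data frame as an argument and lets the expression also see variables from the caller. This is a sketch under those assumptions; with_data and threshold are hypothetical names.

```r
# Evaluate `expr` with the columns of `data` in scope; names that are not
# columns are looked up in the caller's environment via `enclos`
with_data <- function(data, expr) {
  eval(substitute(expr), envir = data, enclos = parent.frame())
}

threshold <- 30
n_high <- with_data(mtcars, sum(mpg > threshold))
n_high
```

Here mpg is found in mtcars while threshold is found in the calling environment, which is the usual division of labour in base R functions such as subset and with.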


Using Standard Evaluation

We can achieve the same results without NSE, but the way we call the functions will differ. In this case, arguments will be immediately evaluated and you will get an object not found error if you pass an undefined variable to the function.

f <- function(x) {
print(x)
mtcars[[x]]
}

To use this function, mpg must be passed as a string.

f("mpg")
[1] "mpg"
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4

You can see the results are identical to the first example, but in this case mpg is a string rather than a captured expression. The second line of the function can be interpreted as mtcars[["mpg"]]. Trying to use NSE with this function will result in an error:

f(mpg)
Error in print(x) : object 'mpg' not found

non-standard evaluation (NSE) with dplyr in R

You should use curly-curly ({{ }}), which avoids the enquo and !! pair. You can also use count, which is a shortcut for group_by + summarise.

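library(dplyr)  # attaches %>%

table_summary <- function(data, group_by1) {
  data %>%
    # count() names its column `n` by default; name = "N" matches the mutate() call
    dplyr::count({{ group_by1 }}, name = "N") %>%
    dplyr::mutate(pct = paste0(round(N / sum(N) * 100, 2), " %"))
}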

table_summary(clientData, agegroup)
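To see that curly-curly is shorthand for the enquo and !! pair, the two spellings can be compared on a built-in data set. The function names count_curly and count_enquo are hypothetical, and mtcars stands in for clientData, which is not available here.

```r
library(dplyr)

# Embracing with {{ }}
count_curly <- function(data, col) {
  data %>% count({{ col }}, name = "N")
}

# The older two-step spelling: capture with enquo(), inject with !!
count_enquo <- function(data, col) {
  col <- enquo(col)
  data %>% count(!!col, name = "N")
}

res_curly <- count_curly(mtcars, cyl)
res_enquo <- count_enquo(mtcars, cyl)
identical(res_curly, res_enquo)
```

Both functions are called with a bare column name; neither accepts a string such as "cyl" as a column reference.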

It seems agegroup is a string. To continue with the OP's approach, we need to convert it to a symbol (sym) and unquote it (!!):

table_summary <- function(data, group_by1) {
  data %>%
    dplyr::group_by(!!sym(group_by1)) %>%
    dplyr::summarise(N = n()) %>%
    dplyr::mutate(pct = paste0(round(N / sum(N) * 100, 2), " %"))
}
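When the column name arrives as a string, an alternative that avoids sym and !! is tidy selection with all_of inside across. This is a different technique from the one in the answer, shown only as a sketch; table_summary_chr is a hypothetical name and mtcars stands in for clientData.

```r
library(dplyr)

# group_col is a character vector of column names, e.g. "cyl"
table_summary_chr <- function(data, group_col) {
  data %>%
    group_by(across(all_of(group_col))) %>%
    summarise(N = n(), .groups = "drop") %>%
    mutate(pct = paste0(round(N / sum(N) * 100, 2), " %"))
}

res <- table_summary_chr(mtcars, "cyl")
res
```

Because all_of takes a character vector, the same function should also accept several grouping columns, for example c("cyl", "gear").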

Using strings as arguments in custom dplyr function using non-standard evaluation

You can either use sym to turn "y" into a symbol or parse_expr to parse it into an expression, then unquote it using !!:

library(rlang)

testFun(data.frame(x = c("a", "b", "c"), y = 1:3), !!sym(myVar))

testFun(data.frame(x = c("a", "b", "c"), y = 1:3), !!parse_expr(myVar))

Result:

  x   y
1 a   0
2 b 100
3 c 200

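The difference between the two is that sym turns the whole string into a single symbol (one name), while parse_expr parses the string as R code and can therefore return a call such as y * 100.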
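How sym and parse_expr treat the same string can be checked directly. The data frame d and the strings below are made up for this sketch.

```r
library(rlang)

d <- data.frame(y = 1:3)

# sym(): the entire string becomes one name, even if it looks like code
s <- sym("y * 100")
is_symbol(s)
as_string(s)

# parse_expr(): the string is parsed as R code, giving a call to `*`
p <- parse_expr("y * 100")
is_call(p)
res <- eval_tidy(p, data = d)   # y is looked up in d
res
```

For a plain column name such as "y" both functions return the same symbol; they only differ once the string contains more than a single name.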

How to evaluate a constructed string with non-standard evaluation using dplyr?

Use sym and := like this:

library(dplyr)
library(rlang)

t <- tibble( x_01 = c(1, 2, 3), x_02 = c(4, 5, 6))
i <- 1

new <- sym(sprintf("d_%02d", i))
var <- sym(sprintf("x_%02d", i))
t %>% mutate(!!new := (!!var) * 2)

giving:

# A tibble: 3 x 3
   x_01  x_02  d_01
  <dbl> <dbl> <dbl>
1     1     4     2
2     2     5     4
3     3     6     6
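The same construction can be wrapped in a function so that it is reusable for any index. This is a sketch; add_doubled and tb are hypothetical names, and the naming scheme follows the x_01 / d_01 pattern above.

```r
library(dplyr)
library(rlang)

# Adds column d_<i> holding twice the value of column x_<i>
add_doubled <- function(data, i) {
  new <- sprintf("d_%02d", i)        # a string is enough on the left of :=
  var <- sym(sprintf("x_%02d", i))   # the right-hand side needs a symbol
  data %>% mutate(!!new := (!!var) * 2)
}

tb <- tibble(x_01 = c(1, 2, 3), x_02 = c(4, 5, 6))
res <- tb %>% add_doubled(1) %>% add_doubled(2)
res
```

On the left-hand side of := a plain string works as well as a symbol, so only the input column has to be converted with sym.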

Also note that this is trivial in base R:

tdf <- data.frame( x_01 = c(1, 2, 3), x_02 = c(4, 5, 6))
i <- 1

new <- sprintf("d_%02d", i)
var <- sprintf("x_%02d", i)
tdf[[new]] <- 2 * tdf[[var]]

