Why Is Enquo + !! Preferable to Substitute + Eval

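I want to give an answer that is independent of dplyr, because there is a very clear advantage to using enquo() over substitute(). Both look in the calling environment of a function to identify the expression that was given to that function. The difference is that substitute() does this only once, for the immediate caller. enquo() captures the expression together with its environment, and unquoting it with !! lets every function in a chain pass that same quosure along, so the original expression survives however deeply the calls are nested.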

Consider a simple function that uses substitute():

f <- function( myExpr ) {
  eval( substitute(myExpr), list(a=2, b=3) )
}

f(a+b) # 5
f(a*b) # 6

This functionality breaks when the call is nested inside another function:

g <- function( myExpr ) {
  val <- f( substitute(myExpr) )
  ## Do some stuff
  val
}

g(a+b)
# myExpr <-- OOPS
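To see where the expression gets lost, here is a minimal base-R sketch (capture() and wrapper() are hypothetical helpers, added only for illustration). substitute() only sees the code written by its immediate caller, so once the call goes through a wrapper it captures the wrapper's argument name instead of the user's expression:

```r
# Hypothetical helper: return what substitute() captures, without evaluating it
capture <- function(myExpr) substitute(myExpr)

# Hypothetical wrapper that simply forwards its argument
wrapper <- function(myExpr) capture(myExpr)

direct <- capture(a + b)   # the user's expression: a + b
nested <- wrapper(a + b)   # only the wrapper's argument name: myExpr

print(direct)
print(nested)
```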

Now consider the same functions re-written using enquo():

library( rlang )

f2 <- function( myExpr ) {
  eval_tidy( enquo(myExpr), list(a=2, b=3) )
}

g2 <- function( myExpr ) {
  val <- f2( !!enquo(myExpr) )
  val
}

g2( a+b ) # 5
g2( b/a ) # 1.5

And that is why enquo() + !! is preferable to substitute() + eval(). dplyr simply takes full advantage of this property to build a coherent set of NSE functions.
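The same experiment with rlang shows what !! changes. In this sketch (capture_quo(), wrapper_plain() and wrapper_quo() are hypothetical helpers), the wrapper that unquotes with !! hands the user's quosure down intact, while the plain wrapper again passes only its own argument name:

```r
library(rlang)

# Hypothetical helper: return the quosure instead of evaluating it
capture_quo <- function(myExpr) enquo(myExpr)

# Without !!, the inner enquo() captures the wrapper's argument name
wrapper_plain <- function(myExpr) capture_quo(myExpr)

# With !!, the wrapper forwards the quosure it captured from the user
wrapper_quo <- function(myExpr) capture_quo(!!enquo(myExpr))

q_plain <- wrapper_plain(a + b)
q_fwd   <- wrapper_quo(a + b)

quo_get_expr(q_plain)  # myExpr
quo_get_expr(q_fwd)    # a + b
```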

UPDATE: rlang 0.4.0 introduced a new operator {{ (pronounced "curly curly"), which is effectively a shorthand for !!enquo(). This allows us to simplify the definition of g2 to:

g2 <- function( myExpr ) {
  val <- f2( {{myExpr}} )
  val
}
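A quick way to convince yourself that {{ }} and !!enquo() are interchangeable here is to define both versions side by side. This sketch repeats f2() from above; g_bang() and g_curly() are hypothetical names, and {{ }} assumes rlang 0.4.0 or later:

```r
library(rlang)

# f2() as defined above
f2 <- function(myExpr) {
  eval_tidy(enquo(myExpr), list(a = 2, b = 3))
}

# Hypothetical wrappers: one forwards with !!enquo(), the other with {{ }}
g_bang  <- function(myExpr) f2(!!enquo(myExpr))
g_curly <- function(myExpr) f2({{ myExpr }})

g_bang(a + b)   # 5
g_curly(a + b)  # 5
g_curly(b / a)  # 1.5
```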

Understanding when to use ensym, sym vs enquo in a function

Your understanding is correct. sym()/ensym() is preferred when referencing a column in an existing data frame. enquo() will, of course, work as well, but it captures any arbitrary expression, allowing the user to specify things like mpg * cyl or log10(mpg + cyl)/2. If your downstream code assumes that xvar and yvar are single columns, arbitrary expressions can lead to problems or unexpected behavior. In that sense, ensym() acts as an argument verification step when you expect a reference to a single column.
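A small sketch of that verification step (col_sym() and col_quo() are hypothetical helpers). ensym() accepts a bare name or a string and turns both into a symbol, but signals an error for anything else, while enquo() happily accepts mpg * cyl:

```r
library(rlang)

# Hypothetical helpers contrasting the two capture functions
col_sym <- function(var) ensym(var)
col_quo <- function(var) enquo(var)

s1 <- col_sym(mpg)        # the symbol mpg
s2 <- col_sym("mpg")      # a string is converted to the same symbol
q  <- col_quo(mpg * cyl)  # any expression is accepted

# ensym() rejects arbitrary expressions, which acts as input validation
res <- tryCatch(col_sym(mpg * cyl), error = function(e) "rejected")
res  # "rejected"
```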

As for converting symbols to strings, one approach is to use deparse():

median(dat[[deparse(ensym(xvar))]])

To get rlang::as_string to work, you need to drop !!, because you want to convert the expression itself to a string, not what the expression is referring to (e.g., mpg, cyl, etc.):

median(dat[[rlang::as_string(ensym(xvar))]])
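Putting this together, here is a sketch of a column-median helper. col_median() is a hypothetical name, and the built-in mtcars data stands in for the question's dat; the deparse() and as_string() routes produce the same column name for a plain symbol:

```r
library(rlang)

# Hypothetical helper: median of a column given as a bare name or a string
col_median <- function(dat, xvar) {
  col <- as_string(ensym(xvar))  # e.g. the symbol mpg becomes "mpg"
  median(dat[[col]])
}

col_median(mtcars, mpg)    # 19.2
col_median(mtcars, "mpg")  # same result, since ensym() also accepts strings

# deparse() gives the same string as as_string() for a plain symbol
identical(deparse(quote(mpg)), as_string(quote(mpg)))  # TRUE
```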

What is the difference between ensym and enquo when programming with dplyr?

Another take:

library(rlang)
library(dplyr, warn.conflicts = FALSE)

test <- function(x){
  Species <- "bar"
  cat("--- enquo builds a quosure from any expression\n")
  print(enquo(x))
  cat("--- ensym captures a symbol or a literal string as a symbol\n")
  print(ensym(x))
  cat("--- evaltidy will evaluate the quosure in its environment\n")
  print(eval_tidy(enquo(x)))
  cat("--- evaltidy will evaluate a symbol locally\n")
  print(eval_tidy(ensym(x)))
  cat("--- but both work fine where the environment doesn't matter\n")
  identical(select(iris, !!ensym(x)), select(iris, !!enquo(x)))
}

Species = "foo"
test(Species)
#> --- enquo builds a quosure from any expression
#> <quosure>
#> expr: ^Species
#> env: global
#> --- ensym captures a symbol or a literal string as a symbol
#> Species
#> --- evaltidy will evaluate the quosure in its environment
#> [1] "foo"
#> --- evaltidy will evaluate a symbol locally
#> [1] "bar"
#> --- but both work fine where the environment doesn't matter
#> [1] TRUE

test("Species")
#> --- enquo builds a quosure from any expression
#> <quosure>
#> expr: ^"Species"
#> env: empty
#> --- ensym captures a symbol or a literal string as a symbol
#> Species
#> --- evaltidy will evaluate the quosure in its environment
#> [1] "Species"
#> --- evaltidy will evaluate a symbol locally
#> [1] "bar"
#> --- but both work fine where the environment doesn't matter
#> [1] TRUE
test(paste0("Spec","ies"))
#> --- enquo builds a quosure from any expression
#> <quosure>
#> expr: ^paste0("Spec", "ies")
#> env: global
#> --- ensym captures a symbol or a literal string as a symbol
#> Only strings can be converted to symbols

Created on 2019-09-23 by the reprex package (v0.3.0)
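The "evaluate a symbol locally" behaviour is only eval_tidy()'s default. If you want a bare symbol from ensym() to be looked up where the user called your function, you can pass that environment to eval_tidy() explicitly. A sketch; test_env() is a hypothetical variant of test() above:

```r
library(rlang)

Species <- "foo"

test_env <- function(x) {
  Species <- "bar"
  env <- caller_env()  # the environment test_env() was called from
  s <- ensym(x)
  c(
    local  = eval_tidy(s),            # default env: test_env()'s own frame
    caller = eval_tidy(s, env = env)  # explicitly use the caller's environment
  )
}

out <- test_env(Species)
out  # local = "bar", caller = "foo"
```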

Tidy Eval, using enquo with infer package

The issue is in the formula. We can use paste0() after converting the quosure to a string (quo_name()) and then convert the string into a formula object:

library(infer)
library(dplyr)

f <- function(dataset, col){
  col <- enquo(col)
  dataset %>%
    specify(as.formula(paste0(quo_name(col), '~ am'))) %>%
    generate(reps = 100, type = "bootstrap") %>%
    calculate("diff in means", order = c("1", "0"))
}

f(mtcars, mpg)
# A tibble: 100 x 2
# replicate stat
# <int> <dbl>
# 1 1 8.41
# 2 2 10.7
# 3 3 7.65
# 4 4 7.21
# 5 5 7.47
# 6 6 6.59
# 7 7 9.32
# 8 8 5.70
# 9 9 8.25
#10 10 6.24
# ... with 90 more rows
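The formula-building step can be checked on its own in base R. In this sketch col_name is a hypothetical stand-in for quo_name(col), and reformulate() is a base-R alternative that is not part of the original answer; it builds the same formula without pasting strings:

```r
# Hypothetical column name, as quo_name(col) would return it
col_name <- "mpg"

# String pasting, as in the answer above
f_paste <- as.formula(paste0(col_name, " ~ am"))

# Base R alternative: reformulate() takes the response name directly
f_reform <- reformulate("am", response = col_name)

f_paste                                        # mpg ~ am
identical(deparse(f_paste), deparse(f_reform)) # TRUE
```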

Based on @Lionel Henry's suggestion:

f <- function(dataset, col){
  col <- ensym(col)
  g <- expr(!!col ~ am)
  dataset %>%
    specify(g) %>%
    generate(reps = 100, type = "bootstrap") %>%
    calculate("diff in means", order = c("1", "0"))
}

f(mtcars, mpg)
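Note that expr() returns an unevaluated call to ~ rather than a formula object. A sketch checking this (make_formula() is a hypothetical helper that isolates the formula-building step of the function above); if a downstream function insists on a real formula, evaluating the call creates one:

```r
library(rlang)

# Hypothetical helper: the formula-building step from f() above
make_formula <- function(col) {
  col <- ensym(col)
  expr(!!col ~ am)
}

fml  <- make_formula(mpg)   # mpg ~ am
fml2 <- make_formula("hp")  # hp ~ am, since ensym() also accepts strings

inherits(fml, "formula")        # FALSE: an unevaluated call to `~`
inherits(eval(fml), "formula")  # TRUE: evaluating it builds the formula
```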

Tidyeval quo vs enquo

TLDR: In the first version, you have created a self-reference (a symbol that points to itself). The other versions work but you actually don't need quosures or capturing arguments here because you are not referring to data frame columns. This also explains why both the quo() and the enquo() versions work the same. You can just pass the argument in the normal way, without any quoting, though it's still a good idea to unquote with !! to avoid any data masking bug.

You can use qq_show() around the filter() call to discover the differences in syntax:

MyFilter <- function(data, filtersVector) {
  filtersVector <- quo(filtersVector)

  rlang::qq_show(
    result <- data %>% filter(Species %in% !!filtersVector)
  )
}

MyFilter(iris, c("setosa", "virginica"))
#> result <- data %>% filter(Species %in% (^filtersVector))

So here we are asking filter() to find the rows where Species matches the elements of filtersVector. There is no filtersVector column in your data frame, so it looks for a definition in the quosure environment. You have created a quosure with quo(), which records your expression (in this case the symbol filtersVector) and your environment (the environment of your function). So it looks up a filtersVector object, which contains a symbol referring to itself. It is evaluated only once, so there is no infinite loop, but you're effectively trying to compare a vector to a symbol, which is a type error:

"setosa" %in% quote(filtersVector)
#> Error in match(x, table, nomatch = 0L) :
#> 'match' requires vector arguments

In your second try, you give another name to the quosure. It now works because filtersVector, in the environment of your function, still represents the argument that was passed to it (a vector).

In the third try, you use enquo() this time. Rather than capturing your expression and your environment, enquo() captures the expression and the environment of the user of your function. Let's use qq_show() again to see the difference:

MyFilter <- function(data, filtersVector) {
  filtersVector <- enquo(filtersVector)

  rlang::qq_show(
    data %>% filter(Species %in% !!filtersVector)
  )
}

MyFilter(iris, c("setosa", "virginica"))
#> data %>% filter(Species %in% (^c("setosa", "virginica")))

Now, the quosure contains a call that creates a vector on the spot, which %in% understands perfectly.
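You can check the difference between the two capture functions directly. A sketch (compare_capture() is a hypothetical helper): quo() records the symbol written inside the function, while enquo() records the expression supplied by the caller:

```r
library(rlang)

# Hypothetical helper: compare what quo() and enquo() record
compare_capture <- function(filtersVector) {
  list(
    quo   = quo_get_expr(quo(filtersVector)),   # the symbol written inside the function
    enquo = quo_get_expr(enquo(filtersVector))  # the expression the caller supplied
  )
}

res <- compare_capture(c("setosa", "virginica"))
res$quo    # filtersVector
res$enquo  # c("setosa", "virginica")
```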

Note that you're not actually referring to data frame columns, though. You're passing vectors. This means you don't need any quosure at all, and you don't need to capture the expression passed to an argument. enquo() is only useful to delay evaluation until the very end, so that the expression can be evaluated within the data frame. If the quo() and enquo() versions produce the same result, that's a good indication you don't need any quoting at all. Since there is no need for them, let's simplify the function by removing quosures from the equation:

MyFilter <- function(data, filtersVector) {
  data %>% filter(Species %in% filtersVector)
}

MyFilter(iris, c("setosa", "virginica"))
#> # A tibble: 100 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ... with 90 more rows

It works! But what happens if the data frame contains a filtersVector column? That column would take precedence over the object from the environment:

iris %>%
mutate(filtersVector = "parasite vector") %>%
MyFilter(c("setosa", "virginica"))
#> # A tibble: 0 x 6
#> # ... with 6 variables: Sepal.Length <dbl>, Sepal.Width <dbl>,
#> # Petal.Length <dbl>, Petal.Width <dbl>, Species <fct>, filtersVector <chr>

So it's still a good idea to unquote, because that will evaluate the vector right away and stick it inside the filter expression. It can no longer be masked by a column. The inlining is shown by qq_show():

MyFilter <- function(data, filtersVector) {
  rlang::qq_show(
    data %>% filter(Species %in% !!filtersVector)
  )
}
MyFilter(iris, c("setosa", "virginica"))
#> data %>% filter(Species %in% <chr: "setosa", "virginica">)
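For completeness, here is a runnable version of the unquoted function, applied to the masking example from above (a sketch that assumes dplyr is installed). Because !! inlines the vector before filter() sees the expression, the competing filtersVector column no longer matters:

```r
library(dplyr, warn.conflicts = FALSE)

# Unquote with !! so the vector is inlined before filter() evaluates anything
MyFilter <- function(data, filtersVector) {
  data %>% filter(Species %in% !!filtersVector)
}

# Even with a competing filtersVector column, the inlined vector is used
out <- iris %>%
  mutate(filtersVector = "parasite vector") %>%
  MyFilter(c("setosa", "virginica"))

nrow(out)  # 100
```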

Order by multiple columns using non-standard evaluation

One option is to wrap the expression into eval.parent(substitute(...)):

my_order <- function( data, ... ) {
  eval.parent(substitute( with(data, order(...)) ))
}

my_order( mtcars, cyl, mpg )
# [1] 32 21 3 9 8 27 26 19 28 18 20 11 6 10 30 1 2 4 15 16 24 7 17 31 14
# [26] 23 22 29 12 13 5 25

Note that we use eval.parent() instead of eval(), because the eval()/substitute() combination doesn't play well with nested functions. The eval.parent() trick was proposed by @MoodyMudskipper as a way to address this problem, and it allows us to seamlessly use my_order() inside other functions, including magrittr pipes:

mtcars %>% my_order(cyl)
# [1] 3 8 9 18 19 20 21 26 27 28 32 1 2 4 6 10 11 30 5 7 12 13 14 15 16
# [26] 17 22 23 24 25 29 31
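The nesting problem can be reproduced in base R. In this sketch my_order_eval() is a hypothetical copy of my_order() that uses plain eval(), and wrap_parent()/wrap_eval() are hypothetical wrappers. With eval(), the substituted call with(dat, ...) is evaluated in my_order_eval()'s own frame, where the wrapper's dat does not exist; eval.parent() evaluates it one frame up, where dat is defined:

```r
# my_order() as defined above
my_order <- function(data, ...) {
  eval.parent(substitute(with(data, order(...))))
}

# The same function with plain eval(), for comparison
my_order_eval <- function(data, ...) {
  eval(substitute(with(data, order(...))))
}

# Hypothetical wrappers that forward their arguments
wrap_parent <- function(dat, ...) my_order(dat, ...)
wrap_eval   <- function(dat, ...) my_order_eval(dat, ...)

ok  <- wrap_parent(mtcars, cyl, mpg)
bad <- tryCatch(wrap_eval(mtcars, cyl, mpg),
                error = function(e) conditionMessage(e))

identical(ok, my_order(mtcars, cyl, mpg))  # TRUE
bad  # an error message: object 'dat' not found
```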

Non-standard evaluation and quasiquotation in dplyr() not working as (naively) expected

So, I've realized that what I was struggling with in this question (and many other problems) is not really quasiquotation or non-standard evaluation, but rather converting character strings into object names. Here is my new solution:

letrs_top.df <- letrs_count.df %>%
  top_n(5, get(count_colname))
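A self-contained sketch of that solution, assuming dplyr is installed. The letrs_count.df data below is a hypothetical stand-in for the question's data; get() resolves the string stored in count_colname to the column of that name inside the data mask:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-in for letrs_count.df: one frequency per letter
letrs_count.df <- data.frame(
  letter = letters[1:8],
  freq   = c(5, 1, 9, 3, 7, 2, 8, 4),
  stringsAsFactors = FALSE
)
count_colname <- "freq"

# get() looks up the column whose name is stored in count_colname
letrs_top.df <- letrs_count.df %>%
  top_n(5, get(count_colname))

# Rows with the five largest frequencies, kept in their original order
letrs_top.df$letter  # "a" "c" "e" "g" "h"
```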

R How to Pass a function as a String Inside another Function

It seems that function is a bit finicky for some reason. One way would be to change the call and then evaluate that. For example:

myfun <- "apply.quarterly"
bquote(FANG %>%
         group_by(symbol) %>%
         tq_transmute(select = adjusted,
                      mutate_fun = .(as.name(myfun)),
                      FUN = max,
                      col_rename = "max.close")) %>%
  eval()

Or, if you prefer rlang syntax:

myfun <- "apply.quarterly"
quo(FANG %>%
      group_by(symbol) %>%
      tq_transmute(select = adjusted,
                   mutate_fun = !!sym(myfun),
                   FUN = max,
                   col_rename = "max.close")) %>%
  eval_tidy()

Note that we have to treat the entire expression as an rlang quosure. Unless the tq_transmute() function was specifically written to handle rlang features like !!, they won't work by default.
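The same call-rewriting idea can be tried on something much smaller than the tq_transmute() pipeline. In this sketch, fun_name is a hypothetical variable holding a function name as a string; bquote() splices the corresponding symbol into the call before eval() runs it, which is what .(as.name(myfun)) does above:

```r
# Hypothetical: a function name chosen at run time, stored as a string
fun_name <- "max"

# .( ) inserts the symbol max into the call template
cl <- bquote(.(as.name(fun_name))(c(3, 9, 4)))

cl        # max(c(3, 9, 4))
eval(cl)  # 9
```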


