Why is enquo + !! preferable to substitute + eval
I want to give an answer that is independent of dplyr, because there is a very clear advantage to using enquo over substitute. Both look in the calling environment of a function to identify the expression that was given to that function. The difference is that substitute() does it only once, while !!enquo() will correctly walk up the entire calling stack.
Consider a simple function that uses substitute():
f <- function( myExpr ) {
eval( substitute(myExpr), list(a=2, b=3) )
}
f(a+b) # 5
f(a*b) # 6
This functionality breaks when the call is nested inside another function:
g <- function( myExpr ) {
val <- f( substitute(myExpr) )
## Do some stuff
val
}
g(a+b)
# myExpr <-- OOPS
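To see where the expression gets lost, it can help to look at what substitute() returns at each level. This is a minimal sketch; inner() and outer() are hypothetical names used only for illustration:

```r
# inner() returns the expression it was called with, unevaluated
inner <- function(myExpr) substitute(myExpr)

# outer() forwards its argument to inner() in the ordinary way
outer <- function(myExpr) inner(myExpr)

inner(a + b)  # a + b
outer(a + b)  # myExpr
```

substitute() only sees the argument as it was written at the immediate call site. In the nested case that is the symbol myExpr, not a + b.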
Now consider the same functions re-written using enquo():
library( rlang )
f2 <- function( myExpr ) {
eval_tidy( enquo(myExpr), list(a=2, b=3) )
}
g2 <- function( myExpr ) {
val <- f2( !!enquo(myExpr) )
val
}
g2( a+b ) # 5
g2( b/a ) # 1.5
And that is why enquo() + !! is preferable to substitute() + eval(). dplyr simply takes full advantage of this property to build a coherent set of NSE functions.
UPDATE: rlang 0.4.0 introduced a new operator {{ (pronounced "curly curly"), which is effectively a shorthand for !!enquo(). This allows us to simplify the definition of g2 to:
g2 <- function( myExpr ) {
val <- f2( {{myExpr}} )
val
}
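The same forwarding should keep working when more wrappers are stacked on top. Here is a small self-contained sketch, assuming rlang >= 0.4.0; h2() is a hypothetical extra layer added only for illustration:

```r
library(rlang)

# Innermost function: evaluates the captured expression against fixed data
f2 <- function(myExpr) {
  eval_tidy(enquo(myExpr), list(a = 2, b = 3))
}

# Each wrapper forwards the user's expression with {{
g2 <- function(myExpr) f2({{ myExpr }})
h2 <- function(myExpr) g2({{ myExpr }})

h2(a + b)  # 5
h2(b / a)  # 1.5
```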
Understanding when to use ensym, sym vs enquo in a function
Your understanding is correct. sym/ensym is preferred when referencing a column in an existing data frame. enquo() will, of course, work as well, but it captures any arbitrary expression, allowing the user to specify things like mpg * cyl or log10(mpg + cyl)/2. If your downstream code assumes that xvar and yvar are single columns, having arbitrary expressions can lead to problems or unexpected behavior. In that sense, ensym() acts as an argument verification step when you expect a reference to a single column.
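As a rough illustration of that verification role, the sketch below wraps ensym() in tryCatch() to report whether an argument would be accepted. is_single_column() is a hypothetical helper, not part of rlang:

```r
library(rlang)

# Hypothetical helper: TRUE when the argument is a bare name or a string,
# FALSE when ensym() rejects it (for example, an arbitrary call)
is_single_column <- function(xvar) {
  tryCatch({
    ensym(xvar)
    TRUE
  }, error = function(e) FALSE)
}

is_single_column(mpg)        # TRUE
is_single_column("mpg")      # TRUE
is_single_column(mpg * cyl)  # FALSE
```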
As for converting symbols to strings, one approach is to use deparse():
median(dat[[deparse(ensym(xvar))]])
To get rlang::as_string to work, you need to drop !!, because you want to convert the expression itself to a string, not what the expression is referring to (e.g., mpg, cyl, etc.):
median(dat[[rlang::as_string(ensym(xvar))]])
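Put together as a complete function, this could look like the following sketch. col_median() is a hypothetical name, and mtcars stands in for the asker's data:

```r
library(rlang)

# Capture the column name as a symbol, turn it into a string,
# then use ordinary [[ indexing on the data frame
col_median <- function(dat, xvar) {
  name <- as_string(ensym(xvar))
  median(dat[[name]])
}

col_median(mtcars, mpg)    # 19.2
col_median(mtcars, "mpg")  # same result: strings are accepted too
```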
What is the difference between ensym and enquo when programming with dplyr?
Another take:
library(rlang)
library(dplyr, warn.conflicts = FALSE)
test <- function(x){
Species <- "bar"
cat("--- enquo builds a quosure from any expression\n")
print(enquo(x))
cat("--- ensym captures a symbol or a literal string as a symbol\n")
print(ensym(x))
cat("--- evaltidy will evaluate the quosure in its environment\n")
print(eval_tidy(enquo(x)))
cat("--- evaltidy will evaluate a symbol locally\n")
print(eval_tidy(ensym(x)))
cat("--- but both work fine where the environment doesn't matter\n")
identical(select(iris,!!ensym(x)), select(iris,!!enquo(x)))
}
Species = "foo"
test(Species)
#> --- enquo builds a quosure from any expression
#> <quosure>
#> expr: ^Species
#> env: global
#> --- ensym captures a symbol or a literal string as a symbol
#> Species
#> --- evaltidy will evaluate the quosure in its environment
#> [1] "foo"
#> --- evaltidy will evaluate a symbol locally
#> [1] "bar"
#> --- but both work fine where the environment doesn't matter
#> [1] TRUE
test("Species")
#> --- enquo builds a quosure from any expression
#> <quosure>
#> expr: ^"Species"
#> env: empty
#> --- ensym captures a symbol or a literal string as a symbol
#> Species
#> --- evaltidy will evaluate the quosure in its environment
#> [1] "Species"
#> --- evaltidy will evaluate a symbol locally
#> [1] "bar"
#> --- but both work fine where the environment doesn't matter
#> [1] TRUE
test(paste0("Spec","ies"))
#> --- enquo builds a quosure from any expression
#> <quosure>
#> expr: ^paste0("Spec", "ies")
#> env: global
#> --- ensym captures a symbol or a literal string as a symbol
#> Only strings can be converted to symbols
Created on 2019-09-23 by the reprex package (v0.3.0)
Tidy Eval, using enquo with infer package
The issue is in the formula. We can use paste0 after converting the quosure to a string (quo_name) and then convert the string into a formula object:
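library(dplyr)  # for %>%
library(rlang)  # for enquo(), quo_name()
library(infer)  # for specify(), generate(), calculate()
f <- function(dataset, col){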
col <- enquo(col)
dataset %>%
specify(as.formula(paste0(quo_name(col), '~ am'))) %>%
generate(reps = 100, type = "bootstrap") %>%
calculate("diff in means", order = c("1", "0"))
}
f(mtcars, mpg)
# A tibble: 100 x 2
# replicate stat
# <int> <dbl>
# 1 1 8.41
# 2 2 10.7
# 3 3 7.65
# 4 4 7.21
# 5 5 7.47
# 6 6 6.59
# 7 7 9.32
# 8 8 5.70
# 9 9 8.25
#10 10 6.24
# ... with 90 more rows
Based on @Lionel Henry's suggestion:
f <- function(dataset, col){
col <- ensym(col)
g <- expr(!!col ~ am)
dataset %>%
specify(g) %>%
generate(reps = 100, type = "bootstrap") %>%
calculate("diff in means", order = c("1", "0"))
}
f(mtcars, mpg)
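The formula-building step can also be tried on its own, without infer. The sketch below uses rlang::new_formula() as an alternative way to assemble the formula from a captured symbol. make_formula() is a hypothetical helper, and lm() is used here only as a stand-in consumer of the formula:

```r
library(rlang)

# Build `<col> ~ am` from a bare column name
make_formula <- function(col) {
  col <- ensym(col)
  new_formula(col, quote(am), env = caller_env())
}

fml <- make_formula(mpg)
fml                     # mpg ~ am
lm(fml, data = mtcars)  # any formula-based function can consume it
```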
Tidyeval quo vs enquo
TLDR: In the first version, you have created a self-reference (a symbol that points to itself). The other versions work, but you actually don't need quosures or captured arguments here because you are not referring to data frame columns. This also explains why both the quo() and the enquo() versions work the same. You can just pass the argument in the normal way, without any quoting, though it's still a good idea to unquote with !! to avoid any data masking bug.
You can use qq_show() around the filter() call to discover the differences in syntax:
MyFilter <- function(data, filtersVector) {
filtersVector <- quo(filtersVector)
rlang::qq_show(
result <- data %>% filter(Species %in% !!filtersVector)
)
}
MyFilter(iris, c("setosa", "virginica"))
#> result <- data %>% filter(Species %in% (^filtersVector))
So here we are asking filter() to find the rows where Species matches the elements of filtersVector. There is no filtersVector column in your data frame, so it looks for a definition in the quosure environment. You have created a quosure with quo(), which records your expression (in this case a symbol filtersVector) and your environment (the environment of your function). So it looks up a filtersVector object, which contains a symbol referring to itself. It is evaluated only once so there is no infinite loop, but you're effectively trying to compare a vector to a symbol, which is a type error:
"setosa" %in% quote(filtersVector)
#> Error in match(x, table, nomatch = 0L) :
#> 'match' requires vector arguments
In your second try, you give another name to the quosure. It now works because filtersVector, in the environment of your function, still represents the argument that was passed to it (a vector).
In the third try, you use enquo() this time. Rather than capturing your expression and your environment, enquo() captures the expression and the environment of the user of your function. Let's use qq_show() again to see the difference:
MyFilter <- function(data, filtersVector) {
filtersVector <- enquo(filtersVector)
rlang::qq_show(
data %>% filter(Species %in% !!filtersVector)
)
}
MyFilter(iris, c("setosa", "virginica"))
#> data %>% filter(Species %in% (^c("setosa", "virginica")))
Now, the quosure contains a call that creates a vector on the spot, which %in% understands perfectly.
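The difference between the two captures can also be inspected without dplyr by pulling the expression out of each quosure. This is a minimal sketch; with_quo() and with_enquo() are hypothetical names:

```r
library(rlang)

# quo() quotes the code written inside the function: the symbol `x`
with_quo <- function(x) quo(x)

# enquo() quotes the code the caller supplied for `x`
with_enquo <- function(x) enquo(x)

quo_get_expr(with_quo(c("setosa", "virginica")))    # x
quo_get_expr(with_enquo(c("setosa", "virginica")))  # c("setosa", "virginica")
```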
Note how you're not actually referring to data frame columns, though. You're passing vectors. This means you don't need any quosure at all, and you don't need to capture the expression passed to an argument. enquo() is only useful to delay evaluation until the very end, so it can be evaluated within the data frame. If the quo() and enquo() versions produce the same result, that's a good indication you don't need any quoting at all. Since there is no need for them, let's simplify the function by removing quosures from the equation:
MyFilter <- function(data, filtersVector) {
data %>% filter(Species %in% filtersVector)
}
MyFilter(iris, c("setosa", "virginica"))
#> # A tibble: 100 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ... with 90 more rows
It works! But what happens if the data frame contains a filtersVector column? It would take precedence over the object from the environment:
iris %>%
mutate(filtersVector = "parasite vector") %>%
MyFilter(c("setosa", "virginica"))
#> # A tibble: 0 x 6
#> # ... with 6 variables: Sepal.Length <dbl>, Sepal.Width <dbl>,
#> # Petal.Length <dbl>, Petal.Width <dbl>, Species <fct>, filtersVector <chr>
So it's still a good idea to unquote, because that will evaluate the vector right away and stick it inside the filter expression. It can no longer be masked by a column. The inlining is shown by qq_show():
MyFilter <- function(data, filtersVector) {
rlang::qq_show(
data %>% filter(Species %in% !!filtersVector)
)
}
MyFilter(iris, c("setosa", "virginica"))
#> data %>% filter(Species %in% <chr: "setosa", "virginica">)
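The masking effect, and how unquoting avoids it, can be reproduced with rlang alone by evaluating the filter condition against a small data frame. This is only a sketch; dat is a made-up two-row stand-in for the data frame with the parasite column:

```r
library(rlang)

# Made-up data with a column that shares the name of the local variable
dat <- data.frame(
  Species = c("setosa", "virginica"),
  filtersVector = "parasite vector"
)
filtersVector <- c("setosa", "virginica")

# Without !!, the column masks the local vector
eval_tidy(quo(Species %in% filtersVector), dat)    # FALSE FALSE

# With !!, the vector is inlined before the data frame is consulted
eval_tidy(quo(Species %in% !!filtersVector), dat)  # TRUE TRUE
```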
Order by multiple columns using non-standard evaluation
One option is to wrap the expression into eval.parent(substitute(...)):
my_order <- function( data, ... ) {
eval.parent(substitute( with(data, order(...)) ))
}
my_order( mtcars, cyl, mpg )
# [1] 32 21 3 9 8 27 26 19 28 18 20 11 6 10 30 1 2 4 15 16 24 7 17 31 14
# [26] 23 22 29 12 13 5 25
Note that we use eval.parent() instead of eval(), because the eval/substitute combo doesn't play well with nested functions. The eval.parent() trick has been proposed by @MoodyMudskipper as a way to address this problem and allows us to seamlessly use my_order() inside other functions, including magrittr pipes:
mtcars %>% my_order(cyl)
# [1] 3 8 9 18 19 20 21 26 27 28 32 1 2 4 6 10 11 30 5 7 12 13 14 15 16
# [26] 17 22 23 24 25 29 31
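For comparison, here is a sketch of how the same function might be written with rlang instead of eval.parent(substitute(...)). my_order2() is a hypothetical name, and this is an alternative technique rather than the approach described above:

```r
library(rlang)

my_order2 <- function(data, ...) {
  keys <- enquos(...)                             # one quosure per sort key
  cols <- lapply(keys, eval_tidy, data = data)    # evaluate each key inside data
  do.call(order, unname(cols))
}

# A wrapper can forward its dots without any special handling
wrapper <- function(data, ...) my_order2(data, ...)

my_order2(mtcars, cyl, mpg)
wrapper(mtcars, cyl, mpg)
```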
Non-standard evaluation and quasiquotation in dplyr() not working as (naively) expected
So, I've realized that what I was struggling with in this question (and many other problems) is not really quasiquotation and/or non-standard evaluation, but rather converting character strings into object names. Here is my new solution:
letrs_top.df <- letrs_count.df %>%
top_n(5, get(count_colname))
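If you do want a tidy eval version, converting the string with sym() and unquoting it should give the same rows. In the sketch below, mtcars and the column name "mpg" are stand-ins for letrs_count.df and its count column, which are not shown here:

```r
library(dplyr, warn.conflicts = FALSE)
library(rlang)

count_colname <- "mpg"  # stand-in for the real column name

# The get() approach from the solution above
via_get <- mtcars %>% top_n(5, get(count_colname))

# The same selection using sym() + !!
via_sym <- mtcars %>% top_n(5, !!sym(count_colname))

identical(via_get, via_sym)
```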
R How to Pass a function as a String Inside another Function
It seems that function is a bit finicky for some reason. One way would be to change the call and then evaluate that. For example:
myfun <- "apply.quarterly"
bquote(FANG %>%
group_by(symbol) %>%
tq_transmute(select = adjusted,
mutate_fun = .(as.name(myfun)),
FUN = max,
col_rename = "max.close")) %>%
eval()
Or, if you prefer rlang syntax:
myfun <- "apply.quarterly"
quo(FANG %>%
group_by(symbol) %>%
tq_transmute(select = adjusted,
mutate_fun = !!sym(myfun),
FUN = max,
col_rename = "max.close")) %>%
eval_tidy()
Note that we have to treat the entire expression as an rlang quosure. Unless the tq_transmute function was specifically written to handle rlang features like !!, they won't work by default.
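The same two techniques can be tried on a small example that does not need tidyquant. In this sketch, sapply() and "max" are stand-ins for tq_transmute() and "apply.quarterly":

```r
library(rlang)

myfun <- "max"  # function name held as a string

# Base R: splice the name into the call with bquote(), then evaluate it
base_call <- bquote(sapply(list(1:3, 4:6), .(as.name(myfun))))
base_call        # sapply(list(1:3, 4:6), max)
eval(base_call)  # 3 6

# rlang: quote the whole call, unquoting the symbol with !!
eval_tidy(quo(sapply(list(1:3, 4:6), !!sym(myfun))))  # 3 6
```

In both cases the string is turned into a symbol before the call is evaluated, so the called function only ever sees an ordinary function name.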