Non-standard evaluation and quasiquotation in dplyr not working as (naively) expected
So, I've realized that what I was struggling with in this question (and many other problems) is not really quasiquotation and/or non-standard evaluation, but rather converting character strings into object names. Here is my new solution:
letrs_top.df <- letrs_count.df %>%
top_n(5, get(count_colname))
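As a rough sketch of two tidy-eval alternatives to get(), the snippet below uses the .data pronoun and !!sym(). The data frame contents and the column name "count" are made up for illustration; only letrs_count.df and count_colname come from the answer above.

```r
library(dplyr)

# Hypothetical data and column name, for illustration only
letrs_count.df <- tibble(
  letr  = letters[1:8],
  count = c(3, 9, 1, 7, 5, 8, 2, 6)
)
count_colname <- "count"

# .data[[ ]] looks the string up as a column of the data frame
top_data <- letrs_count.df %>%
  top_n(5, .data[[count_colname]])

# sym() turns the string into a symbol, and !! unquotes it
top_sym <- letrs_count.df %>%
  top_n(5, !!rlang::sym(count_colname))
```

Both forms make it explicit that the string names a column of the data frame, whereas get() also searches the calling environment.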
how to use non-standard evaluation in R
The return value of sym() must be evaluated with eval() or rlang::eval_tidy() before it can be used in plot(). For example:
a <- 1:10
x <- sym('a')
plot(eval(x))
plot(rlang::eval_tidy(x))
!! and !!! are forcing operators used to force evaluation in tidyverse functions.
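A minimal sketch of the two operators inside dplyr verbs, using the built-in mtcars data set (an assumption made here for illustration; it is not part of the original answer):

```r
library(dplyr)
library(rlang)

col  <- "mpg"
cols <- c("mpg", "cyl")

# !! unquotes a single symbol built from a string
avg <- mtcars %>%
  summarise(avg = mean(!!sym(col)))

# !!! splices a list of symbols in as separate arguments
picked <- mtcars %>%
  select(!!!syms(cols))
```

Here sym() and syms() only build the symbols. The unquoting operators are what insert them into the expression that summarise() or select() captures.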
Non-standard eval in dplyr::mutate
We can convert the strings to symbols with sym() and then unquote them with !!
tmp %>%
mutate(!! pct.name := (!! sym(upper)/(!! sym(lower))))
# A tibble: 6 x 10
# qa11a state_abbv fipscode qa1a reg.pct precleared year shelby pres rej.reg
# <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl>
#1 1616 AL 0100100000 34727 0.0465 1.00 2010 0 F 0.0465
#2 7293 AL 0100300000 114952 0.0634 1.00 2010 0 F 0.0634
#3 1528 AL 0100500000 16450 0.0929 1.00 2010 0 F 0.0929
#4 1219 AL 0100700000 12239 0.0996 1.00 2010 0 F 0.0996
#5 2049 AL 0100900000 31874 0.0643 1.00 2010 0 F 0.0643
#6 286 AL 0101100000 7650 0.0374 1.00 2010 0 F 0.0374
When we apply enquo() to a string, it produces a quosure that wraps the string itself (note the quotes) rather than a column name
enquo(upper)
# <quosure>
# expr: ^"qa11a"
# env: empty
Instead of converting from a string, it could be easier to do
upper <- quo(qa11a)
lower <- quo(qa1a)
In the OP's code, calling enquo() (i.e. converting to a quosure) on a string object results in a quosure around the string, which is not what was intended
upper <- "qa11a"
lower <- "qa1a"
enquo(upper)
#<quosure>
# expr: ^"qa11a"
# env: empty
We can compare it to
upper <- quo(qa11a)
lower <- quo(qa1a)
upper
# <quosure>
# expr: ^qa11a
# env: global
and executing it
tmp %>%
mutate(!! pct.name := (!! upper)/ (!! lower))
# A tibble: 6 x 10
# qa11a state_abbv fipscode qa1a reg.pct precleared year shelby pres rej.reg
# <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl>
#1 1616 AL 0100100000 34727 0.0465 1.00 2010 0 F 0.0465
#2 7293 AL 0100300000 114952 0.0634 1.00 2010 0 F 0.0634
#3 1528 AL 0100500000 16450 0.0929 1.00 2010 0 F 0.0929
#4 1219 AL 0100700000 12239 0.0996 1.00 2010 0 F 0.0996
#5 2049 AL 0100900000 31874 0.0643 1.00 2010 0 F 0.0643
#6 286 AL 0101100000 7650 0.0374 1.00 2010 0 F 0.0374
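If the column names really do arrive as strings, one possible way to wrap the calculation in a function is sketched below. add_ratio() and tmp_demo are hypothetical names introduced for illustration; only the column names qa11a, qa1a and rej.reg are taken from the output above.

```r
library(dplyr)

# Hypothetical stand-in for `tmp`, with two of the columns shown above
tmp_demo <- tibble(
  qa11a = c(1616, 7293),
  qa1a  = c(34727, 114952)
)

# add_ratio() is a hypothetical helper, not part of dplyr.
# All three column names are passed as strings.
add_ratio <- function(df, new_name, upper, lower) {
  df %>%
    mutate(!!new_name := .data[[upper]] / .data[[lower]])
}

res <- add_ratio(tmp_demo, "rej.reg", "qa11a", "qa1a")
```

The .data pronoun avoids the sym() step for the inputs, while !!new_name := still sets the name of the new column from a string.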
Using pre-existing character vectors in quasiquotation of an expression with rlang
In dplyr 0.5.0 and earlier, the underlying framework for non-standard evaluation was lazyeval, which required special consideration for strings. Hadley Wickham then released a fundamentally new version of dplyr with a new underbelly called rlang, which provides a more consistent framework for non-standard evaluation. This was version 0.7.0; here's an explanation of why 0.6.0 was skipped: https://blog.rstudio.org/2017/06/13/dplyr-0-7-0/
The following now works without any special considerations:
library("tidyverse")
my_cols <- c("Petal.Width", "Petal.Length")
iris %>%
select(my_cols)
Note that the new rlang framework adds the ability to have a vector of naked symbols using quosures
my_quos <- quos(Petal.Width, Petal.Length)
iris %>%
select(!!!my_quos)
You can read more about programming with dplyr here: http://dplyr.tidyverse.org/articles/programming.html
Comparison in Shiny
library("shiny")
library("tidyverse")
library("DT")
library("rlang")
shinyApp(
ui = fluidPage(
selectInput(
"cols_to_show",
"Columns to show",
choices = colnames(iris),
multiple = TRUE
),
dataTableOutput("verb_table"),
dataTableOutput("tidyeval_table")
),
server = function(input, output) {
output$verb_table <- renderDataTable({
iris %>%
select_(.dots = input$cols_to_show)
})
output$tidyeval_table <- renderDataTable({
iris %>%
select(!!!syms(input$cols_to_show))
})
}
)
Non-Standard Evaluation and Character Vectors
For standard evaluation, you will want to use the functions with an underscore after their given name. In this case that is select_(). We will also need to use the .dots argument to insert your vector into the call.
d %>% select_(.dots = v)
See help(select) and vignette("nse") for more.
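The underscore verbs such as select_() have been deprecated since dplyr 0.7.0. A minimal sketch of the equivalent call in current versions, assuming dplyr 1.0.0 or later, where all_of() is available (the contents of d and v below are made up for illustration):

```r
library(dplyr)

# Hypothetical stand-ins for `d` and `v`
d <- tibble(a = 1:3, b = 4:6, c = 7:9)
v <- c("a", "c")

# all_of() tells select() that `v` is a character vector of column names
sel <- d %>% select(all_of(v))
```

all_of() raises an error if any name in v is missing from the data. any_of() is the lenient counterpart that silently skips missing names.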
Create R function using dplyr::filter problem
The reason it did not work in your original function is that col_1 was a string, but dplyr::filter() expects an "unquoted" variable name on the LHS. Thus, you need to first convert col_1 to a symbol using sym() and then unquote it inside filter() using !! (bang bang).
rlang has a really nice function, qq_show(), that shows what actually happens with quoting/unquoting (see the output below).
See also this similar question
library(rlang)
library(dplyr)
# creating a function that can take either string or symbol as input
mydiff <- function(filteron, df_1 = df1, df_2 = df2) {
col_1 <- paste0(quo_name(enquo(filteron)), "x")
col_2 <- paste0(quo_name(enquo(filteron)), "y")
my_df <- inner_join(df_1, df_2, by = "id", suffix = c("x", "y"))
cat('\nwithout sym and unquote\n')
qq_show(col_1 != col_2)
cat('\nwith sym and unquote\n')
qq_show(!!sym(col_1) != !!sym(col_2))
cat('\n')
my_df %>%
select(id, col_1, col_2) %>%
filter(!!sym(col_1) != !!sym(col_2))
}
### testing: filteron as a string
mydiff("a")
#>
#> without sym and unquote
#> col_1 != col_2
#>
#> with sym and unquote
#> ax != ay
#>
#> # A tibble: 1 x 3
#> id ax ay
#> <dbl> <chr> <chr>
#> 1 14 f k
### testing: filteron as a symbol
mydiff(a)
#>
#> without sym and unquote
#> col_1 != col_2
#>
#> with sym and unquote
#> ax != ay
#>
#> # A tibble: 1 x 3
#> id ax ay
#> <dbl> <chr> <chr>
#> 1 14 f k
Created on 2018-09-28 by the reprex package (v0.2.1.9000)
Using eval(parse()) construction within dplyr
Try select(df, !!paste0('Peter_', target))
Why is enquo + !! preferable to substitute + eval
I want to give an answer that is independent of dplyr, because there is a very clear advantage to using enquo() over substitute(). Both look in the calling environment of a function to identify the expression that was given to that function. The difference is that substitute() does it only once, while !!enquo() will correctly walk up the entire calling stack.
Consider a simple function that uses substitute():
f <- function( myExpr ) {
eval( substitute(myExpr), list(a=2, b=3) )
}
f(a+b) # 5
f(a*b) # 6
This functionality breaks when the call is nested inside another function:
g <- function( myExpr ) {
val <- f( substitute(myExpr) )
## Do some stuff
val
}
g(a+b)
# myExpr <-- OOPS
Now consider the same functions re-written using enquo():
library( rlang )
f2 <- function( myExpr ) {
eval_tidy( enquo(myExpr), list(a=2, b=3) )
}
g2 <- function( myExpr ) {
val <- f2( !!enquo(myExpr) )
val
}
g2( a+b ) # 5
g2( b/a ) # 1.5
And that is why enquo() + !! is preferable to substitute() + eval(). dplyr simply takes full advantage of this property to build a coherent set of NSE functions.
UPDATE: rlang 0.4.0 introduced a new operator {{ (pronounced "curly curly"), which is effectively a shorthand for !!enquo(). This allows us to simplify the definition of g2 to
g2 <- function( myExpr ) {
val <- f2( {{myExpr}} )
val
}
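The same operator works inside dplyr verbs. The following is a small sketch, assuming rlang 0.4.0 or later and dplyr 1.0.0 or later (for the .groups argument). mean_by() is a hypothetical helper and mtcars is used only as sample data.

```r
library(dplyr)

# mean_by() is a hypothetical helper: it forwards unquoted column
# names to group_by() and summarise() with {{ }}
mean_by <- function(df, group, var) {
  df %>%
    group_by({{ group }}) %>%
    summarise(avg = mean({{ var }}), .groups = "drop")
}

res <- mean_by(mtcars, cyl, mpg)
```

Each {{ arg }} does the work of an enquo() call followed by a !! at the point of use, so no intermediate variables are needed.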
Using Dplyr within a user-defined function to summarise data then plot it
First of all, inside dplyr functions you don't need to refer to variables by indexing the data frame, as in df[, timevar]. Use just the variable name. Besides that, df[timevar] returns a one-column data frame rather than the column itself, so it is not what functions such as mean() expect.
About the function, it's a problem of evaluation.
This structure below is working:
ConsistencyPlot <- function(df, var1, timevar, lossvar){
var1 <- enquo(var1)
timevar <- enquo(timevar)
lossvar <- enquo(lossvar)
df1 <- df %>%
group_by(!!timevar, !!var1) %>%
summarise(MeanLoss = mean(!!lossvar))
ggplot(df1, aes(x = !!var1, y = MeanLoss, color = !!timevar, group = !!timevar)) +
geom_line() +
geom_point()
}
Note that the arguments are captured with enquo() and then unquoted inside the dplyr and ggplot2 calls using !!. So, you can pass the column names without quoting them.
ConsistencyPlot(df, JudicialOrientation, Year, Loss)
I hope you find it useful.