Non-Standard Evaluation and Quasiquotation in Dplyr() Not Working as (Naively) Expected

So, I've realized that what I was struggling with in this question (and many other problems) is not really quasiquotation and/or non-standard evaluation, but rather converting character strings into object names. Here is my new solution:

letrs_top.df <- letrs_count.df %>%
top_n(5, get(count_colname))
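
An alternative to get() that stays inside dplyr's data mask is the .data pronoun, which looks the string up only among the data frame's columns. A minimal sketch, assuming a made-up letrs_count.df with a count column named n (both are stand-ins, not the original data):

```r
library(dplyr)

# Hypothetical stand-in for the original letrs_count.df
letrs_count.df <- tibble(
  letter = letters[1:8],
  n      = c(3, 9, 1, 7, 5, 8, 2, 6)
)
count_colname <- "n"  # column name held as a string

# .data[[...]] resolves the string to a column of the data frame
letrs_top.df <- letrs_count.df %>%
  top_n(5, .data[[count_colname]])
```

Unlike get(), .data[[count_colname]] cannot accidentally pick up an object of the same name from the calling environment.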

how to use non-standard evaluation in R

The symbol returned by sym() must be evaluated with eval() or rlang::eval_tidy() before it can be used in plot(). For example:

library(rlang)

a <- 1:10

x <- sym('a')

plot(eval(x))
plot(rlang::eval_tidy(x))

!! and !!! are forcing operators: they force evaluation of their operand (unquoting) inside tidyverse functions. !! unquotes a single object, while !!! splices a list of objects.
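
To see what the two operators do without running any data code, rlang::expr() can be used to build an expression and inspect the result of unquoting. A small sketch (the symbol names a and b are arbitrary):

```r
library(rlang)

x <- sym("a")                 # a single symbol
args <- syms(c("a", "b"))     # a list of symbols

one <- expr(mean(!!x))        # !! unquotes one object in place
many <- expr(sum(!!!args))    # !!! splices a list into separate arguments

print(one)
print(many)
```

The first expression is the call mean(a) and the second is sum(a, b); neither has been evaluated yet.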

Non-standard eval in dplyr::mutate

We can convert the string to symbol and then evaluate

tmp %>% 
mutate(!! pct.name := (!! sym(upper)/(!! sym(lower))))
# A tibble: 6 x 10
# qa11a state_abbv fipscode qa1a reg.pct precleared year shelby pres rej.reg
# <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl>
#1 1616 AL 0100100000 34727 0.0465 1.00 2010 0 F 0.0465
#2 7293 AL 0100300000 114952 0.0634 1.00 2010 0 F 0.0634
#3 1528 AL 0100500000 16450 0.0929 1.00 2010 0 F 0.0929
#4 1219 AL 0100700000 12239 0.0996 1.00 2010 0 F 0.0996
#5 2049 AL 0100900000 31874 0.0643 1.00 2010 0 F 0.0643
#6 286 AL 0101100000 7650 0.0374 1.00 2010 0 F 0.0374
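
When the column names are held as strings, the .data pronoun offers a way to skip the explicit sym() step. A minimal sketch, assuming a two-row stand-in for tmp (the values and the new column name "pct" are invented for illustration):

```r
library(dplyr)

# Hypothetical miniature version of tmp
tmp <- tibble(qa11a = c(10, 20), qa1a = c(100, 400))
upper <- "qa11a"
lower <- "qa1a"
pct.name <- "pct"   # name of the column to create, as a string

# .data[[...]] looks each string up among the columns of tmp
out <- tmp %>%
  mutate(!!pct.name := .data[[upper]] / .data[[lower]])
```

The left-hand side still needs !! and := because the new column's name is itself stored in a variable.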

When we apply enquo() to a string, it is converted to a quosure that still contains the quoted string:

enquo(upper)
# <quosure>
# expr: ^"qa11a"
# env: empty

Instead of converting from a string, it could be easier to do

upper <- quo(qa11a)
lower <- quo(qa1a)

In the OP's code, calling enquo() on a string object (i.e. converting it to a quosure) produces a quosure that wraps the string, which is not what is intended:

upper <- "qa11a"
lower <- "qa1a"
enquo(upper)
#<quosure>
# expr: ^"qa11a"
# env: empty

We can compare it to

upper <- quo(qa11a)
lower <- quo(qa1a)
upper
# <quosure>
# expr: ^qa11a
# env: global

and executing it

tmp %>% 
mutate(!! pct.name := (!! upper)/ (!! lower))
# A tibble: 6 x 10
# qa11a state_abbv fipscode qa1a reg.pct precleared year shelby pres rej.reg
# <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl>
#1 1616 AL 0100100000 34727 0.0465 1.00 2010 0 F 0.0465
#2 7293 AL 0100300000 114952 0.0634 1.00 2010 0 F 0.0634
#3 1528 AL 0100500000 16450 0.0929 1.00 2010 0 F 0.0929
#4 1219 AL 0100700000 12239 0.0996 1.00 2010 0 F 0.0996
#5 2049 AL 0100900000 31874 0.0643 1.00 2010 0 F 0.0643
#6 286 AL 0101100000 7650 0.0374 1.00 2010 0 F 0.0374

Using pre-existing character vectors in quasiquotation of an expression with rlang

In dplyr 0.5.0 and earlier, the underlying framework for non-standard evaluation was lazyeval, which required special handling of strings. Hadley Wickham then released a fundamentally new version of dplyr built on a new foundation called rlang, which provides a more consistent framework for non-standard evaluation. This was version 0.7.0; here is an explanation of why 0.6.0 was skipped: https://blog.rstudio.org/2017/06/13/dplyr-0-7-0/

The following now works without any special considerations:

library("tidyverse")
my_cols <- c("Petal.Width", "Petal.Length")
iris %>%
select(my_cols)
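
In more recent releases, passing a bare character vector to select() is treated as ambiguous (is my_cols a column or an outside variable?), and the tidyselect helper all_of() makes the intent explicit. A short sketch, assuming dplyr 1.0.0 or later:

```r
library(dplyr)

my_cols <- c("Petal.Width", "Petal.Length")

# all_of() says: these are column names stored in an external vector
selected <- iris %>%
  select(all_of(my_cols))
```

all_of() raises an error if any of the names is missing from the data, which tends to surface typos early.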

Note that the new rlang framework adds the ability to have a vector of naked symbols using quosures

my_quos <- quos(Petal.Width, Petal.Length)
iris %>%
select(!!!my_quos)

You can read more about programming with dplyr here - http://dplyr.tidyverse.org/articles/programming.html

Comparison in Shiny

library("shiny")
library("tidyverse")
library("DT")
library("rlang")

shinyApp(
  ui = fluidPage(
    selectInput(
      "cols_to_show",
      "Columns to show",
      choices = colnames(iris),
      multiple = TRUE
    ),
    dataTableOutput("verb_table"),
    dataTableOutput("tidyeval_table")
  ),
  server = function(input, output) {
    # Standard-evaluation verb: takes the column names as strings
    output$verb_table <- renderDataTable({
      iris %>%
        select_(.dots = input$cols_to_show)
    })

    # Tidy eval: turn the strings into symbols and splice them in
    output$tidyeval_table <- renderDataTable({
      iris %>%
        select(!!!syms(input$cols_to_show))
    })
  }
)

Non-Standard Evaluation and Character Vectors

For standard evaluation, use the variants of the dplyr verbs that have an underscore after their name; in this case that is select_(). You also need the .dots argument to pass your character vector into the call. (Note that these underscore verbs were deprecated in dplyr 0.7.0.)

d %>% select_(.dots = v)

See help(select) and vignette("nse") for more.

Create R function using dplyr::filter problem

The reason it did not work in your original function is that col_1 was a string, but dplyr::filter() expected an unquoted variable name on the LHS. Thus, you first need to convert col_1 to a symbol using sym() and then unquote it inside filter() using !! (bang bang).

rlang has a really nice function, qq_show(), that shows what actually happens with quoting/unquoting (see the output below).

See also this similar question

library(rlang)
library(dplyr)

# creating a function that can take either string or symbol as input
mydiff <- function(filteron, df_1 = df1, df_2 = df2) {

  col_1 <- paste0(quo_name(enquo(filteron)), "x")
  col_2 <- paste0(quo_name(enquo(filteron)), "y")

  my_df <- inner_join(df_1, df_2, by = "id", suffix = c("x", "y"))

  cat('\nwithout sym and unquote\n')
  qq_show(col_1 != col_2)

  cat('\nwith sym and unquote\n')
  qq_show(!!sym(col_1) != !!sym(col_2))
  cat('\n')

  my_df %>%
    select(id, col_1, col_2) %>%
    filter(!!sym(col_1) != !!sym(col_2))
}

### testing: filteron as a string
mydiff("a")
#>
#> without sym and unquote
#> col_1 != col_2
#>
#> with sym and unquote
#> ax != ay
#>
#> # A tibble: 1 x 3
#> id ax ay
#> <dbl> <chr> <chr>
#> 1 14 f k

### testing: filteron as a symbol
mydiff(a)
#>
#> without sym and unquote
#> col_1 != col_2
#>
#> with sym and unquote
#> ax != ay
#>
#> # A tibble: 1 x 3
#> id ax ay
#> <dbl> <chr> <chr>
#> 1 14 f k

Created on 2018-09-28 by the reprex package (v0.2.1.9000)
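
If the goal is simply to accept either a string or a bare name, rlang's ensym() is another option: it captures the argument as a symbol in both cases. A minimal sketch with a hypothetical helper, suffixed_names(), that only builds the two column names:

```r
library(rlang)

# Hypothetical helper: returns the suffixed column names for `filteron`
suffixed_names <- function(filteron) {
  base <- as_string(ensym(filteron))  # works for a and for "a"
  paste0(base, c("x", "y"))
}

from_symbol <- suffixed_names(a)
from_string <- suffixed_names("a")
```

Both calls yield the same pair of names, which could then be passed to sym() and unquoted inside filter() as in the function above.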

Using eval(parse()) construction within dplyr

Try select(df, !!paste0('Peter_', target))

Why is enquo + !! preferable to substitute + eval

I want to give an answer that is independent of dplyr, because there is a very clear advantage to using enquo over substitute. Both look in the calling environment of a function to identify the expression that was given to that function. The difference is that substitute() does it only once, while !!enquo() will correctly walk up the entire calling stack.

Consider a simple function that uses substitute():

f <- function( myExpr ) {
eval( substitute(myExpr), list(a=2, b=3) )
}

f(a+b) # 5
f(a*b) # 6

This functionality breaks when the call is nested inside another function:

g <- function( myExpr ) {
val <- f( substitute(myExpr) )
## Do some stuff
val
}

g(a+b)
# myExpr <-- OOPS

Now consider the same functions re-written using enquo():

library( rlang )

f2 <- function( myExpr ) {
eval_tidy( enquo(myExpr), list(a=2, b=3) )
}

g2 <- function( myExpr ) {
val <- f2( !!enquo(myExpr) )
val
}

g2( a+b ) # 5
g2( b/a ) # 1.5

And that is why enquo() + !! is preferable to substitute() + eval(). dplyr simply takes full advantage of this property to build a coherent set of NSE functions.

UPDATE: rlang 0.4.0 introduced a new operator {{ (pronounced "curly curly"), which is effectively shorthand for !!enquo(). This allows us to simplify the definition of g2 to

g2 <- function( myExpr ) {
val <- f2( {{myExpr}} )
val
}
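
The same operator works inside dplyr verbs, which is where it is most often seen. A minimal sketch, assuming rlang 0.4.0 or later and dplyr 1.0.0 or later (for the .groups argument); mean_by() is a hypothetical helper and mtcars is used only as sample data:

```r
library(dplyr)

# Hypothetical helper: group by one bare column name, average another
mean_by <- function(df, group, value) {
  df %>%
    group_by({{ group }}) %>%
    summarise(mean_value = mean({{ value }}), .groups = "drop")
}

res <- mean_by(mtcars, cyl, mpg)
```

The caller passes bare column names, and {{ }} forwards them to group_by() and summarise() without an explicit enquo() step.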

Using Dplyr within a user-defined function to summarise data then plot it

First of all, inside dplyr functions you don't need to refer to variables by indexing the data frame, as in df[, timevar]; just use the variable name. Besides that, df[timevar] returns a one-column data frame rather than a vector, so it is not what group_by() or mean() expects.

About the function, it's a problem of evaluation.

This structure below is working:

ConsistencyPlot <- function(df, var1, timevar, lossvar){
  var1 <- enquo(var1)
  timevar <- enquo(timevar)
  lossvar <- enquo(lossvar)

  df1 <- df %>%
    group_by(!!timevar, !!var1) %>%
    summarise(MeanLoss = mean(!!lossvar))

  ggplot(df1, aes(x = !!var1, y = MeanLoss, color = !!timevar, group = !!timevar)) +
    geom_line() +
    geom_point()
}

Note that the arguments are captured with enquo() and then unquoted inside the dplyr and ggplot2 calls using !!. So, you can pass the column names without quoting them.

ConsistencyPlot(df, JudicialOrientation, Year, Loss)

I hope you find it useful.


