How to Pass a Named Vector to dplyr::select Using Quosures

How to pass a named vector to dplyr::select using quosures?

quo (or quos for multiple) is for unquoted variable names, not strings. To convert strings to quosures use sym (or syms), and use !! or !!! as appropriate to unquote or unquote-splice:

library(dplyr)

my_data <- tibble(foo = 0:10, bar = 10:20, meh = 20:30)  # data_frame() is deprecated in favour of tibble()
my_newnames <- c("newbar" = "bar", "newfoo" = "foo")

For strings,

move_stuff_se <- function(df, ...){
  df %>% select(!!!rlang::syms(...))
}

move_stuff_se(my_data, my_newnames)
#> # A tibble: 11 x 2
#> newbar newfoo
#> <int> <int>
#> 1 10 0
#> 2 11 1
#> 3 12 2
#> 4 13 3
#> 5 14 4
#> 6 15 5
#> 7 16 6
#> 8 17 7
#> 9 18 8
#> 10 19 9
#> 11 20 10

For unquoted variable names,

move_stuff_nse <- function(df, ...){
  df %>% select(!!!quos(...))
}

move_stuff_nse(my_data, newbar = bar, newfoo = foo)
#> # A tibble: 11 x 2
#> newbar newfoo
#> <int> <int>
#> 1 10 0
#> 2 11 1
#> 3 12 2
#> 4 13 3
#> 5 14 4
#> 6 15 5
#> 7 16 6
#> 8 17 7
#> 9 18 8
#> 10 19 9
#> 11 20 10
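
With more recent versions of dplyr the same renaming select can probably be written without any explicit unquoting, by wrapping the named character vector in all_of(). The sketch below assumes dplyr 1.0 or later; the function name move_stuff_all_of is made up for illustration and is not part of the original answer.

```r
library(dplyr)

my_data <- tibble(foo = 0:10, bar = 10:20, meh = 20:30)
my_newnames <- c(newbar = "bar", newfoo = "foo")

# Hypothetical helper: all_of() treats `cols` strictly as column names,
# and the names of the vector become the new column names.
move_stuff_all_of <- function(df, cols) {
  select(df, all_of(cols))
}

result <- move_stuff_all_of(my_data, my_newnames)
names(result)
```

Unlike syms(), all_of() raises an error if any of the requested columns is missing, which is usually what you want inside a function.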

Pass a vector of variable names to arrange() in dplyr

Hadley hasn't made this obvious in the help file--only in his NSE vignette. The versions of the functions followed by underscores use standard evaluation, so you pass them vectors of strings and the like.

If I understand your problem correctly, you can just replace arrange() with arrange_() and it will work.

Specifically, pass the vector of strings as the .dots argument when you do it.

> df %>% arrange_(.dots=c("var1","var3"))
var1 var2 var3 var4
1 1 i 5 i
2 1 x 7 w
3 1 h 8 e
4 2 b 5 f
5 2 t 5 b
6 2 w 7 h
7 3 s 6 d
8 3 f 8 e
9 4 c 5 y
10 4 o 8 c

========== Update March 2018 ==============

Using the standard evaluation versions in dplyr as I have shown here is now considered deprecated. You can read Hadley's programming vignette for the new way. Basically you will use !! to unquote one variable or !!! to unquote a vector of variables inside of arrange().

When you pass those columns, if they are bare, quote them using quo() for one variable or quos() for a vector. Don't use quotation marks. See the answer by Akrun.

If your columns are already strings, then make them names using rlang::sym() for a single column or rlang::syms() for a vector. See the answer by Christos. You can also use as.name() for a single column. Unfortunately as of this writing, the information on how to use rlang::sym() has not yet made it into the vignette I link to above (eventually it will be in the section on "variadic quasiquotation" according to his draft).
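
As a minimal sketch of the string case described above (the data frame below is invented for illustration and is not the asker's df), a character vector can be turned into symbols and spliced into arrange(). For a single column name, the .data pronoun is an alternative that avoids sym() altogether.

```r
library(dplyr)

# Made-up example data; the column names mirror the question
df <- tibble(
  var1 = c(2L, 1L, 1L),
  var2 = c("a", "b", "c"),
  var3 = c(5L, 8L, 7L)
)

# Several columns held as strings: convert to symbols, then splice with !!!
sort_cols <- c("var1", "var3")
sorted_many <- arrange(df, !!!rlang::syms(sort_cols))

# One column held as a string: index the .data pronoun
one_col <- "var3"
sorted_one <- arrange(df, .data[[one_col]])
```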

Pass a list of vectors of arguments as quosures to a function with purrr and rlang

The thing that makes this difficult is the use of c() here. We really need some sort of rlang object to hold your parameters. Here's an altered function to generate your list

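library(rlang)

q_list <- function(...) {
  # Capture each argument as an unevaluated expression
  q <- enexprs(...)
  # Evaluate in an environment where c() is rebound to exprs()
  transenv <- new_environment(list(c = exprs))
  purrr::map(q, function(x) {
    eval_tidy(x, env = transenv)
  })
}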

This takes your expressions and evaluates them, treating c() like exprs(). Then you can inject those values into your function call

my_q_list <- q_list(
  c(cyl, sort = TRUE),
  c(cyl, gear, sort = TRUE)
)

purrr::map(my_q_list, ~eval_tidy(quo(count(mtcars, !!!.x))))

This would have been easier if you just make the expressions in a list without using c()

my_q_list <- list(
  exprs(cyl, sort = TRUE),
  exprs(cyl, gear, sort = TRUE)
)
purrr::map(my_q_list, ~eval_tidy(quo(count(mtcars, !!!.x))))
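
To see what the splice actually builds, it can help to construct the call with expr() before evaluating it. This is only an illustrative sketch: the names arg_sets, first_call and results are made up, and base lapply() stands in for purrr::map().

```r
library(dplyr)
library(rlang)

arg_sets <- list(
  exprs(cyl, sort = TRUE),
  exprs(cyl, gear, sort = TRUE)
)

# Build the call without running it, to inspect how !!! spreads the arguments
first_call <- expr(count(mtcars, !!!arg_sets[[1]]))
deparse(first_call)

# Evaluate one count() call per argument set
results <- lapply(arg_sets, function(args) {
  eval_tidy(quo(count(mtcars, !!!args)))
})
```

Named elements such as sort = TRUE are spliced as named arguments, and unnamed symbols become positional arguments.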

Use vector of columns in custom dplyr function

You don't necessarily need the function, as you can just mutate across the columns and get sums for each category.

library(tidyverse)

dat %>%
  group_by(category) %>%
  mutate(across(ends_with("take"), .fns = list(count = ~sum(. == "yes"))))

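Or, if you have a long list of columns, store their names in a character vector and pass it to across() wrapped in all_of() (passing the bare vector is deprecated in recent versions of tidyselect):

vars <- c("intake", "outtake", "pretake")

dat %>%
  group_by(category) %>%
  mutate(across(all_of(vars), .fns = list(count = ~sum(. == "yes"))))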

Output

  category intake outtake pretake intake_count outtake_count pretake_count
<chr> <fct> <fct> <fct> <int> <int> <int>
1 a no yes no 0 2 0
2 b no yes yes 0 1 2
3 c no yes no 1 1 0
4 d no yes yes 1 1 2
5 e no yes no 1 1 0
6 f no yes yes 1 1 2
7 g no yes no 1 1 0
8 h no yes yes 1 1 2
9 i no yes no 1 1 0
10 j no yes yes 1 1 2
11 a no yes no 0 2 0
12 b no no yes 0 1 2
13 c yes no no 1 1 0
14 d yes no yes 1 1 2
15 e yes no no 1 1 0
16 f yes no yes 1 1 2
17 g yes no no 1 1 0
18 h yes no yes 1 1 2
19 i yes no no 1 1 0
20 j yes no yes 1 1 2
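
If one row per category is enough, the same across() pattern should also work inside summarise(). The sketch below uses a small invented data set rather than the asker's dat, and assumes dplyr 1.0 or later.

```r
library(dplyr)

# Made-up data with the same shape as the question's columns
dat <- tibble(
  category = c("a", "b", "a", "b"),
  intake   = c("no", "yes", "yes", "yes"),
  outtake  = c("yes", "no", "no", "no")
)

vars <- c("intake", "outtake")

# One row per category; .names controls the output column names
counts <- dat %>%
  group_by(category) %>%
  summarise(across(all_of(vars), ~ sum(.x == "yes"), .names = "{.col}_count"))
```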

Correct usage of dplyr::select in dplyr 0.7.0+, selecting columns using character vector

There is an example with dplyr::select in https://cran.r-project.org/web/packages/rlang/vignettes/tidy-evaluation.html that uses:

dplyr::select(df, !!cols_to_select)

Why? Let's explore the options you mention:

Option 1

dplyr::select(df, cols_to_select)

As you say this fails if cols_to_select happens to be the name of a column in df, so this is wrong.

Option 4

cols_to_select_syms <- rlang::syms(c("b", "d"))  
dplyr::select(df, !!!cols_to_select_syms)

This looks more convoluted than the other solutions.

Options 2 and 3

dplyr::select(df, !!cols_to_select)
dplyr::select(df, !!!cols_to_select)

These two solutions provide the same results in this case. You can see the output of !!cols_to_select and !!!cols_to_select by doing:

dput(rlang::`!!`(cols_to_select)) # c("b", "d")
dput(rlang::`!!!`(cols_to_select)) # pairlist("b", "d")

The !! or UQ() operator evaluates its argument immediately in its context, and that is what you want.

The !!! or UQS() operator is used to pass multiple arguments at once to a function.

For character column names like in your example, it does not matter whether you give them as a single vector of length 2 (using !!) or as a list of two vectors of length one (using !!!). For more complex use cases you will need to pass multiple arguments as a list, using !!!:

a <- quos(contains("c"), dplyr::starts_with("b"))
dplyr::select(df, !!a) # does not work
dplyr::select(df, !!!a) # does work
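
In later versions of dplyr and tidyselect (an assumption that goes beyond the 0.7.0 scope of the question), all_of() is another way to resolve the ambiguity from Option 1. The data frame below is invented so that it deliberately contains a column called cols_to_select.

```r
library(dplyr)

# Made-up data with a column whose name clashes with the variable
df <- tibble(a = 1:2, b = 3:4, cols_to_select = 5:6, d = 7:8)
cols_to_select <- c("b", "d")

# Option 1: the name resolves to the column, not the external vector
ambiguous <- select(df, cols_to_select)

# all_of() always looks up the external character vector
picked <- select(df, all_of(cols_to_select))

names(ambiguous)
names(picked)
```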

How to pass formulas or quosures in dplyr verbs R as arguments

To solve the quosures issue, I had to update with install_github instead of update.packages.

This didn't solve the issue with formulas based on lazyeval::interp(), though.

For this to work, I figured out how to make the equivalent with quosures as well. Instead of

interp(~!is.na(value_), value_ = as.name("value"))

I did

quo(!is.na( !!sym("value") ))

And now all works as it should.

Cheers.
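
For reference, here is a small sketch of how such a quosure might be used in filter(). The data frame and the variable names col, cond, kept and kept_pronoun are made up; the .data pronoun variant is an alternative that needs no quosure at all.

```r
library(dplyr)
library(rlang)

# Made-up data with a missing value
df <- tibble(id = 1:3, value = c(1, NA, 3))
col <- "value"

# Build the condition once, then unquote it inside filter()
cond <- quo(!is.na(!!sym(col)))
kept <- filter(df, !!cond)

# Alternative: index the .data pronoun with the string
kept_pronoun <- filter(df, !is.na(.data[[col]]))
```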

Dplyr standard evaluation using a vector of multiple strings with mutate function

There are several keys to solving this question:

  • Accessing the strings within a character vector and using these with dplyr
  • The formatting of arguments provided to the function used within mutate, here anyNA

The goal here is to replicate this call, but using the named variable two_names instead of manually typing out c(jack,jill).

stackdf %>% rowwise %>% mutate(test = anyNA(c(jack,jill)))

# A tibble: 10 x 4
jack jill jane test
<dbl> <dbl> <dbl> <lgl>
1 1 1 1 FALSE
2 NA 2 2 TRUE
3 2 NA 3 TRUE
4 NA 3 4 TRUE
5 3 4 5 FALSE
6 NA NA 6 TRUE
7 4 5 NA FALSE
8 NA 6 NA TRUE
9 5 NA NA TRUE
10 NA 7 NA TRUE

1. Using dynamic variables with dplyr

  1. Using quo/quos: Does not accept strings as input. The solution using this method would be:

    two_names2 <- quos(c(jack, jill))
    stackdf %>% rowwise %>% mutate(test = anyNA(!!! two_names2))

    Note that quo takes a single argument, and thus is unquoted using !!, and for multiple arguments you can use quos and !!! respectively. This is not desirable because I do not use two_names and instead have to type out the columns I wish to use.

  2. Using as.name or rlang::sym/rlang::syms: as.name and sym take only a single input, however syms will take multiple and return a list of symbolic objects as output.

    > two_names
    [1] "jack" "jill"
    > as.name(two_names)
    jack
    > syms(two_names)
    [[1]]
    jack

    [[2]]
    jill

    Note that as.name ignores everything after the first element. However, syms appears to work appropriately here, so now we need to use this within the mutate call.

2. Using dynamic variables within mutate using anyNA or other variables

  1. Using syms and anyNA directly does not actually produce the correct result.

    > stackdf %>% rowwise %>% mutate(test = anyNA(!!! syms(two_names)))
    jack jill jane test
    <dbl> <dbl> <dbl> <lgl>
    1 1 1 1 FALSE
    2 NA 2 2 TRUE
    3 2 NA 3 FALSE
    4 NA 3 4 TRUE
    5 3 4 5 FALSE
    6 NA NA 6 TRUE
    7 4 5 NA FALSE
    8 NA 6 NA TRUE
    9 5 NA NA FALSE
    10 NA 7 NA TRUE

    Inspection of the test shows that this is only taking into account the first element, and ignoring the second element. However, if I use a different function, eg sum or paste0, it is clear that both elements are being used:

    > stackdf %>% rowwise %>% mutate(test = sum(!!! syms(two_names), 
    na.rm = TRUE))
    jack jill jane test
    <dbl> <dbl> <dbl> <dbl>
    1 1 1 1 2
    2 NA 2 2 2
    3 2 NA 3 2
    4 NA 3 4 3
    5 3 4 5 7
    6 NA NA 6 0
    7 4 5 NA 9
    8 NA 6 NA 6
    9 5 NA NA 5
    10 NA 7 NA 7

    The reason for this becomes clear when you look at the arguments for anyNA vs sum.

    function (x, recursive = FALSE) .Primitive("anyNA")

    function (..., na.rm = FALSE) .Primitive("sum")

    anyNA expects a single object x, whereas sum can take a variable list of objects (...).

  2. Simply supplying c() fixes this problem (see answer from alistaire).

    > stackdf %>% rowwise %>% mutate(test = anyNA(c(!!! syms(two_names))))
    jack jill jane test
    <dbl> <dbl> <dbl> <lgl>
    1 1 1 1 FALSE
    2 NA 2 2 TRUE
    3 2 NA 3 TRUE
    4 NA 3 4 TRUE
    5 3 4 5 FALSE
    6 NA NA 6 TRUE
    7 4 5 NA FALSE
    8 NA 6 NA TRUE
    9 5 NA NA TRUE
    10 NA 7 NA TRUE
  3. Alternately... for educational purposes, one could use a combination of sapply, any, and anyNA to produce the correct result. Here we use list so that the results are provided as a single list object.

    # this produces an error because the elements of !!!
    # are being passed to the arguments of sapply (X =, FUN = )
    > stackdf %>% rowwise %>%
    mutate(test = any(sapply(!!! syms(two_names), anyNA)))
    Error in mutate_impl(.data, dots) :
    Evaluation error: object 'jill' of mode 'function' was not found.

    Supplying list fixes this problem because it binds all the results into a single object.

    # the below table is the familiar incorrect result that uses only the `jack`
    > stackdf %>% rowwise %>%
    mutate(test = any(sapply(X=as.list(!!! syms(two_names)),
    FUN=anyNA)))

    jack jill jane test
    <dbl> <dbl> <dbl> <lgl>
    1 1 1 1 FALSE
    2 NA 2 2 TRUE
    3 2 NA 3 FALSE
    4 NA 3 4 TRUE
    5 3 4 5 FALSE
    6 NA NA 6 TRUE
    7 4 5 NA FALSE
    8 NA 6 NA TRUE
    9 5 NA NA FALSE
    10 NA 7 NA TRUE

    # this produces the correct answer
    > stackdf %>% rowwise %>%
    mutate(test = any(X = sapply(list(!!! syms(two_names)),
    FUN = anyNA)))

    jack jill jane test
    <dbl> <dbl> <dbl> <lgl>
    1 1 1 1 FALSE
    2 NA 2 2 TRUE
    3 2 NA 3 TRUE
    4 NA 3 4 TRUE
    5 3 4 5 FALSE
    6 NA NA 6 TRUE
    7 4 5 NA FALSE
    8 NA 6 NA TRUE
    9 5 NA NA TRUE
    10 NA 7 NA TRUE

    Why these two perform differently becomes clear when the behavior of as.list and list is compared:

    > as.list(two_names)
    [[1]]
    [1] "jack"

    [[2]]
    [1] "jill"

    > list(two_names)
    [[1]]
    [1] "jack" "jill"
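
The steps above can be condensed into a short sketch. stackdf is reconstructed here from the printed tables, so treat it as an assumption. expr() shows what each splice expands to, and c_across() with all_of() is an alternative that assumes dplyr 1.0 or later.

```r
library(dplyr)
library(rlang)

# Reconstructed from the printed output above
stackdf <- tibble(
  jack = c(1, NA, 2, NA, 3, NA, 4, NA, 5, NA),
  jill = c(1, 2, NA, 3, 4, NA, 5, 6, NA, 7),
  jane = c(1, 2, 3, 4, 5, 6, NA, NA, NA, NA)
)
two_names <- c("jack", "jill")

# Inspect the spliced calls: the first passes jill as `recursive`
bad_call  <- expr(anyNA(!!!syms(two_names)))
good_call <- expr(anyNA(c(!!!syms(two_names))))
deparse(bad_call)
deparse(good_call)

# Wrapping the spliced symbols in c() gives anyNA() a single object
with_syms <- stackdf %>%
  rowwise() %>%
  mutate(test = anyNA(c(!!!syms(two_names)))) %>%
  ungroup()

# Alternative for row-wise data frames: c_across() takes the strings directly
with_c_across <- stackdf %>%
  rowwise() %>%
  mutate(test = anyNA(c_across(all_of(two_names)))) %>%
  ungroup()
```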

Tidyeval: pass list of columns as quosure to select()

This is a bit tricky because of the mix of semantics involved in this problem. pmap() takes a list and passes each element as its own argument to a function (it's kind of equivalent to !!! in that sense). Your quoting function thus needs to quote its arguments and somehow pass a list of columns to pmap().

Our quoting function can go one of two ways. Either quote (i.e., delay) the list creation, or create an actual list of quoted expressions right away:

quoting_fn1 <- function(...) {
  exprs <- enquos(...)

  # For illustration purposes, return the quoted inputs instead of
  # doing something with them. Normally you'd call `mutate()` here:
  exprs
}

quoting_fn2 <- function(...) {
  expr <- quo(list(!!!enquos(...)))

  expr
}

Since our first variant does nothing but return a list of quoted inputs, it's actually equivalent to quos():

quoting_fn1(a, b)
#> <list_of<quosure>>
#>
#> [[1]]
#> <quosure>
#> expr: ^a
#> env: global
#>
#> [[2]]
#> <quosure>
#> expr: ^b
#> env: global

The second version returns a quoted expression that instructs R to create a list with quoted inputs:

quoting_fn2(a, b)
#> <quosure>
#> expr: ^list(^a, ^b)
#> env: 0x7fdb69d9bd20

There is a subtle but important difference between the two. The first version creates an actual list object:

exprs <- quoting_fn1(a, b)
typeof(exprs)
#> [1] "list"

On the other hand, the second version does not return a list, it returns an expression for creating a list:

expr <- quoting_fn2(a, b)
typeof(expr)
#> [1] "language"

Let's find out which version is more appropriate for interfacing with pmap(). But first we'll give a name to the pmapped function to make the code clearer and easier to experiment with:

myfunction <- function(..., word) {
  args <- list(...)
  # just to be clear this isn't what I actually want to do inside pmap
  args[[1]] + args[[2]]
}

Understanding how tidy eval works is hard in part because we usually don't get to observe the unquoting step. We'll use rlang::qq_show() to reveal the result of unquoting expr (the delayed list) and exprs (the actual list) with !!:

rlang::qq_show(
  mutate(df, outcome = pmap_int(!!expr, myfunction))
)
#> mutate(df, outcome = pmap_int(^list(^a, ^b), myfunction))

rlang::qq_show(
  mutate(df, outcome = pmap_int(!!exprs, myfunction))
)
#> mutate(df, outcome = pmap_int(<S3: quosures>, myfunction))

When we unquote the delayed list, mutate() calls pmap_int() with list(a, b), evaluated in the data frame, which is exactly what we need:

mutate(df, outcome = pmap_int(!!expr, myfunction))
#> # A tibble: 3 x 3
#> a b outcome
#> <int> <int> <int>
#> 1 1 101 102
#> 2 2 102 104
#> 3 3 103 106

On the other hand, if we unquote an actual list of quoted expressions, we get an error:

mutate(df, outcome = pmap_int(!!exprs, myfunction))
#> Error in mutate_impl(.data, dots) :
#> Evaluation error: Element 1 is not a vector (language).

That's because the quoted expressions inside the list are not evaluated in the data frame. In fact, they are not evaluated at all. pmap() gets the quoted expressions as is, which it doesn't understand. Recall what qq_show() has shown us:

#> mutate(df, outcome = pmap_int(<S3: quosures>, myfunction))

Anything inside angle brackets is passed as is. This is a sign that we should somehow have used !!! instead, to inline each element of the list of quosures in the surrounding expression. Let's try it:

rlang::qq_show(
  mutate(df, outcome = pmap_int(!!!exprs, myfunction))
)
#> mutate(df, outcome = pmap_int(^a, ^b, myfunction))

Hmm... Doesn't look right. We're supposed to pass a list to pmap_int(), and here it gets each quoted input as separate argument. Indeed we get a type error:

mutate(df, outcome = pmap_int(!!!exprs, myfunction))
#> Error in mutate_impl(.data, dots) :
#> Evaluation error: `.x` is not a list (integer).

That's easy to fix, just splice into a call to list():

rlang::qq_show(
  mutate(df, outcome = pmap_int(list(!!!exprs), myfunction))
)
#> mutate(df, outcome = pmap_int(list(^a, ^b), myfunction))

And voilà!

mutate(df, outcome = pmap_int(list(!!!exprs), myfunction))
#> # A tibble: 3 x 3
#> a b outcome
#> <int> <int> <int>
#> 1 1 101 102
#> 2 2 102 104
#> 3 3 103 106
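
Putting the pieces together, a wrapper along these lines should work. This is a sketch only: df is reconstructed from the printed output, and the function name add_cols is hypothetical.

```r
library(dplyr)
library(purrr)
library(rlang)

# Reconstructed from the printed tibble above
df <- tibble(a = 1:3, b = 101:103)

# Hypothetical wrapper: quote the columns, then splice them into list()
# so that pmap_int() receives an actual list of column vectors
add_cols <- function(data, ...) {
  cols <- enquos(...)
  mutate(
    data,
    outcome = pmap_int(list(!!!cols), function(...) {
      args <- list(...)
      args[[1]] + args[[2]]
    })
  )
}

out <- add_cols(df, a, b)
```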

Using pre-existing character vectors in quasiquotation of an expression with rlang

In dplyr 0.5.0 and earlier, the underlying framework for non-standard evaluation was lazyeval, which required special consideration for strings. Hadley Wickham then released a fundamentally new version of dplyr built on rlang, which provides a more consistent framework for non-standard evaluation. This was version 0.7.0; here's an explanation of why 0.6.0 was skipped: https://blog.rstudio.org/2017/06/13/dplyr-0-7-0/

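Selecting columns from a character vector no longer needs lazyeval-style workarounds. Wrap the vector in all_of() so that it is unambiguously treated as a set of column names rather than as a column called my_cols (passing the bare vector still works but is deprecated in current tidyselect):

library("tidyverse")
my_cols <- c("Petal.Width", "Petal.Length")
iris %>%
  select(all_of(my_cols))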

Note that the new rlang framework adds the ability to have a vector of naked symbols using quosures

my_quos <- quos(Petal.Width, Petal.Length)
iris %>%
  select(!!!my_quos)

You can read more about programming with dplyr here - http://dplyr.tidyverse.org/articles/programming.html

Comparison in Shiny

library("shiny")
library("tidyverse")
library("DT")
library("rlang")
shinyApp(
  ui = fluidPage(
    selectInput(
      "cols_to_show",
      "Columns to show",
      choices = colnames(iris),
      multiple = TRUE
    ),
    dataTableOutput("verb_table"),
    dataTableOutput("tidyeval_table")
  ),
  server = function(input, output) {
    # Deprecated standard-evaluation verb
    output$verb_table <- renderDataTable({
      iris %>%
        select_(.dots = input$cols_to_show)
    })

    # Tidy-eval equivalent
    output$tidyeval_table <- renderDataTable({
      iris %>%
        select(!!!syms(input$cols_to_show))
    })
  }
)

