How to Pass Strings Denoting Expressions to Dplyr 0.7 Verbs

It's important to note that, in this simple example, we have control of how the expressions are created. So the best way to pass the expressions is to construct and pass quosures directly using quos():

library(tidyverse)
library(rlang)

group_by_and_tally <- function(data, groups) {
  data %>%
    group_by(!!!groups) %>%  # !!! splices the list of quosures; UQS() is deprecated in current rlang
    tally()
}

my_groups <- quos(2 * cyl, am)
mtcars %>%
  group_by_and_tally(my_groups)
#> # A tibble: 6 x 3
#> # Groups: 2 * cyl [?]
#> `2 * cyl` am n
#> <dbl> <dbl> <int>
#> 1 8 0 3
#> 2 8 1 8
#> 3 12 0 4
#> 4 12 1 3
#> 5 16 0 12
#> 6 16 1 2

However, if we receive the expressions from an outside source in the form of strings, we can simply parse the expressions first, which converts them to quosures:

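my_groups <- c('2 * cyl', 'am')
# parse_quosure() was renamed parse_quo() in rlang 0.2.0 and now needs an explicit environment
my_groups <- my_groups %>% map(parse_quo, env = global_env())
mtcars %>%
  group_by_and_tally(my_groups)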
#> # A tibble: 6 x 3
#> # Groups: 2 * cyl [?]
#> `2 * cyl` am n
#> <dbl> <dbl> <int>
#> 1 8 0 3
#> 2 8 1 8
#> 3 12 0 4
#> 4 12 1 3
#> 5 16 0 12
#> 6 16 1 2

Again, we should only do this if we are getting expressions from an outside source that provides them as strings - otherwise we should make quosures directly in the R source code.
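As a rough sketch of the same idea with current rlang, the strings can also be parsed into plain expressions with parse_expr() and spliced in with !!!. This assumes the expressions refer only to columns of the data, so no quosure environment is needed. The names group_strings, group_exprs and result are illustrative and not part of the original answer.

```r
library(dplyr)
library(rlang)

# Expressions arriving from an outside source as strings (illustrative values)
group_strings <- c("2 * cyl", "am")

# Parse each string into an unevaluated expression
group_exprs <- lapply(group_strings, parse_expr)

# Splice the list of expressions into group_by() with !!!
result <- mtcars %>%
  group_by(!!!group_exprs) %>%
  tally()

result
```

If the expressions need to see variables from a particular environment, parse_quo() with an explicit env argument is the closer match to the quosure-based version above.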

How to parametrize function calls in dplyr 0.7?

dplyr 0.7 adds group_by_at(), a specialized variant of group_by() for grouping by several variables given as strings. Using this new member of the _at family is much easier:

# using the pre-release 0.6.0

cols <- c("am","gear")

mtcars %>%
  group_by_at(.vars = cols) %>%
  summarise(mean_cyl = mean(cyl))

# Source: local data frame [4 x 3]
# Groups: am [?]
#
# am gear mean_cyl
# <dbl> <dbl> <dbl>
# 1 0 3 7.466667
# 2 0 4 5.000000
# 3 1 4 4.500000
# 4 1 5 6.000000

The .vars argument accepts a character vector of column names, a numeric vector of column positions, or a list of columns generated by vars():

.vars

A list of columns generated by vars(), or a character vector of
column names, or a numeric vector of column positions.
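A small sketch of the three accepted forms, using mtcars as in the example above. The object names are illustrative; the positions 9 and 10 are simply where am and gear sit in mtcars.

```r
library(dplyr)

# Character vector of column names
by_name <- mtcars %>% group_by_at(c("am", "gear"))

# Numeric vector of column positions (am and gear are columns 9 and 10 of mtcars)
by_position <- mtcars %>% group_by_at(c(9, 10))

# List of columns generated by vars()
by_vars <- mtcars %>% group_by_at(vars(am, gear))

# All three should report the same grouping variables
group_vars(by_name)
group_vars(by_position)
group_vars(by_vars)
```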

How to use quos in a formula in R?

Try converting the strings to symbols with sym() and unquoting them with !!. I would also pass the data as an additional argument to the function.

library(dplyr)
library(rlang)

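# A is not shown in the original; this definition is inferred from the printed output
A <- data.frame(x = 1:5, y = 11:15)

f <- function(data, x, y, new) {
  data %>% mutate(!!new := !!sym(x) * !!sym(y))
}

A %>% f("x", "y", "new")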

# x y new
#1 1 11 11
#2 2 12 24
#3 3 13 39
#4 4 14 56
#5 5 15 75

identical(A %>% f("x", "y", "new"), A %>% mutate (new = x * y))
#[1] TRUE
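An alternative sketch that avoids sym() by indexing the .data pronoun with the column-name strings. The function name f2 is hypothetical, and the data frame A is assumed from the printed output above rather than taken from the original question.

```r
library(dplyr)

# Assumed input, reconstructed from the printed result above
A <- data.frame(x = 1:5, y = 11:15)

# .data[[x]] looks up the column whose name is stored in the string x
f2 <- function(data, x, y, new) {
  data %>% mutate(!!new := .data[[x]] * .data[[y]])
}

res <- f2(A, "x", "y", "new")
res
```

The .data form can be easier to read when every input is a string, since nothing has to be converted to a symbol first.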

Using unquoted strings with `dplyr` verbs: `select` and `arrange` working differently

You need to convert your string to a symbol first (using sym()), then unquote it inside arrange().

df %>% arrange(!!sym(v))

#> # A tibble: 3 x 2
#> x y
#> <dbl> <dbl>
#> 1 3 3
#> 2 2 6
#> 3 1 8

select() can take the string directly, but this is not recommended. As the message below says, the selection is ambiguous unless you wrap the variable in all_of():

df %>% select(v)

#> Note: Using an external vector in selections is ambiguous.
#> i Use `all_of(v)` instead of `v` to silence this message.
#> i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
#> # A tibble: 3 x 1
#> y
#> <dbl>
#> 1 8
#> 2 6
#> 3 3

Created on 2020-11-21 by the reprex package (v0.3.0)
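A minimal sketch of the unambiguous forms. The definitions of df and v are not shown in the original, so the values here are assumptions inferred from the printed output.

```r
library(dplyr)

# Assumed inputs, inferred from the output shown above
df <- tibble(x = c(1, 2, 3), y = c(8, 6, 3))
v <- "y"

# all_of() states explicitly that v holds column names, so no message is emitted
selected <- df %>% select(all_of(v))

# arrange() is not a selection verb, so index the .data pronoun (or use !!sym(v))
sorted <- df %>% arrange(.data[[v]])

selected
sorted
```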

Convert strings to call dplyr functions

You can use eval() with parse_expr():

library(dplyr)
library(rlang)

filter_and_compute <- function(df,
                               condition,
                               col_to_modify,
                               calculus) {
  df %>%
    filter(eval(parse_expr(condition))) %>%
    mutate(!!col_to_modify := eval(parse_expr(calculus))) %>%
    rbind(df %>% filter(!eval(parse_expr(condition))))
}

filter_and_compute(mtcars, "cyl == 4", "mpg", "2 * hp")

# mpg cyl disp hp drat wt qsec vs am gear carb
#1 186.0 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#2 124.0 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#3 190.0 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#4 132.0 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#5 104.0 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#6 130.0 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#7 194.0 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#8 132.0 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#9 182.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#10 226.0 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#11 218.0 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#12 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#....
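A variant sketch that parses each string once and unquotes the result with !! instead of calling eval() inside the verbs. The name filter_and_compute2 is hypothetical, and bind_rows() stands in for rbind(). Parsing strings into code is only reasonable when the strings come from a trusted source.

```r
library(dplyr)
library(rlang)

filter_and_compute2 <- function(df, condition, col_to_modify, calculus) {
  cond <- parse_expr(condition)  # e.g. cyl == 4
  calc <- parse_expr(calculus)   # e.g. 2 * hp

  bind_rows(
    # rows matching the condition get the recomputed column
    df %>% filter(!!cond) %>% mutate(!!col_to_modify := !!calc),
    # remaining rows are kept unchanged; the parentheses keep `!` apart from `!!`
    df %>% filter(!(!!cond))
  )
}

res <- filter_and_compute2(mtcars, "cyl == 4", "mpg", "2 * hp")
head(res)
```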

dplyr 0.7 equivalent for deprecated mutate_

To expand a little bit on MrFlick's example, let's assume you have a number of instructions stored as strings, as well as the corresponding names that you want to assign to the resulting computations:

ln <- list( "test2", "test3" )
lf <- list( "substr(test, 1, 5)", "substr(test, 5, 5)" )

Match up names to their instructions and convert everything to quosures:

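# parse_quosure() was renamed parse_quo() in rlang 0.2.0 and now needs an explicit environment
ll <- setNames( lf, ln ) %>% lapply( rlang::parse_quo, env = rlang::global_env() )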

As per aosmith's suggestion, the entire list can now be passed to mutate, using the special !!! operator:

tibble( test = "test@test" ) %>% mutate( !!! ll )
# # A tibble: 1 x 3
# test test2 test3
# <chr> <chr> <chr>
# 1 test@test test@ @

Pass column names as strings to group_by and summarize

For this you can now use the _at versions of the verbs:

df %>%
  group_by_at(cols2group) %>%
  summarize_at(.vars = col2summarize, .funs = min)

Edit (2021-06-09):

Please see Ronak Shah's answer, which uses across():

summarize(across(all_of(col2summarize), min))

This is now the preferred option.
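A sketch comparing the two styles on mtcars, assuming dplyr 1.0.0 or later. The original df is not shown, so the values given to cols2group and col2summarize here are illustrative.

```r
library(dplyr)

# Illustrative inputs; the original data frame is not shown
cols2group <- c("am", "gear")
col2summarize <- "mpg"

# The _at style from the answer above
old_style <- mtcars %>%
  group_by_at(cols2group) %>%
  summarize_at(.vars = col2summarize, .funs = min)

# The across() style
new_style <- mtcars %>%
  group_by(across(all_of(cols2group))) %>%
  summarize(across(all_of(col2summarize), min), .groups = "drop_last")

# Both should produce the same summary table
all.equal(as.data.frame(old_style), as.data.frame(new_style))
```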

How does dplyr pass in non-string parameters

Your function is always going to deliver 0 because the $ infix function uses non-standard evaluation of its right-hand side argument. (As you point out, non-standard evaluation is a favorite mechanism in @hadley's functions. For me it's a barrier, but for many people it seems to be a welcome strategy.) If you write your function in that manner (using $) you will generally fail to get what you want:

 mysum(dat, blue, red)
[1] 0 # Wrong answer
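A small illustration of why the result is 0. The question's original function is not shown here, so mysum_dollar is a hypothetical reconstruction and the data values are made up.

```r
# Made-up data with the column names used in the question
dat <- data.frame(blue = c(1, 2, 3), red = c(4, 5, 6))

# Hypothetical version of the failing function: `$` does not evaluate x,
# it looks for a column literally named "x"
mysum_dollar <- function(dat, x, y) {
  sum(dat$x) + sum(dat$y)
}

is.null(dat$x)                 # TRUE: there is no column called "x"
mysum_dollar(dat, blue, red)   # 0, because sum(NULL) is 0
```

The arguments blue and red are never evaluated inside the function, which is why the call does not fail with an "object not found" error.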

You said earlier that: "However, mycol1, mycol2, and mycol3 are not strings but just text in R." I guess you are trying to say that mycol is not enclosed in quotes and so is not a character literal. In R such "text" (a sequence of unquoted characters) is called a 'symbol' or a 'name'. (Up to this point we are not talking about anything to do with dplyr.) If you want to write a function that will deliver that sum, you would do so like this (avoiding the $ operation):

mysum <- function(dat, x, y) {
  sum(dat[[x]]) + sum(dat[[y]])
}

mysum(dat, 'blue', 'red')
[1] 19.16727

If you want to retrieve the argument name for a matched parameter you need to use the deparse( substitute(.))-maneuver:

dat <- data.frame(blue = rnorm(10), red = rnorm(10))

mysum2 <- function(dfrm, arg1, arg2) {
  # capture the unevaluated argument names as strings
  a1 <- deparse(substitute(arg1))
  a2 <- deparse(substitute(arg2))
  sum(dfrm[[a1]]) + sum(dfrm[[a2]])
}
mysum2(dat, blue, red)
#[1] -0.5754979
mysum(dat, "blue", "red")
#[1] -0.5754979

If you want to see how @hadley does it, then just type:

> dplyr::select
function (.data, ...)
{
select_(.data, .dots = lazyeval::lazy_dots(...))
}
<environment: namespace:dplyr>

... which doesn't really deliver the answer, does it? So we will need to try this:

help(package = "lazyeval")

... which has an accompanying vignette named "lazyeval::lazyeval" --> "Lazyeval: a new approach to NSE". Hadley argues that his lazyeval functions are superior to the traditional substitute() because they carry forward their environments, and I suppose I do agree.
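For comparison, dplyr has since moved from lazyeval to rlang (from version 0.7), and unquoted column names can be forwarded with the {{ }} operator. This is a sketch assuming rlang 0.4.0 or later; mysum3 is a hypothetical name and the data values are made up.

```r
library(dplyr)

# Made-up data with the column names used above
dat <- data.frame(blue = c(1, 2, 3), red = c(4, 5, 6))

# {{ }} forwards the unquoted column names into summarise()
mysum3 <- function(dfrm, arg1, arg2) {
  dfrm %>%
    summarise(total = sum({{ arg1 }}) + sum({{ arg2 }})) %>%
    pull(total)
}

mysum3(dat, blue, red)  # 21
```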
