How to pass strings denoting expressions to dplyr 0.7 verbs?
It's important to note that, in this simple example, we have control of how the expressions are created. So the best way to pass the expressions is to construct and pass quosures directly using quos():
library(tidyverse)
library(rlang)
group_by_and_tally <- function(data, groups) {
data %>%
group_by(!!!groups) %>%
tally()
}
my_groups <- quos(2 * cyl, am)
mtcars %>%
group_by_and_tally(my_groups)
#> # A tibble: 6 x 3
#> # Groups: 2 * cyl [?]
#> `2 * cyl` am n
#> <dbl> <dbl> <int>
#> 1 8 0 3
#> 2 8 1 8
#> 3 12 0 4
#> 4 12 1 3
#> 5 16 0 12
#> 6 16 1 2
However, if we receive the expressions from an outside source in the form of strings, we can simply parse the expressions first, which converts them to quosures:
my_groups <- c('2 * cyl', 'am')
my_groups <- my_groups %>% map(parse_quosure)
mtcars %>%
group_by_and_tally(my_groups)
#> # A tibble: 6 x 3
#> # Groups: 2 * cyl [?]
#> `2 * cyl` am n
#> <dbl> <dbl> <int>
#> 1 8 0 3
#> 2 8 1 8
#> 3 12 0 4
#> 4 12 1 3
#> 5 16 0 12
#> 6 16 1 2
Again, we should only do this if we are getting expressions from an outside source that provides them as strings - otherwise we should make quosures directly in the R source code.
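If parse_quosure() is not available in your rlang version, a similar effect can be sketched with parse_expr() and the !!! operator. The helper name group_by_strings is hypothetical, and plain expressions (not quosures) are assumed to be sufficient because the strings only refer to columns of the data:

```r
library(dplyr)
library(rlang)

# Hypothetical helper: grouping expressions arrive as a character vector
group_by_strings <- function(data, group_strings) {
  # parse_expr() turns each string into an unevaluated expression
  group_exprs <- lapply(group_strings, parse_expr)
  data %>%
    group_by(!!!group_exprs) %>%  # splice the list of expressions into group_by()
    tally()
}

res <- group_by_strings(mtcars, c("2 * cyl", "am"))
```

As before, this is only worth doing when the expressions genuinely arrive as strings.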
How to parametrize function calls in dplyr 0.7?
dplyr will have a specialized group_by function, group_by_at, to deal with multiple grouping variables. It would be much easier to use the new member of the _at family:
# using the pre-release 0.6.0
cols <- c("am","gear")
mtcars %>%
group_by_at(.vars = cols) %>%
summarise(mean_cyl=mean(cyl))
# Source: local data frame [4 x 3]
# Groups: am [?]
#
# am gear mean_cyl
# <dbl> <dbl> <dbl>
# 1 0 3 7.466667
# 2 0 4 5.000000
# 3 1 4 4.500000
# 4 1 5 6.000000
The .vars argument accepts either a character or numeric vector, or column names generated by vars:
.vars
A list of columns generated by vars(), or a character vector of
column names, or a numeric vector of column positions.
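As a quick sketch of the three input forms described in that documentation excerpt, the calls below should all group mtcars by am and gear. The numeric positions 9 and 10 are the positions of am and gear in mtcars; the variable names by_name, by_vars and by_pos are just for illustration:

```r
library(dplyr)

# Character vector of column names
by_name <- mtcars %>%
  group_by_at(.vars = c("am", "gear")) %>%
  summarise(mean_cyl = mean(cyl))

# Columns captured with vars()
by_vars <- mtcars %>%
  group_by_at(vars(am, gear)) %>%
  summarise(mean_cyl = mean(cyl))

# Numeric column positions (am is column 9, gear is column 10 in mtcars)
by_pos <- mtcars %>%
  group_by_at(c(9, 10)) %>%
  summarise(mean_cyl = mean(cyl))
```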
How to use quos in a formula in R?
Try using sym to turn each string into a symbol and unquote it with !!. I would also pass an additional data argument to the function.
library(dplyr)
library(rlang)
f <- function(data, x, y, new) {
data %>% mutate (!!new := !!sym(x) * !!sym(y))
}
A %>% f("x", "y", "new")
# x y new
#1 1 11 11
#2 2 12 24
#3 3 13 39
#4 4 14 56
#5 5 15 75
identical(A %>% f("x", "y", "new"), A %>% mutate (new = x * y))
#[1] TRUE
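The data frame A is not defined in the snippet above. Assuming it looks like the printed output (x = 1:5, y = 11:15), here is a self-contained sketch of the same idea that uses the .data pronoun instead of sym(). The name f_data is made up for this example:

```r
library(dplyr)

# Assumed input, reconstructed from the printed output above
A <- data.frame(x = 1:5, y = 11:15)

# Look up columns by string through the .data pronoun;
# the new column name is still supplied with !! and :=
f_data <- function(data, x, y, new) {
  data %>% mutate(!!new := .data[[x]] * .data[[y]])
}

res <- f_data(A, "x", "y", "new")
```

Both forms take column names as strings; .data[[x]] just avoids the explicit conversion to a symbol.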
Using unquoted strings with `dplyr` verbs: `select` and `arrange` working differently
You need to convert your string to a variable first (using sym()), then unquote it inside arrange().
df %>% arrange(!!sym(v))
#> # A tibble: 3 x 2
#> x y
#> <dbl> <dbl>
#> 1 3 3
#> 2 2 6
#> 3 1 8
select() can take string input directly, but this is not recommended:
df %>% select(v)
#> Note: Using an external vector in selections is ambiguous.
#> i Use `all_of(v)` instead of `v` to silence this message.
#> i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
#> # A tibble: 3 x 1
#> y
#> <dbl>
#> 1 8
#> 2 6
#> 3 3
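For completeness, here is a sketch of the form that the message recommends. The definitions of df and v are not shown above, so the ones below are assumptions reconstructed from the printed output:

```r
library(dplyr)

# Assumed inputs, reconstructed from the output above
df <- tibble(x = c(1, 2, 3), y = c(8, 6, 3))
v <- "y"

# arrange() uses data masking, so the string has to be looked up explicitly
sorted <- df %>% arrange(.data[[v]])

# select() uses tidyselect; all_of() marks v as an external character vector
picked <- df %>% select(all_of(v))
```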
Created on 2020-11-21 by the reprex package (v0.3.0)
Convert strings to call dplyr functions
You can use eval with parse_expr:
library(dplyr)
library(rlang)
filter_and_compute <- function(df,
condition,
col_to_modify,
calculus) {
df %>%
filter(eval(parse_expr(condition))) %>%
mutate(!! col_to_modify := eval(parse_expr(calculus))) %>%
rbind(df %>% filter(! eval(parse_expr(condition))))
}
filter_and_compute(mtcars, "cyl == 4", "mpg", "2 * hp")
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 186.0 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#2 124.0 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#3 190.0 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#4 132.0 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#5 104.0 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#6 130.0 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#7 194.0 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#8 132.0 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#9 182.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#10 226.0 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#11 218.0 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#12 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#....
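A possible variation, shown only as a sketch: parse each string once and unquote the resulting expression with !! rather than calling eval() inside the verbs. The function name filter_and_compute2 is hypothetical, and bind_rows() is used in place of rbind():

```r
library(dplyr)
library(rlang)

filter_and_compute2 <- function(df, condition, col_to_modify, calculus) {
  # Parse the strings once, up front
  cond_expr <- parse_expr(condition)
  calc_expr <- parse_expr(calculus)

  # Rows matching the condition get the recomputed column
  matched <- df %>%
    filter(!!cond_expr) %>%
    mutate(!!col_to_modify := !!calc_expr)

  # Remaining rows are left untouched
  unmatched <- df %>% filter(!(!!cond_expr))

  bind_rows(matched, unmatched)
}

res <- filter_and_compute2(mtcars, "cyl == 4", "mpg", "2 * hp")
```

Either way, parsing strings into code should be reserved for input you trust.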
dplyr 0.7 equivalent for deprecated mutate_
To expand a little bit on MrFlick's example, let's assume you have a number of instructions stored as strings, as well as the corresponding names that you want to assign to the resulting computations:
ln <- list( "test2", "test3" )
lf <- list( "substr(test, 1, 5)", "substr(test, 5, 5)" )
Match up names to their instructions and convert everything to quosures:
ll <- setNames( lf, ln ) %>% lapply( rlang::parse_quosure )
As per aosmith's suggestion, the entire list can now be passed to mutate, using the special !!! operator:
tibble( test = "test@test" ) %>% mutate( !!! ll )
# # A tibble: 1 x 3
# test test2 test3
# <chr> <chr> <chr>
# 1 test@test test@ @
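As far as I know, later rlang releases renamed parse_quosure() to parse_quo(). If the function is missing in your installation, the same result can be sketched with parse_expr(), assuming the instructions only refer to columns of the data:

```r
library(dplyr)
library(rlang)

ln <- c("test2", "test3")
lf <- c("substr(test, 1, 5)", "substr(test, 5, 5)")

# Parse each instruction into an expression and attach the target names
ll <- setNames(lapply(lf, parse_expr), ln)

# Splice the named list of expressions into mutate()
res <- tibble(test = "test@test") %>% mutate(!!!ll)
```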
Pass column names as strings to group_by and summarize
For this you can now use the _at versions of the verbs:
df %>%
group_by_at(cols2group) %>%
summarize_at(.vars = col2summarize, .funs = min)
Edit (2021-06-09): Please see Ronak Shah's answer, which uses mutate(across(all_of(cols2summarize), min)) and is now the preferred option.
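A minimal sketch of the across() form applied to both grouping and summarising, assuming dplyr 1.0.0 or later. The contents of cols2group and col2summarize are not given in the question, so the values below are placeholders on mtcars:

```r
library(dplyr)

# Placeholder column names; the originals are not shown in the question
cols2group <- c("am", "gear")
col2summarize <- c("mpg", "hp")

res <- mtcars %>%
  group_by(across(all_of(cols2group))) %>%
  summarise(across(all_of(col2summarize), min), .groups = "drop")
```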
How does dplyr pass in non-string parameters
Your function is always going to deliver 0 because the $ infix function uses non-standard evaluation of its right-hand side argument. (As you point out, non-standard evaluation is a favorite mechanism in @hadley's functions. For me it's a barrier, but for many people it seems to be a welcome strategy.) If you write your function in that manner (using $) you will generally fail to get what you want:
mysum(dat, blue, red)
[1] 0 # Wrong answer
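To see where the 0 comes from, here is a small base R illustration with a made-up data frame. It assumes the original mysum used dat$x and dat$y inside the function, which is not shown here:

```r
# Made-up data for illustration
dat <- data.frame(blue = c(1, 2), red = c(3, 4))
x <- "blue"

by_dollar  <- dat$x      # `$` looks for a column literally named "x": NULL
by_bracket <- dat[[x]]   # `[[` evaluates x first, so this is the blue column

total_dollar  <- sum(by_dollar)   # sum(NULL) is 0
total_bracket <- sum(by_bracket)  # 1 + 2
```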
You said earlier that: "However, mycol1, mycol2, and mycol3 are not strings but just text in R." I guess you are trying to say that mycol is not enclosed in quotes and so is not a character literal. In R such "text" (a sequence of unquoted characters) is called a 'symbol' or a 'name'. (Up to this point we are not talking about anything to do with dplyr.) If you want to write a function that will deliver that sum, you would do so like this (avoiding the $ operation):
mysum <- function(dat, x, y){
return (sum(dat[[x]])+ sum(dat[[y]]))
}
mysum(dat, 'blue', 'red')
[1] 19.16727
If you want to retrieve the argument name for a matched parameter, you need to use the deparse(substitute(.)) maneuver:
dat <- data.frame(blue = rnorm(10), red= rnorm(10))
mysum2 <- function(dfrm, arg1, arg2){
a1 <- deparse(substitute(arg1)); a2 <- deparse(substitute(arg2))
sum(dfrm[[a1]]) +sum(dfrm[[a2]]) }
mysum2(dat, blue, red)
#[1] -0.5754979
mysum(dat, "blue", "red")
#[1] -0.5754979
If you want to see how @hadley does it, then just type:
> dplyr::select
function (.data, ...)
{
select_(.data, .dots = lazyeval::lazy_dots(...))
}
<environment: namespace:dplyr>
.... doesn't really deliver the answer, does it? So we will need to try this:
help(pac=lazyeval)
... which has an accompanying vignette named "lazyeval::lazyeval" --> "Lazyeval: a new approach to NSE". Hadley argues that his lazyeval functions are superior to the traditional substitute because they carry forward their environments, and I suppose I agree.