standard evaluation in dplyr: summarise a variable given as a character string
dplyr 1.0 has changed pretty much everything about this question as well as all of the answers. See the dplyr programming vignette here:
https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
The new way to refer to columns when their identifier is stored as a character vector is to use the .data pronoun from rlang, and then subset as you would in base R.
library(dplyr)
key <- "v3"
val <- "v2"
drp <- "v1"
df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))
df %>%
  select(-matches(drp)) %>%
  group_by(.data[[key]]) %>%
  summarise(total = sum(.data[[val]], na.rm = TRUE))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 2
#> v3 total
#> <chr> <int>
#> 1 A 21
#> 2 B 19
If your code is in a package function, you can @importFrom rlang .data to avoid R check notes about undefined globals.
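As a minimal sketch of how this might look inside a function, the helper below takes the column names as strings. The name sum_by and its arguments are hypothetical, chosen only for illustration; in a package the roxygen tag shown in the comment would sit above the function.

```r
library(dplyr)

# In a package, this roxygen tag would go above the function:
# #' @importFrom rlang .data

# Hypothetical helper: `key` and `val` are column names given as strings.
sum_by <- function(data, key, val) {
  data %>%
    group_by(.data[[key]]) %>%
    summarise(total = sum(.data[[val]], na.rm = TRUE), .groups = "drop")
}

df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))
res <- sum_by(df, key = "v3", val = "v2")
res
```

Setting .groups = "drop" is a design choice here: it returns an ungrouped tibble and silences the regrouping message.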
How do I do Standard evaluation with dplyr's arrange?
I don't know why !! does not work with arrange, but you can still use get:
a %>% arrange(get(meh))
# date ok
#1 1 1
#2 2 2
#3 3 3
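One likely explanation is that !! applied directly to a string unquotes the string itself, so arrange sorts by a constant rather than by the column. A small sketch of two alternatives follows, assuming a hypothetical data frame a and a column name stored in meh; the data are made up for illustration.

```r
library(dplyr)

# Hypothetical data; `meh` holds the column name as a string.
a <- tibble(date = c(3, 1, 2), ok = c(3, 1, 2))
meh <- "date"

# Option 1: subset the .data pronoun with the string.
by_pronoun <- a %>% arrange(.data[[meh]])

# Option 2: turn the string into a symbol first, then unquote it.
by_symbol <- a %>% arrange(!!rlang::sym(meh))
```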
dplyr standard evaluation: summarise_ with variable name for summed variable
After poring over the NSE vignette for a while and poking at things, I found you can use setNames within summarise_ if you use the .dots argument and put the interp work in a list.
a %>%
  filter_(~x == tag) %>%
  group_by_(tag) %>%
  summarise_(.dots = setNames(list(interp(~sum(var, na.rm = TRUE),
                                          var = as.name(paste0(metric,"_",run1)))),
                              paste0(metric,"_",run1)))
Source: local data frame [1 x 2]
2011 y_zm
1 2011 30
You could also add a rename_ step to do the same thing. I could see this being less ideal, as it relies on knowing the name you used in summarise_. But if you always use the same name, like variable_name, this does seem like a viable alternative for some situations.
a %>%
  filter_(~x == tag) %>%
  group_by_(tag) %>%
  summarise_(variable_name = interp(~sum(var, na.rm = TRUE),
                                    var = as.name(paste0(metric,"_",run1)))) %>%
  rename_(.dots = setNames("variable_name", paste0(metric,"_",run1)))
Source: local data frame [1 x 2]
2011 y_zm
1 2011 30
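The underscore verbs and interp predate dplyr 1.0. A rough sketch of how the same idea might be written today, with the output column named from a string, is shown below. The data frame and the values of tag, metric and run1 are assumptions made up for the example, and the grouping column x is a guess at the original intent.

```r
library(dplyr)

# Hypothetical data and inputs, for illustration only.
a <- tibble(x = c(2011, 2011, 2012), y_zm = c(10, 20, 5))
tag <- 2011
metric <- "y"
run1 <- "zm"
col <- paste0(metric, "_", run1)

res <- a %>%
  filter(x == tag) %>%
  group_by(x) %>%
  # "{col}" := names the result after the string held in `col`;
  # .data[[col]] reads the column of that name.
  summarise("{col}" := sum(.data[[col]], na.rm = TRUE), .groups = "drop")
res
```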
Use I string to refer to a variable inside dplyr?
As it is a string, convert it to a symbol (sym from rlang) and evaluate it (!!):
test_df %>%
  summarise(y = mean(!! rlang::sym(string_outcome)))
Or use summarise_at, which can take strings in the vars parameter:
test_df %>%
  summarise_at(vars(string_outcome), list(y = ~ mean(.)))
Or, if we need a single value without any attributes, even pull with mean can be used:
test_df %>%
  pull(string_outcome) %>%
  mean
standard eval with `dplyr::count()`
To create a list of symbols from strings, you want rlang::syms (not rlang::sym). For unquoting a list or a vector, you want to use !!! (not !!). The following will work:
library(magrittr)
variables <- c("cyl", "vs")
vars_sym <- rlang::syms(variables)
vars_sym
#> [[1]]
#> cyl
#>
#> [[2]]
#> vs
mtcars %>%
  dplyr::count(!!! vars_sym)
#> # A tibble: 5 x 3
#> cyl vs n
#> <dbl> <dbl> <int>
#> 1 4 0 1
#> 2 4 1 10
#> 3 6 0 3
#> 4 6 1 4
#> 5 8 0 14
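With dplyr 1.0 or later, a possible alternative that avoids building symbols is to select the columns by name with across and all_of. This is a sketch of that alternative, not the approach from the answer above.

```r
library(dplyr)

variables <- c("cyl", "vs")

# all_of() selects the columns named in the character vector;
# across() hands them to count() as grouping columns.
res <- mtcars %>%
  count(across(all_of(variables)))
res
```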
using variable column names in dplyr summarise
You can use base::get:
df %>% summarise(mean(get(x[1])) - mean(get(x[2])))
# # A tibble: 2 x 2
# c `mean(a) - mean(b)`
# <dbl> <dbl>
# 1 1 -1
# 2 2 -1
get will search in the current environment by default.
As the error message says, mean expects a logical or numeric object, but as.name returns a name:
class(as.name("a")) # [1] "name"
You could also evaluate the name, which works as well:
df %>% summarise(mean(eval(as.name(x[1]))) - mean(eval(as.name(x[2]))))
# # A tibble: 2 x 2
# c `mean(eval(as.name(x[1]))) - mean(eval(as.name(x[2])))`
# <dbl> <dbl>
# 1 1 -1
# 2 2 -1
Using strings as arguments in custom dplyr function using non-standard evaluation
You can either use sym to turn "y" into a symbol or parse_expr to parse it into an expression, then unquote it using !!:
library(rlang)
testFun(data.frame(x = c("a", "b", "c"), y = 1:3), !!sym(myVar))
testFun(data.frame(x = c("a", "b", "c"), y = 1:3), !!parse_expr(myVar))
Result:
x y
1 a 0
2 b 100
3 c 200
Check my answer in this question for an explanation of the difference between sym and parse_expr.
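As a rough illustration of that difference: sym treats the whole string as one variable name, while parse_expr parses it as R code. The string "x + y" and the values of x and y are made up for the example.

```r
library(rlang)

s <- sym("x + y")         # a single symbol whose name is the whole string
e <- parse_expr("x + y")  # a call to `+` with the arguments x and y

is_symbol(s)  # TRUE
is_call(e)    # TRUE

x <- 1
y <- 2
eval(e)       # 3; eval(s) would fail unless an object named `x + y` exists
```

For a plain column name such as "y" both give the same symbol, so the choice only matters when the string contains an expression.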
Pass column names as strings to group_by and summarize
For this you can now use the _at versions of the verbs:
df %>%
  group_by_at(cols2group) %>%
  summarize_at(.vars = col2summarize, .funs = min)
Edit (2021-06-09): Please see Ronak Shah's answer, using mutate(across(all_of(cols2summarize), min)), which is now the preferred option.
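A sketch of how the across approach might look for both the grouping and the summary step is given below. It uses mtcars and made-up column choices as stand-ins for the original data, and summarise rather than mutate so that each group collapses to one row.

```r
library(dplyr)

# Hypothetical column choices on a built-in data set.
cols2group <- c("cyl", "gear")
col2summarize <- c("mpg", "hp")

res <- mtcars %>%
  group_by(across(all_of(cols2group))) %>%
  summarise(across(all_of(col2summarize), min), .groups = "drop")
res
```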
Conditional Evaluation in Dplyr
As there are multiple statements, wrap them inside {}:
r <- c()
iris %>%
{if(length(r) > 0) {
mutate(., Test = 1)
} else .}
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
...
Testing with length(r) > 0:
r <- 5
iris %>%
{if(length(r) > 0) {
mutate(., Test = 1)
} else .}
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Test
1 5.1 3.5 1.4 0.2 setosa 1
2 4.9 3.0 1.4 0.2 setosa 1
3 4.7 3.2 1.3 0.2 setosa 1
...
However, this can easily be done without the if/else block: convert the logical value to a numeric index by adding 1 (as indexing in R starts from 1), and use that to select from a list containing NULL and 1. If the length is 0, NULL is selected and thus no column is created.
iris %>%
  mutate(Test = list(NULL, 1)[[1 + (length(r) > 0)]])
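If the condition is needed in several places, one option is to move it into a small helper so the pipeline stays flat. The function name add_test_if is hypothetical, used here only to sketch the idea.

```r
library(dplyr)

# Hypothetical helper: adds the Test column only when `r` is non-empty.
add_test_if <- function(data, r) {
  if (length(r) > 0) mutate(data, Test = 1) else data
}

with_col    <- iris %>% add_test_if(5)
without_col <- iris %>% add_test_if(c())
```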