Standard Evaluation in Dplyr: Summarise a Variable Given as a Character String


dplyr 1.0 has changed pretty much everything about this question as well as all of the answers. See the dplyr programming vignette here:

https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html

The new way to refer to columns when their identifier is stored as a character vector is to use the .data pronoun from rlang, and then subset as you would in base R.

library(dplyr)

key <- "v3"
val <- "v2"
drp <- "v1"

df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))

df %>%
  # all_of() matches the exact name; matches() would treat drp as a regex
  select(-all_of(drp)) %>%
  group_by(.data[[key]]) %>%
  summarise(total = sum(.data[[val]], na.rm = TRUE))

#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 2
#>   v3    total
#>   <chr> <int>
#> 1 A        21
#> 2 B        19

If your code is in a package function, add @importFrom rlang .data to its roxygen documentation to avoid R CMD check notes about undefined global variables.
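A minimal sketch of the same pattern wrapped in a reusable function, assuming the key and value column names arrive as strings. The function name sum_by() is hypothetical; `.groups = "drop"` (dplyr >= 1.0) silences the ungrouping message printed above.

```r
library(dplyr)

#' Sum `val` within groups of `key` (hypothetical helper for illustration).
#' In a package, this roxygen tag avoids the R CMD check note:
#' @importFrom rlang .data
sum_by <- function(df, key, val) {
  df %>%
    group_by(.data[[key]]) %>%
    summarise(total = sum(.data[[val]], na.rm = TRUE), .groups = "drop")
}

df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))
res <- sum_by(df, "v3", "v2")
res
```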

How do I do standard evaluation with dplyr's arrange?

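!! on a plain string does not work with arrange: !!meh splices in the string itself rather than a reference to the column, so arrange() does not sort by that column. You can still use get: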

a %>% arrange(get(meh))

#   date ok
# 1    1  1
# 2    2  2
# 3    3  3

dplyr standard evaluation: summarise_ with variable name for summed variable

After poring over the NSE vignette for a while and poking at things, I found you can use setNames() within summarise_() if you use the .dots argument and put the interp() work in a list.

library(dplyr)
library(lazyeval)  # provides interp()

a %>%
  filter_(~x == tag) %>%
  group_by_(tag) %>%
  summarise_(.dots = setNames(list(interp(~sum(var, na.rm = TRUE),
                                          var = as.name(paste0(metric, "_", run1)))),
                              paste0(metric, "_", run1)))

Source: local data frame [1 x 2]

  2011 y_zm
1 2011   30

You could also add a rename_ step to do the same thing. I could see this being less ideal, as it relies on knowing the name you used in summarise_. But if you always use the same name, like variable_name, this does seem like a viable alternative for some situations.

a %>%
  filter_(~x == tag) %>%
  group_by_(tag) %>%
  summarise_(variable_name = interp(~sum(var, na.rm = TRUE),
                                    var = as.name(paste0(metric, "_", run1)))) %>%
  rename_(.dots = setNames("variable_name", paste0(metric, "_", run1)))

Source: local data frame [1 x 2]

  2011 y_zm
1 2011   30
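The underscore verbs (filter_(), group_by_(), summarise_(), rename_()) have been deprecated since dplyr 0.7. Here is a minimal sketch of the same idea with current tidy eval, assuming made-up inputs metric = "y" and run1 = "zm" and a hypothetical grouping column year. The `:=` operator lets the output name be computed with !!, so no rename step is needed.

```r
library(dplyr)

# Made-up inputs for illustration only
metric <- "y"
run1 <- "zm"
a <- tibble(year = c(2011, 2011, 2012), y_zm = c(10, 20, 5))

col <- paste0(metric, "_", run1)  # "y_zm"

res <- a %>%
  filter(year == 2011) %>%
  group_by(year) %>%
  # !!col := ... names the output column after the string stored in col
  summarise(!!col := sum(.data[[col]], na.rm = TRUE), .groups = "drop")
res
```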

Use a string to refer to a variable inside dplyr?

As it is a string, convert it to a symbol (sym() from rlang) and unquote it with !!:

test_df %>%
  summarise(y = mean(!!rlang::sym(string_outcome)))

Or use summarise_at(), which can take strings in the vars() argument:

test_df %>%
  summarise_at(vars(string_outcome), list(y = ~ mean(.)))

Or, if you only need a single value without any attributes, pull() the column and take its mean:

test_df %>%
  pull(string_outcome) %>%
  mean()
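In dplyr >= 1.0 two further options are the .data pronoun and across() with all_of(), which supersedes summarise_at(). A sketch with a made-up test_df; the column name outcome is an assumption for illustration.

```r
library(dplyr)

# Made-up data; string_outcome holds the column name as a string
test_df <- tibble(group = c("a", "a", "b"), outcome = c(1, 2, 6))
string_outcome <- "outcome"

# .data pronoun (dplyr >= 1.0)
res_pronoun <- test_df %>% summarise(y = mean(.data[[string_outcome]]))

# across() + all_of() supersedes summarise_at(); .names sets the output name
res_across <- test_df %>%
  summarise(across(all_of(string_outcome), mean, .names = "y"))
```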

standard eval with `dplyr::count()`

To create a list of symbols from strings, you want rlang::syms (not rlang::sym). For unquoting a list or a vector, you want to use !!! (not !!). The following will work:

library(magrittr)

variables <- c("cyl", "vs")

vars_sym <- rlang::syms(variables)
vars_sym
#> [[1]]
#> cyl
#>
#> [[2]]
#> vs

mtcars %>%
  dplyr::count(!!! vars_sym)
#> # A tibble: 5 x 3
#>     cyl    vs     n
#>   <dbl> <dbl> <int>
#> 1     4     0     1
#> 2     4     1    10
#> 3     6     0     3
#> 4     6     1     4
#> 5     8     0    14
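The same syms() plus !!! pattern can be wrapped in a function that takes a character vector of column names. A minimal sketch; the name count_by() and the up-front column check are hypothetical additions, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: count rows by any columns named in a character vector
count_by <- function(df, cols) {
  missing <- setdiff(cols, names(df))
  if (length(missing) > 0) {
    stop("Unknown columns: ", paste(missing, collapse = ", "))
  }
  # syms() gives a list of symbols; !!! splices them as separate arguments
  df %>% count(!!!rlang::syms(cols))
}

counts <- count_by(mtcars, c("cyl", "vs"))
```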

using variable column names in dplyr summarise

You can use base::get:

df %>% summarise(mean(get(x[1])) - mean(get(x[2])))

# # A tibble: 2 x 2
#       c `mean(a) - mean(b)`
#   <dbl>               <dbl>
# 1     1                  -1
# 2     2                  -1

get() looks the name up in the environment where it is called; inside summarise() that environment is the data mask, so the column is found first.

As the error message says, mean() expects a numeric or logical object, but as.name() returns a name (a symbol), not the column's values:

class(as.name("a")) # [1] "name"

You could evaluate the name instead; that works as well:

df %>% summarise(mean(eval(as.name(x[1]))) - mean(eval(as.name(x[2]))))

# # A tibble: 2 x 2
#       c `mean(eval(as.name(x[1]))) - mean(eval(as.name(x[2])))`
#   <dbl>                                                  <dbl>
# 1     1                                                     -1
# 2     2                                                     -1
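In dplyr >= 1.0 the .data pronoun does the same job without get() or eval(). A sketch with made-up data chosen to give the same -1 differences; unlike get(), .data[[...]] only looks up columns of the data, so a variable of the same name in the calling environment cannot be picked up by mistake.

```r
library(dplyr)

# Made-up data: grouping column c and two numeric columns named in x
df <- tibble(c = c(1, 1, 2, 2), a = c(1, 2, 3, 4), b = c(2, 3, 4, 5))
x <- c("a", "b")

res <- df %>%
  group_by(c) %>%
  # .data[[name]] returns the column whose name is stored in the string
  summarise(diff = mean(.data[[x[1]]]) - mean(.data[[x[2]]]))
res
```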

Using strings as arguments in custom dplyr function using non-standard evaluation

You can either use sym to turn "y" into a symbol or parse_expr to parse it into an expression, then unquote it using !!:

library(rlang)

testFun(data.frame(x = c("a", "b", "c"), y = 1:3), !!sym(myVar))

testFun(data.frame(x = c("a", "b", "c"), y = 1:3), !!parse_expr(myVar))

Result:

  x   y
1 a   0
2 b 100
3 c 200

See my answer to this question for an explanation of the difference between sym and parse_expr.
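The definition of testFun is not shown, so here is a hypothetical stand-in that captures its argument with enquo(). It is only meant to show that !!sym() and !!parse_expr() resolve to the same column; the scaling by 10 is arbitrary and does not reproduce the result above.

```r
library(dplyr)
library(rlang)

# Hypothetical stand-in for testFun (the original is not shown):
# capture the column expression and use it inside mutate()
test_fun_sketch <- function(df, var) {
  var <- enquo(var)
  df %>% mutate(scaled = (!!var) * 10)
}

df <- data.frame(x = c("a", "b", "c"), y = 1:3)
myVar <- "y"

out_sym   <- test_fun_sketch(df, !!sym(myVar))         # string -> symbol
out_parse <- test_fun_sketch(df, !!parse_expr(myVar))  # string -> parsed expression
```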

Pass column names as strings to group_by and summarize

For this you can use the _at variants of the verbs:

df %>%
  group_by_at(cols2group) %>%
  summarize_at(.vars = col2summarize, .funs = min)

Edit (2021-06-09):

Please see Ronak Shah's answer, using

mutate(across(all_of(cols2summarize), min))

This is now the preferred option.
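A minimal sketch of the across() approach applied to the original group-and-summarise question, assuming made-up data with hypothetical columns g, h, v and w. Grouping uses !!!rlang::syms() so the grouping step does not depend on across().

```r
library(dplyr)

# Made-up data and column-name vectors
df <- tibble(
  g = c("a", "a", "b"),
  h = c(1, 1, 2),
  v = c(3, 1, 5),
  w = c(9, 8, 7)
)
cols2group <- c("g", "h")
col2summarize <- c("v", "w")

res <- df %>%
  group_by(!!!rlang::syms(cols2group)) %>%
  # across(all_of()) applies min() to every column named in the vector
  summarise(across(all_of(col2summarize), min), .groups = "drop")
res
```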

Conditional Evaluation in Dplyr

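As the condition spans several statements, wrap it in braces {}. magrittr then passes the data in as the dot (.) instead of inserting it as the first argument: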

r <- c()
iris %>%
  {if (length(r) > 0) {
    mutate(., Test = 1)
  } else .}

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
...

Testing with length(r) > 0:

r <- 5
iris %>%
  {if (length(r) > 0) {
    mutate(., Test = 1)
  } else .}

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Test
1          5.1         3.5          1.4         0.2  setosa    1
2          4.9         3.0          1.4         0.2  setosa    1
3          4.7         3.2          1.3         0.2  setosa    1
...

However, this can also be done without the if/else: convert the logical value to a numeric index by adding 1 (as indexing in R starts from 1), and use it to pick from a list containing NULL and 1. If the length is 0, NULL is selected, and mutate() creates no column.

iris %>%
  mutate(Test = list(NULL, 1)[[1 + (length(r) > 0)]])
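Another option is to move the condition into a small function so the pipe needs neither braces nor the list trick. A sketch; add_test_if() is a hypothetical helper, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: add the Test column only when `cond` is TRUE,
# otherwise return the data unchanged
add_test_if <- function(df, cond) {
  if (cond) mutate(df, Test = 1) else df
}

r <- c()
res_empty <- iris %>% add_test_if(length(r) > 0)

r <- 5
res_full <- iris %>% add_test_if(length(r) > 0)
```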

