Tidyeval with List of Column Names in a Function

You could pass your list of arguments using alist instead of list, as it won't evaluate the arguments.

my_summarise = function(df, group_var, sum_var) {
  group_var = quos(!!! group_var)
  sum_var = enquo(sum_var)

  df %>%
    group_by(!!! group_var) %>%
    summarise(!! quo_name(sum_var) := mean(!! sum_var))
}

my_summarise(df, alist(g1, g2), b)

# A tibble: 4 x 3
# Groups:   g1 [?]
     g1    g2     b
  <dbl> <dbl> <dbl>
1     1     1   2.0
2     1     2   3.0
3     2     1   4.5
4     2     2   1.0

Another alternative is to pass that argument directly with quos instead of list, as shown in this answer, which bypasses some complications altogether.

my_summarise = function(df, group_var, sum_var) {
  # No quos(!!! group_var) needed: the caller already passes quosures
  sum_var = enquo(sum_var)

  df %>%
    group_by(!!! group_var) %>%
    summarise(!! quo_name(sum_var) := mean(!! sum_var))
}

my_summarise(df, quos(g1, g2), b)

# A tibble: 4 x 3
# Groups:   g1 [?]
     g1    g2     b
  <dbl> <dbl> <dbl>
1     1     1   2.0
2     1     2   3.0
3     2     1   4.5
4     2     2   1.0
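
With more recent versions of dplyr and rlang, a third option that avoids wrapping the grouping columns in any list may be to forward them through `...` and use the curly-curly operator for the summary column. The sketch below is only an illustration: the name my_summarise_dots, the argument order, and the toy data are assumptions, since the df from the question is not shown here.

```r
library(dplyr)

# Hypothetical toy data standing in for the df used in the question
toy <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  b  = c(2, 3, 4, 1, 5)
)

# Hypothetical variant: grouping columns arrive through `...`
my_summarise_dots <- function(df, sum_var, ...) {
  df %>%
    group_by(...) %>%
    # "{{ sum_var }}" names the result after the column that was passed in
    summarise("{{ sum_var }}" := mean({{ sum_var }}), .groups = "drop")
}

res <- my_summarise_dots(toy, b, g1, g2)
print(res)
```

The grouping columns are passed bare, so the caller needs neither alist nor quos.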

tidy eval map over column names

Instead of using enquo, either use the .data pronoun or convert the input to a symbol with ensym and unquote it with !!:

emp_term_var <- function(data, colName, year = "2015") {
  # Terminations by year and variable in df
  colName <- ensym(colName)
  term_test <- data %>%
    filter(year(DateofTermination) == year) %>%
    # group_by(!!colName) %>%
    count(!!colName) %>%
    clean_names()
  return(term_test)
}

NOTE: count can take the column without any grouping as well

The advantage of the ensym route is that it accepts both quoted and unquoted input, i.e. the column name can be given as a string or as a bare name.

nm1 <- c("Department", "State")
purrr::map(nm1, ~ emp_term_var(df, colName = !!.x, year = "2015"))

or, to pass the column name unquoted:

emp_term_var(data = df, colName = Department, year = "2015")

or as a string:

emp_term_var(data = df, colName = "Department", year = "2015")
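
The .data route mentioned at the start of this answer is not shown above, so here is a minimal sketch of it. It uses the built-in mtcars data instead of the employee data from the question, and the helper name count_by_name is hypothetical. With .data the column name has to be a string.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: col_name must be a string such as "cyl"
count_by_name <- function(data, col_name) {
  # .data[[...]] looks the column up by its name inside the data frame
  count(data, .data[[col_name]])
}

# Map over a character vector of column names; no unquoting is needed
res <- map(c("cyl", "gear"), ~ count_by_name(mtcars, .x))
print(res)
```

Compared with ensym, this variant does not accept bare column names, but the mapping call needs no !!.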

Using character object to indicate column name within R's glue function?

There are multiple ways you can do this:

  1. With .data:
library(dplyr)
library(glue)
data <- mtcars %>% as_tibble(rownames = "Vehicle")

column_of_interest <- "mpg"

data %>%
  mutate(Label = glue("{Vehicle}: {value}", value = .data[[column_of_interest]])) %>%
  select(Label)

# Label
# <glue>
# 1 Mazda RX4: 21
# 2 Mazda RX4 Wag: 21
# 3 Datsun 710: 22.8
# 4 Hornet 4 Drive: 21.4
# 5 Hornet Sportabout: 18.7
# 6 Valiant: 18.1
# 7 Duster 360: 14.3
# 8 Merc 240D: 24.4
# 9 Merc 230: 22.8
#10 Merc 280: 19.2
# … with 22 more rows

  2. With get:
data %>%
  mutate(Label = glue("{Vehicle}: {value}", value = get(column_of_interest))) %>%
  select(Label)

  3. Use sym with !!:
data %>%
  mutate(Label = glue("{Vehicle}: {value}", value = !!sym(column_of_interest))) %>%
  select(Label)

How to use tidy evaluation with column name as strings?

We can also use ensym with !!

my_summarise <- function(df, group_var) {
  df %>%
    group_by(!!rlang::ensym(group_var)) %>%
    summarise(a = mean(a))
}

my_summarise(df, 'g1')

Another option is group_by_at, which has been superseded by across() since dplyr 1.0.0:

my_summarise <- function(df, group_var) {
  df %>%
    group_by_at(vars(group_var)) %>%
    summarise(a = mean(a))
}

my_summarise(df, 'g1')
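
For dplyr 1.0.0 or later, a possible string-based equivalent uses across() together with all_of(). This is a sketch under assumptions: the function name my_summarise_chr and the toy data are made up for illustration, since the df from the question is not shown.

```r
library(dplyr)

# Hypothetical toy data with the columns the answer refers to (g1 and a)
toy <- tibble(g1 = c(1, 1, 2), a = c(1, 3, 10))

# Hypothetical variant of my_summarise that takes a character vector
my_summarise_chr <- function(df, group_var) {
  df %>%
    # all_of() states explicitly that group_var holds column names
    group_by(across(all_of(group_var))) %>%
    summarise(a = mean(a), .groups = "drop")
}

res <- my_summarise_chr(toy, "g1")
print(res)
```

Because group_var is a character vector, the same function also accepts several grouping columns, e.g. c("g1", "g2").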

Tidyeval: pass list of columns as quosure to select()

This is a bit tricky because of the mix of semantics involved in this problem. pmap() takes a list and passes each element as its own argument to a function (it's kind of equivalent to !!! in that sense). Your quoting function thus needs to quote its arguments and somehow pass a list of columns to pmap().

Our quoting function can go one of two ways. Either quote (i.e., delay) the list creation, or create an actual list of quoted expressions right away:

quoting_fn1 <- function(...) {
  exprs <- enquos(...)

  # For illustration purposes, return the quoted inputs instead of
  # doing something with them. Normally you'd call `mutate()` here:
  exprs
}

quoting_fn2 <- function(...) {
  expr <- quo(list(!!!enquos(...)))

  expr
}

Since our first variant does nothing but return a list of quoted inputs, it's actually equivalent to quos():

quoting_fn1(a, b)
#> <list_of<quosure>>
#>
#> [[1]]
#> <quosure>
#> expr: ^a
#> env: global
#>
#> [[2]]
#> <quosure>
#> expr: ^b
#> env: global

The second version returns a quoted expression that instructs R to create a list with quoted inputs:

quoting_fn2(a, b)
#> <quosure>
#> expr: ^list(^a, ^b)
#> env: 0x7fdb69d9bd20

There is a subtle but important difference between the two. The first version creates an actual list object:

exprs <- quoting_fn1(a, b)
typeof(exprs)
#> [1] "list"

On the other hand, the second version does not return a list, it returns an expression for creating a list:

expr <- quoting_fn2(a, b)
typeof(expr)
#> [1] "language"

Let's find out which version is more appropriate for interfacing with pmap(). But first we'll give a name to the pmapped function to make the code clearer and easier to experiment with:

myfunction <- function(..., word) {
  args <- list(...)
  # just to be clear this isn't what I actually want to do inside pmap
  args[[1]] + args[[2]]
}

Understanding how tidy eval works is hard in part because we usually don't get to observe the unquoting step. We'll use rlang::qq_show() to reveal the result of unquoting expr (the delayed list) and exprs (the actual list) with !!:

rlang::qq_show(
  mutate(df, outcome = pmap_int(!!expr, myfunction))
)
#> mutate(df, outcome = pmap_int(^list(^a, ^b), myfunction))

rlang::qq_show(
  mutate(df, outcome = pmap_int(!!exprs, myfunction))
)
#> mutate(df, outcome = pmap_int(<S3: quosures>, myfunction))

When we unquote the delayed list, mutate() calls pmap_int() with list(a, b), evaluated in the data frame, which is exactly what we need:

mutate(df, outcome = pmap_int(!!expr, myfunction))
#> # A tibble: 3 x 3
#>       a     b outcome
#>   <int> <int>   <int>
#> 1     1   101     102
#> 2     2   102     104
#> 3     3   103     106

On the other hand, if we unquote an actual list of quoted expressions, we get an error:

mutate(df, outcome = pmap_int(!!exprs, myfunction))
#> Error in mutate_impl(.data, dots) :
#> Evaluation error: Element 1 is not a vector (language).

That's because the quoted expressions inside the list are not evaluated in the data frame. In fact, they are not evaluated at all. pmap() gets the quoted expressions as is, which it doesn't understand. Recall what qq_show() has shown us:

#> mutate(df, outcome = pmap_int(<S3: quosures>, myfunction))

Anything inside angle brackets is passed as is. This is a sign that we should somehow have used !!! instead, to inline each element of the list of quosures in the surrounding expression. Let's try it:

rlang::qq_show(
  mutate(df, outcome = pmap_int(!!!exprs, myfunction))
)
#> mutate(df, outcome = pmap_int(^a, ^b, myfunction))

Hmm... Doesn't look right. We're supposed to pass a list to pmap_int(), and here it gets each quoted input as separate argument. Indeed we get a type error:

mutate(df, outcome = pmap_int(!!!exprs, myfunction))
#> Error in mutate_impl(.data, dots) :
#> Evaluation error: `.x` is not a list (integer).

That's easy to fix, just splice into a call to list():

rlang::qq_show(
  mutate(df, outcome = pmap_int(list(!!!exprs), myfunction))
)
#> mutate(df, outcome = pmap_int(list(^a, ^b), myfunction))

And voilà!

mutate(df, outcome = pmap_int(list(!!!exprs), myfunction))
#> # A tibble: 3 x 3
#>       a     b outcome
#>   <int> <int>   <int>
#> 1     1   101     102
#> 2     2   102     104
#> 3     3   103     106
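
Putting the pieces together, the whole thing might be wrapped in a single quoting function along the following lines. This is only a sketch: the wrapper name add_outcome is hypothetical, and df is reconstructed from the printed output above rather than taken from the question.

```r
library(dplyr)
library(purrr)
library(rlang)

# Assumed input, reconstructed from the printed tibble above
df <- tibble(a = 1:3, b = 101:103)

# Hypothetical wrapper: quotes its inputs, then splices them into list()
add_outcome <- function(data, ...) {
  exprs <- enquos(...)
  mutate(
    data,
    outcome = pmap_int(list(!!!exprs), function(...) {
      args <- list(...)
      # Placeholder computation, as in myfunction above
      args[[1]] + args[[2]]
    })
  )
}

res <- add_outcome(df, a, b)
print(res)
```

The call to list() is evaluated inside the data frame, so pmap_int() receives a list of actual columns rather than a list of quosures.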

Best tidyverse practice for passing column names as variables in function

Since you are already using the curly-curly {{ operator, you can extend it to every argument of your function so that column names are passed as bare (unquoted) names:

myCalc <- function(data, dateIn, numIn, yearOut, numOut) {
  data <- data %>%
    mutate(
      {{ yearOut }} := lubridate::year({{ dateIn }}),
      {{ numOut }} := 10 * {{ numIn }}
    ) %>%
    filter({{ numOut }} > 250)

  return(data)
}

Your use of strings does work (e.g. .data[[dateIn]], evaluates to .data[["a"]] in your example). As mentioned in the comments by @r2evans the difference really comes during the function call.

This function would be called like so (note the lack of quotes in the arguments):

dat2 <- myCalc(dat0, 
dateIn = a,
numIn = b,
yearOut = c,
numOut = d)

You can read more about this with ?rlang::`nse-defuse` and ?rlang::`nse-force` . There is also this tidyverse article with more on the subject.
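
For comparison, the string-based style mentioned above could look roughly like the sketch below. The function name myCalcStr and the contents of dat0 are assumptions (dat0 is not shown in this answer), and the "{name}" := syntax needs a reasonably recent rlang.

```r
library(dplyr)

# Hypothetical input; only the column names a and b come from the answer
dat0 <- tibble(
  a = as.Date(c("2020-01-15", "2021-06-30")),
  b = c(20, 30)
)

# Hypothetical string-based counterpart of myCalc
myCalcStr <- function(data, dateIn, numIn, yearOut, numOut) {
  data %>%
    mutate(
      # "{yearOut}" builds the new column name from the string argument
      "{yearOut}" := lubridate::year(.data[[dateIn]]),
      "{numOut}" := 10 * .data[[numIn]]
    ) %>%
    filter(.data[[numOut]] > 250)
}

dat2 <- myCalcStr(dat0, dateIn = "a", numIn = "b", yearOut = "c", numOut = "d")
print(dat2)
```

Here every column name is passed in quotes, which is the difference at the call site noted above.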

Passing a column name for a new column in a function without tidyeval?

In base R, we can use deparse/substitute

new_col <- function(df, col_name, col_vals) {
  cn <- deparse(substitute(col_name))
  df[[cn]] <- col_vals
  df
}

Testing:

sleep |>
  new_col(sample, "sample1") |>
  new_col(condition, "condition2") |>
  head()
  extra group ID  sample  condition
1   0.7     1  1 sample1 condition2
2  -1.6     1  2 sample1 condition2
3  -0.2     1  3 sample1 condition2
4  -1.2     1  4 sample1 condition2
5  -0.1     1  5 sample1 condition2
6   3.4     1  6 sample1 condition2
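
To see why this works, it can help to look at the deparse/substitute step on its own. The sketch below uses only base R; the helper show_name and the small data frame are hypothetical.

```r
# Hypothetical helper: returns the argument's expression as a string
show_name <- function(col_name) deparse(substitute(col_name))

# `sample` is never evaluated, so it does not matter that it is a function
nm <- show_name(sample)
print(nm)

# The resulting string can then be used with [[<- to create the column
df <- data.frame(x = 1:3)
df[[nm]] <- "sample1"
print(df)
```

substitute() captures the expression the caller typed and deparse() turns it into a string, so no tidy evaluation is involved.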

How to use quasiquotation / tidy evaluation when doing `map` with column names

I think this question can be decomposed into a section on quasi-quotation and another on map functions.

First, ~ starwars %>% count(.x) is shorthand for (and a slightly more complicated version of) function(.x) { starwars %>% count(.x) }. So I'm going to work with the functions directly.

Second, names(starwars) gives you a character vector.

So to avoid the confusion that map brings let's start with functions and pass them the character "eye_color".

Attempt 1: dplyr functions treat symbols as if they are columns in the tbl

dplyr functions are nice when doing interactive data analysis, because they allow us to refer to columns with symbols. I recommend reading:
https://dplyr.tidyverse.org/articles/programming.html for more info.

func <- function(.x) { starwars %>% count(.x) }
func("eye_color")
Error: Column `.x` is unknown

In your first attempt, this leads to a problem: .x is a symbol, so R thinks .x is a column in starwars.

Attempt 2/3: count() / group_by() expect symbols not character input.

!! takes .x and replaces it with "eye_color". But "eye_color" is not a symbol/name; it is a character string.

func_2 <- function(.x) { starwars %>% count(!!.x) }
func_2("eye_color")

# A tibble: 1 x 2
`"eye_color"` n
<chr> <int>
1 eye_color 87

This weird output is the result of grouping by a character constant. count() evaluates the string like any other expression, which creates a column holding "eye_color" in every row, so the whole data frame falls into a single group of 87 rows. starwars %>% count("hooray") gives similar output.
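
The same effect can be reproduced on any data frame. A minimal sketch with the built-in mtcars data (used here only for illustration, it is not part of the question):

```r
library(dplyr)

# Counting by a string constant yields a single group with all rows
res <- mtcars %>% count("hooray")
print(res)
```

The result has one row, and its n equals the number of rows in mtcars.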

Interlude: what we want is a symbol

A somewhat intuitive way to code dplyr functions is to pass symbols/names and use {{.x}} to evaluate the promise. (Less intuitively you can do !!enquo(.x).)

func_3 <- function(.x) {  starwars %>% count({{.x}}) }
func_3(eye_color)

# A tibble: 15 x 2
eye_color n
<chr> <int>
1 black 10
2 blue 19
3 ...

This works!

A solution is to convert the characters to symbols

func_4 <- function(.x) {
  .x = as.symbol(.x)
  starwars %>% count({{ .x }})
}
func_4("eye_color")

# A tibble: 15 x 2
eye_color n
<chr> <int>
1 black 10
2 blue 19
3 ...

This also works!

Bringing back map

Before I continue, I think nniloc's solution is better for your problem.

But you could use map as follows

starwars %>%
  select_if(negate(is.list)) %>%
  names() %>%
  map(function(.x) {
    x = as.symbol(.x)
    starwars %>% count({{ x }})
  })

or

starwars %>%
  select_if(negate(is.list)) %>%
  names() %>%
  map(as.symbol) %>%
  map(function(.x) {
    starwars %>% count({{ .x }})
  })

When you use the ~ notation, .x holds the symbol object itself, so !! can unquote it and inline the symbol directly into the count() call.

starwars %>%
  select_if(negate(is.list)) %>%
  names() %>%
  map(as.symbol) %>%
  map(~ starwars %>% count(!!.x))
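
One way to see what !! does in each case is to build the call with rlang::expr() and print it without running it. The variable names below are hypothetical, and starwars does not need to be loaded because nothing is evaluated.

```r
library(rlang)

x_chr <- "eye_color"         # a character string
x_sym <- as.symbol(x_chr)    # the corresponding symbol

# expr() performs the unquoting but does not evaluate the call
e_chr <- expr(count(starwars, !!x_chr))
e_sym <- expr(count(starwars, !!x_sym))

print(e_chr)  # the string is inlined as a constant
print(e_sym)  # the symbol is inlined as a column reference
```

Unquoting the string produces a call that counts by a constant, while unquoting the symbol produces the call you would have typed by hand.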

Regarding imap(), it looks like you want to code as you would in Python (or some other language with explicit iteration). imap() is shorthand for map2(.x, names(.x), ...) (or map2(.x, seq_along(.x), ...) if .x has no names), so it is not quite the same as enumerate() in Python. There are R functions like seq_along which give you the positions in an object, but I haven't used those with map.
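
A small sketch of how imap() behaves, using made-up vectors; the variable names are hypothetical:

```r
library(purrr)

# With names, .y is the name of each element
named <- c(a = 10, b = 20)
by_name <- imap_chr(named, ~ paste0(.y, "=", .x))
same    <- map2_chr(named, names(named), ~ paste0(.y, "=", .x))

# Without names, .y is the position of each element
unnamed <- c(10, 20)
by_pos <- imap_chr(unnamed, ~ paste0(.y, "=", .x))

print(by_name)
print(by_pos)
```

Only the unnamed case resembles enumerate(), and positions start at 1 rather than 0.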