Tidyeval with list of column names in a function
You could pass your list of arguments using alist() instead of list(), as it won't evaluate the arguments.
my_summarise = function(df, group_var, sum_var) {
  group_var = quos(!!! group_var)
  sum_var = enquo(sum_var)
  df %>%
    group_by(!!! group_var) %>%
    summarise(!! quo_name(sum_var) := mean(!! sum_var))
}
my_summarise(df, alist(g1, g2), b)
# A tibble: 4 x 3
# Groups: g1 [?]
g1 g2 b
<dbl> <dbl> <dbl>
1 1 1 2.0
2 1 2 3.0
3 2 1 4.5
4 2 2 1.0
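To see why alist() helps here, a minimal base R sketch of the difference follows. It assumes no objects named g1 or g2 exist in the calling environment; the names quoted_args and evaluated are only illustrative.

```r
# alist() stores its arguments as unevaluated symbols
quoted_args <- alist(g1, g2)
is.symbol(quoted_args[[1]])      # TRUE
as.character(quoted_args[[2]])   # "g2"

# list() evaluates its arguments immediately, so it fails when g1 and g2
# only exist as columns of the data frame (assumed here)
evaluated <- tryCatch(list(g1, g2), error = function(e) "error")
evaluated                        # "error"
```

Because the symbols are never evaluated, quos(!!! group_var) inside the function can turn them into quosures that group_by() resolves against the data frame.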
Another alternative is to pass that argument directly with quos() instead of list(), as shown in this answer, which bypasses some of the complications altogether.
my_summarise = function(df, group_var, sum_var) {
  # group_var = quos(!!! group_var)  # not needed: group_var is already a list of quosures
  sum_var = enquo(sum_var)
  df %>%
    group_by(!!! group_var) %>%
    summarise(!! quo_name(sum_var) := mean(!! sum_var))
}
my_summarise(df, quos(g1, g2), b)
# A tibble: 4 x 3
# Groups: g1 [?]
g1 g2 b
<dbl> <dbl> <dbl>
1 1 1 2.0
2 1 2 3.0
3 2 1 4.5
4 2 2 1.0
tidy eval map over column names
Instead of using enquo(), either use the .data pronoun or convert the input to a symbol with ensym() and unquote it with !!:
emp_term_var <- function(data, colName, year = "2015") {
  # Terminations by year and variable in df
  colName <- ensym(colName)
  term_test <- data %>%
    filter(year(DateofTermination) == year) %>%
    # group_by(!!colName) %>%   # not needed: count() groups by the column itself
    count(!!colName) %>%
    clean_names()
  return(term_test)
}
NOTE: count() can take the column directly, without a separate group_by() step.
The advantage of the ensym() route is that it accepts both quoted and unquoted input, i.e. the column name can be passed as a string or as a bare name:
nm1 <- c("Department", "State")
purrr::map(nm1, ~ emp_term_var(df, colName = !!.x, year = "2015"))
Or, with a bare column name:
emp_term_var(data = df, colName = Department, year = "2015")
Or with a string:
emp_term_var(data = df, colName = "Department", year = "2015")
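As a small check of that claim, the sketch below isolates the ensym() step. The helper col_as_symbol() is hypothetical and exists only to return whatever ensym() captured; only rlang is assumed to be installed.

```r
library(rlang)

# Hypothetical helper: returns the symbol that ensym() captures
col_as_symbol <- function(col) ensym(col)

bare     <- col_as_symbol(Department)     # bare column name
quoted   <- col_as_symbol("Department")   # string
nm       <- "Department"
injected <- col_as_symbol(!!nm)           # string held in a variable, as in the map() call

identical(bare, quoted)    # TRUE
identical(bare, injected)  # TRUE
```

All three calls yield the same symbol, which is why the function body can unquote it with !! regardless of how the caller spelled the column.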
Using character object to indicate column name within R's glue function?
There are multiple ways you can do this:
- With the .data pronoun:
library(dplyr)
library(glue)
data <- mtcars %>% as_tibble(rownames = "Vehicle")
column_of_interest <- "mpg"
data %>%
  mutate(Label = glue("{Vehicle}: {value}", value = .data[[column_of_interest]])) %>%
  select(Label)
# Label
# <glue>
# 1 Mazda RX4: 21
# 2 Mazda RX4 Wag: 21
# 3 Datsun 710: 22.8
# 4 Hornet 4 Drive: 21.4
# 5 Hornet Sportabout: 18.7
# 6 Valiant: 18.1
# 7 Duster 360: 14.3
# 8 Merc 240D: 24.4
# 9 Merc 230: 22.8
#10 Merc 280: 19.2
# … with 22 more rows
- With get():
data %>%
  mutate(Label = glue("{Vehicle}: {value}", value = get(column_of_interest))) %>%
  select(Label)
- With sym() and !!:
data %>%
  mutate(Label = glue("{Vehicle}: {value}", value = !!sym(column_of_interest))) %>%
  select(Label)
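If the column name arrives as a function argument rather than a global variable, the .data approach carries over unchanged. The following is a sketch only: add_label() and its column parameter are made-up names, and dplyr and glue are assumed to be installed.

```r
library(dplyr)
library(glue)

# Hypothetical wrapper: `column` is a string naming the column to display
add_label <- function(data, column) {
  data %>%
    mutate(Label = glue("{Vehicle}: {value}", value = .data[[column]]))
}

data <- as_tibble(mtcars, rownames = "Vehicle")
labelled <- add_label(data, "mpg")
labelled$Label[1]
```

The string is looked up through .data[[column]], so the same function works for any column, for example add_label(data, "hp").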
How to use tidy evaluation with column name as strings?
We can also use ensym() with !!:
my_summarise <- function(df, group_var) {
  df %>%
    group_by(!!rlang::ensym(group_var)) %>%
    summarise(a = mean(a))
}
my_summarise(df, 'g1')
Another option is group_by_at():
my_summarise <- function(df, group_var) {
  df %>%
    group_by_at(vars(group_var)) %>%
    summarise(a = mean(a))
}
my_summarise(df, 'g1')
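Since dplyr 1.0.0 the scoped verbs such as group_by_at() are superseded in favour of across(). A minimal sketch of the same function in that style, assuming dplyr >= 1.0.0; the sample df below is made up for illustration.

```r
library(dplyr)

# group_var is a character vector of one or more column names
my_summarise <- function(df, group_var) {
  df %>%
    group_by(across(all_of(group_var))) %>%
    summarise(a = mean(a), .groups = "drop")
}

# Made-up data for illustration
df <- tibble(g1 = c(1, 1, 2), a = c(1, 3, 5))
res <- my_summarise(df, "g1")
res
```

all_of() makes it explicit that group_var holds column names as strings, and it errors if a named column is missing.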
Tidyeval: pass list of columns as quosure to select()
This is a bit tricky because of the mix of semantics involved in this problem. pmap() takes a list and passes each element as its own argument to a function (it's kind of equivalent to !!! in that sense). Your quoting function thus needs to quote its arguments and somehow pass a list of columns to pmap().
Our quoting function can go one of two ways. Either quote (i.e., delay) the list creation, or create an actual list of quoted expressions right away:
quoting_fn1 <- function(...) {
  exprs <- enquos(...)
  # For illustration purposes, return the quoted inputs instead of
  # doing something with them. Normally you'd call `mutate()` here:
  exprs
}
quoting_fn2 <- function(...) {
  expr <- quo(list(!!!enquos(...)))
  expr
}
Since our first variant does nothing but return a list of quoted inputs, it's actually equivalent to quos():
quoting_fn1(a, b)
#> <list_of<quosure>>
#>
#> [[1]]
#> <quosure>
#> expr: ^a
#> env: global
#>
#> [[2]]
#> <quosure>
#> expr: ^b
#> env: global
The second version returns a quoted expression that instructs R to create a list with quoted inputs:
quoting_fn2(a, b)
#> <quosure>
#> expr: ^list(^a, ^b)
#> env: 0x7fdb69d9bd20
There is a subtle but important difference between the two. The first version creates an actual list object:
exprs <- quoting_fn1(a, b)
typeof(exprs)
#> [1] "list"
On the other hand, the second version does not return a list, it returns an expression for creating a list:
expr <- quoting_fn2(a, b)
typeof(expr)
#> [1] "language"
Let's find out which version is more appropriate for interfacing with pmap(). But first we'll give a name to the pmapped function to make the code clearer and easier to experiment with:
myfunction <- function(..., word) {
  args <- list(...)
  # just to be clear this isn't what I actually want to do inside pmap
  args[[1]] + args[[2]]
}
Understanding how tidy eval works is hard in part because we usually don't get to observe the unquoting step. We'll use rlang::qq_show() to reveal the result of unquoting expr (the delayed list) and exprs (the actual list) with !!:
rlang::qq_show(
mutate(df, outcome = pmap_int(!!expr, myfunction))
)
#> mutate(df, outcome = pmap_int(^list(^a, ^b), myfunction))
rlang::qq_show(
mutate(df, outcome = pmap_int(!!exprs, myfunction))
)
#> mutate(df, outcome = pmap_int(<S3: quosures>, myfunction))
When we unquote the delayed list, mutate() calls pmap_int() with list(a, b), evaluated in the data frame, which is exactly what we need:
mutate(df, outcome = pmap_int(!!expr, myfunction))
#> # A tibble: 3 x 3
#> a b outcome
#> <int> <int> <int>
#> 1 1 101 102
#> 2 2 102 104
#> 3 3 103 106
On the other hand, if we unquote an actual list of quoted expressions, we get an error:
mutate(df, outcome = pmap_int(!!exprs, myfunction))
#> Error in mutate_impl(.data, dots) :
#> Evaluation error: Element 1 is not a vector (language).
That's because the quoted expressions inside the list are not evaluated in the data frame. In fact, they are not evaluated at all. pmap() gets the quoted expressions as is, which it doesn't understand. Recall what qq_show() has shown us:
#> mutate(df, outcome = pmap_int(<S3: quosures>, myfunction))
Anything inside angle brackets is passed as is. This is a sign that we should somehow have used !!! instead, to inline each element of the list of quosures in the surrounding expression. Let's try it:
rlang::qq_show(
mutate(df, outcome = pmap_int(!!!exprs, myfunction))
)
#> mutate(df, outcome = pmap_int(^a, ^b, myfunction))
Hmm... Doesn't look right. We're supposed to pass a list to pmap_int(), and here it gets each quoted input as a separate argument. Indeed we get a type error:
mutate(df, outcome = pmap_int(!!!exprs, myfunction))
#> Error in mutate_impl(.data, dots) :
#> Evaluation error: `.x` is not a list (integer).
That's easy to fix, just splice into a call to list():
rlang::qq_show(
mutate(df, outcome = pmap_int(list(!!!exprs), myfunction))
)
#> mutate(df, outcome = pmap_int(list(^a, ^b), myfunction))
And voilà!
mutate(df, outcome = pmap_int(list(!!!exprs), myfunction))
#> # A tibble: 3 x 3
#> a b outcome
#> <int> <int> <int>
#> 1 1 101 102
#> 2 2 102 104
#> 3 3 103 106
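Putting the pieces together, the working pattern can be wrapped in a quoting function. This is a sketch under assumptions: add_outcome() is a made-up name, the row-wise function simply sums its inputs, and df is rebuilt from the output printed above.

```r
library(dplyr)
library(purrr)
library(rlang)

# Data reconstructed from the printed output above
df <- tibble(a = 1:3, b = 101:103)

# Hypothetical wrapper: quote the columns, then splice them into list()
add_outcome <- function(data, ...) {
  cols <- enquos(...)
  data %>%
    mutate(outcome = pmap_int(list(!!!cols), function(...) sum(...)))
}

res <- add_outcome(df, a, b)
res$outcome
```

Splicing into list() means pmap_int() receives a real list of column vectors, evaluated in the data frame, rather than a list of unevaluated quosures.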
Best tidyverse practice for passing column names as variables in function
Since you are already using the curly-curly {{ operator, you can use it throughout your function so that every column argument is passed as a bare (unquoted) name:
myCalc <- function(data, dateIn, numIn, yearOut, numOut) {
  data <- data %>%
    mutate(
      {{ yearOut }} := lubridate::year({{ dateIn }}),
      {{ numOut }} := 10 * {{ numIn }}
    ) %>%
    filter({{ numOut }} > 250)
  return(data)
}
Your use of strings does work (e.g. .data[[dateIn]] evaluates to .data[["a"]] in your example). As mentioned in the comments by @r2evans, the difference really comes during the function call.
This function would be called like so (note the lack of quotes in the arguments):
dat2 <- myCalc(dat0,
dateIn = a,
numIn = b,
yearOut = c,
numOut = d)
You can read more about this with ?rlang::`nse-defuse` and ?rlang::`nse-force`. There is also this tidyverse article with more on the subject.
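To make the difference at the call site concrete, here is a sketch of the two styles side by side. Everything in it is illustrative: dat0 is made-up data, add_year() and add_year_str() are hypothetical helpers, and base format() stands in for lubridate::year() to keep the example dependency-free. The "{{ }}" name syntax on the left of := assumes rlang >= 0.4.3.

```r
library(dplyr)

# Made-up input data
dat0 <- tibble(a = as.Date(c("2020-01-15", "2021-06-30")), b = c(20, 30))

# Style 1: bare column names, forwarded with {{ }}
add_year <- function(data, date_in, year_out) {
  data %>%
    mutate("{{ year_out }}" := as.integer(format({{ date_in }}, "%Y")))
}

# Style 2: column names as strings, looked up with .data[[ ]]
add_year_str <- function(data, date_in, year_out) {
  data %>%
    mutate(!!year_out := as.integer(format(.data[[date_in]], "%Y")))
}

by_name   <- add_year(dat0, a, yr)            # no quotes at the call site
by_string <- add_year_str(dat0, "a", "yr")    # quotes at the call site
```

Both calls add the same yr column; the choice mostly depends on whether callers have column names as code or as data.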
Passing a column name for a new column in a function without tidyeval?
In base R, we can use deparse(substitute()):
new_col <- function(df, col_name, col_vals) {
  cn <- deparse(substitute(col_name))
  df[[cn]] <- col_vals
  df
}
Testing:
sleep |>
  new_col(sample, "sample1") |>
  new_col(condition, "condition2") |>
  head()
extra group ID sample condition
1 0.7 1 1 sample1 condition2
2 -1.6 1 2 sample1 condition2
3 -0.2 1 3 sample1 condition2
4 -1.2 1 4 sample1 condition2
5 -0.1 1 5 sample1 condition2
6 3.4 1 6 sample1 condition2
How to use quasiquotation / tidy evaluation when doing `map` with column names
I think this question can be decomposed into a section on quasi-quotation and another on map functions.
First, ~ starwars %>% count(.x) is shorthand for (and a slightly more complicated version of) function(.x) { starwars %>% count(.x) }. So I'm going to work with the functions directly.
Second, names(starwars) gives you a character vector.
So to avoid the confusion that map brings, let's start with functions and pass them the character string "eye_color".
Attempt 1: dplyr functions treat symbols as if they are columns in the tbl
dplyr functions are nice when doing interactive data analysis, because they allow us to refer to columns with symbols. I recommend reading https://dplyr.tidyverse.org/articles/programming.html for more info.
func <- function(.x) { starwars %>% count(.x) }
func("eye_color")
Error: Column `.x` is unknown
In your first attempt, this leads to a problem: because .x is a symbol, dplyr looks for a column named .x in starwars.
Attempt 2/3: count() / group_by() expect symbols, not character input
!! takes .x and replaces it with "eye_color". But "eye_color" is not a symbol/name; it is a character string.
func_2 <- function(.x) { starwars %>% count(!!.x) }
func_2("eye_color")
# A tibble: 1 x 2
`"eye_color"` n
<chr> <int>
1 eye_color 87
This odd output is the result of grouping by a character string. dplyr treats the string as a constant, recycles it into a column whose value is "eye_color" in every row, and so reports a single group of 87 rows. starwars %>% count("hooray") gives similar output.
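A quick way to convince yourself of this is to build the constant column by hand and compare. This is only a sketch; the column name label is arbitrary.

```r
library(dplyr)

# Counting by a string literal: the string becomes a constant column
implicit <- starwars %>% count("hooray")

# Spelling out the same thing: add the constant column, then count it
explicit <- starwars %>%
  mutate(label = "hooray") %>%
  count(label)

implicit$n  # 87
explicit$n  # 87
```

Both versions produce one row with n equal to the number of rows in starwars, which shows the string was never treated as a column name.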
Interlude: what we want is a symbol
A somewhat intuitive way to code dplyr functions is to pass symbols/names and use {{ .x }} to forward them to the dplyr verb. (Less intuitively, you can do !!enquo(.x).)
func_3 <- function(.x) { starwars %>% count({{.x}}) }
func_3(eye_color)
# A tibble: 15 x 2
eye_color n
<chr> <int>
1 black 10
2 blue 19
3 ...
This works!
A solution is to convert the characters to symbols
func_4 <- function(.x) {
  .x = as.symbol(.x)
  starwars %>% count({{ .x }})
}
func_4("eye_color")
# A tibble: 15 x 2
eye_color n
<chr> <int>
1 black 10
2 blue 19
3 ...
This also works!
Bringing back map
Before I continue, I think nniloc's solution is better for your problem.
But you could use map as follows
starwars %>%
  select_if(negate(is.list)) %>%
  names() %>%
  map(function(.x) {
    x = as.symbol(.x)
    starwars %>% count({{ x }})
  })
or
starwars %>%
  select_if(negate(is.list)) %>%
  names() %>%
  map(as.symbol) %>%
  map(function(.x) {
    starwars %>% count({{ .x }})
  })
When you use the ~ notation, .x holds the symbol itself, so we can use !! to inject that symbol straight into count().
starwars %>%
select_if(negate(is.list)) %>%
names() %>%
map(as.symbol) %>%
map(~ starwars %>% count( !! .x ))
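If you would rather keep the column names as strings and skip the conversion to symbols, the .data pronoun is another option. A minimal sketch, assuming dplyr and purrr are loaded; the two column names are picked arbitrarily.

```r
library(dplyr)
library(purrr)

# Column names stay as plain strings
cols <- c("eye_color", "gender")

# .data[[.x]] looks up the column whose name is stored in .x
counts <- map(cols, ~ starwars %>% count(.data[[.x]]))

length(counts)
counts[[1]]
```

Each element of counts is the frequency table for one column, so no as.symbol() or !! step is needed.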
Regarding imap(), it looks like you want to write this the way you would in Python (or some other language with index-based iteration). imap(.x, ...) is shorthand for map2(.x, names(.x), ...) when .x has names, and for map2(.x, seq_along(.x), ...) when it does not, so it is not quite the same as enumerate() in Python. There are R functions like seq_along() which give you positions in an object, but I haven't used those with map().
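A short sketch of how imap() picks its second argument, assuming purrr is installed; the vectors are made up for illustration.

```r
library(purrr)

# Named input: .y is the element name
named <- c(a = "foo", b = "bar")
by_name <- imap_chr(named, ~ paste(.y, .x))

# Unnamed input: .y is the position, as with seq_along()
unnamed <- c("foo", "bar")
by_position <- imap_chr(unnamed, ~ paste(.y, .x))

unname(by_name)   # "a foo" "b bar"
by_position       # "1 foo" "2 bar"
```

So imap() only behaves like an index-based loop when the input has no names; a character vector such as names(starwars) is unnamed, so .y would be the position there.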