Using tidy eval for multiple, arbitrary filter conditions
Create a list of calls and splice them in:
library(dplyr)
library(gapminder)
cols <- list("country", "year")
ops <- list("%in%", ">=")
vals <- list(c("Albania", "France"), 2007)
# Assumes LHS is the name of a variable and OP is
# the name of a function
op_call <- function(op, lhs, rhs) {
call(op, sym(lhs), rhs)
}
my_filter <- function(data, cols, ops, vals) {
exprs <- purrr::pmap(list(ops, cols, vals), op_call)
data %>% dplyr::filter(!!!exprs)
}
gapminder %>% my_filter(cols, ops, vals)
#> # A tibble: 2 × 6
#> country continent year lifeExp pop gdpPercap
#> <fct> <fct> <int> <dbl> <int> <dbl>
#> 1 Albania Europe 2007 76.4 3600523 5937.
#> 2 France Europe 2007 80.7 61083916 30470.
Here we don't have to worry about scoping issues because (a) the column names are assumed to be defined in the data mask, (b) the values are passed by value and inlined in the created calls, and (c) the functions are assumed to be binary operators and these are rarely redefined.
To allow custom user functions, there are two ways we could go about it. First, we could take an environment and create quosures manually with new_quosure()
:
op_call <- function(op, lhs, rhs, env = caller_env()) {
new_quosure(call(op, sym(lhs), rhs), env)
}
my_filter <- function(data, cols, ops, vals, env = caller_env()) {
exprs <- purrr::pmap(list(ops, cols, vals), op_call, env)
data %>% dplyr::filter(!!!exprs)
}
gapminder %>% my_filter(cols, ops, vals)
local({
my_op <- `%in%`
gapminder %>% my_filter(cols, list("my_op", ">="), vals)
})
#> # A tibble: 2 × 6
#> country continent year lifeExp pop gdpPercap
#> <fct> <fct> <int> <dbl> <int> <dbl>
#> 1 Albania Europe 2007 76.4 3600523 5937.
#> 2 France Europe 2007 80.7 61083916 30470.
The other way, perhaps simpler, is to allow the call to contain inlined functions. To that end, use rlang::call2()
instead of base::call()
:
op_call <- function(op, lhs, rhs) {
call2(op, sym(lhs), rhs)
}
my_filter <- function(data, cols, ops, vals) {
exprs <- purrr::pmap(list(ops, cols, vals), op_call)
data %>% dplyr::filter(!!!exprs)
}
local({
my_op <- `%in%`
gapminder %>% my_filter(cols, list(my_op, ">="), vals)
})
#> # A tibble: 2 × 6
#> country continent year lifeExp pop gdpPercap
#> <fct> <fct> <int> <dbl> <int> <dbl>
#> 1 Albania Europe 2007 76.4 3600523 5937.
#> 2 France Europe 2007 80.7 61083916 30470.
The downside of inlining functions is that this will prevent optimisations and transportability to other dplyr backends.
Using tidy eval for multiple dplyr filter conditions
Using the tidyverse, you could re-write that function as
library(dplyr)
library(purrr) # for map2()
my_filter <- function(df, cols, conds){
fp <- map2(cols, conds, function(x, y) quo((!!(as.name(x))) %in% !!y))
filter(df, !!!fp)
}
my_filter(gapminder::gapminder, cols = list("continent", "country"),
conds = list("Europe", c("Albania", "France")))
This is calling the equivalent of
filter(gapminder, continent %in% "Europe", country %in% c("Albania", "France"))
The main reason this works is that you can pass multiple arguments to filter()
and they are implicitly combined with &
. And map2()
is just a tidyverse equivalent for mapply
with two objects to iterate.
What's the tidyeval approach of using dplyr::filter with a database when condition is specified as string?
Use rlang::parse_expr
.
library(dplyr)
iris_db %>% filter(!!rlang::parse_expr(filter_str)) %>% collect() %>% count()
# n
# <int>
#1 50
It returns the query:
iris_db %>% filter(!!rlang::parse_expr(filter_str)) %>% show_query()
#<SQL>
#SELECT *
#FROM `iris`
#WHERE (`Species` = 'setosa')
Using tidy evaluation to filter in my own function
You can use deparse
+ substitute
to change unquoted arguments to quoted ones.
library(dplyr)
test <- function(df, condition) {
val <- deparse(substitute(condition))
df %>% filter(supp %in% val)
}
test(ToothGrowth, VC)
Dplyr Multiple Lags Tidy Eval?
From this blog post: multiple lags with tidy evaluation by Romain François
library(rlang)
library(tidyverse)
a <- as_tibble(c(1:100))
n_lags <- 3
lags <- function(var, n = 3) {
var <- enquo(var)
indices <- seq_len(n)
# create a list of quosures by looping over `indices`
# then give them names for `mutate` to use later
map(indices, ~ quo(lag(!!var, !!.x))) %>%
set_names(sprintf("L_%02d.%s", indices, "y"))
}
# unquote the list of quosures so that they are evaluated by `mutate`
a %>%
mutate_at(vars(value), funs(!!!lags(value, n_lags)))
#> # A tibble: 100 x 4
#> value L_01.y L_02.y L_03.y
#> <int> <int> <int> <int>
#> 1 1 NA NA NA
#> 2 2 1 NA NA
#> 3 3 2 1 NA
#> 4 4 3 2 1
#> 5 5 4 3 2
#> 6 6 5 4 3
#> 7 7 6 5 4
#> 8 8 7 6 5
#> 9 9 8 7 6
#> 10 10 9 8 7
#> # ... with 90 more rows
Created on 2019-02-15 by the reprex package (v0.2.1.9000)
How to filter a data frame programmatically with dplyr and tidy evaluation?
Try
filter_starwars <- function(...) {
F <- quos(...)
filter(starwars, !!!F)
}
filter_starwars(species == 'Human', homeworld %in% c('Tatooine', 'Alderaan'), height > 175)
# # A tibble: 7 × 13
# name height mass hair_color skin_color eye_color birth_year
# <chr> <int> <dbl> <chr> <chr> <chr> <dbl>
# 1 Darth Vader 202 136 none white yellow 41.9
# 2 Owen Lars 178 120 brown, grey light blue 52.0
# 3 Biggs Darklighter 183 84 black light brown 24.0
# 4 Anakin Skywalker 188 84 blond fair blue 41.9
# 5 Cliegg Lars 183 NA brown fair blue 82.0
# 6 Bail Prestor Organa 191 NA black tan brown 67.0
# 7 Raymus Antilles 188 79 brown light brown NA
# # ... with 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
# # films <list>, vehicles <list>, starships <list>
See https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html. Briefly, quos
captures ...
as a list, without evaluating the arguments. !!!
splices and unquotes the arguments for evaluation in filter()
.
Join and group_by tidy eval issue
The issue is using str
which gets the structure of an object. Assuming that colName
is passed as a string, we don't need any wrapping. Inside the function it is converted to symbol with ensym
. So, either get the input (assume it is a string) before converting to symbol as a different object or make use of as_string
from rlang
emp_turnover_fun <- function(data, colName, year = "2015") {
# Convert colName to symbol or check if symbol
colName <- ensym(colName)
colName_str <- rlang::as_string(colName) ## converted to string
# Terminations by year and variable in df
term_test <- data %>%
filter(year(DateofTermination) == year) %>%
count(!!(colName)) %>%
clean_names()
# Start employees by var and year
fun_year_job <- paste(year, "-01-01", sep = "")
start_test <- data %>%
select(DateofHire, DateofTermination, !!(colName)) %>%
filter(
DateofHire <= fun_year_job,
DateofTermination > fun_year_job | is.na(DateofTermination)
) %>%
count(!!(colName))
# End employees by year and var
year_pos <- year %>% as.character()
year_num_plus_pos <- as.character(as.numeric(year_pos) + 1)
fun_year2_pos <- paste(year_num_plus_pos, "-01-01", sep = "")
end_test <- data %>%
select(DateofHire, DateofTermination, !!(colName)) %>%
filter(
DateofHire <= fun_year2_pos,
DateofTermination > fun_year2_pos | is.na(DateofTermination)
) %>%
count(!!(colName))
join_turnover_year <- full_join(start_test, end_test,
by = colName_str) %>% # use the string
full_join(y = term_test, by = colName_str) %>% # use the string
setNames(c(colName_str, "Start_Headcount", "End_Headcount",
"Terminations")) %>% # here as well
group_by({{colName}}) %>%
summarise(Turnover = ((Terminations) / (Start_Headcount + End_Headcount)) * 100)
return(join_turnover_year)
}
It is safer to do as_string
as opposed to taking the input directly as string i.e. ensym
can work with both unquoted or quoted values, thus if we are passing unquoted, then grabbing the input doesn't work i.e. it may need deparse(substitute(colName))
. Instead, first convert to symbol and then do the conversion back to string with as_string
How to pass an expression to the filter() verb the tidy way?
We can select
the columns of interest and extract the components with ..1
, ..2
(or .x
, .y
- if there are only 2 columns)
library(dplyr)
library(tibble)
library(purrr)
tibble(pick_these = get_these %>%
map(rlang::parse_expr)) %>%
mutate(wat = names(get_these),
goods = list(diamonds)) %>%
mutate(goods = pmap(select(., goods, pick_these), ~ {
..1 %>%
filter(rlang::eval_tidy( ..2))
})) %>%
dplyr::select(goods, wat)
# A tibble: 2 x 2
# goods wat
# <list> <chr>
#1 <tibble [3,903 × 10]> Ideal E
#2 <tibble [307 × 10]> Good J
dplyr filter keep the NAs AND OR conditions
Use |
instead of &
. With filter
, multiple expressions separated by ,
are taken as &
. It is not possible to have a value that is both NA
and not equal to 182
library(dplyr)
test_data_1 %>%
filter(is.na(Art) | Art != 182)
-output
# A tibble: 8 × 1
Art
<dbl>
1 188
2 NA
3 NA
4 140
5 NA
6 NA
7 NA
8 NA
The second part of the question is with %in%
. We can use |
again
test_data_1 %>%
filter(Art %in% c(140,188) | is.na(Art))
# A tibble: 8 × 1
Art
<dbl>
1 188
2 NA
3 NA
4 140
5 NA
6 NA
7 NA
8 NA
NOTE: By default, filter
removes the NA
elements. In addition, there is no na.rm
argument in filter
Related Topics
Placement of Error Bars in Barplot Using Ggplot2
How to Substitute Symbols in a Language Object
Map Array of Strings to an Array of Integers
Install R Packages in Azure Ml
Place 1 Heatmap on Another with Transparency in R
Data.Frames in R: Name Autocompletion
Change Date Print Format from Yyyy-Mm-Dd to Dd-Mm-Yyyy
Efficient Method to Filter and Add Based on Certain Conditions (3 Conditions in This Case)
Replacing for Loop with Foreach Loop
How to Highlight Area Between Two Lines? Ggplot
Shiny Leaflet Easyprint Plugin
Data.Table Joins - Select All Columns in the I Argument
How to Color Bar Plots When Using ..Prop.. in Ggplot
Align Points and Error Bars in Ggplot When Using 'Jitterdodge'
Complete Time Series by Group in R