Filter data frame by character column name (in dplyr)
Using rlang's injection paradigm
From the current dplyr documentation (emphasis by me):
dplyr used to offer twin versions of each verb suffixed with an underscore. These versions had standard evaluation (SE) semantics: rather than taking arguments by code, like NSE verbs, they took arguments by value. Their purpose was to make it possible to program with dplyr. However, dplyr now uses tidy evaluation semantics. NSE verbs still capture their arguments, but you can now unquote parts of these arguments. This offers full programmability with NSE verbs. Thus, the underscored versions are now superfluous.
So, essentially we need to perform two steps to be able to refer to the value "this"
of the variable column
inside dplyr::filter()
:
We need to turn the variable
column
which is of type character into typesymbol
.Using base R this can be achieved by the function
as.symbol()
which is an alias foras.name()
. The former is preferred by the
tidyverse developers because itfollows a more modern terminology (R types instead of S modes).
Alternatively, the same can be achieved by
rlang::sym()
from the tidyverse.We need to inject the symbol from 1) into the
dplyr::filter()
expression.This is done by the so called injection operator
!!
which is basically syntactic
sugar allowing to modify a piece of code before R evaluates it.(In earlier versions of
dplyr
(or the underlyingrlang
respectively) there used to be situations (incl. yours) where!!
would collide with the single!
, but this is not an issue anymore since!!
gained the right operator precedence.)
Applied to your example:
library(dplyr)
df <- data.frame(this = c(1, 2, 2),
that = c(1, 1, 2))
column <- "this"
df %>% filter(!!as.symbol(column) == 1)
# this that
# 1 1 1
Using alternative solutions
Other ways to refer to the value "this"
of the variable column
inside dplyr::filter()
that don't rely on rlang's injection paradigm include:
Via the tidyselection paradigm, i.e.
dplyr::if_any()
/dplyr::if_all()
withtidyselect::all_of()
df %>% filter(if_any(.cols = all_of(column),
.fns = ~ .x == 1))Via rlang's
.data
pronoun and base R's[[
:df %>% filter(.data[[column]] == 1)
Via magrittr's
.
argument placeholder and base R's[[
:df %>% filter(.[[column]] == 1)
Filter column names dataframe by dplyr
Since OP is tagged with sql-like
; something similar exists in the data.table
package:
library(data.table)
df[!names(df) %like% 'Yesterday']
Dplyr filter using dynamic column name and dynamic value
Following the tidy evaluation syntax, use:
df %>% filter(!!sym(col) %in% !!vals)
sym()
converts your string to a symbol, which dplyr knows to evaluate.
Also df %>% filter(!!as.name(col) %in% !!vals)
works as @A.Suliman points out.
Filter a data frame by part of name in column
stringr can help solve this too if you want to stick in the tidyverse world.
library(tidyverse)
df %>%
filter(str_detect(Name,'LE'))
Name x y
1 LE-3-C 0.632601346894576 0.573187971758856
2 LE-4-A -0.818879986489542 0.284050547258268
Filter rows of a dataframe based on a column when both column name and values to filter are in variables
We can convert to sym
bol and evaluate (!!
)
dplyr::filter(D, !! rlang::sym(col) %in% vals)
# X
#1 x1
#2 x2
Or another option is filter_at
D %>%
filter_at(vars(col), any_vars(. %in% vals))
Filtering tables with a character variable as a column name in dplyr R
This works for me:
library(dplyr)
var <- sym("cyl")
mtcars %>%
filter(!!var > 6)
Filter using a vector that has the column names
If I am not mistaken you want to drop a row if all values in the selected columns are "correct".
columns <- c("b","c")
df[apply(df[,..columns], 1, function(x) any( x=="Incorrect")) ]
a b c
1: 1 Incorrect Correct
2: 2 Correct Incorrect
3: 4 Incorrect Correct
4: 5 Incorrect Correct
5: 6 Correct Incorrect
6: 8 Incorrect Correct
If df
is a data.frame
:
df[apply(df[,columns], 1, function(x) any( x=="Incorrect")),]
Filter dataframe when column name-value pairs are stored in a list?
We could use across
in filter
looping over the names
of 'l', created the logical expression by subsetting the 'l' using the key from column name (cur_column()
) and negate (!
). Note that cur_column()
works currently only with across
and not if_all/if_any
(dplyr
-1.0.6
on R 4.1.0
)
library(dplyr)
df %>%
filter(across(all_of(names(l)), ~ !. %in% l[[cur_column()]]))
-output
# A tibble: 94 x 3
# State name value
# <chr> <chr> <dbl>
# 1 Alaska UrbanPop 48
# 2 Alaska Rape 44.5
# 3 Arizona UrbanPop 80
# 4 Arizona Rape 31
# 5 Arkansas UrbanPop 50
# 6 Arkansas Rape 19.5
# 7 California UrbanPop 91
# 8 California Rape 40.6
# 9 Colorado UrbanPop 78
#10 Colorado Rape 38.7
# … with 84 more rows
We could make use of if_all
if we can set an attribute
library(magrittr)
df %>%
mutate(across(all_of(names(l)), ~ set_attr(., 'cn', cur_column()))) %>%
filter(if_all(all_of(names(l)), ~ ! . %in% l[[attr(., 'cn')]]))
Or with imap/reduce
library(purrr)
df %>%
filter(imap(l, ~ !cur_data()[[.y]] %in% .x) %>%
reduce(`&`))
Or another option is anti_join
for(nm in names(l)) df <- anti_join(df, tibble(!! nm := l[[nm]]))
Related Topics
Reasons For Using the Set.Seed Function
A Comprehensive Survey of the Types of Things in R; 'Mode' and 'Class' and 'Typeof' Are Insufficient
Error in ≪My Code≫: Target of Assignment Expands to Non-Language Object
How to Send an Email With Attachment from R in Windows
Using Stat_Function and Facet_Wrap Together in Ggplot2 in R
Removing Empty Rows of a Data File in R
Adding a New Column to Each Element in a List of Tables or Data Frames
Selecting Only Numeric Columns from a Data Frame
Returning Multiple Objects in an R Function
Why Does X[Y] Join of Data.Tables Not Allow a Full Outer Join, or a Left Join
Fitting Several Regression Models With Dplyr
Replace X-Axis With Own Values
Putting Mathematical Symbols and Subscripts Mixed With Regular Letters
Select Multiple Columns in Data.Table by Their Numeric Indices