Filter Data Frame by Character Column Name (In Dplyr)

Filter data frame by character column name (in dplyr)

Using rlang's injection paradigm

From the current dplyr documentation (emphasis by me):

dplyr used to offer twin versions of each verb suffixed with an underscore. These versions had standard evaluation (SE) semantics: rather than taking arguments by code, like NSE verbs, they took arguments by value. Their purpose was to make it possible to program with dplyr. However, dplyr now uses tidy evaluation semantics. NSE verbs still capture their arguments, but you can now unquote parts of these arguments. This offers full programmability with NSE verbs. Thus, the underscored versions are now superfluous.

So, essentially we need to perform two steps to be able to refer to the value "this" of the variable column inside dplyr::filter():

  1. We need to turn the variable column which is of type character into type symbol.

    Using base R this can be achieved by the function as.symbol()
    which is an alias for as.name(). The former is preferred by the
    tidyverse developers because it

    follows a more modern terminology (R types instead of S modes).

    Alternatively, the same can be achieved by rlang::sym() from the tidyverse.

  2. We need to inject the symbol from 1) into the dplyr::filter() expression.

    This is done by the so called injection operator !! which is basically syntactic
    sugar allowing to modify a piece of code before R evaluates it.

    (In earlier versions of dplyr (or the underlying rlang respectively) there used to be situations (incl. yours) where !! would collide with the single !, but this is not an issue anymore since !! gained the right operator precedence.)

Applied to your example:

library(dplyr)
df <- data.frame(this = c(1, 2, 2),
that = c(1, 1, 2))
column <- "this"

df %>% filter(!!as.symbol(column) == 1)
# this that
# 1 1 1

Using alternative solutions

Other ways to refer to the value "this" of the variable column inside dplyr::filter() that don't rely on rlang's injection paradigm include:

  • Via the tidyselection paradigm, i.e. dplyr::if_any()/dplyr::if_all() with tidyselect::all_of()

    df %>% filter(if_any(.cols = all_of(column),
    .fns = ~ .x == 1))
  • Via rlang's .data pronoun and base R's [[:

    df %>% filter(.data[[column]] == 1)
  • Via magrittr's . argument placeholder and base R's [[:

    df %>% filter(.[[column]] == 1)

Filter column names dataframe by dplyr

Since OP is tagged with sql-like; something similar exists in the data.table package:

library(data.table)
df[!names(df) %like% 'Yesterday']

Dplyr filter using dynamic column name and dynamic value

Following the tidy evaluation syntax, use:

df %>% filter(!!sym(col) %in% !!vals)

sym() converts your string to a symbol, which dplyr knows to evaluate.

Also df %>% filter(!!as.name(col) %in% !!vals) works as @A.Suliman points out.

Filter a data frame by part of name in column

stringr can help solve this too if you want to stick in the tidyverse world.

library(tidyverse)
df %>%
filter(str_detect(Name,'LE'))

Name x y
1 LE-3-C 0.632601346894576 0.573187971758856
2 LE-4-A -0.818879986489542 0.284050547258268

Filter rows of a dataframe based on a column when both column name and values to filter are in variables

We can convert to symbol and evaluate (!!)

dplyr::filter(D, !! rlang::sym(col) %in% vals)
# X
#1 x1
#2 x2

Or another option is filter_at

D %>%
filter_at(vars(col), any_vars(. %in% vals))

Filtering tables with a character variable as a column name in dplyr R

This works for me:

library(dplyr)

var <- sym("cyl")

mtcars %>%
filter(!!var > 6)

Filter using a vector that has the column names

If I am not mistaken you want to drop a row if all values in the selected columns are "correct".

columns <- c("b","c")
df[apply(df[,..columns], 1, function(x) any( x=="Incorrect")) ]

a b c
1: 1 Incorrect Correct
2: 2 Correct Incorrect
3: 4 Incorrect Correct
4: 5 Incorrect Correct
5: 6 Correct Incorrect
6: 8 Incorrect Correct

If df is a data.frame:

df[apply(df[,columns], 1, function(x) any( x=="Incorrect")),]

Filter dataframe when column name-value pairs are stored in a list?

We could use across in filter looping over the names of 'l', created the logical expression by subsetting the 'l' using the key from column name (cur_column()) and negate (!). Note that cur_column() works currently only with across and not if_all/if_any (dplyr -1.0.6 on R 4.1.0)

library(dplyr)
df %>%
filter(across(all_of(names(l)), ~ !. %in% l[[cur_column()]]))

-output

# A tibble: 94 x 3
# State name value
# <chr> <chr> <dbl>
# 1 Alaska UrbanPop 48
# 2 Alaska Rape 44.5
# 3 Arizona UrbanPop 80
# 4 Arizona Rape 31
# 5 Arkansas UrbanPop 50
# 6 Arkansas Rape 19.5
# 7 California UrbanPop 91
# 8 California Rape 40.6
# 9 Colorado UrbanPop 78
#10 Colorado Rape 38.7
# … with 84 more rows

We could make use of if_all if we can set an attribute

library(magrittr)
df %>%
mutate(across(all_of(names(l)), ~ set_attr(., 'cn', cur_column()))) %>%
filter(if_all(all_of(names(l)), ~ ! . %in% l[[attr(., 'cn')]]))

Or with imap/reduce

library(purrr)
df %>%
filter(imap(l, ~ !cur_data()[[.y]] %in% .x) %>%
reduce(`&`))

Or another option is anti_join

for(nm in names(l)) df <- anti_join(df, tibble(!! nm := l[[nm]]))


Related Topics



Leave a reply



Submit