How to Filter Multiple Columns With Same Condition in R

dplyr filter with condition on multiple columns

A possible dplyr(0.5.0.9004 <= version < 1.0) solution is:

# > packageVersion('dplyr')
# [1] ‘0.5.0.9004’

dataset %>%
filter(!is.na(father), !is.na(mother)) %>%
filter_at(vars(-father, -mother), all_vars(is.na(.)))

Explanation:

  • vars(-father, -mother): select all columns except father and mother.
  • all_vars(is.na(.)): keep rows where is.na is TRUE for all the selected columns.

note: any_vars should be used instead of all_vars if rows where is.na is TRUE for any column are to be kept.


Update (2020-11-28)

As the _at functions and vars have been superseded by the use of across since dplyr 1.0, the following way (or similar) is recommended now:

dataset %>%
filter(across(c(father, mother), ~ !is.na(.x))) %>%
filter(across(c(-father, -mother), is.na))

See more example of across and how to rewrite previous code with the new approach here: Colomn-wise operatons or type vignette("colwise") in R after installing the latest version of dplyr.

Filtering by multiple columns at once in `dplyr`

We could use if_all or if_any as Anil is pointing in his comments: For your code this would be:

https://www.tidyverse.org/blog/2021/02/dplyr-1-0-4-if-any/

if_any() and if_all()

"across() is very useful within summarise() and mutate(), but it’s hard to use it with filter() because it is not clear how the results would be combined into one logical vector. So to fill the gap, we’re introducing two new functions if_all() and if_any()."

if_all

data %>% 
filter(if_all(starts_with("cp"), ~ . > 0.2))
  mt100 cp001 cp002 cp003
<dbl> <dbl> <dbl> <dbl>
1 0.688 0.402 0.467 0.646
2 0.663 0.757 0.728 0.335
3 0.472 0.533 0.717 0.638

if_any:

data %>% 
filter(if_any(starts_with("cp"), ~ . > 0.2))
  mt100 cp001   cp002 cp003
<dbl> <dbl> <dbl> <dbl>
1 0.554 0.970 0.874 0.187
2 0.688 0.402 0.467 0.646
3 0.658 0.850 0.00813 0.542
4 0.663 0.757 0.728 0.335
5 0.472 0.533 0.717 0.638

Filter multiple columns based on same criteria in R

For your specific example of checking if at least one value in a row is at least 1, you could use rowSums

data[rowSums(data[,-1]) > 0, ]
# A B C D E
# 2 B 0 1 0 0
# 4 D 1 0 0 0
# 5 E 2 1 1 0
# 6 F 3 2 1 0
# 8 H 0 0 1 1

If you have other criteria in mind, you might as well consider using any within apply

ind <- apply(data[,-1], 1, function(x) {any(x >= 1)})
data[ind,]
# A B C D E
# 2 B 0 1 0 0
# 4 D 1 0 0 0
# 5 E 2 1 1 0
# 6 F 3 2 1 0
# 8 H 0 0 1 1

dplyr filter multiple variables (columns) with multiple conditions

Another possible solution:

library(dplyr)

test %>%
filter(complete.cases(.) & if_all(everything(), ~ !(.x %in% 0:2)))

#> A B C
#> 1 6 5 6
#> 2 7 7 7

How to use dplyr across to filter NA in multiple columns

We can use across to loop over the columns 'type', 'company' and return the rows that doesn't have any NA in the specified columns

library(dplyr)
df %>%
filter(across(c(type, company), ~ !is.na(.)))
# id type company
#1 3 North Alex
#2 NA North BDA

With filter, there are two options that are similar to all_vars/any_vars used with filter_at/filter_all

df %>%
filter(if_any(c(company, type), ~ !is.na(.)))
# id type company
#1 2 <NA> ADM
#2 3 North Alex
#3 4 South <NA>
#4 NA North BDA
#5 6 <NA> CA

Or using if_all

df %>%
filter(if_all(c(company, type), ~ !is.na(.)))
# id type company
#1 3 North Alex
#2 NA North BDA

data

df <- structure(list(id = c(1L, 2L, 3L, 4L, NA, 6L), type = c(NA, NA, 
"North", "South", "North", NA), company = c(NA, "ADM", "Alex",
NA, "BDA", "CA")), class = "data.frame", row.names = c(NA, -6L
))

dplyr Filter Multiple numbers over multiple columns simultaneously

You can use the OR operator | like this:

dataset %>%
filter(SurgicalProcedureNumber %in% NumbersIcareAbout |
OtherSurgicalProcedureNumber %in% NumbersIcareAbout |
YetAnotherSurgicalProcedureNumber %in% NumbersIcareAbout)

# PatientNum Age SurgicalProcedureNumber OtherSurgicalProcedureNumber BloodLoss YetAnotherSurgicalProcedureNumber
# 1 1 10 21356 54321 5 11111
# 2 2 21 21424 21356 4 22222
# 3 4 42 12345 12345 10 21356
# 4 5 35 54321 21424 5 555555

I'm sure there's a more elegant way, but this filters to those with matches in 2 or more columns (I think):

dataset %>%
filter((SurgicalProcedureNumber %in% NumbersIcareAbout +
OtherSurgicalProcedureNumber %in% NumbersIcareAbout +
YetAnotherSurgicalProcedureNumber %in% NumbersIcareAbout) >= 2)

# PatientNum Age SurgicalProcedureNumber OtherSurgicalProcedureNumber BloodLoss YetAnotherSurgicalProcedureNumber
# 1 2 21 21424 21356 4 22222

It works by summing together the logical vectors that result from your comparisons using %in%. It relies on the fact that TRUE + TRUE gives 2. So, if a row has a value of 2 or greater then it has multiple matches.

Filtering multiple columns of data frame inside a loop in R

You don't need a loop you can use filter with across to apply a function for multiple columns

library(dplyr)
my_df %>% filter(across(all_of(col_names), ~!. %in% color_words))

# word1 word2 word3
#1 one apple red

If you have an older version of dplyr, use filter_at :

my_df %>% filter_at(col_names, all_vars(!. %in% color_words))

Logic for filtering dependent on two columns

A bit slower than the other answers here (more steps involved), but for me a bit clearer:

df4 %>% 
filter(pop == 3) %>%
distinct(group) %>%
pull(group) -> groups

df4 %>%
filter(group %in% groups)

or if you want to combine the two steps:

df4 %>% 
filter(group %in% df4 %>%
filter(pop == 3) %>%
distinct(group) %>%
pull(group))

Using dplyr to filter on multiple columns

Try negating the entire expression of what you don't want:

dplyr::filter(df, !(year == 2020 & quarter %in% 1:2))

measure year quarter
1 a 2017 1
2 a 2017 2
3 a 2017 3
4 a 2017 4
5 b 2018 1
6 b 2018 2
7 b 2018 3
8 b 2018 4
9 c 2019 1
10 c 2019 2
11 c 2019 3
12 c 2019 4
13 d 2020 3
14 d 2020 4

year == 2020 & quarter %in% 1:2 says to keep rows where the year is 2020 AND quarter is 1 or 2. The ! negates the entire expression so you exclude those rows.

FYI, you can also use dplyr::between(quarter, 1, 2)

R dplyr filter string condition on multiple columns

You can use filter_at with any_vars to select rows that have at least one value of "X".

library(dplyr)
df %>% filter_at(vars(v2:v5), any_vars(. == 'X'))

# v1 v2 v3 v4 v5
#1 1 A B X C
#2 2 A B C X

However, filter_at has been superseeded so to translate this into across you can do :

df %>% filter(Reduce(`|`, across(v2:v5, ~. == 'X')))

It is also easier in base R :

df[rowSums(df[-1] == 'X') > 0, ]


Related Topics



Leave a reply



Submit