dplyr filter with condition on multiple columns
A possible dplyr
(0.5.0.9004 <= version < 1.0) solution is:
# > packageVersion('dplyr')
# [1] ‘0.5.0.9004’
dataset %>%
filter(!is.na(father), !is.na(mother)) %>%
filter_at(vars(-father, -mother), all_vars(is.na(.)))
Explanation:
vars(-father, -mother)
: select all columns exceptfather
andmother
.all_vars(is.na(.))
: keep rows whereis.na
isTRUE
for all the selected columns.
note: any_vars
should be used instead of all_vars
if rows where is.na
is TRUE
for any column are to be kept.
Update (2020-11-28)
As the _at
functions and vars
have been superseded by the use of across
since dplyr 1.0, the following way (or similar) is recommended now:
dataset %>%
filter(across(c(father, mother), ~ !is.na(.x))) %>%
filter(across(c(-father, -mother), is.na))
See more example of across
and how to rewrite previous code with the new approach here: Colomn-wise operatons or type vignette("colwise")
in R after installing the latest version of dplyr
.
Filtering by multiple columns at once in `dplyr`
We could use if_all
or if_any
as Anil is pointing in his comments: For your code this would be:
https://www.tidyverse.org/blog/2021/02/dplyr-1-0-4-if-any/
if_any() and if_all()
"across() is very useful within summarise() and mutate(), but it’s hard to use it with filter() because it is not clear how the results would be combined into one logical vector. So to fill the gap, we’re introducing two new functions if_all() and if_any()."
if_all
data %>%
filter(if_all(starts_with("cp"), ~ . > 0.2))
mt100 cp001 cp002 cp003
<dbl> <dbl> <dbl> <dbl>
1 0.688 0.402 0.467 0.646
2 0.663 0.757 0.728 0.335
3 0.472 0.533 0.717 0.638
if_any:
data %>%
filter(if_any(starts_with("cp"), ~ . > 0.2))
mt100 cp001 cp002 cp003
<dbl> <dbl> <dbl> <dbl>
1 0.554 0.970 0.874 0.187
2 0.688 0.402 0.467 0.646
3 0.658 0.850 0.00813 0.542
4 0.663 0.757 0.728 0.335
5 0.472 0.533 0.717 0.638
Filter multiple columns based on same criteria in R
For your specific example of checking if at least one value in a row is at least 1, you could use rowSums
data[rowSums(data[,-1]) > 0, ]
# A B C D E
# 2 B 0 1 0 0
# 4 D 1 0 0 0
# 5 E 2 1 1 0
# 6 F 3 2 1 0
# 8 H 0 0 1 1
If you have other criteria in mind, you might as well consider using any
within apply
ind <- apply(data[,-1], 1, function(x) {any(x >= 1)})
data[ind,]
# A B C D E
# 2 B 0 1 0 0
# 4 D 1 0 0 0
# 5 E 2 1 1 0
# 6 F 3 2 1 0
# 8 H 0 0 1 1
dplyr filter multiple variables (columns) with multiple conditions
Another possible solution:
library(dplyr)
test %>%
filter(complete.cases(.) & if_all(everything(), ~ !(.x %in% 0:2)))
#> A B C
#> 1 6 5 6
#> 2 7 7 7
How to use dplyr across to filter NA in multiple columns
We can use across
to loop over the columns 'type', 'company' and return the rows that doesn't have any NA in the specified columns
library(dplyr)
df %>%
filter(across(c(type, company), ~ !is.na(.)))
# id type company
#1 3 North Alex
#2 NA North BDA
With filter
, there are two options that are similar to all_vars/any_vars
used with filter_at/filter_all
df %>%
filter(if_any(c(company, type), ~ !is.na(.)))
# id type company
#1 2 <NA> ADM
#2 3 North Alex
#3 4 South <NA>
#4 NA North BDA
#5 6 <NA> CA
Or using if_all
df %>%
filter(if_all(c(company, type), ~ !is.na(.)))
# id type company
#1 3 North Alex
#2 NA North BDA
data
df <- structure(list(id = c(1L, 2L, 3L, 4L, NA, 6L), type = c(NA, NA,
"North", "South", "North", NA), company = c(NA, "ADM", "Alex",
NA, "BDA", "CA")), class = "data.frame", row.names = c(NA, -6L
))
dplyr Filter Multiple numbers over multiple columns simultaneously
You can use the OR operator |
like this:
dataset %>%
filter(SurgicalProcedureNumber %in% NumbersIcareAbout |
OtherSurgicalProcedureNumber %in% NumbersIcareAbout |
YetAnotherSurgicalProcedureNumber %in% NumbersIcareAbout)
# PatientNum Age SurgicalProcedureNumber OtherSurgicalProcedureNumber BloodLoss YetAnotherSurgicalProcedureNumber
# 1 1 10 21356 54321 5 11111
# 2 2 21 21424 21356 4 22222
# 3 4 42 12345 12345 10 21356
# 4 5 35 54321 21424 5 555555
I'm sure there's a more elegant way, but this filters to those with matches in 2 or more columns (I think):
dataset %>%
filter((SurgicalProcedureNumber %in% NumbersIcareAbout +
OtherSurgicalProcedureNumber %in% NumbersIcareAbout +
YetAnotherSurgicalProcedureNumber %in% NumbersIcareAbout) >= 2)
# PatientNum Age SurgicalProcedureNumber OtherSurgicalProcedureNumber BloodLoss YetAnotherSurgicalProcedureNumber
# 1 2 21 21424 21356 4 22222
It works by summing together the logical vectors that result from your comparisons using %in%
. It relies on the fact that TRUE + TRUE
gives 2
. So, if a row has a value of 2 or greater then it has multiple matches.
Filtering multiple columns of data frame inside a loop in R
You don't need a loop you can use filter
with across
to apply a function for multiple columns
library(dplyr)
my_df %>% filter(across(all_of(col_names), ~!. %in% color_words))
# word1 word2 word3
#1 one apple red
If you have an older version of dplyr
, use filter_at
:
my_df %>% filter_at(col_names, all_vars(!. %in% color_words))
Logic for filtering dependent on two columns
A bit slower than the other answers here (more steps involved), but for me a bit clearer:
df4 %>%
filter(pop == 3) %>%
distinct(group) %>%
pull(group) -> groups
df4 %>%
filter(group %in% groups)
or if you want to combine the two steps:
df4 %>%
filter(group %in% df4 %>%
filter(pop == 3) %>%
distinct(group) %>%
pull(group))
Using dplyr to filter on multiple columns
Try negating the entire expression of what you don't want:
dplyr::filter(df, !(year == 2020 & quarter %in% 1:2))
measure year quarter
1 a 2017 1
2 a 2017 2
3 a 2017 3
4 a 2017 4
5 b 2018 1
6 b 2018 2
7 b 2018 3
8 b 2018 4
9 c 2019 1
10 c 2019 2
11 c 2019 3
12 c 2019 4
13 d 2020 3
14 d 2020 4
year == 2020 & quarter %in% 1:2
says to keep rows where the year
is 2020 AND quarter
is 1
or 2
. The !
negates the entire expression so you exclude those rows.
FYI, you can also use dplyr::between(quarter, 1, 2)
R dplyr filter string condition on multiple columns
You can use filter_at
with any_vars
to select rows that have at least one value of "X"
.
library(dplyr)
df %>% filter_at(vars(v2:v5), any_vars(. == 'X'))
# v1 v2 v3 v4 v5
#1 1 A B X C
#2 2 A B C X
However, filter_at
has been superseeded so to translate this into across
you can do :
df %>% filter(Reduce(`|`, across(v2:v5, ~. == 'X')))
It is also easier in base R :
df[rowSums(df[-1] == 'X') > 0, ]
Related Topics
R: Rjava Package Install Failing
Reorder Levels of a Factor Without Changing Order of Values
Warning Message: in '...': Invalid Factor Level, Na Generated
How Split Column of List-Values into Multiple Columns
Adding a New Column Based Upon Values in Another Column Using Dplyr
Divide All Columns by the Value from the 2Nd Column - Apply for All Rows
Calculate the Mean of Every 13 Rows in Data Frame
Add a Common Legend For Combined Ggplots
Force R Not to Use Exponential Notation (E.G. E+10)
How to Test When Condition Returns Numeric(0) in R
How to Add Row and Column to a Dataframe of Different Length
If Else Statements to Check If a String Contains a Substring in R
Case Statement Equivalent in R
Create Stacked Barplot Where Each Stack Is Scaled to Sum to 100%
Select the Top N Values by Group
How to Save for Loop Results in Data Frame Using Cbind