Use Filter in Dplyr Conditional on an If Statement in R


You could wrap the whole if/else in braces so that each branch calls filter() on the piped data:

library(dplyr)
y <- ""
data.frame(x = 1:5) %>%
  {if (y == "") filter(., x > 3) else filter(., x < 3)} %>%
  tail(1)

or put the if/else inside filter() itself, so that it only chooses the condition:

data.frame(x = 1:5) %>%
  filter(if (y == "") x > 3 else x < 3) %>%
  tail(1)

or even store part of your pipe as a reusable function, in the vein of

mypipe <- . %>% tail(1) %>% print
data.frame(x = 1:5) %>% mypipe
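
To see the pieces together, here is a minimal sketch that wraps the second variant in a function. It assumes dplyr is installed; keep_rows and last_row are hypothetical names chosen for illustration, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: pick the filter condition from a scalar flag `y`.
keep_rows <- function(data, y) {
  # `if` is evaluated once because `y` has length 1; either branch returns
  # a logical vector with one element per row, which filter() expects.
  data %>% filter(if (y == "") x > 3 else x < 3)
}

# A stored pipe ("functional sequence") that keeps only the last row.
last_row <- . %>% tail(1)

d <- data.frame(x = 1:5)
res_empty <- keep_rows(d, "")     # rows with x > 3
res_other <- keep_rows(d, "abc")  # rows with x < 3
last_x    <- last_row(res_empty)$x
```

Because the flag is a single value, a plain if/else is enough here; a vectorised ifelse() is only needed when the condition itself varies by row.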

R: IF statement in dplyr::filter requires ELSE otherwise fails?

We can return TRUE in the else branch. This keeps all rows when the condition is FALSE, regardless of the values in the column we are testing.

library(dplyr)
a <- NA
mtcars %>% filter(if(!is.na(a)) cyl == a else TRUE)

To answer your question: yes, if needs an else branch here. Without it, if returns NULL whenever the condition is FALSE, and filter then fails on that NULL. See this example:

num <- 2
a <- if(num > 1) 'yes'
a
#[1] "yes"
a <- if(num > 3) 'yes'
a
#NULL

Hence when you use

a <- NA
mtcars %>% filter(if(!is.na(a)) cyl == a)

What actually happens is

mtcars %>% filter(NULL)

which returns the same error message.
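
The same idea can be wrapped in a small function. This is only a sketch: filter_cyl is a hypothetical name, and it assumes the optional value is either NA or a single number.

```r
library(dplyr)

# `if` without `else` yields NULL when the condition is FALSE.
a <- NA
cond <- if (!is.na(a)) "not reached"
cond_is_null <- is.null(cond)

# Hypothetical wrapper: filter on cyl only when a value is supplied.
filter_cyl <- function(data, a = NA) {
  # The `else TRUE` branch is recycled to every row, so nothing is dropped.
  data %>% filter(if (!is.na(a)) cyl == a else TRUE)
}

all_rows <- filter_cyl(mtcars)     # a is NA, so every row is kept
six_cyl  <- filter_cyl(mtcars, 6)  # only rows with cyl == 6
```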

How to filter a grouped dataframe with a conditional statement using dplyr?

To count the number of unique values we can use n_distinct and filter the rows based on that.

library(dplyr)

df %>%
  group_by(country, year) %>%
  filter(if (n_distinct(version) == 2) version == 'versionA' else TRUE)

#   country   year version
#   <fct>    <dbl> <fct>
# 1 country1  2011 versionA
# 2 country2  2011 versionA
# 3 country3  2011 versionB
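
The input df is not shown, so here is a reproducible sketch with made-up data shaped to match the output above. The values are an assumption; only the filtering logic comes from the answer.

```r
library(dplyr)

# Hypothetical input: country1 and country2 have both versions,
# country3 has only versionB.
df <- data.frame(
  country = c("country1", "country1", "country2", "country2", "country3"),
  year    = 2011,
  version = c("versionA", "versionB", "versionA", "versionB", "versionB")
)

res <- df %>%
  group_by(country, year) %>%
  # Groups with two distinct versions keep only versionA;
  # groups with a single version are kept unchanged.
  filter(if (n_distinct(version) == 2) version == "versionA" else TRUE) %>%
  ungroup()
```

The if is evaluated once per group, which is why a scalar condition such as n_distinct(version) == 2 works inside a grouped filter.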

conditional filtering based on grouped data in R using dplyr

Here's another method that selects rows directly with arithmetic rather than %in%:

data %>% filter(col * sign((group < 3) - 0.5) > 0)
#> # A tibble: 76 x 3
#> group year col
#> <int> <int> <dbl>
#> 1 2 1985 2.20
#> 2 3 1986 -0.205
#> 3 4 1991 -2.10
#> 4 3 1994 -0.113
#> 5 2 1997 1.90
#> 6 1 2000 1.37
#> 7 3 2002 -0.805
#> 8 4 2003 -0.535
#> 9 1 2004 0.792
#> 10 3 2006 -1.28
#> # ... with 66 more rows
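
To unpack the arithmetic: (group < 3) is TRUE or FALSE, which R treats as 1 or 0, so subtracting 0.5 and taking sign() gives +1 for groups 1 and 2 and -1 for the rest. The filter therefore keeps positive col in the low groups and negative col in the others. The sketch below checks this against an explicit if_else(); the data is made up, since the original data is not shown.

```r
library(dplyr)

# Hypothetical data standing in for the original `data`.
data <- tibble(
  group = c(1L, 2L, 3L, 4L, 1L, 3L),
  col   = c(1.5, -0.4, -2.0, 0.7, -1.1, -0.3)
)

# Arithmetic version from the answer.
by_sign <- data %>% filter(col * sign((group < 3) - 0.5) > 0)

# The same rule written out explicitly, one condition per branch.
explicit <- data %>% filter(if_else(group < 3, col > 0, col < 0))
```

The explicit form is longer but arguably easier to read; both select the same rows for this data.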

R filter rows such that one column is conditional on two other columns


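Group by id and keep only the groups in which n1 equals 1 in at least one row and n2 equals 1 in at least one row:

df %>%
  group_by(id) %>%
  filter(any(n1 == 1), any(n2 == 1))
# A tibble: 6 x 3
# Groups:   id [3]
#   id        n1    n2
#   <chr>  <dbl> <dbl>
# 1 firm a     1     0
# 2 firm b     1     0
# 3 firm e     1     0
# 4 firm a     0     1
# 5 firm e     0     1
# 6 firm b     0     1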
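
For a version that runs on its own, here is a sketch with made-up data loosely modelled on the output above. Firms c and d are hypothetical additions that fail one of the two conditions, so they should be dropped.

```r
library(dplyr)

# Hypothetical input: firm c never has n2 == 1, firm d never has n1 == 1.
df <- tibble(
  id = c("firm a", "firm b", "firm c", "firm d", "firm e",
         "firm a", "firm b", "firm c", "firm d", "firm e"),
  n1 = c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0),
  n2 = c(0, 0, 0, 0, 0, 1, 1, 0, 1, 1)
)

res <- df %>%
  group_by(id) %>%
  # Both conditions are scalars per group, so a group is kept or dropped whole.
  filter(any(n1 == 1), any(n2 == 1)) %>%
  ungroup()
```

Separate arguments to filter() are combined with AND, so this is the same as filter(any(n1 == 1) & any(n2 == 1)).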

Conditional filtering using tidyverse

As @docendo-discimus pointed out in the comments, the following solutions work. I also used rlang::has_name() in place of "a" %in% names(.). Note that has_name() takes the object as its first argument and the name as its second.

This Q&A contains the original idea: Conditionally apply pipeline step depending on external value.

df1 %>%
  filter(if (rlang::has_name(., "a")) a == 1 else TRUE)
# A tibble: 2 x 2
#       a b
#   <int> <chr>
# 1     1 a
# 2     1 b

df2 %>%
  filter(if (rlang::has_name(., "a")) a == 1 else TRUE)
# A tibble: 4 x 1
#   b
#   <chr>
# 1 a
# 2 a
# 3 b
# 4 b

Or alternatively, by using {}:

df1 %>%
  {if (rlang::has_name(., "a")) filter(., a == 1L) else .}
# A tibble: 2 x 2
#       a b
#   <int> <chr>
# 1     1 a
# 2     1 b

df2 %>%
  {if (rlang::has_name(., "a")) filter(., a == 1L) else .}
# A tibble: 4 x 1
#   b
#   <chr>
# 1 a
# 2 a
# 3 b
# 4 b
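
The inputs are not shown, so here is a sketch with made-up df1 and df2 that match the shape of the printed results. filter_if_present is a hypothetical helper that does the same check outside the pipe; rlang is assumed to be available, as it is installed alongside dplyr.

```r
library(dplyr)

# Hypothetical inputs: df1 has column `a`, df2 does not.
df1 <- tibble(a = c(1L, 2L, 1L, 2L), b = c("a", "a", "b", "b"))
df2 <- tibble(b = c("a", "a", "b", "b"))

# Hypothetical helper: apply the filter only if column `a` exists.
filter_if_present <- function(data) {
  if (rlang::has_name(data, "a")) filter(data, a == 1L) else data
}

res1 <- filter_if_present(df1)  # filtered on a == 1
res2 <- filter_if_present(df2)  # returned unchanged
```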

if else with filter R

Your attempt was very close, but there appear to be some syntax issues. This should solve your problem:

library(tidyverse)

df1 <- data.frame(
  sample_id = c('SB024', '3666-01', '3666-01', '3666-02'),
  FAO = c(100, 50, 3, 5)
)

df1 %>%
  filter(ifelse(str_detect(sample_id, "3666"), FAO >= 4, FAO > 20))
#>   sample_id FAO
#> 1     SB024 100
#> 2   3666-01  50
#> 3   3666-02   5

df1 %>%
  filter(ifelse(str_detect(sample_id, "XXXX"), FAO >= 4, FAO > 20))
#>   sample_id FAO
#> 1     SB024 100
#> 2   3666-01  50

Created on 2021-11-05 by the reprex package (v2.0.1)
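
Here ifelse() is used rather than if because the test differs from row to row. As a cross-check, the sketch below writes the same rule with plain logical operators; is_3666 is a hypothetical helper column added only for readability.

```r
library(dplyr)
library(stringr)

df1 <- data.frame(
  sample_id = c("SB024", "3666-01", "3666-01", "3666-02"),
  FAO = c(100, 50, 3, 5)
)

# Vectorised choice of threshold, as in the answer above.
with_ifelse <- df1 %>%
  filter(ifelse(str_detect(sample_id, "3666"), FAO >= 4, FAO > 20))

# Same rule with logical operators: one threshold per branch.
with_logic <- df1 %>%
  mutate(is_3666 = str_detect(sample_id, "3666")) %>%
  filter((is_3666 & FAO >= 4) | (!is_3666 & FAO > 20)) %>%
  select(-is_3666)
```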


