Filter Groups in Dplyr That Exclusively Contain Specific Combinations of Values

R dplyr: filtering dataframe by combination of values

I think this is what you want - we group by (look at unique combinations of) Site, Spot, Transect, and Date, and then keep the whole group if ranks 1:3 are all present, and otherwise discard the whole group.

df %>%
  group_by(Site_ID, Spot_Nr, Transkt_Nr, Date) %>%
  filter(all(1:3 %in% rank))
# # A tibble: 132 × 13
# # Groups:   Site_ID, Spot_Nr, Transkt_Nr, Date [44]
# Site_ID Spot_Nr Transkt_Nr Point_Nr  nobs  rank Tile      Date id    Point_ID   RED SWIR1 PdKeyT
# <chr>   <chr>   <chr>      <chr>    <int> <int> <chr>    <int> <chr> <chr>    <dbl> <dbl>  <int>
# 1 A       1       1          14          24     1 1008  20190531 14    1014      2221   730     60
# 2 A       1       1          15          23     2 1008  20190531 15    1015      2252   671     60
# 3 A       1       1          13           4     3 1008  20190531 13    1013      2212   970     60
# 4 A       1       1          14          24     1 1008  20191008 14    1014       864   978      2
# 5 A       1       1          15          23     2 1008  20191008 15    1015      1421  1378      2
# 6 A       1       1          13           4     3 1008  20191008 13    1013      1097  1132      2
# 7 A       1       1          14          24     1 1008  20191026 14    1014       799  1044      2
# 8 A       1       1          15          23     2 1008  20191026 15    1015      1252  1127      2
# 9 A       1       1          13           4     3 1008  20191026 13    1013       978   904      2
#10 A       1       2          15          26     1 1008  20191008 33    1033      1174  1243      2
# # … with 122 more rows

It's hard to know how you might want to generalize this. What I've shown is good for checking that all of a particular set of rank values are present. Alternatively, you could use a test like n_distinct(rank) >= 3 if you wanted to keep a group whenever it had at least 3 distinct ranks.
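To make the difference between the two tests concrete, here is a minimal sketch on made-up data. The tibble `toy` and its column `grp` are hypothetical and stand in for the real grouping columns.

```r
library(dplyr)

# Hypothetical toy data: group "a" has ranks 1, 2, 3; group "b" has ranks
# 1, 2, 4; group "c" has only ranks 1 and 2.
toy <- tibble(
  grp  = c("a", "a", "a", "b", "b", "b", "c", "c"),
  rank = c(1L, 2L, 3L, 1L, 2L, 4L, 1L, 2L)
)

# Keep groups that contain every one of the ranks 1, 2 and 3.
has_ranks_1_to_3 <- toy %>%
  group_by(grp) %>%
  filter(all(1:3 %in% rank)) %>%
  ungroup()

# Keep groups with at least three distinct ranks, whatever they are.
has_three_ranks <- toy %>%
  group_by(grp) %>%
  filter(n_distinct(rank) >= 3) %>%
  ungroup()
```

Only group "a" passes the first test, while both "a" and "b" pass the second, so the choice depends on whether the specific rank values matter.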

Filtering within dplyr group_by so that combination of rows matching certain conditions remain

Use any:

library(dplyr)
df %>% group_by(address, zip_code) %>% filter(any(mailout) && any(!mailout))

#  address        zip_code date     mailout
#  <chr>             <int> <chr>    <lgl>  
#1 Higgens Square    62561 02/12/10 FALSE  
#2 Higgens Square    62561 28/03/13 TRUE   
#3 55 The Wren       91234 23/08/18 TRUE   
#4 55 The Wren       91234 19/09/13 FALSE  
#5 9A Sylvan Road    54332 16/11/10 TRUE   
#6 9A Sylvan Road    54332 31/01/17 FALSE

Or use all, so that the condition yields a single TRUE or FALSE per group, which decides whether the whole group is kept:

df %>% group_by(address, zip_code) %>% filter(all(c(TRUE, FALSE) %in%  mailout))
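As a quick check that the two conditions select the same groups, here is a small sketch on invented data. The tibble `toy` and its addresses are hypothetical, and the zip_code column is left out for brevity.

```r
library(dplyr)

# Hypothetical data: "1 Main St" has both TRUE and FALSE, "2 Oak Ave" only TRUE.
toy <- tibble(
  address = c("1 Main St", "1 Main St", "2 Oak Ave", "2 Oak Ave"),
  mailout = c(TRUE, FALSE, TRUE, TRUE)
)

# Version with any(): at least one TRUE and at least one FALSE in the group.
with_any <- toy %>%
  group_by(address) %>%
  filter(any(mailout) && any(!mailout)) %>%
  ungroup()

# Version with all(): both TRUE and FALSE occur among the group's values.
with_all <- toy %>%
  group_by(address) %>%
  filter(all(c(TRUE, FALSE) %in% mailout)) %>%
  ungroup()
```

Both versions keep only the two "1 Main St" rows.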

Filtering according to combination of matching data across variables in R

The code below labels rows that contain the same pair of words, in either order, with a common Group; you can then filter by the Group column:

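df <- as.data.frame(df)

# Build an order-independent key for each row by sorting its two words
df$v <- sapply(seq_len(nrow(df)), function(x)
         paste(sort(c(df[x, 1], df[x, 2])), collapse = ""))
# Map each unique key to a group label
l <- data.frame(v = unique(df$v),
            Group = paste0("Group", seq_along(unique(df$v))))
# Attach the labels and drop the helper key column
df <- merge(df, l, by = "v")[, -1]

df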

  Word1 Word2 distance speaker session  Group
1 WordA WordX     1.40      JB       1 Group1
2 WordX WordA     0.23      JB       1 Group1
3 WordB WordY     2.10      JB       1 Group2
4 WordY WordB     2.30      JB       1 Group2
5 WordC WordZ     4.70      JB       1 Group3
6 WordZ WordC     0.51      JB       1 Group3
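The same idea can probably be written in dplyr without a row-wise sapply. The following is only a sketch: the tibble `pairs` is invented to mirror the Word1/Word2 columns above, and the helper column `key` is a name chosen for illustration.

```r
library(dplyr)

# Hypothetical data mirroring the Word1/Word2 columns shown above.
pairs <- tibble(
  Word1    = c("WordA", "WordX", "WordB", "WordY"),
  Word2    = c("WordX", "WordA", "WordY", "WordB"),
  distance = c(1.40, 0.23, 2.10, 2.30)
)

grouped <- pairs %>%
  mutate(
    # pmin()/pmax() compare element-wise, so the key is the same whichever
    # column each word appears in.
    key   = paste(pmin(Word1, Word2), pmax(Word1, Word2)),
    # Number the keys in order of first appearance.
    Group = paste0("Group", match(key, unique(key)))
  ) %>%
  select(-key)

# Filtering on the label then keeps both orderings of a pair.
group1 <- grouped %>% filter(Group == "Group1")
```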

dplyr filter columns with value 0 for all rows with unique combinations of other columns

An easy solution can be achieved by creating another column that contains the frequency of each species grouped by date, site and species (ignoring treatment). Then you can easily filter using this new column and afterwards eliminate it.

library(tidyverse)
df %>%
    # Group by date site and species
    group_by(date, site, species) %>%
    # Create new column that sums frequency values by grouping variables
    mutate(appears = sum(frequency)) %>%
    # ignore rows where appears = 0
    filter(appears != 0) %>%
    # Eliminate appears column
    select(-appears)
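If the helper column is not needed for anything else, the condition can go straight into filter(). This is a sketch on hypothetical data (`obs` and its values are invented); it uses any(frequency != 0) instead of a sum, which avoids relying on the frequencies being non-negative.

```r
library(dplyr)

# Hypothetical data: species "x" is never observed at this site and date,
# species "y" is observed under one treatment.
obs <- tibble(
  date      = rep("2020-01-01", 4),
  site      = rep(1, 4),
  treatment = c("ctrl", "trt", "ctrl", "trt"),
  species   = c("x", "x", "y", "y"),
  frequency = c(0, 0, 0, 3)
)

kept <- obs %>%
  group_by(date, site, species) %>%
  # Keep the whole group as soon as one of its rows is non-zero.
  filter(any(frequency != 0)) %>%
  ungroup()
```

Here both rows for species "y" are kept, including the one with frequency 0, and both rows for species "x" are dropped.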

group by and filter data management using dplyr

Try

d %>% 
  group_by(c) %>% 
  filter(any(b == 1))

Which gives:

#Source: local data frame [6 x 3]
#Groups: c
#
#  a b c
#1 1 1 1
#2 2 2 1
#3 3 2 1
#4 4 1 2
#5 5 2 2
#6 6 2 2

Filter by combination of (row) pairs

A dplyr solution would be:

library(dplyr)
df <- tibble(
  id = rep(1:4, each = 2),
  type = c("blue", "blue", "red", "yellow", "blue", "red", "red", "yellow")
)

types <- c("red", "yellow")

df %>% 
  group_by(id) %>% 
  filter(all(types %in% type))
#> # A tibble: 4 x 2
#> # Groups:   id [2]
#>      id   type
#>   <int>  <chr>
#> 1     2    red
#> 2     2 yellow
#> 3     4    red
#> 4     4 yellow

Update

To allow for combinations with repeated values, e.g. blue, blue, we have to change the filter call to the following:

types2 <- c("blue", "blue")

df %>% 
  group_by(id) %>% 
  filter(sum(types2 == type) == length(types2))
#> # A tibble: 2 x 2
#> # Groups:   id [1]
#>      id  type
#>   <int> <chr>
#> 1     1  blue
#> 2     1  blue

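This solution also works with different types. Note, however, that == compares position by position, so each group must have as many rows as there are types and must list them in the same order: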

df %>% 
  group_by(id) %>% 
  filter(sum(types == type) == length(types))
#> # A tibble: 4 x 2
#> # Groups:   id [2]
#>      id   type
#>   <int>  <chr>
#> 1     2    red
#> 2     2 yellow
#> 3     4    red
#> 4     4 yellow
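If the row order within a group cannot be relied on, one possible variation is to sort both sides before comparing. This is a sketch, not part of the original answer: the data is a hypothetical variant of the example above in which id 4 lists yellow before red, and `wanted` is a name introduced here.

```r
library(dplyr)

# Hypothetical variant of the example data: id 4 is stored as yellow, red.
df <- tibble(
  id   = rep(1:4, each = 2),
  type = c("blue", "blue", "red", "yellow", "blue", "red", "yellow", "red")
)

wanted <- c("red", "yellow")

matched <- df %>%
  group_by(id) %>%
  # Sorting both sides makes the test independent of row order while still
  # counting duplicates, so c("blue", "blue") would work as well.
  filter(identical(sort(type), sort(wanted))) %>%
  ungroup()
```

With this test, ids 2 and 4 are both kept even though their rows are in different orders.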

R filtering multiple Combinations

It appears you want to select a range of dates. If you combine year and quarter into a single value, you can easily filter on those.

Example dataset

df = data.frame(year = c("2000","2000","2000","2000","2001","2001","2001","2001"),
                quar = c("1","2","3","4","1","2","3","4")
                )

Create combined field

df <- df %>% mutate(period = paste(year, quar))

  year quar period
1 2000    1 2000 1
2 2000    2 2000 2
3 2000    3 2000 3
4 2000    4 2000 4
5 2001    1 2001 1
6 2001    2 2001 2
7 2001    3 2001 3
8 2001    4 2001 4

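The comparisons below are string comparisons, which sort correctly here only because every year has four digits and every quarter has one. A . or - separator is harmless inside a quoted string, but if the value is ever left unquoted or converted to a number, R will read it as a decimal point or a minus. You could also just combine them into a single number (20001, 20002, etc.), but I think that would be less legible.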

Easy filtering

For a range:

df %>% filter(period >= "2000 3" & period <= "2001 2")

  year quar period
1 2000    3 2000 3
2 2000    4 2000 4
3 2001    1 2001 1
4 2001    2 2001 2

For multiple individual values, use the OR operator |:

df %>% filter(period == "2000 1" | period == "2000 3")

  year quar period
1 2000    1 2000 1
2 2000    3 2000 3
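When the list of individual values grows, %in% is usually easier to read than a chain of | comparisons. A minimal sketch using the same example data (the result name `picked` is introduced here):

```r
library(dplyr)

df <- data.frame(
  year = c("2000", "2000", "2000", "2000", "2001", "2001", "2001", "2001"),
  quar = c("1", "2", "3", "4", "1", "2", "3", "4")
)

df <- df %>% mutate(period = paste(year, quar))

# Keep rows whose period is any of the listed values.
picked <- df %>% filter(period %in% c("2000 1", "2000 3"))
```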

And of course, the ORs | and ANDs & can all be combined. Remember to use parentheses to define the order of operations:

df %>% filter(period == "2000 1" | (period >= "2000 3" & period <= "2001 2"))

  year quar period
1 2000    1 2000 1
2 2000    3 2000 3
3 2000    4 2000 4
4 2001    1 2001 1
5 2001    2 2001 2
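For comparison, here is a sketch of the single-number alternative. The column `period_num` is an assumption added for illustration; it encodes year and quarter as one integer (20003 for 2000 Q3), so range filters become plain numeric comparisons.

```r
library(dplyr)

df <- data.frame(
  year = c("2000", "2000", "2000", "2000", "2001", "2001", "2001", "2001"),
  quar = c("1", "2", "3", "4", "1", "2", "3", "4")
)

df <- df %>%
  # Hypothetical numeric key: year * 10 + quarter, e.g. 20003 for 2000 Q3.
  mutate(period_num = as.integer(year) * 10L + as.integer(quar))

# Same range as above: 2000 Q3 through 2001 Q2.
in_range <- df %>% filter(period_num >= 20003L & period_num <= 20012L)
```

The numeric key sorts correctly as long as the quarter is a single digit, at the cost of being a little less readable than the "2000 3" form.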