Filter Groups in Dplyr That Exclusively Contain Specific Combinations of Values

R dplyr: filtering dataframe by combination of values

I think this is what you want - we group by (look at unique combinations of) Site, Spot, Transect, and Date, and then keep the whole group if ranks 1:3 are all present, and otherwise discard the whole group.

df %>%
group_by(Site_ID, Spot_Nr, Transkt_Nr, Date) %>%
filter(all(1:3 %in% rank))
# # A tibble: 132 × 13
# # Groups: Site_ID, Spot_Nr, Transkt_Nr, Date [44]
# Site_ID Spot_Nr Transkt_Nr Point_Nr nobs rank Tile Date id Point_ID RED SWIR1 PdKeyT
# <chr> <chr> <chr> <chr> <int> <int> <chr> <int> <chr> <chr> <dbl> <dbl> <int>
# 1 A 1 1 14 24 1 1008 20190531 14 1014 2221 730 60
# 2 A 1 1 15 23 2 1008 20190531 15 1015 2252 671 60
# 3 A 1 1 13 4 3 1008 20190531 13 1013 2212 970 60
# 4 A 1 1 14 24 1 1008 20191008 14 1014 864 978 2
# 5 A 1 1 15 23 2 1008 20191008 15 1015 1421 1378 2
# 6 A 1 1 13 4 3 1008 20191008 13 1013 1097 1132 2
# 7 A 1 1 14 24 1 1008 20191026 14 1014 799 1044 2
# 8 A 1 1 15 23 2 1008 20191026 15 1015 1252 1127 2
# 9 A 1 1 13 4 3 1008 20191026 13 1013 978 904 2
#10 A 1 2 15 26 1 1008 20191008 33 1033 1174 1243 2
# # … with 122 more rows

It's hard to know how you might want to generalize this. What I've shown is good for checking that all of a particular set of rank values are present. Alternatively, you could use a test like n_distinct(rank) >= 3 if you wanted to keep a group whenever it has at least 3 distinct ranks.
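As a minimal sketch of the difference between the two tests, assuming a small made-up data frame (the name toy, the column grp and all values are illustrative, not taken from the question):

```r
library(dplyr)

# Hypothetical toy data: group "a" has ranks 1-3, group "b" has ranks 2, 4, 5,
# group "c" has only two distinct ranks
toy <- tibble(
  grp  = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  rank = c(1L, 2L, 3L, 2L, 4L, 5L, 1L, 1L, 2L)
)

# Keep groups that contain every one of the ranks 1, 2 and 3
exact <- toy %>%
  group_by(grp) %>%
  filter(all(1:3 %in% rank)) %>%
  ungroup()

# Keep groups that have at least three distinct ranks, whatever they are
at_least_three <- toy %>%
  group_by(grp) %>%
  filter(n_distinct(rank) >= 3) %>%
  ungroup()

unique(exact$grp)           # "a"
unique(at_least_three$grp)  # "a" "b"
```

Group "b" passes the second test but not the first, which is the practical difference between the two conditions.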

Filtering within dplyr group_by so that combination of rows matching certain conditions remain

Use any:

library(dplyr)
df %>% group_by(address, zip_code) %>% filter(any(mailout) && any(!mailout))

# address zip_code date mailout
# <chr> <int> <chr> <lgl>
#1 Higgens Square 62561 02/12/10 FALSE
#2 Higgens Square 62561 28/03/13 TRUE
#3 55 The Wren 91234 23/08/18 TRUE
#4 55 The Wren 91234 19/09/13 FALSE
#5 9A Sylvan Road 54332 16/11/10 TRUE
#6 9A Sylvan Road 54332 31/01/17 FALSE

Or use all, so that the condition returns a single value per group, which decides whether the whole group is kept:

df %>% group_by(address, zip_code) %>% filter(all(c(TRUE, FALSE) %in%  mailout))

Filtering according to combination of matching data across variables in R

The code below gives each unordered pair of words a Group label. You can filter by the Group column after this:

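df <- as.data.frame(df)

# Build an order-independent key from the first two columns of each row
df$v <- sapply(seq_len(nrow(df)), function(x)
paste(sort(c(df[x, 1], df[x, 2])), collapse = ""))
# Assign one Group label per unique key
l <- data.frame(v = unique(df$v),
Group = paste0("Group", seq_along(unique(df$v))))
# Attach the labels and drop the helper key column
df <- merge(df, l, by = "v")[, -1]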

df

Word1 Word2 distance speaker session Group
1 WordA WordX 1.40 JB 1 Group1
2 WordX WordA 0.23 JB 1 Group1
3 WordB WordY 2.10 JB 1 Group2
4 WordY WordB 2.30 JB 1 Group2
5 WordC WordZ 4.70 JB 1 Group3
6 WordZ WordC 0.51 JB 1 Group3
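The same order-independent grouping can probably be written with dplyr alone. This is only a sketch: it assumes Word1 and Word2 are character columns (not factors), and the data frame pairs and the helper column key are made up for illustration.

```r
library(dplyr)

# Hypothetical data mirroring the structure shown above
pairs <- tibble(
  Word1    = c("WordA", "WordX", "WordB", "WordY"),
  Word2    = c("WordX", "WordA", "WordY", "WordB"),
  distance = c(1.40, 0.23, 2.10, 2.30)
)

grouped <- pairs %>%
  mutate(
    # pmin()/pmax() pick the alphabetically first/last word in each row,
    # so (WordA, WordX) and (WordX, WordA) get the same key
    key = paste(pmin(Word1, Word2), pmax(Word1, Word2)),
    # Number the keys in order of first appearance
    Group = paste0("Group", match(key, unique(key)))
  ) %>%
  select(-key)

grouped$Group  # "Group1" "Group1" "Group2" "Group2"
```

Unlike the merge() approach, this keeps the rows in their original order.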

dplyr filter columns with value 0 for all rows with unique combinations of other columns

An easy solution can be achieved by creating another column that contains the frequency of each species grouped by date, site and species (ignoring treatment). Then you can easily filter using this new column and afterwards eliminate it.

library(tidyverse)
df %>%
# Group by date site and species
group_by(date, site, species) %>%
# Create new column that sums frequency values by grouping variables
mutate(appears = sum(frequency)) %>%
# ignore rows where appears = 0
filter(appears != 0) %>%
# Eliminate appears column
select(-appears)
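If the helper column is not needed for anything else, the condition can likely go straight into filter(). A minimal sketch, assuming frequency is never negative (so "sum is zero" and "every value is zero" mean the same thing); the data frame obs and its values are invented for illustration:

```r
library(dplyr)

# Hypothetical data: species "x" is never observed, species "y" is observed once
obs <- tibble(
  date      = c("d1", "d1", "d1", "d1"),
  site      = c(1, 1, 1, 1),
  species   = c("x", "x", "y", "y"),
  treatment = c("ctl", "trt", "ctl", "trt"),
  frequency = c(0, 0, 0, 3)
)

kept <- obs %>%
  group_by(date, site, species) %>%
  # Keep the whole group if any of its rows has a non-zero frequency
  filter(any(frequency != 0)) %>%
  ungroup()

kept$species  # "y" "y"
```

Both rows of species "y" survive, including the one with frequency 0, because the decision is made per group rather than per row.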

group by and filter data management using dplyr

Try

d %>% 
group_by(c) %>%
filter(any(b == 1))

Which gives:

#Source: local data frame [6 x 3]
#Groups: c
#
# a b c
#1 1 1 1
#2 2 2 1
#3 3 2 1
#4 4 1 2
#5 5 2 2
#6 6 2 2

Filter by combination of (row) pairs

A dplyr solution would be:


library(dplyr)
df <- tibble(
id = rep(1:4, each = 2),
type = c("blue", "blue", "red", "yellow", "blue", "red", "red", "yellow")
)

types <- c("red", "yellow")

df %>%
group_by(id) %>%
filter(all(types %in% type))
#> # A tibble: 4 x 2
#> # Groups: id [2]
#> id type
#> <int> <chr>
#> 1 2 red
#> 2 2 yellow
#> 3 4 red
#> 4 4 yellow

Update

To allow combinations with repeated values, e.g. blue, blue, we have to change the filter call to the following:

types2 <- c("blue", "blue")

df %>%
group_by(id) %>%
filter(sum(types2 == type) == length(types2))
#> # A tibble: 2 x 2
#> # Groups: id [1]
#> id type
#> <int> <chr>
#> 1 1 blue
#> 2 1 blue

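This solution also works for different types, as long as each group has the same number of rows as types and lists them in the same order, because == compares the two vectors position by position: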

df %>% 
group_by(id) %>%
filter(sum(types == type) == length(types))
#> # A tibble: 4 x 2
#> # Groups: id [2]
#> id type
#> <int> <chr>
#> 1 2 red
#> 2 2 yellow
#> 3 4 red
#> 4 4 yellow
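Because == compares element by element, the filter above depends on the order of rows within each group. One order-independent variant is to sort both sides before comparing. This is a sketch under the assumption that a group should match the requested types exactly, repeats included; the helper keep_exact and the data frame df2 (with the rows of id 4 deliberately swapped) are hypothetical:

```r
library(dplyr)

# Hypothetical data: same as above, except id 4 lists "yellow" before "red"
df2 <- tibble(
  id   = rep(1:4, each = 2),
  type = c("blue", "blue", "red", "yellow", "blue", "red", "yellow", "red")
)

# Keep groups whose types match `wanted` exactly, ignoring row order
keep_exact <- function(data, wanted) {
  data %>%
    group_by(id) %>%
    filter(identical(sort(type), sort(wanted))) %>%
    ungroup()
}

unique(keep_exact(df2, c("red", "yellow"))$id)  # 2 4
unique(keep_exact(df2, c("blue", "blue"))$id)   # 1
```

With the positional == test, id 4 would be dropped here because its rows are in the opposite order.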

R filtering multiple Combinations

It appears you want to select a range of dates. If you combine year and quarter into a single value, you can easily filter on those.

Example dataset

df = data.frame(year = c("2000","2000","2000","2000","2001","2001","2001","2001"),
quar = c("1","2","3","4","1","2","3","4")
)

Create combined field

library(dplyr)
df <- df %>% mutate(period = paste(year, quar))

year quar period
1 2000 1 2000 1
2 2000 2 2000 2
3 2000 3 2000 3
4 2000 4 2000 4
5 2001 1 2001 1
6 2001 2 2001 2
7 2001 3 2001 3
8 2001 4 2001 4

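The combined value is a character string, so the comparisons below are alphabetical rather than numeric. That works here because every value has the same layout: a four-digit year, a space, and a one-digit quarter. Avoid separating the values with a . or - if you might later convert them to numbers, as they would then be read as a decimal point or a minus. You could also just combine them into a single number (20001, 20002, etc.), but I think that would be less legible.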

Easy filtering

For a range:

df %>% filter(period >= "2000 3" & period <= "2001 2")

year quar period
1 2000 3 2000 3
2 2000 4 2000 4
3 2001 1 2001 1
4 2001 2 2001 2

For multiple individual values, use the OR operator |:

df %>% filter(period == "2000 1" | period == "2000 3")

year quar period
1 2000 1 2000 1
2 2000 3 2000 3
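When the list of individual values gets longer, %in% is usually tidier than chaining | comparisons. A minimal sketch that rebuilds the example data from above; the name picked is just for illustration:

```r
library(dplyr)

# Same example data as above, with the combined period column
df <- data.frame(
  year = c("2000", "2000", "2000", "2000", "2001", "2001", "2001", "2001"),
  quar = c("1", "2", "3", "4", "1", "2", "3", "4")
) %>%
  mutate(period = paste(year, quar))

# Keep rows whose period is one of the listed values
picked <- df %>% filter(period %in% c("2000 1", "2000 3"))

picked$period  # "2000 1" "2000 3"
```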

And of course, ORs (|) and ANDs (&) can be combined. Remember to use parentheses to define the order of operations:

df %>% filter(period == "2000 1" | (period >= "2000 3" & period <= "2001 2"))

year quar period
1 2000 1 2000 1
2 2000 3 2000 3
3 2000 4 2000 4
4 2001 1 2001 1
5 2001 2 2001 2

