R dplyr: filtering dataframe by combination of values
I think this is what you want: we group by (that is, look at unique combinations of) Site, Spot, Transect, and Date, then keep the whole group if ranks 1 to 3 are all present, and otherwise discard the whole group.
library(dplyr)
df %>%
  group_by(Site_ID, Spot_Nr, Transkt_Nr, Date) %>%
  filter(all(1:3 %in% rank))
# # A tibble: 132 × 13
# # Groups:   Site_ID, Spot_Nr, Transkt_Nr, Date [44]
#    Site_ID Spot_Nr Transkt_Nr Point_Nr  nobs  rank Tile      Date id    Point_ID   RED SWIR1 PdKeyT
#    <chr>   <chr>   <chr>      <chr>    <int> <int> <chr>    <int> <chr> <chr>    <dbl> <dbl>  <int>
#  1 A       1       1          14          24     1 1008  20190531 14    1014      2221   730     60
#  2 A       1       1          15          23     2 1008  20190531 15    1015      2252   671     60
#  3 A       1       1          13           4     3 1008  20190531 13    1013      2212   970     60
#  4 A       1       1          14          24     1 1008  20191008 14    1014       864   978      2
#  5 A       1       1          15          23     2 1008  20191008 15    1015      1421  1378      2
#  6 A       1       1          13           4     3 1008  20191008 13    1013      1097  1132      2
#  7 A       1       1          14          24     1 1008  20191026 14    1014       799  1044      2
#  8 A       1       1          15          23     2 1008  20191026 15    1015      1252  1127      2
#  9 A       1       1          13           4     3 1008  20191026 13    1013       978   904      2
# 10 A       1       2          15          26     1 1008  20191008 33    1033      1174  1243      2
# # … with 122 more rows
It's hard to know how you might want to generalize this. What I've shown is a good way to check that all of a particular set of rank values are present. Alternatively, you could use a test like n_distinct(rank) >= 3 if you wanted to keep a group whenever it has at least 3 distinct ranks.
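To see how the two tests differ, here is a small sketch on hypothetical data (the grp column and all values are made up). all(1:3 %in% rank) insists on the specific ranks 1, 2 and 3, whereas n_distinct(rank) >= 3 accepts any three distinct ranks:

```r
library(dplyr)

# Hypothetical data: group "a" has ranks 1-3, "b" lacks rank 3,
# "c" has three distinct ranks but not the set 1:3
toy <- tibble(
  grp  = c("a", "a", "a", "b", "b", "c", "c", "c"),
  rank = c(1L, 2L, 3L, 1L, 2L, 1L, 2L, 5L)
)

# Keep a group only if every rank in 1:3 appears in it
exact_set <- toy %>%
  group_by(grp) %>%
  filter(all(1:3 %in% rank)) %>%
  ungroup()

# Keep a group if it has at least 3 distinct ranks, whatever they are
at_least_three <- toy %>%
  group_by(grp) %>%
  filter(n_distinct(rank) >= 3) %>%
  ungroup()

unique(exact_set$grp)       # "a"
unique(at_least_three$grp)  # "a" "c"
```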
Filtering within dplyr group_by so that combination of rows matching certain conditions remain
Use any():
library(dplyr)
df %>% group_by(address, zip_code) %>% filter(any(mailout) && any(!mailout))
#   address        zip_code date     mailout
#   <chr>             <int> <chr>    <lgl>
# 1 Higgens Square    62561 02/12/10 FALSE
# 2 Higgens Square    62561 28/03/13 TRUE
# 3 55 The Wren       91234 23/08/18 TRUE
# 4 55 The Wren       91234 19/09/13 FALSE
# 5 9A Sylvan Road    54332 16/11/10 TRUE
# 6 9A Sylvan Road    54332 31/01/17 FALSE
Or use all(), which likewise collapses each group to a single value that decides whether the group is kept:
df %>% group_by(address, zip_code) %>% filter(all(c(TRUE, FALSE) %in% mailout))
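A self-contained sketch of both filters, using a small hypothetical data set (the addresses and zip codes are invented, not the asker's data). Both versions keep only groups that contain at least one TRUE and at least one FALSE mailout:

```r
library(dplyr)

# Hypothetical data: only "1 High St" has both a TRUE and a FALSE mailout
mail <- tibble(
  address  = c("1 High St", "1 High St", "2 Low Rd", "2 Low Rd", "3 Mid Ln"),
  zip_code = c(11111L, 11111L, 22222L, 22222L, 33333L),
  mailout  = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# any() version: the group needs at least one TRUE and at least one FALSE
with_any <- mail %>%
  group_by(address, zip_code) %>%
  filter(any(mailout) && any(!mailout)) %>%
  ungroup()

# all() version: both TRUE and FALSE must occur among the group's values
with_all <- mail %>%
  group_by(address, zip_code) %>%
  filter(all(c(TRUE, FALSE) %in% mailout)) %>%
  ungroup()

# Both filters keep the same rows: the two "1 High St" entries
with_any$address
with_all$address
```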
Filtering according to combination of matching data across variables in R
Build an order-independent key from each word pair, give every distinct key a Group label, and you can then filter by the Group column:
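df <- as.data.frame(df)
# Order-independent key: sort the two words so "WordA, WordX" and "WordX, WordA" match
df$v <- sapply(seq_len(nrow(df)), function(x)
  paste(sort(c(as.character(df[x, 1]), as.character(df[x, 2]))), collapse = "_"))
# One Group label per distinct key
l <- data.frame(v = unique(df$v),
                Group = paste0("Group", seq_along(unique(df$v))))
# merge() puts the key column "v" first; [, -1] drops it again
df <- merge(df, l, by = "v")[, -1]
df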
  Word1 Word2 distance speaker session  Group
1 WordA WordX     1.40      JB       1 Group1
2 WordX WordA     0.23      JB       1 Group1
3 WordB WordY     2.10      JB       1 Group2
4 WordY WordB     2.30      JB       1 Group2
5 WordC WordZ     4.70      JB       1 Group3
6 WordZ WordC     0.51      JB       1 Group3
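If you prefer to stay in dplyr, a vectorized way to build the same order-independent key is pmin()/pmax(), which compare the two word columns element by element. This is a swapped-in alternative to the sapply() loop, sketched on the word pairs from the printed result; the key helper column and the group2 name are hypothetical. Unlike merge(), it also keeps the original row order:

```r
library(dplyr)

# Word pairs and distances taken from the printed result above
pairs <- tibble(
  Word1    = c("WordA", "WordX", "WordB", "WordY", "WordC", "WordZ"),
  Word2    = c("WordX", "WordA", "WordY", "WordB", "WordZ", "WordC"),
  distance = c(1.40, 0.23, 2.10, 2.30, 4.70, 0.51)
)

pairs <- pairs %>%
  # pmin()/pmax() work element-wise, so WordX/WordA gets the same key as WordA/WordX
  mutate(key = paste(pmin(Word1, Word2), pmax(Word1, Word2), sep = "_"),
         # Number the distinct keys in order of first appearance
         Group = paste0("Group", match(key, unique(key)))) %>%
  select(-key)

# Filter by the Group column, e.g. keep only the WordB/WordY pair
group2 <- pairs %>% filter(Group == "Group2")
group2
```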
dplyr filter columns with value 0 for all rows with unique combinations of other columns
An easy solution is to create another column that holds the total frequency of each species within each date and site (ignoring treatment). You can then filter on this new column and remove it afterwards.
library(tidyverse)
df %>%
# Group by date site and species
group_by(date, site, species) %>%
# Create new column that sums frequency values by grouping variables
mutate(appears = sum(frequency)) %>%
# ignore rows where appears = 0
filter(appears != 0) %>%
# Eliminate appears column
select(-appears)
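The helper column is optional: inside a grouped filter(), sum(frequency) is computed once per group and recycled to every row of that group, so the condition can go straight into filter(). A sketch with hypothetical counts (the dates, sites, species and treatments are invented):

```r
library(dplyr)

# Hypothetical data: species "x" is never observed at site s1 on this date
counts <- tibble(
  date      = rep("2020-01-01", 6),
  site      = rep("s1", 6),
  species   = c("x", "x", "x", "y", "y", "y"),
  treatment = c("t1", "t2", "t3", "t1", "t2", "t3"),
  frequency = c(0, 0, 0, 0, 2, 1)
)

# Drop every (date, site, species) group whose total frequency is 0
kept <- counts %>%
  group_by(date, site, species) %>%
  filter(sum(frequency) != 0) %>%
  ungroup()

kept$species  # "y" "y" "y"
```

Note that the y row with frequency 0 is kept, because only the group total has to be non-zero.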
group by and filter data management using dplyr
Try
d %>%
group_by(c) %>%
filter(any(b == 1))
Which gives:
#Source: local data frame [6 x 3]
#Groups: c
#
# a b c
#1 1 1 1
#2 2 2 1
#3 3 2 1
#4 4 1 2
#5 5 2 2
#6 6 2 2
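The key point is that any(b == 1) is evaluated per group, so whole groups survive, whereas an ungrouped filter(b == 1) keeps individual rows. A sketch with hypothetical data: the first six rows mirror the output above, and the third group (c = 3) is invented so that something gets dropped:

```r
library(dplyr)

# Hypothetical data: groups c = 1 and c = 2 contain a b == 1, group c = 3 does not
d <- tibble(
  a = 1:8,
  b = c(1, 2, 2, 1, 2, 2, 2, 2),
  c = c(1, 1, 1, 2, 2, 2, 3, 3)
)

# Grouped: any() turns each group into one TRUE/FALSE, so whole groups are kept
whole_groups <- d %>%
  group_by(c) %>%
  filter(any(b == 1)) %>%
  ungroup()

# Ungrouped: the condition is tested row by row
single_rows <- d %>% filter(b == 1)

whole_groups$a  # 1 2 3 4 5 6
single_rows$a   # 1 4
```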
Filter by combination of (row) pairs
A dplyr solution would be:
library(dplyr)
# tibble() replaces the deprecated data_frame()
df <- tibble(
  id = rep(1:4, each = 2),
  type = c("blue", "blue", "red", "yellow", "blue", "red", "red", "yellow")
)
types <- c("red", "yellow")
df %>%
group_by(id) %>%
filter(all(types %in% type))
#> # A tibble: 4 x 2
#> # Groups: id [2]
#> id type
#> <int> <chr>
#> 1 2 red
#> 2 2 yellow
#> 3 4 red
#> 4 4 yellow
Update
To allow for combinations of identical values, e.g. blue, blue, we have to change the filter call to the following:
types2 <- c("blue", "blue")
df %>%
group_by(id) %>%
filter(sum(types2 == type) == length(types2))
#> # A tibble: 2 x 2
#> # Groups: id [1]
#> id type
#> <int> <chr>
#> 1 1 blue
#> 2 1 blue
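This solution also works for different types, although types == type compares element by element, so it assumes that each group has exactly length(types) rows, in the same order as types: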
df %>%
group_by(id) %>%
filter(sum(types == type) == length(types))
#> # A tibble: 4 x 2
#> # Groups: id [2]
#> id type
#> <int> <chr>
#> 1 2 red
#> 2 2 yellow
#> 3 4 red
#> 4 4 yellow
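Because types == type compares position by position, a group holding the same values in a different row order is missed. One order-independent alternative (a swapped-in technique, not part of the answer above) is to compare the sorted values with identical(). The sketch adds a hypothetical id 5 whose rows are yellow, red:

```r
library(dplyr)

# Same data as above plus a hypothetical id 5 whose rows come in reverse order
df5 <- tibble(
  id   = rep(1:5, each = 2),
  type = c("blue", "blue", "red", "yellow", "blue", "red",
           "red", "yellow", "yellow", "red")
)
types <- c("red", "yellow")

# Element-wise comparison: depends on the row order within each group
order_dep <- df5 %>%
  group_by(id) %>%
  filter(sum(types == type) == length(types)) %>%
  pull(id) %>%
  unique()

# Sorted comparison: the group must hold exactly these values, in any order
order_free <- df5 %>%
  group_by(id) %>%
  filter(identical(sort(type), sort(types))) %>%
  pull(id) %>%
  unique()

order_dep   # 2 4
order_free  # 2 4 5
```

identical(sort(type), sort(types)) requires the group to hold exactly those values, duplicates included (so c("blue", "blue") still matches only id 1); groups with extra rows are dropped, so adjust the test if you only need containment.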
R filtering multiple Combinations
It appears you want to select a range of dates. If you combine year and quarter into a single value, you can easily filter on those.
Example dataset
df = data.frame(year = c("2000","2000","2000","2000","2001","2001","2001","2001"),
quar = c("1","2","3","4","1","2","3","4")
)
Create combined field
df <- df %>% mutate(period = paste(year, quar))
year quar period
1 2000 1 2000 1
2 2000 2 2000 2
3 2000 3 2000 3
4 2000 4 2000 4
5 2001 1 2001 1
6 2001 2 2001 2
7 2001 3 2001 3
8 2001 4 2001 4
Avoid . or - as the separator: if the values ever end up unquoted or converted to numbers, R would read them as a decimal point or a minus sign. You could also combine year and quarter into a single number (20001, 20002, etc.), but I think that would be less legible.
Easy filtering
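For a range (character values compare alphabetically, which works here because every year has four digits and every quarter one):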
df %>% filter(period >= "2000 3" & period <= "2001 2")
year quar period
1 2000 3 2000 3
2 2000 4 2000 4
3 2001 1 2001 1
4 2001 2 2001 2
For multiple individual values, use the OR operator |:
df %>% filter(period == "2000 1" | period == "2000 3")
year quar period
1 2000 1 2000 1
2 2000 3 2000 3
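For a longer list of individual values, %in% is a more compact equivalent of chaining == with |. A sketch using the example data set from above:

```r
library(dplyr)

# The example data set from above, with the combined period column
df <- data.frame(year = c("2000","2000","2000","2000","2001","2001","2001","2001"),
                 quar = c("1","2","3","4","1","2","3","4"),
                 stringsAsFactors = FALSE)
df <- df %>% mutate(period = paste(year, quar))

# %in% keeps rows whose period is any of the listed values
picked <- df %>% filter(period %in% c("2000 1", "2000 3"))
picked
```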
And of course, ORs (|) and ANDs (&) can be combined. Use parentheses to define the order of operations:
df %>% filter(period == "2000 1" | (period >= "2000 3" & period <= "2001 2"))
year quar period
1 2000 1 2000 1
2 2000 3 2000 3
3 2000 4 2000 4
4 2001 1 2001 1
5 2001 2 2001 2
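The single-number encoding mentioned earlier (20001, 20002, etc.) can be sketched as follows. The column name period_num is hypothetical, and year * 10 + quarter only preserves the ordering because quarters are single digits; between() is dplyr's inclusive range helper:

```r
library(dplyr)

df <- data.frame(year = c("2000","2000","2000","2000","2001","2001","2001","2001"),
                 quar = c("1","2","3","4","1","2","3","4"),
                 stringsAsFactors = FALSE)

# Numeric key: year * 10 + quarter, e.g. 2000 Q3 -> 20003
df <- df %>% mutate(period_num = as.integer(year) * 10L + as.integer(quar))

# Same range as before: 2000 Q3 up to and including 2001 Q2
in_range <- df %>% filter(between(period_num, 20003L, 20012L))
in_range$period_num  # 20003 20004 20011 20012
```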