Filter rows in dplyr chain if a set of rows doesn't contain a specific word
We create a grouping column on the assumption that every four rows form a new block (gl), then filter so that only the groups whose first element of 'name' does not contain _number or _slider are kept, then ungroup and remove the temporary 'grp' column.
library(dplyr)
library(stringr)  # provides str_detect()

df %>%
  group_by(grp = as.integer(gl(n(), 4, n()))) %>%
  filter(!str_detect(first(name), "_(number|slider)")) %>%
  ungroup() %>%
  select(-grp)
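To see what the gl() call contributes, here is a minimal base R sketch. It assumes a hypothetical input of eight rows (two blocks of four); the name n_rows is illustrative and not part of the original answer.

```r
# Hypothetical row count: eight rows, i.e. two blocks of four.
n_rows <- 8

# gl(n, k, length) builds a factor with n levels, each repeated k times,
# cut to `length` values; as.integer() turns the levels into block ids.
grp <- as.integer(gl(n_rows, 4, n_rows))

print(grp)
```

Rows 1 to 4 get block id 1 and rows 5 to 8 get block id 2, which is the grouping that group_by() then works on.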
Update
Based on the comments from the OP, i.e. blocks are determined by their common prefix, extract the first word, use that as the grouping variable, and do the filter as before
library(stringr)
df %>%
group_by(grp = word(name, 1, sep="_")) %>%
filter(!str_detect(first(name), "_(number|slider)"))
The ungroup part remains the same as before.
If prefixes can repeat, i.e. the same prefix appears in non-adjacent runs that need to be treated as separate blocks, then use rleid from data.table to create the grouping variable
library(data.table)  # provides rleid()

df %>%
  group_by(grp = rleid(word(name, 1, sep="_"))) %>%
  filter(!str_detect(first(name), "_(number|slider)"))
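The difference between grouping on the prefix and grouping on rleid() of the prefix can be sketched as follows. The name vector is made up for illustration, and sub() is used here only as a base R stand-in for word(name, 1, sep = "_").

```r
library(data.table)  # for rleid()

# Hypothetical names: prefix "a" appears in two non-adjacent runs.
name <- c("a_number", "a_x", "b_text", "b_x", "a_slider", "a_y")

# Base R stand-in for word(name, 1, sep = "_"): keep everything before the first "_".
prefix <- sub("_.*$", "", name)

# rleid() starts a new id every time the value changes from one row to the next.
run_id <- rleid(prefix)

print(data.frame(name, prefix, run_id))
```

Grouping by prefix alone would merge rows 1, 2, 5 and 6 into one group, while run_id keeps the two "a" runs apart as separate blocks.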
group_by and keep all groups that do not contain a specific value and filter where there is a value
If there could be multiple Yes values:
df %>%
group_by(Code) %>%
slice(if(all(Inst != "Yes")) 1:n() else which(Inst == "Yes"))
Code Inst
<chr> <chr>
1 a Yes
2 b No
3 b No
4 b No
5 b No
6 b No
Considering the updated question:
df %>%
mutate(Date = as.Date(Date, format = "%Y-%m-%d")) %>%
group_by(Code) %>%
slice(if(all(Inst != "Yes")) 1:n() else which(Inst == "Yes")) %>%
filter(Date == min(Date))
Code Inst Date
<chr> <chr> <date>
1 a Yes 2021-01-01
2 b No 2021-01-06
3 b No 2021-01-06
4 b No 2021-01-06
filter() (dplyr) does not distinguish between character and number?
In R, the expression 20 == "20" is valid, though some (coming from other programming languages) might consider that a little "sloppy". When it is evaluated, R up-classes the 20 to "20" for the comparison. This silent casting can be good (useful and flexible), but it can also cause unintended, undesired, and/or surprising results. (The fact that it's silent is what I dislike about it, but convenience is convenience.)
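A small base R sketch of that coercion (the variable names are only for illustration):

```r
# A double compared with a string: the number is converted to "20" first.
eq_plain <- 20 == "20"

# The comparison is done on strings, so a differently formatted number does not match.
eq_formatted <- 20 == "20.0"

# 20 is stored as a double ("numeric"); 20L is an integer.
cls_double <- class(20)
cls_integer <- class(20L)

print(c(eq_plain, eq_formatted))
print(c(cls_double, cls_integer))
```

The first comparison is TRUE and the second is FALSE, because after coercion the test is "20" == "20.0".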
If you want to be perfectly clear about your comparison, you can test for class as well. In your example, you show 20, which is numeric and not technically integer (which would be 20L), but you can shape the precision of the conditional to your own tastes:
filter(data, is.numeric(depth_m) & depth_m == 20)
This will still up-class the 20 to "20" when the column is character, but because the first portion, is.numeric(.), fails, the combination of the two will fail as well. Realize that the specificity of that test is absolute: if the column is indeed character, then you will always get zero rows, which may not be what you want. If instead you want to remove non-20 rows only when the column is numeric, then perhaps
filter(data, !is.numeric(depth_m) | depth_m == 20)
This goes down the dizzying logic of "if it is not numeric, then it obviously cannot truly be 20, so keep it ... but if it is numeric, make sure it is definitely 20". Of course, we run into the premise here that there is no way that one portion of the column can be numeric while another cannot, so ... perhaps that's over-indulging the specificity of filtering.
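As a rough illustration of the two conditions, the sketch below runs both on two hypothetical data frames, one with a character depth_m column and one with a numeric one. The data and the result names are assumptions made for the example.

```r
library(dplyr)

# Hypothetical inputs: the same values stored as character and as numeric.
data_chr <- data.frame(depth_m = c("20", "30"), stringsAsFactors = FALSE)
data_num <- data.frame(depth_m = c(20, 30))

# Strict test: the column must be numeric AND equal to 20.
strict_chr <- filter(data_chr, is.numeric(depth_m) & depth_m == 20)
strict_num <- filter(data_num, is.numeric(depth_m) & depth_m == 20)

# Lenient test: a non-numeric column keeps every row; a numeric one keeps only 20.
lenient_chr <- filter(data_chr, !is.numeric(depth_m) | depth_m == 20)
lenient_num <- filter(data_num, !is.numeric(depth_m) | depth_m == 20)

print(c(nrow(strict_chr), nrow(strict_num), nrow(lenient_chr), nrow(lenient_num)))
```

Because is.numeric() returns a single TRUE or FALSE for the whole column, each condition either applies to every row or to none, which is the premise discussed above.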
dplyr filter statement not in expression from a data.frame
Actually, I got the answer: add an unlist to the week_e data.frame and it is solved.
df1 = data.frame(week=c(1,2,3,4,5,6),sales=c(10,24,23,54,65,45))
week_e=unlist(data.frame(week=c(2,5)))
df1 %>%
filter(!week %in% week_e)
week sales
1 10
3 23
4 54
6 45
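A short base R sketch of what unlist does here, assuming the same numbers as above. Extracting the column with $ is shown as an alternative that is not part of the original answer.

```r
df1 <- data.frame(week = c(1, 2, 3, 4, 5, 6), sales = c(10, 24, 23, 54, 65, 45))
week_e_df <- data.frame(week = c(2, 5))

# unlist() flattens the one-column data frame into a (named) numeric vector.
week_vec <- unlist(week_e_df)

# Alternative (an assumption, not from the original answer): pull the column directly.
week_col <- week_e_df$week

# %in% ignores the names, so both vectors give the same row selection.
keep_vec <- !(df1$week %in% week_vec)
keep_col <- !(df1$week %in% week_col)

print(df1[keep_vec, ])
```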
dplyr filter keep the NAs AND OR conditions
Use | instead of &. With filter, multiple expressions separated by , are combined with &. It is not possible for a value to be both NA and not equal to 182
library(dplyr)
test_data_1 %>%
filter(is.na(Art) | Art != 182)
-output
# A tibble: 8 × 1
Art
<dbl>
1 188
2 NA
3 NA
4 140
5 NA
6 NA
7 NA
8 NA
The second part of the question is about %in%. We can use | again
test_data_1 %>%
filter(Art %in% c(140,188) | is.na(Art))
# A tibble: 8 × 1
Art
<dbl>
1 188
2 NA
3 NA
4 140
5 NA
6 NA
7 NA
8 NA
NOTE: By default, filter drops the rows for which the condition evaluates to NA. In addition, there is no na.rm argument in filter
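The behaviour of the two operators with NA can be checked on a plain vector. This is a minimal base R sketch with made-up values for Art:

```r
# Hypothetical values, including one NA and one 182.
Art <- c(188, NA, 182, 140)

# Comparing NA with anything gives NA, not TRUE or FALSE.
neq <- Art != 182

# OR: TRUE | NA is TRUE, so the NA element is kept.
or_cond <- is.na(Art) | Art != 182

# AND: never TRUE here, because no value is both NA and different from 182.
and_cond <- is.na(Art) & Art != 182

print(neq)
print(or_cond)
print(and_cond)
```

filter keeps only the rows where the condition is TRUE, so the & version returns nothing while the | version keeps 188, NA and 140.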
Why won't filter() work for filtering out multiple rows that don't equal certain characters?
The error here is just in your logic: by allowing everything that isn't A and everything that isn't B, you ensure that all items are kept. This is because anything that's A isn't B, so it's kept, and anything that's B isn't A, so it's kept too. A better way to handle this would be to create a vector, say, exclusions, of all the values of nests_2020 (note that you switched from nests_2021 to nests_2020 in your question) you'd like to exclude, then use something like filter(szb4, !(nests_2020 %in% exclusions)).
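The logic can be checked on a plain vector. This base R sketch uses the placeholder values "A", "B" and "C", which are assumptions standing in for the real contents of nests_2020:

```r
# Placeholder values standing in for the nests_2020 column.
nests_2020 <- c("A", "B", "C")
exclusions <- c("A", "B")

# "not A OR not B" is TRUE for every value, so nothing is removed.
keep_wrong <- nests_2020 != "A" | nests_2020 != "B"

# Negating %in% keeps only the values that are in neither exclusion.
keep_right <- !(nests_2020 %in% exclusions)

print(keep_wrong)
print(keep_right)
```

The same logical expression can be passed to filter, as in the call suggested above.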
Filter in a dplyr group only when the condition is met else do not
Actually, I found the answer in another related question. This uses a data.table one-liner, which in my case was:
library(data.table)
test <- setDT(test)[, if(any(is.na(stamp_score))) .SD[is.na(stamp_score)] else .SD, .(hit, indx)]
Essentially, this code subsets a group to its NA rows only if there is an NA in the "stamp_score" column; otherwise it leaves the group unchanged.
Thanks to everyone who tried to help and also helped me improve my question over time.
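For anyone who wants to try the one-liner, here is a small sketch on made-up data. The column names match the code above, but the values are hypothetical.

```r
library(data.table)

# Hypothetical data: group h1 contains an NA score, group h2 does not.
test <- data.frame(
  hit = c("h1", "h1", "h2", "h2"),
  indx = c(1, 1, 1, 1),
  stamp_score = c(NA, 5, 3, 4),
  stringsAsFactors = FALSE
)

# Per (hit, indx) group: keep only the NA rows if any exist, else keep the whole group.
res <- setDT(test)[, if (any(is.na(stamp_score))) .SD[is.na(stamp_score)] else .SD, .(hit, indx)]

print(res)
```

Group h1 is reduced to its single NA row, while group h2 keeps both of its rows.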