How to Specify "Does Not Contain" in Dplyr Filter

Filter rows in dplyr chain if a set of rows doesn't contain a specific word

Create a grouping column that starts a new block every four rows (using gl), keep only the groups whose first 'name' value does not contain _number or _slider, then ungroup and drop the temporary 'grp' column:

library(dplyr)
library(stringr)  # str_detect() comes from stringr

df %>%
  group_by(grp = as.integer(gl(n(), 4, n()))) %>%
  filter(!str_detect(first(name), "_(number|slider)")) %>%
  ungroup() %>%
  select(-grp)
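To see what the grouping expression produces, here is a small base-R sketch. The row count of 10 is an arbitrary assumption; inside group_by() that value comes from n().

```r
# Hypothetical row count; in the pipeline above this is n().
n_rows <- 10

# gl(n, k, length): levels 1..n, each repeated k times, cut off at `length`.
grp <- as.integer(gl(n_rows, 4, n_rows))
print(grp)
# 1 1 1 1 2 2 2 2 3 3
```

Note that the last block is shorter when the number of rows is not a multiple of four.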

Update

Based on the OP's comments, blocks are determined by a common prefix. In that case, extract the first word of 'name' (the part before the first underscore), use it as the grouping variable, and filter as before:

library(stringr)
df %>%
group_by(grp = word(name, 1, sep="_")) %>%
filter(!str_detect(first(name), "_(number|slider)"))

The ungroup step remains the same as in the previous version.

If a prefix can repeat in non-adjacent positions and each run should be treated as a separate block, use rleid from data.table to create the grouping variable:

# data.table::rleid is called with its namespace so data.table need not be attached
df %>%
  group_by(grp = data.table::rleid(word(name, 1, sep = "_"))) %>%
  filter(!str_detect(first(name), "_(number|slider)"))
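The difference between grouping by prefix and grouping by runs of a prefix is easiest to see on a tiny vector. This is only a sketch; the names below are made up for illustration.

```r
library(stringr)

# Hypothetical names: the "age" prefix appears in two non-adjacent runs.
name <- c("age_number", "age_text", "sex_text", "age_slider")

# Grouping by prefix alone puts rows 1, 2 and 4 in the same group.
prefix <- word(name, 1, sep = "_")
print(prefix)
# "age" "age" "sex" "age"

# rleid() starts a new id whenever the value changes, so row 4 is its own block.
run_id <- data.table::rleid(prefix)
print(run_id)
# 1 1 2 3
```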

group_by and keep all groups that do not contain a specific value, and filter where the value is present

If there could be multiple Yes values:

df %>%
group_by(Code) %>%
slice(if(all(Inst != "Yes")) 1:n() else which(Inst == "Yes"))

  Code  Inst
  <chr> <chr>
1 a     Yes
2 b     No
3 b     No
4 b     No
5 b     No
6 b     No
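As an alternative sketch (not the approach used above), the same rule can be written with a grouped filter(). The data frame here is hypothetical and assumes Inst has no missing values.

```r
library(dplyr)

# Hypothetical data: group "a" has a "Yes" row, group "b" has none.
df <- tibble(
  Code = c("a", "a", "b", "b"),
  Inst = c("Yes", "No", "No", "No")
)

result <- df %>%
  group_by(Code) %>%
  # Keep every row when the group has no "Yes"; otherwise keep only the "Yes" rows.
  filter(all(Inst != "Yes") | Inst == "Yes") %>%
  ungroup()

print(result)
```

The all() call returns a single value per group, which filter() recycles across the rows of that group.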

Considering the updated question:

df %>%
mutate(Date = as.Date(Date, format = "%Y-%m-%d")) %>%
group_by(Code) %>%
slice(if(all(Inst != "Yes")) 1:n() else which(Inst == "Yes")) %>%
filter(Date == min(Date))

  Code  Inst  Date
  <chr> <chr> <date>
1 a     Yes   2021-01-01
2 b     No    2021-01-06
3 b     No    2021-01-06
4 b     No    2021-01-06

filter() (dplyr) does not distinguish between character and number?

In R, the expression 20 == "20" is valid, though some (coming from other programming languages) might consider that a little "sloppy". When it is evaluated, R coerces the numeric 20 to the string "20" and compares the two as character. This silent coercion can be good (useful and flexible), but it can also cause unintended, undesired, and/or surprising results. (The fact that it's silent is what I dislike about it, but convenience is convenience.)
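A few base-R comparisons illustrate the coercion and where it can surprise. The variable names are only there for illustration.

```r
# A number compared with a string is converted to character first.
same_value   <- 20 == "20"          # TRUE
same_object  <- identical(20, "20") # FALSE: identical() also compares type
with_decimal <- 20 == "20.0"        # FALSE: "20" and "20.0" differ as strings
string_order <- 9 < "10"            # FALSE: compared as strings, "9" sorts after "10"

print(c(same_value, same_object, with_decimal, string_order))
```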

If you want to be perfectly clear about your comparison, you can test for class as well. In your example, you show 20 which is numeric and not technically integer (which would be 20L), but you can shape the precision of the conditional to your own tastes:

filter(data, is.numeric(depth_m) & depth_m == 20)

If the column is character, the comparison still coerces the 20 to "20", but because is.numeric(.) is FALSE, the combined condition is FALSE as well. Realize that the specificity of that test is absolute: if the column is indeed character, then you will always get zero rows, which may not be what you want. If instead you want to remove non-20 rows only when the column is numeric, then perhaps

filter(data, !is.numeric(depth_m) | depth_m == 20)

This goes down the dizzying logic of "if it is not numeric, then it obviously cannot truly be 20, so keep it ... but if it is numeric, make sure it is definitely 20". Of course, we run into the premise here that there is no way that one portion of the column can be numeric while another cannot, so ... perhaps that's over-indulging the specificity of filtering.

dplyr filter statement not in expression from a data.frame

Actually, I got the answer: wrap the week_e data.frame in unlist() so that %in% compares against a plain vector instead of a data.frame.

df1 <- data.frame(week = c(1, 2, 3, 4, 5, 6), sales = c(10, 24, 23, 54, 65, 45))
week_e <- unlist(data.frame(week = c(2, 5)))

df1 %>%
filter(!week %in% week_e)

week sales
1 10
3 23
4 54
6 45
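Two other ways to get the same rows, sketched here as alternatives rather than as the method above: compare against the column itself, or use anti_join(). Both keep week_e as a data.frame.

```r
library(dplyr)

df1 <- data.frame(week = c(1, 2, 3, 4, 5, 6), sales = c(10, 24, 23, 54, 65, 45))
week_e <- data.frame(week = c(2, 5))

# Compare against the column vector, not the data frame itself.
result <- df1 %>%
  filter(!week %in% week_e$week)

# anti_join() drops the rows of df1 that have a match in week_e.
result_join <- anti_join(df1, week_e, by = "week")

print(result)
```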

dplyr filter keep the NAs AND OR conditions

Use | instead of &. In filter, multiple expressions separated by commas are combined with &, and no value can be both NA and not equal to 182, so that version returns no rows.

library(dplyr)
test_data_1 %>%
filter(is.na(Art) | Art != 182)

-output

# A tibble: 8 × 1
Art
<dbl>
1 188
2 NA
3 NA
4 140
5 NA
6 NA
7 NA
8 NA

The second part of the question is with %in%. We can use | again

test_data_1 %>%
filter(Art %in% c(140,188) | is.na(Art))
# A tibble: 8 × 1
Art
<dbl>
1 188
2 NA
3 NA
4 140
5 NA
6 NA
7 NA
8 NA

NOTE: By default, filter drops rows where the condition evaluates to NA. There is no na.rm argument in filter, so NA rows have to be kept explicitly with is.na().
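The reason is visible in base R: comparing NA with anything gives NA, not TRUE or FALSE. A minimal sketch with a made-up vector:

```r
# Hypothetical values standing in for the Art column.
art <- c(188, NA, 182, 140)

# The NA element compares to NA, which filter() would treat as "drop".
plain <- art != 182
print(plain)
# TRUE NA FALSE TRUE

# Adding is.na() turns the NA element into TRUE, so it is kept.
with_na <- is.na(art) | art != 182
print(with_na)
# TRUE TRUE FALSE TRUE
```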

Why won't filter() work for filtering out multiple rows that don't equal certain characters?

The error here is just in your logic: by allowing everything that isn't A and everything that isn't B, you ensure that all items are kept. This is because anything that's A isn't B, so it's kept, and anything that's B isn't A, so it's kept too. A better way to handle this would be to create a vector, say, exclusions, of all the values of nests_2020 (note that you switched from nests_2021 to nests_2020 in your question) you'd like to exclude, then use something like filter(szb4, !(nests_2020 %in% exclusions)).
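A small sketch of both versions side by side. The data frame is a hypothetical stand-in for szb4, and the values "A" to "D" are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for szb4.
szb4 <- tibble(nests_2020 = c("A", "B", "C", "D"))
exclusions <- c("A", "B")

# Every value differs from "A" or from "B", so nothing is removed.
kept_all <- szb4 %>%
  filter(nests_2020 != "A" | nests_2020 != "B")

# Negating %in% removes exactly the listed values.
kept_some <- szb4 %>%
  filter(!(nests_2020 %in% exclusions))

print(kept_some)
```

Writing the condition as nests_2020 != "A" & nests_2020 != "B" would also work, but the exclusions vector scales better as the list grows.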

Filter in a dplyr group only when the condition is met else do not

Actually, I found the answer in another related question.

This uses a data.table one liner which in my case was:

library(data.table)

test <- setDT(test)[, if(any(is.na(stamp_score))) .SD[is.na(stamp_score)] else .SD, .(hit, indx)]

Essentially, this code reduces a group to its NA rows only if the group has an NA in the "stamp_score" column; otherwise it keeps the whole group.
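For readers who prefer to stay in dplyr, a roughly equivalent sketch is shown below. It is an assumption-laden translation, not the solution used above, and the data are invented.

```r
library(dplyr)

# Hypothetical data: group (1, 1) has an NA score, group (2, 1) has none.
test <- tibble(
  hit = c(1, 1, 2, 2),
  indx = c(1, 1, 1, 1),
  stamp_score = c(NA, 0.5, 0.7, 0.9)
)

result <- test %>%
  group_by(hit, indx) %>%
  # No NA in the group: keep every row. Otherwise: keep only the NA rows.
  filter(!any(is.na(stamp_score)) | is.na(stamp_score)) %>%
  ungroup()

print(result)
```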

Thanks to everyone who tried to help and also helped me improve my question over time.


