Subset Dataframe by Multiple Logical Conditions of Rows to Remove

Subset dataframe by multiple logical conditions of rows to remove

The ! should go outside the whole parenthesized condition, so that it negates the result of %in%:

data[!(data$v1 %in% c("b", "d", "e")), ]

  v1 v2 v3 v4
1  a  v  d  c
2  a  v  d  d
5  c  k  d  c
6  c  r  p  g
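As a self-contained illustration of the same pattern, here is a minimal sketch. The data frame `toy` and its values are made up for the example (they are not the asker's data); the idea is to build the "rows to remove" condition first and then negate it as a whole.

```r
# Hypothetical data; only v1 matters for the filter.
toy <- data.frame(v1 = c("a", "a", "b", "d", "c", "c"),
                  v2 = c("v", "v", "n", "m", "k", "r"),
                  stringsAsFactors = FALSE)

# Condition describing the rows to remove.
drop <- toy$v1 %in% c("b", "d", "e")

# Negate the whole condition to keep everything else.
kept <- toy[!drop, ]
print(kept)
```

Because %in% returns FALSE rather than NA for missing values, the negated index stays free of NAs even if v1 contains them.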

Remove rows in df using multiple conditions in R

You can use subset():

df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
                  month = rep(c("March", "October"), each = 1),
                  site = rep(c("1", "2", "3", "4", "5"), each = 2),
                  common_name = rep(c("Tuna", "shark"), each = 1),
                  num = sample(x = 0:2, size = 20, replace = TRUE))

subset(df1, !(site == "1" & year == 2019 & month == "March"))
#>    year   month site common_name num
#> 2  2019 October    1       shark   0
#> 3  2019   March    2        Tuna   1
#> 4  2019 October    2       shark   0
#> 5  2019   March    3        Tuna   0
#> 6  2019 October    3       shark   0
#> 7  2019   March    4        Tuna   2
#> 8  2019 October    4       shark   2
#> 9  2019   March    5        Tuna   0
#> 10 2019 October    5       shark   2
#> 11 2020   March    1        Tuna   1
#> 12 2020 October    1       shark   1
#> 13 2020   March    2        Tuna   2
#> 14 2020 October    2       shark   2
#> 15 2020   March    3        Tuna   1
#> 16 2020 October    3       shark   0
#> 17 2020   March    4        Tuna   1
#> 18 2020 October    4       shark   0
#> 19 2020   March    5        Tuna   0
#> 20 2020 October    5       shark   2

Created on 2022-05-31 by the reprex package (v2.0.1)

Removing rows from a data set with respect to multiple conditions in R

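Try subset(), keeping the rows where at least one of the two conditions holds. By De Morgan's laws this drops the rows where comorbidity1 is 10 and date_of_comorbidity is earlier than date_of_birth:

newdata <- subset(df, comorbidity1 != 10 | date_of_comorbidity >= date_of_birth)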
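To see why keeping `A | B` matches removing `!A & !B`, the following sketch compares the two forms. The data frame `pat` and its values are hypothetical; only the column names are taken from the call above.

```r
# Hypothetical patient data; column names follow the subset() call above.
pat <- data.frame(
  comorbidity1        = c(10, 10, 3, 7),
  date_of_comorbidity = as.Date(c("1990-05-01", "1970-01-01",
                                  "1960-01-01", "2001-03-04")),
  date_of_birth       = as.Date(rep("1980-01-01", 4))
)

# Rows to remove: code 10 recorded before the date of birth.
bad <- pat$comorbidity1 == 10 & pat$date_of_comorbidity < pat$date_of_birth

# Form 1: negate the removal condition.
a <- pat[!bad, ]

# Form 2: keep rows where either opposite condition holds.
b <- subset(pat, comorbidity1 != 10 | date_of_comorbidity >= date_of_birth)

print(rownames(a))
print(rownames(b))
```

The two forms agree here because the data has no missing values; with NAs, subset() drops rows whose condition evaluates to NA, while a raw logical index does not.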

R: remove a subset of a dataframe from the original one with multiple conditions using for loop

Using dplyr :

library(dplyr)

df %>%
  group_by(Depo) %>%
  filter((sum(Sales) > max(Sales)) & (sum(Sales == 0) < (0.8 * n())))
  # Equivalently, written as the negation of the removal condition:
  # filter(!((sum(Sales) <= max(Sales)) | (sum(Sales == 0) >= (0.8 * n()))))

The same logic can also be implemented in base R :

subset(df, as.logical(ave(Sales, Depo, FUN = function(x) 
            (sum(x) > max(x)) & (sum(x == 0) < (0.8 * length(x))))))

and data.table :

library(data.table)

setDT(df)[, .SD[(sum(Sales) > max(Sales)) & (sum(Sales == 0) < (0.8 * .N))], Depo]

data

df <- structure(list(Date = c("2020-01", "2020-02", "2020-03", "2020-04", 
"2020-01", "2020-02", "2020-03", "2020-04"), Sales = c(100L, 
125L, 0L, 0L, 0L, 0L, 0L, 5L), Depo = c("ABC", "ABC", "ABC", 
"ABC", "BBC", "BBC", "BBC", "BBC")), class = "data.frame", row.names = c(NA, -8L))
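To make the group-level logic easier to inspect, here is a base R sketch that first computes one TRUE/FALSE value per Depo and then keeps the rows of the passing groups. The names `sales`, `keep_group` and `kept` are introduced only for this illustration, and the data is rebuilt so the snippet runs on its own.

```r
# Same values as the data above, rebuilt under a new (hypothetical) name.
sales <- data.frame(
  Date  = rep(c("2020-01", "2020-02", "2020-03", "2020-04"), times = 2),
  Sales = c(100L, 125L, 0L, 0L, 0L, 0L, 0L, 5L),
  Depo  = rep(c("ABC", "BBC"), each = 4),
  stringsAsFactors = FALSE
)

# One logical value per group: does the group pass both conditions?
keep_group <- tapply(sales$Sales, sales$Depo, function(x) {
  sum(x) > max(x) && sum(x == 0) < 0.8 * length(x)
})
print(keep_group)

# Look up each row's group result by name and keep the passing rows.
kept <- sales[as.vector(keep_group[as.character(sales$Depo)]), ]
print(kept)
```

With this data, ABC passes (total sales exceed the maximum and fewer than 80% of its months are zero) while BBC fails the first condition, since its total equals its maximum.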

Subset data frame based on multiple conditions

Use a logical index that negates the combined condition:

d <- d[!(d$A == "B" & d$E == 0), ]

How to remove rows based on multiple conditions

You can remove rows where 'Death' occurs on row number 1 in each group.

library(dplyr)

df %>%
  group_by(id) %>%
  filter(!(row_number() == 1 & ConditionII == 'Death'))

#  id    ConditionI ConditionII
#  <chr> <chr>      <chr>      
#1 B     2018-01-01 Alive      
#2 B     2018-01-15 Alive      
#3 B     2018-01-20 Death      
#4 C     2018-02-01 Alive      
#5 C     2018-02-1  Alive      
#6 E     2018-04-01 Alive      
#7 E     2018-04-10 Death

Same logic using data.table :

library(data.table)
setDT(df)[, .SD[!(seq_len(.N) == 1 & ConditionII == 'Death')], id]
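The same idea can be sketched in base R without extra packages. The data frame `pts` below is hypothetical (only the column names id and ConditionII come from the example above), and ave() with seq_along is one possible way to get each row's position within its group.

```r
# Hypothetical data; column names follow the example above.
pts <- data.frame(
  id          = c("A", "B", "B", "B", "C", "C"),
  ConditionII = c("Death", "Alive", "Alive", "Death", "Death", "Alive"),
  stringsAsFactors = FALSE
)

# Position of each row within its id group (1, 2, 3, ...).
pos <- ave(seq_along(pts$id), pts$id, FUN = seq_along)

# Drop a row only when it is first in its group AND records a death.
kept <- pts[!(pos == 1 & pts$ConditionII == "Death"), ]
print(kept)
```

A 'Death' in a later row of a group (row 4 here) is kept, because only the first position is tested.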

How to combine multiple conditions to subset a data-frame using OR?

my.data.frame <- subset(data , V1 > 2 | V2 < 4)

An alternative that mimics the behavior of subset() and is more appropriate for use inside a function body:

new.data <- data[ which( data$V1 > 2 | data$V2 < 4) , ]

Some people criticize the use of which as unnecessary, but it does prevent NA values from producing unwanted results. The equivalent of the two options demonstrated above without which (i.e., not returning all-NA rows when V1 or V2 contains NAs) tests the combined condition itself for NA:

 new.data <- data[ !is.na(data$V1 > 2 | data$V2 < 4) & ( data$V1 > 2 | data$V2 < 4) , ]

Note: I want to thank the anonymous contributor who attempted to fix the error in the code immediately above, a fix that was rejected by the moderators. There was actually an additional error that I noticed when I was correcting the first one. I put the clause that checks for NA values first; the guard works because a FALSE operand makes '&' return FALSE even when the other operand is NA:

> NA & 1
[1] NA
> 0 & NA
[1] FALSE

For the vectorized '&' the result does not depend on the order of the operands (NA & 0 is also FALSE); it depends on whether the non-NA operand is TRUE or FALSE.
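The difference between a raw logical index, which(), and an explicit NA guard can be checked on a small example. The data frame `dat` and its values are made up for this sketch.

```r
# Hypothetical data with missing values in V1.
dat <- data.frame(V1 = c(3, NA, 1, NA), V2 = c(5, 5, 2, 1))

# Combined condition: TRUE, NA, TRUE, TRUE
# (NA | FALSE is NA, but NA | TRUE is TRUE).
cond <- dat$V1 > 2 | dat$V2 < 4

raw_idx   <- dat[cond, ]                 # the NA index yields an all-NA row
which_idx <- dat[which(cond), ]          # which() drops NA positions
na_guard  <- dat[!is.na(cond) & cond, ]  # explicit guard, same rows as which()
by_subset <- subset(dat, V1 > 2 | V2 < 4)

print(nrow(raw_idx))
print(rownames(which_idx))
```

Note that row 4 is kept by all three NA-safe forms even though V1 is missing there, because V2 < 4 alone makes the condition TRUE.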

Remove rows by multiple logical conditions (rstudio)

You can try subset + rowSums like below

subset(df, !(rowSums(df > 30) >= 8))
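A minimal sketch of how the rowSums() count drives the filter, using a made-up data frame `m` with 3 rows and 8 numeric columns (the threshold 30 and the count 8 are taken from the call above; everything else is an assumption for illustration):

```r
# Hypothetical numeric data: 3 rows, 8 columns.
m <- as.data.frame(matrix(c(rep(40, 8),         # all eight values above 30
                            rep(c(40, 10), 4),  # four values above 30
                            rep(5, 8)),         # no value above 30
                          nrow = 3, byrow = TRUE))

# Per row, count how many columns exceed the threshold.
n_high <- rowSums(m > 30)
print(n_high)

# Remove rows where at least 8 columns exceed it.
kept <- m[!(n_high >= 8), ]
print(kept)
```

Comparing a data frame with a number gives a logical matrix, and rowSums() counts the TRUE values in each row, so several column-wise conditions collapse into one count per row.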
