Subset Dataframe by Multiple Logical Conditions of Rows to Remove

The ! should wrap the entire condition, so the whole membership test is negated:

data[!(data$v1 %in% c("b", "d", "e")), ]

v1 v2 v3 v4
1 a v d c
2 a v d d
5 c k d c
6 c r p g
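As a minimal sketch of the same idea, the snippet below uses a small hypothetical data frame (the `data` object from the question is not reproduced here, so the values are invented). It shows that negating the whole `%in%` test gives the same rows as chaining three inequalities with `&`.

```r
# Hypothetical data; only the v1 column matters for the filter.
data <- data.frame(v1 = c("a", "a", "b", "d", "c", "c", "e"),
                   v2 = 1:7,
                   stringsAsFactors = FALSE)

# Negate the whole membership test to drop rows whose v1 is "b", "d" or "e".
kept <- data[!(data$v1 %in% c("b", "d", "e")), ]

# The same rows, written as three inequalities joined with &.
kept_alt <- data[data$v1 != "b" & data$v1 != "d" & data$v1 != "e", ]

print(kept)
```

The original row names (1, 2, 5, 6) are preserved, which makes it easy to see which rows were dropped.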

Remove rows in df using multiple conditions in R

You can use subset():

df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
                  month = rep(c("March", "October"), each = 1),
                  site = rep(c("1", "2", "3", "4", "5"), each = 2),
                  common_name = rep(c("Tuna", "shark"), each = 1),
                  num = sample(x = 0:2, size = 20, replace = TRUE))

subset(df1, !(site == "1" & year == 2019 & month == "March"))
#> year month site common_name num
#> 2 2019 October 1 shark 0
#> 3 2019 March 2 Tuna 1
#> 4 2019 October 2 shark 0
#> 5 2019 March 3 Tuna 0
#> 6 2019 October 3 shark 0
#> 7 2019 March 4 Tuna 2
#> 8 2019 October 4 shark 2
#> 9 2019 March 5 Tuna 0
#> 10 2019 October 5 shark 2
#> 11 2020 March 1 Tuna 1
#> 12 2020 October 1 shark 1
#> 13 2020 March 2 Tuna 2
#> 14 2020 October 2 shark 2
#> 15 2020 March 3 Tuna 1
#> 16 2020 October 3 shark 0
#> 17 2020 March 4 Tuna 1
#> 18 2020 October 4 shark 0
#> 19 2020 March 5 Tuna 0
#> 20 2020 October 5 shark 2

Created on 2022-05-31 by the reprex package (v2.0.1)

Removing rows from a data set with respect to multiple conditions in R

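Try subset(), keeping every row where either condition holds. This removes exactly the rows where comorbidity1 is 10 and date_of_comorbidity is earlier than date_of_birth:

newdata <- subset(df, comorbidity1 != 10 | date_of_comorbidity >= date_of_birth)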
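The OR form can look surprising for a "remove rows" task, so here is a small sketch on hypothetical data (column names follow the call above, the values are invented) checking that it keeps the same rows as negating the removal condition directly.

```r
# Hypothetical data; values are invented for illustration.
df <- data.frame(
  comorbidity1        = c(10, 10, 3, 7),
  date_of_comorbidity = as.Date(c("1990-01-01", "2005-06-01",
                                  "1980-01-01", "2010-03-15")),
  date_of_birth       = as.Date(rep("2000-01-01", 4))
)

# Rows to remove: comorbidity coded 10 AND dated before birth.
remove_me <- df$comorbidity1 == 10 & df$date_of_comorbidity < df$date_of_birth

# Keeping the negation of that test ...
kept_a <- df[!remove_me, ]
# ... selects the same rows as the OR form passed to subset().
kept_b <- subset(df, comorbidity1 != 10 | date_of_comorbidity >= date_of_birth)

print(rownames(kept_a))
print(rownames(kept_b))
```

Only the first row satisfies both removal conditions, so both versions keep rows 2 to 4.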

R: remove a subset of a dataframe from the original one with multiple conditions using for loop

Using dplyr:

library(dplyr)

df %>%
  group_by(Depo) %>%
  filter((sum(Sales) > max(Sales)) & (sum(Sales == 0) < (0.8 * n())))
# The same condition, rewritten as a negation using De Morgan's laws:
# filter(!((sum(Sales) <= max(Sales)) | (sum(Sales == 0) >= (0.8 * n()))))

The same logic can also be implemented in base R:

subset(df, as.logical(ave(Sales, Depo, FUN = function(x)
  (sum(x) > max(x)) & (sum(x == 0) < (0.8 * length(x))))))

and data.table :

library(data.table)

setDT(df)[, .SD[(sum(Sales) > max(Sales)) & (sum(Sales == 0) < (0.8 * .N))], Depo]

data

df <- structure(list(Date = c("2020-01", "2020-02", "2020-03", "2020-04", 
"2020-01", "2020-02", "2020-03", "2020-04"), Sales = c(100L,
125L, 0L, 0L, 0L, 0L, 0L, 5L), Depo = c("ABC", "ABC", "ABC",
"ABC", "BBC", "BBC", "BBC", "BBC")), class = "data.frame", row.names =c(NA, -8L))

Subset data frame based on multiple conditions

Use a logical index that negates the combined condition:

d <- d[!(d$A == "B" & d$E == 0), ]

How to remove rows based on multiple conditions

You can remove rows where 'Death' occurs on row number 1 in each group.

library(dplyr)

df %>%
group_by(id) %>%
filter(!(row_number() == 1 & ConditionII == 'Death'))

# id ConditionI ConditionII
# <chr> <chr> <chr>
#1 B 2018-01-01 Alive
#2 B 2018-01-15 Alive
#3 B 2018-01-20 Death
#4 C 2018-02-01 Alive
#5 C 2018-02-1 Alive
#6 E 2018-04-01 Alive
#7 E 2018-04-10 Death

Same logic using data.table :

library(data.table)
setDT(df)[, .SD[!(seq_len(.N) == 1 & ConditionII == 'Death')], id]
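A base R version is not given above, but the same filter can be sketched with ave(). The data frame here is hypothetical (the question's df is not reproduced), and `pos` is a helper name introduced only for this illustration.

```r
# Hypothetical data; values are invented for illustration.
df <- data.frame(
  id          = c("A", "B", "B", "B", "C", "C"),
  ConditionII = c("Death", "Alive", "Alive", "Death", "Death", "Alive"),
  stringsAsFactors = FALSE
)

# Position of each row within its id group (1 = first row of the group).
pos <- ave(seq_along(df$id), df$id, FUN = seq_along)

# Drop a row only when it is the first of its group AND records 'Death'.
result <- df[!(pos == 1 & df$ConditionII == "Death"), ]
print(result)
```

In this sketch the 'Death' on the third row of group B is kept, because it is not the first row of its group.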

How to combine multiple conditions to subset a data-frame using OR?

my.data.frame <- subset(data , V1 > 2 | V2 < 4)

An alternative solution that mimics the behavior of this function and would be more appropriate for inclusion within a function body:

new.data <- data[ which( data$V1 > 2 | data$V2 < 4) , ]

Some people criticize the use of which() as unnecessary, but it does prevent NA values from returning unwanted rows. The equivalent of the two options demonstrated above without which() (i.e., not returning NA rows when the condition evaluates to NA) would be:

new.data <- data[ !is.na(data$V1 > 2 | data$V2 < 4) & (data$V1 > 2 | data$V2 < 4), ]
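To see what difference the NA handling makes, here is a short sketch on hypothetical data comparing subset(), which(), and a bare logical index. The object names `cond`, `by_subset`, `by_which`, and `by_raw` are introduced only for this example.

```r
# Hypothetical data with missing values in V1.
data <- data.frame(V1 = c(3, NA, 1, NA), V2 = c(5, 5, 2, 3))

# The condition evaluates to TRUE, NA, TRUE, TRUE for these four rows.
cond <- data$V1 > 2 | data$V2 < 4

by_subset <- subset(data, V1 > 2 | V2 < 4)  # an NA condition is treated as FALSE
by_which  <- data[which(cond), ]            # which() skips NA positions
by_raw    <- data[cond, ]                   # an NA index yields a row of NAs

print(nrow(by_subset))
print(nrow(by_which))
print(by_raw)
```

Row 4 is kept by all three approaches even though V1 is NA there, because `NA | TRUE` is TRUE. Only row 2, where the condition itself is NA, behaves differently.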

Note: I want to thank the anonymous contributor that attempted to fix the error in the code immediately above, a fix that got rejected by the moderators. There was actually an additional error that I noticed when I was correcting the first one. The conditional clause that checks for NA values needs to be first if it is to be handled as I intended, since ...

> NA & 1
[1] NA
> 0 & NA
[1] FALSE

With '&', an NA operand yields FALSE when the other operand is FALSE, and NA otherwise.
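A quick way to check how `&` treats NA is to evaluate all four combinations. In this sketch, swapping the operands gives the same value each time, so what decides the result is whether the non-NA operand is FALSE. The name `res` is introduced only for this example.

```r
# Three-valued logic for & in R: FALSE dominates, otherwise NA propagates.
res <- c(NA & TRUE,    # NA
         NA & FALSE,   # FALSE
         TRUE & NA,    # NA
         FALSE & NA)   # FALSE
print(res)
```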

Remove rows by multiple logical conditions (rstudio)

You can combine subset() with rowSums(). The call below drops rows in which at least 8 values exceed 30:

subset(df, !(rowSums(df > 30) >= 8))
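As an illustration, the sketch below applies the same idea to a hypothetical all-numeric data frame with 8 columns (the question's df is not shown, so the shape and values are assumptions). `n_over` is a helper name used only here.

```r
# Hypothetical data: 3 rows, 8 numeric columns.
df <- as.data.frame(matrix(c(rep(40, 8),      # row 1: all 8 values above 30
                             rep(40, 7), 10,  # row 2: 7 values above 30
                             rep(5, 8)),      # row 3: no values above 30
                           nrow = 3, byrow = TRUE))

# Count, per row, how many values exceed 30.
n_over <- rowSums(df > 30)

# Drop rows with at least 8 such values; keep the rest.
kept <- subset(df, !(n_over >= 8))
print(n_over)
print(nrow(kept))
```

Writing the test as `rowSums(df > 30) < 8` gives the same rows when there are no NA values; the negated form simply mirrors the "rows to remove" wording.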

