Subset dataframe by multiple logical conditions of rows to remove
The ! should be around the outside of the statement:
data[!(data$v1 %in% c("b", "d", "e")), ]
v1 v2 v3 v4
1 a v d c
2 a v d d
5 c k d c
6 c r p g
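As a quick check of the answer above, the sketch below rebuilds only the v1 column from the printed output and confirms that negating the whole %in% test gives the same rows as chaining != comparisons. The values in rows 3 and 4 are assumptions, since the removed rows are not shown.

```r
# Toy column: rows 1, 2, 5, 6 follow the printed output; rows 3 and 4 are assumed.
data <- data.frame(v1 = c("a", "a", "b", "d", "c", "c"))

# Negate the whole membership test.
kept_in <- data[!(data$v1 %in% c("b", "d", "e")), , drop = FALSE]

# Same filter written as chained inequality tests.
kept_ne <- data[data$v1 != "b" & data$v1 != "d" & data$v1 != "e", , drop = FALSE]

print(rownames(kept_in))
```

The %in% form scales better as the list of values to exclude grows.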
Remove rows in df using multiple conditions in R
You can use subset():
df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
month = rep(c("March", "October"), each = 1),
site = rep(c("1", "2", "3", "4", "5"), each = 2),
common_name = rep(c("Tuna", "shark"), each = 1),
num = sample(x = 0:2, size = 20, replace = TRUE))
subset(df1, !(site == "1" & year == 2019 & month == "March"))
#> year month site common_name num
#> 2 2019 October 1 shark 0
#> 3 2019 March 2 Tuna 1
#> 4 2019 October 2 shark 0
#> 5 2019 March 3 Tuna 0
#> 6 2019 October 3 shark 0
#> 7 2019 March 4 Tuna 2
#> 8 2019 October 4 shark 2
#> 9 2019 March 5 Tuna 0
#> 10 2019 October 5 shark 2
#> 11 2020 March 1 Tuna 1
#> 12 2020 October 1 shark 1
#> 13 2020 March 2 Tuna 2
#> 14 2020 October 2 shark 2
#> 15 2020 March 3 Tuna 1
#> 16 2020 October 3 shark 0
#> 17 2020 March 4 Tuna 1
#> 18 2020 October 4 shark 0
#> 19 2020 March 5 Tuna 0
#> 20 2020 October 5 shark 2
Created on 2022-05-31 by the reprex package (v2.0.1)
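The negated condition can also be written with De Morgan's law. This is a minimal sketch on a smaller, deterministic stand-in for df1 (hypothetical data with no random column) that checks the two forms select the same rows.

```r
# Deterministic toy data (hypothetical; smaller than df1 above).
df1 <- data.frame(year  = rep(c(2019, 2020), each = 4),
                  month = rep(c("March", "October"), times = 4),
                  site  = rep(c("1", "2"), each = 2, times = 2))

# Drop rows matching all three conditions at once.
a <- subset(df1, !(site == "1" & year == 2019 & month == "March"))

# De Morgan: keep rows that fail at least one of the conditions.
b <- subset(df1, site != "1" | year != 2019 | month != "March")

nrow(df1) - nrow(a)  # number of rows removed
```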
Removing rows from a data set with respect to multiple conditions in R
Try:
newdata <- subset(df, comorbidity1 != 10 | date_of_comorbidity >= date_of_birth)
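A sketch of why the OR condition works, assuming the goal was to remove rows where comorbidity1 is 10 and the comorbidity date precedes the date of birth (the original question is not shown, so this reading and the data are assumptions).

```r
# Hypothetical data; column names follow the answer above.
df <- data.frame(
  comorbidity1        = c(10, 10, 3),
  date_of_comorbidity = as.Date(c("1990-01-01", "2005-06-01", "1980-01-01")),
  date_of_birth       = as.Date(c("2000-01-01", "2000-01-01", "2000-01-01"))
)

# Rows to remove: comorbidity1 is 10 AND the comorbidity predates birth.
bad <- df$comorbidity1 == 10 & df$date_of_comorbidity < df$date_of_birth

# Negating that test gives the OR condition used in subset().
newdata <- subset(df, comorbidity1 != 10 | date_of_comorbidity >= date_of_birth)
```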
R: remove a subset of a dataframe from the original one with multiple conditions using for loop
Using dplyr:
library(dplyr)
df %>%
group_by(Depo) %>%
filter((sum(Sales) > max(Sales)) & (sum(Sales == 0) < (0.8 * n())))
# The same filter can be written in negated form (De Morgan's law):
# filter(!((sum(Sales) <= max(Sales)) | (sum(Sales == 0) >= (0.8 * n()))))
The same logic can also be implemented in base R:
subset(df, as.logical(ave(Sales, Depo, FUN = function(x)
(sum(x) > max(x)) & (sum(x == 0) < (0.8 * length(x))))))
and data.table:
library(data.table)
setDT(df)[, .SD[(sum(Sales) > max(Sales)) & (sum(Sales == 0) < (0.8 * .N))], Depo]
data
df <- structure(list(Date = c("2020-01", "2020-02", "2020-03", "2020-04",
"2020-01", "2020-02", "2020-03", "2020-04"), Sales = c(100L,
125L, 0L, 0L, 0L, 0L, 0L, 5L), Depo = c("ABC", "ABC", "ABC",
"ABC", "BBC", "BBC", "BBC", "BBC")), class = "data.frame", row.names =c(NA, -8L))
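To see which groups pass the two conditions, the sketch below computes one flag per Depo with tapply() in base R. It is an illustrative alternative rather than one of the approaches above, and it re-declares the same data in a compact form.

```r
# Same data as in the answer above, written compactly.
df <- data.frame(
  Date  = rep(c("2020-01", "2020-02", "2020-03", "2020-04"), times = 2),
  Sales = c(100L, 125L, 0L, 0L, 0L, 0L, 0L, 5L),
  Depo  = rep(c("ABC", "BBC"), each = 4)
)

# One TRUE/FALSE per Depo: the total exceeds the single largest sale,
# and fewer than 80% of the rows are zero.
keep <- tapply(df$Sales, df$Depo, function(x) {
  sum(x) > max(x) && sum(x == 0) < 0.8 * length(x)
})

print(keep)

# Keep only the rows belonging to groups that passed.
result <- df[df$Depo %in% names(keep)[keep], ]
```

Here BBC fails the first condition because its total equals its largest single sale.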
Subset data frame based on multiple conditions
Logical index:
d <- d[!(d$A == "B" & d$E == 0), ]
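One caveat with a plain logical index is missing values. The sketch below uses made-up data with an NA in E to show the effect, along with a %in%-based variant that avoids it (the variant is an illustration, not part of the answer above).

```r
# Hypothetical data with a missing value in E.
d <- data.frame(A = c("B", "B", "C", "B"), E = c(0, 1, 0, NA))

# Plain logical index: the NA in E yields an all-NA row in the result.
plain <- d[!(d$A == "B" & d$E == 0), ]

# %in% never returns NA, so the row with the missing E is simply kept.
safe <- d[!(d$A %in% "B" & d$E %in% 0), ]

print(plain)
print(safe)
```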
How to remove rows based on multiple conditions
You can remove rows where 'Death' occurs on row number 1 in each group.
library(dplyr)
df %>%
group_by(id) %>%
filter(!(row_number() == 1 & ConditionII == 'Death'))
# id ConditionI ConditionII
# <chr> <chr> <chr>
#1 B 2018-01-01 Alive
#2 B 2018-01-15 Alive
#3 B 2018-01-20 Death
#4 C 2018-02-01 Alive
#5 C 2018-02-1 Alive
#6 E 2018-04-01 Alive
#7 E 2018-04-10 Death
Same logic using data.table:
library(data.table)
setDT(df)[, .SD[!(seq_len(.N) == 1 & ConditionII == 'Death')], id]
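A base R version of the same idea, sketched on hypothetical data shaped like the example above. It relies on !duplicated() to flag the first row of each id, which is an alternative to row_number() rather than the method shown in the answer.

```r
# Hypothetical data shaped like the example above.
df <- data.frame(
  id          = c("A", "B", "B", "B", "C"),
  ConditionII = c("Death", "Alive", "Alive", "Death", "Alive")
)

# !duplicated(id) is TRUE on the first occurrence of each id.
first_row <- !duplicated(df$id)

# Drop a row only if it is both the first of its group and a 'Death'.
result <- df[!(first_row & df$ConditionII == "Death"), ]

print(result)
```

The later 'Death' for id B is kept because it is not the first row of its group.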
How to combine multiple conditions to subset a data-frame using OR?
my.data.frame <- subset(data , V1 > 2 | V2 < 4)
An alternative solution that mimics the behavior of this function and would be more appropriate for inclusion within a function body:
new.data <- data[ which( data$V1 > 2 | data$V2 < 4) , ]
Some people criticize the use of which as unnecessary, but it does prevent NA values from returning unwanted rows. The equivalent of the two options above without which (i.e., not returning all-NA rows wherever the condition evaluates to NA) would be:
new.data <- data[ !is.na(data$V1 > 2 | data$V2 < 4) & ( data$V1 > 2 | data$V2 < 4) , ]
Note: I want to thank the anonymous contributor who attempted to fix the error in the code immediately above, a fix that was rejected by the moderators. There was actually an additional error that I noticed while correcting the first one. The clause that checks for NA values has to evaluate to FALSE for the rows that should be dropped, since ...
> NA & 1
[1] NA
> 0 & NA
[1] FALSE
An NA combined with TRUE using '&' stays NA, whereas an NA combined with FALSE gives FALSE. The order of the operands does not change the result.
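A small sketch on made-up data to check how NA propagates through these variants. The data frame and the helper variable cond are assumptions introduced for illustration.

```r
# Hypothetical data with missing values in V1.
data <- data.frame(V1 = c(3, NA, NA, 1), V2 = c(5, 3, 5, 5))
cond <- data$V1 > 2 | data$V2 < 4   # TRUE, TRUE, NA, FALSE

with_na   <- data[cond, ]                   # the NA produces an all-NA row
via_which <- data[which(cond), ]            # which() drops NA positions
via_isna  <- data[!is.na(cond) & cond, ]    # explicit NA guard
via_sub   <- subset(data, V1 > 2 | V2 < 4)  # subset() treats NA as FALSE

# '&' with NA: the outcome depends on the other value, not on operand order.
na_table <- c(NA & TRUE, TRUE & NA, NA & FALSE, FALSE & NA)
print(na_table)
```

Note that row 2 is kept by every variant even though V1 is NA there, because NA | TRUE is TRUE.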
Remove rows by multiple logical conditions (rstudio)
You can try subset + rowSums like below:
subset(df, !(rowSums(df > 30) >= 8))
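To illustrate what rowSums() counts here, a minimal sketch on a hypothetical three-column data frame. The cutoff is 2 columns instead of 8 so the example stays small.

```r
# Hypothetical numeric data; the cutoff is 2 columns here instead of 8.
df <- data.frame(a = c(10, 40, 50), b = c(20, 35, 10), c = c(5, 31, 45))

# Number of values above 30 in each row.
n_high <- rowSums(df > 30)

# Keep rows with fewer than 2 such values.
result <- subset(df, !(n_high >= 2))

print(n_high)
print(result)
```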