How to combine multiple conditions to subset a data-frame using OR?
my.data.frame <- subset(data , V1 > 2 | V2 < 4)
An alternative solution that mimics the behavior of this function and would be more appropriate for inclusion within a function body:
new.data <- data[ which( data$V1 > 2 | data$V2 < 4) , ]
Some people criticize the use of which as not needed, but it does prevent NA values from throwing back unwanted results. The equivalent of the two options demonstrated above without which (i.e. not returning NA rows for any NAs in V1 or V2) would be:
new.data <- data[ !is.na(data$V1 > 2 | data$V2 < 4) & ( data$V1 > 2 | data$V2 < 4) , ]
Note: I want to thank the anonymous contributor who attempted to fix the error in the code immediately above, a fix that got rejected by the moderators. There was actually an additional error that I noticed when I was correcting the first one. I have placed the clause that checks for NA values first; what makes it work is that it evaluates to FALSE for the NA cases, since ...
> NA & 1
[1] NA
> 0 & NA
[1] FALSE
The order of the operands does not change the result of '&' (NA & 0 is also FALSE); what matters is whether the non-NA operand is FALSE.
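To make the NA behavior concrete, here is a minimal sketch on a small made-up data frame (the values are hypothetical, chosen so that the condition evaluates to NA for some rows). It compares subset, which, and a bare logical index:

```r
# Hypothetical data with missing values in both columns
data <- data.frame(V1 = c(3, NA, NA, 1, NA),
                   V2 = c(5, 3, 5, 9, NA))

# Row-wise condition: TRUE TRUE NA FALSE NA
cond <- data$V1 > 2 | data$V2 < 4

by_subset <- subset(data, V1 > 2 | V2 < 4)  # treats NA as FALSE
by_which  <- data[which(cond), ]            # which() drops NA positions
by_plain  <- data[cond, ]                   # each NA in the index yields an all-NA row

nrow(by_subset)  # 2
nrow(by_which)   # 2
nrow(by_plain)   # 4
```

The bare logical index returns one all-NA row for every NA in the condition, which is exactly what subset and which avoid.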
Subset dataframe into multiple based on multiple conditions
Remove the numbers from the MS column and use the result in split to split one dataframe into a list of dataframes based on the pattern.
result <- split(D_MtC, sub('\\d+', '', D_MtC$MS))
where the output from sub is:
sub('\\d+', '', D_MtC$MS)
#[1] "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl"
# "bl" "bu" "bu" "bu" "bu" "bu" "bu" "bu" "bu"
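As a rough illustration of the same idea on data you can run, the sketch below uses a hypothetical stand-in for D_MtC. The MS values and the value column are invented; the only assumption carried over is that MS combines a letter code with digits.

```r
# Hypothetical stand-in for D_MtC: MS holds a letter prefix followed by digits
D_MtC <- data.frame(MS = c("bl1", "bl2", "bu1", "bu2", "bu3"),
                    value = 1:5)

# Strip the first run of digits to get the grouping key
key <- sub("\\d+", "", D_MtC$MS)

# One data frame per distinct key, returned as a named list
result <- split(D_MtC, key)

names(result)    # "bl" "bu"
nrow(result$bl)  # 2
nrow(result$bu)  # 3
```

Each element of result is an ordinary data frame, so result$bl or result[["bu"]] can be used directly afterwards.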
Subsetting based on multiple conditions in R
You can use & (and) to combine multiple conditions.
Blank_X <- Q4[is.na(Q4$X) & is.na(Q4$Y),]
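For a quick check of how & differs from | here, this small sketch uses an invented Q4 (the four rows are hypothetical) and counts the rows each condition keeps:

```r
# Hypothetical Q4 with missing values in X and Y
Q4 <- data.frame(X = c(1, NA, NA, 4),
                 Y = c(NA, NA, 2, 5))

# & keeps rows where both columns are missing
both_missing <- Q4[is.na(Q4$X) & is.na(Q4$Y), ]

# | keeps rows where at least one column is missing
either_missing <- Q4[is.na(Q4$X) | is.na(Q4$Y), ]

nrow(both_missing)    # 1
nrow(either_missing)  # 3
```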
pandas: multiple conditions while indexing data frame - unexpected behavior
As you can see, the AND operator drops every row in which at least one
value equals -1. On the other hand, the OR operator requires both
values to be equal to -1 to drop them.
That's right. Remember that you're writing the condition in terms of what you want to keep, not in terms of what you want to drop. For df1:
df1 = df[(df.a != -1) & (df.b != -1)]
You're saying "keep the rows in which df.a isn't -1 and df.b isn't -1", which is the same as dropping every row in which at least one value is -1.
For df2:
df2 = df[(df.a != -1) | (df.b != -1)]
You're saying "keep the rows in which either df.a or df.b is not -1", which is the same as dropping rows where both values are -1.
PS: chained access like df['a'][1] = -1 can get you into trouble. It's better to get into the habit of using .loc and .iloc.
How to subset dataframe based on multiple conditions?
You should do:
reshape2::recast(df, Country + variable ~ Indicator)
subset a dataframe on multiple conditions in R
Here is a tidyverse solution. Separate into two data frames and bind the rows together.
library(tidyverse)
bind_rows(
  dat1 %>% select(patientId, ends_with("1")) %>% rename_all(str_remove, "1"),
  dat1 %>% select(patientId, ends_with("2")) %>% rename_all(str_remove, "2")
) %>%
  transmute(
    patientId,
    Hist = H,
    Sip = S,
    date = paste0(Month, "-", Year)
  ) %>%
  filter(
    Sip %in% 16:18,
    Hist %in% 80:81
  )
#> # A tibble: 6 x 4
#>   patientId  Hist   Sip date
#>       <int> <dbl> <dbl> <chr>
#> 1         1    81    16 09-2017
#> 2         2    80    17 08-2017
#> 3         5    80    17 08-2016
#> 4         6    81    18 05-2017
#> 5         3    81    16 05-2016
#> 6         5    80    16 05-2016
Subset a Data Frame based on Multiple Conditions
Here is a base R solution. grep tells the "Root" columns from the "Shoot" ones. Then apply loops return logical (row) indices, and which takes care of subsetting the data.frame.
Root_R1 = c("Root",1,2,3,4,5)
Root_R2 = c("Root",1,0,3,0,0)
Root_R3 = c("Root",1,0,3,0,0)
Shoot_R1 = c("Shoot",1,0,3,4,5)
Shoot_R2 = c("Shoot",0,0,31,4,5)
Shoot_R3 = c("Shoot",0,0,0,0,0)
df1 <- data.frame(Root_R1, Root_R2, Root_R3, Shoot_R1, Shoot_R2, Shoot_R3)
df1 <- df1[-1,]
df1[] <- lapply(df1, as.integer)
root <- grep("Root", names(df1))
shoot <- grep("Shoot", names(df1))
ok_root <- which(apply(df1[root], 1, \(x) sum(x > 0L) >= 2L))
ok_shoot <- which(apply(df1[shoot], 1, \(x) sum(x > 0L) >= 2L))
df1[ok_root, ]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2       1       1       1        1        0        0
#> 4       3       3       3        3       31        0
df1[ok_shoot, ]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4       3       3       3        3       31        0
#> 5       4       0       0        4        4        0
#> 6       5       0       0        5        5        0
Created on 2022-06-09 by the reprex package (v2.0.1)
Edit
Following a question in comments:
"And assuming I want to change the numerical value of the cut-off, I would change this part of the code (sum(x > 0L)) and if I wanted to change the number of rows that meet the cut-off, I would change this: >= 2L?"
Here is a function to solve the problem.
special_subset <- function(x, colpattern, cutoff = 0L, numrows = 2L) {
  i_cols <- grep(colpattern, names(x))
  ok <- which(apply(x[i_cols], 1, \(y) sum(y > cutoff) >= numrows))
  x[ok, ]
}
special_subset(df1, "Root")
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2       1       1       1        1        0        0
#> 4       3       3       3        3       31        0
special_subset(df1, "Shoot", cutoff = 1)
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4       3       3       3        3       31        0
#> 5       4       0       0        4        4        0
#> 6       5       0       0        5        5        0
Created on 2022-06-09 by the reprex package (v2.0.1)
Edit 2
To pass more than one colpattern to the function above, use an lapply loop.
In the two examples below, the first uses the native pipe operator introduced in R 4.1.0 and the second a standard lapply.
tissue_type <- c("Root", "Shoot")
tissue_type |>
  lapply(\(pat, data) special_subset(data, pat), data = df1)
#> [[1]]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2       1       1       1        1        0        0
#> 4       3       3       3        3       31        0
#>
#> [[2]]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4       3       3       3        3       31        0
#> 5       4       0       0        4        4        0
#> 6       5       0       0        5        5        0
lapply(tissue_type, \(pat, data) special_subset(data, pat), data = df1)
#> [[1]]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2       1       1       1        1        0        0
#> 4       3       3       3        3       31        0
#>
#> [[2]]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4       3       3       3        3       31        0
#> 5       4       0       0        4        4        0
#> 6       5       0       0        5        5        0
Created on 2022-06-17 by the reprex package (v2.0.1)
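As a possible variation on special_subset, the row-wise apply can be swapped for rowSums over a logical matrix. This is a sketch of a different technique, not the code above: the name special_subset_rs is made up for illustration, and df1 is rebuilt here directly as integers, so its row names run 1 to 5 instead of 2 to 6.

```r
# Same values as df1 above, built directly (row names are 1..5 here)
df1 <- data.frame(
  Root_R1  = c(1L, 2L, 3L, 4L, 5L),
  Root_R2  = c(1L, 0L, 3L, 0L, 0L),
  Root_R3  = c(1L, 0L, 3L, 0L, 0L),
  Shoot_R1 = c(1L, 0L, 3L, 4L, 5L),
  Shoot_R2 = c(0L, 0L, 31L, 4L, 5L),
  Shoot_R3 = c(0L, 0L, 0L, 0L, 0L)
)

# Hypothetical helper: same arguments as special_subset, vectorised with rowSums
special_subset_rs <- function(x, colpattern, cutoff = 0L, numrows = 2L) {
  i_cols <- grep(colpattern, names(x))
  # x[i_cols] > cutoff is a logical matrix; rowSums counts the TRUEs per row
  ok <- which(rowSums(x[i_cols] > cutoff) >= numrows)
  x[ok, , drop = FALSE]
}

rownames(special_subset_rs(df1, "Root"))               # "1" "3"
rownames(special_subset_rs(df1, "Shoot", cutoff = 1))  # "3" "4" "5"
```

Keeping which around the comparison means rows whose count is NA are dropped rather than returned as all-NA rows.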
Subset dataframe by multiple logical conditions of rows to remove
The ! should be around the outside of the statement:
data[!(data$v1 %in% c("b", "d", "e")), ]
  v1 v2 v3 v4
1  a  v  d  c
2  a  v  d  d
5  c  k  d  c
6  c  r  p  g
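A runnable sketch of the same negation, on an invented data frame (only the v1 values of the kept rows mirror the output above; everything else is hypothetical):

```r
# Hypothetical data; v1 values to remove are "b", "d" and "e"
data <- data.frame(v1 = c("a", "a", "b", "b", "c", "c", "d", "e"),
                   v2 = 1:8)
drop_values <- c("b", "d", "e")

# Negate the whole membership test to keep everything NOT in drop_values
kept <- data[!(data$v1 %in% drop_values), ]

# Same rows via subset()
kept_subset <- subset(data, !(v1 %in% drop_values))

kept$v1         # "a" "a" "c" "c"
rownames(kept)  # "1" "2" "5" "6"
```

Because %in% never returns NA, this form needs no extra NA handling.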
Conditional subsetting from dataframe with multiple conditions
If the OP wants to treat each run of adjacent identical 'group' values as a separate group:
library(dplyr)
library(data.table)
dat %>%
  group_by(grp = rleid(group)) %>%
  filter(all(2017:2019 %in% year), group == 0) %>%
  ungroup %>%
  select(-grp)
# A tibble: 3 x 2
#  group  year
#  <dbl> <int>
#1     0  2017
#2     0  2018
#3     0  2019
Or in base R with rle
grp <- with(rle(dat$group), rep(seq_along(values), lengths))
subset(dat, as.logical(ave(year, grp,
    FUN = function(x) all(2017:2019 %in% x))) & group == 0)
#  group year
#4     0 2017
#5     0 2018
#6     0 2019
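To see what the rle step produces, here is a minimal sketch with an invented dat (the eight rows are hypothetical, arranged so that the complete 2017 to 2019 run of group 0 falls on rows 4 to 6):

```r
# Hypothetical data: runs of group values, each with some years
dat <- data.frame(
  group = c(1, 1, 1, 0, 0, 0, 1, 0),
  year  = c(2017L, 2018L, 2019L, 2017L, 2018L, 2019L, 2017L, 2017L)
)

# Number each run of adjacent identical group values
r <- rle(dat$group)
grp <- rep(seq_along(r$values), r$lengths)
grp  # 1 1 1 2 2 2 3 4

# Flag runs that contain all of 2017, 2018 and 2019 (ave returns 1/0 here)
complete <- as.logical(ave(dat$year, grp,
                           FUN = function(x) all(2017:2019 %in% x)))

res <- dat[complete & dat$group == 0, ]
rownames(res)  # "4" "5" "6"
```

The last row also has group 0, but it forms its own run with only 2017, so it is not kept.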