Subset Data Frame Based on Multiple Conditions

How to combine multiple conditions to subset a data-frame using OR?

my.data.frame <- subset(data , V1 > 2 | V2 < 4)

An alternative solution that mimics the behavior of this function and would be more appropriate for inclusion within a function body:

new.data <- data[ which( data$V1 > 2 | data$V2 < 4) , ]

Some people criticize the use of which as unnecessary, but it does prevent NA values from returning unwanted rows. The equivalent of the two options demonstrated above without which (i.e., not returning NA rows for any NAs in V1 or V2) would be:

new.data <- data[ !is.na(data$V1 > 2 | data$V2 < 4) & ( data$V1 > 2 | data$V2 < 4) , ]

Note: I want to thank the anonymous contributor who attempted to fix the error in the code immediately above, a fix that was rejected by the moderators. There was an additional error that I noticed while correcting the first one. The clause that checks for NA values has to evaluate to FALSE for every row that should be dropped, since ...

> NA & 1
[1] NA
> 0 & NA
[1] FALSE

With '&', a FALSE operand gives FALSE even when the other operand is NA, on whichever side it appears; a TRUE operand paired with NA gives NA.
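To see the difference on concrete data, here is a minimal sketch. The data frame is made up for illustration (it is not from the original question), with values chosen so that the condition evaluates to NA for two rows.

```r
# Hypothetical data: missing values in both columns
data <- data.frame(V1 = c(1, 3, NA, NA, 0),
                   V2 = c(5, 5, 1, 5, NA))

# The OR condition; NA wherever neither side is known to be TRUE
cond <- data$V1 > 2 | data$V2 < 4

by_subset <- subset(data, V1 > 2 | V2 < 4)  # treats NA as FALSE
by_which  <- data[which(cond), ]            # which() drops NA and FALSE
by_naive  <- data[cond, ]                   # NA in the index adds all-NA rows
by_guard  <- data[!is.na(cond) & cond, ]    # explicit NA guard, no which()

print(cond)
print(by_naive)
print(by_guard)
```

In this sketch the subset, which and guarded versions all keep the same rows, while plain logical indexing also returns one all-NA row per NA in the condition.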

How to subset dataframe based on multiple conditions?

You should do:

reshape2::recast(df, Country + variable ~ Indicator)

Subset dataframe into multiple based on multiple conditions

Remove the digits from the MS column and use the result in split to divide the data frame into a list of data frames, one per remaining pattern.

result <- split(D_MtC, sub('\\d+', '', D_MtC$MS))

where the output of sub is:

sub('\\d+', '', D_MtC$MS)

#[1] "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl"
# "bl" "bu" "bu" "bu" "bu" "bu" "bu" "bu" "bu"
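For a self-contained illustration of the same idea, here is a small sketch. The contents of D_MtC below are invented (only the MS column name and the "bl"/"bu" prefixes come from the answer above), and the value column is a placeholder.

```r
# Hypothetical stand-in for D_MtC: MS holds a prefix followed by digits
D_MtC <- data.frame(
  MS    = c("bl1", "bl2", "bu1", "bu12"),
  value = c(10, 20, 30, 40)
)

# Strip the first run of digits to get the grouping key
key <- sub('\\d+', '', D_MtC$MS)

# One data frame per distinct key
result <- split(D_MtC, key)

print(names(result))
print(result$bu)
```

Each element of result can then be accessed by its prefix, for example result$bl or result[["bu"]].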

Subset a Data Frame based on Multiple Conditions

Here is a base R solution.
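grep separates the "Root" columns from the "Shoot" ones. apply then loops over the rows and returns one logical value per row (TRUE when at least two of the selected columns are greater than zero), and which turns those values into the row indices used to subset the data.frame.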

Root_R1 = c("Root",1,2,3,4,5)
Root_R2 = c("Root",1,0,3,0,0)
Root_R3 = c("Root",1,0,3,0,0)
Shoot_R1 = c("Shoot",1,0,3,4,5)
Shoot_R2 = c("Shoot",0,0,31,4,5)
Shoot_R3 = c("Shoot",0,0,0,0,0)
df1 <- data.frame(Root_R1, Root_R2, Root_R3, Shoot_R1, Shoot_R2, Shoot_R3)

df1 <- df1[-1,]
df1[] <- lapply(df1, as.integer)

root <- grep("Root", names(df1))
shoot <- grep("Shoot", names(df1))
ok_root <- which(apply(df1[root], 1, \(x) sum(x > 0L) >= 2L))
ok_shoot <- which(apply(df1[shoot], 1, \(x) sum(x > 0L) >= 2L))

df1[ok_root, ]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2       1       1       1        1        0        0
#> 4       3       3       3        3       31        0
df1[ok_shoot, ]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4       3       3       3        3       31        0
#> 5       4       0       0        4        4        0
#> 6       5       0       0        5        5        0

Created on 2022-06-09 by the reprex package (v2.0.1)
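As a possible alternative to the apply loop (this is a different technique, not the one used above), the same row filter can be written with rowSums on a logical matrix. The sketch below builds df1 directly as integers, with row names 2 to 6 to match the output above; that construction is an assumption made to keep the block self-contained.

```r
# Same values as df1 above, built directly as integer columns
df1 <- data.frame(
  Root_R1  = c(1L, 2L, 3L, 4L, 5L),
  Root_R2  = c(1L, 0L, 3L, 0L, 0L),
  Root_R3  = c(1L, 0L, 3L, 0L, 0L),
  Shoot_R1 = c(1L, 0L, 3L, 4L, 5L),
  Shoot_R2 = c(0L, 0L, 31L, 4L, 5L),
  Shoot_R3 = c(0L, 0L, 0L, 0L, 0L),
  row.names = 2:6
)

root  <- grep("Root", names(df1))
shoot <- grep("Shoot", names(df1))

# df1[root] > 0L is a logical matrix; rowSums counts the TRUE values per row
ok_root  <- rowSums(df1[root] > 0L) >= 2L
ok_shoot <- rowSums(df1[shoot] > 0L) >= 2L

root_rows  <- df1[ok_root, ]
shoot_rows <- df1[ok_shoot, ]
print(root_rows)
print(shoot_rows)
```

This data has no missing values, so the logical vectors can index the rows directly. If NAs were possible, wrapping the condition in which, as above, would keep NA rows out of the result.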



Edit

Following a question in the comments:

And assuming I want to change the numerical value of the cut-off, I would change this part of the code (sum(x > 0L)) and if I wanted to change the number of rows that meet the cut-off, I would change this: >= 2L?

Here is a function that exposes both values as arguments (cutoff and numrows).

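special_subset <- function(x, colpattern, cutoff = 0L, numrows = 2L) {
  # columns whose names match the pattern
  i_cols <- grep(colpattern, names(x))
  # keep rows where at least `numrows` of those columns exceed `cutoff`
  ok <- which(apply(x[i_cols], 1, \(y) sum(y > cutoff) >= numrows))
  x[ok, ]
}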

special_subset(df1, "Root")
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2       1       1       1        1        0        0
#> 4       3       3       3        3       31        0

special_subset(df1, "Shoot", cutoff = 1)
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4       3       3       3        3       31        0
#> 5       4       0       0        4        4        0
#> 6       5       0       0        5        5        0

Created on 2022-06-09 by the reprex package (v2.0.1)
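To check how the two arguments interact, here is a short sketch that varies both. It rebuilds df1 directly as integers and uses function(y) instead of the \(y) shorthand so that it also runs on older R versions; both choices are assumptions made for a self-contained example. Note that numrows counts the matching columns within each row that exceed cutoff.

```r
# Same values as df1 above, built directly as integer columns
df1 <- data.frame(
  Root_R1  = c(1L, 2L, 3L, 4L, 5L),
  Root_R2  = c(1L, 0L, 3L, 0L, 0L),
  Root_R3  = c(1L, 0L, 3L, 0L, 0L),
  Shoot_R1 = c(1L, 0L, 3L, 4L, 5L),
  Shoot_R2 = c(0L, 0L, 31L, 4L, 5L),
  Shoot_R3 = c(0L, 0L, 0L, 0L, 0L),
  row.names = 2:6
)

special_subset <- function(x, colpattern, cutoff = 0L, numrows = 2L) {
  i_cols <- grep(colpattern, names(x))
  ok <- which(apply(x[i_cols], 1, function(y) sum(y > cutoff) >= numrows))
  x[ok, ]
}

# At least one Root column greater than 2
any_root_gt2 <- special_subset(df1, "Root", cutoff = 2L, numrows = 1L)

# All three Root columns greater than 0
all_root_pos <- special_subset(df1, "Root", numrows = 3L)

print(any_root_gt2)
print(all_root_pos)
```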



Edit 2

To pass more than one colpattern to the function above, use an lapply loop.

In the two examples below, the first uses the native pipe operator introduced in R 4.1.0 and the second a standard lapply call.

tissue_type <- c("Root", "Shoot")

tissue_type |>
  lapply(\(pat, data) special_subset(data, pat), data = df1)
#> [[1]]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2       1       1       1        1        0        0
#> 4       3       3       3        3       31        0
#>
#> [[2]]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4       3       3       3        3       31        0
#> 5       4       0       0        4        4        0
#> 6       5       0       0        5        5        0

lapply(tissue_type, \(pat, data) special_subset(data, pat), data = df1)
#> [[1]]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2       1       1       1        1        0        0
#> 4       3       3       3        3       31        0
#>
#> [[2]]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4       3       3       3        3       31        0
#> 5       4       0       0        4        4        0
#> 6       5       0       0        5        5        0

Created on 2022-06-17 by the reprex package (v2.0.1)
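One optional refinement, sketched here as a suggestion rather than part of the answer above: naming the input vector makes lapply return a named list, so each result can be retrieved by tissue type instead of by position. The block rebuilds df1 and special_subset so that it runs on its own; by_tissue is a name introduced only for this example.

```r
# Same values as df1 above, built directly as integer columns
df1 <- data.frame(
  Root_R1  = c(1L, 2L, 3L, 4L, 5L),
  Root_R2  = c(1L, 0L, 3L, 0L, 0L),
  Root_R3  = c(1L, 0L, 3L, 0L, 0L),
  Shoot_R1 = c(1L, 0L, 3L, 4L, 5L),
  Shoot_R2 = c(0L, 0L, 31L, 4L, 5L),
  Shoot_R3 = c(0L, 0L, 0L, 0L, 0L),
  row.names = 2:6
)

special_subset <- function(x, colpattern, cutoff = 0L, numrows = 2L) {
  i_cols <- grep(colpattern, names(x))
  ok <- which(apply(x[i_cols], 1, function(y) sum(y > cutoff) >= numrows))
  x[ok, ]
}

tissue_type <- c("Root", "Shoot")

# lapply keeps the names of its input, so name the patterns after themselves
by_tissue <- lapply(
  setNames(tissue_type, tissue_type),
  function(pat) special_subset(df1, pat)
)

print(names(by_tissue))
print(by_tissue$Shoot)
```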

Subsetting based on multiple conditions in R

You can use & (and) to combine multiple conditions.

Blank_X <- Q4[is.na(Q4$X) & is.na(Q4$Y),]
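Here is a minimal sketch of that line on made-up data. The contents of Q4 are hypothetical (only the column names X and Y come from the answer), and the id column is added just to make the result easy to read.

```r
# Hypothetical Q4: X and Y are missing in different combinations
Q4 <- data.frame(
  id = 1:4,
  X  = c(NA, 1, NA, 2),
  Y  = c(NA, NA, 3, 4)
)

# Keep only rows where both X and Y are missing
Blank_X <- Q4[is.na(Q4$X) & is.na(Q4$Y), ]

# For comparison, | keeps rows where at least one of them is missing
Blank_any <- Q4[is.na(Q4$X) | is.na(Q4$Y), ]

print(Blank_X)
print(Blank_any)
```

Because is.na never returns NA itself, no which or extra NA guard is needed for this kind of condition.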

subset a dataframe on multiple conditions in R

Here is a tidyverse solution. Separate into two data frames and bind the rows together.

library(tidyverse)

bind_rows(
  dat1 %>% select(patientId, ends_with("1")) %>% rename_all(str_remove, "1"),
  dat1 %>% select(patientId, ends_with("2")) %>% rename_all(str_remove, "2")
) %>%
  transmute(
    patientId,
    Hist = H,
    Sip = S,
    date = paste0(Month, "-", Year)
  ) %>%
  filter(
    Sip %in% 16:18,
    Hist %in% 80:81
  )
#> # A tibble: 6 x 4
#>   patientId  Hist   Sip date
#>       <int> <dbl> <dbl> <chr>
#> 1         1    81    16 09-2017
#> 2         2    80    17 08-2017
#> 3         5    80    17 08-2016
#> 4         6    81    18 05-2017
#> 5         3    81    16 05-2016
#> 6         5    80    16 05-2016

Conditional subsetting from dataframe with multiple conditions

If the OP wants to treat each run of adjacent identical 'group' values as a separate group, rleid from data.table can create the run ids:

library(dplyr)
library(data.table)
dat %>%
  group_by(grp = rleid(group)) %>%
  filter(all(2017:2019 %in% year), group == 0) %>%
  ungroup %>%
  select(-grp)
# A tibble: 3 x 2
#  group  year
#  <dbl> <int>
#1     0  2017
#2     0  2018
#3     0  2019

Or in base R with rle

grp <- with(rle(dat$group), rep(seq_along(values), lengths))
subset(dat, as.logical(ave(year, grp, FUN =
    function(x) all(2017:2019 %in% x))) & group == 0)
#  group year
#4     0 2017
#5     0 2018
#6     0 2019
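The original dat is not shown here, so the following sketch uses an invented data frame to walk through the base R version step by step. The values are hypothetical, chosen so that one run of group 0 covers 2017 to 2019 and a later run of group 0 does not.

```r
# Hypothetical data: two separate runs of group 0
dat <- data.frame(
  group = c(1, 1, 1, 0, 0, 0, 1, 0, 0),
  year  = c(2017L, 2018L, 2019L, 2017L, 2018L, 2019L, 2017L, 2018L, 2019L)
)

# Run ids: consecutive identical 'group' values share an id
r   <- rle(dat$group)
grp <- rep(seq_along(r$values), r$lengths)

# For each run, flag whether it contains all of 2017, 2018 and 2019
complete <- as.logical(
  ave(dat$year, grp, FUN = function(x) all(2017:2019 %in% x))
)

# Keep complete runs that belong to group 0
res <- dat[complete & dat$group == 0, ]

print(grp)
print(res)
```

The second run of group 0 (rows 8 and 9) is dropped because it lacks 2017, which is the effect of grouping by run rather than by the group value alone.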

