Subset a Data.Frame with Multiple Conditions

How to combine multiple conditions to subset a data-frame using OR?

my.data.frame <- subset(data , V1 > 2 | V2 < 4)

An alternative that mimics the behavior of subset and is more appropriate for use inside a function body:

new.data <- data[ which( data$V1 > 2 | data$V2 < 4) , ]

Some people criticize the use of which as unnecessary, but it does prevent NA values from returning unwanted results. The equivalent of the two options demonstrated above (i.e., not returning NA rows for any NAs in V1 or V2) without which would be:

new.data <- data[ !is.na(data$V1 > 2 | data$V2 < 4) & ( data$V1 > 2 | data$V2 < 4) , ]

Note: I want to thank the anonymous contributor who attempted to fix the error in the code immediately above, a fix that was rejected by the moderators. There was actually an additional error that I noticed while correcting the first one. I have put the clause that checks for NA values first; what matters is that it evaluates to FALSE (not NA) for the rows to be dropped, since ...

> NA & 1
[1] NA
> 0 & NA
[1] FALSE

With '&', an NA operand yields NA unless the other operand is FALSE.
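
To make the NA behaviour concrete, here is a minimal sketch on made-up data (the values are my own assumption; only the column names V1 and V2 come from the answer above). It compares plain logical indexing, which(), an explicit is.na() guard, and subset():

```r
# Toy data, invented for illustration; rows 3 and 4 have NA in V1.
data <- data.frame(V1 = c(1, 3, NA, NA), V2 = c(5, 5, 5, 1))

# Row-wise condition: FALSE, TRUE, NA, TRUE
cond <- data$V1 > 2 | data$V2 < 4

plain   <- data[cond, ]                   # an NA in cond yields an all-NA row
w_which <- data[which(cond), ]            # which() keeps only TRUE positions
w_isna  <- data[!is.na(cond) & cond, ]    # NA entries become FALSE explicitly
w_sub   <- subset(data, V1 > 2 | V2 < 4)  # subset() also treats NA as FALSE

print(nrow(plain))    # 3 (includes the all-NA row)
print(nrow(w_which))  # 2
```

The last three approaches should select the same two rows here; only the plain logical index carries the NA through.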

Subset dataframe into multiple based on multiple conditions

Remove the digits from the MS column with sub and pass the result to split, which splits the data frame into a list of data frames, one per remaining pattern.

result <- split(D_MtC, sub('\\d+', '', D_MtC$MS))

where the output from sub is:

sub('\\d+', '', D_MtC$MS)

#[1] "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl" 
#     "bl" "bu" "bu" "bu" "bu" "bu" "bu" "bu" "bu"
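
Since D_MtC itself is not shown, here is a small sketch with an invented stand-in (the value column and the four MS labels are assumptions) to show what split returns:

```r
# Hypothetical stand-in for D_MtC; only the MS column matters here.
D_MtC <- data.frame(MS = c("bl1", "bl2", "bu1", "bu12"), value = 1:4)

# sub() removes the first run of digits, leaving the prefix as grouping key.
key <- sub("\\d+", "", D_MtC$MS)

# split() returns a named list with one data frame per distinct key.
result <- split(D_MtC, key)

print(names(result))  # "bl" "bu"
print(result$bu)
```

Note that sub only replaces the first match; if the labels could contain several separate runs of digits, gsub would be the variant to consider.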

Subsetting based on multiple conditions in R

You can use & (and) to combine multiple conditions.

Blank_X <- Q4[is.na(Q4$X) & is.na(Q4$Y),]
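
As a quick check of what & does here, a sketch on an invented Q4 (the id column and the values are assumptions), with the | variant for contrast:

```r
# Hypothetical Q4 with missing values in X and/or Y.
Q4 <- data.frame(
  id = 1:4,
  X  = c(1, NA, NA, 4),
  Y  = c(NA, NA, 2, 5)
)

# Rows where both X and Y are missing.
Blank_X <- Q4[is.na(Q4$X) & is.na(Q4$Y), ]

# For contrast: rows where at least one of X or Y is missing.
any_blank <- Q4[is.na(Q4$X) | is.na(Q4$Y), ]

print(Blank_X$id)    # 2
print(any_blank$id)  # 1 2 3
```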

pandas: multiple conditions while indexing data frame - unexpected behavior

As you can see, the AND operator drops every row in which at least one
value equals -1. On the other hand, the OR operator requires both
values to be equal to -1 to drop them.

That's right. Remember that you're writing the condition in terms of what you want to keep, not in terms of what you want to drop. For df1:

df1 = df[(df.a != -1) & (df.b != -1)]

You're saying "keep the rows in which df.a isn't -1 and df.b isn't -1", which is the same as dropping every row in which at least one value is -1.

For df2:

df2 = df[(df.a != -1) | (df.b != -1)]

You're saying "keep the rows in which either df.a or df.b is not -1", which is the same as dropping rows where both values are -1.

PS: chained access like df['a'][1] = -1 can get you into trouble. It's better to get into the habit of using .loc and .iloc.

How to subset dataframe based on multiple conditions?

You should do:

reshape2::recast(df, Country + variable ~ Indicator)

subset a dataframe on multiple conditions in R

Here is a tidyverse solution. Separate into two data frames and bind the rows together.

library(tidyverse)
  
bind_rows(
  dat1 %>% select(patientId, ends_with("1")) %>% rename_all(str_remove, "1"),
  dat1 %>% select(patientId, ends_with("2")) %>% rename_all(str_remove, "2")
) %>%
  transmute(
    patientId,
    Hist = H,
    Sip = S,
    date = paste0(Month, "-", Year)
  ) %>%
  filter(
    Sip %in% 16:18,
    Hist %in% 80:81
  )
#> # A tibble: 6 x 4
#>    patientId  Hist   Sip date   
#>        <int> <dbl> <dbl> <chr>  
#> 1          1    81    16 09-2017
#> 2          2    80    17 08-2017
#> 3          5    80    17 08-2016
#> 4          6    81    18 05-2017
#> 5          3    81    16 05-2016
#> 6          5    80    16 05-2016

Subset a Data Frame based on Multiple Conditions

Here is a base R solution.
grep separates the "Root" columns from the "Shoot" ones. Then an apply loop returns a logical value for each row, and which converts those values to the row indices used to subset the data.frame.

Root_R1 = c("Root",1,2,3,4,5)
Root_R2 = c("Root",1,0,3,0,0)
Root_R3 = c("Root",1,0,3,0,0)
Shoot_R1 = c("Shoot",1,0,3,4,5)
Shoot_R2 = c("Shoot",0,0,31,4,5)
Shoot_R3 = c("Shoot",0,0,0,0,0)
df1 <- data.frame(Root_R1, Root_R2, Root_R3, Shoot_R1, Shoot_R2, Shoot_R3)

df1 <- df1[-1,]
df1[] <- lapply(df1, as.integer)

root <- grep("Root", names(df1))
shoot <- grep("Shoot", names(df1))
ok_root <- which(apply(df1[root], 1, \(x) sum(x > 0L) >= 2L))
ok_shoot <- which(apply(df1[shoot], 1, \(x) sum(x > 0L) >= 2L))

df1[ok_root, ]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2       1       1       1        1        0        0
#> 4       3       3       3        3       31        0
df1[ok_shoot, ]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4       3       3       3        3       31        0
#> 5       4       0       0        4        4        0
#> 6       5       0       0        5        5        0

Created on 2022-06-09 by the reprex package (v2.0.1)
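
As a possible variation (not part of the answer above), the row-wise apply can be replaced by rowSums on the logical matrix. This is only a sketch; it rebuilds the data directly as integers, so the row names run 1 to 5 instead of 2 to 6:

```r
# Same values as df1 above, entered directly as integer columns.
df1 <- data.frame(
  Root_R1  = c(1L, 2L, 3L, 4L, 5L),
  Root_R2  = c(1L, 0L, 3L, 0L, 0L),
  Root_R3  = c(1L, 0L, 3L, 0L, 0L),
  Shoot_R1 = c(1L, 0L, 3L, 4L, 5L),
  Shoot_R2 = c(0L, 0L, 31L, 4L, 5L),
  Shoot_R3 = c(0L, 0L, 0L, 0L, 0L)
)

root <- grep("Root", names(df1))

# df1[root] > 0L is a logical matrix; rowSums counts the TRUE cells per row.
keep_root <- rowSums(df1[root] > 0L) >= 2L

# Row-wise apply version from the answer, for comparison.
keep_apply <- apply(df1[root], 1, function(x) sum(x > 0L) >= 2L)

print(df1[keep_root, ])
```

One difference to keep in mind: with NA values in the data, rowSums returns NA unless na.rm = TRUE is set, whereas the which() call in the answer silently drops such rows.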

Edit

Following a question in comments

And assuming I want to change the numerical value of the cut-off, I would change this part of the code (sum(x > 0L)) and if I wanted to change the number of rows that meet the cut-off, I would change this: >= 2L?

Here is a function to solve the problem.

special_subset <- function(x, colpattern, cutoff = 0L, numrows = 2L) {
  i_cols <- grep(colpattern, names(x))
  ok <- which(apply(x[i_cols], 1, \(y) sum(y > cutoff) >= numrows))
  x[ok, ]
}

special_subset(df1, "Root")
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2       1       1       1        1        0        0
#> 4       3       3       3        3       31        0

special_subset(df1, "Shoot", cutoff = 1)
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4       3       3       3        3       31        0
#> 5       4       0       0        4        4        0
#> 6       5       0       0        5        5        0

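
To illustrate the other argument asked about in the comment, here is a short sketch that varies numrows as well. The function is copied from above (with function(y) instead of the \(y) shorthand); the data are re-entered as integers, so row names run 1 to 5 here:

```r
df1 <- data.frame(
  Root_R1  = c(1L, 2L, 3L, 4L, 5L),
  Root_R2  = c(1L, 0L, 3L, 0L, 0L),
  Root_R3  = c(1L, 0L, 3L, 0L, 0L),
  Shoot_R1 = c(1L, 0L, 3L, 4L, 5L),
  Shoot_R2 = c(0L, 0L, 31L, 4L, 5L),
  Shoot_R3 = c(0L, 0L, 0L, 0L, 0L)
)

special_subset <- function(x, colpattern, cutoff = 0L, numrows = 2L) {
  i_cols <- grep(colpattern, names(x))
  ok <- which(apply(x[i_cols], 1, function(y) sum(y > cutoff) >= numrows))
  x[ok, ]
}

# At least one Root column above 2.
one_above_2 <- special_subset(df1, "Root", cutoff = 2L, numrows = 1L)

# All three Shoot columns above 0: no row qualifies, since Shoot_R3 is all 0.
all_shoot <- special_subset(df1, "Shoot", numrows = 3L)

print(rownames(one_above_2))  # "3" "4" "5"
print(nrow(all_shoot))        # 0
```

Despite its name, numrows counts the columns in a row that exceed the cutoff, which is what the >= 2L in the original code was doing.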

Edit 2

To pass more than one colpattern to the function above, use a lapply loop.

In the two examples below, the first uses the native pipe operator introduced in R 4.1.0 and the second a standard lapply call.

tissue_type <- c("Root", "Shoot")

tissue_type |>
  lapply(\(pat, data) special_subset(data, pat), data = df1)
#> [[1]]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2       1       1       1        1        0        0
#> 4       3       3       3        3       31        0
#> 
#> [[2]]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4       3       3       3        3       31        0
#> 5       4       0       0        4        4        0
#> 6       5       0       0        5        5        0

lapply(tissue_type, \(pat, data) special_subset(data, pat), data = df1)
#> [[1]]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2       1       1       1        1        0        0
#> 4       3       3       3        3       31        0
#> 
#> [[2]]
#>   Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4       3       3       3        3       31        0
#> 5       4       0       0        4        4        0
#> 6       5       0       0        5        5        0

Created on 2022-06-17 by the reprex package (v2.0.1)
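
If the list elements should be addressable by tissue name rather than by position, one option (my own suggestion, not from the answer) is to name the input vector with setNames, since lapply carries the names over to its result:

```r
df1 <- data.frame(
  Root_R1  = c(1L, 2L, 3L, 4L, 5L),
  Root_R2  = c(1L, 0L, 3L, 0L, 0L),
  Root_R3  = c(1L, 0L, 3L, 0L, 0L),
  Shoot_R1 = c(1L, 0L, 3L, 4L, 5L),
  Shoot_R2 = c(0L, 0L, 31L, 4L, 5L),
  Shoot_R3 = c(0L, 0L, 0L, 0L, 0L)
)

# Same function as above, written with function(y) for older R versions.
special_subset <- function(x, colpattern, cutoff = 0L, numrows = 2L) {
  i_cols <- grep(colpattern, names(x))
  ok <- which(apply(x[i_cols], 1, function(y) sum(y > cutoff) >= numrows))
  x[ok, ]
}

tissue_type <- c("Root", "Shoot")

# lapply() keeps the names of its input, so the result is a named list.
res <- lapply(setNames(tissue_type, tissue_type),
              function(pat) special_subset(df1, pat))

print(names(res))  # "Root" "Shoot"
print(res$Shoot)
```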

Subset dataframe by multiple logical conditions of rows to remove

The ! should be around the outside of the statement:

data[!(data$v1 %in% c("b", "d", "e")), ]

  v1 v2 v3 v4
1  a  v  d  c
2  a  v  d  d
5  c  k  d  c
6  c  r  p  g
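
A small sketch of why %in% is the right tool here, on invented data (only v1 and v2 are filled in, and the values are assumptions). It also shows a common pitfall, comparing against a vector with !=, which recycles the vector instead of testing membership:

```r
# Hypothetical data; v1 holds the labels to filter on.
data <- data.frame(
  v1 = c("a", "a", "b", "d", "c", "c"),
  v2 = c("v", "v", "k", "r", "k", "r")
)
unwanted <- c("b", "d", "e")

# Negate the whole membership test to keep rows NOT in the unwanted set.
kept <- data[!(data$v1 %in% unwanted), ]

# Pitfall: != compares element by element against the recycled vector,
# so here every comparison is TRUE and nothing is removed.
wrong <- data[data$v1 != unwanted, ]

print(rownames(kept))  # "1" "2" "5" "6"
print(nrow(wrong))     # 6
```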

Conditional subsetting from dataframe with multiple conditions

If the OP wants to treat each run of adjacent identical 'group' values as a separate group:

library(dplyr)
library(data.table)
dat %>%
   group_by(grp = rleid(group)) %>%
   filter(all(2017:2019 %in% year), group == 0) %>%
   ungroup %>%
   select(-grp)
# A tibble: 3 x 2
#  group  year
#  <dbl> <int>
#1     0  2017
#2     0  2018
#3     0  2019

Or in base R with rle:

grp <- with(rle(dat$group), rep(seq_along(values), lengths))
subset(dat, as.logical(ave(year,  grp, FUN = 
    function(x) all(2017:2019 %in% x)) ) & group == 0)
#  group year
#4     0 2017
#5     0 2018
#6     0 2019
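
To show what the rle step produces, here is the base R approach unpacked on invented data (the dat below is an assumption, chosen so that rows 4 to 6 are the only run of group 0 covering all three years):

```r
# Hypothetical data: four runs of 'group' (1,1,1 / 0,0,0 / 1 / 0,0).
dat <- data.frame(
  group = c(1, 1, 1, 0, 0, 0, 1, 0, 0),
  year  = c(2017:2019, 2017:2019, 2017:2019)
)

# rle() finds runs of identical adjacent values; number them 1, 2, 3, ...
runs <- rle(dat$group)
grp  <- rep(seq_along(runs$values), runs$lengths)

# For each run, check whether it contains all of 2017, 2018 and 2019.
# ave() returns the per-run result repeated for every row of that run.
has_all <- as.logical(
  ave(dat$year, grp, FUN = function(x) all(2017:2019 %in% x))
)

result <- dat[has_all & dat$group == 0, ]

print(grp)     # 1 1 1 2 2 2 3 4 4
print(result)
```

The last run of group 0 (rows 8 and 9) is dropped because it only covers 2018 and 2019.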