How to combine multiple conditions to subset a data-frame using OR?
my.data.frame <- subset(data , V1 > 2 | V2 < 4)
An alternative solution that mimics the behavior of this function and would be more appropriate for inclusion within a function body:
new.data <- data[ which( data$V1 > 2 | data$V2 < 4) , ]
Some people criticize the use of which as unnecessary, but it does prevent NA values from throwing back unwanted results. The equivalent of the two options demonstrated above without which (i.e. not returning NA rows when the condition evaluates to NA) would be:
new.data <- data[ !is.na(data$V1 > 2 | data$V2 < 4) & ( data$V1 > 2 | data$V2 < 4) , ]
Note: I want to thank the anonymous contributor who attempted to fix the error in the code immediately above, a fix that was rejected by the moderators. There was actually an additional error that I noticed while correcting the first one. The clause that checks for NA values is what keeps NA rows out of the result, since FALSE & NA is FALSE while TRUE & NA is NA:
> NA & 1
[1] NA
> 0 & NA
[1] FALSE
With the vectorized '&' the result depends on the values of the operands, not on their order: NA & 0 is also FALSE.
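To make the NA behaviour concrete, here is a minimal sketch on a made-up data frame. The name toy and its values are assumptions for illustration only and do not come from the question.

```r
# Hypothetical toy data with missing values in both columns
toy <- data.frame(V1 = c(1, 3, NA, NA, 1),
                  V2 = c(5, 5, 5, 1, NA))

# Row-wise condition: FALSE, TRUE, NA, TRUE, NA
cond <- toy$V1 > 2 | toy$V2 < 4

by_subset <- subset(toy, V1 > 2 | V2 < 4)  # keeps only rows where cond is TRUE
by_which  <- toy[which(cond), ]            # which() drops both FALSE and NA
by_raw    <- toy[cond, ]                   # each NA in the index yields an all-NA row

nrow(by_subset)
nrow(by_which)
nrow(by_raw)
```

Indexing with the raw logical vector returns extra all-NA rows, which is the unwanted result that which (or an explicit !is.na() check on the condition) avoids.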
How to subset dataframe based on multiple conditions?
You should do:
reshape2::recast(df, Country + variable ~ Indicator)
Subset dataframe into multiple based on multiple conditions
Remove the numbers from the MS column and use the result in split to split one data frame into a list of data frames based on the pattern.
result <- split(D_MtC, sub('\\d+', '', D_MtC$MS))
where the output from sub is:
sub('\\d+', '', D_MtC$MS)
#[1] "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl" "bl"
# "bl" "bu" "bu" "bu" "bu" "bu" "bu" "bu" "bu"
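As a self-contained sketch of the same idea, the block below uses a small invented data frame in place of D_MtC. The name demo, its value column, and the MS values are assumptions for illustration.

```r
# Hypothetical stand-in for D_MtC: MS holds a letter prefix followed by a number
demo <- data.frame(MS    = c("bl1", "bl2", "bu1", "bu2", "bu3"),
                   value = c(10, 20, 30, 40, 50))

# Strip the first run of digits so that only the prefix remains
prefix <- sub("\\d+", "", demo$MS)

# One data frame per distinct prefix, returned as a named list
pieces <- split(demo, prefix)

names(pieces)
pieces$bu
```

Each element of the list can then be accessed by its prefix, for example pieces$bl or pieces[["bu"]].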
Subset a Data Frame based on Multiple Conditions
Here is a base R solution. grep tells the "Root" columns from the "Shoot" ones. Then apply loops over the rows, returning logical (row) indices, and which takes care of subsetting the data.frame.
Root_R1 = c("Root",1,2,3,4,5)
Root_R2 = c("Root",1,0,3,0,0)
Root_R3 = c("Root",1,0,3,0,0)
Shoot_R1 = c("Shoot",1,0,3,4,5)
Shoot_R2 = c("Shoot",0,0,31,4,5)
Shoot_R3 = c("Shoot",0,0,0,0,0)
df1 <- data.frame(Root_R1, Root_R2, Root_R3, Shoot_R1, Shoot_R2, Shoot_R3)
df1 <- df1[-1,]
df1[] <- lapply(df1, as.integer)
root <- grep("Root", names(df1))
shoot <- grep("Shoot", names(df1))
ok_root <- which(apply(df1[root], 1, \(x) sum(x > 0L) >= 2L))
ok_shoot <- which(apply(df1[shoot], 1, \(x) sum(x > 0L) >= 2L))
df1[ok_root, ]
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2 1 1 1 1 0 0
#> 4 3 3 3 3 31 0
df1[ok_shoot, ]
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4 3 3 3 3 31 0
#> 5 4 0 0 4 4 0
#> 6 5 0 0 5 5 0
Created on 2022-06-09 by the reprex package (v2.0.1)
Edit
Following a question in the comments:
"And assuming I want to change the numerical value of the cut-off, I would change this part of the code (sum(x > 0L)), and if I wanted to change the number of rows that meet the cut-off, I would change this: >= 2L?"
Here is a function to solve the problem.
special_subset <- function(x, colpattern, cutoff = 0L, numrows = 2L) {
  i_cols <- grep(colpattern, names(x))
  ok <- which(apply(x[i_cols], 1, \(y) sum(y > cutoff) >= numrows))
  x[ok, ]
}
special_subset(df1, "Root")
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2 1 1 1 1 0 0
#> 4 3 3 3 3 31 0
special_subset(df1, "Shoot", cutoff = 1)
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4 3 3 3 3 31 0
#> 5 4 0 0 4 4 0
#> 6 5 0 0 5 5 0
Edit 2
To pass more than one colpattern to the function above, use an lapply loop.
In the two examples below, the first uses the native pipe operator introduced in R 4.1.0 and the second a standard lapply call.
tissue_type <- c("Root", "Shoot")
tissue_type |>
  lapply(\(pat, data) special_subset(data, pat), data = df1)
#> [[1]]
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2 1 1 1 1 0 0
#> 4 3 3 3 3 31 0
#>
#> [[2]]
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4 3 3 3 3 31 0
#> 5 4 0 0 4 4 0
#> 6 5 0 0 5 5 0
lapply(tissue_type, \(pat, data) special_subset(data, pat), data = df1)
#> [[1]]
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2 1 1 1 1 0 0
#> 4 3 3 3 3 31 0
#>
#> [[2]]
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4 3 3 3 3 31 0
#> 5 4 0 0 4 4 0
#> 6 5 0 0 5 5 0
Created on 2022-06-17 by the reprex package (v2.0.1)
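One possible variation, sketched here under assumptions: naming the pattern vector with setNames makes lapply return a named list, so each result can be looked up by tissue type. The block restates special_subset so that it runs on its own, and demo is an invented data frame rather than the df1 used above.

```r
# Restated from the answer above so that this block is self-contained (needs R >= 4.1)
special_subset <- function(x, colpattern, cutoff = 0L, numrows = 2L) {
  i_cols <- grep(colpattern, names(x))
  ok <- which(apply(x[i_cols], 1, \(y) sum(y > cutoff) >= numrows))
  x[ok, ]
}

# Hypothetical data: two replicates per tissue type
demo <- data.frame(Root_R1  = c(1L, 2L, 3L), Root_R2  = c(1L, 0L, 3L),
                   Shoot_R1 = c(0L, 4L, 3L), Shoot_R2 = c(0L, 4L, 0L))

tissue_type <- c("Root", "Shoot")

# Naming the input vector gives a list named by pattern
res <- lapply(setNames(tissue_type, tissue_type),
              \(pat) special_subset(demo, pat))

names(res)
res$Shoot
```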
Subsetting based on multiple conditions in R
You can use & (and) to combine multiple conditions.
Blank_X <- Q4[is.na(Q4$X) & is.na(Q4$Y),]
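A minimal sketch of the difference between & and | on missing values, using an invented stand-in for Q4. The id column, the values, and the name Any_blank are assumptions for illustration.

```r
# Hypothetical stand-in for Q4
Q4 <- data.frame(id = 1:4,
                 X  = c(NA, 1, NA, 2),
                 Y  = c(NA, NA, 3, 4))

# Rows where BOTH X and Y are missing
Blank_X <- Q4[is.na(Q4$X) & is.na(Q4$Y), ]

# Rows where AT LEAST ONE of X and Y is missing
Any_blank <- Q4[is.na(Q4$X) | is.na(Q4$Y), ]

Blank_X$id
Any_blank$id
```

Because is.na() never returns NA itself, these index vectors are safe to use directly without which.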
subset a dataframe on multiple conditions in R
Here is a tidyverse solution. Separate into two data frames and bind the rows together.
library(tidyverse)
bind_rows(
  dat1 %>% select(patientId, ends_with("1")) %>% rename_all(str_remove, "1"),
  dat1 %>% select(patientId, ends_with("2")) %>% rename_all(str_remove, "2")
) %>%
  transmute(
    patientId,
    Hist = H,
    Sip = S,
    date = paste0(Month, "-", Year)
  ) %>%
  filter(
    Sip %in% 16:18,
    Hist %in% 80:81
  )
#> # A tibble: 6 x 4
#> patientId Hist Sip date
#> <int> <dbl> <dbl> <chr>
#> 1 1 81 16 09-2017
#> 2 2 80 17 08-2017
#> 3 5 80 17 08-2016
#> 4 6 81 18 05-2017
#> 5 3 81 16 05-2016
#> 6 5 80 16 05-2016
Conditional subsetting from dataframe with multiple conditions
If the OP wants to treat each run of adjacent identical 'group' values as a separate group:
library(dplyr)
library(data.table)
dat %>%
group_by(grp = rleid(group)) %>%
filter(all(2017:2019 %in% year), group == 0) %>%
ungroup %>%
select(-grp)
# A tibble: 3 x 2
# group year
# <dbl> <int>
#1 0 2017
#2 0 2018
#3 0 2019
Or in base R with rle:
grp <- with(rle(dat$group), rep(seq_along(values), lengths))
subset(dat, as.logical(ave(year, grp, FUN =
function(x) all(2017:2019 %in% x)) ) & group == 0)
# group year
#4 0 2017
#5 0 2018
#6 0 2019
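To show how the run ids are built, here is a small sketch with invented data. The values in dat are an assumption chosen so that group 0 appears in two separate runs, and only the second run covers all three years.

```r
# Hypothetical data: group 0 occurs in two separate runs
dat <- data.frame(group = c(0, 0, 1, 0, 0, 0),
                  year  = c(2017L, 2018L, 2019L, 2017L, 2018L, 2019L))

# rle() describes consecutive runs; repeating each run's index by its
# length gives one id per row: 1 1 2 3 3 3
r   <- rle(dat$group)
grp <- rep(seq_along(r$values), r$lengths)

# Flag every row of a run that contains all of 2017:2019, then keep group 0
has_all <- as.logical(ave(dat$year, grp,
                          FUN = function(x) all(2017:2019 %in% x)))
keep <- has_all & dat$group == 0

dat[keep, ]
```

The first run of group 0 is dropped because it lacks 2019, which is the difference from grouping on the group column alone.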