subset rows with (1) ALL and (2) ANY columns larger than a specific value
See the functions all() and any() for the first and second parts of your question, respectively. The apply() function can be used to run a function over the rows or columns of a data frame (MARGIN = 1 is rows, MARGIN = 2 is columns). Note that I use apply() on df[, -1] to ignore the id variable when doing the comparisons.
Part 1:
> df <- data.frame(id=c(1:5), v1=c(0,15,9,12,7), v2=c(9,32,6,17,11))
> df[apply(df[, -1], MARGIN = 1, function(x) all(x > 10)), ]
id v1 v2
2 2 15 32
4 4 12 17
Part 2:
> df[apply(df[, -1], MARGIN = 1, function(x) any(x > 10)), ]
id v1 v2
2 2 15 32
4 4 12 17
5 5 7 11
To see what is going on: x > 10 returns a logical vector for each row (via apply()) indicating whether each element is greater than 10. all() returns TRUE if all elements of the input vector are TRUE and FALSE otherwise. any() returns TRUE if any of the elements in the input is TRUE and FALSE if all are FALSE.
I then use the logical vector resulting from the apply() call
> apply(df[, -1], MARGIN = 1, function(x) all(x > 10))
[1] FALSE TRUE FALSE TRUE FALSE
> apply(df[, -1], MARGIN = 1, function(x) any(x > 10))
[1] FALSE TRUE FALSE TRUE TRUE
to subset df (as shown above).
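To make the explanation above concrete, here is a minimal sketch that walks through a single row by hand and then builds both masks. It reuses the df from the answer; the names x, all_mask and any_mask are only illustrative.

```r
df <- data.frame(id = 1:5, v1 = c(0, 15, 9, 12, 7), v2 = c(9, 32, 6, 17, 11))

# Row 5 of the data (v1 = 7, v2 = 11), with the id column dropped
x <- unlist(df[5, -1])
x > 10        # one logical value per column
all(x > 10)   # FALSE, because v1 is not above 10
any(x > 10)   # TRUE, because v2 is above 10

# apply() repeats the same test for every row and returns one logical per row
all_mask <- apply(df[, -1], MARGIN = 1, function(x) all(x > 10))
any_mask <- apply(df[, -1], MARGIN = 1, function(x) any(x > 10))

df[all_mask, ]  # rows where every column exceeds 10
df[any_mask, ]  # rows where at least one column exceeds 10
```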
Subsetting Rows with a Column Value Greater than a Threshold
We can use rowSums
data[rowSums(data[5:70] > 7) > 0, ]
Or with subset
subset(data, rowSums(data[5:70] > 7) > 0)
We can also use filter_at from dplyr with any_vars
library(dplyr)
data %>% filter_at(vars(5:70), any_vars(. > 7))
Using reproducible data from mtcars (stealing the idea from @Maurits Evers)
mtcars[rowSums(mtcars[3:11] > 300) > 0, ]
# mpg cyl disp hp drat wt qsec vs am gear carb
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
#Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
#Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
#Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
#Dodge Challenger 15.5 8 318 150 2.76 3.520 16.87 0 0 3 2
#AMC Javelin 15.2 8 304 150 3.15 3.435 17.30 0 0 3 2
#Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
#Pontiac Firebird 19.2 8 400 175 3.08 3.845 17.05 0 0 3 2
#Ford Pantera L 15.8 8 351 264 4.22 3.170 14.50 0 1 5 4
#Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
Using filter_at also gives the same output
mtcars %>% filter_at(vars(3:11), any_vars(. > 300))
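The scoped verbs such as filter_at were superseded in dplyr 1.0.0, and if_any() is available from dplyr 1.0.4. Assuming a dplyr version at least that recent, the same filter can be sketched as follows; res is just an illustrative name.

```r
library(dplyr)

# Keep rows where any of columns 3 to 11 exceeds 300
res <- mtcars %>% filter(if_any(3:11, ~ .x > 300))

nrow(res)  # number of rows kept
```

This should select the same cars as the rowSums version above, since both test the same columns against the same threshold.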
R: Select only Rows where value greater than a certain value and Mapped to another column where value is Yes or No
Turns out it was pretty easy.
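# Keep rows where Answer is "Yes" (the trailing comma selects rows, not columns)
x = df[df$Answer == "Yes", ]
# Of those rows, keep the ones with Age of 40 or more
x = x[x$Age >= 40, ]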
x$Age
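The two conditions can also be combined in a single subsetting step with &. The sketch below uses made-up data, because the question's data frame is not shown here; only the Age and Answer column names come from the answer above.

```r
# Hypothetical data standing in for the question's data frame
df <- data.frame(
  Name   = c("Ann", "Bob", "Cara", "Dev"),
  Age    = c(35, 42, 51, 60),
  Answer = c("Yes", "Yes", "No", "Yes")
)

# Both conditions must hold; the trailing comma means "select rows"
x <- df[df$Answer == "Yes" & df$Age >= 40, ]
x$Age
```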
Select rows in DataFrame if value in any column is between two values
Here is another idea: build the two masks separately and combine them with &:
import pandas as pd
df = pd.DataFrame({'AP-1': [30, 32, 34, 31, 33, 35, 36, 38, 37],
'AP-2': [30, 32, 34, 80, 33, 35, 36, 38, 37],
'AP-3': [30, 32, 81, 31, 33, 101, 36, 38, 37],
'AP-4': [30, 32, 34, 95, 33, 35, 103, 38, 121],
'AP-5': [30, 32, 34, 31, 33, 144, 36, 38, 37],
'AP-6': [30, 32, 34, 31, 33, 35, 36, 110, 37],
'AP-7': [30, 87, 34, 31, 111, 35, 36, 38, 122],
'AP-8': [30, 32, 99, 31, 33, 35, 36, 38, 37],
'AP-9': [30, 32, 34, 31, 33, 99, 88, 38, 37]},
index=['1', '2', '3', '4', '5', '6', '7', '8', '9'])
# This is the actual frame you want
df = df.transpose()
# Pass axis by keyword; recent pandas versions no longer accept it positionally
m1 = (df >= 80).any(axis=1)
m2 = ~(df >= 100).any(axis=1) #<-- Invert the statement with ~
df2 = df.loc[m1&m2]
print(df2)
Prints:
1 2 3 4 5 6 7 8 9
AP-2 30 32 34 80 33 35 36 38 37
AP-8 30 32 99 31 33 35 36 38 37
AP-9 30 32 34 31 33 99 88 38 37
filter rows when all columns greater than a value
We can create a logical matrix by comparing the entire data frame with 2, take rowSums over it, and select only those rows whose sum equals the number of columns in df
df[rowSums(df > 2) == ncol(df), ]
# A B C
#2 4 3 5
A dplyr approach using filter_all and all_vars
library(dplyr)
df %>% filter_all(all_vars(. > 2))
# A B C
#1 4 3 5
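With dplyr 1.0.4 or later (the version that introduced if_all)
#1. if_all
df %>% filter(if_all(everything(), ~ . > 2))
#2. across (newer dplyr releases warn that across() inside filter() is deprecated; prefer if_all)
df %>% filter(across(everything(), ~ . > 2))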
An apply approach
#Using apply
df[apply(df > 2, 1, all), ]
#Using lapply as shared by @thelatemail
df[Reduce(`&`, lapply(df, `>`, 2)),]
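The df used in this answer is not shown, so here is a small sketch with hypothetical data (chosen so that only row 2 has every value above 2) that runs the three base R variants side by side and checks that they agree.

```r
# Hypothetical data: only row 2 (4, 3, 5) has every value above 2
df <- data.frame(A = c(1, 4, 2), B = c(3, 3, 1), C = c(2, 5, 6))

by_rowsums <- df[rowSums(df > 2) == ncol(df), ]       # count TRUEs per row
by_apply   <- df[apply(df > 2, 1, all), ]             # all() per row
by_reduce  <- df[Reduce(`&`, lapply(df, `>`, 2)), ]   # AND the columns together

by_rowsums
isTRUE(all.equal(by_rowsums, by_apply))
isTRUE(all.equal(by_apply, by_reduce))
```

The rowSums and Reduce variants stay vectorized over columns, which tends to matter more as the number of rows grows; apply() calls the function once per row.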
Subset dataframe on presence of certain value in any column
additional solution:
df <- data.frame(
v1 = c(1:10),
v2 = c(10:1),
v3 = c(1,3,2,9,5,6,1,2,3,9)
)
df[apply(df, 1, function(x) any(x == 9)), ]
#> v1 v2 v3
#> 2 2 9 3
#> 4 4 7 9
#> 9 9 2 3
#> 10 10 1 9
Created on 2021-02-25 by the reprex package (v1.0.0)
using tidyverse
library(tidyverse)
df %>%
rowwise() %>%
filter(any(c_across(everything()) == 9))
#> # A tibble: 4 x 3
#> # Rowwise:
#> v1 v2 v3
#> <int> <int> <dbl>
#> 1 2 9 3
#> 2 4 7 9
#> 3 9 2 3
#> 4 10 1 9
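For comparison, a vectorized base R sketch of the same test: df == 9 gives a logical matrix, and rowSums counts the matches per row. This assumes the columns contain no NA values; res is an illustrative name.

```r
df <- data.frame(
  v1 = 1:10,
  v2 = 10:1,
  v3 = c(1, 3, 2, 9, 5, 6, 1, 2, 3, 9)
)

# Keep rows in which at least one column equals 9
res <- df[rowSums(df == 9) > 0, ]
res
```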
R keep rows with at least one column greater than value
You can use rowSums to construct the condition in base R:
df[rowSums(df > 10) >= 1, ]
With dplyr (0.7.0 or later), you can use filter_all like this:
library(dplyr)
filter_all(df, any_vars(. > 10))
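One caveat with the rowSums condition is missing data: a row containing NA can produce an NA in the mask, and indexing with NA returns a row of NAs. The sketch below uses hypothetical data to show the effect and one possible fix with na.rm = TRUE; naive and kept are illustrative names.

```r
# Hypothetical data with one missing value in row 3
df <- data.frame(a = c(5, 12, NA), b = c(11, 3, 2))

# Without na.rm, row 3 gives an NA mask entry and shows up as an all-NA row
naive <- df[rowSums(df > 10) >= 1, ]

# With na.rm = TRUE, NA comparisons are ignored when counting matches
kept <- df[rowSums(df > 10, na.rm = TRUE) >= 1, ]

naive
kept
```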
Extracting columns having greater than certain values in R dataframe
We could use colSums to subset columns using base R
df[colSums(df > 0.6) > 0]
# Jux Gyno
#1 0.67 0.89
#2 0.11 0.65
#3 0.60 0.67
#4 0.09 0.01
Or with dplyr, select_if
library(dplyr)
df %>% select_if(~any(. > 0.6))
How to select column values based on a greater than condition in row values
We can create a logical matrix by comparing the data frame with 3, sum its columns using colSums, and select only those columns that have at least one value greater than 3.
mtcars[colSums(mtcars > 3) > 0]
# mpg cyl disp hp drat wt qsec gear carb
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 4 4
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 4 4
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 4 1
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 3 1
#....
Variation using sapply
mtcars[sapply(mtcars, function(x) any(x > 3))]
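A slightly stricter variant of the sapply approach is vapply, which states up front that each column must yield exactly one logical value. This is a sketch of that idea, not part of the original answer; keep and res are illustrative names.

```r
# TRUE for each column that has at least one value above 3
keep <- vapply(mtcars, function(x) any(x > 3), logical(1))

res <- mtcars[keep]
names(res)
```

vapply raises an error if the function ever returns something other than a single logical, whereas sapply would silently change the shape of its result.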