Subset Rows with (1) All and (2) Any Columns Larger Than a Specific Value


See the functions all() and any() for the first and second parts of your question, respectively. The apply() function can be used to run a function over the rows or columns of a data frame or matrix (MARGIN = 1 applies over rows, MARGIN = 2 over columns). Note that I use apply() on df[, -1] to ignore the id variable when doing the comparisons.

Part 1:

> df <- data.frame(id=c(1:5), v1=c(0,15,9,12,7), v2=c(9,32,6,17,11))
> df[apply(df[, -1], MARGIN = 1, function(x) all(x > 10)), ]
  id v1 v2
2  2 15 32
4  4 12 17

Part 2:

> df[apply(df[, -1], MARGIN = 1, function(x) any(x > 10)), ]
  id v1 v2
2  2 15 32
4  4 12 17
5  5  7 11

To see what is going on: x > 10 returns a logical vector for each row (via apply()), indicating whether each element is greater than 10. all() returns TRUE if all elements of the input vector are TRUE, and FALSE otherwise. any() returns TRUE if any element of the input is TRUE, and FALSE if all are FALSE.

I then use the logical vectors resulting from the apply() calls

> apply(df[, -1], MARGIN = 1, function(x) all(x > 10))
[1] FALSE TRUE FALSE TRUE FALSE
> apply(df[, -1], MARGIN = 1, function(x) any(x > 10))
[1] FALSE TRUE FALSE TRUE TRUE

to subset df (as shown above).
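As a vectorized alternative (a sketch, using the same df as above), the apply() calls can be replaced with rowSums() on the logical matrix df[, -1] > 10:

```r
df <- data.frame(id = c(1:5), v1 = c(0, 15, 9, 12, 7), v2 = c(9, 32, 6, 17, 11))

# all(): every non-id column in the row must exceed 10
df[rowSums(df[, -1] > 10) == ncol(df) - 1, ]

# any(): at least one non-id column in the row exceeds 10
df[rowSums(df[, -1] > 10) > 0, ]
```

rowSums() counts the TRUEs per row, so comparing the count to the number of tested columns reproduces all(), and comparing it to zero reproduces any().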

Subsetting Rows with a Column Value Greater than a Threshold

We can use rowSums

data[rowSums(data[5:70] > 7) > 0, ]

Or with subset

subset(data, rowSums(data[5:70] > 7) > 0)

We can also use filter_at from dplyr with any_vars

library(dplyr)
data %>% filter_at(vars(5:70), any_vars(. > 7))

Using reproducible data from mtcars (stealing idea from @Maurits Evers)

mtcars[rowSums(mtcars[3:11] > 300) > 0, ]

#                     mpg cyl disp  hp drat    wt  qsec vs am gear carb
#Hornet Sportabout   18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
#Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
#Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
#Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
#Dodge Challenger    15.5   8  318 150 2.76 3.520 16.87  0  0    3    2
#AMC Javelin         15.2   8  304 150 3.15 3.435 17.30  0  0    3    2
#Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
#Pontiac Firebird    19.2   8  400 175 3.08 3.845 17.05  0  0    3    2
#Ford Pantera L      15.8   8  351 264 4.22 3.170 14.50  0  1    5    4
#Maserati Bora       15.0   8  301 335 3.54 3.570 14.60  0  1    5    8

Using filter_at also gives the same output

mtcars %>% filter_at(vars(3:11), any_vars(. > 300))
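Note that filter_at() and any_vars() are superseded in dplyr >= 1.0.0; a sketch of the same filter written with if_any():

```r
library(dplyr)

# rows of mtcars where any of columns 3:11 exceeds 300
mtcars %>% filter(if_any(3:11, ~ .x > 300))
```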

R: Select only Rows where value greater than a certain value and Mapped to another column where value is Yes or No

Turns out it was pretty easy. Note the comma inside the brackets (so you subset rows, not columns), and that the second filter must be applied to x, not df:

x <- df[df$Answer == "Yes", ]
x <- x[x$Age >= 40, ]
x$Age
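The same two-step filter can be written in one dplyr call; this is a sketch, assuming df has an Answer column of "Yes"/"No" and a numeric Age column (the data below is made up for illustration):

```r
library(dplyr)

# hypothetical data matching the question's shape
df <- data.frame(Answer = c("Yes", "No", "Yes", "Yes"),
                 Age    = c(45, 52, 30, 61))

df %>% filter(Answer == "Yes", Age >= 40)
```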

Select rows in DataFrame if value in any column is between two values

Here is another idea: build the masks separately and combine them with &:

import pandas as pd

df = pd.DataFrame({'AP-1': [30, 32, 34, 31, 33, 35, 36, 38, 37],
                   'AP-2': [30, 32, 34, 80, 33, 35, 36, 38, 37],
                   'AP-3': [30, 32, 81, 31, 33, 101, 36, 38, 37],
                   'AP-4': [30, 32, 34, 95, 33, 35, 103, 38, 121],
                   'AP-5': [30, 32, 34, 31, 33, 144, 36, 38, 37],
                   'AP-6': [30, 32, 34, 31, 33, 35, 36, 110, 37],
                   'AP-7': [30, 87, 34, 31, 111, 35, 36, 38, 122],
                   'AP-8': [30, 32, 99, 31, 33, 35, 36, 38, 37],
                   'AP-9': [30, 32, 34, 31, 33, 99, 88, 38, 37]},
                  index=['1', '2', '3', '4', '5', '6', '7', '8', '9'])

# This is the actual frame you want
df = df.transpose()

m1 = (df >= 80).any(axis=1)
m2 = ~(df >= 100).any(axis=1)  # <-- invert the statement with ~

df2 = df.loc[m1 & m2]
print(df2)

Prints:

       1   2   3   4   5   6   7   8   9
AP-2  30  32  34  80  33  35  36  38  37
AP-8  30  32  99  31  33  35  36  38  37
AP-9  30  32  34  31  33  99  88  38  37

filter rows when all columns greater than a value

We can create a logical matrix by comparing the entire data frame with 2, take rowSums over it, and select only those rows whose sum equals the number of columns in df (i.e., every value in that row is greater than 2).

df[rowSums(df > 2) == ncol(df), ]

#  A B C
#2 4 3 5

A dplyr approach using filter_all and all_vars

library(dplyr) 
df %>% filter_all(all_vars(. > 2))

#  A B C
#1 4 3 5

dplyr >= 1.0.0

#1. if_all
df %>% filter(if_all(.fns = ~. > 2))

#2. across
df %>% filter(across(.fns = ~. > 2))

An apply approach

#Using apply
df[apply(df > 2, 1, all), ]
#Using lapply as shared by @thelatemail
df[Reduce(`&`, lapply(df, `>`, 2)),]

Subset dataframe on presence of certain value in any column

An additional solution:

df <- data.frame(
  v1 = c(1:10),
  v2 = c(10:1),
  v3 = c(1, 3, 2, 9, 5, 6, 1, 2, 3, 9)
)

df[apply(df, 1, function(x) any(x == 9)), ]
#>    v1 v2 v3
#> 2   2  9  3
#> 4   4  7  9
#> 9   9  2  3
#> 10 10  1  9

Created on 2021-02-25 by the reprex package (v1.0.0)
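As a sketch, the row-wise apply() can also be avoided here with rowSums() on the logical matrix df == 9:

```r
df <- data.frame(v1 = c(1:10), v2 = c(10:1),
                 v3 = c(1, 3, 2, 9, 5, 6, 1, 2, 3, 9))

# rows where any column equals 9
df[rowSums(df == 9) > 0, ]
```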

using tidyverse

library(tidyverse)
df %>%
  rowwise() %>%
  filter(any(c_across(everything()) == 9))
#> # A tibble: 4 x 3
#> # Rowwise:
#>      v1    v2    v3
#>   <int> <int> <dbl>
#> 1     2     9     3
#> 2     4     7     9
#> 3     9     2     3
#> 4    10     1     9


R keep rows with at least one column greater than value

You can use rowSums to construct the condition in base R:

df[rowSums(df > 10) >= 1, ]

With dplyr (>= 0.7.0), you can use filter_all like this:

library(dplyr)
filter_all(df, any_vars(. > 10))

Extracting columns having greater than certain values in R dataframe

We could use colSums to subset columns using base R

df[colSums(df > 0.6) > 0]

#   Jux Gyno
#1 0.67 0.89
#2 0.11 0.65
#3 0.60 0.67
#4 0.09 0.01
#4 0.09 0.01

Or with dplyr, select_if

library(dplyr)
df %>% select_if(~any(. > 0.6))
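select_if() is likewise superseded in dplyr >= 1.0.0; here is a sketch with select(where(...)), using a made-up frame shaped like the one implied above (the Low column is hypothetical, added so that something gets dropped):

```r
library(dplyr)

# hypothetical data: Jux and Gyno as in the output above, plus a column
# whose values never exceed 0.6
df <- data.frame(Jux  = c(0.67, 0.11, 0.60, 0.09),
                 Gyno = c(0.89, 0.65, 0.67, 0.01),
                 Low  = c(0.10, 0.20, 0.30, 0.40))

# keep only columns containing at least one value above 0.6
df %>% select(where(~ any(.x > 0.6)))
```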

How to select column values based on a greater than condition in row values

We can create a logical matrix by comparing the dataframe with 3, then take column sums using colSums and select only those columns which have at least one value greater than 3.

mtcars[colSums(mtcars > 3) > 0]

#                mpg cyl  disp  hp drat    wt  qsec gear carb
#Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46    4    4
#Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02    4    4
#Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61    4    1
#Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44    3    1
#....

Variation using sapply

mtcars[sapply(mtcars, function(x) any(x > 3))]

