Subset Dataframe Such That All Values in Each Row Are Less Than a Certain Value

Subset dataframe such that all values in each row are less than a certain value

You can use the negated rowSums() for the subset

df[!rowSums(df[-1] > 0.7), ]
#   zips ABC DEF GHI JKL
# 4    4 0.6 0.4 0.2 0.3
# 6    6 0.2 0.7 0.3 0.4
  • df[-1] > 0.7 gives us a logical matrix telling us which values of df[-1] are greater than 0.7
  • rowSums() sums each row of that matrix (each TRUE counts as 1, each FALSE as 0), giving the number of values above 0.7 in that row
  • ! coerces those counts to logical and negates them, so rows whose sum is zero (FALSE) become TRUE. In other words, if the rowSums() result is zero, we want that row.
  • we use that logical vector for the row subset (a worked example follows below)

Another way to get the same logical vector would be to do

rowSums(df[-1] > 0.7) == 0
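
As a worked illustration, here are the intermediate steps on a small made-up data frame with the same column names (the original df from the question isn't shown; the values below are chosen to reproduce the printed result above):

# made-up data frame for illustration only
df <- data.frame(zips = 1:6,
                 ABC = c(0.9, 0.8, 0.75, 0.6, 0.95, 0.2),
                 DEF = c(0.1, 0.9, 0.2, 0.4, 0.3, 0.7),
                 GHI = c(0.3, 0.2, 0.8, 0.2, 0.1, 0.3),
                 JKL = c(0.5, 0.1, 0.4, 0.3, 0.9, 0.4))

df[-1] > 0.7                  # logical matrix: TRUE where a value exceeds 0.7
rowSums(df[-1] > 0.7)         # per-row counts: 1 2 2 0 2 0
df[!rowSums(df[-1] > 0.7), ]  # keeps rows 4 and 6, where the count is 0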

How to subset rows of a dataframe that have values lower than a negative number in one column?

You can also use between() and negate the result with ~:

df_filtered = df[~df['third_column'].between(-0.3, 0.3)]

Example:

>>> df
   third_column
0     -0.190030
1     -0.205187
2     -0.066776
3     -0.264480
4      0.064962
5      0.024708
6     -0.354629   # Want to keep
7     -0.180228
8      0.261640
9      0.315986   # Want to keep

>>> df[~df['third_column'].between(-0.3, 0.3)]
   third_column
6     -0.354629
9      0.315986

Subset all columns in a data frame less than a certain value in R

An attempt:

sapply(df, function(x) table(cut(x[x < 0.009], c(0, 0.000001, 0.001, 0.002, Inf))))

#               o m l c a aa ep
# (0,1e-06]     2 0 0 5 5  0  0
# (1e-06,0.001] 3 4 5 0 0  5  4
# (0.001,0.002] 0 0 0 0 0  0  1
# (0.002,Inf]   0 1 0 0 0  0  0
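
The data frame behind that output isn't shown, but the mechanics can be seen on a single made-up vector: values at or above 0.009 are dropped, cut() bins what is left, and table() counts the bins; sapply() then repeats this for every column.

x <- c(0.0000005, 0.0004, 0.0015, 0.003, 0.05)   # made-up values

x[x < 0.009]                                      # drops 0.05
table(cut(x[x < 0.009], c(0, 0.000001, 0.001, 0.002, Inf)))
#     (0,1e-06] (1e-06,0.001] (0.001,0.002]   (0.002,Inf]
#             1             1             1             1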

Subset Pandas Data Frame Based on Some Row Value

Here is the answer to my question:

lower_threshold = 3.0
start_column = 5

# keep rows from label 5 onward, and only the columns in which
# at least one value is greater than or equal to the threshold
df = df.loc[start_column:, (df >= lower_threshold).any(axis=0)]
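
As a minimal sketch with a hypothetical frame (column names and values assumed here), this keeps rows from label 5 onward and only the columns that contain at least one value at or above the threshold:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 0, 2, 1, 2, 1],
                   'B': [0, 4, 1, 2, 5, 1, 3, 2],
                   'C': [1, 1, 2, 0, 1, 2, 1, 1]})

lower_threshold = 3.0
start_column = 5

# (df >= 3.0).any(axis=0) is True only for column B, so B is the only column kept;
# .loc[5:, ...] then keeps the rows labelled 5 and above
print(df.loc[start_column:, (df >= lower_threshold).any(axis=0)])
#    B
# 5  1
# 6  3
# 7  2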

How to filter rows in dataframe for specific groups having values less than 1st quartile for 2 columns in R?

With data.table you can do something like this:

require(data.table)
setDT(df)

# attach the group-wise 25% quantiles of SD_1 and SD_2 as new columns,
# then keep only the rows at or below both of them
df_sub <- df[, c("QSD_1", "QSD_2") := lapply(.SD, quantile, probs = .25),
             by = group, .SDcols = c("SD_1", "SD_2")][SD_1 <= QSD_1 & SD_2 <= QSD_2]
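
A quick check with hypothetical data (the column names group, SD_1 and SD_2 are assumed from the question):

library(data.table)

set.seed(1)
df <- data.frame(group = rep(c("a", "b"), each = 10),
                 SD_1  = rnorm(20),
                 SD_2  = rnorm(20))
setDT(df)

df_sub <- df[, c("QSD_1", "QSD_2") := lapply(.SD, quantile, probs = .25),
             by = group, .SDcols = c("SD_1", "SD_2")][SD_1 <= QSD_1 & SD_2 <= QSD_2]
nrow(df_sub)   # only rows at or below both group-wise first quartiles remain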

Select rows in a dataframe based on values of all columns

We can try with Reduce and &

df[Reduce(`&`, lapply(replace(df[-1], is.na(df[-1]), 0), `<`, 200)), ]
#  ID col1 col2
#1  1   NA   24
#2  2   20   NA

data

set.seed(24)
df <- data.frame(ID=1:4, col1 = c(NA, 20, 210, 30), col2 = c(24, NA, 30, 240))
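
To see what Reduce(`&`, ...) contributes here: lapply() returns one logical vector per column (TRUE where the NA-replaced value is below 200), and Reduce() combines those vectors element-wise with &, so a row is kept only if it passes in every column. A minimal sketch with two hand-written logical vectors:

Reduce(`&`, list(c(TRUE, TRUE, FALSE, TRUE),
                 c(TRUE, FALSE, FALSE, TRUE)))
# [1]  TRUE FALSE FALSE  TRUE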

Subsetting Rows with a Column Value Greater than a Threshold

We can use rowSums

data[rowSums(data[5:70] > 7) > 0, ]

Or with subset

subset(data, rowSums(data[5:70] > 7) > 0)

We can also use filter_at from dplyr with any_vars

library(dplyr)
data %>% filter_at(vars(5:70), any_vars(. > 7))

Using reproducible data from mtcars (stealing idea from @Maurits Evers)

mtcars[rowSums(mtcars[3:11] > 300) > 0, ]

#                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Hornet Sportabout   18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
# Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
# Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
# Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
# Dodge Challenger    15.5   8  318 150 2.76 3.520 16.87  0  0    3    2
# AMC Javelin         15.2   8  304 150 3.15 3.435 17.30  0  0    3    2
# Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
# Pontiac Firebird    19.2   8  400 175 3.08 3.845 17.05  0  0    3    2
# Ford Pantera L      15.8   8  351 264 4.22 3.170 14.50  0  1    5    4
# Maserati Bora       15.0   8  301 335 3.54 3.570 14.60  0  1    5    8

Using filter_at also gives the same output

mtcars %>% filter_at(vars(3:11), any_vars(. > 300))
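
As a side note, filter_at()/any_vars() still work but are superseded in dplyr 1.0.0 and later; if a recent dplyr is available, if_any() expresses the same filter:

library(dplyr)
# same rows as above; like filter_at(), filter() drops the row names
mtcars %>% filter(if_any(3:11, ~ .x > 300))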

Pandas data frame select all rows less than a column content float values

You could try this here:

new_df = df_thd_funct_mode1_T[df_thd_funct_mode1_T.apply(lambda row: all(float(column) <= .3768 for column in row), axis=1)]

Here you no longer need the loop you had in your example.

The apply() call goes through the dataframe row by row and checks, for each row, whether all of its values are at or below .3768.

If instead you want to keep a row as soon as any of its values is at or below .3768, replace all with any.

This only works if every value in every column can be cast to float; otherwise the float() conversion will raise an error.
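
If the conversion is safe, the same filter can also be written without the row-wise apply() by converting once and comparing the whole frame at a time; a sketch, using the same (assumed) frame name and threshold as above:

numeric = df_thd_funct_mode1_T.astype(float)              # convert every column up front
new_df = df_thd_funct_mode1_T[(numeric <= .3768).all(axis=1)]

# swap .all(axis=1) for .any(axis=1) to keep rows where at least one value qualifies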

Filter rows of pandas dataframe whose values are lower than 0

If you want to apply it to all columns, do df[df > 0] with dropna():

>>> df[df > 0].dropna()
    a   b
0  21   1
3   3  17

If you know which columns to apply it to, restrict it to those columns with df[df[cols] > 0]:

>>> cols = ['b']
>>> df[cols] = df[df[cols] > 0][cols]
>>> df.dropna()
    a   b
0  21   1
2  -4  14
3   3  17
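
For reference, here is how the masking behaves on a small hypothetical frame (the question's original data isn't shown): df[df > 0] turns every non-positive cell into NaN, and dropna() then removes any row containing a NaN, which is also why the surviving columns come back as floats. Note that dropna() will also drop rows that contained NaN before the masking.

import pandas as pd

df = pd.DataFrame({'a': [21, 5, -4, 3], 'b': [1, -7, 14, 17]})   # made-up values

print(df[df > 0].dropna())
#       a     b
# 0  21.0   1.0
# 3   3.0  17.0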

