Subset dataframe such that all values in each row are less than a certain value
You can use the negated rowSums() for the subset:
df[!rowSums(df[-1] > 0.7), ]
# zips ABC DEF GHI JKL
# 4 4 0.6 0.4 0.2 0.3
# 6 6 0.2 0.7 0.3 0.4
df[-1] > 0.7
gives us a logical matrix telling us which values of df[-1] are greater than 0.7. rowSums() sums across those rows (each TRUE value is equal to 1, FALSE to zero). ! converts those sums to logical and negates them, so that any row sum of zero (FALSE) becomes TRUE; in other words, if the rowSums() result is zero, we want that row. We then use that logical vector for the row subset.
Another way to get the same logical vector would be to do
rowSums(df[-1] > 0.7) == 0
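For comparison, the same keep-rows-where-nothing-exceeds-the-threshold idea can be sketched in pandas (the frame below is made up to mirror the R example, not the asker's data):

```python
import pandas as pd

# Hypothetical frame: an ID column followed by numeric columns.
df = pd.DataFrame({
    "zips": [1, 2, 3, 4],
    "ABC": [0.8, 0.6, 0.9, 0.2],
    "DEF": [0.4, 0.75, 0.1, 0.3],
})

# Count values above 0.7 per row (excluding the first column) and keep
# rows where that count is zero -- the analogue of !rowSums(df[-1] > 0.7).
mask = (df.drop(columns="zips") > 0.7).sum(axis=1) == 0
print(df[mask])
```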
How to subset rows of a dataframe that have values lower than a negative number in one column?
You can also use between and invert the result:
df_filtered = df[~df['third_column'].between(-0.3, 0.3)]
Example:
>>> df
third_column
0 -0.190030
1 -0.205187
2 -0.066776
3 -0.264480
4 0.064962
5 0.024708
6 -0.354629 # Want to keep
7 -0.180228
8 0.261640
9 0.315986 # Want to keep
>>> df[~df['third_column'].between(-0.3, 0.3)]
third_column
6 -0.354629
9 0.315986
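A self-contained version of the same pattern with fixed values (data made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"third_column": [-0.19, -0.35, 0.06, 0.32]})

# between() includes both endpoints by default; ~ inverts the mask,
# keeping only values outside [-0.3, 0.3].
df_filtered = df[~df["third_column"].between(-0.3, 0.3)]
print(df_filtered)
```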
subset all columns in a data frame less than a certain value in R
An attempt:
sapply(df, function(x) table(cut(x[x < 0.009], c(0, 0.000001, 0.001, 0.002, Inf))))
# o m l c a aa ep
#(0,1e-06] 2 0 0 5 5 0 0
#(1e-06,0.001] 3 4 5 0 0 5 4
#(0.001,0.002] 0 0 0 0 0 0 1
#(0.002,Inf] 0 1 0 0 0 0 0
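A rough pandas analogue of the same filter-bin-count idea, using pd.cut (illustrative data, not the asker's):

```python
import pandas as pd

df = pd.DataFrame({"o": [0.0000005, 0.0005, 0.002, 0.5],
                   "m": [0.0004, 0.0015, 0.003, 0.0000001]})

bins = [0, 0.000001, 0.001, 0.002, float("inf")]

# For each column: keep values below 0.009, bin them, count per bin --
# roughly what sapply(df, function(x) table(cut(...))) does in R.
counts = df.apply(lambda x: pd.cut(x[x < 0.009], bins).value_counts().sort_index())
print(counts)
```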
Subset Pandas Data Frame Based on Some Row Value
Here is the answer to my question:
lower_threshold = 3.0
start_column = 5
df = df.loc[start_column:, (df >= lower_threshold).any(axis=0)]
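The key piece here is that (df >= lower_threshold).any(axis=0) is a boolean mask over columns. A minimal deterministic demo of that column selection (made-up data, and without the start_column row slice for clarity):

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 4.0],
                   "b": [0.5, 1.0, 2.0],
                   "c": [3.5, 0.1, 0.2]})

lower_threshold = 3.0

# True for each column containing at least one value >= the threshold;
# .loc uses that mask to keep only those columns.
kept = df.loc[:, (df >= lower_threshold).any(axis=0)]
print(kept.columns.tolist())
```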
how to filter rows in dataframe for specific groups having values less than 1st quartile for 2 columns in R?
With data.table you can do something like this:
require(data.table)
setDT(df)
df_sub <- df[, c("QSD_1", "QSD_2") := lapply(.SD, quantile, probs = .25),
by = group, .SDcols = c("SD_1", "SD_2")][SD_1 <= QSD_1 & SD_2 <= QSD_2]
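The same per-group first-quartile filter can be sketched in pandas with groupby().transform() (column names assumed to match the R example, data made up):

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["g1"] * 4 + ["g2"] * 4,
    "SD_1":  [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
    "SD_2":  [1.0, 3.0, 2.0, 4.0, 10.0, 30.0, 20.0, 40.0],
})

# Per-group first quartile of each column, broadcast back to row length.
q1 = df.groupby("group")[["SD_1", "SD_2"]].transform(lambda s: s.quantile(0.25))

# Keep rows at or below the group's first quartile in BOTH columns.
df_sub = df[(df["SD_1"] <= q1["SD_1"]) & (df["SD_2"] <= q1["SD_2"])]
print(df_sub)
```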
Select rows in a dataframe based on values of all columns
We can try with Reduce and &:
df[Reduce(`&`, lapply(replace(df[-1], is.na(df[-1]), 0), `<`, 200)),]
# ID col1 col2
#1 1 NA 24
#2 2 20 NA
data
set.seed(24)
df <- data.frame(ID=1:4, col1 = c(NA, 20, 210, 30), col2 = c(24, NA, 30, 240))
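For comparison, the same all-columns-below-200, NA-treated-as-0 filter sketched in pandas, on the same example data:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "col1": [None, 20, 210, 30],
                   "col2": [24, None, 30, 240]})

# Replace NA with 0, then keep rows where every non-ID value is < 200 --
# mirroring Reduce(`&`, lapply(replace(df[-1], is.na(df[-1]), 0), `<`, 200)).
mask = (df.drop(columns="ID").fillna(0) < 200).all(axis=1)
print(df[mask])
```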
Subsetting Rows with a Column Value Greater than a Threshold
We can use rowSums
data[rowSums(data[5:70] > 7) > 0, ]
Or with subset
subset(data, rowSums(data[5:70] > 7) > 0)
We can also use filter_at from dplyr with any_vars:
library(dplyr)
data %>% filter_at(vars(5:70), any_vars(. > 7))
Using reproducible data from mtcars
(stealing idea from @Maurits Evers)
mtcars[rowSums(mtcars[3:11] > 300) > 0, ]
# mpg cyl disp hp drat wt qsec vs am gear carb
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
#Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
#Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
#Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
#Dodge Challenger 15.5 8 318 150 2.76 3.520 16.87 0 0 3 2
#AMC Javelin 15.2 8 304 150 3.15 3.435 17.30 0 0 3 2
#Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
#Pontiac Firebird 19.2 8 400 175 3.08 3.845 17.05 0 0 3 2
#Ford Pantera L 15.8 8 351 264 4.22 3.170 14.50 0 1 5 4
#Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
Using filter_at also gives the same output:
mtcars %>% filter_at(vars(3:11), any_vars(. > 300))
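The any-value-above-threshold row filter translates directly to pandas (small made-up frame, since mtcars is not built into pandas):

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"],
                   "x": [100, 360, 50],
                   "y": [200, 150, 400]})

# Keep rows where ANY of the selected columns exceeds 300 --
# the analogue of data[rowSums(data[cols] > 300) > 0, ].
mask = (df[["x", "y"]] > 300).any(axis=1)
print(df[mask])
```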
Pandas data frame select all rows less than a column content float values
You could try this here:
new_df = df_thd_funct_mode1_T[df_thd_funct_mode1_T.apply(lambda row: all(float(column) <= .3768 for column in row), axis=1)]
You no longer need the loop from your example: this goes through the dataframe and, for each row, checks whether all values are at most .3768. If instead you want to keep a row as soon as any single value in it is at most .3768, replace all with any.
Note this only works if every column contains values convertible to float; otherwise the float() cast will raise an error.
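One way to guard against non-numeric columns (a sketch, assuming you are happy to treat unparseable values as failing the test):

```python
import pandas as pd

df = pd.DataFrame({"a": [0.1, 0.5], "b": ["0.2", "oops"]})

# Coerce everything to numeric; unparseable entries become NaN,
# and NaN <= .3768 evaluates to False, so such rows are dropped.
numeric = df.apply(pd.to_numeric, errors="coerce")
filtered = df[(numeric <= .3768).all(axis=1)]
print(filtered)
```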
Filter rows of pandas dataframe whose values are lower than 0
If you want to apply it to all columns, use df[df > 0] with dropna():
>>> df[df > 0].dropna()
a b
0 21 1
3 3 17
If you know which columns to apply it to, restrict the comparison to those with df[df[cols] > 0]:
>>> cols = ['b']
>>> df[cols] = df[df[cols] > 0][cols]
>>> df.dropna()
a b
0 21 1
2 -4 14
3 3 17