How to delete rows where all the columns are zero
You can use (1)
dat[as.logical(rowSums(dat != 0)), ]
This works for both positive and negative values.
Another, even faster, possibility for large datasets is (2)
dat[rowSums(!as.matrix(dat)) < ncol(dat), ]
A faster approach for short and long data frames is to use matrix multiplication (3):
dat[as.logical(abs(as.matrix(dat)) %*% rep(1L, ncol(dat))), ]
Some benchmarks:
# the original dataset
dat <- data.frame(a = c(0,0,2,3), b= c(1,0,0,0), c=c(0,0,1,3))
Codoremifa <- function() dat[rowSums(abs(dat)) != 0,]
Marco <- function() dat[!apply(dat, 1, function(x) all(x == 0)), ]
Sven <- function() dat[as.logical(rowSums(dat != 0)), ]
Sven_2 <- function() dat[rowSums(!as.matrix(dat)) < ncol(dat), ]
Sven_3 <- function() dat[as.logical(abs(as.matrix(dat)) %*% rep(1L,ncol(dat))), ]
library(microbenchmark)
microbenchmark(Codoremifa(), Marco(), Sven(), Sven_2(), Sven_3())
# Unit: microseconds
# expr min lq median uq max neval
# Codoremifa() 267.772 273.2145 277.1015 284.0995 1190.197 100
# Marco() 192.509 198.4190 201.2175 208.9925 265.594 100
# Sven() 143.372 147.7260 150.0585 153.9455 227.031 100
# Sven_2() 152.080 155.1900 156.9000 161.5650 214.591 100
# Sven_3() 146.793 151.1460 153.3235 157.9885 187.845 100
# a data frame with 10.000 rows
set.seed(1)
dat <- dat[sample(nrow(dat), 10000, TRUE), ]
microbenchmark(Codoremifa(), Marco(), Sven(), Sven_2(), Sven_3())
# Unit: milliseconds
# expr min lq median uq max neval
# Codoremifa() 2.426419 2.471204 3.488017 3.750189 84.268432 100
# Marco() 36.268766 37.840246 39.406751 40.791321 119.233175 100
# Sven() 2.145587 2.184150 2.205299 2.270764 83.055534 100
# Sven_2() 2.007814 2.048711 2.077167 2.207942 84.944856 100
# Sven_3() 1.814994 1.844229 1.861022 1.917779 4.452892 100
How to remove rows where all columns are zero using data.table
You can try with rowSums
-
library(data.table)
setDT(dat)
dat[rowSums(dat != 0) != 0]
# a b c
#1: 0 1 0
#2: 2 0 1
#3: 3 0 3
How to remove rows from a DataFrame where some columns only have zero values
try this,
df[~df[list('cdef')].eq(0).all(axis = 1)]
a b c d e f
0 1 2 3 4 5 6
1 11 22 33 44 55 66
Delete rows with all the zeros elements in all columns exceptionally leaving a single non zero column in pandas DF
Use iloc
for select all columns without first, comapre for not equal by ne
and test at least one True
per rows by any
for filter by boolean indexing
, last reset_index
:
df = df[df.iloc[:, 1:].ne(0).any(axis=1)].reset_index(drop=True)
Alternative with remove column Time
:
df = df[df.drop('Time', axis=1).ne(0).any(axis=1)].reset_index(drop=True)
print (df)
Time a b c d e
0 2 1 2 0 0 0
1 4 5 0 0 0 0
2 6 7 0 0 0 0
How to remove rows where all columns are zero using dplyr pipe
Here's a dplyr option:
library(dplyr)
filter_all(dat, any_vars(. != 0))
# A-XXX fBM-XXX P-XXX vBM-XXX
#1 1.51653276 2.228752 1.733567 3.003979
#2 0.07703724 0.000000 0.000000 0.000000
Here we make use of the logic that if any variable is not equal to zero, we will keep it. It's the same as removing rows where all variables are equal to zero.
Regarding row.names:
library(tidyverse)
dat %>% rownames_to_column() %>% filter_at(vars(-rowname), any_vars(. != 0))
# rowname A-XXX fBM-XXX P-XXX vBM-XXX
#1 BATF::JUN_AHR 1.51653276 2.228752 1.733567 3.003979
#2 BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000
Drop rows with all zeros in pandas data frame
It turns out this can be nicely expressed in a vectorized fashion:
> df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
> df = df[(df.T != 0).any()]
> df
a b
1 0 1
2 1 0
3 1 1
How to delete rows where only specific columns are zero in a list in R?
Assume your dataframe is DF
DF <- DF[DF$RC + DF$medRC + DF$RC.norm + DF$medRC.norm != 0,]
Edit
For list of dataframe
# dummy data
listDF <- list(DF1 = data.frame(RC=c(0,1,2,3),
medRC=c(0,3,4,0),
RC.norm=c(0,3,3,3),
medRC.norm=c(0,3,3,3)),
DF2 = data.frame(RC=c(2,0,1,2,3),
medRC=c(4,0,3,4,0),
RC.norm=c(1,0,3,3,3),
medRC.norm=c(0,0,3,3,3)))
filteredDF <- lapply(listDF,
function(DF) DF[DF$RC + DF$medRC + DF$RC.norm + DF$medRC.norm != 0,])
filteredDF
# $DF1
# RC medRC RC.norm medRC.norm
# 2 1 3 3 3
# 3 2 4 3 3
# 4 3 0 3 3
# $DF2
# RC medRC RC.norm medRC.norm
# 1 2 4 1 0
# 3 1 3 3 3
# 4 2 4 3 3
# 5 3 0 3 3
drop all rows when all cells are either zero or nan
Your solution should be changed by select all columns without first with DataFrame.all
:
df1 = df.iloc[:, 1:]
df2 = df[(df1 != 0).all(axis=1) | (df1 != 'nan').all(axis=1)]
Working like DataFrame.isin
:
df2 = df[~df.iloc[:, 1:].isin([0, 'nan']).all(axis=1)]
If NaN
are missing values is necessary test them by DataFrame.notna
:
df1 = df.iloc[:, 1:]
df2 = df[(df1 != 0).all(axis=1) | df1.notna().all(axis=1)]
Or:
df2 = df[~df.iloc[:, 1:].fillna('nan').isin([0, 'nan']).all(axis=1)]
Related Topics
To Find Most Frequently Occuring Element in Matrix in R
Create and Assign Multiple New Dataframe Columns in Ifelse Statement
Replacing Na Values from Another Dataframe by Id
Collapse/Concatenate/Aggregate a Column to a Single Comma Separated String Within Each Group
Why Is '[' Better Than 'Subset'
Group by Multiple Columns and Sum Other Multiple Columns
Dictionary Style Replace Multiple Items
How to Use "≪≪-" (Scoping Assignment) in R
How to Delete a Row by Reference in Data.Table
Looping Over a Date or Posixct Object Results in a Numeric Iterator
How to Convert Only Some Positive Numbers to Negative Numbers (Conditional Recoding)
How to Select Variables in an R Dataframe Whose Names Contain a Particular String
How Does the 'Prop.Table()' Function Work in R
How to Convert a Factor to Integer\Numeric Without Loss of Information
Showing Data Values on Stacked Bar Chart in Ggplot2
Select/Assign to Data.Table When Variable Names Are Stored in a Character Vector