How to Delete Rows Where All the Columns Are Zero

How to delete rows where all the columns are zero

You can use (1)

dat[as.logical(rowSums(dat != 0)), ]

This works for both positive and negative values.

Another, even faster, possibility for large datasets is (2)

dat[rowSums(!as.matrix(dat)) < ncol(dat), ]

A faster approach for short and long data frames is to use matrix multiplication (3):

dat[as.logical(abs(as.matrix(dat)) %*% rep(1L, ncol(dat))), ]

Some benchmarks:

# the original dataset
dat <- data.frame(a = c(0,0,2,3), b= c(1,0,0,0), c=c(0,0,1,3))

Codoremifa <- function() dat[rowSums(abs(dat)) != 0,]
Marco <- function() dat[!apply(dat, 1, function(x) all(x == 0)), ]
Sven <- function() dat[as.logical(rowSums(dat != 0)), ]
Sven_2 <- function() dat[rowSums(!as.matrix(dat)) < ncol(dat), ]
Sven_3 <- function() dat[as.logical(abs(as.matrix(dat)) %*% rep(1L,ncol(dat))), ]

library(microbenchmark)
microbenchmark(Codoremifa(), Marco(), Sven(), Sven_2(), Sven_3())
# Unit: microseconds
# expr min lq median uq max neval
# Codoremifa() 267.772 273.2145 277.1015 284.0995 1190.197 100
# Marco() 192.509 198.4190 201.2175 208.9925 265.594 100
# Sven() 143.372 147.7260 150.0585 153.9455 227.031 100
# Sven_2() 152.080 155.1900 156.9000 161.5650 214.591 100
# Sven_3() 146.793 151.1460 153.3235 157.9885 187.845 100


# a data frame with 10.000 rows
set.seed(1)
dat <- dat[sample(nrow(dat), 10000, TRUE), ]
microbenchmark(Codoremifa(), Marco(), Sven(), Sven_2(), Sven_3())
# Unit: milliseconds
# expr min lq median uq max neval
# Codoremifa() 2.426419 2.471204 3.488017 3.750189 84.268432 100
# Marco() 36.268766 37.840246 39.406751 40.791321 119.233175 100
# Sven() 2.145587 2.184150 2.205299 2.270764 83.055534 100
# Sven_2() 2.007814 2.048711 2.077167 2.207942 84.944856 100
# Sven_3() 1.814994 1.844229 1.861022 1.917779 4.452892 100

How to remove rows where all columns are zero using data.table

You can try with rowSums -

library(data.table)
setDT(dat)
dat[rowSums(dat != 0) != 0]

# a b c
#1: 0 1 0
#2: 2 0 1
#3: 3 0 3

How to remove rows from a DataFrame where some columns only have zero values

try this,

df[~df[list('cdef')].eq(0).all(axis = 1)]


    a   b   c   d   e   f
0 1 2 3 4 5 6
1 11 22 33 44 55 66

Delete rows with all the zeros elements in all columns exceptionally leaving a single non zero column in pandas DF

Use iloc for select all columns without first, comapre for not equal by ne and test at least one True per rows by any for filter by boolean indexing, last reset_index:

df = df[df.iloc[:, 1:].ne(0).any(axis=1)].reset_index(drop=True)

Alternative with remove column Time:

df = df[df.drop('Time', axis=1).ne(0).any(axis=1)].reset_index(drop=True)

print (df)
Time a b c d e
0 2 1 2 0 0 0
1 4 5 0 0 0 0
2 6 7 0 0 0 0

How to remove rows where all columns are zero using dplyr pipe

Here's a dplyr option:

library(dplyr)
filter_all(dat, any_vars(. != 0))

# A-XXX fBM-XXX P-XXX vBM-XXX
#1 1.51653276 2.228752 1.733567 3.003979
#2 0.07703724 0.000000 0.000000 0.000000

Here we make use of the logic that if any variable is not equal to zero, we will keep it. It's the same as removing rows where all variables are equal to zero.

Regarding row.names:

library(tidyverse)
dat %>% rownames_to_column() %>% filter_at(vars(-rowname), any_vars(. != 0))
# rowname A-XXX fBM-XXX P-XXX vBM-XXX
#1 BATF::JUN_AHR 1.51653276 2.228752 1.733567 3.003979
#2 BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000

Drop rows with all zeros in pandas data frame

It turns out this can be nicely expressed in a vectorized fashion:

> df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
> df = df[(df.T != 0).any()]
> df
a b
1 0 1
2 1 0
3 1 1

How to delete rows where only specific columns are zero in a list in R?

Assume your dataframe is DF

DF <- DF[DF$RC + DF$medRC + DF$RC.norm + DF$medRC.norm != 0,]

Edit

For list of dataframe

# dummy data
listDF <- list(DF1 = data.frame(RC=c(0,1,2,3),
medRC=c(0,3,4,0),
RC.norm=c(0,3,3,3),
medRC.norm=c(0,3,3,3)),
DF2 = data.frame(RC=c(2,0,1,2,3),
medRC=c(4,0,3,4,0),
RC.norm=c(1,0,3,3,3),
medRC.norm=c(0,0,3,3,3)))


filteredDF <- lapply(listDF,
function(DF) DF[DF$RC + DF$medRC + DF$RC.norm + DF$medRC.norm != 0,])

filteredDF

# $DF1
# RC medRC RC.norm medRC.norm
# 2 1 3 3 3
# 3 2 4 3 3
# 4 3 0 3 3

# $DF2
# RC medRC RC.norm medRC.norm
# 1 2 4 1 0
# 3 1 3 3 3
# 4 2 4 3 3
# 5 3 0 3 3

drop all rows when all cells are either zero or nan

Your solution should be changed by select all columns without first with DataFrame.all:

df1 = df.iloc[:, 1:]
df2 = df[(df1 != 0).all(axis=1) | (df1 != 'nan').all(axis=1)]

Working like DataFrame.isin:

df2 = df[~df.iloc[:, 1:].isin([0, 'nan']).all(axis=1)]

If NaN are missing values is necessary test them by DataFrame.notna:

df1 = df.iloc[:, 1:]
df2 = df[(df1 != 0).all(axis=1) | df1.notna().all(axis=1)]

Or:

df2 = df[~df.iloc[:, 1:].fillna('nan').isin([0, 'nan']).all(axis=1)]


Related Topics



Leave a reply



Submit