How to Remove Rows with All Zeros Without Using Rowsums in R

How to remove rows with all zeros in R without getting Error in rowSums 'x' must be numeric

It looks like you want to examine all columns except the first three.

df1[, -3] is the data frame with the third column removed. You want to remove columns 1, 2 and 3, which is represented by 1:3 in R, giving this expression:

df2 <- df1[rowSums(df1[, -(1:3)]) > 0, ]
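As a minimal sketch of that indexing, assume a hypothetical df1 whose first three columns are identifiers and whose remaining columns are non-negative counts (all column names and values below are invented for illustration). The `> 0` test is only safe when no value is negative.

```r
# Hypothetical data: three identifier columns, then non-negative counts
df1 <- data.frame(
  site = c("A", "B", "C"),
  plot = c("p1", "p2", "p3"),
  year = c(2020, 2021, 2022),
  sp1  = c(0, 3, 0),
  sp2  = c(0, 0, 5),
  stringsAsFactors = FALSE
)

# Drop columns 1 to 3 before summing, so only the count columns are tested
keep <- rowSums(df1[, -(1:3)]) > 0

# Keep the rows whose counts are not all zero
df2 <- df1[keep, ]
```

Excluding columns by position matters here because `year` is numeric: a test based on `is.numeric` would include it in the sum and keep every row.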

R - Remove rows from dataframe that contain only zeros in numeric columns, base R and pipe-friendly methods?

A dplyr method

You can apply rowSums() to the numeric columns only, using dplyr's filter() and across() with the helper where(is.numeric):

library(dplyr)

df%>%filter(rowSums(across(where(is.numeric)))!=0)

  person     id a b c d e
1     Ed  plot1 2 0 4 0 7
2     Ed  plot3 0 6 0 0 5
3    Sue  plot4 0 4 0 3 0
4    Sue  plot6 1 8 0 1 1
5     Ed  plot7 0 1 0 0 0
6     Ed  plot9 4 0 0 9 0
7     Ed plot11 0 1 3 1 7
8    Sue plot12 0 1 8 5 0

This method (and some of the others that depend on rowSums()) can fail if your numeric columns also contain negative values, because positive and negative entries can cancel to a sum of zero.
In that case we must keep the rows that contain at least one non-zero value. This can be done by moving the condition .x != 0 inside across(), so that rowSums() counts the non-zero entries instead of adding up the values:

df%>%filter(rowSums(across(where(is.numeric), ~.x!=0))>0)
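To make the cancellation problem concrete, here is a small base R sketch. The data frame `x` and its values are made up for illustration; the two tests mirror the two dplyr filters above.

```r
# Hypothetical data in which the values of row 1 cancel out
x <- data.frame(id = c("r1", "r2", "r3"),
                a  = c(1, 0, -1),
                b  = c(-1, 0, 2))
num <- x[sapply(x, is.numeric)]

# Summing the values: row 1 sums to 0 and is wrongly treated as all-zero
by_sum <- unname(rowSums(num) != 0)

# Counting the non-zero entries: row 1 has two of them and is kept
by_count <- unname(rowSums(num != 0) > 0)
```

Only `by_count` keeps row 1, which contains the non-zero values 1 and -1.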

The same filter can also be written with purrr, either row by row with pmap_lgl() or by combining the columns with the logical operator | and reduce():

library(dplyr)
library(purrr)

df%>%filter(pmap_lgl(select(., where(is.numeric)), ~any(c(...)!=0)))

# or with purrr::reduce()

df%>%filter(across(where(is.numeric), ~.x!=0)%>%reduce(`|`))
#or simply
df%>%filter(reduce(across(where(is.numeric), ~.x!=0), `|`))

A base R method

You can use base subsetting with [, using sapply(df, is.numeric) to build a logical index that selects only the numeric columns. Compare those columns to zero with !=, take the rowSums() of the resulting logical matrix, and keep only the rows where that count is > 0:

df[rowSums(df[,sapply(df, is.numeric)]!=0)>0,]
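One edge case worth noting: if only one column is numeric, `df[, index]` drops to a plain vector and rowSums() stops with an error about needing at least two dimensions. The sketch below wraps the same idea in a hypothetical helper (the name `drop_all_zero_rows` is not from the original answer) and uses `drop = FALSE` to avoid that.

```r
# Hypothetical helper: drop rows whose numeric columns are all zero
drop_all_zero_rows <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  # drop = FALSE keeps a data frame even when only one column is numeric,
  # so rowSums() always receives a two-dimensional object
  nonzero <- df[, is_num, drop = FALSE] != 0
  df[rowSums(nonzero) > 0, , drop = FALSE]
}

# Made-up example with a single numeric column
one_num <- data.frame(id = c("a", "b", "c"), n = c(0, 2, 0))
res <- drop_all_zero_rows(one_num)
```

Here `res` contains only the row with id "b".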

EDIT

We can benefit from the coercion that happens when logical functions are called on numeric vectors. as.logical() evaluates zeros to FALSE and any non-zero number to TRUE. x|x and a double negation !(!x) do the same. This matches the other solutions that compare each element to zero, and is therefore more robust than summing the raw values with rowSums().

An example:

vector<-c(0,1,2,-1)
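# identical() compares only its first two arguments, so test each form against as.logical()
all(sapply(list(vector|vector, vector!=0, !(!vector)), identical, as.logical(vector)))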

[1] TRUE

There are some neat ways to solve this with that in mind:

df%>%filter(reduce(across(where(is.numeric), as.logical), `|`))
#or simply
df%>%filter(reduce(across(where(is.numeric)), `|`))
#and with base R:
df[Reduce(`|`, df[sapply(df, is.numeric)]),]

And the cleanest so far, with if_any() (added in dplyr 1.0.4):

df%>%filter(if_any(where(is.numeric)))
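The call above relies on numeric values being coerced to logical. A slightly more explicit variant, sketched here on made-up data, passes the predicate to if_any() directly; this assumes dplyr 1.0.4 or later is installed.

```r
library(dplyr)

# Hypothetical data: row "b" is zero in every numeric column
df <- data.frame(id = c("a", "b", "c"),
                 x  = c(0, 0, -1),
                 y  = c(3, 0, 1),
                 stringsAsFactors = FALSE)

# Keep a row if any numeric column is non-zero
res <- df %>% filter(if_any(where(is.numeric), ~ .x != 0))
```

Writing the comparison out makes the intent visible to readers who do not know the coercion rule, at the cost of a few extra characters.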

How to delete rows where all the columns are zero

You can use (1)

dat[as.logical(rowSums(dat != 0)), ]

This works for both positive and negative values.

Another possibility, slightly faster on large datasets, is (2)

dat[rowSums(!as.matrix(dat)) < ncol(dat), ]

A further approach, the fastest on the 10,000-row benchmark below and on par with the others on the small one, is to use matrix multiplication (3):

dat[as.logical(abs(as.matrix(dat)) %*% rep(1L, ncol(dat))), ]
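Why the matrix product works: multiplying a matrix by a column of ones adds up each row, so approach (3) computes the row sums of the absolute values. A short sketch on the benchmark data set used in this answer:

```r
dat <- data.frame(a = c(0, 0, 2, 3), b = c(1, 0, 0, 0), c = c(0, 0, 1, 3))

m <- abs(as.matrix(dat))

# Multiplying by a column of ones adds up each row
via_product <- as.vector(m %*% rep(1L, ncol(dat)))

# The same totals computed with rowSums()
via_rowsums <- as.vector(rowSums(m))

# A total of 0 can only come from a row made entirely of zeros
keep <- as.logical(via_product)
```

Taking absolute values first is what makes this safe for negative entries, since no cancellation can occur.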

Some benchmarks:

# the original dataset
dat <- data.frame(a = c(0,0,2,3), b= c(1,0,0,0), c=c(0,0,1,3))

Codoremifa <- function() dat[rowSums(abs(dat)) != 0,]
Marco <- function() dat[!apply(dat, 1, function(x) all(x == 0)), ]
Sven <- function() dat[as.logical(rowSums(dat != 0)), ]
Sven_2 <- function() dat[rowSums(!as.matrix(dat)) < ncol(dat), ]
Sven_3 <- function() dat[as.logical(abs(as.matrix(dat)) %*% rep(1L,ncol(dat))), ]

library(microbenchmark)
microbenchmark(Codoremifa(), Marco(), Sven(), Sven_2(), Sven_3())
# Unit: microseconds
#          expr     min       lq   median       uq      max neval
#  Codoremifa() 267.772 273.2145 277.1015 284.0995 1190.197   100
#       Marco() 192.509 198.4190 201.2175 208.9925  265.594   100
#        Sven() 143.372 147.7260 150.0585 153.9455  227.031   100
#      Sven_2() 152.080 155.1900 156.9000 161.5650  214.591   100
#      Sven_3() 146.793 151.1460 153.3235 157.9885  187.845   100

# a data frame with 10,000 rows
set.seed(1)
dat <- dat[sample(nrow(dat), 10000, TRUE), ]
microbenchmark(Codoremifa(), Marco(), Sven(), Sven_2(), Sven_3())
# Unit: milliseconds
#          expr       min        lq    median        uq        max neval
#  Codoremifa()  2.426419  2.471204  3.488017  3.750189  84.268432   100
#       Marco() 36.268766 37.840246 39.406751 40.791321 119.233175   100
#        Sven()  2.145587  2.184150  2.205299  2.270764  83.055534   100
#      Sven_2()  2.007814  2.048711  2.077167  2.207942  84.944856   100
#      Sven_3()  1.814994  1.844229  1.861022  1.917779   4.452892   100

How to remove rows where all columns are zero using data.table

You can use rowSums() on the data.table directly:

library(data.table)
setDT(dat)
dat[rowSums(dat != 0) != 0]

# a b c
#1: 0 1 0
#2: 2 0 1
#3: 3 0 3

How to remove rows with any zero value

There are a few different ways of doing this. I prefer using apply, since it's easily extendable:

##Generate some data
dd = data.frame(a = 1:4, b= 1:0, c=0:3)

##Go through each row and check that no value is zero
row_sub = apply(dd, 1, function(row) all(row !=0 ))
##Subset as usual
dd[row_sub,]
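For larger data, the same test can be written without apply(). This is a sketch of a vectorised equivalent rather than part of the original answer: count the zeros in each row and keep the rows where the count is 0.

```r
dd <- data.frame(a = 1:4, b = 1:0, c = 0:3)

# Row-by-row test from the answer above
by_apply <- apply(dd, 1, function(row) all(row != 0))

# Vectorised equivalent: count the zeros in each row and require none
by_rowsums <- rowSums(dd == 0) == 0

# Only the third row (3, 1, 2) contains no zero
kept <- dd[by_rowsums, ]
```

Both tests select the same rows; the apply() version remains easier to extend when the per-row condition becomes more complicated.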

How to remove rows with 0 values using R

df[apply(df[,-1], 1, function(x) !all(x==0)),]

Removing rows having only zeros

You can use filter_if() (superseded in current dplyr by filter() with if_any(), but still available):

library(dplyr)
df %>% filter_if(is.numeric, any_vars(. != 0 & !is.na(.)))

# x y z
#1 a 1 2
#2 b 0 3
#3 c 1 NA

Or using base R:

cols <- sapply(df, is.numeric)
df[rowSums(!is.na(df[cols]) & df[cols] != 0) > 0, ]
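The explicit !is.na() guard matters because comparing NA with zero yields NA, which then propagates through rowSums(). The sketch below uses a made-up input (the original data frame is not shown in the answer) chosen so that the result matches the output above, and shows `na.rm = TRUE` as an alternative way to get the same behaviour.

```r
# Hypothetical input: rows "d" and "e" contain nothing but zeros and NAs
df <- data.frame(x = c("a", "b", "c", "d", "e"),
                 y = c(1, 0, 1, 0, 0),
                 z = c(2, 3, NA, 0, NA),
                 stringsAsFactors = FALSE)
cols <- sapply(df, is.numeric)

# Without NA handling, rows that contain an NA give an NA result
naive <- rowSums(df[cols] != 0) > 0

# na.rm = TRUE skips the NA comparisons, so an NA never counts as non-zero
safe <- rowSums(df[cols] != 0, na.rm = TRUE) > 0

res <- df[safe, ]
```

Indexing with `naive` would return rows filled with NA for "c" and "e", whereas `safe` keeps "c" (because y is 1) and drops "e".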

