Remove columns with NA's and/or Zeros Only
One option would be to create a logical vector with colSums based on the number of NA or 0 elements in each column:
d[!colSums(is.na(d)|d ==0) == nrow(d)]
# a c
#1 1 98
#2 5 67
#3 56 NA
#4 4 3
#5 9 7
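As a reproducible sketch of the same idea, the snippet below builds a small data frame and applies the colSums test step by step. The data are hypothetical: the original `d` is not shown, so column `b` (only NA and 0) is an assumption chosen to match the printed output.

```r
# Hypothetical data: column b holds only NA and 0, so it should be dropped
d <- data.frame(a = c(1, 5, 56, 4, 9),
                b = c(0, NA, 0, NA, 0),
                c = c(98, 67, NA, 3, 7))

# TRUE where a cell is NA or 0 (for NA cells, TRUE | NA evaluates to TRUE)
empty_cell <- is.na(d) | d == 0

# Keep a column unless every one of its cells is NA or 0
keep <- colSums(empty_cell) != nrow(d)
result <- d[keep]
print(result)
```

Writing the test as `!=` instead of `!... ==` gives the same result and avoids relying on operator precedence.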
Or another option is to replace all the 0s with NA and then apply is.na:
d[colSums(!is.na(replace(d, d == 0, NA))) > 0]
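Or more compactly with na_if from dplyr (dplyr 1.1.0 and later check types more strictly and may refuse a whole data frame as the first argument, in which case apply na_if column by column):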
d[colSums(!is.na(na_if(d, 0))) > 0]
Excluding columns from a dataframe based on column sums
What about a simple subset? First, we create a simple data frame:
R> dd = data.frame(x = runif(5), y = 20*runif(5), z=20*runif(5))
Then select the columns where the sum is greater than 15
R> dd1 = dd[,colSums(dd) > 15]
R> ncol(dd1)
[1] 2
In your data set, you only want to subset columns 6 onwards, so something like:
##Drop the first five columns, then filter the rest by column sum
dd6 = dd[,6:ncol(dd)]
dd6[,colSums(dd6) > 15]
or
#Keep the first five columns, filter the rest by column sum
cols_to_keep = c(rep(TRUE, 5), colSums(dd[,6:ncol(dd)]) > 15)
dd[,cols_to_keep]
should work.
The key part to note is that in the square brackets, we want a vector of logicals, i.e. a vector of TRUE and FALSE. So if you wanted to subset using something a bit more complicated, then create a function that returns TRUE or FALSE and subset as usual.
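A minimal sketch of that last suggestion, assuming a hypothetical predicate `keep_column` and a made-up three-column data frame (neither comes from the question):

```r
# Hypothetical predicate: keep a numeric column only if its sum exceeds
# a threshold; non-numeric columns are always kept
keep_column <- function(col, threshold = 15) {
  if (!is.numeric(col)) return(TRUE)
  sum(col, na.rm = TRUE) > threshold
}

dd <- data.frame(id = c("a", "b", "c"),
                 x = c(1, 2, 3),
                 y = c(10, 20, 30))

# vapply returns exactly one TRUE/FALSE per column
keep <- vapply(dd, keep_column, logical(1))

# drop = FALSE keeps the result a data frame even if one column survives
dd_kept <- dd[, keep, drop = FALSE]
print(dd_kept)
```

Using vapply with `logical(1)` rather than sapply makes it an error if the predicate ever returns something other than a single logical value.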
Remove all columns or rows with only zeros out of a data frame
Using colSums():
df[, colSums(abs(df)) > 0]
i.e. a column has only zeros if and only if the sum of the absolute values is zero.
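To illustrate why the absolute value matters, here is a small sketch on hypothetical data where one column sums to zero without being all zeros. The same test with rowSums handles rows.

```r
# Hypothetical data
df <- data.frame(u = c(0, 0, 0),    # only zeros
                 v = c(-1, 0, 1),   # sums to zero, but is not all zeros
                 w = c(0, 0, 2))

# A plain sum would wrongly flag v as an all-zero column
plain_test <- colSums(df) != 0
abs_test   <- colSums(abs(df)) > 0

# Drop all-zero columns, then (separately) all-zero rows
cols_kept <- df[, abs_test, drop = FALSE]
rows_kept <- df[rowSums(abs(df)) > 0, , drop = FALSE]
print(cols_kept)
print(rows_kept)
```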
How to delete R data.frame columns with only zero values?
One option using dplyr could be:
df %>%
select(where(~ any(. != 0)))
1 0 2 2
2 2 3 5
3 5 0 1
4 7 0 2
5 2 1 3
6 3 0 4
7 0 4 5
8 3 0 6
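If dplyr is not available, a base R sketch of the same per-column test could use Filter, which applies a predicate to each column of a data frame. The data below are hypothetical and much smaller than in the question.

```r
# Hypothetical data: column q contains only zeros
df <- data.frame(p = c(0, 2, 5),
                 q = c(0, 0, 0),
                 r = c(2, 3, 0))

# Keep the columns for which at least one value is not zero
kept <- Filter(function(col) any(col != 0), df)
print(kept)
```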
Remove columns with zero values from a dataframe
You almost have it. Put those two together:
SelectVar[, colSums(SelectVar != 0) > 0]
This works because the factor columns never compare equal to 0 (their values are compared by label, and no label is "0"), so they count as non-zero and are kept.
How to remove columns and rows that sum to 0 while preserving non-numeric columns
Try this:
# remove rows
df <- df[rowSums(df[-(1:7)]) !=0, ]
# remove columns
df <- df[c(1:7,7 + which(colSums(df[-(1:7)]) !=0))]
# Site Date Mon Day Yr Szn SznYr B C D E F G
# 2 B0001 7/29/97 7 29 1997 Summer 1997-Summer 0 1 0 0 0 0
# 3 B0001 7/29/97 7 29 1997 Summer 1997-Summer 0 0 3 0 0 0
# 4 B0001 7/29/97 7 29 1997 Summer 1997-Summer 0 0 0 0 0 10
# 5 B0002 7/28/97 7 28 1997 Summer 1997-Summer 0 0 0 5 0 0
# 7 B0002 7/28/97 7 28 1997 Summer 1997-Summer 0 0 0 0 6 0
# 10 B0002 7/28/97 7 28 1997 Summer 1997-Summer 0 0 0 0 0 8
# 11 B0002 6/28/07 6 28 2007 Summer 2007-Summer 3 6 1 7 0 1
You can do this in one step to get the same output as @dan-y (the same in this specific case, but different if you have negative values in your real data):
df <- df[rowSums(df[-(1:7)]) !=0,
c(1:7,7 + which(colSums(df[-(1:7)]) !=0))]
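One way the sequential and one-step forms can diverge with negative values is sketched below. This uses hypothetical data with two ID columns instead of seven, and `n_id` is an illustrative name: in the one-step form the column sums are computed before any rows are removed.

```r
# Hypothetical data: two ID columns, then numeric columns B, C, D
df <- data.frame(Site = c("A", "B", "C"),
                 Szn  = c("S", "S", "W"),
                 B = c(1, 0, 0),
                 C = c(0, 0, 0),
                 D = c(-1, 0, 2))
n_id <- 2
counts <- df[-(1:n_id)]

# One step: row and column tests both use the original data
one_step <- df[rowSums(counts) != 0,
               c(1:n_id, n_id + which(colSums(counts) != 0))]

# Two steps: column sums are recomputed after the rows are removed
two_step <- df[rowSums(df[-(1:n_id)]) != 0, ]
two_step <- two_step[c(1:n_id,
                       n_id + which(colSums(two_step[-(1:n_id)]) != 0))]

print(names(one_step))
print(names(two_step))
```

Row 1 sums to zero only because 1 and -1 cancel. Once it is removed, column B contains nothing but zeros, so the two-step version drops B while the one-step version keeps it.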