R: Remove multiple empty columns of character variables
If your empty columns are really empty character columns, something like the following should work. It will need to be modified if your "empty" character columns include, say, spaces.
Sample data:
mydf <- data.frame(
  A = c("a", "b"),
  B = c("y", ""),
  C = c("", ""),
  D = c("", ""),
  E = c("", "z")
)
mydf
#   A B C D E
# 1 a y
# 2 b       z
Identifying and removing the "empty" columns.
mydf[!sapply(mydf, function(x) all(x == ""))]
#   A B E
# 1 a y
# 2 b   z
Alternatively, as recommended by @Roland:
mydf[, colSums(mydf != "") != 0]
#   A B E
# 1 a y
# 2 b   z
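The caveat about columns that contain only spaces can be handled by trimming whitespace before the comparison. The following is a minimal sketch, not part of the original answer: the data frame `mydf2` and the helper `is_blank_col` are hypothetical names, and NA values are not handled here.

```r
# Hypothetical sample data: column C holds only spaces, column D is truly empty.
mydf2 <- data.frame(
  A = c("a", "b"),
  B = c("y", " "),
  C = c(" ", "  "),
  D = c("", ""),
  stringsAsFactors = FALSE
)

# TRUE when every value is "" after leading/trailing whitespace is removed.
# Assumption: the column contains no NA values.
is_blank_col <- function(x) all(trimws(x) == "")

cleaned <- mydf2[!sapply(mydf2, is_blank_col)]
names(cleaned)
# [1] "A" "B"
```

Column B survives because it still has one non-blank value, even though its second value is a single space.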
Remove Multiple Empty Columns for String
One option using base R apply is to first calculate the number of columns that will be present in the final data frame (cols), then filter the empty values out of each row and pad the row with empty strings using rep.
cols <- max(rowSums(df != ""))
as.data.frame(t(apply(df, 1, function(x) {
  vals <- x[x != ""]
  c(vals, rep("", cols - length(vals)))
})))
#    V1  V2  V3
# 1 aaa ccc
# 2 aaa bbb
# 3 bbb ccc ddd
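The input data frame is not shown in this answer. As a self-contained sketch, the block below applies the same steps to a hypothetical `df` (the column names `a` to `d` and the cell positions are assumptions chosen to reproduce the output above).

```r
# Hypothetical input; the original question's data is not reproduced here.
df <- data.frame(
  a = c("aaa", "aaa", ""),
  b = c("", "bbb", "bbb"),
  c = c("ccc", "", "ccc"),
  d = c("", "", "ddd"),
  stringsAsFactors = FALSE
)

# Widest row after dropping empty strings decides the final column count.
cols <- max(rowSums(df != ""))

# For each row: keep the non-empty values, then pad on the right with "".
shifted <- as.data.frame(t(apply(df, 1, function(x) {
  vals <- x[x != ""]
  c(vals, rep("", cols - length(vals)))
})), stringsAsFactors = FALSE)

shifted
```

Because `apply` converts the data frame to a character matrix, this approach suits all-character data; numeric columns would come back as strings.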
Another option with gather/spread is to add a column holding the row number, convert the data to long format using gather, filter out the empty values, group_by row, assign new column names using paste0, and finally convert back to wide format using spread.
library(dplyr)
library(tidyr)
df %>%
  mutate(row = row_number()) %>%
  gather(key, value, -row) %>%
  filter(value != "") %>%
  group_by(row) %>%
  mutate(key = paste0("new", row_number())) %>%
  spread(key, value, fill = "") %>%
  ungroup() %>%
  select(-row)
# new1 new2 new3
# <chr> <chr> <chr>
#1 aaa ccc ""
#2 aaa bbb ""
#3 bbb ccc ddd
Remove columns from dataframe where ALL values are NA, NULL or empty
We can use Filter
Filter(function(x) !(all(x=="")), df)
# Var1 Var3
#1 2R+ 52
#2 2R+ 169
#3 2R+ 83
#4 2R+ 98
#5 2R+ NA
#6 2R+ 111
#7 2R+ 94
#8 2R+ 116
#9 2R+ 86
NOTE: This also works when every element of a column is NA: all(x == "") then evaluates to NA, and Filter keeps only the columns for which the predicate is TRUE.
df$Var3 <- NA
Filter(function(x) !(all(x=="")), df)
# Var1
#1 2R+
#2 2R+
#3 2R+
#4 2R+
#5 2R+
#6 2R+
#7 2R+
#8 2R+
#9 2R+
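The NA behaviour above is implicit. A sketch that makes it explicit is shown below; the data frame and the helper name `is_empty_col` are hypothetical and not taken from the original answer.

```r
# Hypothetical data: Var2 is all "", Var3 is all NA, Var4 mixes "", NA and a value.
df <- data.frame(
  Var1 = rep("2R+", 3),
  Var2 = c("", "", ""),
  Var3 = c(NA, NA, NA),
  Var4 = c("", NA, "x"),
  stringsAsFactors = FALSE
)

# A column counts as empty when every element is either NA or "".
# is.na(x) | x == "" never yields NA, because TRUE | NA is TRUE.
is_empty_col <- function(x) all(is.na(x) | x == "")

kept <- Filter(function(x) !is_empty_col(x), df)
names(kept)
# [1] "Var1" "Var4"
```

Spelling out the NA check means the predicate always returns TRUE or FALSE, so the result no longer depends on how Filter treats NA.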
Update
Based on the updated dataset, if we also need to remove the columns that contain only 0 values, change the code to
Filter(function(x) !(all(x==""|x==0)), df2)
# VAR1 VAR3 VAR4 VAR7
#1 2R+ 52 1.05 30
#2 2R+ 169 1.02 40
#3 2R+ 83 NA 40
#4 2R+ 98 1.16 40
#5 2R+ 154 1.11 40
#6 2R+ 111 NA 15
data
df2 <- structure(list(VAR1 = c("2R+", "2R+", "2R+", "2R+", "2R+", "2R+"
), VAR2 = c("", "", "", "", "", ""), VAR3 = c(52L, 169L, 83L,
98L, 154L, 111L), VAR4 = c(1.05, 1.02, NA, 1.16, 1.11, NA), VAR5 = c(0L,
0L, 0L, 0L, 0L, 0L), VAR6 = c(0L, 0L, 0L, 0L, 0L, 0L), VAR7 = c(30L,
40L, 40L, 40L, 40L, 15L)), .Names = c("VAR1", "VAR2", "VAR3",
"VAR4", "VAR5", "VAR6", "VAR7"), row.names = c("1", "2", "3",
"4", "5", "6"), class = "data.frame")
Piping the removal of empty columns using dplyr
We can use a version of select_if
library(dplyr)
df %>%
  select_if(function(x) !(all(is.na(x)) | all(x == "")))
# id Q2 Q3 Q4
#1 1 1 NA
#2 2 2
#3 3 4 3 2
#4 4 5 4 2
Or using the formula shorthand instead of writing out function(x)
df %>% select_if(~!(all(is.na(.)) | all(. == "")))
You can also modify your apply statement as
df[!apply(df, 2, function(x) all(is.na(x)) | all(x==""))]
Or using colSums
df[colSums(is.na(df) | df == "") != nrow(df)]
and the inverse condition
df[colSums(!(is.na(df) | df == "")) > 0]
Removing all empty columns and rows in data.frame when rows don't go away
Your empty rows contain both NA values and empty strings, so check for both. You can do
B1[rowSums(is.na(B1) | B1 == "") != ncol(B1), ]
# study.name group.name outcome ESL prof scope type
#1 Shin.Ellis ME.short 1 1 2 1 1
#2 Shin.Ellis ME.long 1 1 2 1 1
#3 Shin.Ellis DCF.short 1 1 2 1 2
#4 Shin.Ellis DCF.long 1 1 2 1 2
#5 Shin.Ellis Cont.short 1 1 2 NA NA
#6 Shin.Ellis Cont.long 1 1 2 NA NA
#8 Trus.Hsu Exper 1 2 2 2 1
#.....
We can also use filter_all from dplyr
library(dplyr)
B1 %>% filter_all(any_vars(!is.na(.) & . != ""))
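Since the question asks about empty columns as well as empty rows, both can be dropped in one base R subsetting step. This is a minimal sketch on a hypothetical `B1` (the columns `study`, `outcome` and `notes` are made up for illustration).

```r
# Hypothetical data: rows 2 and 4 are entirely NA/"", and so is the notes column.
B1 <- data.frame(
  study = c("Shin.Ellis", "", "Trus.Hsu", NA),
  outcome = c(1, NA, 1, NA),
  notes = c("", "", NA, NA),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where a cell is NA or an empty string.
empty <- is.na(B1) | B1 == ""

# Keep rows and columns that have at least one non-empty cell.
cleaned <- B1[rowSums(empty) != ncol(B1),
              colSums(empty) != nrow(B1),
              drop = FALSE]
cleaned
```

Computing `empty` once on the original data means a row counts as empty only if it is empty across all original columns, including those that are later dropped.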
How to delete columns that contain ONLY NAs?
One way of doing it:
df[, colSums(is.na(df)) != nrow(df)]
If the count of NAs in a column is equal to the number of rows, it must be entirely NA.
Or similarly
df[colSums(!is.na(df)) > 0]
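One difference between the two forms is worth noting: with the comma form, base R simplifies the result to a plain vector when only one column survives. The sketch below uses a hypothetical `df` to show this and how `drop = FALSE` avoids it.

```r
# Hypothetical data: only column x has any non-NA values.
df <- data.frame(
  x = c(1, 2, NA),
  y = c(NA, NA, NA),
  z = c(NA_character_, NA, NA),
  stringsAsFactors = FALSE
)

keep <- colSums(is.na(df)) != nrow(df)

as_vector <- df[, keep]                # simplified to a numeric vector
as_df     <- df[, keep, drop = FALSE]  # stays a one-column data frame
as_list   <- df[keep]                  # list-style indexing also keeps a data frame
```

The list-style form `df[keep]` is therefore the safer default when the number of surviving columns is not known in advance.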
How to loop through dataframes and delete empty columns (in R)?
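We could use lapply and filter the columns from each data frame in the list
# Use the element-wise `|`: `||` only accepts length-one operands.
# A data frame column is never NULL, so no is.null() check is needed.
output <- lapply(lst, function(df)
  Filter(function(x) !all(is.na(x) | x == "" | x == 0), df))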
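To try this end to end, here is a self-contained sketch with a hypothetical list `lst` of two data frames. The helper `is_empty_col` is an assumed name, and it uses the element-wise `|` so that every value in a column is tested.

```r
# Hypothetical list of data frames with some all-empty, all-zero and all-NA columns.
lst <- list(
  first = data.frame(
    id = 1:3,
    blank = c("", "", ""),
    zero = c(0, 0, 0),
    val = c("a", "", NA),
    stringsAsFactors = FALSE
  ),
  second = data.frame(
    id = 4:5,
    note = c(NA, NA),
    score = c(0, 2)
  )
)

# A column is "empty" when every element is NA, "" or 0.
is_empty_col <- function(x) all(is.na(x) | x == "" | x == 0)

output <- lapply(lst, function(df) Filter(function(x) !is_empty_col(x), df))
lapply(output, names)
```

Here `first` keeps `id` and `val`, and `second` keeps `id` and `score`.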
Remove columns that are all NA for at least one level of a factor
Another option:
dat %>%
  select(site, dat %>%
    group_by(site) %>%
    summarise(across(everything(), ~!all(is.na(.x)))) %>%
    ungroup() %>%
    select(-site) %>%
    select(where(all)) %>%
    names())
#    site year species_B species_C
# 1     A 2000         1         1
# 2     A 2001         2         2
# 3     A 2002        NA         3
# 4     A 2003         4         4
# 5     A 2004         5         5
# 6     B 2000        NA         2
# 7     B 2001         3         3
# 8     B 2002         4         4
# 9     B 2003         5         5
# 10    B 2004         6         6
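The same idea can be written in base R with tapply. This is a sketch on a small hypothetical `dat`, not the data from the question: a column is dropped if it is entirely NA within at least one site.

```r
# Hypothetical data: species_A is all NA at site B, species_B has scattered NAs.
dat <- data.frame(
  site = rep(c("A", "B"), each = 3),
  year = rep(2000:2002, times = 2),
  species_A = c(1, 2, 3, NA, NA, NA),
  species_B = c(1, NA, 3, NA, 5, 6),
  stringsAsFactors = FALSE
)

value_cols <- setdiff(names(dat), "site")

# For each column: is it entirely NA within any single site?
all_na_in_some_site <- sapply(dat[value_cols], function(col) {
  any(tapply(is.na(col), dat$site, all))
})

result <- dat[c("site", value_cols[!all_na_in_some_site])]
names(result)
# [1] "site"      "year"      "species_B"
```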