R: Remove Multiple Empty Columns of Character Variables

If your empty columns are really empty character columns, something like the following should work. It will need to be modified if your "empty" character columns include, say, spaces.

Sample data:

mydf <- data.frame(
  A = c("a", "b"),
  B = c("y", ""),
  C = c("", ""),
  D = c("", ""),
  E = c("", "z")
)
mydf
#   A B C D E
# 1 a y
# 2 b       z

Identifying and removing the "empty" columns:

mydf[!sapply(mydf, function(x) all(x == ""))]
#   A B E
# 1 a y
# 2 b   z

Alternatively, as recommended by @Roland:

mydf[, colSums(mydf != "") != 0]
#   A B E
# 1 a y
# 2 b   z
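The caveat about spaces can be handled by trimming each value before comparing. The following is a minimal sketch, assuming whitespace-only strings should count as empty; the data frame mydf2 and the helper vector is_blank_col are made up for illustration.

```r
# Hypothetical sample data: column C holds only spaces, column D only "".
mydf2 <- data.frame(
  A = c("a", "b"),
  B = c("y", " "),
  C = c(" ", "  "),
  D = c("", ""),
  stringsAsFactors = FALSE
)

# A column counts as empty when every value is "" after trimming whitespace.
is_blank_col <- sapply(mydf2, function(x) all(trimws(x) == ""))

# Keep only the columns that are not blank.
cleaned <- mydf2[!is_blank_col]
names(cleaned)
```

Column B survives because it still contains "y", even though its second value is a single space.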

Remove Multiple Empty Columns for String

One option using base R apply is to first calculate the number of columns the final data frame will have (cols). Then, for each row, keep the non-empty values and pad the remaining positions with empty strings using rep.

cols <- max(rowSums(df != ""))

as.data.frame(t(apply(df, 1, function(x) {
  vals <- x[x != ""]
  c(vals, rep("", cols - length(vals)))
})))

#    V1  V2  V3
# 1 aaa ccc
# 2 aaa bbb
# 3 bbb ccc ddd
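The input data frame for this answer is not shown, so here is a self-contained sketch of the same shift-left idea. The data frame df and its column names a to d are hypothetical; they were chosen only so that the non-empty values in each row match the output above.

```r
# Hypothetical input: empty strings are scattered across the columns.
df <- data.frame(
  a = c("aaa", "", "bbb"),
  b = c("", "aaa", "ccc"),
  c = c("ccc", "bbb", ""),
  d = c("", "", "ddd"),
  stringsAsFactors = FALSE
)

# Widest row after dropping empty strings decides the output width.
cols <- max(rowSums(df != ""))

# For each row: keep non-empty values, then pad on the right with "".
shifted <- as.data.frame(t(apply(df, 1, function(x) {
  vals <- x[x != ""]
  c(vals, rep("", cols - length(vals)))
})), stringsAsFactors = FALSE)

shifted
```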

Another option uses gather/spread: add a column holding the row number, convert the data to long format with gather, filter out the empty values, group_by row and build new column names with paste0, and finally convert back to wide format with spread. (In current tidyr, gather and spread are superseded by pivot_longer and pivot_wider, though they still work.)

library(dplyr)
library(tidyr)

df %>%
  mutate(row = row_number()) %>%
  gather(key, value, -row) %>%
  filter(value != "") %>%
  group_by(row) %>%
  mutate(key = paste0("new", row_number())) %>%
  spread(key, value, fill = "") %>%
  ungroup() %>%
  select(-row)

# new1 new2 new3
# <chr> <chr> <chr>
#1 aaa ccc ""
#2 aaa bbb ""
#3 bbb ccc ddd

Remove columns from dataframe where ALL values are NA, NULL or empty

We can use Filter, which keeps the columns for which the predicate returns TRUE:

Filter(function(x) !(all(x=="")), df)
# Var1 Var3
#1 2R+ 52
#2 2R+ 169
#3 2R+ 83
#4 2R+ 98
#5 2R+ NA
#6 2R+ 111
#7 2R+ 94
#8 2R+ 116
#9 2R+ 86

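NOTE: This also works when every element of a column is NA. In that case all(x == "") returns NA rather than TRUE, and Filter drops any column whose predicate result is not TRUE.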

df$Var3 <- NA
Filter(function(x) !(all(x=="")), df)
# Var1
#1 2R+
#2 2R+
#3 2R+
#4 2R+
#5 2R+
#6 2R+
#7 2R+
#8 2R+
#9 2R+
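The NA behaviour is easy to miss, so here is a small sketch that makes it visible and compares it with a predicate that tests for NA explicitly. The data frame below is hypothetical (the original data for this answer is not shown); only the Var1 to Var3 naming follows the output above, and Var4 is an extra column added for illustration.

```r
# Hypothetical data: Var2 is all "", Var3 is all NA, Var4 mixes NA and "".
df <- data.frame(
  Var1 = c("2R+", "2R+", "2R+"),
  Var2 = c("", "", ""),
  Var3 = c(NA, NA, NA),
  Var4 = c(NA, "", NA),
  stringsAsFactors = FALSE
)

# all(NA == "") is NA, not TRUE. Filter() keeps only elements whose
# predicate is TRUE, so an NA result drops the column as a side effect.
implicit <- Filter(function(x) !all(x == ""), df)

# Explicit version: treat NA and "" alike, so the predicate is never NA.
explicit <- Filter(function(x) !all(is.na(x) | x == ""), df)

names(implicit)
names(explicit)
```

Both calls keep only Var1 here. The explicit form may be preferable when the same predicate is reused with tools that do not silently discard NA results.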

Update

Based on the updated dataset, if we need to remove the columns with only 0 values, then change the code to

Filter(function(x) !(all(x==""|x==0)), df2)
# VAR1 VAR3 VAR4 VAR7
#1 2R+ 52 1.05 30
#2 2R+ 169 1.02 40
#3 2R+ 83 NA 40
#4 2R+ 98 1.16 40
#5 2R+ 154 1.11 40
#6 2R+ 111 NA 15

data

df2 <- structure(list(VAR1 = c("2R+", "2R+", "2R+", "2R+", "2R+", "2R+"
), VAR2 = c("", "", "", "", "", ""), VAR3 = c(52L, 169L, 83L,
98L, 154L, 111L), VAR4 = c(1.05, 1.02, NA, 1.16, 1.11, NA), VAR5 = c(0L,
0L, 0L, 0L, 0L, 0L), VAR6 = c(0L, 0L, 0L, 0L, 0L, 0L), VAR7 = c(30L,
40L, 40L, 40L, 40L, 15L)), .Names = c("VAR1", "VAR2", "VAR3",
"VAR4", "VAR5", "VAR6", "VAR7"), row.names = c("1", "2", "3",
"4", "5", "6"), class = "data.frame")

Piping the removal of empty columns using dplyr

We can use a version of select_if

library(dplyr)
df %>%
  select_if(function(x) !(all(is.na(x)) | all(x == "")))

# id Q2 Q3 Q4
#1 1 1 NA
#2 2 2
#3 3 4 3 2
#4 4 5 4 2

Or with the formula shorthand in place of function(x)

df %>% select_if(~!(all(is.na(.)) | all(. == "")))

You can also modify your apply statement as

df[!apply(df, 2, function(x) all(is.na(x)) | all(x==""))]

Or using colSums

df[colSums(is.na(df) | df == "") != nrow(df)]

and inverse

df[colSums(!(is.na(df) | df == "")) > 0]
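The per-column predicates and the colSums variants differ in one edge case: a column that mixes NA and "". The sketch below, using a hypothetical data frame, shows that two separate all() calls give NA for such a column, while a single element-wise test (which is what the colSums variants compute) gives TRUE.

```r
# Hypothetical data: Q1 mixes NA and "", so it holds no real values.
df <- data.frame(
  id = 1:3,
  Q1 = c(NA, "", NA),
  Q2 = c("1", "", NA),
  stringsAsFactors = FALSE
)

# Two separate all() calls: neither is TRUE for Q1, so the result is NA.
separate <- sapply(df, function(x) all(is.na(x)) | all(x == ""))

# One element-wise test: always TRUE or FALSE.
combined <- sapply(df, function(x) all(is.na(x) | x == ""))

# Drop the columns flagged as empty by the element-wise test.
kept <- df[!combined]

# The colSums() form from above selects the same columns.
kept_colsums <- df[colSums(is.na(df) | df == "") != nrow(df)]
```

If columns like Q1 should be removed, the element-wise form is the safer choice.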

Removing all empty columns and rows in data.frame when rows don't go away

Your data contains both NA values and empty strings, so a row counts as empty when every cell is one or the other. You can do

B1[rowSums(is.na(B1) | B1 == "") != ncol(B1), ]

# study.name group.name outcome ESL prof scope type
#1 Shin.Ellis ME.short 1 1 2 1 1
#2 Shin.Ellis ME.long 1 1 2 1 1
#3 Shin.Ellis DCF.short 1 1 2 1 2
#4 Shin.Ellis DCF.long 1 1 2 1 2
#5 Shin.Ellis Cont.short 1 1 2 NA NA
#6 Shin.Ellis Cont.long 1 1 2 NA NA
#8 Trus.Hsu Exper 1 2 2 2 1
#.....

We can also use filter_all from dplyr

library(dplyr)
B1 %>% filter_all(any_vars(!is.na(.) & . != ""))
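The question also asks about empty columns, and the same logical matrix can serve both directions. This is a sketch only: the B1 below is a hypothetical three-row stand-in that borrows two column names from the output above, and notes is an invented column that is empty throughout.

```r
# Hypothetical stand-in for B1: row 2 is entirely blank, as is column `notes`.
B1 <- data.frame(
  study.name = c("Shin.Ellis", "", "Trus.Hsu"),
  group.name = c("ME.short", NA, "Exper"),
  notes      = c("", NA, ""),
  stringsAsFactors = FALSE
)

# TRUE wherever a cell is NA or an empty string.
blank <- is.na(B1) | B1 == ""

# Drop rows that are blank in every column and columns blank in every row.
# drop = FALSE keeps a data frame even if a single column survives.
cleaned <- B1[rowSums(blank) != ncol(B1), colSums(blank) != nrow(B1), drop = FALSE]
cleaned
```

Note that emptiness is judged on the original data here; if removing rows could leave further columns empty, the step would need to be repeated on the result.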

How to delete columns that contain ONLY NAs?

One way of doing it:

df[, colSums(is.na(df)) != nrow(df)]

If the count of NAs in a column is equal to the number of rows, it must be entirely NA.

Or similarly

df[colSums(!is.na(df)) > 0]
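One difference between the two forms is worth knowing: with the comma, base R simplifies the result to a plain vector when only one column survives. A short sketch on a hypothetical data frame:

```r
# Hypothetical data: only column `a` contains values.
df <- data.frame(a = c(1, 2, 3), b = c(NA, NA, NA), c = c(NA, NA, NA))

keep <- colSums(is.na(df)) != nrow(df)

as_vector <- df[, keep]                # one column left: collapses to a vector
as_frame  <- df[, keep, drop = FALSE]  # stays a data frame
as_list   <- df[keep]                  # list-style indexing also stays a data frame
```

When the result must remain a data frame, either drop = FALSE or the comma-free form avoids the surprise.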

How to loop through dataframes and delete empty columns (in R)?

We could use lapply and filter the columns from each dataframe in the list

output <- lapply(lst, function(df)
  # Use the element-wise `|`; `||` only handles single values and
  # raises an error for longer vectors in R >= 4.3
  Filter(function(x) !all(is.na(x) | x == "" | x == 0), df))
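To try the list approach end to end, here is a minimal sketch with a hypothetical list lst of two small data frames. The helper is_empty_col is an invented name; it treats NA, "" and 0 as empty and combines the tests element-wise.

```r
# Hypothetical list of data frames with a mix of empty and non-empty columns.
lst <- list(
  first  = data.frame(x = 1:2, y = c(NA, NA), z = c("", ""),
                      stringsAsFactors = FALSE),
  second = data.frame(x = c(0, 0), y = c("a", ""),
                      stringsAsFactors = FALSE)
)

# TRUE when every value in the column is NA, "" or 0.
is_empty_col <- function(x) all(is.na(x) | x == "" | x == 0)

# Apply the column filter to each data frame in the list.
output <- lapply(lst, function(df) Filter(function(x) !is_empty_col(x), df))

lapply(output, names)
```

Because x == 0 coerces to character on string columns, a column holding only the string "0" would also be treated as empty under this predicate.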

Remove columns that are all NA for at least one level of a factor

Another option:

dat %>%
  select(site, dat %>%
           group_by(site) %>%
           summarise(across(everything(), ~ !all(is.na(.x)))) %>%
           ungroup() %>%
           select(-site) %>%
           select(where(all)) %>%
           names())

site year species_B species_C
1 A 2000 1 1
2 A 2001 2 2
3 A 2002 NA 3
4 A 2003 4 4
5 A 2004 5 5
6 B 2000 NA 2
7 B 2001 3 3
8 B 2002 4 4
9 B 2003 5 5
10 B 2004 6 6
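For comparison, the same idea can be sketched in base R with tapply: for each column, check within every site whether all values are NA, and drop the column if that holds for any site. The data frame dat below is a shortened, hypothetical version of the data (the original input is not shown), and value_cols and all_na_in_some_group are illustrative names.

```r
# Hypothetical data: species_A is entirely NA for site B.
dat <- data.frame(
  site = rep(c("A", "B"), each = 3),
  year = rep(2000:2002, times = 2),
  species_A = c(1, 2, 3, NA, NA, NA),
  species_B = c(1, NA, 3, 4, 5, 6),
  species_C = 1:6
)

value_cols <- setdiff(names(dat), "site")

# For each column: is it all NA within at least one site?
all_na_in_some_group <- sapply(dat[value_cols], function(x) {
  any(tapply(is.na(x), dat$site, all))
})

# Keep the grouping column plus every column that passes the check.
result <- dat[c("site", value_cols[!all_na_in_some_group])]
names(result)
```

species_B is kept even though it contains an NA, because no single site is missing all of its values.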

