How to Remove Empty Data Frames from a List

How do I remove empty data frames from a list?

I'm not sure if this is exactly what you're asking for, but if you want to trim mlist down to contain only non-empty data frames before running the function on it, try mlist[sapply(mlist, function(x) dim(x)[1]) > 0].

E.g.:

R> M1 <- data.frame(matrix(1:4, nrow = 2, ncol = 2))
R> M2 <- data.frame(matrix(nrow = 0, ncol = 0))
R> M3 <- data.frame(matrix(9:12, nrow = 2, ncol = 2))
R> mlist <- list(M1, M2, M3)
R> mlist[sapply(mlist, function(x) dim(x)[1]) > 0]
[[1]]
X1 X2
1 1 3
2 2 4

[[2]]
X1 X2
1 9 11
2 10 12

Delete empty dataframes from a list with dataframes

We can use filter:

data = list(filter(lambda df: not df.empty, data))

or list comprehension

data = [df for df in data if not df.empty]


print(data)

[ a
0 1
1 2
2 3, a
0 3
1 4
2 5
3 6
4 7]
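
For a self-contained version of the above, here is a minimal sketch. The sample frames are made up for illustration; the last one has a column but no rows, which df.empty also reports as empty.

```python
import pandas as pd

# Hypothetical sample data: two frames with rows, two without.
frames = [
    pd.DataFrame({"a": [1, 2, 3]}),
    pd.DataFrame(),                      # no rows, no columns
    pd.DataFrame({"a": [3, 4, 5, 6, 7]}),
    pd.DataFrame(columns=["a"]),         # a column but no rows
]

# df.empty is True when either axis has length 0.
non_empty = [df for df in frames if not df.empty]

print(len(non_empty))
```

Only the two frames that actually contain rows survive the filter.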

Delete empty dataframes from List

Simpler test data:

x <- list(data.frame(x=1:3), data.frame(x=numeric(0)), data.frame(x=1:4))

I would recommend (without a for loop):

is_empty <- function(x) (nrow(x) == 0 || ncol(x) == 0)
x <- x[!sapply(x, is_empty)]

This creates a logical vector which is TRUE if the data frame is empty, negates it, and subsets the original list so that only the non-empty data frames are kept.

Setting an element of a list to NULL (x[[i]] <- NULL) removes it from the list, but I would worry that your indexing is going to get screwed up. It's very tricky to get this right, because the list changes under you while the loop is running. For example, consider

x <- list(data.frame(x=1:3), data.frame(x=numeric(0)),
          data.frame(x=numeric(0)))
for (i in 1:length(x)) {
    if (nrow(x[[i]])==0) x[[i]] <- NULL
}

This gets "error in x[[i]]: subscript out of bounds", because

  1. i==1, check x[[1]]: OK, move on to next element
  2. i==2; remove the second element. Now we have a list with only two elements (the previous first and third elements)
  3. i==3; error, because x[[3]] doesn't exist!

You could do this to avoid messing up the indexing:

j <- 0  ## cumulative number removed
for (i in seq(length(x))) {
    if (is_empty(x[[i-j]])) {
        x[[i-j]] <- NULL
        j <- j + 1
    }
}

but this seems like a "code smell", i.e. you're having to work extra hard because you're doing it in an awkward way.
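
The same pitfall shows up in Python. Here is a minimal sketch, with plain lists standing in for data frames; the names remove_in_place and items are made up for illustration. Deleting by index while looping over the original length fails in the same way, while building a new list avoids the problem.

```python
def remove_in_place(items):
    """Delete empty elements while iterating over the ORIGINAL index range."""
    for i in range(len(items)):   # the range is fixed before the loop starts
        if not items[i]:
            del items[i]          # the list shrinks, later indexes go stale
    return items


items = [[1, 2, 3], [], []]

try:
    remove_in_place(list(items))  # work on a copy so items stays intact
    failed = False
except IndexError:
    failed = True                 # index 2 no longer exists after one deletion

# Building a new list sidesteps the shifting indexes entirely.
kept = [item for item in items if item]

print(failed, kept)
```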

How to remove empty dataframes in a list before using bind_rows()?

Using @akrun's data, this drops the NULL elements and the tibbles:

lst1[unlist(lapply(lst1, function(x) !(is.null(x) | is_tibble(x))))]

Regarding your question about NA:

lst1 <- list(data.frame(col1 = 1:3), NULL, tibble(col1 = 1:5,
    col2 = 2:6), data.frame(A = 1:5, B = 2:6), NA)

lst <- lst1[unlist(lapply(lst1, function(x) !(is.null(x) | is_tibble(x))))]

lst <- lst[!is.na(lst)]

remove empty dataframe from list and drop corresponding name in second list

You can use the built-in any() function:

k = [i for i, x in enumerate(dfs) if not any(x)]

The reason your

k = [i for i, x in enumerate(dfs) if not x]

doesn't work is that a non-empty list is always truthy, regardless of what it contains.

The any() function takes an iterable and returns True if at least one of its elements is truthy. If there is no such element (which includes the case of an empty iterable), it returns False. An empty string, '', is falsy.
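
To make the difference concrete, here is a small sketch. The sample list rows is made up for illustration; its second element is a list holding only an empty string.

```python
# Hypothetical sample: a normal list, a list holding only '', and an empty list.
rows = [["car", "fast"], [""], []]

# Plain truthiness only asks "does the list have any elements?"
truthy = [bool(r) for r in rows]

# any() asks "is at least one element truthy?"
has_any = [any(r) for r in rows]

# Indexes of the lists that any() treats as empty.
empty_idx = [i for i, r in enumerate(rows) if not any(r)]

print(truthy)     # [True, True, False]
print(has_any)    # [True, False, False]
print(empty_idx)  # [1, 2]
```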

EDIT: The question got edited, here is my updated answer:

You can try creating new lists:

names = ['ID1','ID2','ID3'] 
dfs = [['car','fast','blue'],[],['red','bike','slow']]

new_names = list()
new_dfs = list()

for i, x in enumerate(dfs):
    if x:
        new_names.append(names[i])
        new_dfs.append(x)

print(new_names)
print(new_dfs)

Output:

['ID1', 'ID3']
[['car', 'fast', 'blue'], ['red', 'bike', 'slow']]

If it doesn't work, try adding a print(x) to the loop to see what is going on:

names = ['ID1','ID2','ID3'] 
dfs = [['car','fast','blue'],[],['red','bike','slow']]

new_names = list()
new_dfs = list()

for i, x in enumerate(dfs):
    print(x)
    if x:
        new_names.append(names[i])
        new_dfs.append(x)
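
A more compact variant of the same idea is to pair the two lists with zip() and filter once. This is only a sketch using the sample data from above; the intermediate name pairs is made up for illustration.

```python
names = ["ID1", "ID2", "ID3"]
dfs = [["car", "fast", "blue"], [], ["red", "bike", "slow"]]

# Keep each (name, data) pair only when the data part is non-empty,
# so the two lists cannot drift out of sync.
pairs = [(name, data) for name, data in zip(names, dfs) if data]

new_names = [name for name, _ in pairs]
new_dfs = [data for _, data in pairs]

print(new_names)  # ['ID1', 'ID3']
print(new_dfs)    # [['car', 'fast', 'blue'], ['red', 'bike', 'slow']]
```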

How to delete empty data.frame in a list after subsetting in R

Simply Filter by number of rows:

new_list_of_dfs <- Filter(NROW, list_of_dfs)

How can I remove empty dataframes from a series of dataframes in pandas?

A series of DataFrames?

You can check if a dataframe is empty with df.empty, so you could do

serie[[not df.empty for df in serie]]

Remove rows with empty lists from pandas data frame

You could try slicing as though the data frame were strings instead of lists:

import pandas as pd
df = pd.DataFrame({
    'donation_orgs': [[], ['the research of Dr.']],
    'donation_context': [[], ['In lieu of flowers , memorial donations']]})

df[df.astype(str)['donation_orgs'] != '[]']

Out[9]:
donation_context donation_orgs
1 [In lieu of flowers , memorial donations] [the research of Dr.]
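
An alternative that avoids the string conversion is to test the length of each list directly. This is a sketch on the same sample data, not the approach from the answer above; the names mask and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "donation_orgs": [[], ["the research of Dr."]],
    "donation_context": [[], ["In lieu of flowers , memorial donations"]],
})

# Apply len() to each cell of the column; empty lists give 0.
mask = df["donation_orgs"].map(len) > 0

# Keep only the rows whose list is non-empty.
result = df[mask]

print(result)
```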

