Determine If Data Frame Is Empty

How to check whether a pandas DataFrame is empty?

You can use the attribute df.empty to check whether it's empty or not:

if df.empty:
    print('DataFrame is empty!')

Source: Pandas Documentation
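As a small illustration of what .empty does and does not treat as empty, here is a minimal sketch. The helper name describe_emptiness and the sample frames are made up for this example. Note that, per the pandas documentation, a frame holding only NaN values still has rows and is therefore not empty.

```python
import numpy as np
import pandas as pd


def describe_emptiness(df: pd.DataFrame) -> str:
    """Return a short label based on the .empty attribute (hypothetical helper)."""
    return "empty" if df.empty else "not empty"


# A frame with column labels but zero rows.
no_rows = pd.DataFrame(columns=["a", "b"])

# A frame whose only value is NaN: it still has one row.
only_nan = pd.DataFrame({"a": [np.nan]})

print(describe_emptiness(no_rows))
print(describe_emptiness(only_nan))
```

If NaN-only frames should also count as empty for your purposes, one option is to call dropna() first and test .empty on the result.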

How to check whether a DataFrame is empty?

IIUC, you are looking for the .empty attribute:

DataFrame:

In [86]: pd.DataFrame().empty
Out[86]: True

In [87]: pd.DataFrame([1,2,3]).empty
Out[87]: False

Series:

In [88]: pd.Series().empty
Out[88]: True

In [89]: pd.Series([1,2,3]).empty
Out[89]: False

NOTE: checking the length of the DataFrame (len(df)) might save you a few microseconds compared to the df.empty attribute ;-)

In [142]: df = pd.DataFrame()

In [143]: %timeit df.empty
8.25 µs ± 22.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [144]: %timeit len(df)
2.35 µs ± 7.56 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [145]: df = pd.DataFrame(np.random.randn(10*5, 3), columns=['a', 'b', 'c'])

In [146]: %timeit df.empty
15.3 µs ± 269 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [147]: %timeit len(df)
3.58 µs ± 12.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
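One caveat when swapping one check for the other: len(df) only counts rows, while .empty is True when any axis has length 0. The sketch below uses a made-up frame (rows_no_columns) where the two disagree, and also shows how the timing could be repeated outside IPython with the standard timeit module. Absolute timings depend on the machine and pandas version, so none are claimed here.

```python
import timeit

import pandas as pd

# Three index labels but no columns: shape is (3, 0).
rows_no_columns = pd.DataFrame(index=[0, 1, 2])

print(len(rows_no_columns))   # counts index labels only
print(rows_no_columns.empty)  # True, because the column axis has length 0

# Rough timing comparison; treat the numbers as indicative only.
df = pd.DataFrame()
n = 10_000
t_empty = timeit.timeit(lambda: df.empty, number=n)
t_len = timeit.timeit(lambda: len(df), number=n)
print(f"df.empty: {t_empty / n * 1e6:.2f} µs per call")
print(f"len(df):  {t_len / n * 1e6:.2f} µs per call")
```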

Check if Dataframe is empty and print results

You need to change the if to:

if df.empty == True:

or

if df.empty:
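The original failing code is not shown here, so the following is only a guess at a common cause: testing the DataFrame itself in an if statement. pandas refuses to convert a DataFrame to a single True/False value and raises a ValueError, which is why the condition has to go through .empty. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame()

# Using the frame directly as a condition raises ValueError,
# even when the frame is empty.
try:
    if df:
        pass
except ValueError as exc:
    message = str(exc)
    print(message)

# The explicit attribute check works as intended.
if df.empty:
    status = "empty"
else:
    status = "not empty"
print(status)
```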

Find empty or NaN entry in Pandas Dataframe

np.where(pd.isnull(df)) returns the row and column indices where the value is NaN:

In [152]: import numpy as np
In [153]: import pandas as pd
In [154]: np.where(pd.isnull(df))
Out[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7]))

In [155]: df.iloc[2,7]
Out[155]: nan

In [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))]
Out[160]: [nan, nan, nan, nan, nan, nan]

Finding values which are empty strings could be done with applymap:

In [182]: np.where(df.applymap(lambda x: x == ''))
Out[182]: (array([5]), array([7]))

Note that using applymap requires calling a Python function once for each cell of the DataFrame. That can be slow for a large DataFrame, so it is better to arrange for all the blank cells to contain NaN instead, so that you can use pd.isnull. (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.)
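The df in the session above is not shown, so here is a self-contained sketch on a small made-up frame. It locates NaN cells, locates empty strings with a vectorized comparison instead of a per-cell function, and then converts blanks to NaN so that a single null check covers both cases.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one NaN, one empty string, one None.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "", None]})

# Row and column positions of missing values (NaN or None).
nan_rows, nan_cols = np.where(df.isna().to_numpy())

# Row and column positions of empty strings, without a per-cell lambda.
blank_rows, blank_cols = np.where((df == "").to_numpy())

# Turn blanks into NaN so one null check finds everything.
cleaned = df.replace("", np.nan)
all_rows, all_cols = np.where(cleaned.isna().to_numpy())

print(list(zip(nan_rows.tolist(), nan_cols.tolist())))
print(list(zip(blank_rows.tolist(), blank_cols.tolist())))
print(list(zip(all_rows.tolist(), all_cols.tolist())))
```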

Is there a way to check if dataframe is empty and if so to add a NA row?

Simple question, OP, but actually a pretty interesting one. All the elements of your code should work, but the issue is that when you run it as is, it returns a list, not a data frame. Let me show you with an example:

growing_df <- data.frame(
  A=rep(1, 3),
  B=1:3,
  c=LETTERS[4:6])

df_empty <- data.frame()

If we evaluate as you have written you get:

df <- ifelse(dim(df_empty)[1]==0, rbind(growing_df, NA))

with df resulting in a List:

> class(df)
[1] "list"
> df
[[1]]
[1] 1 1 1 NA

The code "worked", but the resulting class of df is wrong. It's odd because this works:

> rbind(growing_df, NA)
   A  B    c
1  1  1    D
2  1  2    E
3  1  3    F
4 NA NA <NA>

The answer is to use if and else, rather than ifelse(), just as @akrun noted in their answer. The reason is found if you dig into the documentation of ifelse():

ifelse returns a value with the same shape as test which is filled
with elements selected from either yes or no depending on whether the
element of test is TRUE or FALSE.

The test, dim(df_empty)[1]==0 (or equivalently nrow(df_empty)==0), is a logical vector of length one, so ifelse() returns a result of length one: it takes only the first element of rbind(growing_df, NA), which is the first column of that data frame, and returns it wrapped in a list. That's why if {} works here but ifelse() does not. rbind() normally returns a data frame, but the shape of the value that ifelse() hands back is determined by the test, not by the yes or no value. An if {} statement, by contrast, returns whatever the expression inside {} evaluates to.
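For readers on the pandas side of this page, a rough Python analog of the same task can be sketched with a plain if statement, which returns whole objects just as R's if {} does. The function name append_na_row_if_empty is made up for this sketch, and using reindex to add the extra row is one choice among several. It assumes the row labels can be renumbered from zero.

```python
import pandas as pd


def append_na_row_if_empty(checked: pd.DataFrame, growing: pd.DataFrame) -> pd.DataFrame:
    """If `checked` is empty, return `growing` with one all-NaN row appended."""
    if checked.empty:
        # Renumber rows 0..n-1, then ask for one extra label; reindex
        # fills the new row with NaN in every column.
        renumbered = growing.reset_index(drop=True)
        return renumbered.reindex(range(len(renumbered) + 1))
    return growing


growing_df = pd.DataFrame({"A": [1, 1, 1], "B": [1, 2, 3], "c": ["D", "E", "F"]})
df_empty = pd.DataFrame()

result = append_na_row_if_empty(df_empty, growing_df)
print(type(result).__name__)
print(result)
```

One side effect to be aware of: integer columns become floating point once a NaN is introduced.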

Fastest way to check if dataframe is empty

Suppose we have two types of data.frames:

emptyDF = data.frame(a=1,b="bah")[0,]
fullDF = data.frame(a=1,b="bah")

DFs = list(emptyDF,fullDF)[sample(1:2,1e4,replace=TRUE)]

and your if condition shows up in a loop like

boundDF = data.frame()
for (i in seq_along(DFs)) {
  if (nrow(DFs[[i]]))
    boundDF <- rbind(boundDF, DFs[[i]])
}

In this case, you're approaching the problem in the wrong way. The if statement is not necessary: do.call(rbind,DFs) or library(data.table); rbindlist(DFs) is faster and clearer.
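The same idea carries over to pandas, shown here as an analog rather than a translation of the R code: pd.concat accepts a list that mixes empty and non-empty frames, so no per-frame emptiness check or loop is needed. The frame names and the fixed ordering of the list are assumptions made for this sketch.

```python
import pandas as pd

full_df = pd.DataFrame({"a": [1], "b": ["bah"]})
empty_df = full_df.iloc[0:0]  # same columns and dtypes, zero rows

# A fixed mix of empty and non-empty frames (no random sampling here).
frames = [empty_df, full_df, full_df, empty_df, full_df]

# One call binds everything; empty frames contribute no rows.
bound_df = pd.concat(frames, ignore_index=True)
print(bound_df.shape)
```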

Generally, you are looking for improvement to the performance of your code in the wrong place. No matter what operation you're doing inside your loop, the step of checking for non-emptiness of the data.frame is not going to be the part that is taking the most time. While there may be room for optimization in this step, "Premature optimization is the root of all evil" as Donald Knuth said.


