How to Check If Any Value Is NaN in a Pandas DataFrame

How to check if any value is NaN in a Pandas DataFrame

jwilner's response is spot on. I was exploring to see if there's a faster option, since in my experience, summing flat arrays is (strangely) faster than counting. This code seems faster:

df.isnull().values.any()


import numpy as np
import pandas as pd
import perfplot

def setup(n):
    df = pd.DataFrame(np.random.randn(n))
    df[df > 0.9] = np.nan
    return df

def isnull_any(df):
    return df.isnull().any()

def isnull_values_sum(df):
    return df.isnull().values.sum() > 0

def isnull_sum(df):
    return df.isnull().sum() > 0

def isnull_values_any(df):
    return df.isnull().values.any()

perfplot.save(
    "out.png",
    setup=setup,
    kernels=[isnull_any, isnull_values_sum, isnull_sum, isnull_values_any],
    n_range=[2 ** k for k in range(25)],
)

df.isnull().sum().sum() is a bit slower, but of course, has additional information -- the number of NaNs.
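A minimal sketch of the difference between the two checks, using a small made-up frame (the column names and values are illustrative, not taken from the benchmark above):

```python
import numpy as np
import pandas as pd

# Illustrative data: two NaNs spread over two columns.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, 5.0, 6.0]})

# Collapse the boolean mask to a flat NumPy array and ask whether any entry is True.
has_nan = df.isnull().values.any()

# Sum per column, then across columns, to get the total number of NaNs.
nan_count = df.isnull().sum().sum()

print(has_nan)    # True
print(nan_count)  # 2
```

The first form only answers yes or no; the second is the one to reach for when the count itself is needed.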

Checking if particular value (in cell) is NaN in pandas DataFrame not working using ix or iloc

Try this:

In [107]: pd.isnull(df.iloc[1,0])
Out[107]: True

UPDATE: in newer pandas versions, use pd.isna():

In [7]: pd.isna(df.iloc[1,0])
Out[7]: True
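A likely reason a plain equality test on the cell does not work is that NaN never compares equal to anything, itself included. A short sketch, with a made-up frame standing in for the asker's df:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the cell at row 1, column 0 is NaN.
df = pd.DataFrame({"A": [1.0, np.nan], "B": [3.0, 4.0]})

cell = df.iloc[1, 0]

# NaN is never equal to anything, including itself, so == cannot detect it.
equal_to_nan = cell == np.nan

# pd.isna() works on a scalar as well as on a whole DataFrame.
is_missing = pd.isna(cell)

print(equal_to_nan)  # False
print(is_missing)    # True
```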

Check if columns have a nan value if certain column has a specific value in Dataframe

So you have an if-elif-else situation, which np.select handles well. It takes a list of conditions and a matching list of values to use where each condition holds:

  • your if is:   "condition is 1 and a, b, c are all NaN"
  • your elif is: "condition is NaN"
  • what remains is else, as usual

conditions = [df.condition.eq(1) & df[["a", "b", "c"]].isna().all(axis=1),
              df.condition.isna()]

what_to_do = ["O", "-"]
else_case = "X"

df["check_result"] = np.select(conditions, what_to_do, default=else_case)

df
   condition    a    b    c check_result
0        1.0  NaN  NaN  3.0            X
1        NaN  4.0    2  2.0            -
2        NaN  5.0    e  1.0            -
3        NaN  6.0    2  2.0            -
4        1.0  NaN  NaN  NaN            O

Note that the else branch needs no condition of its own: rows that match none of the conditions get the default value.
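Put together as a self-contained script, this might look as follows. The input frame is reconstructed from the printed result above, so treat it as an assumption about the asker's data:

```python
import numpy as np
import pandas as pd

# Input reconstructed from the printed result (an assumption).
df = pd.DataFrame({
    "condition": [1, np.nan, np.nan, np.nan, 1],
    "a": [np.nan, 4, 5, 6, np.nan],
    "b": [np.nan, 2, "e", 2, np.nan],
    "c": [3, 2, 1, 2, np.nan],
})

conditions = [
    # "if": condition is 1 and a, b, c are all NaN
    df["condition"].eq(1) & df[["a", "b", "c"]].isna().all(axis=1),
    # "elif": condition is NaN
    df["condition"].isna(),
]
choices = ["O", "-"]

# Conditions are tested in order; rows matching none of them get the default.
df["check_result"] = np.select(conditions, choices, default="X")

print(df["check_result"].tolist())  # ['X', '-', '-', '-', 'O']
```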

Pandas - check if ALL values are NaN in Series

Yes, that's correct, but I think a more idiomatic way would be:

mys.isnull().all()
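A quick sketch of how this behaves on three made-up Series, including the empty case, which may be surprising:

```python
import numpy as np
import pandas as pd

# Illustrative Series, not from the question.
all_nan = pd.Series([np.nan, np.nan])
some_nan = pd.Series([1.0, np.nan])
empty = pd.Series([], dtype="float64")

print(all_nan.isnull().all())   # True
print(some_nan.isnull().all())  # False

# all() over zero elements is vacuously True, so check .empty separately if that matters.
print(empty.isnull().all())     # True
```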

How to find which columns contain any NaN value in Pandas dataframe

UPDATE: using Pandas 0.22.0

Newer Pandas versions have new methods 'DataFrame.isna()' and 'DataFrame.notna()'

In [71]: df
Out[71]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [72]: df.isna().any()
Out[72]:
a     True
b     True
c    False
dtype: bool

as list of columns:

In [74]: df.columns[df.isna().any()].tolist()
Out[74]: ['a', 'b']

to select those columns (containing at least one NaN value):

In [73]: df.loc[:, df.isna().any()]
Out[73]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0

OLD answer:

Try to use isnull():

In [97]: df
Out[97]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [98]: pd.isnull(df).sum() > 0
Out[98]:
a     True
b     True
c    False
dtype: bool

or, as @root proposed, a clearer version:

In [5]: df.isnull().any()
Out[5]:
a     True
b     True
c    False
dtype: bool

In [7]: df.columns[df.isnull().any()].tolist()
Out[7]: ['a', 'b']

to select a subset - all columns containing at least one NaN value:

In [31]: df.loc[:, df.isnull().any()]
Out[31]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
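The same boolean mask can also be counted per column or reduced along the other axis to find rows instead of columns. A small sketch with an illustrative frame (smaller than the one printed above):

```python
import numpy as np
import pandas as pd

# Illustrative data; the last row has no missing values.
df = pd.DataFrame({
    "a": [np.nan, 0.0, 2.0, 1.0],
    "b": [7.0, np.nan, np.nan, 7.0],
    "c": [0, 4, 4, 0],
})

mask = df.isna()

# Columns with at least one NaN.
cols_with_nan = df.columns[mask.any()].tolist()
# Number of NaNs in each column.
nan_per_column = mask.sum().to_dict()
# Row labels with at least one NaN (reduce across columns with axis=1).
rows_with_nan = df.index[mask.any(axis=1)].tolist()

print(cols_with_nan)   # ['a', 'b']
print(nan_per_column)  # {'a': 1, 'b': 2, 'c': 0}
print(rows_with_nan)   # [0, 1, 2]
```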

Check if single cell value is NaN in Pandas

Try this:

>>> import pandas as pd
>>> import numpy as np

>>> L = [4, np.nan, 6]
>>> df = pd.Series(L)

>>> df
0    4.0
1    NaN
2    6.0
dtype: float64

>>> if pd.isnull(df[1]):
...     print("Found")
...
Found

>>> if np.isnan(df[1]):
...     print("Found")
...
Found
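The two checks are not interchangeable once the cell may hold something other than a float. A minimal sketch with a made-up object-dtype Series:

```python
import numpy as np
import pandas as pd

# Object-dtype Series mixing a string, None and a float NaN (illustrative data).
s = pd.Series(["a", None, np.nan], dtype=object)

# pd.isna() accepts any scalar and treats None as missing too.
print(pd.isna(s.iloc[0]))  # False
print(pd.isna(s.iloc[1]))  # True
print(pd.isna(s.iloc[2]))  # True

# np.isnan() only accepts numeric input and raises TypeError otherwise.
try:
    np.isnan(s.iloc[0])
    raised = False
except TypeError:
    raised = True

print(raised)  # True
```

For a cell of unknown type, pd.isna() is therefore the safer choice.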

How to fill dataframe Nan values with empty list [] of 4 elements in pandas?

You can't use fillna with lists, but you can create a Series that repeats your list once per row of the dataframe and assign it to column b wherever b is NaN:

df.loc[df['b'].isna(), 'b'] = pd.Series([ [[]]*4 ] * len(df))
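A runnable sketch of the same idea on a made-up frame. It differs from the one-liner in one respect: each fill value is built separately, so the rows do not share one list object. The sample data and the names mask and fill are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column b holds lists, with NaN where a list is missing.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [[1, 2, 3, 4], np.nan, [5, 6, 7, 8], np.nan],
})

mask = df["b"].isna()

# One fresh list of four empty lists per row, indexed like df so that
# the assignment below aligns on the index.
fill = pd.Series(
    [[[] for _ in range(4)] for _ in range(len(df))],
    index=df.index,
)

# Only the rows selected by the mask are overwritten.
df.loc[mask, "b"] = fill

print(df["b"].tolist())
```

Building the lists with a comprehension rather than `[[]] * 4` means that mutating one filled cell later does not change the others.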

