How to check if any value is NaN in a Pandas DataFrame
jwilner's response is spot on. I was exploring to see if there's a faster option, since in my experience, summing flat arrays is (strangely) faster than counting. This code seems faster:
df.isnull().values.any()
import numpy as np
import pandas as pd
import perfplot
def setup(n):
    df = pd.DataFrame(np.random.randn(n))
    df[df > 0.9] = np.nan
    return df

def isnull_any(df):
    return df.isnull().any()

def isnull_values_sum(df):
    return df.isnull().values.sum() > 0

def isnull_sum(df):
    return df.isnull().sum() > 0

def isnull_values_any(df):
    return df.isnull().values.any()

perfplot.save(
    "out.png",
    setup=setup,
    kernels=[isnull_any, isnull_values_sum, isnull_sum, isnull_values_any],
    n_range=[2 ** k for k in range(25)],
)
df.isnull().sum().sum()
is a bit slower, but of course it gives you additional information: the number of NaNs.
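For a quick sanity check of the two expressions above, here is a minimal sketch on a tiny frame. The sample data and the names has_nan and nan_count are made up for illustration, not taken from the benchmark.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one missing value in each column.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, 5.0, 6.0]})

# One boolean for the whole frame: is there any NaN at all?
has_nan = bool(df.isnull().values.any())

# Total number of NaNs: per-column counts, summed once more.
nan_count = int(df.isnull().sum().sum())

print(has_nan)    # True
print(nan_count)  # 2
```

Note that df.isnull().any() without .values returns one boolean per column rather than a single value.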
Checking if particular value (in cell) is NaN in pandas DataFrame not working using ix or iloc
Try this:
In [107]: pd.isnull(df.iloc[1,0])
Out[107]: True
UPDATE: in newer Pandas versions, use pd.isna():
In [7]: pd.isna(df.iloc[1,0])
Out[7]: True
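A likely reason a direct comparison on a cell "does not work" is that NaN never compares equal to anything, itself included. The sketch below uses an invented two-row frame to show the difference; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a NaN at row 1, column 0.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

cell = df.iloc[1, 0]

# Equality against NaN is always False, even for a NaN cell.
eq_check = bool(cell == np.nan)

# pd.isna() is the reliable test for a single value.
isna_check = bool(pd.isna(cell))

print(eq_check)    # False
print(isna_check)  # True
```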
Check if columns have a nan value if certain column has a specific value in Dataframe
So you have an if-elif-else situation, and we can use np.select for it. It needs the conditions and what to do when each one is satisfied:
- your if is: "condition is 1 and a,b,c has all nan"
- your elif is: "condition is nan"
- what remains is else, as usual
conditions = [df.condition.eq(1) & df[["a", "b", "c"]].isna().all(axis=1),
              df.condition.isna()]
what_to_do = ["O", "-"]
else_case = "X"
df["check_result"] = np.select(conditions, what_to_do, default=else_case)
df
   condition    a    b    c check_result
0        1.0  NaN  NaN  3.0            X
1        NaN  4.0    2  2.0            -
2        NaN  5.0    e  1.0            -
3        NaN  6.0    2  2.0            -
4        1.0  NaN  NaN  NaN            O
Note that we don't write a condition for the else case. It is passed as default.
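To try the np.select approach end to end, here is a self-contained sketch. The frame is reconstructed from the output shown above, so treat the exact input values as an assumption. np.select picks the first matching condition for each row and falls back to default when none match.

```python
import numpy as np
import pandas as pd

# Input reconstructed from the printed result (an assumption).
df = pd.DataFrame({
    "condition": [1, np.nan, np.nan, np.nan, 1],
    "a": [np.nan, 4, 5, 6, np.nan],
    "b": [np.nan, 2, "e", 2, np.nan],
    "c": [3, 2, 1, 2, np.nan],
})

conditions = [
    # if: condition is 1 and a, b, c are all NaN
    df["condition"].eq(1) & df[["a", "b", "c"]].isna().all(axis=1),
    # elif: condition itself is NaN
    df["condition"].isna(),
]
what_to_do = ["O", "-"]

# else: everything that matched neither condition gets "X"
df["check_result"] = np.select(conditions, what_to_do, default="X")

print(df["check_result"].tolist())  # ['X', '-', '-', '-', 'O']
```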
Pandas - check if ALL values are NaN in Series
Yes, that's correct, but I think a more idiomatic way would be:
mys.isnull().all()
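A small sketch of how this behaves, including one edge case worth knowing about: on an empty Series, all() is vacuously True. The Series partial and empty are invented here for illustration.

```python
import numpy as np
import pandas as pd

mys = pd.Series([np.nan, np.nan])   # every value missing
partial = pd.Series([np.nan, 1.0])  # hypothetical: one real value
empty = pd.Series([], dtype=float)  # hypothetical: no values at all

all_nan = bool(mys.isnull().all())
partial_all_nan = bool(partial.isnull().all())
empty_all_nan = bool(empty.isnull().all())

print(all_nan)          # True
print(partial_all_nan)  # False
print(empty_all_nan)    # True (vacuously, nothing to contradict it)
```

If an empty Series should not count as "all NaN", adding a check on len(mys) is one option.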
How to find which columns contain any NaN value in Pandas dataframe
UPDATE: using Pandas 0.22.0
Newer Pandas versions have new methods 'DataFrame.isna()' and 'DataFrame.notna()'
In [71]: df
Out[71]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1
In [72]: df.isna().any()
Out[72]:
a     True
b     True
c    False
dtype: bool
as list of columns:
In [74]: df.columns[df.isna().any()].tolist()
Out[74]: ['a', 'b']
to select those columns (containing at least one NaN value):
In [73]: df.loc[:, df.isna().any()]
Out[73]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
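For anyone who wants to paste and run this, here is a sketch that rebuilds the same frame (values copied from the output above) and collects the column names, the per-column NaN counts, and the column subset. The variable names are my own.

```python
import numpy as np
import pandas as pd

# Same values as the frame printed above.
df = pd.DataFrame({
    "a": [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
    "b": [7, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
    "c": [0, 4, 4, 0, 9, 9, 9, 4, 9, 1],
})

# Boolean mask over columns: True where the column has any NaN.
has_nan = df.isna().any()

nan_cols = df.columns[has_nan].tolist()  # names of affected columns
nan_counts = df.isna().sum()             # how many NaNs per column
subset = df.loc[:, has_nan]              # only the affected columns

print(nan_cols)              # ['a', 'b']
print(nan_counts.to_dict())  # {'a': 1, 'b': 2, 'c': 0}
```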
OLD answer:
Try to use isnull():
In [97]: df
Out[97]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1
In [98]: pd.isnull(df).sum() > 0
Out[98]:
a     True
b     True
c    False
dtype: bool
or, as @root proposed, a clearer version:
In [5]: df.isnull().any()
Out[5]:
a     True
b     True
c    False
dtype: bool
In [7]: df.columns[df.isnull().any()].tolist()
Out[7]: ['a', 'b']
to select a subset - all columns containing at least one NaN value:
In [31]: df.loc[:, df.isnull().any()]
Out[31]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
Check if single cell value is NaN in Pandas
Try this:
>>> import numpy as np
>>> import pandas as pd
>>> L = [4, np.nan, 6]
>>> df = pd.Series(L)
>>> df
0    4.0
1    NaN
2    6.0
dtype: float64
>>> if pd.isnull(df[1]):
...     print("Found")
...
Found
>>> if np.isnan(df[1]):
...     print("Found")
...
Found
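One difference between the two checks is worth keeping in mind: np.isnan only accepts numeric input, while pd.isnull also handles None and strings. A minimal sketch, using an invented object-dtype Series:

```python
import numpy as np
import pandas as pd

# Hypothetical Series mixing a string, None and NaN.
s = pd.Series(["x", None, np.nan], dtype=object)

# pd.isnull works on any scalar and treats None as missing too.
flags = [bool(pd.isnull(v)) for v in s]

# np.isnan raises TypeError on non-numeric values such as strings.
try:
    np.isnan(s.iloc[0])
    isnan_failed = False
except TypeError:
    isnan_failed = True

print(flags)         # [False, True, True]
print(isnan_failed)  # True
```

So for cells that may hold non-float data, pd.isnull (or pd.isna) is the safer choice.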
How to fill dataframe Nan values with empty list [] of 4 elements in pandas?
You can't use fillna with lists, but you can create a Series containing your list repeated for the length of the dataframe, and assign that to b where b is NaN:
df.loc[df['b'].isna(), 'b'] = pd.Series([ [[]]*4 ] * len(df))
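One thing to watch with list repetition via * is that every repeated entry refers to the same list object, so mutating one cell later would change the others. Below is a sketch of a variant that builds an independent list for each missing cell. It assumes the desired fill value is a list of four empty lists (as in the line above), and the sample frame is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column b holds lists, with two missing cells.
df = pd.DataFrame({"a": [1, 2, 3], "b": [["x"], np.nan, np.nan]})

mask = df["b"].isna()

# Build a fresh [[], [], [], []] for every missing cell, so that
# no two cells share the same list objects.
filled = [
    [[] for _ in range(4)] if missing else value
    for value, missing in zip(df["b"], mask)
]
df["b"] = pd.Series(filled, index=df.index, dtype=object)

print(df["b"].tolist())
```

This trades the one-liner for a comprehension, which matters only if the lists will be modified in place afterwards.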