Display rows with one or more NaN values in pandas dataframe
You can use DataFrame.isna to get a boolean mask of missing values, DataFrame.any with axis=1 to check for at least one True per row, and boolean indexing to select those rows:
df1 = df[df.isna().any(axis=1)]
import numpy as np
import pandas as pd

d = {'filename': ['M66_MI_NSRh35d32kpoints.dat', 'F71_sMI_DMRI51d.dat', 'F62_sMI_St22d7.dat', 'F41_Car_HOC498d.dat', 'F78_MI_547d.dat'], 'alpha1': [0.8016, 0.0, 1.721, 1.167, 1.897], 'alpha2': [0.9283, 0.0, 3.833, 2.809, 5.459], 'gamma1': [1.0, np.nan, 0.23748000000000002, 0.36419, 0.095319], 'gamma2': [0.074804, 0.0, 0.15, 0.3, np.nan], 'chi2min': [39.855990000000006, 1e+25, 10.91832, 7.966335000000001, 25.93468]}
df = pd.DataFrame(d).set_index('filename')
print (df)
alpha1 alpha2 gamma1 gamma2 chi2min
filename
M66_MI_NSRh35d32kpoints.dat 0.8016 0.9283 1.000000 0.074804 3.985599e+01
F71_sMI_DMRI51d.dat 0.0000 0.0000 NaN 0.000000 1.000000e+25
F62_sMI_St22d7.dat 1.7210 3.8330 0.237480 0.150000 1.091832e+01
F41_Car_HOC498d.dat 1.1670 2.8090 0.364190 0.300000 7.966335e+00
F78_MI_547d.dat 1.8970 5.4590 0.095319 NaN 2.593468e+01
Explanation:
print (df.isna())
alpha1 alpha2 gamma1 gamma2 chi2min
filename
M66_MI_NSRh35d32kpoints.dat False False False False False
F71_sMI_DMRI51d.dat False False True False False
F62_sMI_St22d7.dat False False False False False
F41_Car_HOC498d.dat False False False False False
F78_MI_547d.dat False False False True False
print (df.isna().any(axis=1))
filename
M66_MI_NSRh35d32kpoints.dat False
F71_sMI_DMRI51d.dat True
F62_sMI_St22d7.dat False
F41_Car_HOC498d.dat False
F78_MI_547d.dat True
dtype: bool
df1 = df[df.isna().any(axis=1)]
print (df1)
alpha1 alpha2 gamma1 gamma2 chi2min
filename
F71_sMI_DMRI51d.dat 0.000 0.000 NaN 0.0 1.000000e+25
F78_MI_547d.dat 1.897 5.459 0.095319 NaN 2.593468e+01
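The same idea as a minimal, self-contained sketch (illustrative data, not the original frame); the inverted mask `~mask` gives the complement, i.e. the fully populated rows:

```python
import numpy as np
import pandas as pd

# Illustrative data: rows 1 and 2 each contain a NaN.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, np.nan],
    "c": [7.0, 8.0, 9.0],
})

# True for every row holding at least one NaN.
mask = df.isna().any(axis=1)

print(df[mask])    # rows with at least one NaN
print(df[~mask])   # complement: rows with no missing values
```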
Select rows of a dataframe where at least one column is NaN
Use isnull with any(axis=1) (the bare positional any(1) is deprecated in recent pandas):
df[df.isnull().any(axis=1)]
Out[122]:
columnA columnB
0 NaN 1.0
2 NaN NaN
3 1.0 NaN
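The answer above doesn't show the input frame; here is a reconstruction that reproduces that output (row 1's actual values aren't shown in the original, so they are assumed here):

```python
import numpy as np
import pandas as pd

# Assumed data chosen to match the output above; row 1 is complete, so it is filtered out.
df = pd.DataFrame({
    "columnA": [np.nan, 1.0, np.nan, 1.0],
    "columnB": [1.0, 2.0, np.nan, np.nan],
})

# axis spelled out explicitly; bare any(1) is deprecated in recent pandas.
out = df[df.isnull().any(axis=1)]
print(out)
```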
Filter dataframe using isna() to filter out rows that have null values in the following columns
You can keep rows where at least one non-missing value exists after the second column, using DataFrame.notna and DataFrame.any:
df = df[df.iloc[:,2:].notna().any(axis=1)]
print (df)
id name val1_rain val2_tik val3_bon val4_tig
0 2349 Rivi 0.11 0.34 0.78 0.21
2 835 Pigi 0.34 NaN 0.32 NaN
3 5093 Tari 0.65 0.12 0.34 2.45
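A self-contained sketch of the same filter (the dropped row 1 is assumed to be all-NaN after the second column, consistent with the output above; its id and name are made up):

```python
import numpy as np
import pandas as pd

# Assumed input: row 1 has no values in any val* column, so it gets dropped.
df = pd.DataFrame({
    "id": [2349, 112, 835, 5093],
    "name": ["Rivi", "Pika", "Pigi", "Tari"],
    "val1_rain": [0.11, np.nan, 0.34, 0.65],
    "val2_tik": [0.34, np.nan, np.nan, 0.12],
    "val3_bon": [0.78, np.nan, 0.32, 0.34],
    "val4_tig": [0.21, np.nan, np.nan, 2.45],
})

# Keep rows with at least one non-missing value from the third column onward.
df = df[df.iloc[:, 2:].notna().any(axis=1)]
print(df)
```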
Select data when specific columns have null value in pandas
Use boolean indexing:
mask = df['Date1'].isnull() | df['Date2'].isnull()
print (df[mask])
ID Date1 Date2
0 58844880.0 04/11/16 NaN
2 59743311.0 04/13/16 NaN
4 59598413.0 NaN NaN
8 59561198.0 NaN 04/17/16
Timings:
#[900000 rows x 3 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
In [12]: %timeit (df[df['Date1'].isnull() | df['Date2'].isnull()])
10 loops, best of 3: 89.3 ms per loop
In [13]: %timeit (df[df.filter(like='Date').isnull().any(axis=1)])
10 loops, best of 3: 146 ms per loop
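Both variants from the timing run, as a runnable sketch on assumed sample data (the explicit OR of the two column masks is the faster of the two; `filter(like='Date')` selects every column whose name contains "Date"):

```python
import numpy as np
import pandas as pd

# Assumed sample matching the shape of the frame above.
df = pd.DataFrame({
    "ID": [58844880.0, 59000000.0, 59743311.0],
    "Date1": ["04/11/16", "04/12/16", np.nan],
    "Date2": [np.nan, "04/12/16", np.nan],
})

# Variant 1: OR the per-column null masks.
mask1 = df["Date1"].isnull() | df["Date2"].isnull()

# Variant 2: select the Date columns by name, then check for any null per row.
mask2 = df.filter(like="Date").isnull().any(axis=1)

print(df[mask1])  # both masks select the same rows
```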
select rows based on a combination of strings without order in strings
Might as well just make separate masks for both words in this case. If you have a longer list of words, there are better solutions.
df_ = df_[df_['summary'].str.contains('slow') & df_['summary'].str.contains('delivery')]
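A runnable sketch on assumed data, plus one way to generalize to a longer word list by AND-ing the per-word masks together:

```python
import numpy as np
import pandas as pd

# Assumed data, not from the original question.
df_ = pd.DataFrame({"summary": [
    "slow delivery of parcel",
    "delivery was slow",
    "fast delivery",
    "slow response",
]})

# Require both words, in any order.
mask = df_["summary"].str.contains("slow") & df_["summary"].str.contains("delivery")
print(df_[mask])

# For a longer list of words, AND the per-word masks:
words = ["slow", "delivery"]
mask_all = np.logical_and.reduce([df_["summary"].str.contains(w) for w in words])
```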