How to Select Rows with One or More Nulls from a Pandas Dataframe Without Listing Columns Explicitly

Display rows with one or more NaN values in pandas dataframe

Use DataFrame.isna to build a boolean frame, then DataFrame.any with axis=1 to check for at least one True per row, and pass the result to boolean indexing:

df1 = df[df.isna().any(axis=1)]

import numpy as np
import pandas as pd

d = {'filename': ['M66_MI_NSRh35d32kpoints.dat', 'F71_sMI_DMRI51d.dat', 'F62_sMI_St22d7.dat', 'F41_Car_HOC498d.dat', 'F78_MI_547d.dat'],
     'alpha1': [0.8016, 0.0, 1.721, 1.167, 1.897],
     'alpha2': [0.9283, 0.0, 3.833, 2.809, 5.459],
     'gamma1': [1.0, np.nan, 0.23748000000000002, 0.36419, 0.095319],
     'gamma2': [0.074804, 0.0, 0.15, 0.3, np.nan],
     'chi2min': [39.855990000000006, 1e+25, 10.91832, 7.966335000000001, 25.93468]}
df = pd.DataFrame(d).set_index('filename')

print (df)
                             alpha1  alpha2    gamma1    gamma2       chi2min
filename
M66_MI_NSRh35d32kpoints.dat  0.8016  0.9283  1.000000  0.074804  3.985599e+01
F71_sMI_DMRI51d.dat          0.0000  0.0000       NaN  0.000000  1.000000e+25
F62_sMI_St22d7.dat           1.7210  3.8330  0.237480  0.150000  1.091832e+01
F41_Car_HOC498d.dat          1.1670  2.8090  0.364190  0.300000  7.966335e+00
F78_MI_547d.dat              1.8970  5.4590  0.095319       NaN  2.593468e+01

Explanation:

print (df.isna())
                             alpha1  alpha2  gamma1  gamma2  chi2min
filename
M66_MI_NSRh35d32kpoints.dat   False   False   False   False    False
F71_sMI_DMRI51d.dat           False   False    True   False    False
F62_sMI_St22d7.dat            False   False   False   False    False
F41_Car_HOC498d.dat           False   False   False   False    False
F78_MI_547d.dat               False   False   False    True    False

print (df.isna().any(axis=1))
filename
M66_MI_NSRh35d32kpoints.dat    False
F71_sMI_DMRI51d.dat             True
F62_sMI_St22d7.dat             False
F41_Car_HOC498d.dat            False
F78_MI_547d.dat                 True
dtype: bool

df1 = df[df.isna().any(axis=1)]
print (df1)
                     alpha1  alpha2    gamma1  gamma2       chi2min
filename
F71_sMI_DMRI51d.dat   0.000   0.000       NaN     0.0  1.000000e+25
F78_MI_547d.dat       1.897   5.459  0.095319     NaN  2.593468e+01
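
The same mask can also be inverted to get the complete rows. A minimal sketch, assuming a small made-up frame (the column names a and b are illustrative and not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# True for every row that holds at least one NaN, whatever the column
mask = df.isna().any(axis=1)

rows_with_nan = df[mask]    # rows that have a missing value somewhere
complete_rows = df[~mask]   # rows with no missing values at all

# The inverted selection should match what dropna() returns
assert complete_rows.equals(df.dropna())
```

No column names appear in the mask, so it keeps working if columns are added or renamed later.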

Select rows of a dataframe where at least one column is NaN

Use isnull with any

df[df.isnull().any(axis=1)]
Out[122]:
   columnA  columnB
0      NaN      1.0
2      NaN      NaN
3      1.0      NaN
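
For a version that can be run as-is, here is a small sketch. The input frame is an assumption reconstructed from the output above (row 1 is taken to be complete); isnull is an alias of isna, and the axis is passed by keyword because pandas 2.0 and later, as far as I know, no longer accept it positionally:

```python
import numpy as np
import pandas as pd

# Hypothetical input; only rows 0, 2 and 3 are known from the output above
df = pd.DataFrame({
    "columnA": [np.nan, 2.0, np.nan, 1.0],
    "columnB": [1.0, 3.0, np.nan, np.nan],
})

# isnull() and isna() produce the same boolean frame
same_frame = df.isnull().equals(df.isna())

# Keep rows where at least one column is missing
result = df[df.isnull().any(axis=1)]
print(result)
```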

Filter dataframe using isna() to filter out rows that have null values in the following columns

To keep only the rows that have at least one non-missing value in the columns after the second one, combine DataFrame.notna with DataFrame.any:

df = df[df.iloc[:,2:].notna().any(axis=1)]
print (df)
     id  name  val1_rain  val2_tik  val3_bon  val4_tig
0  2349  Rivi       0.11      0.34      0.78      0.21
2   835  Pigi       0.34       NaN      0.32       NaN
3  5093  Tari       0.65      0.12      0.34      2.45
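
A self-contained sketch of the same filter. The input is an assumption: rows 0, 2 and 3 come from the output above, while row 1 (including its id and name) is invented as a row that is missing in every value column. It also shows that dropna with how='all' and a subset should give the same result:

```python
import numpy as np
import pandas as pd

# Hypothetical input; row 1 is made up and is NaN in every value column
df = pd.DataFrame({
    "id": [2349, 1234, 835, 5093],
    "name": ["Rivi", "Nobi", "Pigi", "Tari"],
    "val1_rain": [0.11, np.nan, 0.34, 0.65],
    "val2_tik": [0.34, np.nan, np.nan, 0.12],
    "val3_bon": [0.78, np.nan, 0.32, 0.34],
    "val4_tig": [0.21, np.nan, np.nan, 2.45],
})

value_cols = df.iloc[:, 2:]             # every column after id and name
keep = value_cols.notna().any(axis=1)   # at least one real value in the row
filtered = df[keep]

# Same selection via dropna: drop rows where all value columns are missing
same = df.dropna(how="all", subset=list(value_cols.columns))
print(filtered)
```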

Select data when specific columns have null value in pandas

Use boolean indexing:

mask = df['Date1'].isnull() | df['Date2'].isnull()
print (df[mask])
           ID     Date1     Date2
0  58844880.0  04/11/16       NaN
2  59743311.0  04/13/16       NaN
4  59598413.0       NaN       NaN
8  59561198.0       NaN  04/17/16

Timings:

#[900000 rows x 3 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

In [12]: %timeit (df[df['Date1'].isnull() | df['Date2'].isnull()])
10 loops, best of 3: 89.3 ms per loop

In [13]: %timeit (df[df.filter(like='Date').isnull().any(axis=1)])
10 loops, best of 3: 146 ms per loop
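
If the columns to check are held in a list, the explicit | chain can be replaced by a column subset followed by any(axis=1). A sketch under the assumption of a small made-up frame with the same column names as above (the values are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the example above
df = pd.DataFrame({
    "ID": [1.0, 2.0, 3.0, 4.0],
    "Date1": ["04/11/16", "04/12/16", np.nan, np.nan],
    "Date2": [np.nan, "04/12/16", np.nan, "04/17/16"],
})

cols = ["Date1", "Date2"]   # columns to test for missing values

# Null in any of the listed columns
mask = df[cols].isna().any(axis=1)

# Equivalent explicit form, one mask per column joined with |
explicit = df["Date1"].isna() | df["Date2"].isna()
assert mask.equals(explicit)

selected = df[mask]
print(selected)
```

The explicit form was the faster one in the timings above, so the list version is mainly a convenience when there are many columns.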

select rows based on a combination of strings without order in strings

Might as well just make separate masks for both words in this case. If you have a longer list of words, there are better solutions.

df_ = df_[df_['summary'].str.contains('slow') & df_['summary'].str.contains('delivery')]
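
For a longer list of words, one possible approach (a sketch, not necessarily the solution the answer had in mind) is to build one mask per word and AND them together. The data and the word list here are made up for illustration:

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical data and word list
df_ = pd.DataFrame({"summary": [
    "slow delivery of parcel",
    "delivery was very slow",
    "fast delivery",
    "slow website",
]})
words = ["slow", "delivery"]

# One literal-match mask per word; na=False treats missing text as no match
masks = [df_["summary"].str.contains(w, regex=False, na=False) for w in words]

# A row is kept only if every word occurs, in any order
all_words = reduce(operator.and_, masks)
matched = df_[all_words]
print(matched)
```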

