How to Filter NaN in Pandas

How to filter on NaN in pandas?

Comparing with == np.nan doesn't work because NaN isn't equal to anything, including itself. Use pd.isnull(df.var2) instead.
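A minimal sketch of both behaviors (df and its var2 column here are hypothetical stand-ins for the question's data):

import numpy as np
import pandas as pd

df = pd.DataFrame({"var1": [1, 2, 3], "var2": ["a", np.nan, "c"]})

print(np.nan == np.nan)           # False: NaN never compares equal, even to itself
print(df[df["var2"] == np.nan])   # empty frame: the comparison is False everywhere
print(df[pd.isnull(df["var2"])])  # the row where var2 is actually missing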

Python pandas: filtering out NaN from a data selection of a column of strings

Just drop them:

nms.dropna(thresh=2)

This will drop all rows where there are at least two non-NaN values.

Then you can drop the rows where name is NaN:

In [87]: nms
Out[87]:
  movie    name  rating
0   thg    John       3
1   thg     NaN       4
3   mol  Graham     NaN
4   lob     NaN     NaN
5   lob     NaN     NaN

[5 rows x 3 columns]

In [89]: nms = nms.dropna(thresh=2)

In [90]: nms[nms.name.notnull()]
Out[90]:
  movie    name  rating
0   thg    John       3
3   mol  Graham     NaN

[2 rows x 3 columns]

EDIT

Actually, looking at what you originally wanted, you can do just this without the dropna call:

nms[nms.name.notnull()]

UPDATE

Looking at this question 3 years later, there is a mistake: the thresh arg looks for at least n non-NaN values, so in fact the output should be:

In [4]: nms.dropna(thresh=2)
Out[4]:
  movie    name  rating
0   thg    John     3.0
1   thg     NaN     4.0
3   mol  Graham     NaN

Either I was mistaken 3 years ago or the version of pandas I was running had a bug; both scenarios are entirely possible.
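A runnable sketch of the corrected behavior, reconstructing the nms frame from the outputs above:

import numpy as np
import pandas as pd

nms = pd.DataFrame({"movie": ["thg", "thg", "mol", "lob", "lob"],
                    "name": ["John", np.nan, "Graham", np.nan, np.nan],
                    "rating": [3, 4, np.nan, np.nan, np.nan]},
                   index=[0, 1, 3, 4, 5])

# thresh=2 KEEPS rows with at least two non-NaN values
print(nms.dropna(thresh=2))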

Filter NaN values out of rows in pandas

You can use masks in pandas:

food = 'Amphipods'
mask = df[food].notnull()
result_set = df[mask]

df[food].notnull() returns a mask (a Series of boolean values indicating whether the condition is met for each row), and you can use that mask to filter the original DataFrame with df[mask].

Usually you can combine these two lines into more Pythonic code, but that's up to you:

result_set = df[df[food].notnull()]

This returns a new DataFrame with the subset of rows that meet the condition (including all columns from the original), so you can apply further operations to it (e.g. selecting a subset of columns, dropping other missing values, etc.).

See more about .notnull(): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.notnull.html
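A self-contained sketch of the mask approach (the 'Amphipods' column and its values are hypothetical):

import numpy as np
import pandas as pd

df = pd.DataFrame({"site": ["A", "B", "C"],
                   "Amphipods": [12.0, np.nan, 7.0]})

mask = df["Amphipods"].notnull()  # boolean Series: True where a value is present
print(mask.tolist())              # [True, False, True]
print(df[mask])                   # only rows A and C survive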

Value filter in pandas dataframe keeping NaN

Use the NOT (invert) operator ~:

df_chunked[~(df_chunked['value'].ge(10))]
# df_chunked[~(df_chunked['value'] >= 10)]  # greater or equal (the same)

   index  value
0      1    5.0
1      2    6.0
2      3    7.0
3      4    NaN
4      5    9.0

Why?

Because comparison operations simply treat NaN as False, always, as you can see in the following DataFrame. So if you want to avoid Series.isna (and the unnecessary additional code it adds) and simplify your code, just use the inverse logic with ~:

print(df.assign(greater_than_5=df['value'].gt(5),
                not_greater_than_5=df['value'].le(5)))

   index  value  greater_than_5  not_greater_than_5
0      1    5.0           False                True
1      2    6.0            True               False
2      3    7.0            True               False
3      4    NaN           False               False
4      5    9.0            True               False
5      6    3.0           False                True
6      7   11.0            True               False
7      8   34.0            True               False
8      9   78.0            True               False
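If you prefer to spell the NaN handling out, here is a sketch of the explicit equivalent, using hypothetical data matching the first table above:

import numpy as np
import pandas as pd

df_chunked = pd.DataFrame({"index": [1, 2, 3, 4, 5],
                           "value": [5.0, 6.0, 7.0, np.nan, 9.0]})

kept = df_chunked[~(df_chunked['value'].ge(10))]
# verbose form that names the NaN case explicitly
explicit = df_chunked[df_chunked['value'].lt(10) | df_chunked['value'].isna()]
print(kept.equals(explicit))  # True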

Filter out NaN rows in a specific column

You can use the DataFrame.dropna() method:

In [202]: df.dropna(subset=['Col2'])
Out[202]:
   Col1  Col2  Col3
1     2   5.0   4.0
2     3   3.0   NaN

or the (in this case less idiomatic) Series.notnull():

In [204]: df.loc[df.Col2.notnull()]
Out[204]:
   Col1  Col2  Col3
1     2   5.0   4.0
2     3   3.0   NaN

or the DataFrame.query() method (this works because NaN is never equal to itself, so Col2 == Col2 is False only for missing values):

In [205]: df.query("Col2 == Col2")
Out[205]:
   Col1  Col2  Col3
1     2   5.0   4.0
2     3   3.0   NaN

numexpr solution:

In [241]: import numexpr as ne

In [242]: col = df.Col2

In [243]: df[ne.evaluate("col == col")]
Out[243]:
   Col1  Col2  Col3
1     2   5.0   4.0
2     3   3.0   NaN
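For reference, a minimal frame that reproduces the outputs above (a sketch; the question's full data isn't shown, so the dropped row 0 is assumed):

import numpy as np
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2, 3],
                   "Col2": [np.nan, 5.0, 3.0],
                   "Col3": [np.nan, 4.0, np.nan]})

print(df.dropna(subset=['Col2']))  # rows 1 and 2
print(df.query("Col2 == Col2"))    # same rows via the NaN != NaN trick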

How to filter for NaN when it is a float type

df[df.A1.isnull()]

For a pandas DataFrame this code works, but change the column names so they start with a letter. Also remove the square brackets around the column name A1.
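A minimal sketch, assuming a hypothetical float column named A1:

import numpy as np
import pandas as pd

df = pd.DataFrame({"A1": [1.5, np.nan, 3.0]})
print(df[df.A1.isnull()])  # only the row where A1 is NaN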

Pandas array filter NaN and keep the first value in group

Take only the values that aren't NaN but whose preceding value is NaN:

df = df[df.col1.notna() & df.col1.shift().isna()]

Output:

      col1
27   357.0
247  357.0
304   58.0
334  237.0

Assuming all values are greater than 0, we could also build the mask from a diff (indexing the original frame with the mask, so the kept rows show their original values rather than the diffs):

mask = df.fillna(0).diff().col1.gt(0)
df = df[mask]
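A runnable sketch of the shift-based version (the col1 values are hypothetical, chosen to mimic the output above):

import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [np.nan, 357.0, 357.0, np.nan, 58.0]})

# keep rows that are non-NaN but whose predecessor is NaN
print(df[df.col1.notna() & df.col1.shift().isna()])  # rows 1 and 4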

How to filter a dataframe, removing NULL values from selected rows in Python?

Select only the dept and room columns, replace possible 'NA' strings with NaN, and remove columns containing missing values:

df= df[df["dept"].isin(selected_dept)].filter(regex='room|dept').replace('NA', np.nan).dropna(axis=1)

Or:

df= df[df["dept"].isin(selected_dept)].drop('count', axis=1).replace('NA', np.nan).dropna(axis=1)

Pandas Boolean Filter with Assignment resulting in NaN

The problem is that the indexes don't match. You can get around that issue by using the underlying numpy array:

msk = (df['Period'] == '24 hr')
cols = ['DPM', 'NOx']
df.loc[~msk & df['Source'].isin(['A','B']), cols] = df.loc[msk & df['Source'].isin(['A','B']), cols].to_numpy()

Output:

  Source Period   CO   DPM   NOx
0      A   1 hr  1.1  12.1  22.1
1      B   1 hr  1.2  12.2  22.2
2      C   1 hr  1.3  11.3  21.3
3      A  24 hr  2.1  12.1  22.1
4      B  24 hr  2.2  12.2  22.2
5      C  24 hr  2.3  12.3  22.3

Note that this only works as you expect if there is a one-to-one relation between "1 hr" and "24 hr" for each "Source" type.

You could also use groupby + last:

cols = ['DPM', 'NOx']
filt = df['Source'].isin(['A','B'])
df.loc[filt, cols] = df[filt].groupby('Source')[cols].transform('last')
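A runnable sketch of the groupby variant, with the frame reconstructed from the output above:

import pandas as pd

df = pd.DataFrame({"Source": ["A", "B", "C", "A", "B", "C"],
                   "Period": ["1 hr"] * 3 + ["24 hr"] * 3,
                   "CO":  [1.1, 1.2, 1.3, 2.1, 2.2, 2.3],
                   "DPM": [11.1, 11.2, 11.3, 12.1, 12.2, 12.3],
                   "NOx": [21.1, 21.2, 21.3, 22.1, 22.2, 22.3]})

cols = ['DPM', 'NOx']
filt = df['Source'].isin(['A', 'B'])
# broadcast each Source group's last (24 hr) reading back onto all its rows
df.loc[filt, cols] = df[filt].groupby('Source')[cols].transform('last')
print(df)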

