How to filter in NaN (pandas)?
This doesn't work because NaN isn't equal to anything, including NaN. Use pd.isnull(df.var2) instead.
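As a minimal sketch of why the equality check fails (the frame and its var2 column are hypothetical, made up to match the answer):

```python
import pandas as pd
import numpy as np

# Hypothetical frame: var2 has one missing value.
df = pd.DataFrame({'var1': [1, 2, 3], 'var2': ['a', np.nan, 'c']})

# NaN never compares equal to anything, including itself,
# so an equality check silently matches nothing:
print(np.nan == np.nan)            # False
print(len(df[df.var2 == np.nan]))  # 0 rows matched

# pd.isnull identifies NaN correctly:
print(df[pd.isnull(df.var2)])
```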
Python pandas Filtering out nan from a data selection of a column of strings
Just drop them:
nms.dropna(thresh=2)
this will drop all rows where there are at least two non-NaN values. Then you can drop the rows where name is NaN:
In [87]:
nms
Out[87]:
movie name rating
0 thg John 3
1 thg NaN 4
3 mol Graham NaN
4 lob NaN NaN
5 lob NaN NaN
[5 rows x 3 columns]
In [89]:
nms = nms.dropna(thresh=2)
In [90]:
nms[nms.name.notnull()]
Out[90]:
movie name rating
0 thg John 3
3 mol Graham NaN
[2 rows x 3 columns]
EDIT: Actually, looking at what you originally wanted, you can do just this without the dropna call:
nms[nms.name.notnull()]
UPDATE: Looking at this question 3 years later, there is a mistake. Firstly, the thresh arg looks for at least n non-NaN values, so in fact the output should be:
In [4]:
nms.dropna(thresh=2)
Out[4]:
movie name rating
0 thg John 3.0
1 thg NaN 4.0
3 mol Graham NaN
It's possible that I was either mistaken 3 years ago or that the version of pandas I was running had a bug; both scenarios are entirely possible.
Filter nan values out of rows in pandas
You can use masks in pandas:
food = 'Amphipods'
mask = df[food].notnull()
result_set = df[mask]
df[food].notnull() returns a mask (a Series of boolean values indicating whether the condition is met for each row), and you can use that mask to filter the original DataFrame with df[mask].
Usually you can combine these two lines into more Pythonic code, but that's up to you:
result_set = df[df[food].notnull()]
This returns a new DataFrame with the subset of rows that meet the condition (including all columns from the original), so you can apply further operations to it (e.g. selecting a subset of columns, dropping other missing values, etc.).
See more about .notnull()
: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.notnull.html
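The mask pattern above can be exercised end to end; here is a small sketch (the site column and the Amphipods counts are made up for illustration):

```python
import pandas as pd
import numpy as np

# Hypothetical survey data with missing counts for one species.
df = pd.DataFrame({
    'site': ['A', 'B', 'C', 'D'],
    'Amphipods': [12.0, np.nan, 3.0, np.nan],
})

food = 'Amphipods'
mask = df[food].notnull()   # boolean Series, one value per row
result_set = df[mask]       # keeps only rows where the count is present
print(result_set)

# One-liner form, same result:
same = df[df[food].notnull()]
```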
Value filter in pandas dataframe keeping NaN
Use the NOT operator ~:
df_chunked[~(df_chunked['value'].ge(10))]
# df_chunked[~(df_chunked['value'] >= 10)]  # greater or equal (the same)
index value
0 1 5.0
1 2 6.0
2 3 7.0
3 4 NaN
4 5 9.0
Why? Because comparison operations simply treat NaN as False, always, as you can see in the following data frame. So if you want to avoid using series.isna (avoiding unnecessary additional code) and simplify your code, simply use the inverse logic with ~:
print(df.assign(greater_than_5 = df['value'].gt(5),
not_greater_than_5 = df['value'].le(5)))
index value greater_than_5 not_greater_than_5
0 1 5.0 False True
1 2 6.0 True False
2 3 7.0 True False
3 4 NaN False False
4 5 9.0 True False
5 6 3.0 False True
6 7 11.0 True False
7 8 34.0 True False
8 9 78.0 True False
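A short runnable sketch of this inverse-logic trick (the values are made up, with one NaN and one value at or above 10 so both cases show up):

```python
import pandas as pd
import numpy as np

# Hypothetical frame with one NaN and one value >= 10.
df = pd.DataFrame({'value': [5.0, 6.0, 7.0, np.nan, 9.0, 11.0]})

# 'value >= 10' evaluates to False for the NaN row, so negating with ~
# keeps the NaN row alongside everything below 10 -- no isna() needed.
kept = df[~(df['value'].ge(10))]
print(kept)
```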
Filter out nan rows in a specific column
You can use the DataFrame.dropna() method:
In [202]: df.dropna(subset=['Col2'])
Out[202]:
Col1 Col2 Col3
1 2 5.0 4.0
2 3 3.0 NaN
or (in this case) the less idiomatic Series.notnull():
In [204]: df.loc[df.Col2.notnull()]
Out[204]:
Col1 Col2 Col3
1 2 5.0 4.0
2 3 3.0 NaN
or the DataFrame.query() method (NaN never equals itself, so "Col2 == Col2" keeps only the non-NaN rows):
In [205]: df.query("Col2 == Col2")
Out[205]:
Col1 Col2 Col3
1 2 5.0 4.0
2 3 3.0 NaN
a numexpr solution:
In [241]: import numexpr as ne
In [242]: col = df.Col2
In [243]: df[ne.evaluate("col == col")]
Out[243]:
Col1 Col2 Col3
1 2 5.0 4.0
2 3 3.0 NaN
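The first three variants can be checked against each other; this sketch rebuilds a frame shaped like the one in the transcripts (the exact values are assumed):

```python
import pandas as pd
import numpy as np

# Hypothetical frame matching the answer's shape.
df = pd.DataFrame({'Col1': [1, 2, 3],
                   'Col2': [np.nan, 5.0, 3.0],
                   'Col3': [np.nan, 4.0, np.nan]})

a = df.dropna(subset=['Col2'])   # drop rows where Col2 is NaN
b = df[df.Col2.notnull()]        # boolean-mask equivalent
c = df.query('Col2 == Col2')     # NaN != NaN, so non-NaN rows survive
print(a)
```

All three return the same two rows; Col3 may still contain NaN, since only Col2 is tested.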
How to filter for NaN when it is a float type
df[df.A1.isnull()]
This works for a pandas DataFrame. Note that the attribute access df.A1 only works when the column name is a valid Python identifier (e.g. starts with a letter); the bracket notation df['A1'].isnull() works for any column name.
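A minimal sketch, assuming a float column named A1 (the frame is made up):

```python
import pandas as pd
import numpy as np

# Hypothetical float column with one missing value.
df = pd.DataFrame({'A1': [1.5, np.nan, 3.2]})

# Bracket notation works for any column name; df.A1 would also
# work here because 'A1' is a valid identifier.
nan_rows = df[df['A1'].isnull()]
print(nan_rows)
```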
Pandas array filter NaN and keep the first value in group
Take only values that aren't NaN but whose preceding value is NaN:
df = df[df.col1.notna() & df.col1.shift().isna()]
Output: col1
27 357.0
247 357.0
304 58.0
334 237.0
Assuming all values are greater than 0, we could also do:
df = df.fillna(0).diff()
df = df[df.col1.gt(0)]
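The shift-based approach can be sketched on a made-up series of runs separated by NaN gaps (values and indices are assumptions, not the asker's data):

```python
import pandas as pd
import numpy as np

# Hypothetical column: runs of repeated values separated by NaN gaps.
df = pd.DataFrame({'col1': [np.nan, 357.0, 357.0, np.nan, np.nan, 58.0, 58.0]})

# Keep rows that are non-NaN but whose predecessor is NaN,
# i.e. the first value of each run. shift() yields NaN for the
# first row, so a run starting at the top would also be caught.
firsts = df[df.col1.notna() & df.col1.shift().isna()]
print(firsts)
```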
how to filter dataframe with removing NULL values from selected rows in python?
Select only the dept and room columns, replace possible 'NA' strings with NaNs, and remove the columns that contain missing values:
df= df[df["dept"].isin(selected_dept)].filter(regex='room|dept').replace('NA', np.nan).dropna(axis=1)
Or:
df= df[df["dept"].isin(selected_dept)].drop('count', axis=1).replace('NA', np.nan).dropna(axis=1)
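A runnable sketch of the first chain, on an assumed frame where some cells hold the literal string 'NA' rather than a real NaN:

```python
import pandas as pd
import numpy as np

# Hypothetical frame: 'NA' strings stand in for missing values.
df = pd.DataFrame({'dept': ['math', 'bio', 'math'],
                   'room': ['101', 'NA', '103'],
                   'count': [10, 20, 30]})
selected_dept = ['math']  # assumed selection

out = (df[df['dept'].isin(selected_dept)]
         .filter(regex='room|dept')   # keep only dept/room columns
         .replace('NA', np.nan)       # turn 'NA' strings into real NaN
         .dropna(axis=1))             # drop columns that still hold NaN
print(out)
```

Note that dropna(axis=1) drops whole columns: if any selected row has a missing room, the room column disappears entirely.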
Pandas Boolean Filter with Assignment resulting in NaN
The problem is that the indexes don't match. You can get around that issue by using the underlying numpy array:
msk = (df['Period'] == '24 hr')
cols = ['DPM', 'NOx']
df.loc[~msk & df['Source'].isin(['A','B']), cols] = df.loc[msk & df['Source'].isin(['A','B']), cols].to_numpy()
Output: Source Period CO DPM NOx
0 A 1 hr 1.1 12.1 22.1
1 B 1 hr 1.2 12.2 22.2
2 C 1 hr 1.3 11.3 21.3
3 A 24 hr 2.1 12.1 22.1
4 B 24 hr 2.2 12.2 22.2
5 C 24 hr 2.3 12.3 22.3
Note that this only works as you expect if there is a one-to-one relation between "1 hr" and "24 hr" for each "Source" type.
You could also use groupby + last:
cols = ['DPM', 'NOx']
filt = df['Source'].isin(['A','B'])
df.loc[filt, cols] = df[filt].groupby('Source')[cols].transform('last')
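The groupby + transform('last') variant can be exercised end to end; this sketch rebuilds a frame shaped like the answer's output (the '1 hr' DPM/NOx values are assumed):

```python
import pandas as pd

# Hypothetical frame: per-Source rows for two periods.
df = pd.DataFrame({'Source': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Period': ['1 hr'] * 3 + ['24 hr'] * 3,
                   'DPM':    [11.1, 11.2, 11.3, 12.1, 12.2, 12.3],
                   'NOx':    [21.1, 21.2, 21.3, 22.1, 22.2, 22.3]})

cols = ['DPM', 'NOx']
filt = df['Source'].isin(['A', 'B'])
# For A/B rows, overwrite DPM/NOx with the last value of each Source
# group (here the '24 hr' row). transform keeps the original index,
# so the assignment aligns without the .to_numpy() workaround.
df.loc[filt, cols] = df[filt].groupby('Source')[cols].transform('last')
print(df)
```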