How to filter in NaN (pandas)?
This doesn't work because NaN isn't equal to anything, including NaN. Use pd.isnull(df.var2) instead.
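As a minimal sketch of why the equality check fails (the frame and its var2 column are hypothetical, made up to match the answer):

```python
import pandas as pd
import numpy as np

# Hypothetical frame: var2 has one missing value.
df = pd.DataFrame({'var1': [1, 2, 3], 'var2': ['a', np.nan, 'c']})

# NaN never compares equal to anything, including itself,
# so an equality check silently matches nothing:
print(np.nan == np.nan)            # False
print(len(df[df.var2 == np.nan]))  # 0 rows matched

# pd.isnull identifies NaN correctly:
print(df[pd.isnull(df.var2)])
```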
Python pandas Filtering out nan from a data selection of a column of strings
Just drop them:
nms.dropna(thresh=2)
this will drop all rows where there are at least two non-NaN values. Then you can drop the rows where name is NaN:
In [87]:
nms
Out[87]:
movie name rating
0 thg John 3
1 thg NaN 4
3 mol Graham NaN
4 lob NaN NaN
5 lob NaN NaN
[5 rows x 3 columns]
In [89]:
nms = nms.dropna(thresh=2)
In [90]:
nms[nms.name.notnull()]
Out[90]:
movie name rating
0 thg John 3
3 mol Graham NaN
[2 rows x 3 columns]
EDIT: Actually, looking at what you originally wanted, you can do just this without the dropna call:
nms[nms.name.notnull()]
UPDATE: Looking at this question 3 years later, there is a mistake. Firstly, the thresh arg looks for at least n non-NaN values, so in fact the output should be:
In [4]:
nms.dropna(thresh=2)
Out[4]:
movie name rating
0 thg John 3.0
1 thg NaN 4.0
3 mol Graham NaN
It's possible that I was either mistaken 3 years ago or that the version of pandas I was running had a bug; both scenarios are entirely possible.
Filter nan values out of rows in pandas
You can use masks in pandas:
food = 'Amphipods'
mask = df[food].notnull()
result_set = df[mask]
df[food].notnull() returns a mask (a Series of boolean values indicating whether the condition is met for each row), and you can use that mask to filter the original DataFrame with df[mask].
Usually you can combine these two lines into more Pythonic code, but that's up to you:
result_set = df[df[food].notnull()]
This returns a new DataFrame with the subset of rows that meet the condition (including all columns from the original), so you can apply further operations to it (e.g. selecting a subset of columns, dropping other missing values, etc.).
See more about .notnull()
: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.notnull.html
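The mask pattern above can be exercised end to end; here is a small sketch (the site column and the Amphipods counts are made up for illustration):

```python
import pandas as pd
import numpy as np

# Hypothetical survey data with missing counts for one species.
df = pd.DataFrame({
    'site': ['A', 'B', 'C', 'D'],
    'Amphipods': [12.0, np.nan, 3.0, np.nan],
})

food = 'Amphipods'
mask = df[food].notnull()   # boolean Series, one value per row
result_set = df[mask]       # keeps only rows where the count is present
print(result_set)

# One-liner form, same result:
same = df[df[food].notnull()]
```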
Value filter in pandas dataframe keeping NaN
Use the NOT operator ~:
df_chunked[~(df_chunked['value'].ge(10))]
# df_chunked[~(df_chunked['value'] >= 10)]  # greater or equal (the same)
index value
0 1 5.0
1 2 6.0
2 3 7.0
3 4 NaN
4 5 9.0
Why? Because comparison operations simply treat NaN as False, always, as you can see in the following data frame. So if you want to avoid using series.isna (avoiding unnecessary additional code) and simplify your code, simply use the inverse logic with ~:
print(df.assign(greater_than_5 = df['value'].gt(5),
not_greater_than_5 = df['value'].le(5)))
index value greater_than_5 not_greater_than_5
0 1 5.0 False True
1 2 6.0 True False
2 3 7.0 True False
3 4 NaN False False
4 5 9.0 True False
5 6 3.0 False True
6 7 11.0 True False
7 8 34.0 True False
8 9 78.0 True False
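A short runnable sketch of this inverse-logic trick (the values are made up, with one NaN and one value at or above 10 so both cases show up):

```python
import pandas as pd
import numpy as np

# Hypothetical frame with one NaN and one value >= 10.
df = pd.DataFrame({'value': [5.0, 6.0, 7.0, np.nan, 9.0, 11.0]})

# 'value >= 10' evaluates to False for the NaN row, so negating with ~
# keeps the NaN row alongside everything below 10 -- no isna() needed.
kept = df[~(df['value'].ge(10))]
print(kept)
```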
Filter out nan rows in a specific column
You can use the DataFrame.dropna() method:
In [202]: df.dropna(subset=['Col2'])
Out[202]:
Col1 Col2 Col3
1 2 5.0 4.0
2 3 3.0 NaN
or (in this case) the less idiomatic Series.notnull():
In [204]: df.loc[df.Col2.notnull()]
Out[204]:
Col1 Col2 Col3
1 2 5.0 4.0
2 3 3.0 NaN
or the DataFrame.query() method (NaN never equals itself, so "Col2 == Col2" keeps only the non-NaN rows):
In [205]: df.query("Col2 == Col2")
Out[205]:
Col1 Col2 Col3
1 2 5.0 4.0
2 3 3.0 NaN
a numexpr solution:
In [241]: import numexpr as ne
In [242]: col = df.Col2
In [243]: df[ne.evaluate("col == col")]
Out[243]:
Col1 Col2 Col3
1 2 5.0 4.0
2 3 3.0 NaN
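The first three variants can be checked against each other; this sketch rebuilds a frame shaped like the one in the transcripts (the exact values are assumed):

```python
import pandas as pd
import numpy as np

# Hypothetical frame matching the answer's shape.
df = pd.DataFrame({'Col1': [1, 2, 3],
                   'Col2': [np.nan, 5.0, 3.0],
                   'Col3': [np.nan, 4.0, np.nan]})

a = df.dropna(subset=['Col2'])   # drop rows where Col2 is NaN
b = df[df.Col2.notnull()]        # boolean-mask equivalent
c = df.query('Col2 == Col2')     # NaN != NaN, so non-NaN rows survive
print(a)
```

All three return the same two rows; Col3 may still contain NaN, since only Col2 is tested.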
How to filter for NaN when it is a float type
df[df.A1.isnull()]
This works for a pandas DataFrame. Note that the attribute access df.A1 only works when the column name is a valid Python identifier (e.g. starts with a letter); the bracket notation df['A1'].isnull() works for any column name.
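A minimal sketch, assuming a float column named A1 (the frame is made up):

```python
import pandas as pd
import numpy as np

# Hypothetical float column with one missing value.
df = pd.DataFrame({'A1': [1.5, np.nan, 3.2]})

# Bracket notation works for any column name; df.A1 would also
# work here because 'A1' is a valid identifier.
nan_rows = df[df['A1'].isnull()]
print(nan_rows)
```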
Pandas array filter NaN and keep the first value in group
Take only values that aren't NaN but whose preceding value is NaN:
df = df[df.col1.notna() & df.col1.shift().isna()]
Output: col1
27 357.0
247 357.0
304 58.0
334 237.0
Assuming all values are greater than 0, we could also do:
df = df.fillna(0).diff()
df = df[df.col1.gt(0)]
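The shift-based approach can be sketched on a made-up series of runs separated by NaN gaps (values and indices are assumptions, not the asker's data):

```python
import pandas as pd
import numpy as np

# Hypothetical column: runs of repeated values separated by NaN gaps.
df = pd.DataFrame({'col1': [np.nan, 357.0, 357.0, np.nan, np.nan, 58.0, 58.0]})

# Keep rows that are non-NaN but whose predecessor is NaN,
# i.e. the first value of each run. shift() yields NaN for the
# first row, so a run starting at the top would also be caught.
firsts = df[df.col1.notna() & df.col1.shift().isna()]
print(firsts)
```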
how to filter dataframe with removing NULL values from selected rows in python?
Select only the dept and room columns, replace possible 'NA' strings with NaNs, and remove the columns that contain missing values:
df= df[df["dept"].isin(selected_dept)].filter(regex='room|dept').replace('NA', np.nan).dropna(axis=1)
Or:
df= df[df["dept"].isin(selected_dept)].drop('count', axis=1).replace('NA', np.nan).dropna(axis=1)
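A runnable sketch of the first chain, on an assumed frame where some cells hold the literal string 'NA' rather than a real NaN:

```python
import pandas as pd
import numpy as np

# Hypothetical frame: 'NA' strings stand in for missing values.
df = pd.DataFrame({'dept': ['math', 'bio', 'math'],
                   'room': ['101', 'NA', '103'],
                   'count': [10, 20, 30]})
selected_dept = ['math']  # assumed selection

out = (df[df['dept'].isin(selected_dept)]
         .filter(regex='room|dept')   # keep only dept/room columns
         .replace('NA', np.nan)       # turn 'NA' strings into real NaN
         .dropna(axis=1))             # drop columns that still hold NaN
print(out)
```

Note that dropna(axis=1) drops whole columns: if any selected row has a missing room, the room column disappears entirely.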
Pandas Boolean Filter with Assignment resulting in NaN
The problem is that the indexes don't match. You can get around that issue by using the underlying numpy array:
msk = (df['Period'] == '24 hr')
cols = ['DPM', 'NOx']
df.loc[~msk & df['Source'].isin(['A','B']), cols] = df.loc[msk & df['Source'].isin(['A','B']), cols].to_numpy()
Output: Source Period CO DPM NOx
0 A 1 hr 1.1 12.1 22.1
1 B 1 hr 1.2 12.2 22.2
2 C 1 hr 1.3 11.3 21.3
3 A 24 hr 2.1 12.1 22.1
4 B 24 hr 2.2 12.2 22.2
5 C 24 hr 2.3 12.3 22.3
Note that this only works as you expect if there is a one-to-one relation between "1 hr" and "24 hr" for each "Source" type.
You could also use groupby + last:
cols = ['DPM', 'NOx']
filt = df['Source'].isin(['A','B'])
df.loc[filt, cols] = df[filt].groupby('Source')[cols].transform('last')
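The groupby + transform('last') variant can be exercised end to end; this sketch rebuilds a frame shaped like the answer's output (the '1 hr' DPM/NOx values are assumed):

```python
import pandas as pd

# Hypothetical frame: per-Source rows for two periods.
df = pd.DataFrame({'Source': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Period': ['1 hr'] * 3 + ['24 hr'] * 3,
                   'DPM':    [11.1, 11.2, 11.3, 12.1, 12.2, 12.3],
                   'NOx':    [21.1, 21.2, 21.3, 22.1, 22.2, 22.3]})

cols = ['DPM', 'NOx']
filt = df['Source'].isin(['A', 'B'])
# For A/B rows, overwrite DPM/NOx with the last value of each Source
# group (here the '24 hr' row). transform keeps the original index,
# so the assignment aligns without the .to_numpy() workaround.
df.loc[filt, cols] = df[filt].groupby('Source')[cols].transform('last')
print(df)
```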