Pandas Groupby with Categorical Columns returns NaN
Grouping by categorical columns produces every possible combination of categories, so combinations that never occur in the data show up as missing values.
To remove those missing values, call dropna() on the result:
print(df.groupby(["a_bin", "b_bin"]).c.mean().dropna())
a_bin b_bin
(0.0, 0.0101] (0.0, 0.0101] 0.381681
(0.0505, 0.0606] 0.148762
(0.0909, 0.101] 0.313093
(0.101, 0.111] 0.488104
(0.313, 0.323] 0.518599
(0.99, 1.0] (0.505, 0.515] 0.149027
(0.576, 0.586] 0.099652
(0.778, 0.788] 0.220360
(0.828, 0.838] 0.166424
(0.97, 0.98] 0.516558
Name: c, Length: 948, dtype: float64
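As an alternative to dropping the NaN rows afterwards, groupby accepts an observed parameter that controls whether unused category combinations are emitted at all. The following is a minimal sketch on made-up data (the bin edges and the values of a, b and c are assumptions, not the original poster's data) comparing both approaches:

```python
import pandas as pd

# Toy data: every row falls into a "diagonal" bin combination
df = pd.DataFrame({
    "a": [0.1, 0.2, 0.8, 0.9],
    "b": [0.1, 0.2, 0.8, 0.9],
    "c": [1.0, 3.0, 5.0, 7.0],
})
bins = [0.0, 0.5, 1.0]  # hypothetical bin edges
df["a_bin"] = pd.cut(df["a"], bins)
df["b_bin"] = pd.cut(df["b"], bins)

# observed=False: all 2 x 2 category combinations appear; unused ones are NaN
full = df.groupby(["a_bin", "b_bin"], observed=False)["c"].mean()

# observed=True: only combinations that actually occur in the data
seen = df.groupby(["a_bin", "b_bin"], observed=True)["c"].mean()

print(full)
print(seen)
```

On this data, full.dropna() and seen hold the same values; observed=True simply avoids building the empty groups in the first place.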
How to impute missing values with groupby if the group has less than 3 nan
Try modifying your solution slightly:
df_daily_grouped['price'].transform(lambda x : x.fillna(x.mean()) if x.isnull().sum()<3 else x)
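To see the threshold in action, here is a small runnable sketch. The day column and the prices are invented for illustration, and the groupby call stands in for whatever df_daily_grouped is in the original question:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "mon" has 2 NaN values, "tue" has 3
df = pd.DataFrame({
    "day": ["mon"] * 4 + ["tue"] * 4,
    "price": [10.0, np.nan, 20.0, np.nan, np.nan, np.nan, np.nan, 8.0],
})

# Fill with the group mean only when the group has fewer than 3 NaN values
filled = df.groupby("day")["price"].transform(
    lambda x: x.fillna(x.mean()) if x.isnull().sum() < 3 else x
)
print(filled)
```

The "mon" group is filled with its mean (15.0), while the "tue" group is left untouched because it has three missing values.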
Pandas Grouping by Id and getting non-NaN values
This should do what you want:
df.groupby('salesforce_id').first().reset_index(drop=True)
That collapses each group into a single row, keeping the first non-NaN value in each column (unless a column has no non-NaN values anywhere in the group; then the value in the merged row will be NaN).
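A minimal sketch of this behaviour, assuming hypothetical email and phone columns (only salesforce_id comes from the question):

```python
import numpy as np
import pandas as pd

# Each id has its values scattered across several rows
df = pd.DataFrame({
    "salesforce_id": ["x", "x", "y", "y"],
    "email": ["a@example.com", np.nan, np.nan, np.nan],
    "phone": [np.nan, "555-0100", np.nan, "555-0199"],
})

# first() picks the first non-NaN value per column within each group
merged = df.groupby("salesforce_id").first().reset_index()
print(merged)
```

This sketch calls reset_index() without drop=True so that salesforce_id stays as a regular column; with drop=True the id is discarded from the result.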
Take min and max with null values - pandas groupby
IIUC, use DataFrame.mask to set NaN wherever a group contains any NaN in that column:
new_df = \
df.groupby('id')\
.agg({'start':'min', 'end':'max'})\
.mask(df[['start', 'end']].isna()
.groupby(df['id'])
.max())\
.reset_index()
print(new_df)
id start end
0 a 2020-01-01 00:00:00 2020-01-02
1 b 2020-01-01 18:37:00 NaT
2 c 2020-02-04 00:00:00 2020-07-13
3 d 2020-04-19 20:45:00 2021-03-02
Detail:
print(df[['start', 'end']].isna()
.groupby(df['id'])
.max())
start end
id
a False False
b False True
c False False
d False False
In the case of multiple columns to group by:
new_df = \
df.groupby(['id', 'status'])\
.agg({'start':'min', 'end':'max'})\
.mask(df[['start', 'end']].isna()
.groupby([df['id'], df['status']])
.max())\
.reset_index()
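The approach can be tried end to end with a small frame. The dates below are made up for illustration and are not the data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["a", "a", "b", "b"],
    "start": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03"]
    ),
    # group "b" has one missing end date
    "end": pd.to_datetime(["2020-01-02", "2020-01-01", "2020-01-05", None]),
})

# True where a group has at least one missing value in that column
has_nan = df[["start", "end"]].isna().groupby(df["id"]).max()

new_df = (
    df.groupby("id")
    .agg({"start": "min", "end": "max"})
    .mask(has_nan)  # blank out aggregates that ignored a missing value
    .reset_index()
)
print(new_df)
```

Without the mask, max would silently skip the missing end date in group "b" and report 2020-01-05; with it, that cell becomes NaT.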
Pandas groupby NaN/None values in non-key columns
last is designed to get the last non-NA value, independently in each column.
What you want (the last row per group) is tail:
df.groupby(by='a', as_index=False).tail(1)
Output:
a b c
2 1 NaN z
3 2 12.0 None
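The difference between the two is easiest to see side by side. In this sketch the last two rows match the output above, while the first two rows are assumed for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2],
    "b": [10.0, 11.0, np.nan, 12.0],
    "c": ["x", "y", "z", None],
})

# last(): last non-NA value per column, so values may come from different rows
last_non_na = df.groupby("a", as_index=False).last()

# tail(1): the actual last row of each group, NaN/None included
last_rows = df.groupby("a", as_index=False).tail(1)

print(last_non_na)
print(last_rows)
```

For group 1, last() reports b as 11.0 (taken from the second row) next to c as "z" (taken from the third row), whereas tail(1) returns the third row as it is, with b left as NaN.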
How to groupby df according to two column values and handling missing values in pandas?
IIUC, you want to:
groupby the ID and MODE columns and interpolate all numeric columns
groupby the ID and MODE columns and ffill all non-numeric columns
import numpy as np
import pandas as pd
# replace the string "NaN" with numpy.nan
df = df.replace("NaN", np.nan)
# split the Signal columns into numeric and non-numeric
numeric = df.filter(like="Signal").select_dtypes(include=np.number).columns
others = df.filter(like="Signal").select_dtypes(exclude=np.number).columns
# interpolate numeric columns and forward-fill the others within each (ID, MODE) group
df[numeric] = df.groupby(["ID", "MODE"])[numeric].transform(pd.Series.interpolate, limit_direction="forward")
df[others] = df.groupby(["ID", "MODE"])[others].transform("ffill")
>>> df
ID MODE Signal1 Signal2 Signal3
0 0A active 13.0 NaN on
1 0A active 8.5 0.1 on
2 0A active 4.0 0.3 on
3 0A inactive 11.0 NaN off
4 0A inactive 11.0 4.5 off
5 1C active 22.0 NaN on
6 1C active 25.0 2.0 on
7 1C active 25.0 3.0 on
8 1C inactive 19.0 NaN NaN
>>> df.dropna()
ID MODE Signal1 Signal2 Signal3
1 0A active 8.5 0.1 on
2 0A active 4.0 0.3 on
4 0A inactive 11.0 4.5 off
6 1C active 25.0 2.0 on
7 1C active 25.0 3.0 on