Pandas Groupby Columns With Nan (Missing) Values

Pandas Groupby with Categorical Columns returns NaN

By default, groupby on categorical columns produces every possible combination of categories, so unused combinations show up as missing values.

So if you need to remove the missing values:

print(df.groupby(["a_bin", "b_bin"]).c.mean().dropna())
a_bin          b_bin
(0.0, 0.0101]  (0.0, 0.0101]     0.381681
               (0.0505, 0.0606]  0.148762
               (0.0909, 0.101]   0.313093
               (0.101, 0.111]    0.488104
               (0.313, 0.323]    0.518599
                                   ...
(0.99, 1.0]    (0.505, 0.515]    0.149027
               (0.576, 0.586]    0.099652
               (0.778, 0.788]    0.220360
               (0.828, 0.838]    0.166424
               (0.97, 0.98]      0.516558
Name: c, Length: 948, dtype: float64
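A minimal sketch of this behavior, using made-up data (the column names mirror the question, but the frame itself is invented). With categorical bins, the default grouping enumerates all category combinations; dropna removes the empty ones, and passing observed=True skips them up front:

```python
import numpy as np
import pandas as pd

# Hypothetical data: cutting into bins yields categorical columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.random(100), "b": rng.random(100), "c": rng.random(100)})
df["a_bin"] = pd.cut(df["a"], bins=4)
df["b_bin"] = pd.cut(df["b"], bins=4)

# observed=False (the classic default) yields all 4 * 4 = 16 combinations,
# with NaN for combinations that never occur in the data.
full = df.groupby(["a_bin", "b_bin"], observed=False)["c"].mean()

# Option 1: drop the empty combinations afterwards.
cleaned = full.dropna()

# Option 2: skip unused combinations up front with observed=True.
observed = df.groupby(["a_bin", "b_bin"], observed=True)["c"].mean()
```

Both options end up with the same set of groups; observed=True just avoids materializing the empty ones.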

How to impute missing values with groupby if the group has fewer than 3 NaNs

Try slightly modifying your solution:

df_daily_grouped['price'].transform(lambda x : x.fillna(x.mean()) if x.isnull().sum()<3 else x)
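A self-contained sketch of the same idea, with an invented frame (the original df_daily_grouped is a groupby object from the question; the data below is made up). Each group's NaNs are filled with the group mean only when the group has fewer than 3 NaNs:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "mon" has 1 NaN (gets filled), "tue" has 3 (left alone).
df = pd.DataFrame({
    "day":   ["mon", "mon", "mon", "tue", "tue", "tue", "tue"],
    "price": [1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 4.0],
})

df["price_filled"] = df.groupby("day")["price"].transform(
    lambda x: x.fillna(x.mean()) if x.isnull().sum() < 3 else x
)
```

transform runs the lambda once per group and stitches the results back into a column aligned with the original index.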

Pandas Grouping by Id and getting non-NaN values

This should do what you want:

df.groupby('salesforce_id').first().reset_index(drop=True)

That will collapse each group's rows into one, keeping the first non-NaN value in each column (unless a column has no non-NaN values at all for that group; then the value in the merged row stays NaN).
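A quick sketch with invented data (the salesforce_id column name comes from the question; everything else is made up). first() takes the first non-NaN value per column within each group:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: each group's non-NaN values are scattered across rows.
df = pd.DataFrame({
    "salesforce_id": ["A", "A", "B"],
    "email": ["x@example.com", np.nan, np.nan],
    "phone": [np.nan, "555-0100", np.nan],
})

# One row per id; each column keeps its first non-NaN value in the group.
merged = df.groupby("salesforce_id").first().reset_index(drop=True)
```

Group "A" ends up with both the email and the phone even though they came from different rows; group "B" stays all-NaN because it had no values to keep.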

Take min and max with null values - pandas groupby

IIUC, use DataFrame.mask to set NaN wherever a group has any NaN in that column:

new_df = (df.groupby('id')
            .agg({'start': 'min', 'end': 'max'})
            .mask(df[['start', 'end']].isna()
                    .groupby(df['id'])
                    .max())
            .reset_index())

print(new_df)
  id               start        end
0  a 2020-01-01 00:00:00 2020-01-02
1  b 2020-01-01 18:37:00        NaT
2  c 2020-02-04 00:00:00 2020-07-13
3  d 2020-04-19 20:45:00 2021-03-02

Detail:

print(df[['start', 'end']].isna()
        .groupby(df['id'])
        .max())

    start    end
id
a   False  False
b   False   True
c   False  False
d   False  False

In the case of multiple columns to group by:

new_df = (df.groupby(['id', 'status'])
            .agg({'start': 'min', 'end': 'max'})
            .mask(df[['start', 'end']].isna()
                    .groupby([df['id'], df['status']])
                    .max())
            .reset_index())

Pandas groupby NaN/None values in non-key columns

last is designed to get the last non-NA value, independently in each column.

What you want (last row per group) is tail:

df.groupby(by='a', as_index=False).tail(1)

Output:

   a     b     c
2  1   NaN     z
3  2  12.0  None
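A small sketch of the difference, with invented data shaped like the question's output. tail(1) returns the literal last row of each group, NaNs and all, while last() picks the last non-NA value per column independently:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: group a=1's last row has NaN in b.
df = pd.DataFrame({
    "a": [1, 1, 1, 2],
    "b": [10.0, 11.0, np.nan, 12.0],
    "c": ["x", "y", "z", None],
})

last_rows = df.groupby("a", as_index=False).tail(1)   # last row, NaN preserved
last_values = df.groupby("a", as_index=False).last()  # last non-NA per column
```

In last_rows, group 1's b stays NaN (its last row really is missing); in last_values, group 1's b becomes 11.0, stitched together from an earlier row.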

How to groupby df according to two column values and handling missing values in pandas?

IIUC, you want:

  1. groupby the ID and MODE columns and interpolate all numeric columns
  2. groupby the ID and MODE columns and ffill all non-numeric columns

import numpy as np

# replace string "NaN" with numpy.nan
df = df.replace("NaN", np.nan)

numeric = df.filter(like="Signal").select_dtypes(np.number).columns
others = df.filter(like="Signal").select_dtypes(exclude=np.number).columns

df[numeric] = df.groupby(["ID", "MODE"])[numeric].transform(pd.Series.interpolate, limit_direction="forward")
df[others] = df.groupby(["ID", "MODE"])[others].transform("ffill")

>>> df
   ID      MODE  Signal1  Signal2 Signal3
0  0A    active     13.0      NaN      on
1  0A    active      8.5      0.1      on
2  0A    active      4.0      0.3      on
3  0A  inactive     11.0      NaN     off
4  0A  inactive     11.0      4.5     off
5  1C    active     22.0      NaN      on
6  1C    active     25.0      2.0      on
7  1C    active     25.0      3.0      on
8  1C  inactive     19.0      NaN     NaN

>>> df.dropna()
   ID      MODE  Signal1  Signal2 Signal3
1  0A    active      8.5      0.1      on
2  0A    active      4.0      0.3      on
4  0A  inactive     11.0      4.5     off
6  1C    active     25.0      2.0      on
7  1C    active     25.0      3.0      on
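A self-contained sketch of the two-step approach, with a smaller invented frame (column names follow the answer; the values are made up). Numeric signals are interpolated within each (ID, MODE) group; the string signal is forward-filled:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric and one string signal per group.
df = pd.DataFrame({
    "ID":      ["0A", "0A", "0A", "1C", "1C"],
    "MODE":    ["active"] * 3 + ["inactive"] * 2,
    "Signal1": [13.0, np.nan, 4.0, 22.0, np.nan],
    "Signal3": ["on", np.nan, np.nan, "off", np.nan],
})

numeric = ["Signal1"]
others = ["Signal3"]

# Linear interpolation within each group for the numeric columns...
df[numeric] = df.groupby(["ID", "MODE"])[numeric].transform(
    pd.Series.interpolate, limit_direction="forward"
)
# ...and a plain forward-fill within each group for the rest.
df[others] = df.groupby(["ID", "MODE"])[others].transform("ffill")
```

The gap in group (0A, active) becomes the midpoint 8.5, while the trailing NaN in group (1C, inactive) is carried forward from the last valid value.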

