Pandas New Column from Groupby Averages

Pandas new column from groupby averages

You need transform:

df['avg_result'] = df.groupby(['a', 'b'])['result'].transform('mean')

transform returns a result aligned to the original index, so each row receives the mean of its own (a, b) group:

   a   b  result  avg_result
0  1  10     100         100
1  1  20     200         250
2  1  20     300         250
3  2  10     400         400
4  2  20     500         550
5  2  20     600         550
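For reference, here is a minimal self-contained sketch that reproduces the table above. The input frame is reconstructed from the a, b and result columns shown in the output, so treat the construction as an assumption about the original data.

```python
import pandas as pd

# Input reconstructed from the output table above (an assumption).
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2],
    "b": [10, 20, 20, 10, 20, 20],
    "result": [100, 200, 300, 400, 500, 600],
})

# transform('mean') returns one value per original row,
# so it can be assigned straight back as a new column.
df["avg_result"] = df.groupby(["a", "b"])["result"].transform("mean")
print(df)
```

Note that the new column comes back as floats (100.0, 250.0, ...), because a mean is generally not an integer.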

group by in group by and average

If you want to first take the mean of each ['cluster', 'org'] combination and then take the mean of those values within each cluster, you can use:

In [59]: (df.groupby(['cluster', 'org'], as_index=False).mean()
            .groupby('cluster')['time'].mean())
Out[59]:
cluster
1          15
2          54
3           6
Name: time, dtype: int64

If you want the mean of cluster groups only, then you can use:

In [58]: df.groupby(['cluster']).mean(numeric_only=True)
Out[58]:
              time
cluster
1        12.333333
2        54.000000
3         6.000000

You can also use groupby on ['cluster', 'org'] and then use mean():

In [57]: df.groupby(['cluster', 'org']).mean()
Out[57]:
               time
cluster org
1       a    438886
        c        23
2       d      9874
        h        34
3       w         6
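The difference between the mean of group means and the plain per-cluster mean is easiest to see on a small frame. The data below is made up for illustration and is not the asker's data.

```python
import pandas as pd

# Hypothetical data: cluster 1 has two rows for org 'a' and one for org 'c'.
df = pd.DataFrame({
    "cluster": [1, 1, 1, 2, 3],
    "org": ["a", "a", "c", "h", "w"],
    "time": [8, 6, 23, 54, 6],
})

# Step 1: mean per (cluster, org); step 2: mean of those means per cluster.
mean_of_means = (df.groupby(["cluster", "org"], as_index=False)["time"].mean()
                   .groupby("cluster")["time"].mean())

# Plain mean per cluster, where every row has the same weight.
plain_mean = df.groupby("cluster")["time"].mean()

print(mean_of_means)
print(plain_mean)
```

For cluster 1 the first approach averages 7 (org a) and 23 (org c), while the second averages the three raw rows, so the two results differ whenever the orgs have different row counts.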

Mapping groupby mean statistics as a new column in pandas

I think you need transform:

df['new'] = df.groupby('Brand Origin')['2018'].transform('mean')

pandas - groupby a column, apply a function to create a new column - giving incorrect results

You can remove .values, so that the result is aligned on the index when it is assigned:

df['num_col1_SMA'] = things_groupby['num_col1'].apply(pandas_rolling)
df['num_col2_SMA'] = things_groupby['num_col2'].apply(pandas_rolling)

Or:

df[['num_col1_SMA', 'num_col2_SMA']] = (things_groupby[['num_col1','num_col2']]
                                               .apply(pandas_rolling))

If you can avoid groupby.apply, use the built-in rolling mean instead; you then need to remove the first level of the resulting MultiIndex:

df[['num_col1_SMA', 'num_col2_SMA']] = (things_groupby[['num_col1','num_col2']]
                                               .rolling(window=N)
                                               .mean()
                                               .droplevel(0))
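A small sketch of why droplevel(0) is needed. The column name thing, the window N = 2 and the data are assumptions made for illustration; the question's actual frame is not shown here.

```python
import pandas as pd

# Hypothetical frame with interleaved groups.
df = pd.DataFrame({
    "thing": ["a", "b", "a", "b", "a", "b"],
    "num_col1": [1.0, 10.0, 3.0, 30.0, 5.0, 50.0],
})
N = 2  # assumed window size

# The result is indexed by (thing, original row label).
rolled = df.groupby("thing")["num_col1"].rolling(window=N).mean()

# Dropping the group level leaves the original row labels,
# so the assignment aligns each value with its source row.
df["num_col1_SMA"] = rolled.droplevel(0)
print(df)
```

The first row of each group is NaN because the window is not yet full.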

How to make a new pandas column that's the average of the last 3 values?

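This should work. Note that it sums the last three earlier rows that fall on the same day of the week; replace sum() with mean() if you want their average: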

df['dayofweek'] = df['dt'].dt.dayofweek
df['output'] = df.apply(lambda x: df['sold'][(df.index < x.name) & (df.dayofweek == x.dayofweek)].tail(3).sum(), axis = 1)
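The row-wise apply rescans the whole frame for every row. A possible alternative, assuming the rows are already sorted by dt and that an average is wanted, is to group by weekday and average the previous (up to three) values in each group. The sample data is made up.

```python
import pandas as pd

# Hypothetical data: four weeks of daily sales, starting on a Monday.
df = pd.DataFrame({
    "dt": pd.date_range("2021-01-04", periods=28, freq="D"),
    "sold": range(28),
})
df["dayofweek"] = df["dt"].dt.dayofweek

# shift() excludes the current row; rolling(3, min_periods=1) then
# averages up to three earlier values for the same weekday.
df["output"] = (
    df.groupby("dayofweek")["sold"]
      .transform(lambda s: s.shift().rolling(3, min_periods=1).mean())
)
print(df.tail(7))
```

Unlike the apply version, rows with no earlier value for that weekday get NaN rather than 0.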

Add GroupBy mean result as a new column in pandas

You could groupby and transform by mean.

df['value'] = df.groupby('indicator')['value'].transform('mean')

df
     indicator  value value type  year
1  indicator 1  11.25      upper  2014
2  indicator 1  11.25      lower  2014
3  indicator 2  14.30      upper  2015
4  indicator 2  14.30      lower  2015

Or, if you want only one row per indicator, use agg.

df = df.groupby('indicator')[['value', 'year']].agg('mean')
df
             value  year
indicator               
indicator 1  11.25  2014
indicator 2  14.30  2015

If you want the index as a column instead, call reset_index:

df = df.reset_index()
df
     indicator  value  year
0  indicator 1  11.25  2014
1  indicator 2  14.30  2015
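As a sketch, the aggregate and reset_index steps can also be combined by passing as_index=False. The values below are hypothetical, and selecting the numeric columns explicitly is a choice made here to keep the string column out of the mean.

```python
import pandas as pd

# Hypothetical input; 'value type' is a string column and cannot be averaged.
df = pd.DataFrame({
    "indicator": ["indicator 1", "indicator 1", "indicator 2", "indicator 2"],
    "value": [10.0, 12.5, 14.0, 14.5],
    "value type": ["upper", "lower", "upper", "lower"],
    "year": [2014, 2014, 2015, 2015],
})

# as_index=False keeps 'indicator' as a regular column,
# so no separate reset_index() call is needed.
summary = df.groupby("indicator", as_index=False)[["value", "year"]].mean()
print(summary)
```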

Creating a new column based on the mean of other values in group

Compute the means of all other values within each group using a double groupby:

sum all the values within the group
subtract the current (focal) value
divide by one less than the number of items in the group

Assign the shifted means to a new column:

means = (df.groupby("group")
           .apply(lambda x: x.groupby("col2")["col3"]
                             .transform("sum")
                             .sub(x["col3"])
                             .div(len(x["col1"].unique()) - 1))
           .droplevel(0))

df["mean"] = means.shift().where(df["col1"].eq(df["col1"].shift()),0)

>>> df
   col1  col2  col3  group  mean
0     A  2015    10     10   0.0
1     A  2016    20     10   9.0
2     A  2017    25     10  10.5
3     B  2015    10     10   0.0
4     B  2016    12     10   9.0
5     B  2017    14     10  14.5
6     c  2015     8     10   0.0
7     c  2016     9     10  10.0
8     c  2017    10     10  16.0
9     d  2015    50     20   0.0
10    d  2016    60     20  40.0
11    d  2017    70     20  50.0
12    e  2015    40     20   0.0
13    e  2016    50     20  50.0
14    e  2017    60     20  60.0

Group by columns under conditions to calculate average

Use DataFrame.pivot_table with a helper column new that is a copy of ColB, then flatten the resulting MultiIndex columns and join the output to a new DataFrame created by aggregating Counter with sum:

df1 = (df.assign(new=df['ColB'])
         .pivot_table(index=['ColA', 'ColB'], 
                      columns='new', 
                      values=['interval','duration'], 
                      fill_value=0,
                      aggfunc='mean'))
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')
df = (df.groupby(['ColA','ColB'])['Counter']
        .sum()
        .to_frame(name='SumCounter')
        .join(df1).reset_index())
print (df)
  ColA ColB  SumCounter  durationSD  durationUD  intervalSD  intervalUD
0    A   SD           3         2.5         0.0         3.5           0
1    A   UD          10         0.0         2.0         0.0           1
2    B   SD          32         2.0         0.0         3.5           0
3    B   UD           4         0.0         1.5         0.0           2
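A reduced, runnable sketch of the same pattern, using made-up input and only the duration column, in case the intermediate shapes are unclear:

```python
import pandas as pd

# Hypothetical input; the question's real data is not shown here.
df = pd.DataFrame({
    "ColA": ["A", "A", "A", "B"],
    "ColB": ["SD", "SD", "UD", "SD"],
    "duration": [2.0, 3.0, 2.0, 4.0],
    "Counter": [1, 2, 10, 4],
})

# The helper column 'new' lets ColB act as both an index level and a column key.
wide = (df.assign(new=df["ColB"])
          .pivot_table(index=["ColA", "ColB"], columns="new",
                       values=["duration"], fill_value=0, aggfunc="mean"))

# Flatten ('duration', 'SD') into 'durationSD'.
wide.columns = [f"{value}{key}" for value, key in wide.columns]

out = (df.groupby(["ColA", "ColB"])["Counter"].sum()
         .to_frame(name="SumCounter")
         .join(wide)
         .reset_index())
print(out)
```

Passing values as a list keeps the value name as a column level, which is what makes the two-part column names available for flattening.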
