Pandas New Column from Groupby Averages


You need transform:

df['avg_result'] = df.groupby(['a', 'b'])['result'].transform('mean')

This generates a correctly indexed column of the groupby values for you:

   a   b  result  avg_result
0  1  10     100         100
1  1  20     200         250
2  1  20     300         250
3  2  10     400         400
4  2  20     500         550
5  2  20     600         550
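To try this end to end, the frame can be rebuilt from the printed output. This is only a sketch: the values of a, b and result are copied from the table above, and nothing else is assumed about the original data.

```python
import pandas as pd

# Values copied from the printed output above
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2],
    "b": [10, 20, 20, 10, 20, 20],
    "result": [100, 200, 300, 400, 500, 600],
})

# transform returns one value per original row (same index as df),
# so the result can be assigned straight back as a column
df["avg_result"] = df.groupby(["a", "b"])["result"].transform("mean")
print(df)
```

Because transform keeps the original index, no merge or reindexing step is needed afterwards.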

group by in group by and average

If you want to first take mean on the combination of ['cluster', 'org'] and then take mean on cluster groups, you can use:

In [59]: (df.groupby(['cluster', 'org'], as_index=False).mean()
            .groupby('cluster')['time'].mean())
Out[59]:
cluster
1    15
2    54
3     6
Name: time, dtype: int64

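If you want the plain mean of each cluster, ignoring org, you can use the following (on pandas 2.0 and later, pass numeric_only=True to mean() so that the non-numeric org column is skipped):

In [58]: df.groupby(['cluster']).mean()
Out[58]:
              time
cluster
1        12.333333
2        54.000000
3         6.000000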

You can also use groupby on ['cluster', 'org'] and then use mean():

In [57]: df.groupby(['cluster', 'org']).mean()
Out[57]:
             time
cluster org
1       a       7
        c      23
2       d      74
        h      34
3       w       6
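The difference between the mean of means and the plain per-cluster mean only shows up when the org groups have different sizes. Here is a minimal sketch with a hypothetical frame (the rows below are assumed for illustration, not the asker's actual data):

```python
import pandas as pd

# Hypothetical data: org 'a' has two rows, every other org has one
df = pd.DataFrame({
    "cluster": [1, 1, 1, 2, 2, 3],
    "org": ["a", "a", "c", "h", "d", "w"],
    "time": [8, 6, 23, 34, 74, 6],
})

# Step 1: one mean per (cluster, org) pair
per_org = df.groupby(["cluster", "org"], as_index=False)["time"].mean()

# Step 2: average those per-org means within each cluster
mean_of_means = per_org.groupby("cluster")["time"].mean()

# For comparison: the plain mean over all rows of each cluster
plain_mean = df.groupby("cluster")["time"].mean()

print(mean_of_means)
print(plain_mean)
```

In cluster 1 the two results differ because org a contributes two rows to the plain mean but only one value to the mean of means.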

Mapping groupby mean statistics as a new column in pandas

I think you need transform:

df['new'] = df.groupby('Brand Origin')['2018'].transform('mean')
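If it helps to see what transform is doing, the same column can be built in two explicit steps: aggregate once per group, then map each row's key to its group mean. This is a sketch with hypothetical data; only the column names 'Brand Origin' and '2018' come from the answer.

```python
import pandas as pd

# Hypothetical data; only the column names come from the answer above
df = pd.DataFrame({
    "Brand Origin": ["Japan", "Japan", "Germany", "Germany", "USA"],
    "2018": [10.0, 20.0, 30.0, 50.0, 7.0],
})

# One-step form from the answer
df["new"] = df.groupby("Brand Origin")["2018"].transform("mean")

# Two-step form: compute one mean per group, then look it up for every row
group_means = df.groupby("Brand Origin")["2018"].mean()
df["new_via_map"] = df["Brand Origin"].map(group_means)

print(df)
```

Both columns hold the same values; transform is usually the shorter choice, while the map form is handy when the group means are needed separately as well.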

pandas - groupby a column, apply a function to create a new column - giving incorrect results

Remove .values, so that the result is assigned by index alignment instead of by position:

df['num_col1_SMA'] = things_groupby['num_col1'].apply(pandas_rolling)
df['num_col2_SMA'] = things_groupby['num_col2'].apply(pandas_rolling)

Or:

df[['num_col1_SMA', 'num_col2_SMA']] = (things_groupby[['num_col1','num_col2']]
                                          .apply(pandas_rolling))

If you can do without groupby.apply and use the built-in rolling mean instead, you need to remove the first level of the resulting MultiIndex:

df[['num_col1_SMA', 'num_col2_SMA']] = (things_groupby[['num_col1','num_col2']]
                                          .rolling(window=N)
                                          .mean()
                                          .droplevel(0))
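A small runnable sketch of the rolling variant follows. The data, the grouping column name thing and the window size N = 2 are all assumptions for illustration; the groups are interleaved on purpose so that positional assignment would give wrong results.

```python
import pandas as pd

N = 2  # hypothetical window size

# Hypothetical data with interleaved groups
df = pd.DataFrame({
    "thing": ["x", "y", "x", "y", "x", "y"],
    "num_col1": [1, 10, 3, 30, 5, 50],
    "num_col2": [2, 20, 4, 40, 6, 60],
})
things_groupby = df.groupby("thing")

# The grouped rolling mean is indexed by (thing, original row label);
# droplevel(0) removes the 'thing' level and keeps the original labels
rolled = (things_groupby[["num_col1", "num_col2"]]
          .rolling(window=N)
          .mean()
          .droplevel(0))

# Assigning a Series aligns on the index, so each value lands on its own row
df["num_col1_SMA"] = rolled["num_col1"]
df["num_col2_SMA"] = rolled["num_col2"]
print(df)
```

The first row of each group is NaN because a window of 2 is not yet full there.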

How to make a new pandas column that's the average of the last 3 values?

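This should work. For each row it selects the earlier rows that fall on the same day of the week, keeps the last three, and adds them up with .sum(); use .mean() in its place if you want the average rather than the total: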

df['dayofweek'] = df['dt'].dt.dayofweek
df['output'] = df.apply(lambda x: df['sold'][(df.index < x.name) & (df.dayofweek == x.dayofweek)].tail(3).sum(), axis = 1)
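The row-wise apply rescans the whole frame for every row, which gets slow on large data. A grouped shift plus rolling window is one possible alternative. The sketch below uses hypothetical dates and sales figures and assumes the rows are already sorted by date with a default integer index; it compares both versions.

```python
import pandas as pd

# Hypothetical data: five weeks of daily sales, sorted by date
df = pd.DataFrame({
    "dt": pd.date_range("2021-01-04", periods=35, freq="D"),
    "sold": [(i * 7) % 11 for i in range(35)],
})
df["dayofweek"] = df["dt"].dt.dayofweek

# Row-wise version, as in the answer above
df["output"] = df.apply(
    lambda x: df["sold"][(df.index < x.name)
                         & (df["dayofweek"] == x["dayofweek"])].tail(3).sum(),
    axis=1,
)

# Grouped version: shift() excludes the current row, rolling(3) covers the
# previous three same-weekday rows, and fillna(0) matches the empty-sum case
df["output_grouped"] = (
    df.groupby("dayofweek")["sold"]
      .transform(lambda s: s.shift().rolling(3, min_periods=1).sum())
      .fillna(0)
)
print(df.tail(7))
```

Swapping .sum() for .mean() in both places gives the average of the last three values; in that case the fillna(0) step is a design choice, since the row-wise mean of an empty selection is NaN rather than 0.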

Add GroupBy mean result as a new column in pandas

You can group by indicator and use transform('mean'), which writes each group's mean back to every row of that group.

df['value'] = df.groupby('indicator')['value'].transform('mean')

df
     indicator  value value type  year
1  indicator 1  11.25      upper  2014
2  indicator 1  11.25      lower  2014
3  indicator 2  14.30      upper  2015
4  indicator 2  14.30      lower  2015

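Or, if you want only one row per indicator, use agg. Select the numeric columns first, because recent pandas versions raise an error when asked for the mean of a string column such as value type.

df = df.groupby('indicator')[['value', 'year']].agg('mean')
df
             value  year
indicator
indicator 1  11.25  2014
indicator 2  14.30  2015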

If you want the index as a column instead, call reset_index:

df = df.reset_index()
df
     indicator  value  year
0  indicator 1  11.25  2014
1  indicator 2  14.30  2015
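To compare the two shapes side by side, here is a sketch with made-up input values. The original values are not shown above, so the numbers below are assumptions picked only so that the group means come out as 11.25 and 14.30.

```python
import pandas as pd

# Hypothetical input values (the original ones are not shown in the answer)
df = pd.DataFrame({
    "indicator": ["indicator 1", "indicator 1", "indicator 2", "indicator 2"],
    "value": [10.0, 12.5, 14.0, 14.6],
    "value type": ["upper", "lower", "upper", "lower"],
    "year": [2014, 2014, 2015, 2015],
})

# transform keeps the original shape: one mean per input row
per_row = df.groupby("indicator")["value"].transform("mean")

# agg collapses each group to a single row; reset_index turns the group
# key back into an ordinary column
per_group = (df.groupby("indicator")[["value", "year"]]
               .agg("mean")
               .reset_index())

print(per_row)
print(per_group)
```

Use the transform form when the mean has to sit next to the original rows, and the agg form when a summary table is the goal.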

Creating a new column based on the mean of other values in group

  1. Compute the means of all other values within each group using a double groupby:
  • sum all the values within the group
  • subtract the current (focal) value
  • divide by one less than the number of items in the group

means = df.groupby("group").apply(lambda x: x.groupby("col2")["col3"].transform("sum").sub(x["col3"]).div(len(x["col1"].unique())-1)).droplevel(0)

  2. Assign the shifted means to a new column:

df["mean"] = means.shift().where(df["col1"].eq(df["col1"].shift()),0)

>>> df
   col1  col2  col3  group  mean
0     A  2015    10     10   0.0
1     A  2016    20     10   9.0
2     A  2017    25     10  10.5
3     B  2015    10     10   0.0
4     B  2016    12     10   9.0
5     B  2017    14     10  14.5
6     c  2015     8     10   0.0
7     c  2016     9     10  10.0
8     c  2017    10     10  16.0
9     d  2015    50     20   0.0
10    d  2016    60     20  40.0
11    d  2017    70     20  50.0
12    e  2015    40     20   0.0
13    e  2016    50     20  50.0
14    e  2017    60     20  60.0
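The sum, subtract, divide recipe can also be written without apply. The sketch below is a variant, not the answer's exact code: it groups on both group and col2 directly, divides by the group row count minus one (which equals the unique-col1 count here because each col1 appears once per year), and shifts within col1 with fillna(0) in place of the where step. The input values are copied from the printed frame above.

```python
import pandas as pd

# Input columns copied from the printed frame above
df = pd.DataFrame({
    "col1": ["A"] * 3 + ["B"] * 3 + ["c"] * 3 + ["d"] * 3 + ["e"] * 3,
    "col2": [2015, 2016, 2017] * 5,
    "col3": [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
    "group": [10] * 9 + [20] * 6,
})

g = df.groupby(["group", "col2"])["col3"]

# Mean of the *other* rows: (group sum - own value) / (group size - 1)
others_mean = (g.transform("sum") - df["col3"]) / (g.transform("count") - 1)

# Shift by one row within each col1 so a row sees the previous year's value;
# the first row of each col1 has nothing before it and becomes 0
df["mean"] = others_mean.groupby(df["col1"]).shift().fillna(0)
print(df)
```

Note that a group with a single row would divide by zero here, so such groups need separate handling.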

Group by columns under conditions to calculate average

Use DataFrame.pivot_table with a helper column new that is a copy of ColB, flatten the resulting MultiIndex columns, and join the output to a new DataFrame created by aggregating Counter with sum:

df1 = (df.assign(new=df['ColB'])
         .pivot_table(index=['ColA', 'ColB'],
                      columns='new',
                      values=['interval','duration'],
                      fill_value=0,
                      aggfunc='mean'))
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')
df = (df.groupby(['ColA','ColB'])['Counter']
        .sum()
        .to_frame(name='SumCounter')
        .join(df1).reset_index())
print(df)
  ColA ColB  SumCounter  durationSD  durationUD  intervalSD  intervalUD
0    A   SD           3         2.5         0.0         3.5           0
1    A   UD          10         0.0         2.0         0.0           1
2    B   SD          32         2.0         0.0         3.5           0
3    B   UD           4         0.0         1.5         0.0           2
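The input frame is not shown in the answer, so here is a sketch with hypothetical rows that runs the same steps from start to finish. Only the column names are taken from the answer; the values are assumptions chosen to produce a result like the one printed above.

```python
import pandas as pd

# Hypothetical input; only the column names come from the answer above
df = pd.DataFrame({
    "ColA": ["A", "A", "A", "B", "B", "B", "B"],
    "ColB": ["SD", "SD", "UD", "SD", "SD", "UD", "UD"],
    "Counter": [1, 2, 10, 30, 2, 1, 3],
    "duration": [2, 3, 2, 1, 3, 1, 2],
    "interval": [3, 4, 1, 3, 4, 2, 2],
})

# Helper column 'new' copies ColB so it can be used as the pivot columns
# while ColB itself stays in the index
df1 = (df.assign(new=df["ColB"])
         .pivot_table(index=["ColA", "ColB"], columns="new",
                      values=["interval", "duration"],
                      fill_value=0, aggfunc="mean"))

# Flatten ('duration', 'SD') into 'durationSD', and so on
df1.columns = df1.columns.map(lambda x: f"{x[0]}{x[1]}")

out = (df.groupby(["ColA", "ColB"])["Counter"].sum()
         .to_frame(name="SumCounter")
         .join(df1)
         .reset_index())
print(out)
```

The zeros come from fill_value=0: an SD row has no UD observations, so its durationUD and intervalUD cells are filled instead of left as NaN.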

