Pandas new column from groupby averages
You need transform:
df['avg_result'] = df.groupby(['a', 'b'])['result'].transform('mean')
This generates a correctly indexed column of the groupby values for you:
   a   b  result  avg_result
0  1  10     100         100
1  1  20     200         250
2  1  20     300         250
3  2  10     400         400
4  2  20     500         550
5  2  20     600         550
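A minimal runnable sketch of the line above, rebuilding the input from the table shown (the a, b and result columns). Recent pandas versions return the means as floats, so the new column may print as 100.0, 250.0 and so on.

```python
import pandas as pd

# Input rebuilt from the table above
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2],
    "b": [10, 20, 20, 10, 20, 20],
    "result": [100, 200, 300, 400, 500, 600],
})

# transform returns one value per original row, so the result
# lines up with df's index and can be assigned directly
df["avg_result"] = df.groupby(["a", "b"])["result"].transform("mean")
print(df)
```

Unlike agg, transform does not collapse the groups, which is why no merge back onto df is needed.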
group by in group by and average
If you want to first take the mean over each ['cluster', 'org'] combination and then take the mean over the cluster groups, you can use:
In [59]: (df.groupby(['cluster', 'org'], as_index=False).mean()
.groupby('cluster')['time'].mean())
Out[59]:
cluster
1 15
2 54
3 6
Name: time, dtype: int64
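If you want the mean of the cluster groups only, then you can use (numeric_only=True skips the non-numeric org column, which recent pandas versions otherwise reject):
In [58]: df.groupby(['cluster']).mean(numeric_only=True)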
Out[58]:
              time
cluster
1        12.333333
2        54.000000
3         6.000000
You can also use groupby on ['cluster', 'org'] and then use mean():
In [57]: df.groupby(['cluster', 'org']).mean()
Out[57]:
               time
cluster org
1       a    438886
        c        23
2       d      9874
        h        34
3       w         6
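The difference between the mean of per-org means and the plain per-cluster mean is easiest to see on a small example. The data below is hypothetical (only the column names cluster, org and time come from the answer); it is a sketch, not the asker's frame.

```python
import pandas as pd

# Hypothetical data: in cluster 1, org "a" has two rows and org "c" has one
df = pd.DataFrame({
    "cluster": [1, 1, 1, 2],
    "org": ["a", "a", "c", "d"],
    "time": [10, 20, 60, 5],
})

# Mean per (cluster, org) first, then the mean of those means per cluster
mean_of_means = (df.groupby(["cluster", "org"], as_index=False)["time"].mean()
                   .groupby("cluster")["time"].mean())

# Plain mean per cluster, where every row carries equal weight
plain_mean = df.groupby("cluster")["time"].mean()

print(mean_of_means)
print(plain_mean)
```

For cluster 1 the two-stage result is (15 + 60) / 2 = 37.5, while the plain mean is (10 + 20 + 60) / 3 = 30, because the two-stage version weights each org equally rather than each row.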
Mapping groupby mean statistics as a new column in pandas
I think you need transform:
df['new'] = df.groupby('Brand Origin')['2018'].transform('mean')
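As a sketch of what this does, the example below compares transform with the more explicit route of computing one mean per group and mapping it back by key. The rows are hypothetical; only the column names 'Brand Origin' and '2018' come from the answer.

```python
import pandas as pd

# Hypothetical data; only the column names come from the answer above
df = pd.DataFrame({
    "Brand Origin": ["US", "US", "JP", "JP", "DE"],
    "2018": [10.0, 20.0, 30.0, 50.0, 7.0],
})

# One-step version: broadcast each group's mean to its rows
via_transform = df.groupby("Brand Origin")["2018"].transform("mean")

# Two-step version: one mean per group, then look it up per row
group_means = df.groupby("Brand Origin")["2018"].mean()
via_map = df["Brand Origin"].map(group_means)

df["new"] = via_transform
print(df)
```

Both routes should give the same column here; transform is shorter, while the map version can be handy when the group means are needed separately as well.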
pandas - groupby a column, apply a function to create a new column - giving incorrect results
You can remove .values:
df['num_col1_SMA'] = things_groupby['num_col1'].apply(pandas_rolling)
df['num_col2_SMA'] = things_groupby['num_col2'].apply(pandas_rolling)
Or:
df[['num_col1_SMA', 'num_col2_SMA']] = (things_groupby[['num_col1','num_col2']]
.apply(pandas_rolling))
If you can work without groupby.apply, it is necessary to remove the first level of the MultiIndex:
df[['num_col1_SMA', 'num_col2_SMA']] = (things_groupby[['num_col1','num_col2']]
.rolling(window=N)
.mean()
.droplevel(0))
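A small sketch of why droplevel(0) matters, using a single column for brevity. The grouping column name thing, the window size N = 2 and the data are assumptions made for illustration; the rows of the two groups are deliberately interleaved.

```python
import pandas as pd

N = 2  # hypothetical window size
df = pd.DataFrame({
    "thing": ["x", "y", "x", "y", "x"],  # hypothetical grouping column
    "num_col1": [1, 10, 3, 30, 5],
})
things_groupby = df.groupby("thing")

# groupby(...).rolling(...) returns a MultiIndex: (group key, original row label)
rolled = things_groupby["num_col1"].rolling(window=N).mean()

# Dropping the group level leaves the original row labels,
# so the assignment aligns by index instead of by position
df["num_col1_SMA"] = rolled.droplevel(0)
print(df)
```

Because the assignment aligns on the index, the interleaved rows end up with the rolling mean of their own group. Assigning a bare array of the grouped result would instead place values by position, in group order.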
How to make a new pandas column that's the average of the last 3 values?
This should work:
df['dayofweek'] = df['dt'].dt.dayofweek
df['output'] = df.apply(lambda x: df['sold'][(df.index < x.name) & (df.dayofweek == x.dayofweek)].tail(3).sum(), axis = 1)
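Note that the snippet above takes the sum of the last three matching rows. If an average is wanted and the frame is already sorted by date, a groupby-based variant avoids the row-wise apply. This is a sketch under those assumptions; the column name avg_prev3 and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical data: five consecutive Mondays, already sorted by date
df = pd.DataFrame({
    "dt": pd.date_range("2023-01-02", periods=5, freq="7D"),
    "sold": [10, 20, 30, 40, 50],
})
df["dayofweek"] = df["dt"].dt.dayofweek

# shift() excludes the current row; rolling(3, min_periods=1) then
# averages up to three earlier rows for the same day of week
df["avg_prev3"] = (df.groupby("dayofweek")["sold"]
                     .transform(lambda s: s.shift().rolling(3, min_periods=1).mean()))
print(df)
```

The first row of each weekday has no earlier rows and comes out as NaN rather than 0; swapping .mean() for .sum() in the lambda gives the sum over the same window.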
Add GroupBy mean result as a new column in pandas
You could groupby and transform by mean.
df['value'] = df.groupby('indicator')['value'].transform('mean')
df
     indicator  value value type  year
1  indicator 1  11.25      upper  2014
2  indicator 1  11.25      lower  2014
3  indicator 2  14.30      upper  2015
4  indicator 2  14.30      lower  2015
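Or, if you want only one row per indicator, use agg on the numeric columns (the string column value type cannot be averaged and raises an error in recent pandas versions).
df = df.groupby('indicator')[['value', 'year']].agg('mean')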
df
             value  year
indicator
indicator 1  11.25  2014
indicator 2  14.30  2015
If you want the index as a column instead, call reset_index:
df = df.reset_index()
df
     indicator  value  year
0  indicator 1  11.25  2014
1  indicator 2  14.30  2015
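The aggregation and the reset_index step can also be combined by passing as_index=False to groupby. A minimal sketch with hypothetical input rows (the column names follow the output above; the values are made up):

```python
import pandas as pd

# Hypothetical input rows; column names follow the output shown above
df = pd.DataFrame({
    "indicator": ["indicator 1", "indicator 1", "indicator 2", "indicator 2"],
    "value": [10.0, 12.5, 14.0, 15.0],
    "value type": ["upper", "lower", "upper", "lower"],
    "year": [2014, 2014, 2015, 2015],
})

# Select the numeric columns explicitly; as_index=False keeps
# "indicator" as a regular column instead of the index
out = df.groupby("indicator", as_index=False)[["value", "year"]].mean()
print(out)
```

Selecting the columns to average up front also sidesteps the non-numeric value type column.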
Creating a new column based on the mean of other values in group
- Compute the means of all other values within each group using a double groupby:
  - sum all the values within the group
  - subtract the current (focal) value
  - divide by one less than the number of items in the group
- Assign the shift-ed means to a new column:
means = df.groupby("group").apply(lambda x: x.groupby("col2")["col3"].transform("sum").sub(x["col3"]).div(len(x["col1"].unique())-1)).droplevel(0)
df["mean"] = means.shift().where(df["col1"].eq(df["col1"].shift()),0)
>>> df
   col1  col2  col3  group  mean
0     A  2015    10     10   0.0
1     A  2016    20     10   9.0
2     A  2017    25     10  10.5
3     B  2015    10     10   0.0
4     B  2016    12     10   9.0
5     B  2017    14     10  14.5
6     c  2015     8     10   0.0
7     c  2016     9     10  10.0
8     c  2017    10     10  16.0
9     d  2015    50     20   0.0
10    d  2016    60     20  40.0
11    d  2017    70     20  50.0
12    e  2015    40     20   0.0
13    e  2016    50     20  50.0
14    e  2017    60     20  60.0
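The same steps can be sketched with transform on a two-key groupby, which avoids the nested groupby.apply. This is an alternative formulation, not the answer's code. It assumes each col1 value appears once per (group, col2) pair, so that the row count equals the number of unique col1 values, and it rebuilds the input from the table above.

```python
import pandas as pd

# Input rebuilt from the table above (without the "mean" column)
df = pd.DataFrame({
    "col1": ["A"] * 3 + ["B"] * 3 + ["c"] * 3 + ["d"] * 3 + ["e"] * 3,
    "col2": [2015, 2016, 2017] * 5,
    "col3": [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
    "group": [10] * 9 + [20] * 6,
})

g = df.groupby(["group", "col2"])["col3"]

# Mean of the *other* rows: (group sum - own value) / (group size - 1)
others_mean = (g.transform("sum") - df["col3"]) / (g.transform("count") - 1)

# Shift within each col1 so a row sees the previous year's value;
# the first row of each col1 has nothing to look back on and gets 0
df["mean"] = others_mean.groupby(df["col1"]).shift().fillna(0)
print(df)
```

On this data the result matches the table above. A (group, col2) pair with a single row would divide by zero and produce NaN or inf, which this sketch does not handle separately.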
Group by columns under conditions to calculate average
Use DataFrame.pivot_table with a helper column new created as a copy of ColB, then flatten the MultiIndex in the columns and join the output to a new DataFrame created by aggregating with sum:
df1 = (df.assign(new=df['ColB'])
.pivot_table(index=['ColA', 'ColB'],
columns='new',
values=['interval','duration'],
fill_value=0,
aggfunc='mean'))
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')
df = (df.groupby(['ColA','ColB'])['Counter']
.sum()
.to_frame(name='SumCounter')
.join(df1).reset_index())
print(df)
  ColA ColB  SumCounter  durationSD  durationUD  intervalSD  intervalUD
0    A   SD           3         2.5         0.0         3.5           0
1    A   UD          10         0.0         2.0         0.0           1
2    B   SD          32         2.0         0.0         3.5           0
3    B   UD           4         0.0         1.5         0.0           2
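To see the pivot and flatten steps in isolation, here is a small sketch on hypothetical data (the column names follow the answer; the values are invented and do not reproduce the output above).

```python
import pandas as pd

# Hypothetical input; only the column names follow the answer above
df = pd.DataFrame({
    "ColA": ["A", "A", "A", "B"],
    "ColB": ["SD", "SD", "UD", "SD"],
    "interval": [3, 4, 1, 2],
    "duration": [2, 3, 2, 5],
})

# The helper column "new" duplicates ColB so it can serve as the
# pivoted column header while ColB stays in the row index
df1 = (df.assign(new=df["ColB"])
         .pivot_table(index=["ColA", "ColB"],
                      columns="new",
                      values=["interval", "duration"],
                      fill_value=0,
                      aggfunc="mean"))

# Columns are a MultiIndex of (value name, ColB label); join each pair
df1.columns = df1.columns.map(lambda x: f"{x[0]}{x[1]}")
print(df1)
```

Cells for combinations that cannot occur, such as intervalUD on an SD row, are filled with 0 by fill_value.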