Calculate weighted average using a pandas/dataframe
I think I would do this with two groupbys.
First to calculate the "weighted average":
In [11]: g = df.groupby('Date')
In [12]: df.value / g.value.transform("sum") * df.wt
Out[12]:
0 0.125000
1 0.250000
2 0.416667
3 0.277778
4 0.444444
dtype: float64
If you set this as a column, you can groupby over it:
In [13]: df['wa'] = df.value / g.value.transform("sum") * df.wt
Now the per-group sum of this column is the desired result:
In [14]: g.wa.sum()
Out[14]:
Date
01/01/2012 0.791667
01/02/2012 0.722222
Name: wa, dtype: float64
Or, to broadcast the result back to every row:
In [15]: g.wa.transform("sum")
Out[15]:
0 0.791667
1 0.791667
2 0.791667
3 0.722222
4 0.722222
Name: wa, dtype: float64
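The same result can be sketched without the helper column by dividing two grouped sums. The original frame is not shown, so the sample data below is an assumption, chosen only to be consistent with the output above (with "value" acting as the weight and "wt" as the quantity being averaged).

```python
import pandas as pd

# Assumed sample data; the original frame is not shown in the answer.
df = pd.DataFrame({
    "Date": ["01/01/2012"] * 3 + ["01/02/2012"] * 2,
    "wt": [0.50, 0.75, 1.00, 0.50, 1.00],
    "value": [60, 80, 100, 100, 80],
})

# sum(wt * value) per date divided by sum(value) per date
numerator = (df["wt"] * df["value"]).groupby(df["Date"]).sum()
denominator = df["value"].groupby(df["Date"]).sum()
w_avg = numerator / denominator
print(w_avg)
```

This avoids mutating df, at the cost of not having the intermediate 'wa' column available for inspection.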
Weighted average by another column in pandas
Let's do this:
f = lambda x: sum(x['#items'] * x['score']) / sum(x['#items'])
df.groupby('Group').apply(f)
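A minimal runnable sketch of the same idea, using numpy.average instead of the built-in sum. The frame below is hypothetical; only the column names 'Group', '#items' and 'score' come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names are taken from the answer.
df = pd.DataFrame({
    "Group": ["a", "a", "b", "b"],
    "#items": [1, 3, 2, 2],
    "score": [10.0, 20.0, 5.0, 15.0],
})

# Select the two needed columns first, then compute one weighted mean per group.
weighted = df.groupby("Group")[["score", "#items"]].apply(
    lambda g: np.average(g["score"], weights=g["#items"])
)
print(weighted)
```

Note that numpy.average raises ZeroDivisionError when a group's weights sum to zero, so groups like that would need separate handling.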
Calculate weighted average of dataframe rows with missing values
Implementing the idea from my comment above, which is simpler than I thought because the DataFrame.sum method skips NaN values by default (skipna=True), which has the same effect here as filling them with 0:
(df*w).sum(axis=1)/(~pd.isnull(df)*w).sum(axis=1)
will perform this operation in a vectorized way on all rows.
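As a small worked sketch of that expression (the frame and weights below are made up for illustration): the numerator sums weight times value over the non-missing cells, and the denominator sums only the weights of those same cells.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one weight per column, some cells missing.
df = pd.DataFrame({
    "a": [1.0, 4.0],
    "b": [2.0, np.nan],
    "c": [np.nan, 6.0],
})
w = pd.Series({"a": 1.0, "b": 2.0, "c": 3.0})

numerator = (df * w).sum(axis=1)                    # NaN products are skipped
denominator = df.notna().mul(w, axis=1).sum(axis=1)  # weights of present cells only
row_avg = numerator / denominator
print(row_avg)
```

Row 0 uses only the weights of 'a' and 'b', giving (1*1 + 2*2) / (1 + 2); row 1 uses only 'a' and 'c', giving (4*1 + 6*3) / (1 + 3).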
Pandas: calculate weighted average by row using a dataframe and a series
Just use numpy.average, specifying weights:
demand["result"]=np.average(demand, weights=months, axis=1)
https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.average.html
Outputs:
1 2 3 4 5 6 ... 8 9 10 11 12 result
0 360 500 64 50 40 30 ... 32 32 57 56 23 58.076923
1 40 180 30 40 24 34 ... 12 56 35 33 65 43.358974
2 100 450 60 30 45 65 ... 45 89 75 11 34 58.884615
3 20 60 10 60 34 80 ... 55 67 48 6 8 43.269231
4 55 50 0 50 60 78 ... 66 56 9 78 67 55.294872
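A smaller self-contained sketch of the same call, with made-up numbers so the arithmetic is easy to follow. The names demand and months mirror the answer; the values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-month demand table and one weight per month column.
demand = pd.DataFrame({1: [10, 6], 2: [20, 6], 3: [30, 6]})
months = pd.Series([1, 2, 3], index=[1, 2, 3])

# Compute the row-wise weighted mean before adding the result column,
# so the new column is not accidentally included in the average.
result = np.average(demand.to_numpy(), weights=months.to_numpy(), axis=1)
demand["result"] = result
print(demand)
```

The weights must have one entry per column of the array being averaged, which is why the result is computed before "result" is attached to the frame.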
Calculate weighted average with pandas dataframe
You can obtain weights normalized within each group by using transform:
>>> df['weight'] = df['dist'] / df.groupby('ind')['dist'].transform('sum')
>>> df['weight']
0 0.357143
1 0.416667
2 0.250000
3 0.285714
4 0.583333
5 0.285714
6 0.714286
7 0.107143
Name: weight, dtype: float64
Then, you just need to multiply these weights by the values, and take the sum:
>>> df['wcas'], df['wdiff'] = (df[n] * df['weight'] for n in ('cas', 'diff'))
>>> df.groupby('ind')[['wcas', 'wdiff']].sum()
wcas wdiff
ind
g 6.714286 2.785714
la 3.107143 4.882143
p 3.750000 2.558333
Edit: with in-place mutation:
>>> backup = df.copy() # keep a backup copy, since df is mutated in place below
>>> cols = df.columns[:2] # cas, diff
>>> df[cols] = df['weight'].values[:, None] * df[cols]
>>> df.groupby('ind')[cols].sum()
cas diff
ind
g 6.714286 2.785714
la 3.107143 4.882143
p 3.750000 2.558333
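If neither helper columns nor in-place mutation are wanted, the same computation can be sketched on temporary objects. The data below is hypothetical; only the column names 'cas', 'diff', 'dist' and 'ind' are taken from the answer.

```python
import pandas as pd

# Hypothetical data; only the column names come from the answer.
df = pd.DataFrame({
    "cas": [1.0, 2.0, 3.0, 4.0],
    "diff": [10.0, 20.0, 30.0, 40.0],
    "dist": [1.0, 3.0, 2.0, 2.0],
    "ind": ["g", "g", "p", "p"],
})

# Weights normalized so they sum to 1 within each 'ind' group.
weight = df["dist"] / df.groupby("ind")["dist"].transform("sum")

# Scale each value column row-wise by its weight, then sum per group.
result = df[["cas", "diff"]].mul(weight, axis=0).groupby(df["ind"]).sum()
print(result)
```

Here df itself is left unchanged, so no backup copy is needed.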
Calculating multiple weighted averages based on multiple weight values - Pandas
You might be looking for a list comprehension, something like this:
[np.average(df['vals'], weights=df[w]) for w in df.columns[1:]]
This generates a list whose first element is the average using 'weight1', the second the average using 'weight2', and so on. You can read it as a compressed for-loop, and it is usually somewhat faster than a for-loop that appends values to a list. df.columns holds the column labels, so df.columns[1:] is those labels with the first one omitted.
So to get the output you're looking for, just run:
avg = [np.average(df['vals'], weights=df[w]) for w in df.columns[1:]]
avg_df = pd.DataFrame({'weights' : df.columns[1:], 'weightedvals' : avg})
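A runnable sketch of the above with a made-up frame. The column names 'vals', 'weight1' and 'weight2' follow the answer, the numbers are assumptions, and the weight columns are selected by name rather than by position so the code does not depend on column order.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one value column and two weight columns.
df = pd.DataFrame({
    "vals": [1.0, 2.0, 3.0],
    "weight1": [1, 1, 1],
    "weight2": [0, 0, 1],
})

# Every column except 'vals' is treated as a set of weights.
weight_cols = [c for c in df.columns if c != "vals"]
avg = {w: np.average(df["vals"], weights=df[w]) for w in weight_cols}

avg_df = pd.DataFrame({"weights": list(avg), "weightedvals": list(avg.values())})
print(avg_df)
```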
Calculate the weighted average using groupby in Python
I think this should do the trick:
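import pandas as pd

def calculator(df, columns):
    # columns[0] holds the weights, columns[1] the values to average
    weighted_sum = (df[columns[0]] * df[columns[1]]).sum() / df[columns[0]].sum()
    return weighted_sum

cols = ['tot_SKU', 'avg_lag']
# one weighted average per SF_type
Sums = df.groupby('SF_type').apply(lambda x: calculator(x, cols))
# attach the per-group result to every row of the original frame
df.join(Sums.rename('sums'), on='SF_type')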
Edit: Added the requested merge with the old dataframe
python pandas weighted average with the use of groupby agg()
You can use the x you have in the lambda (specifically, use its .index to select the rows you want). For example:
import pandas as pd
import numpy as np

def weighted_avg(group_df, whole_df, values, weights):
    v = whole_df.loc[group_df.index, values]
    w = whole_df.loc[group_df.index, weights]
    return (v * w).sum() / w.sum()

dfr = pd.DataFrame(np.random.randint(1, 50, size=(4, 4)), columns=list("ABCD"))
dfr["group"] = [1, 1, 0, 1]
print(dfr)

dfr = (
    dfr.groupby("group")
    .agg(
        {"A": "mean", "B": "sum", "C": lambda x: weighted_avg(x, dfr, "D", "C")}
    )
    .reset_index()
)
print(dfr)
Prints:
A B C D group
0 32 2 34 29 1
1 33 32 15 49 1
2 4 43 41 10 0
3 39 33 7 31 1
group A B C
0 0 4.000000 43 10.000000
1 1 34.666667 67 34.607143
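EDIT: As @enke stated in the comments, you can instead call your weighted_avg function with an already filtered dataframe (the function then only needs to take the filtered frame, the values column, and the weights column):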
weighted_avg(dfr.loc[x.index], 'D', 'C')
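A sketch of that variant, assuming weighted_avg is redefined to take three arguments (this signature is an assumption, it is not spelled out in the answer). The data reuses the values from the printed frame above.

```python
import pandas as pd

def weighted_avg(sub_df, values, weights):
    """Weighted mean of sub_df[values], weighted by sub_df[weights]."""
    return (sub_df[values] * sub_df[weights]).sum() / sub_df[weights].sum()

# Same numbers as the printed frame above.
dfr = pd.DataFrame({
    "A": [32, 33, 4, 39],
    "B": [2, 32, 43, 33],
    "C": [34, 15, 41, 7],
    "D": [29, 49, 10, 31],
    "group": [1, 1, 0, 1],
})

out = (
    dfr.groupby("group")
    .agg({
        "A": "mean",
        "B": "sum",
        # x is the group's slice of column C; its index selects the full rows
        "C": lambda x: weighted_avg(dfr.loc[x.index], "D", "C"),
    })
    .reset_index()
)
print(out)
```

The result is assigned to a new name (out) rather than back to dfr, which keeps the frame referenced inside the lambda unambiguous.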
Calculate weighted average for multiple columns with NaN values grouped by index in Python
Simplified solution
x = df.drop(columns=['F', 'weight']) # x values
w = x.notna().mul(df['weight'], axis=0) # weights excluding nulls
wx = w * x # weights * x values
avg = wx.groupby(df['F']).sum() / w.groupby(df['F']).sum() # sum(w * x) / sum(w)
Explained
Drop the grouping column F and the weight column to get the x values
# x
A B C D E
0 0.0 0.0 0.0 NaN NaN
1 0.0 1.0 1.0 0.0 0.0
2 0.0 NaN 1.0 0.0 1.0
3 1.0 0.0 1.0 0.0 1.0
4 NaN 1.0 0.0 0.0 0.0
Create a boolean mask using notna, then multiply by the weights along axis 0 to broadcast the weight values to each column
# w
A B C D E
0 7.754209 7.754209 7.754209 0.000000 0.000000
1 5.811653 5.811653 5.811653 5.811653 5.811653
2 7.858809 0.000000 7.858809 7.858809 7.858809
3 7.690689 7.690689 7.690689 7.690689 7.690689
4 0.000000 5.092012 5.092012 5.092012 5.092012
Multiply the x values by the weights w
# wx
A B C D E
0 0.000000 0.000000 0.000000 NaN NaN
1 0.000000 5.811653 5.811653 0.0 0.000000
2 0.000000 NaN 7.858809 0.0 7.858809
3 7.690689 0.000000 7.690689 0.0 7.690689
4 NaN 5.092012 0.000000 0.0 0.000000
Group the wx and w dataframes by the index column F and aggregate with sum
# wx.groupby(df['F']).sum()
A B C D E
F
XYZ 7.690689 10.903665 21.361151 0.0 15.549498
# w.groupby(df['F']).sum()
A B C D E
F
XYZ 29.11536 26.348563 34.207372 26.453163 26.453163
Divide the aggregated sums to calculate weighted average
# avg
A B C D E
F
XYZ 0.264145 0.413824 0.62446 0.0 0.587812
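The steps above can be sketched end to end on a tiny made-up frame with two groups. The column names 'F' and 'weight' follow the answer; the values are assumptions. It also shows one edge case: a column that is entirely missing within a group has zero total weight, so its average comes out as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values and two groups.
df = pd.DataFrame({
    "F": ["X", "X", "Y"],
    "A": [1.0, np.nan, 4.0],
    "B": [2.0, 4.0, np.nan],
    "weight": [1.0, 3.0, 2.0],
})

x = df.drop(columns=["F", "weight"])     # x values
w = x.notna().mul(df["weight"], axis=0)  # weights, zeroed where x is missing
wx = w * x                               # weights * x values (NaN where x is missing)

# sum(w * x) / sum(w) per group; the grouped sum skips NaN
avg = wx.groupby(df["F"]).sum() / w.groupby(df["F"]).sum()
print(avg)
```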