Calculate Weighted Average Using a Pandas/Dataframe

I think I would do this with two groupbys.

First, compute each row's contribution to the weighted average (here wt is averaged using value as the weights within each Date):

In [11]: g = df.groupby('Date')

In [12]: df.value / g.value.transform("sum") * df.wt
Out[12]:
0 0.125000
1 0.250000
2 0.416667
3 0.277778
4 0.444444
dtype: float64

If you set this as a column, you can groupby over it:

In [13]: df['wa'] = df.value / g.value.transform("sum") * df.wt

Now the per-group sum of this column is the desired weighted average:

In [14]: g.wa.sum()
Out[14]:
Date
01/01/2012 0.791667
01/02/2012 0.722222
Name: wa, dtype: float64

or, to broadcast the per-group result back onto every row:

In [15]: g.wa.transform("sum")
Out[15]:
0 0.791667
1 0.791667
2 0.791667
3 0.722222
4 0.722222
Name: wa, dtype: float64
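
For reference, here is a minimal self-contained sketch of the two-groupby approach. The original frame is not shown in the answer, so the data below is an assumption, chosen only because it is consistent with the outputs printed above:

```python
import pandas as pd

# Assumed sample data (not from the original post); it reproduces the
# per-row and per-group numbers shown in the transcript above.
df = pd.DataFrame({
    "Date": ["01/01/2012"] * 3 + ["01/02/2012"] * 2,
    "wt": [0.50, 0.75, 1.00, 0.50, 1.00],
    "value": [60, 80, 100, 100, 80],
})

g = df.groupby("Date")

# Each row's share of its group's total value, scaled by wt.
df["wa"] = df["value"] / g["value"].transform("sum") * df["wt"]

# Summing the contributions per Date gives the weighted average.
result = df.groupby("Date")["wa"].sum()
print(result)
```

Regrouping after adding the column is slightly more verbose than reusing g, but it keeps the sketch independent of how the groupby object caches its columns.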

Weighted average by another column in pandas

Let's do this:

f = lambda x: sum(x['#items'] * x['score']) / sum(x['#items'])

df.groupby('Group').apply(f)
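
A small runnable sketch of the same idea, using made-up data (the column values are assumptions for illustration) and numpy.average instead of the hand-written ratio. Selecting the needed columns before apply is a choice made here, not part of the original answer; it avoids passing the grouping column into the function, which recent pandas versions warn about:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two groups with item counts used as weights.
df = pd.DataFrame({
    "Group": ["a", "a", "b", "b"],
    "#items": [1, 3, 2, 2],
    "score": [10.0, 20.0, 30.0, 50.0],
})

def weighted_score(x):
    # Average of score, weighted by the number of items.
    return np.average(x["score"], weights=x["#items"])

out = df.groupby("Group")[["#items", "score"]].apply(weighted_score)
print(out)
```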

Calculate weighted average of dataframe rows with missing values

Implementing the idea from my comment above, which is simpler than I thought because DataFrame.sum skips NaN values by default (skipna=True), which has the same effect as filling them with 0:

(df*w).sum(axis=1)/(~pd.isnull(df)*w).sum(axis=1)

will perform this operation in a vectorized way on all rows.
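
To make the expression concrete, here is a minimal sketch with assumed data: df holds the values (with one missing entry) and w is a Series of weights indexed by the column names. Both are illustrative, not from the original question:

```python
import numpy as np
import pandas as pd

# Hypothetical values with one NaN, and one weight per column.
df = pd.DataFrame({
    "a": [1.0, np.nan],
    "b": [2.0, 4.0],
    "c": [3.0, 6.0],
})
w = pd.Series({"a": 1.0, "b": 2.0, "c": 3.0})

# Numerator: weighted values, NaN entries are skipped by sum().
# Denominator: only the weights of the entries that are present.
row_avg = (df * w).sum(axis=1) / (df.notna() * w).sum(axis=1)
print(row_avg)
```

In the second row the weight of column a is left out of the denominator, so the missing value does not pull the average toward zero.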

Pandas: calculate weighted average by row using a dataframe and a series

Just use numpy.average, specifying weights:

demand["result"]=np.average(demand, weights=months, axis=1)

https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.average.html

Outputs:

     1    2   3   4   5   6  ...   8   9  10  11  12     result
0  360  500  64  50  40  30  ...  32  32  57  56  23  58.076923
1   40  180  30  40  24  34  ...  12  56  35  33  65  43.358974
2  100  450  60  30  45  65  ...  45  89  75  11  34  58.884615
3   20   60  10  60  34  80  ...  55  67  48   6   8  43.269231
4   55   50   0  50  60  78  ...  66  56   9  78  67  55.294872
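
A tiny sketch of the same call on assumed data (three columns instead of twelve; demand and months here are stand-ins, not the original inputs). Note that numpy.average works positionally, so the weights must be in the same order as the columns; it does not align on labels the way pandas arithmetic does:

```python
import numpy as np
import pandas as pd

# Hypothetical demand per month (columns) and one weight per month.
demand = pd.DataFrame([[10, 20, 30], [0, 0, 60]], columns=[1, 2, 3])
months = pd.Series([1, 1, 2], index=[1, 2, 3])

# Row-wise weighted average; weights are matched to columns by position.
result = np.average(demand.to_numpy(), weights=months.to_numpy(), axis=1)

demand["result"] = result
print(demand)
```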

Calculate weighted average with pandas dataframe

You can obtain within-group normalized weights by using transform:

>>> df['weight'] = df['dist'] / df.groupby('ind')['dist'].transform('sum')
>>> df['weight']
0 0.357143
1 0.416667
2 0.250000
3 0.285714
4 0.583333
5 0.285714
6 0.714286
7 0.107143
Name: weight, dtype: float64

Then you just need to multiply these weights by the values and take the sum:

>>> df['wcas'], df['wdiff'] = (df[n] * df['weight'] for n in ('cas', 'diff'))
>>> df.groupby('ind')[['wcas', 'wdiff']].sum()
         wcas     wdiff
ind
g    6.714286  2.785714
la   3.107143  4.882143
p    3.750000  2.558333

Edit: with in-place mutation:

>>> backup = df.copy()     # keep a backup, since df is mutated in place below
>>> cols = df.columns[:2] # cas, diff
>>> df[cols] = df['weight'].values[:, None] * df[cols]
>>> df.groupby('ind')[cols].sum()
          cas      diff
ind
g    6.714286  2.785714
la   3.107143  4.882143
p    3.750000  2.558333
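
The normalize-then-sum pattern can be sketched on a small assumed frame (the values below are made up; only the column names ind, cas and dist follow the answer):

```python
import pandas as pd

# Hypothetical data: dist acts as the raw weight within each ind group.
df = pd.DataFrame({
    "ind": ["g", "g", "p"],
    "cas": [1.0, 3.0, 5.0],
    "dist": [1.0, 3.0, 2.0],
})

# Normalize the weights so they sum to 1 inside each group.
df["weight"] = df["dist"] / df.groupby("ind")["dist"].transform("sum")

# With normalized weights, the weighted average is a plain grouped sum.
wavg = (df["cas"] * df["weight"]).groupby(df["ind"]).sum()
print(wavg)
```

Because the weights already sum to 1 per group, no division is needed after the final sum.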

Calculating multiple weighted averages based on multiple weight values - Pandas

You might be looking for a list comprehension, something like this:

[np.average(df['vals'], weights=df[w]) for w in df.columns[1:]]

This produces a list in which the first element is the average using 'weight1', the second the average using 'weight2', and so on. You can read it as a compressed for-loop, and it is usually somewhat faster than a for-loop that appends values to a list. df.columns is an Index of the column names, so df.columns[1:] is the same Index without its first element.

So, to get the output you're looking for, just do:

avg = [np.average(df['vals'], weights=df[w]) for w in df.columns[1:]]
avg_df = pd.DataFrame({'weights' : df.columns[1:], 'weightedvals' : avg})
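
Here is a runnable sketch on assumed data (the frame below is illustrative; only the column names vals, weight1 and weight2 come from the answer). It also shows a vectorized alternative, which is an addition here rather than part of the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one value column followed by two weight columns.
df = pd.DataFrame({
    "vals": [1.0, 2.0, 3.0],
    "weight1": [1, 1, 1],
    "weight2": [0, 0, 1],
})
weight_cols = df.columns[1:]

# One weighted average per weight column, via a list comprehension.
avg = [np.average(df["vals"], weights=df[w]) for w in weight_cols]
avg_df = pd.DataFrame({"weights": weight_cols, "weightedvals": avg})

# Vectorized alternative: multiply every weight column by vals at once.
vectorized = df[weight_cols].mul(df["vals"], axis=0).sum() / df[weight_cols].sum()
print(avg_df)
```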

Calculate the weighted average using groupby in Python

This should do the trick, I think:

import pandas as pd

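def calculator(df, columns):
    # columns[0] holds the weights, columns[1] the values to average
    weighted_avg = (df[columns[0]] * df[columns[1]]).sum() / df[columns[0]].sum()
    return weighted_avg

cols = ['tot_SKU', 'avg_lag']

# one weighted average per SF_type, then attach it to every row of df
Sums = df.groupby('SF_type').apply(lambda x: calculator(x, cols))
df.join(Sums.rename('sums'), on='SF_type')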

Edit: Added the requested merge with the old dataframe
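
As an alternative sketch, the same result can be obtained without apply by dividing two grouped sums. The data below is assumed for illustration; only the column names follow the answer:

```python
import pandas as pd

# Hypothetical data: tot_SKU is the weight, avg_lag the value.
df = pd.DataFrame({
    "SF_type": ["x", "x", "y"],
    "tot_SKU": [1, 3, 2],
    "avg_lag": [10.0, 20.0, 5.0],
})

# sum(weight * value) per group divided by sum(weight) per group.
prod = df["tot_SKU"] * df["avg_lag"]
sums = prod.groupby(df["SF_type"]).sum() / df.groupby("SF_type")["tot_SKU"].sum()

# Attach the per-group weighted average to every row.
out = df.join(sums.rename("sums"), on="SF_type")
print(out)
```

This avoids calling a Python function once per group, which tends to matter when there are many groups.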

python pandas weighted average with the use of groupby agg()

You can use the x you already have in the lambda (specifically, use its .index to look up the rows you want). For example:

import pandas as pd
import numpy as np

def weighted_avg(group_df, whole_df, values, weights):
    # group_df only supplies the row labels of the current group
    v = whole_df.loc[group_df.index, values]
    w = whole_df.loc[group_df.index, weights]
    return (v * w).sum() / w.sum()

dfr = pd.DataFrame(np.random.randint(1, 50, size=(4, 4)), columns=list("ABCD"))
dfr["group"] = [1, 1, 0, 1]

print(dfr)
dfr = (
    dfr.groupby("group")
    .agg(
        {"A": "mean", "B": "sum", "C": lambda x: weighted_avg(x, dfr, "D", "C")}
    )
    .reset_index()
)
print(dfr)

Prints:

    A   B   C   D  group
0  32   2  34  29      1
1  33  32  15  49      1
2   4  43  41  10      0
3  39  33   7  31      1

   group          A   B          C
0      0   4.000000  43  10.000000
1      1  34.666667  67  34.607143

EDIT: As @enke stated in the comments, you can instead call weighted_avg with an already-filtered dataframe (this requires dropping the whole_df parameter from its signature):

weighted_avg(dfr.loc[x.index], 'D', 'C')
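
A sketch of that variant, assuming the function is rewritten to take three arguments. The frame reuses the values printed above instead of random numbers, so the result can be compared with the output shown:

```python
import pandas as pd

def weighted_avg(group_df, values, weights):
    # group_df is already restricted to the rows of one group.
    v = group_df[values]
    w = group_df[weights]
    return (v * w).sum() / w.sum()

# Same values as the printed example, entered by hand.
dfr = pd.DataFrame({
    "A": [32, 33, 4, 39],
    "B": [2, 32, 43, 33],
    "C": [34, 15, 41, 7],
    "D": [29, 49, 10, 31],
    "group": [1, 1, 0, 1],
})

out = (
    dfr.groupby("group")
    .agg(
        {
            "A": "mean",
            "B": "sum",
            # x is column C of one group; its index selects that group's rows.
            "C": lambda x: weighted_avg(dfr.loc[x.index], "D", "C"),
        }
    )
    .reset_index()
)
print(out)
```

The result is stored in a new name (out) so that dfr still refers to the full frame while the lambda runs.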

Calculate weighted average for multiple columns with NaN values grouped by index in Python

Simplified solution

x = df.drop(columns=['F', 'weight']) # x values
w = x.notna().mul(df['weight'], axis=0) # weights excluding nulls

wx = w * x # weights * x values
avg = wx.groupby(df['F']).sum() / w.groupby(df['F']).sum() # sum(w * x) / sum(w)

Explained

Drop the grouping column F and the weight column to get the x values

# x
     A    B    C    D    E
0  0.0  0.0  0.0  NaN  NaN
1  0.0  1.0  1.0  0.0  0.0
2  0.0  NaN  1.0  0.0  1.0
3  1.0  0.0  1.0  0.0  1.0
4  NaN  1.0  0.0  0.0  0.0

Create a boolean mask using notna then multiply by weights along axis 0 to project the weights values to each column

# w
          A         B         C         D         E
0  7.754209  7.754209  7.754209  0.000000  0.000000
1  5.811653  5.811653  5.811653  5.811653  5.811653
2  7.858809  0.000000  7.858809  7.858809  7.858809
3  7.690689  7.690689  7.690689  7.690689  7.690689
4  0.000000  5.092012  5.092012  5.092012  5.092012

Multiply the x values by the weights w

# wx
          A         B         C    D         E
0  0.000000  0.000000  0.000000  NaN       NaN
1  0.000000  5.811653  5.811653  0.0  0.000000
2  0.000000       NaN  7.858809  0.0  7.858809
3  7.690689  0.000000  7.690689  0.0  7.690689
4       NaN  5.092012  0.000000  0.0  0.000000

Group the wx and w dataframe by index column F and aggregate with sum

# wx.groupby(df['F']).sum()
            A          B          C    D          E
F
XYZ  7.690689  10.903665  21.361151  0.0  15.549498

# w.groupby(df['F']).sum()
            A          B          C          D          E
F
XYZ  29.11536  26.348563  34.207372  26.453163  26.453163

Divide the aggregated sums to calculate weighted average

# avg
            A         B        C    D         E
F
XYZ  0.264145  0.413824  0.62446  0.0  0.587812
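
The steps above can be reproduced end to end on a small assumed frame (the values are made up; the column names F and weight follow the answer). The second group is included to show what happens when a column has no valid values in a group:

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in columns A and B.
df = pd.DataFrame({
    "F": ["XYZ", "XYZ", "ABC"],
    "A": [1.0, np.nan, 2.0],
    "B": [0.0, 4.0, np.nan],
    "weight": [1.0, 3.0, 2.0],
})

x = df.drop(columns=["F", "weight"])          # x values
w = x.notna().mul(df["weight"], axis=0)       # weights, zeroed where x is NaN

# sum(w * x) / sum(w) per group; NaN products are skipped by sum().
avg = (w * x).groupby(df["F"]).sum() / w.groupby(df["F"]).sum()
print(avg)
```

For group ABC, column B has no valid entries, so both sums are zero and the result is NaN rather than a misleading 0.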

