Display Weighted Mean by Group in the Data.Frame


If we use mutate, we can avoid the left_join:

library(dplyr)
df %>%
  group_by(education) %>%
  mutate(weighted_income = weighted.mean(income, weight))
#     obs income education weight weighted_income
#   <int>  <int>    <fctr>  <int>           <dbl>
# 1     1   1000         A     10        1166.667
# 2     2   2000         B      1        1583.333
# 3     3   1500         B      5        1583.333
# 4     4   2000         A      2        1166.667
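For comparison, the same row-level result in pandas, as a minimal sketch; the frame below is reconstructed from the output above, and the apply/map pattern stands in for dplyr's mutate:

import pandas as pd
import numpy as np

# data reconstructed from the R output shown above
df = pd.DataFrame({
    "obs": [1, 2, 3, 4],
    "income": [1000, 2000, 1500, 2000],
    "education": ["A", "B", "B", "A"],
    "weight": [10, 1, 5, 2],
})

# per-group weighted mean, broadcast back onto every row
grp_wm = df.groupby("education").apply(
    lambda g: np.average(g["income"], weights=g["weight"])
)
df["weighted_income"] = df["education"].map(grp_wm)
# education A rows get 1166.667, B rows get 1583.333, matching the dplyr result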

Grouping columns of a dataframe by another dataframe and calculating the weighted average of aggregated columns


  1. Multiply df1 by weights
  2. Stack and map the column names to the groups
  3. Group by "Date" and group ("level_1") and sum
  4. Unstack and format to the desired output
output = df1.set_index("Date").mul(weights.set_index("Date")).stack().reset_index(1)
output = (output.groupby([output.index,
                          output["level_1"].map(dict(zip(groups["ID"], groups["Group"])))])
                .sum()
                .unstack()
                .droplevel(0, 1)
                .rename_axis(None, axis=1)
          )

>>> output
            Group1  Group2
Date
2021-01-01   1.850     4.2
2021-01-02   1.825     5.2
2021-01-03   6.225     2.5
2021-01-04   2.350     3.0
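df1, weights, and groups belong to the original question and aren't shown; below is a self-contained sketch of the same multiply/stack/group/unstack pipeline with made-up inputs (the IDs A/B/C and the two-group mapping are assumptions). Selecting the stacked value column explicitly avoids the droplevel step:

import pandas as pd

# made-up inputs: values, per-date weights, and an ID-to-group mapping
df1 = pd.DataFrame({"Date": ["2021-01-01", "2021-01-02"],
                    "A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]})
weights = pd.DataFrame({"Date": ["2021-01-01", "2021-01-02"],
                        "A": [0.5, 0.5], "B": [0.25, 0.5], "C": [0.25, 0.25]})
groups = pd.DataFrame({"ID": ["A", "B", "C"],
                       "Group": ["Group1", "Group1", "Group2"]})

# multiply, stack to long form, map column IDs to groups, sum, unstack
long = (df1.set_index("Date").mul(weights.set_index("Date"))
           .stack().rename("val").reset_index(1))
grp = long["level_1"].map(dict(zip(groups["ID"], groups["Group"])))
out = long.groupby([long.index, grp])["val"].sum().unstack().rename_axis(None, axis=1)
print(out)
#             Group1  Group2
# Date
# 2021-01-01    1.25    1.25
# 2021-01-02    3.00    1.50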

Weighted mean using aggregate across groups in R

Instead of mean, use weighted.mean. However, aggregate may not be an option here, because aggregate loops over only the 'Value' column and doesn't have access to the 'Weight' for each group.

library(dplyr)
DF %>%
  group_by(Group_1, Group_2) %>%
  summarise(wt_mean = weighted.mean(Value, Weight), .groups = 'drop')

Output:

# A tibble: 21 x 3
#    Group_1 Group_2 wt_mean
#    <chr>   <chr>     <dbl>
#  1 a       h         24.7
#  2 a       i         15
#  3 a       j         21.1
#  4 a       k         23.6
#  5 a       m         14.1
#  6 b       i         40
#  7 b       j         12.7
#  8 b       k          6.88
#  9 b       l         30.6
# 10 b       m          5
# … with 11 more rows

If we want to use base R, then by should work

by(DF, DF[c('Group_1', 'Group_2')], function(x) weighted.mean(x$Value, x$Weight))
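The same two-key weighted mean in pandas, as a sketch; the data is made up and only the column names mirror the R example:

import pandas as pd
import numpy as np

# made-up data; column names follow the R example above
DF = pd.DataFrame({
    "Group_1": ["a", "a", "b", "b"],
    "Group_2": ["h", "h", "i", "i"],
    "Value":   [10.0, 30.0, 40.0, 50.0],
    "Weight":  [1.0, 3.0, 2.0, 2.0],
})

wt_mean = (DF.groupby(["Group_1", "Group_2"])
             .apply(lambda g: np.average(g["Value"], weights=g["Weight"]))
             .rename("wt_mean")
             .reset_index())
print(wt_mean)
#   Group_1 Group_2  wt_mean
# 0       a       h     25.0
# 1       b       i     45.0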

Weighted average grouping by date (in index) in pandas DataFrame

Try grouping every 2 hours and you will get closer:

import pandas as pd

d = {'date': ['2021-08-01 12:00:00', '2021-08-01 13:00:00', '2021-08-01 14:00:00',
              '2021-08-02 15:00:00', '2021-08-02 16:00:00', '2021-08-02 17:00:00'],
     'mass': [23, 40, 10, 12, 15, 11],
     '%': [0.4, 0.7, 0.9, 0.1, 0.2, 0.8]}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

# weighted average of '%' per 2-hour bin, with 'mass' as the weight
df['mass_wt'] = df['mass'] * df['%']
op = df.groupby(pd.Grouper(freq='2H')).agg({'mass': 'sum', 'mass_wt': 'sum'}).query('mass > 0')
op['op'] = op['mass_wt'] / op['mass']
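Equivalently, since op['op'] is just the weighted mean of '%' with 'mass' as the weight, the per-bin average can be computed directly; a sketch:

import numpy as np

# same numbers as op['op']: per-2-hour-bin weighted mean of '%' weighted by 'mass';
# the len(g) guard skips any empty bins the time Grouper may produce
op2 = (df.groupby(pd.Grouper(freq='2H'))
         .apply(lambda g: np.average(g['%'], weights=g['mass']) if len(g) else np.nan)
         .dropna())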

The weighted means of the groups are not equal to the total mean in pandas groupby

You compute a weighted mean within each group, so when you compute the total mean from the weighted means, the correct weight for each group is the sum of the weights within the group (and not the size of the group).

In [47]: wsums = df.groupby("groups").apply(lambda d: d["weights"].sum())

In [48]: total_mean_from_group_means = np.average(group_means, weights=wsums)

In [49]: total_mean_from_group_means
Out[49]: 0.5070955626929458
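Concretely: with m_g the weighted mean of group g and w_g the sum of that group's weights, the total is sum_g(w_g * m_g) / sum_g(w_g). A self-contained check of that identity (the random data and the values/weights/groups column names are made up for illustration):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "groups": rng.integers(0, 3, 100),
    "values": rng.random(100),
    "weights": rng.random(100),
})

# weighted mean within each group, and the correct group weights: the summed weights
group_means = df.groupby("groups").apply(
    lambda d: np.average(d["values"], weights=d["weights"])
)
wsums = df.groupby("groups")["weights"].sum()

total_from_groups = np.average(group_means, weights=wsums)
total_direct = np.average(df["values"], weights=df["weights"])
assert np.isclose(total_from_groups, total_direct)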

Python pandas weighted average with the use of groupby agg()

You can use the x you have in the lambda (specifically, use its .index to get the values you want). For example:

import pandas as pd
import numpy as np


def weighted_avg(group_df, whole_df, values, weights):
    # look up the group's values and weights in the full frame via its index
    v = whole_df.loc[group_df.index, values]
    w = whole_df.loc[group_df.index, weights]
    return (v * w).sum() / w.sum()


dfr = pd.DataFrame(np.random.randint(1, 50, size=(4, 4)), columns=list("ABCD"))
dfr["group"] = [1, 1, 0, 1]

print(dfr)
dfr = (
    dfr.groupby("group")
    .agg({"A": "mean", "B": "sum", "C": lambda x: weighted_avg(x, dfr, "D", "C")})
    .reset_index()
)
print(dfr)

Prints:

    A   B   C   D  group
0  32   2  34  29      1
1  33  32  15  49      1
2   4  43  41  10      0
3  39  33   7  31      1

   group          A   B          C
0      0   4.000000  43  10.000000
1      1  34.666667  67  34.607143

EDIT: As @enke stated in the comments, you can call your weighted_avg function with an already-filtered dataframe:

weighted_avg(dfr.loc[x.index], 'D', 'C')
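That variant drops the whole_df parameter; a sketch of how the function and the agg call would then look (it assumes dfr still holds the original, pre-aggregation frame):

def weighted_avg(filtered_df, values, weights):
    # rows are already filtered to the group, so no index lookup is needed
    v = filtered_df[values]
    w = filtered_df[weights]
    return (v * w).sum() / w.sum()


dfr.groupby("group").agg(
    {"A": "mean", "B": "sum", "C": lambda x: weighted_avg(dfr.loc[x.index], "D", "C")}
)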

Calculate the weighted average using groupby in Python

So this should do the trick, I think:

import pandas as pd


def calculator(df, columns):
    # weighted average: sum(weight * value) / sum(weight)
    weighted_sum = (df[columns[0]] * df[columns[1]]).sum() / df[columns[0]].sum()
    return weighted_sum


cols = ['tot_SKU', 'avg_lag']

Sums = df.groupby('SF_type').apply(lambda x: calculator(x, cols))
df.join(Sums.rename('sums'), on='SF_type')

Edit: Added the requested merge with the old dataframe
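A quick end-to-end run, reusing calculator and cols from above; the question's df isn't shown, so the data here is made up:

import pandas as pd

df = pd.DataFrame({
    "SF_type": ["x", "x", "y"],
    "tot_SKU": [10, 30, 20],
    "avg_lag": [2.0, 4.0, 3.0],
})

Sums = df.groupby('SF_type').apply(lambda x: calculator(x, cols))
print(df.join(Sums.rename('sums'), on='SF_type'))
#   SF_type  tot_SKU  avg_lag  sums
# 0       x       10      2.0   3.5
# 1       x       30      4.0   3.5
# 2       y       20      3.0   3.0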


