Display weighted mean by group in the data.frame
If we use mutate, then we can avoid the left_join:
library(dplyr)
df %>%
  group_by(education) %>%
  mutate(weighted_income = weighted.mean(income, weight))
# obs income education weight weighted_income
# <int> <int> <fctr> <int> <dbl>
#1 1 1000 A 10 1166.667
#2 2 2000 B 1 1583.333
#3 3 1500 B 5 1583.333
#4 4 2000 A 2 1166.667
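For comparison, here is a minimal pandas sketch of the same grouped mutate (the data is taken from the output above; pandas is not part of the original answer). groupby().transform plays the role of mutate(), broadcasting each group's weighted mean back onto the rows:

import pandas as pd

# the data from the example above
df = pd.DataFrame({"obs": [1, 2, 3, 4],
                   "income": [1000, 2000, 1500, 2000],
                   "education": ["A", "B", "B", "A"],
                   "weight": [10, 1, 5, 2]})

# numerator and denominator of the weighted mean, per education group
wsum = (df["income"] * df["weight"]).groupby(df["education"]).transform("sum")
df["weighted_income"] = wsum / df.groupby("education")["weight"].transform("sum")
print(df)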
Grouping columns of a dataframe by another dataframe and calculating the weighted average of aggregated columns
- Multiply df1 with weights, stack and map the column names to the groups, groupby "Date" and group ("level_1") and sum, then unstack and format to the desired output:
# elementwise product of values and weights, reshaped to long form
output = df1.set_index("Date").mul(weights.set_index("Date")).stack().reset_index(1)

# sum within each (Date, Group) pair, then reshape back to wide
output = (output.groupby([output.index,
                          output["level_1"].map(dict(zip(groups["ID"], groups["Group"])))])
                .sum()
                .unstack()
                .droplevel(0, 1)
                .rename_axis(None, axis=1)
         )
>>> output
Group1 Group2
Date
2021-01-01 1.850 4.2
2021-01-02 1.825 5.2
2021-01-03 6.225 2.5
2021-01-04 2.350 3.0
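A self-contained sketch of the same pipeline, with invented stand-ins for df1, weights, and groups (the real data lives in the question). Selecting the value column 0 before summing sidesteps the string level_1 column, so no droplevel is needed:

import pandas as pd

df1 = pd.DataFrame({"Date": ["2021-01-01", "2021-01-02"],
                    "A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]})
weights = pd.DataFrame({"Date": ["2021-01-01", "2021-01-02"],
                        "A": [0.2, 0.5], "B": [0.3, 0.1], "C": [0.5, 0.4]})
groups = pd.DataFrame({"ID": ["A", "B", "C"],
                       "Group": ["Group1", "Group1", "Group2"]})

# weighted values in long form: index Date, columns level_1 (name) and 0 (value)
long = df1.set_index("Date").mul(weights.set_index("Date")).stack().reset_index(1)

out = (long.groupby([long.index,
                     long["level_1"].map(dict(zip(groups["ID"], groups["Group"])))])[0]
           .sum()
           .unstack()
           .rename_axis(None, axis=1))
print(out)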
Weighted mean using aggregate across groups in R
Instead of mean, use weighted.mean. However, aggregate may not be an option here, because aggregate loops over only the 'Value' column and doesn't have access to the 'Weight' for each group.
library(dplyr)
DF %>%
  group_by(Group_1, Group_2) %>%
  summarise(wt_mean = weighted.mean(Value, Weight), .groups = 'drop')
Output:
# A tibble: 21 x 3
# Group_1 Group_2 wt_mean
# <chr> <chr> <dbl>
# 1 a h 24.7
# 2 a i 15
# 3 a j 21.1
# 4 a k 23.6
# 5 a m 14.1
# 6 b i 40
# 7 b j 12.7
# 8 b k 6.88
# 9 b l 30.6
#10 b m 5
# … with 11 more rows
If we want to use base R, then by should work:
by(DF, DF[c('Group_1', 'Group_2')], function(x) weighted.mean(x$Value, x$Weight))
Weighted average grouping by date (in index) in pandas DataFrame
Try grouping every 2 hours and you will get closer -
import pandas as pd

d = {'date': ['2021-08-01 12:00:00', '2021-08-01 13:00:00', '2021-08-01 14:00:00',
              '2021-08-02 15:00:00', '2021-08-02 16:00:00', '2021-08-02 17:00:00'],
     'mass': [23, 40, 10, 12, 15, 11],
     '%': [0.4, 0.7, 0.9, 0.1, 0.2, 0.8]}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

# weight each row's % by its mass, sum both within 2-hour bins, then divide
df['mass_wt'] = df['mass'] * df['%']
op = df.groupby(pd.Grouper(freq='2H')).agg({'mass': 'sum', 'mass_wt': 'sum'}).query('mass > 0')
op['op'] = op['mass_wt'] / op['mass']
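As a sanity check, here is a sketch (not part of the original answer) that computes the same per-bin weighted mean of '%', weighted by 'mass', in one step on the df built above:

check = (df.groupby(pd.Grouper(freq='2H'))
           .apply(lambda g: (g['mass'] * g['%']).sum() / g['mass'].sum()
                  if len(g) else float('nan'))
           .dropna())  # empty 2-hour bins yield NaN and are dropped
print(check)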
The weighted means of groups are not equal to the total mean in pandas groupby
You compute a weighted mean within each group, so when you compute the total mean from the weighted means, the correct weight for each group is the sum of the weights within the group (and not the size of the group).
In [47]: wsums = df.groupby("groups").apply(lambda d: d["weights"].sum())
In [48]: total_mean_from_group_means = np.average(group_means, weights=wsums)
In [49]: total_mean_from_group_means
Out[49]: 0.5070955626929458
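A minimal self-contained sketch of this identity, with invented data (the original df is defined earlier in that session): weighting each group mean by the group's total weight recovers the overall weighted mean.

import numpy as np
import pandas as pd

df = pd.DataFrame({"values":  [1.0, 2.0, 3.0, 4.0],
                   "weights": [0.1, 0.4, 0.3, 0.2],
                   "groups":  ["a", "a", "b", "b"]})

# weighted mean within each group, and each group's total weight
group_means = df.groupby("groups").apply(
    lambda d: np.average(d["values"], weights=d["weights"]))
wsums = df.groupby("groups")["weights"].sum()

total = np.average(group_means, weights=wsums)
direct = np.average(df["values"], weights=df["weights"])
assert np.isclose(total, direct)  # both are 2.6 here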
Python pandas weighted average with the use of groupby agg()
You can use the x you have in the lambda (specifically, use its .index to get the values you want). For example:
import pandas as pd
import numpy as np
def weighted_avg(group_df, whole_df, values, weights):
    # look up the group's rows in the full frame, then take the weighted mean
    v = whole_df.loc[group_df.index, values]
    w = whole_df.loc[group_df.index, weights]
    return (v * w).sum() / w.sum()
dfr = pd.DataFrame(np.random.randint(1, 50, size=(4, 4)), columns=list("ABCD"))
dfr["group"] = [1, 1, 0, 1]
print(dfr)
dfr = (
dfr.groupby("group")
.agg(
{"A": "mean", "B": "sum", "C": lambda x: weighted_avg(x, dfr, "D", "C")}
)
.reset_index()
)
print(dfr)
Prints:
A B C D group
0 32 2 34 29 1
1 33 32 15 49 1
2 4 43 41 10 0
3 39 33 7 31 1
group A B C
0 0 4.000000 43 10.000000
1 1 34.666667 67 34.607143
EDIT: As @enke stated in the comments, you can call your weighted_avg function with an already filtered dataframe:
weighted_avg(dfr.loc[x.index], 'D', 'C')
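A sketch of that refactor (assuming dfr still refers to the original, pre-aggregation frame): the helper receives the already filtered rows, so the whole_df parameter goes away.

def weighted_avg(group_df, values, weights):
    # weighted mean of `values` over the given rows, weighted by `weights`
    v = group_df[values]
    w = group_df[weights]
    return (v * w).sum() / w.sum()

dfr.groupby("group").agg(
    {"A": "mean", "B": "sum", "C": lambda x: weighted_avg(dfr.loc[x.index], "D", "C")}
).reset_index()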
Calculate the weighted average using groupby in Python
So this should do the trick, I think:
import pandas as pd
def calculator(df, columns):
    # weighted mean of columns[1], using columns[0] as the weights
    weighted_sum = (df[columns[0]] * df[columns[1]]).sum() / df[columns[0]].sum()
    return weighted_sum
cols = ['tot_SKU', 'avg_lag']
Sums = df.groupby('SF_type').apply(lambda x: calculator(x, cols))
df.join(Sums.rename('sums'), on='SF_type')
Edit: Added the requested merge with the old dataframe
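For example, with hypothetical data (column names taken from the answer), the join broadcasts each group's weighted average back onto the rows:

import pandas as pd

df = pd.DataFrame({"SF_type": ["x", "x", "y"],
                   "tot_SKU": [10, 30, 20],
                   "avg_lag": [2.0, 4.0, 5.0]})

cols = ['tot_SKU', 'avg_lag']
Sums = df.groupby('SF_type').apply(lambda x: calculator(x, cols))
print(df.join(Sums.rename('sums'), on='SF_type'))
# SF_type 'x': (10*2.0 + 30*4.0) / 40 = 3.5; SF_type 'y': 5.0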