Adding a column of means by group to original data
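This is what the ave function is for. By default it computes the mean within each group and returns a vector as long as the input, so the result can be assigned directly as a new column: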
df1$Y.New <- ave(df1$Y, df1$X)
Create new column for mean by group in original dataframe in R
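We can use mutate instead of summarise. mutate keeps every row and adds the group mean as a new column, whereas summarise collapses each group to a single row: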
library(dplyr)
df <- df %>%
  group_by(unit_id) %>%
  mutate(mean = mean(outcome))
Creating a new column based on the mean of other values in group
- Compute the means of all other values within each group using a double groupby:
  - sum all the values within the group
  - subtract the current (focal) value
  - divide by one less than the number of items in the group
- Assign the shift-ed means to a new column:
means = df.groupby("group").apply(lambda x: x.groupby("col2")["col3"].transform("sum").sub(x["col3"]).div(len(x["col1"].unique())-1)).droplevel(0)
df["mean"] = means.shift().where(df["col1"].eq(df["col1"].shift()),0)
>>> df
col1 col2 col3 group mean
0 A 2015 10 10 0.0
1 A 2016 20 10 9.0
2 A 2017 25 10 10.5
3 B 2015 10 10 0.0
4 B 2016 12 10 9.0
5 B 2017 14 10 14.5
6 c 2015 8 10 0.0
7 c 2016 9 10 10.0
8 c 2017 10 10 16.0
9 d 2015 50 20 0.0
10 d 2016 60 20 40.0
11 d 2017 70 20 50.0
12 e 2015 40 20 0.0
13 e 2016 50 20 50.0
14 e 2017 60 20 60.0
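The same steps can also be sketched with transform instead of a nested apply. This is a minimal sketch, not the answer's original code: the input columns are copied from the output above, the names by_group_year and others_mean are made up for illustration, and it assumes each col1 item appears exactly once per year, so the row count within each (group, col2) pair equals the number of items in the group.

```python
import pandas as pd

# Input columns copied from the output shown above.
df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "B", "c", "c", "c", "d", "d", "d", "e", "e", "e"],
    "col2": [2015, 2016, 2017] * 5,
    "col3": [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
    "group": [10] * 9 + [20] * 6,
})

# Group by (group, year): sum everything, subtract the focal value,
# then divide by one less than the number of rows in that (group, year).
by_group_year = df.groupby(["group", "col2"])["col3"]
others_mean = (by_group_year.transform("sum") - df["col3"]) / (
    by_group_year.transform("count") - 1
)

# Shift by one row within each col1 item so every row sees the previous
# year's value; the first year of each item has no predecessor and gets 0.
df["mean"] = others_mean.groupby(df["col1"]).shift().fillna(0)

print(df)
```

Shifting inside groupby(df["col1"]) replaces the where(...) comparison against the previous row, but it relies on the rows being sorted by year within each item.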
Add a column with mean values for groups based on another column
Use groupby transform to calculate the mean of the desired columns, then join the result back to the initial DataFrame to add the newly created columns:
df = df.join(
    df.groupby('area')[['prod_a', 'prod_b']]
      .transform('mean')  # Calculate the mean for each group
      .rename(columns='mean {} for the area'.format)  # Rename columns
)
df:
entity | area | prod_a | prod_b | mean prod_a for the area | mean prod_b for the area |
---|---|---|---|---|---|
001 | A | 1 | 3 | 1.5 | 4.5 |
002 | B | 2 | 4 | 4 | 4.5 |
003 | A | 2 | 6 | 1.5 | 4.5 |
004 | C | 7 | 2 | 5.5 | 5 |
005 | C | 4 | 8 | 5.5 | 5 |
006 | B | 6 | 5 | 4 | 4.5 |
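To try this end to end, here is a minimal self-contained sketch. The input frame is rebuilt from the first four columns of the table above; the variable names means and out are only illustrative, and the columns are renamed with a list comprehension rather than rename, which is a stylistic substitution.

```python
import pandas as pd

# Input rebuilt from the entity/area/prod_a/prod_b columns of the table above.
df = pd.DataFrame({
    "entity": ["001", "002", "003", "004", "005", "006"],
    "area": ["A", "B", "A", "C", "C", "B"],
    "prod_a": [1, 2, 2, 7, 4, 6],
    "prod_b": [3, 4, 6, 2, 8, 5],
})

# transform returns one row per input row, aligned on the original index,
# so the result can be joined straight back without a merge key.
means = df.groupby("area")[["prod_a", "prod_b"]].transform("mean")
means.columns = [f"mean {col} for the area" for col in means.columns]

out = df.join(means)
print(out)
```

Because transform preserves the original index, join lines the means up row by row, which is why no explicit key on area is needed.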
Dataframe: adding a column with mean by other column group
Another alternative uses pd.eval and transform with mean:
data['av_state'] = (data.assign(state=pd.eval(data['state']).astype(int))
.groupby("group")['state'].transform('mean'))
print(data)
id group state value av_state
0 1 1 True 11 0.666667
1 2 1 False 12 0.666667
2 3 2 False 5 0.500000
3 4 1 True 8 0.666667
4 5 2 True 3 0.500000
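If pd.eval is not an option, an explicit mapping may be easier to follow. This is a sketch under one assumption that the output above does not confirm: that the state column holds the strings "True" and "False" rather than real booleans. The frame is rebuilt from the printed output, and as_bool is an illustrative name.

```python
import pandas as pd

# Rebuilt from the printed output; storing state as strings is an assumption.
data = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "group": [1, 1, 2, 1, 2],
    "state": ["True", "False", "False", "True", "True"],
    "value": [11, 12, 5, 8, 3],
})

# Map the strings to booleans explicitly instead of evaluating them.
as_bool = data["state"].map({"True": True, "False": False})

# The mean of 0/1 values per group is the share of True rows in that group.
data["av_state"] = as_bool.astype(int).groupby(data["group"]).transform("mean")

print(data)
```

If state already contains booleans, the map step can be dropped and the column cast to int directly.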
Add a column to the original pandas data frame after grouping by 2 columns and taking dot product of two other columns
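One way uses pandas.DataFrame.prod to multiply Weights by Price row-wise, then a grouped transform to sum the products within each (Date, Issuer) group:
df["Avg Price"] = df[["Weights", "Price"]].prod(axis=1)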
df["Avg Price"] = df.groupby(["Date", "Issuer"])["Avg Price"].transform("sum")
print(df)
Output:
Date Issuer Weights Price Avg Price
0 2019-11-12 A 0.4 100 120.0
1 2019-15-12 B 0.5 100 100.0
2 2019-11-12 A 0.2 200 120.0
3 2019-15-12 B 0.3 100 100.0
4 2019-11-12 A 0.4 100 120.0
5 2019-15-12 B 0.2 100 100.0
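The two steps can also be written as one expression that never stores the intermediate products in the frame. This is a minimal sketch with the input rebuilt from the output above (dates kept as the strings shown); it multiplies the two columns directly instead of calling prod.

```python
import pandas as pd

# Input rebuilt from the printed output; dates are kept as plain strings.
df = pd.DataFrame({
    "Date": ["2019-11-12", "2019-15-12"] * 3,
    "Issuer": ["A", "B"] * 3,
    "Weights": [0.4, 0.5, 0.2, 0.3, 0.4, 0.2],
    "Price": [100, 100, 200, 100, 100, 100],
})

# Row-wise product, then the per-(Date, Issuer) sum broadcast back to each row.
weighted = df["Weights"] * df["Price"]
df["Avg Price"] = weighted.groupby([df["Date"], df["Issuer"]]).transform("sum")

print(df)
```

The result is a weighted sum. It equals a weighted average only because the weights in each group of this sample add up to 1; otherwise divide by the grouped sum of Weights.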
Aggregate by group AND add column to data frame in R
Since you have a tibble, first a dplyr solution. Next a base R version.
Using dplyr:
df1 %>%
  group_by(place) %>%
  mutate(sum_num = sum(number))
# A tibble: 11 x 4
# Groups: place [4]
place animal number sum_num
<chr> <chr> <dbl> <dbl>
1 a cat 5 11
2 a bear 6 11
3 b cat 7 22
4 b bear 4 22
5 b pig 5 22
6 b goat 6 22
7 c cat 8 16
8 c bear 5 16
9 c goat 3 16
10 d goat 7 11
11 d bear 4 11
Using base R:
df1$sum_num <- ave(df1$number, df1$place, FUN = sum)
# A tibble: 11 x 4
place animal number sum_num
<chr> <chr> <dbl> <dbl>
1 a cat 5 11
2 a bear 6 11
3 b cat 7 22
4 b bear 4 22
5 b pig 5 22
6 b goat 6 22
7 c cat 8 16
8 c bear 5 16
9 c goat 3 16
10 d goat 7 11
11 d bear 4 11