Adding a Column of Means by Group to Original Data

Adding a column of means by group to original data

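This is what the base R ave function is for. It applies a function (mean by default) to Y within each level of X and returns a vector as long as the input, so the result can be assigned straight back as a new column.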

df1$Y.New <- ave(df1$Y, df1$X)
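To spell out the computation, here is a minimal plain-Python sketch of what a call like ave(y, x) produces: the mean of y within each level of x, repeated on every row. The function name group_means and the sample values are hypothetical; this illustrates the idea and is not R's implementation.

```python
from collections import defaultdict

def group_means(values, groups):
    """Return, for each row, the mean of `values` over that row's group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    # One result per input row, so the output lines up with the original data.
    return [totals[g] / counts[g] for g in groups]

# Hypothetical sample data, not taken from the original question.
y = [1.0, 3.0, 10.0, 20.0, 30.0]
x = ["a", "a", "b", "b", "b"]
y_new = group_means(y, x)
```

Because the result has one entry per input row, it can be attached to the data as a new column without any merge step.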

Create new column for mean by group in original dataframe in R

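Use mutate instead of summarise. mutate keeps every row and repeats the group mean on each one, whereas summarise collapses each group to a single row.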

library(dplyr)
df <- df %>%
  group_by(unit_id) %>%
  mutate(mean = mean(outcome))
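The difference in output shape between the two verbs can be sketched in plain Python. The rows below are hypothetical, and the variable names summarised and mutated are only labels for the two shapes, not dplyr APIs.

```python
from statistics import mean

# Hypothetical rows: (unit_id, outcome).
rows = [(1, 4.0), (1, 6.0), (2, 10.0)]

# Collect the outcomes belonging to each unit.
by_unit = {}
for unit_id, outcome in rows:
    by_unit.setdefault(unit_id, []).append(outcome)

# summarise-style result: one entry per group.
summarised = {unit_id: mean(vals) for unit_id, vals in by_unit.items()}

# mutate-style result: every original row kept, with its group's mean appended.
mutated = [(unit_id, outcome, summarised[unit_id]) for unit_id, outcome in rows]
```

Here summarised has two entries (one per unit), while mutated still has three rows, which is the shape needed when the mean should sit next to the original data.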

Creating a new column based on the mean of other values in group

  1. Compute the means of all other values within each group using a double groupby:
  • sum all the values within the group
  • subtract the current (focal) value
  • divide by one less than the number of items in the group

means = (
    df.groupby("group")
    .apply(
        lambda x: x.groupby("col2")["col3"]
        .transform("sum")                  # total of col3 per col2 value within the group
        .sub(x["col3"])                    # remove the focal row's own value
        .div(len(x["col1"].unique()) - 1)  # average over the remaining items
    )
    .droplevel(0)
)

  2. Assign the shifted means to a new column, using 0 wherever the previous row belongs to a different col1 value:

df["mean"] = means.shift().where(df["col1"].eq(df["col1"].shift()), 0)

>>> df
    col1  col2  col3  group  mean
0      A  2015    10     10   0.0
1      A  2016    20     10   9.0
2      A  2017    25     10  10.5
3      B  2015    10     10   0.0
4      B  2016    12     10   9.0
5      B  2017    14     10  14.5
6      c  2015     8     10   0.0
7      c  2016     9     10  10.0
8      c  2017    10     10  16.0
9      d  2015    50     20   0.0
10     d  2016    60     20  40.0
11     d  2017    70     20  50.0
12     e  2015    40     20   0.0
13     e  2016    50     20  50.0
14     e  2017    60     20  60.0
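To check the arithmetic behind the mean column, the sketch below recomputes it in plain Python for the rows of group 10. It assumes the years are consecutive, so "previous row" can be looked up as year - 1; the variable names are illustrative and not part of the pandas answer.

```python
# Rows for group 10 from the table above: (col1, col2, col3).
rows = [
    ("A", 2015, 10), ("A", 2016, 20), ("A", 2017, 25),
    ("B", 2015, 10), ("B", 2016, 12), ("B", 2017, 14),
    ("c", 2015, 8), ("c", 2016, 9), ("c", 2017, 10),
]

n_entities = len({name for name, _, _ in rows})

# Total of col3 per year across the whole group.
year_total = {}
for _, year, value in rows:
    year_total[year] = year_total.get(year, 0) + value

# Mean of the *other* entities' values in the same year.
others_mean = {
    (name, year): (year_total[year] - value) / (n_entities - 1)
    for name, year, value in rows
}

# Each row takes the previous year's figure; the first year has none, so 0.
lagged = [others_mean.get((name, year - 1), 0.0) for name, year, _ in rows]
```

For example, A in 2016 receives (10 + 10 + 8 - 10) / 2 = 9.0, the 2015 mean of B and c, which agrees with row 1 of the table.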

Add a column with mean values for groups based on another column

Use groupby transform to calculate the mean of the desired columns, then join the result back to the initial DataFrame to add the newly created columns:

df = df.join(
    df.groupby('area')[['prod_a', 'prod_b']]
    .transform('mean')  # Calculate the mean for each group
    .rename(columns='mean {} for the area'.format)  # Rename columns
)

df:

entity  area  prod_a  prod_b  mean prod_a for the area  mean prod_b for the area
001     A          1       3                       1.5                       4.5
002     B          2       4                       4                         4.5
003     A          2       6                       1.5                       4.5
004     C          7       2                       5.5                       5
005     C          4       8                       5.5                       5
006     B          6       5                       4                         4.5

Dataframe: adding a column with mean by other column group

Another alternative uses pd.eval and transform with mean:

data['av_state'] = (data.assign(state=pd.eval(data['state']).astype(int))
                        .groupby("group")['state'].transform('mean'))


print(data)

   id  group  state  value  av_state
0   1      1   True     11  0.666667
1   2      1  False     12  0.666667
2   3      2  False      5  0.500000
3   4      1   True      8  0.666667
4   5      2   True      3  0.500000
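The av_state values are the share of True rows in each group, because averaging flags cast to 0/1 gives a proportion. A plain-Python sketch of that arithmetic, using the rows printed above (the variable names are illustrative):

```python
# Rows from the printed frame above: (id, group, state).
rows = [(1, 1, True), (2, 1, False), (3, 2, False), (4, 1, True), (5, 2, True)]

# Cast each flag to int (True -> 1, False -> 0) and collect by group.
flags = {}
for _, group, state in rows:
    flags.setdefault(group, []).append(int(state))

# The mean of the 0/1 values is the fraction of True rows in the group.
share_true = {group: sum(v) / len(v) for group, v in flags.items()}

# Repeat the group share on every row, giving one value per original row.
av_state = [share_true[group] for _, group, _ in rows]
```

Group 1 has two True values out of three rows and group 2 has one out of two, which matches the 0.666667 and 0.500000 shown in the output.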

Add a column to the original pandas data frame after grouping by 2 columns and taking dot product of two other columns

One way uses pandas.DataFrame.prod to multiply Weights by Price row by row, then sums the products within each Date/Issuer group:

df["Avg Price"] = df[["Weights", "Price"]].prod(1)
df["Avg Price"] = df.groupby(["Date", "Issuer"])["Avg Price"].transform("sum")
print(df)

Output:

         Date  Issuer  Weights  Price  Avg Price
0  2019-11-12       A      0.4    100      120.0
1  2019-15-12       B      0.5    100      100.0
2  2019-11-12       A      0.2    200      120.0
3  2019-15-12       B      0.3    100      100.0
4  2019-11-12       A      0.4    100      120.0
5  2019-15-12       B      0.2    100      100.0
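The same numbers can be reproduced by hand. The sketch below, in plain Python with illustrative variable names, sums Weights * Price within each (Date, Issuer) pair and repeats the total on every row of that pair. Since the weights in each pair add up to 1 in this data, the sum of products is also the weighted average price.

```python
# Rows from the output above: (Date, Issuer, Weights, Price).
rows = [
    ("2019-11-12", "A", 0.4, 100), ("2019-15-12", "B", 0.5, 100),
    ("2019-11-12", "A", 0.2, 200), ("2019-15-12", "B", 0.3, 100),
    ("2019-11-12", "A", 0.4, 100), ("2019-15-12", "B", 0.2, 100),
]

# Sum Weights * Price within each (Date, Issuer) pair.
totals = {}
for date, issuer, weight, price in rows:
    key = (date, issuer)
    totals[key] = totals.get(key, 0.0) + weight * price

# Repeat each pair's total on every one of its rows.
avg_price = [totals[(date, issuer)] for date, issuer, _, _ in rows]
```

Up to floating-point rounding, the rows for issuer A come out at 120 and those for issuer B at 100, as in the Avg Price column.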

Aggregate by group AND add column to data frame in R

Since you have a tibble, here is a dplyr solution first, followed by a base R version.

Using dplyr:

df1 %>%
  group_by(place) %>%
  mutate(sum_num = sum(number))

# A tibble: 11 x 4
# Groups:   place [4]
   place animal number sum_num
   <chr> <chr>   <dbl>   <dbl>
 1 a     cat         5      11
 2 a     bear        6      11
 3 b     cat         7      22
 4 b     bear        4      22
 5 b     pig         5      22
 6 b     goat        6      22
 7 c     cat         8      16
 8 c     bear        5      16
 9 c     goat        3      16
10 d     goat        7      11
11 d     bear        4      11

Using base R:

df1$sum_num <- ave(df1$number, df1$place, FUN = sum)

# A tibble: 11 x 4
   place animal number sum_num
   <chr> <chr>   <dbl>   <dbl>
 1 a     cat         5      11
 2 a     bear        6      11
 3 b     cat         7      22
 4 b     bear        4      22
 5 b     pig         5      22
 6 b     goat        6      22
 7 c     cat         8      16
 8 c     bear        5      16
 9 c     goat        3      16
10 d     goat        7      11
11 d     bear        4      11

