# Adding a Column of Means by Group to Original Data

## Adding a column of means by group to original data

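This is what the `ave` function is for: it returns a vector the same length as its input, with each element replaced by its group's statistic (the mean, by default).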

```r
df1$Y.New <- ave(df1$Y, df1$X)
```

## Create new column for mean by group in original dataframe in R

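We can use `mutate` instead of `summarise`. `mutate` keeps every original row and repeats the group mean on each one, whereas `summarise` collapses each group to a single row.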

```r
library(dplyr)

df <- df %>%
  group_by(unit_id) %>%
  mutate(mean = mean(outcome))
```

## Creating a new column based on the mean of other values in group

1. Compute the mean of all other values within each group using a double `groupby` (first on `group`, then on `col2` inside each group):
   • `sum` all the values within the group
   • subtract the current (focal) value
   • divide by one less than the number of items in the group

2. Assign the `shift`-ed means to a new column, using 0 wherever the previous row belongs to a different `col1`:
```python
means = (
    df.groupby("group")
      .apply(lambda x: x.groupby("col2")["col3"]
                        .transform("sum")
                        .sub(x["col3"])
                        .div(len(x["col1"].unique()) - 1))
      .droplevel(0)
)
df["mean"] = means.shift().where(df["col1"].eq(df["col1"].shift()), 0)

>>> df
   col1  col2  col3  group  mean
0     A  2015    10     10   0.0
1     A  2016    20     10   9.0
2     A  2017    25     10  10.5
3     B  2015    10     10   0.0
4     B  2016    12     10   9.0
5     B  2017    14     10  14.5
6     c  2015     8     10   0.0
7     c  2016     9     10  10.0
8     c  2017    10     10  16.0
9     d  2015    50     20   0.0
10    d  2016    60     20  40.0
11    d  2017    70     20  50.0
12    e  2015    40     20   0.0
13    e  2016    50     20  50.0
14    e  2017    60     20  60.0
```
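The "sum, subtract the focal value, divide by one less than the count" steps can also be sketched without `apply`. This is an alternative formulation, not the answer's own code: it groups directly on `["group", "col2"]` and assumes every `col1` item appears once per year and that rows are sorted by year within each `col1`. The sample frame is rebuilt from the output shown above.

```python
import pandas as pd

# Sample data reconstructed from the printed output above.
df = pd.DataFrame({
    "col1": ["A"] * 3 + ["B"] * 3 + ["c"] * 3 + ["d"] * 3 + ["e"] * 3,
    "col2": [2015, 2016, 2017] * 5,
    "col3": [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
    "group": [10] * 9 + [20] * 6,
})

# Leave-one-out mean: (group total - own value) / (group size - 1),
# computed per (group, year). A group with a single member would divide
# by zero, so this sketch assumes at least two members per group.
per_year = df.groupby(["group", "col2"])["col3"]
others_mean = (per_year.transform("sum") - df["col3"]) / (per_year.transform("count") - 1)

# Carry each value forward by one row within the same col1 item;
# the first year of each item has no predecessor and is set to 0.
df["mean"] = others_mean.groupby(df["col1"]).shift().fillna(0)

print(df)
```

Shifting inside a `groupby` on `col1` replaces the `where(... .eq(... .shift()), 0)` comparison, since the shift never crosses from one item to the next.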

## Add a column with mean values for groups based on another column

You can use `groupby` with `transform` to calculate the `mean` of the desired columns, then `join` the result back to the initial DataFrame to add the newly created columns:

```python
df = df.join(
    df.groupby('area')[['prod_a', 'prod_b']]
      .transform('mean')  # Calculate the mean for each group
      .rename(columns='mean {} for the area'.format)  # Rename columns
)
```

`df`:

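| entity | area | prod_a | prod_b | mean prod_a for the area | mean prod_b for the area |
|---|---|---|---|---|---|
| 001 | A | 1 | 3 | 1.5 | 4.5 |
| 002 | B | 2 | 4 | 4 | 4.5 |
| 003 | A | 2 | 6 | 1.5 | 4.5 |
| 004 | C | 7 | 2 | 5.5 | 5 |
| 005 | C | 4 | 8 | 5.5 | 5 |
| 006 | B | 6 | 5 | 4 | 4.5 |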
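For a version that runs end to end, here is a minimal sketch that rebuilds the input from the table above (treating `entity` as a string column is an assumption). It also shows why a plain `join` is enough: `transform` returns a frame with the same index as the original, so the new columns line up row by row.

```python
import pandas as pd

# Input reconstructed from the table above; entity stored as strings (assumed).
df = pd.DataFrame({
    "entity": ["001", "002", "003", "004", "005", "006"],
    "area": ["A", "B", "A", "C", "C", "B"],
    "prod_a": [1, 2, 2, 7, 4, 6],
    "prod_b": [3, 4, 6, 2, 8, 5],
})

group_means = (
    df.groupby("area")[["prod_a", "prod_b"]]
      .transform("mean")                              # one row per original row
      .rename(columns="mean {} for the area".format)  # avoid name clashes in join
)

# transform preserves the original index, so join aligns on it directly.
assert group_means.index.equals(df.index)

result = df.join(group_means)
print(result)
```

The `rename` step matters: without it the transformed columns would keep the names `prod_a` and `prod_b`, and `join` would raise because of the overlapping column names.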

## Dataframe: adding a column with mean by other column group

Another alternative uses `pd.eval` on the `state` column (useful when the values are stored as strings), followed by `transform` with `mean`:

```python
data['av_state'] = (data.assign(state=pd.eval(data['state']).astype(int))
                        .groupby("group")['state'].transform('mean'))
```

```
print(data)
   id  group  state  value  av_state
0   1      1   True     11  0.666667
1   2      1  False     12  0.666667
2   3      2  False      5  0.500000
3   4      1   True      8  0.666667
4   5      2   True      3  0.500000
```
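If `state` really does hold the strings `"True"` and `"False"` (an assumption; the original data is not shown), a plain string comparison avoids evaluating the column contents. The sketch below swaps that in for `pd.eval` and rebuilds the frame from the printed output above.

```python
import pandas as pd

# Data reconstructed from the printed output; state as strings is assumed.
data = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "group": [1, 1, 2, 1, 2],
    "state": ["True", "False", "False", "True", "True"],
    "value": [11, 12, 5, 8, 3],
})

# Map "True" -> 1 and anything else -> 0, then average within each group.
is_true = data["state"].eq("True").astype(int)
data["av_state"] = is_true.groupby(data["group"]).transform("mean")

print(data)
```

The mean of a 0/1 column is the share of `True` rows in each group, which is what `av_state` reports.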

## Add a column to the original pandas data frame after grouping by 2 columns and taking dot product of two other columns

One way using `pandas.DataFrame.prod`:

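```python
# Row-wise product of weight and price
df["Avg Price"] = df[["Weights", "Price"]].prod(axis=1)
# Sum the products within each (Date, Issuer) group and broadcast back to every row
df["Avg Price"] = df.groupby(["Date", "Issuer"])["Avg Price"].transform("sum")
print(df)
```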

Output:

```
         Date Issuer  Weights  Price  Avg Price
0  2019-11-12      A      0.4    100      120.0
1  2019-15-12      B      0.5    100      100.0
2  2019-11-12      A      0.2    200      120.0
3  2019-15-12      B      0.3    100      100.0
4  2019-11-12      A      0.4    100      120.0
5  2019-15-12      B      0.2    100      100.0
```
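The summed product equals a weighted average only because the weights in each group add up to 1 in this sample. A hedged variant that also divides by the group's total weight is sketched below; the column names `_wp` and `Weighted Avg Price` are made up for illustration, and the frame is rebuilt from the output above.

```python
import pandas as pd

# Data reconstructed from the printed output above (dates kept as given).
df = pd.DataFrame({
    "Date": ["2019-11-12", "2019-15-12"] * 3,
    "Issuer": ["A", "B"] * 3,
    "Weights": [0.4, 0.5, 0.2, 0.3, 0.4, 0.2],
    "Price": [100, 100, 200, 100, 100, 100],
})

keys = ["Date", "Issuer"]
df["_wp"] = df["Weights"] * df["Price"]  # hypothetical helper column

grouped = df.groupby(keys)
# Weighted average = sum(weight * price) / sum(weight), per group.
df["Weighted Avg Price"] = (
    grouped["_wp"].transform("sum") / grouped["Weights"].transform("sum")
)
df = df.drop(columns="_wp")

print(df)
```

With weights that already sum to 1 per group, this gives the same numbers as the summed product; with unnormalised weights the two results differ.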

## Aggregate by group AND add column to data frame in R

Since you have a tibble, first a dplyr solution. Next a base R version.

Using dplyr:

```r
df1 %>%
  group_by(place) %>%
  mutate(sum_num = sum(number))
# A tibble: 11 x 4
# Groups:   place
   place animal number sum_num
   <chr> <chr>   <dbl>   <dbl>
 1 a     cat         5      11
 2 a     bear        6      11
 3 b     cat         7      22
 4 b     bear        4      22
 5 b     pig         5      22
 6 b     goat        6      22
 7 c     cat         8      16
 8 c     bear        5      16
 9 c     goat        3      16
10 d     goat        7      11
11 d     bear        4      11
```

Using base R:

```r
df1$sum_num <- ave(df1$number, df1$place, FUN = sum)
# A tibble: 11 x 4
   place animal number sum_num
   <chr> <chr>   <dbl>   <dbl>
 1 a     cat         5      11
 2 a     bear        6      11
 3 b     cat         7      22
 4 b     bear        4      22
 5 b     pig         5      22
 6 b     goat        6      22
 7 c     cat         8      16
 8 c     bear        5      16
 9 c     goat        3      16
10 d     goat        7      11
11 d     bear        4      11
```