# Calculate Group Mean, Sum, or Other Summary Stats and Assign Column to Original Data

## Group-by operation for another column R

To keep every original row and add the group minimum as a new column, use `mutate` instead of `summarise`:

```r
library(dplyr)

df %>%
  group_by(user) %>%
  mutate(Smallest_time1 = min(time_1, na.rm = TRUE))
```

```
    user score time_1 time_2 Smallest_time1
   <dbl> <dbl>  <dbl>  <dbl>          <dbl>
 1     1     1    130     NA            120
 2     1     0     NA    742            120
 3     1     1    120     NA            120
 4     1     1    245     NA            120
 5     2     0     NA    812            841
 6     2     0     NA    212            841
 7     2     0     NA    214            841
 8     2     1    841     NA            841
 9     3     0     NA    919            612
10     3     0     NA    528            612
11     3     1    721     NA            612
12     3     1    612     NA            612
```

To get one row per group, we could use `min()` inside `summarise` with the `na.rm = TRUE` argument:

```r
library(dplyr)

df %>%
  group_by(user) %>%
  summarise(Smallest_time1 = min(time_1, na.rm = TRUE))
```

```
   user Smallest_time1
  <dbl>          <dbl>
1     1            120
2     2            841
3     3            612
```
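One caveat with `na.rm = TRUE`: if a group has no non-missing values at all, `min()` returns `Inf` with a warning. The sketch below shows one possible way to return `NA` for such groups instead. The `toy` data frame is made up for illustration and is not the data from the question.

```r
library(dplyr)

# Hypothetical data: user 2 has no non-missing time_1 values
toy <- data.frame(
  user   = c(1, 1, 2, 2),
  time_1 = c(130, 120, NA, NA)
)

res <- toy %>%
  group_by(user) %>%
  # For all-NA groups, return NA instead of the Inf that min() would give
  mutate(Smallest_time1 = if (all(is.na(time_1))) NA_real_
                          else min(time_1, na.rm = TRUE)) %>%
  ungroup()

res$Smallest_time1
```

The `if` is evaluated once per group, so it yields a single value that `mutate` recycles across the rows of that group.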

## Mutate a grouped value (like a conditional mean)

Use `group_by` before `mutate` to create the `mean` column by group, instead of creating a `summarise`d dataset and then joining it back to the original data.

```r
library(dplyr)

mtcars %>%
  group_by(cyl, carb) %>%
  mutate(var1 = mean(mpg)) %>%
  ungroup() %>%
  head()
```
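To make the comparison concrete, here is a minimal sketch that computes the same column both ways on `mtcars` and checks that the values agree. The object names (`via_mutate`, `group_means`, `via_join`) are only illustrative.

```r
library(dplyr)

# Approach 1: grouped mutate keeps all rows of mtcars
via_mutate <- mtcars %>%
  group_by(cyl, carb) %>%
  mutate(var1 = mean(mpg)) %>%
  ungroup()

# Approach 2: summarise to one row per group, then join back
group_means <- mtcars %>%
  group_by(cyl, carb) %>%
  summarise(var1 = mean(mpg), .groups = "drop")

via_join <- mtcars %>%
  left_join(group_means, by = c("cyl", "carb"))

# Both approaches should give the same values in the same row order
all.equal(via_mutate$var1, via_join$var1)
```

The grouped `mutate` does in one step what the second approach needs an intermediate table and a join for.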

## R Calculate sum of values by unique column PAIRS (B-A and A-B) while keeping both pairs

This can also be done by using `pmin`/`pmax` to create an order-independent grouping column:

```r
library(dplyr)
library(stringr)

df1 %>%
  group_by(Date, grp = str_c(pmin(ID1, ID2), pmax(ID1, ID2))) %>%
  mutate(Sum = sum(Count)) %>%
  ungroup() %>%
  select(-grp)
```

-output

```
# A tibble: 6 × 5
  Date  ID1   ID2   Count   Sum
  <chr> <chr> <chr> <int> <int>
1 12-1  A     B         1     2
2 12-1  B     A         1     2
3 12-1  D     E         1     3
4 12-1  E     D         2     3
5 12-2  Y     Z         2     5
6 12-2  Z     Y         3     5
```

### data

```r
df1 <- structure(list(
  Date = c("12-1", "12-1", "12-1", "12-1", "12-2", "12-2"),
  ID1 = c("A", "B", "D", "E", "Y", "Z"),
  ID2 = c("B", "A", "E", "D", "Z", "Y"),
  Count = c(1L, 1L, 1L, 2L, 2L, 3L)
), class = "data.frame", row.names = c(NA, -6L))
```
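To see why `pmin`/`pmax` produce a usable grouping key, the base R sketch below builds the key on its own, using the `ID1` and `ID2` values from `df1`. The `sep = "-"` separator is an addition not present in the answer above; it is there only as a guard against collisions when IDs have more than one character.

```r
# ID columns as in df1
ID1 <- c("A", "B", "D", "E", "Y", "Z")
ID2 <- c("B", "A", "E", "D", "Z", "Y")

# pmin()/pmax() take the element-wise smaller/larger value, so each
# pair ends up in the same order regardless of its original direction
key <- paste(pmin(ID1, ID2), pmax(ID1, ID2), sep = "-")
key
```

Rows `A`/`B` and `B`/`A` both map to the key `A-B`, which is what lets them fall into the same group.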

## Adding a column of means by group to original data

This is what the `ave` function is for.

``df1$Y.New <- ave(df1$Y, df1$X)``
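By default `ave` computes the group mean; other summaries can be requested through its `FUN` argument. A small sketch, using a made-up data frame (`toy` and its values are hypothetical, with `X` as the grouping column and `Y` as the values):

```r
# Hypothetical data: X is the grouping column, Y holds the values
toy <- data.frame(
  X = c("a", "a", "b", "b", "b"),
  Y = c(1, 3, 2, 4, 6)
)

# Default: the group mean, repeated on every row of the group
toy$Y.Mean <- ave(toy$Y, toy$X)

# Other summaries: FUN must be passed by name, because unnamed
# arguments after the first are treated as grouping variables
toy$Y.Sum <- ave(toy$Y, toy$X, FUN = sum)

toy
```

The result always has the same length as the input, so it can be assigned straight back as a new column.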

## R - Grouping values within a df

With the data in a data.table, we can perform operations on variables by a grouping variable (given in `by=`), then assign the result back to the data using the data.table assignment operator `:=`.

```r
library(data.table)

setDT(df)
df[, "family_income" := sum(income), by = id_family]
```

The data.table structure is an enhanced version of the R data.frame that adds functionality and efficiency gains. If `DT` is your data.table, `DT[i, j, by]` is the general form: `i` subsets or orders rows, `j` selects or computes on columns, and `by` performs the `j` operations on groups. For example, for cars with over 100 horsepower, what is the mean fuel efficiency for automatic (0) and manual (1) cars?

```r
dtcars <- data.table(mtcars)
dtcars[hp > 100, mean(mpg), by = am]
```

Returns:

```
> dtcars[hp>100, mean(mpg), by=am]
   am       V1
1:  1 20.61429
2:  0 16.06875
```
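The difference between assigning with `:=` and plain aggregation in `j` can be seen side by side in the sketch below. The `fam` table and its values are made up for illustration; only the column names `id_family` and `income` come from the answer above.

```r
library(data.table)

# Hypothetical data using the column names from the example above
fam <- data.table(
  id_family = c(1, 1, 2, 2, 2),
  income    = c(100, 200, 50, 0, 150)
)

# := adds the column by reference; every original row is kept
fam[, family_income := sum(income), by = id_family]

# Without :=, the same computation returns one row per group;
# wrapping j in .() names the result instead of the default V1
totals <- fam[, .(family_income = sum(income)), by = id_family]

fam
totals
```

Because `:=` modifies the table in place, there is no need to reassign the result to `fam`.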

## Create new column that takes the sum of another column values and group by condition in R

In `dplyr`, summary functions are usually used inside `summarise`, which returns one row per group. With `group_by`, `mutate`, and `ungroup`, you can instead add the summary as a column to the original data.

```r
newdf <- df %>%
  group_by(Building) %>%
  mutate(PopSum = sum(Population, na.rm = TRUE)) %>%
  ungroup()
```
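The final `ungroup()` matters if more steps follow: while the data is still grouped, later summaries stay per group. A minimal sketch of the difference, assuming a small made-up data frame (`buildings`, its values, and the `TotalPop` column are hypothetical):

```r
library(dplyr)

# Hypothetical data using the column names from the example above
buildings <- data.frame(
  Building   = c("A", "A", "B"),
  Population = c(10, NA, 5)
)

result <- buildings %>%
  group_by(Building) %>%
  # Per-building sum, ignoring missing values
  mutate(PopSum = sum(Population, na.rm = TRUE)) %>%
  ungroup() %>%
  # After ungroup(), this sum runs over all rows, not per building
  mutate(TotalPop = sum(Population, na.rm = TRUE))

result
```

Here `PopSum` differs by building while `TotalPop` is the same on every row; without the `ungroup()` call, both columns would hold the per-building sum.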

## Calculating mean by group using dplyr in R

We can use `group_by` followed by `mutate`:

```r
library(dplyr)

df <- df %>%
  group_by(class) %>%
  mutate(Mean = mean(x)) %>%
  ungroup()
```

-output

```
df
# A tibble: 6 x 3
        x class    Mean
    <dbl> <dbl>   <dbl>
1  2.43       1  1.05
2  0.0625     1  1.05
3  0.669      1  1.05
4  0.195      2 -0.0550
5  0.285      2 -0.0550
6 -0.644      2 -0.0550
```

### data

``df <- data.frame(x, class)``