Calculate Group Mean, Sum, or Other Summary Stats and Assign Column to Original Data

Group-by operation for another column R

Update at the OP's request (see comments):
To keep every row and add the group minimum as a new column, just replace summarise with mutate:

df %>% 
  group_by(user) %>% 
  mutate(Smallest_time1 = min(time_1, na.rm=TRUE))

    user score time_1 time_2 Smallest_time1
   <dbl> <dbl>  <dbl>  <dbl>          <dbl>
 1     1     1    130     NA            120
 2     1     0     NA    742            120
 3     1     1    120     NA            120
 4     1     1    245     NA            120
 5     2     0     NA    812            841
 6     2     0     NA    212            841
 7     2     0     NA    214            841
 8     2     1    841     NA            841
 9     3     0     NA    919            612
10     3     0     NA    528            612
11     3     1    721     NA            612
12     3     1    612     NA            612

To get one row per user instead, we could use min() inside summarise with the na.rm=TRUE argument:

library(dplyr)
df %>% 
  group_by(user) %>% 
  summarise(Smallest_time1 = min(time_1, na.rm= TRUE))

   user Smallest_time1
  <dbl>          <dbl>
1     1            120
2     2            841
3     3            612
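One caveat with min(..., na.rm=TRUE): when a group has no non-missing values at all, min() returns Inf with a warning rather than NA. In the data above every user has at least one time_1 value, so this does not show up. A minimal sketch of the behaviour and one possible guard, using hypothetical data (times, naive_min and safe_min are illustrative names, not part of the original answer):

```r
# Hypothetical data: user 2 has no recorded time_1 at all
times <- data.frame(
  user   = c(1, 1, 2, 2),
  time_1 = c(130, 120, NA, NA)
)

# Naive version: gives Inf for the all-NA group (warning silenced here)
naive_min <- function(x) suppressWarnings(min(x, na.rm = TRUE))

# Guarded version: gives NA when nothing is left after dropping NAs
safe_min <- function(x) {
  if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)
}

# One value per user, base R only
naive <- tapply(times$time_1, times$user, naive_min)
safe  <- tapply(times$time_1, times$user, safe_min)
```

The same safe_min helper could be dropped into summarise or mutate in place of min if all-NA groups are possible in your data.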

Mutate a grouped value (like a conditional mean)

Use group_by before mutate to create the mean column by group, instead of creating a summarised dataset and then joining it back to the original data:

library(dplyr)
mtcars %>% 
   group_by(cyl, carb) %>%
   mutate(var1 = mean(mpg)) %>%
   ungroup %>%
   head
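For comparison, here is a sketch of the two approaches side by side on mtcars, to show that the grouped mutate gives the same column as summarising and joining back. The object names (one_step, group_means, two_step) are illustrative, and the .groups argument assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# One step: group, then mutate (keeps all 32 rows)
one_step <- mtcars %>%
  group_by(cyl, carb) %>%
  mutate(var1 = mean(mpg)) %>%
  ungroup()

# Two steps: summarise to one row per group, then join back
group_means <- mtcars %>%
  group_by(cyl, carb) %>%
  summarise(var1 = mean(mpg), .groups = "drop")

two_step <- mtcars %>%
  left_join(group_means, by = c("cyl", "carb"))

# Both approaches should produce the same column of group means
all.equal(one_step$var1, two_step$var1)
```

The one-step version avoids the intermediate table and the join keys, which is why it is usually preferred.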

R Calculate sum of values by unique column PAIRS (B-A and A-B) while keeping both pairs

This may also be done by using pmin/pmax to create a grouping column: they put each pair of IDs in a fixed order, so A-B and B-A get the same key

library(dplyr)
library(stringr)
df1 %>% 
   group_by(Date, grp = str_c(pmin(ID1, ID2), pmax(ID1, ID2))) %>% 
   mutate(Sum = sum(Count)) %>%
   ungroup %>%
   select(-grp)

-output

# A tibble: 6 × 5
  Date  ID1   ID2   Count   Sum
  <chr> <chr> <chr> <int> <int>
1 12-1  A     B         1     2
2 12-1  B     A         1     2
3 12-1  D     E         1     3
4 12-1  E     D         2     3
5 12-2  Y     Z         2     5
6 12-2  Z     Y         3     5

data

df1 <- structure(list(Date = c("12-1", "12-1", "12-1", "12-1", "12-2", 
"12-2"), ID1 = c("A", "B", "D", "E", "Y", "Z"), ID2 = c("B", 
"A", "E", "D", "Z", "Y"), Count = c(1L, 1L, 1L, 2L, 2L, 3L)),
 class = "data.frame", row.names = c(NA, 
-6L))
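To see what the grouping key looks like, here is a base R sketch of the same idea on the data above, using ave instead of dplyr. The name pair_key and the "-" separator are my additions; a separator is a cautious choice because pasting IDs with no separator could make different pairs collide when IDs are longer than one character.

```r
df1 <- data.frame(
  Date  = c("12-1", "12-1", "12-1", "12-1", "12-2", "12-2"),
  ID1   = c("A", "B", "D", "E", "Y", "Z"),
  ID2   = c("B", "A", "E", "D", "Z", "Y"),
  Count = c(1L, 1L, 1L, 2L, 2L, 3L)
)

# Order-independent key: the alphabetically smaller ID always comes first,
# so rows A/B and B/A both get "A-B"
pair_key <- paste(pmin(df1$ID1, df1$ID2), pmax(df1$ID1, df1$ID2), sep = "-")
pair_key

# Sum Count within each Date/pair combination, keeping every row
df1$Sum <- ave(df1$Count, df1$Date, pair_key, FUN = sum)
df1
```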

Adding a column of means by group to original data

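This is what the ave function is for. By default it computes the mean of its first argument within each level of the grouping variable and returns a vector of the same length as the input.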

df1$Y.New <- ave(df1$Y, df1$X)
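ave is not limited to means. A small sketch with hypothetical X and Y values (the question's data is not shown here), including the FUN argument for other statistics:

```r
# Hypothetical data: X is the grouping column, Y the value column
df1 <- data.frame(
  X = c("a", "a", "b", "b", "b"),
  Y = c(1, 3, 2, 4, 6)
)

# Default: the group mean, repeated on every row of the group
df1$Y.New <- ave(df1$Y, df1$X)

# Other statistics: FUN must be passed by name, because unnamed
# arguments after the first are treated as grouping variables
df1$Y.Sum <- ave(df1$Y, df1$X, FUN = sum)
df1$Y.Max <- ave(df1$Y, df1$X, FUN = max)

df1
```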

R - Grouping values within a df

With the data in a data.table, we can compute on variables within each level of a grouping variable (given in by=), then assign the result back to the data using the data.table assignment operator :=

library(data.table)
setDT(df)
df[, "family_income" := sum(income), by = id_family]
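A self-contained version of the same call, with made-up family data since the question's data is not shown. The values and the na.rm = TRUE argument are assumptions for illustration:

```r
library(data.table)

# Hypothetical data: two families, one missing income
df <- data.frame(
  id_family = c(1, 1, 2, 2, 2),
  income    = c(100, 200, 50, NA, 150)
)

setDT(df)  # converts df to a data.table by reference, without copying

# := adds the column in place; every row of a family gets the family total
df[, family_income := sum(income, na.rm = TRUE), by = id_family]

df
```

Because := modifies df by reference, there is no need to assign the result back with <-.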

The data.table data structure is a pumped-up version of the R data.frame, giving added functionality and efficiency gains. If DT is your data.table, DT[i, j, by] is the notation showing how we can use i to sort or subset rows, j to select or compute on variables, and by to perform the j-operations on groups. For example, for cars with over 100 horsepower, what is the mean fuel efficiency for automatic (0) and manual (1) cars?

dtcars <- data.table(mtcars)
dtcars[hp>100, mean(mpg), by=am]

Returns:

> dtcars[hp>100, mean(mpg), by=am]
   am       V1
1:  1 20.61429
2:  0 16.06875
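The result column is called V1 because the j expression was not named. As a sketch, wrapping j in .() lets you name the output and return more than one statistic per group. The names res, mean_mpg and n_cars are my own choices:

```r
library(data.table)

dtcars <- data.table(mtcars)

# .() is data.table's shorthand for list(); .N is the row count per group
res <- dtcars[hp > 100,
              .(mean_mpg = mean(mpg), n_cars = .N),
              by = am]

res
```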

Create new column that takes the sum of another column values and group by condition in R

In dplyr, summary functions are usually used inside summarise, which returns one row per group. However, by combining group_by with mutate (and then ungroup), you can add the summary as a column to the original data instead.

 newdf <- df %>% 
    group_by(Building) %>% 
    mutate(PopSum = sum(Population, na.rm=TRUE)) %>% 
    ungroup()

Calculating mean by group using dplyr in R

We can use

library(dplyr)
df <- df %>%
    group_by(class) %>%
    mutate(Mean = mean(x)) %>%
    ungroup

-output

df
# A tibble: 6 x 3
        x class    Mean
    <dbl> <dbl>   <dbl>
1  2.43       1  1.05  
2  0.0625     1  1.05  
3  0.669      1  1.05  
4  0.195      2 -0.0550
5  0.285      2 -0.0550
6 -0.644      2 -0.0550

data

df <- data.frame(x, class)
