Calculate Group Mean, Sum, or Other Summary Stats and Assign the Column to the Original Data

Group-by operation for another column in R

Update at the OP's request (see comments):
To keep every original row, just replace summarise with mutate:

library(dplyr)
df %>%
group_by(user) %>%
mutate(Smallest_time1 = min(time_1, na.rm = TRUE))

user score time_1 time_2 Smallest_time1
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 130 NA 120
2 1 0 NA 742 120
3 1 1 120 NA 120
4 1 1 245 NA 120
5 2 0 NA 812 841
6 2 0 NA 212 841
7 2 0 NA 214 841
8 2 1 841 NA 841
9 3 0 NA 919 612
10 3 0 NA 528 612
11 3 1 721 NA 612
12 3 1 612 NA 612

We could use min() inside summarise() with the na.rm = TRUE argument, which returns one row per user:

library(dplyr)
df %>%
group_by(user) %>%
summarise(Smallest_time1 = min(time_1, na.rm= TRUE))
user Smallest_time1
<dbl> <dbl>
1 1 120
2 2 841
3 3 612
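
To make the difference between the two verbs concrete, here is a minimal sketch that rebuilds df from the printed output above (the construction of df is an assumption, since the original post does not show it) and compares the row counts of both results.

```r
library(dplyr)

# Data reconstructed from the printed output above (assumed construction)
df <- data.frame(
  user   = rep(c(1, 2, 3), each = 4),
  score  = c(1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1),
  time_1 = c(130, NA, 120, 245, NA, NA, NA, 841, NA, NA, 721, 612),
  time_2 = c(NA, 742, NA, NA, 812, 212, 214, NA, 919, 528, NA, NA)
)

# summarise() collapses each group to a single row
per_user <- df %>%
  group_by(user) %>%
  summarise(Smallest_time1 = min(time_1, na.rm = TRUE))

# mutate() keeps every row and repeats the group value on each of them
per_row <- df %>%
  group_by(user) %>%
  mutate(Smallest_time1 = min(time_1, na.rm = TRUE)) %>%
  ungroup()

nrow(per_user)  # one row per user
nrow(per_row)   # same number of rows as df
```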

Mutate a grouped value (like a conditional mean)

Use group_by before mutate to create the mean column by group, instead of creating a summarised dataset and then joining it back to the original data.

library(dplyr)
mtcars %>%
group_by(cyl, carb) %>%
mutate(var1 = mean(mpg)) %>%
ungroup %>%
head
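
As a rough check that the grouped mutate really replaces the summarise-then-join route, the following sketch computes the column both ways on mtcars and compares them. The object names (one_step, group_means, two_step) are illustrative and not from the original answer.

```r
library(dplyr)

# One step: grouped mutate adds the group mean to every row
one_step <- mtcars %>%
  group_by(cyl, carb) %>%
  mutate(var1 = mean(mpg)) %>%
  ungroup()

# Two steps: build a summary table, then join it back to the original data
group_means <- mtcars %>%
  group_by(cyl, carb) %>%
  summarise(var1 = mean(mpg), .groups = "drop")

two_step <- mtcars %>%
  left_join(group_means, by = c("cyl", "carb"))

# Both routes keep the original row order, so the columns can be compared directly
all.equal(one_step$var1, two_step$var1)
```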

R Calculate sum of values by unique column PAIRS (B-A and A-B) while keeping both pairs

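This may also be done with pmin/pmax to create a grouping column. Taking the element-wise minimum and maximum of the two ID columns puts each pair in a fixed order, so A-B and B-A end up with the same key: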

library(dplyr)
library(stringr)
df1 %>%
group_by(Date, grp = str_c(pmin(ID1, ID2), pmax(ID1, ID2))) %>%
mutate(Sum = sum(Count)) %>%
ungroup %>%
select(-grp)

-output

# A tibble: 6 × 5
Date ID1 ID2 Count Sum
<chr> <chr> <chr> <int> <int>
1 12-1 A B 1 2
2 12-1 B A 1 2
3 12-1 D E 1 3
4 12-1 E D 2 3
5 12-2 Y Z 2 5
6 12-2 Z Y 3 5

data

df1 <- structure(list(Date = c("12-1", "12-1", "12-1", "12-1", "12-2", 
"12-2"), ID1 = c("A", "B", "D", "E", "Y", "Z"), ID2 = c("B",
"A", "E", "D", "Z", "Y"), Count = c(1L, 1L, 1L, 2L, 2L, 3L)),
class = "data.frame", row.names = c(NA,
-6L))
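
To see what the temporary grp key looks like, here is a small base R sketch on the first four ID pairs from the data above. It uses paste() with an explicit separator instead of str_c(); the separator is an assumption added here to avoid accidental collisions between longer IDs (for example "A" + "BC" versus "AB" + "C").

```r
ID1 <- c("A", "B", "D", "E")
ID2 <- c("B", "A", "E", "D")

# pmin()/pmax() compare the two character vectors element by element,
# so each pair is written in the same (alphabetical) order
grp <- paste(pmin(ID1, ID2), pmax(ID1, ID2), sep = "_")
grp
```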

Adding a column of means by group to original data

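This is what the base R ave function is for. By default it computes the mean within each group and returns a vector of the same length as the input, so it can be assigned straight back as a column.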

df1$Y.New <- ave(df1$Y, df1$X)
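
A minimal sketch with made-up data (the values in df1 below are hypothetical, only the column names X and Y follow the answer) shows the default behaviour and how another summary function could be supplied.

```r
# Hypothetical example data: X is the grouping column, Y holds the values
df1 <- data.frame(X = c("a", "a", "b", "b", "b"),
                  Y = c(1, 3, 2, 4, 6))

# Default FUN is mean
df1$Y.Mean <- ave(df1$Y, df1$X)

# FUN must be passed by name, otherwise it is treated as another grouping variable
df1$Y.Sum <- ave(df1$Y, df1$X, FUN = sum)

df1
```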

R - Grouping values within a df

With the data in a data.table, we can compute on variables within each level of a grouping variable (given in by=) and assign the result back to the data with the data.table assignment operator :=. Every row keeps its place and receives its group's value.

library(data.table)
setDT(df)
df[, "family_income" := sum(income), by = id_family]
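
For a self-contained run, the sketch below uses a small made-up data set (the values are hypothetical, only the column names id_family and income follow the answer).

```r
library(data.table)

# Hypothetical data: two families, one income per person
df <- data.frame(id_family = c(1, 1, 2, 2, 2),
                 income    = c(100, 200, 50, 50, 25))

setDT(df)  # converts df to a data.table by reference, no copy is made

# := adds the column in place; by = repeats each group's sum on all of its rows
df[, family_income := sum(income), by = id_family]

df
```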

The data.table structure is an enhanced version of the R data.frame that adds functionality and efficiency gains. If DT is your data.table, DT[i, j, by] is the general form: i subsets or orders rows, j selects or computes on columns, and by performs the j operations within groups. For example, for cars with over 100 horsepower, what is the mean fuel efficiency for automatic (0) and manual (1) cars?

dtcars <- data.table(mtcars)
dtcars[hp>100, mean(mpg), by=am]

Returns:

> dtcars[hp>100, mean(mpg), by=am]
am V1
1: 1 20.61429
2: 0 16.06875
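
The result column is called V1 because the expression in j was not named. One way to get a descriptive name, sketched here with the illustrative name mean_mpg, is to wrap the expression in .():

```r
library(data.table)

dtcars <- data.table(mtcars)

# .() is data.table's alias for list(); named elements become column names
res <- dtcars[hp > 100, .(mean_mpg = mean(mpg)), by = am]
res
```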

Create new column that takes the sum of another column values and group by condition in R

In dplyr, summary functions are usually used inside summarise(), which returns a new table with one row per group. With group_by(), mutate(), and ungroup(), you can instead add the summary as a column on the original rows.

newdf <- df %>%
group_by(Building) %>%
mutate(PopSum = sum(Population, na.rm=TRUE)) %>%
ungroup()
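
The na.rm = TRUE argument matters as soon as a group contains a missing value. The following sketch uses hypothetical Building and Population values, and the PopSumNaive column is added purely for comparison.

```r
library(dplyr)

# Hypothetical data with one missing population value in building "A"
df <- data.frame(Building   = c("A", "A", "B", "B"),
                 Population = c(10, NA, 5, 7))

newdf <- df %>%
  group_by(Building) %>%
  mutate(PopSum      = sum(Population, na.rm = TRUE),  # NA is ignored
         PopSumNaive = sum(Population)) %>%            # NA propagates to the whole group
  ungroup()

newdf
```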

Calculating mean by group using dplyr in R

We can use group_by followed by mutate:

library(dplyr)
df <- df %>%
group_by(class) %>%
mutate(Mean = mean(x)) %>%
ungroup

-output

df
# A tibble: 6 x 3
x class Mean
<dbl> <dbl> <dbl>
1 2.43 1 1.05
2 0.0625 1 1.05
3 0.669 1 1.05
4 0.195 2 -0.0550
5 0.285 2 -0.0550
6 -0.644 2 -0.0550

data

df <- data.frame(x, class)

