Apply Several Summary Functions on Several Variables by Group in One Call

You can do it all in one step and get proper labeling:

> aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )
# id1 id2 val1.mn val1.n val2.mn val2.n
# 1 a x 1.5 2.0 6.5 2.0
# 2 b x 2.0 2.0 8.0 2.0
# 3 a y 3.5 2.0 7.0 2.0
# 4 b y 3.0 2.0 6.0 2.0

This creates a dataframe with two id columns and two matrix columns:

str( aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) )
'data.frame': 4 obs. of 4 variables:
$ id1 : Factor w/ 2 levels "a","b": 1 2 1 2
$ id2 : Factor w/ 2 levels "x","y": 1 1 2 2
$ val1: num [1:4, 1:2] 1.5 2 3.5 3 2 2 2 2
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "mn" "n"
$ val2: num [1:4, 1:2] 6.5 8 7 6 2 2 2 2
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "mn" "n"

As pointed out by @lord.garbage below, this can be converted to a dataframe with "simple" columns by using do.call(data.frame, ...):

str( do.call(data.frame, aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) ) )
'data.frame': 4 obs. of 6 variables:
$ id1 : Factor w/ 2 levels "a","b": 1 2 1 2
$ id2 : Factor w/ 2 levels "x","y": 1 1 2 2
$ val1.mn: num 1.5 2 3.5 3
$ val1.n : num 2 2 2 2
$ val2.mn: num 6.5 8 7 6
$ val2.n : num 2 2 2 2
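The calls above rely on a data frame x that is never shown. The following is a minimal sketch that assumes a hypothetical x, with values chosen only so that the group means match the printed output. It also shows how to reach a single statistic inside a matrix column and how the flattening step changes the column count.

```r
# Hypothetical sample data (an assumption, not taken from the original post);
# the values are picked so the group means agree with the output shown above.
x <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1, 2, 3, 4, 1, 2, 3, 4),
  val2 = c(9, 4, 5, 9, 7, 4, 9, 8)
)

# One call, two summary functions per value column
res <- aggregate(. ~ id1 + id2, data = x,
                 FUN = function(v) c(mn = mean(v), n = length(v)))

ncol(res)          # 4: id1, id2 and two matrix columns
res$val1[, "mn"]   # one statistic pulled out of the val1 matrix column

# Flatten the matrix columns into ordinary columns
flat <- do.call(data.frame, res)
names(flat)        # "id1" "id2" "val1.mn" "val1.n" "val2.mn" "val2.n"
```

Working with the flattened version is usually more convenient when the result is passed on to merge() or written to a file, since matrix columns are easy to overlook.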

This is the syntax for multiple variables on the LHS:

aggregate(cbind(val1, val2) ~ id1 + id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )
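The difference between . and cbind() on the left-hand side only matters when the data frame has columns you do not want summarised. A small sketch, assuming a hypothetical data frame with an extra numeric column named other:

```r
# Hypothetical data; the column `other` is added purely for illustration.
x <- data.frame(
  id1   = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2   = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1  = c(1, 2, 3, 4, 1, 2, 3, 4),
  val2  = c(9, 4, 5, 9, 7, 4, 9, 8),
  other = 101:108
)

# `.` means "every column not used on the right-hand side"
all_cols <- aggregate(. ~ id1 + id2, data = x, FUN = mean)

# cbind() restricts the summary to the named columns
picked <- aggregate(cbind(val1, val2) ~ id1 + id2, data = x, FUN = mean)

names(all_cols)  # "id1" "id2" "val1" "val2" "other"
names(picked)    # "id1" "id2" "val1" "val2"
```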

Aggregate multiple variables with different functions

set.seed(45)
df <- data.frame(c1=rep(c("A","A","B","B"), 2),
c2 = rep(c("A","B"), 4),
v1 = sample(8),
v2 = sample(1:100, 8))
> df
# c1 c2 v1 v2
# 1 A A 6 19
# 2 A B 3 1
# 3 B A 2 37
# 4 B B 8 86
# 5 A A 5 30
# 6 A B 1 44
# 7 B A 7 41
# 8 B B 4 39

v1 <- aggregate( v1 ~ c1 + c2, data = df, sum)
v2 <- aggregate( v2 ~ c1 + c2, data = df, mean)
out <- merge(v1, v2, by=c("c1","c2"))
> out
# c1 c2 v1 v2
# 1 A A 11 24.5
# 2 A B 4 22.5
# 3 B A 9 39.0
# 4 B B 12 62.5
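If more than two columns each need their own function, the aggregate-then-merge pattern can be generalised. The sketch below is one possible way to do it in base R, not part of the original answer: it keeps a named list of functions (the name funs is an assumption) and merges the per-column results with Reduce(). The data frame is typed in by hand from the printed df so the result does not depend on the random number generator.

```r
# Same values as the df printed above, entered directly
df <- data.frame(
  c1 = rep(c("A", "A", "B", "B"), 2),
  c2 = rep(c("A", "B"), 4),
  v1 = c(6, 3, 2, 8, 5, 1, 7, 4),
  v2 = c(19, 1, 37, 86, 30, 44, 41, 39)
)

# One summary function per value column (hypothetical helper list)
funs <- list(v1 = sum, v2 = mean)

# Aggregate each column with its own function
parts <- Map(function(col, f) {
  fml <- reformulate(c("c1", "c2"), response = col)  # e.g. v1 ~ c1 + c2
  aggregate(fml, data = df, FUN = f)
}, names(funs), funs)

# Merge all per-column results on the grouping columns
out <- Reduce(function(a, b) merge(a, b, by = c("c1", "c2")), parts)
out
```

Adding a third column then only requires another entry in funs.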

**Edit:** I'd propose that you use data.table as it makes things really easy:

library(data.table)
dt <- data.table(df)
dt.out <- dt[, list(s.v1=sum(v1), m.v2=mean(v2)),
by=c("c1","c2")]
> dt.out

# c1 c2 s.v1 m.v2
# 1: A A 11 24.5
# 2: A B 4 22.5
# 3: B A 9 39.0
# 4: B B 12 62.5

Group by and sum multiple columns

Try this:

library(dplyr)

iris %>%
group_by(Species) %>%
summarise(across(Sepal.Length:Petal.Width, sum))


# A tibble: 3 x 5
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 250. 171. 73.1 12.3
2 versicolor 297. 138. 213 66.3
3 virginica 329. 149. 278. 101.
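For comparison, the same per-group sums can be obtained without dplyr. This is a base R sketch using the formula interface of aggregate() on the built-in iris data; it is an alternative to the answer above, not a replacement for it.

```r
# Sum every numeric column of iris within each Species (base R only)
sums <- aggregate(. ~ Species, data = iris, FUN = sum)
sums
```

The result is a plain data.frame rather than a tibble, so the values are printed with full precision instead of being abbreviated.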

Add group names based on several variables

library(data.table)
DT <- as.data.table(DATA.RH.TAM)
DT[, grouping1 := paste0("group", .GRP), by = .(classwork, stream, people, others)]
DT[, grouping2 := paste0("group", .GRP), by = .(classwork, stream, people)]
# classwork stream people others index grouping1 grouping2
# 1: High High High High 1 group1 group1
# 2: High Low High High 1 group2 group2
# 3: High High High High 1 group1 group1
# 4: Low Low High High 1 group3 group3
# 5: High High High Low 1 group4 group1
# ---
# 152: High High High High 1 group1 group1
# 153: High Low High High 1 group2 group2
# 154: Low Low High High 1 group3 group3
# 155: High High High High 1 group1 group1
# 156: High High High High 1 group1 group1
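DATA.RH.TAM is not shown in the answer, so here is a small self-contained sketch of the same idea. The five rows are hypothetical and simply mirror the first rows of the printed output. It requires the data.table package; .GRP is data.table's group counter, which numbers groups in order of first appearance when by is used.

```r
library(data.table)

# Hypothetical stand-in for DATA.RH.TAM (first five rows of the output above)
DT <- data.table(
  classwork = c("High", "High", "High", "Low",  "High"),
  stream    = c("High", "Low",  "High", "Low",  "High"),
  people    = c("High", "High", "High", "High", "High"),
  others    = c("High", "High", "High", "High", "Low")
)

# Label each distinct combination of four variables, then of three
DT[, grouping1 := paste0("group", .GRP), by = .(classwork, stream, people, others)]
DT[, grouping2 := paste0("group", .GRP), by = .(classwork, stream, people)]
DT
```

Row 5 differs from row 1 only in others, so it gets a new label in grouping1 but shares group1 in grouping2.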

Apply two functions to multiple variables with aggregate() in R

We can use dplyr: pass the grouping columns to group_by, then name the columns to summarise inside summarise with across:

library(dplyr) #1.0.0
x3 %>%
group_by(id1, id2) %>%
summarise(across(starts_with('val'),
list(mean = ~ mean(., na.rm = TRUE) , sd = ~sd(., na.rm = TRUE))))
# A tibble: 4 x 6
# Groups: id1 [2]
# id1 id2 val1_mean val1_sd val2_mean val2_sd
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#1 a x 1.5 0.707 6.5 3.54
#2 a y 3.5 0.707 NaN NA
#3 b x 2 1.41 9 NA
#4 b y 3 1.41 8 NA

If the version of dplyr is < 1.0.0, we can use summarise_at instead:

x3 %>%
group_by(id1, id2) %>%
summarise_at(vars(-group_cols()), list(mean = ~ mean(., na.rm = TRUE),
sd = ~ sd(., na.rm = TRUE)))

With aggregate, the error comes from the NA elements: the formula method uses na.action = na.omit by default, which removes a row if there is any NA in that row. Specifying na.action = na.pass (or NULL) resolves that issue. However, because multiple functions are combined with c, each result is stored as a matrix column. In order to get ordinary data.frame columns, we can wrap the call with data.frame in do.call:

do.call(data.frame, aggregate(. ~ id1 + id2, data = x3, FUN = function(x) 
c(avg = mean(x, na.rm = TRUE), SD= sd(x, na.rm = TRUE)), na.action = NULL))
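The effect of na.action is easiest to see by counting groups. The sketch below assumes a hypothetical x3 whose NA pattern is chosen to be consistent with the dplyr output shown earlier; the helper name summ is also an assumption.

```r
# Hypothetical x3: val2 has missing values in several rows
x3 <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1, 2, 3, 4, 1, 2, 3, 4),
  val2 = c(9, 4, NA, NA, NA, NA, 9, 8)
)

summ <- function(v) c(avg = mean(v, na.rm = TRUE), SD = sd(v, na.rm = TRUE))

# Default na.action = na.omit: rows with any NA are dropped before grouping,
# so the (a, y) group disappears entirely
dropped <- aggregate(. ~ id1 + id2, data = x3, FUN = summ)

# na.pass keeps those rows; na.rm = TRUE inside summ handles the NAs instead
kept <- do.call(data.frame,
                aggregate(. ~ id1 + id2, data = x3, FUN = summ,
                          na.action = na.pass))

nrow(dropped)  # 3
nrow(kept)     # 4
```

Note that with the default setting the val1 statistics are also computed on fewer rows, because a row is dropped as soon as any of its columns is NA.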

Summarise for multiple group_by variables combined and individually

Here is an idea using bind_rows:

library(dplyr)

mtcars %>%
group_by(cyl, vs) %>%
summarise(new = mean(wt)) %>%
bind_rows(.,
mtcars %>% group_by(cyl) %>% summarise(new = mean(wt)) %>% mutate(vs = NA),
mtcars %>% group_by(vs) %>% summarise(new = mean(wt)) %>% mutate(cyl = NA)) %>%
arrange(cyl) %>%
ungroup()

# A tibble: 10 × 3
# cyl vs new
# <dbl> <dbl> <dbl>
#1 4 0 2.140000
#2 4 1 2.300300
#3 4 NA 2.285727
#4 6 0 2.755000
#5 6 1 3.388750
#6 6 NA 3.117143
#7 8 0 3.999214
#8 8 NA 3.999214
#9 NA 0 3.688556
#10 NA 1 2.611286

