Apply several summary functions on several variables by group in one call
You can do it all in one step and get proper labeling:
> aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )
# id1 id2 val1.mn val1.n val2.mn val2.n
# 1 a x 1.5 2.0 6.5 2.0
# 2 b x 2.0 2.0 8.0 2.0
# 3 a y 3.5 2.0 7.0 2.0
# 4 b y 3.0 2.0 6.0 2.0
This creates a dataframe with two id columns and two matrix columns:
str( aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) )
'data.frame': 4 obs. of 4 variables:
$ id1 : Factor w/ 2 levels "a","b": 1 2 1 2
$ id2 : Factor w/ 2 levels "x","y": 1 1 2 2
$ val1: num [1:4, 1:2] 1.5 2 3.5 3 2 2 2 2
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "mn" "n"
$ val2: num [1:4, 1:2] 6.5 8 7 6 2 2 2 2
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "mn" "n"
As pointed out by @lord.garbage below, this can be converted to a dataframe with "simple" columns by using do.call(data.frame, ...)
str( do.call(data.frame, aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) ) )
'data.frame': 4 obs. of 6 variables:
$ id1 : Factor w/ 2 levels "a","b": 1 2 1 2
$ id2 : Factor w/ 2 levels "x","y": 1 1 2 2
$ val1.mn: num 1.5 2 3.5 3
$ val1.n : num 2 2 2 2
$ val2.mn: num 6.5 8 7 6
$ val2.n : num 2 2 2 2
This is the syntax for multiple variables on the LHS:
aggregate(cbind(val1, val2) ~ id1 + id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )
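None of the snippets above define x. Here is a minimal sketch that makes the one-step call and the flattening step reproducible. It assumes a data frame whose values were chosen to reproduce the printed means and counts, since the original data is not shown:

```r
# Assumed sample data: not from the original post, chosen so that the
# group means and counts match the output printed above.
x <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1, 2, 3, 4, 1, 2, 3, 4),
  val2 = c(9, 4, 5, 9, 7, 4, 9, 8)
)

# One call: every non-grouping column becomes a two-column matrix (mn, n).
agg <- aggregate(. ~ id1 + id2, data = x,
                 FUN = function(v) c(mn = mean(v), n = length(v)))

# Flatten the matrix columns into ordinary columns.
flat <- do.call(data.frame, agg)
names(flat)
```

In recent R versions the id columns will show up as character rather than Factor in str(), because data.frame() no longer converts strings to factors by default.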
Aggregate multiple variables with different functions
set.seed(45)
df <- data.frame(c1=rep(c("A","A","B","B"), 2),
c2 = rep(c("A","B"), 4),
v1 = sample(8),
v2 = sample(1:100, 8))
> df
# c1 c2 v1 v2
# 1 A A 6 19
# 2 A B 3 1
# 3 B A 2 37
# 4 B B 8 86
# 5 A A 5 30
# 6 A B 1 44
# 7 B A 7 41
# 8 B B 4 39
v1 <- aggregate( v1 ~ c1 + c2, data = df, sum)
v2 <- aggregate( v2 ~ c1 + c2, data = df, mean)
out <- merge(v1, v2, by=c("c1","c2"))
> out
# c1 c2 v1 v2
# 1 A A 11 24.5
# 2 A B 4 22.5
# 3 B A 9 39.0
# 4 B B 12 62.5
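With more than two columns, the same aggregate-then-merge idea can be written as a loop. This is a rough sketch in base R; the funs list and the variable names are illustrative, not part of the original answer. The data is typed in from the printed df rather than regenerated, because sample() can return different values for the same seed depending on the R version:

```r
# Values typed in from the printed df above, not regenerated with sample().
df <- data.frame(
  c1 = rep(c("A", "A", "B", "B"), 2),
  c2 = rep(c("A", "B"), 4),
  v1 = c(6, 3, 2, 8, 5, 1, 7, 4),
  v2 = c(19, 1, 37, 86, 30, 44, 41, 39)
)

# Hypothetical helper list: column name -> summary function.
funs <- list(v1 = sum, v2 = mean)

# One aggregate() per column, each with its own function.
pieces <- lapply(names(funs), function(col) {
  aggregate(df[col], by = df[c("c1", "c2")], FUN = funs[[col]])
})

# Merge the per-column results back together on the grouping columns.
out <- Reduce(function(a, b) merge(a, b, by = c("c1", "c2")), pieces)
out
```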
**Edit:**
I'd propose that you use data.table, as it makes things really easy:
require(data.table)
dt <- data.table(df)
dt.out <- dt[, list(s.v1=sum(v1), m.v2=mean(v2)),
by=c("c1","c2")]
> dt.out
# c1 c2 s.v1 m.v2
# 1: A A 11 24.5
# 2: A B 4 22.5
# 3: B A 9 39.0
# 4: B B 12 62.5
Group by and sum multiple columns
Try this:
library(dplyr) # across() requires dplyr >= 1.0.0
iris %>%
group_by(Species) %>%
summarise(across(Sepal.Length:Petal.Width, sum))
# A tibble: 3 x 5
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 250. 171. 73.1 12.3
2 versicolor 297. 138. 213 66.3
3 virginica 329. 149. 278. 101.
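For comparison, the same per-species sums can be sketched in base R with the formula interface of aggregate(), which avoids the dplyr dependency. This is an alternative, not the approach of the answer above:

```r
# Sum every measurement column within each species (iris ships with R).
sums <- aggregate(. ~ Species, data = iris, FUN = sum)
sums
```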
Add group names based on several variables
library(data.table)
DT <- as.data.table(DATA.RH.TAM)
DT[, grouping1 := paste0("group", .GRP), by = .(classwork, stream, people, others)]
DT[, grouping2 := paste0("group", .GRP), by = .(classwork, stream, people)]
# classwork stream people others index grouping1 grouping2
# 1: High High High High 1 group1 group1
# 2: High Low High High 1 group2 group2
# 3: High High High High 1 group1 group1
# 4: Low Low High High 1 group3 group3
# 5: High High High Low 1 group4 group1
# ---
# 152: High High High High 1 group1 group1
# 153: High Low High High 1 group2 group2
# 154: Low Low High High 1 group3 group3
# 155: High High High High 1 group1 group1
# 156: High High High High 1 group1 group1
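DATA.RH.TAM is not defined in the snippet above. As a rough base R illustration of the same idea, the sketch below numbers groups in order of first appearance, which matches the printed output. The toy data (the first five rows of that output) and the group_label helper are assumptions made for this example:

```r
# Hypothetical toy data: the first five rows of the printed output above.
toy <- data.frame(
  classwork = c("High", "High", "High", "Low",  "High"),
  stream    = c("High", "Low",  "High", "Low",  "High"),
  people    = c("High", "High", "High", "High", "High"),
  others    = c("High", "High", "High", "High", "Low")
)

# Hypothetical helper: number groups in order of first appearance.
group_label <- function(data, cols) {
  key <- do.call(paste, c(data[cols], sep = "\t"))  # one key string per row
  paste0("group", match(key, unique(key)))
}

toy$grouping1 <- group_label(toy, c("classwork", "stream", "people", "others"))
toy$grouping2 <- group_label(toy, c("classwork", "stream", "people"))
toy
```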
To apply two functions to multiple variables with aggregate() in R
We can use dplyr, where we pass the grouping columns in group_by and the columns to summarise in summarise with across:
library(dplyr) #1.0.0
x3 %>%
group_by(id1, id2) %>%
summarise(across(starts_with('val'),
list(mean = ~ mean(., na.rm = TRUE) , sd = ~sd(., na.rm = TRUE))))
# A tibble: 4 x 6
# Groups: id1 [2]
# id1 id2 val1_mean val1_sd val2_mean val2_sd
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#1 a x 1.5 0.707 6.5 3.54
#2 a y 3.5 0.707 NaN NA
#3 b x 2 1.41 9 NA
#4 b y 3 1.41 8 NA
If the version of dplyr is < 1.0.0, we can use summarise_at:
x3 %>%
group_by(id1, id2) %>%
summarise_at(vars(-group_cols()), list(mean = ~ mean(., na.rm = TRUE),
sd = ~ sd(., na.rm = TRUE)))
With aggregate, the error arises because of the NA elements: the formula method uses na.action = na.omit by default, which removes a row if it contains any NA. Specifying na.action = na.pass or NULL resolves that issue. However, when multiple functions are applied with c, the result is a matrix column. In order to get normal data.frame columns, we can wrap the call with data.frame in do.call:
do.call(data.frame, aggregate(. ~ id1 + id2, data = x3, FUN = function(x)
c(avg = mean(x, na.rm = TRUE), SD= sd(x, na.rm = TRUE)), na.action = NULL))
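The effect of na.action can be seen in a small sketch. The x3 below is an assumption: it is reconstructed to be consistent with the dplyr output printed earlier, since the original x3 is not shown, and stats is just a local helper name.

```r
# Assumed data: reconstructed to be consistent with the printed dplyr
# output above; the original x3 is not shown.
x3 <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1, 2, 3, 4, 1, 2, 3, 4),
  val2 = c(9, 4, NA, NA, NA, NA, 9, 8)
)

# Local helper: mean and standard deviation, ignoring NA values.
stats <- function(v) c(avg = mean(v, na.rm = TRUE), SD = sd(v, na.rm = TRUE))

# Default na.action = na.omit: rows with any NA are dropped before grouping.
dropped <- aggregate(. ~ id1 + id2, data = x3, FUN = stats)

# na.pass keeps those rows, so FUN sees the NAs and handles them itself.
kept <- do.call(data.frame,
                aggregate(. ~ id1 + id2, data = x3, FUN = stats,
                          na.action = na.pass))

nrow(dropped)  # fewer groups: the all-NA (a, y) group disappears
nrow(kept)     # every id1/id2 combination is retained
```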
Summarise for multiple group_by variables combined and individually
Here is an idea using bind_rows:
library(dplyr)
mtcars %>%
group_by(cyl, vs) %>%
summarise(new = mean(wt)) %>%
bind_rows(.,
mtcars %>% group_by(cyl) %>% summarise(new = mean(wt)) %>% mutate(vs = NA),
mtcars %>% group_by(vs) %>% summarise(new = mean(wt)) %>% mutate(cyl = NA)) %>%
arrange(cyl) %>%
ungroup()
# A tibble: 10 × 3
# cyl vs new
# <dbl> <dbl> <dbl>
#1 4 0 2.140000
#2 4 1 2.300300
#3 4 NA 2.285727
#4 6 0 2.755000
#5 6 1 3.388750
#6 6 NA 3.117143
#7 8 0 3.999214
#8 8 NA 3.999214
#9 NA 0 3.688556
#10 NA 1 2.611286
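The same stacking of grouping sets can be sketched in base R by looping over the sets and filling the unused grouping columns with NA. The names group_sets, all_groups and pieces are illustrative, and this is an alternative to the bind_rows approach rather than a replacement for it:

```r
# Grouping sets to summarise: both variables together, then each alone.
group_sets <- list(c("cyl", "vs"), "cyl", "vs")
all_groups <- c("cyl", "vs")

pieces <- lapply(group_sets, function(g) {
  res <- aggregate(mtcars["wt"], by = mtcars[g], FUN = mean)
  names(res)[names(res) == "wt"] <- "new"
  # Grouping variables not used in this set become NA.
  for (missing_col in setdiff(all_groups, g)) res[[missing_col]] <- NA_real_
  res[c(all_groups, "new")]
})

# Stack the pieces and sort like arrange(cyl); NA values sort last.
out <- do.call(rbind, pieces)
out[order(out$cyl, out$vs), ]
```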