Calculate Multiple Aggregations on Several Variables Using Lapply(.Sd, ...)

Calculate multiple aggregations on several variables using lapply(.SD, ...)

You're missing a [[1]] or $mpg:

mtcars.dt[, lapply(.SD, function(x) list(mean(x), median(x)))[[1]],
            by="cyl", .SDcols=c("mpg")]
#or
mtcars.dt[, lapply(.SD, function(x) list(mean(x), median(x)))$mpg,
            by="cyl", .SDcols=c("mpg")]
#   cyl       V1   V2
#1:   6 19.74286 19.7
#2:   4 26.66364 26.0
#3:   8 15.10000 15.2

For the more general case, try:

mtcars.dt[, as.list(unlist(lapply(.SD, function(x) list(mean=mean(x),
                                                        median=median(x))))),
            by="cyl", .SDcols=c("mpg", "hp")]
#    cyl mpg.mean mpg.median hp.mean hp.median
# 1:   6    19.74       19.7  122.29     110.0
# 2:   4    26.66       26.0   82.64      91.0
# 3:   8    15.10       15.2  209.21     192.5

(or as.list(sapply(.SD, ...)))

How to apply multiple functions to multiple columns within by?

There is an option in unlist to avoid unlisting recursively - the recursive parameter (By default, the recursive = TRUE)

DT[,unlist(lapply(.SD,my.sum.fun), 
      recursive = FALSE),.SDcols=c("mpg","hp"),by=list(cyl)]
#   cyl mpg.mean mpg.median   mpg.sd   hp.mean hp.median    hp.sd
#1:   6 19.74286       19.7 1.453567 122.28571     110.0 24.26049
#2:   4 26.66364       26.0 4.509828  82.63636      91.0 20.93453
#3:   8 15.10000       15.2 2.560048 209.21429     192.5 50.97689

Apply multiple functions to multiple columns in data.table by group

First you need to change your function. data.table expects consistent types and median can return integer or double values depending on input.

my.summary <- function(x) list(mean = mean(x), median = as.numeric(median(x)))

Then you need to ensure that only the first level of the nested list is unlisted. The result of the unlist call still needs to be a list (remember, a data.table is a list of column vectors).

DT[, unlist(lapply(.SD, my.summary), recursive = FALSE), by = c, .SDcols = c("a", "b")]
#   c a.mean a.median b.mean b.median
#1: 1    1.5      1.5    2.5      2.5
#2: 2    4.0      4.0    5.0      5.0

Apply multiple functions to multiple columns in data.table

I'd normally do this:

my.summary = function(x) list(mean = mean(x), median = median(x))

DT[, unlist(lapply(.SD, my.summary)), .SDcols = c('a', 'b')]
#a.mean a.median   b.mean b.median 
#     3        3        4        4

Apply several summary functions on several variables by group in one call

You can do it all in one step and get proper labeling:

> aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )
#   id1 id2 val1.mn val1.n val2.mn val2.n
# 1   a   x     1.5    2.0     6.5    2.0
# 2   b   x     2.0    2.0     8.0    2.0
# 3   a   y     3.5    2.0     7.0    2.0
# 4   b   y     3.0    2.0     6.0    2.0

This creates a dataframe with two id columns and two matrix columns:

str( aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) )
'data.frame':   4 obs. of  4 variables:
 $ id1 : Factor w/ 2 levels "a","b": 1 2 1 2
 $ id2 : Factor w/ 2 levels "x","y": 1 1 2 2
 $ val1: num [1:4, 1:2] 1.5 2 3.5 3 2 2 2 2
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "mn" "n"
 $ val2: num [1:4, 1:2] 6.5 8 7 6 2 2 2 2
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "mn" "n"

As pointed out by @lord.garbage below, this can be converted to a dataframe with "simple" columns by using do.call(data.frame, ...)

str( do.call(data.frame, aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) ) 
    )
'data.frame':   4 obs. of  6 variables:
 $ id1    : Factor w/ 2 levels "a","b": 1 2 1 2
 $ id2    : Factor w/ 2 levels "x","y": 1 1 2 2
 $ val1.mn: num  1.5 2 3.5 3
 $ val1.n : num  2 2 2 2
 $ val2.mn: num  6.5 8 7 6
 $ val2.n : num  2 2 2 2

This is the syntax for multiple variables on the LHS:

aggregate(cbind(val1, val2) ~ id1 + id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )

How to avoid same column names when multiple transformations in data.table?

You could try

library(data.table)
dt[, unlist(lapply(.SD, function(x) list(Mean=mean(x),
                    SD=sd(x))),recursive=FALSE), by=ID]
#   ID Obs_1.Mean  Obs_1.SD Obs_2.Mean  Obs_2.SD Obs_3.Mean  Obs_3.SD
#1:  1  0.4854187 1.1108687 -0.3238542 0.2885969  0.7410611 0.1067961
#2:  2  0.4171586 0.2875411 -0.2397030 1.8732682  0.2041125 0.3438338
#3:  3 -0.3601052 0.8105370  0.8195368 0.3829833 -0.4087233 1.4705692

Or a variation as suggested by @David Arenburg

 dt[, as.list(unlist(lapply(.SD, function(x) list(Mean=mean(x),
              SD=sd(x))))), by=ID]
 #   ID Obs_1.Mean  Obs_1.SD Obs_2.Mean  Obs_2.SD Obs_3.Mean  Obs_3.SD
 #1:  1  0.4854187 1.1108687 -0.3238542 0.2885969  0.7410611 0.1067961
 #2:  2  0.4171586 0.2875411 -0.2397030 1.8732682  0.2041125 0.3438338
 #3:  3 -0.3601052 0.8105370  0.8195368 0.3829833 -0.4087233 1.4705692

Aggregate multiple columns at once

We can use the formula method of aggregate. The variables on the 'rhs' of ~ are the grouping variables while the . represents all other variables in the 'df1' (from the example, we assume that we need the mean for all the columns except the grouping), specify the dataset and the function (mean).

aggregate(.~id1+id2, df1, mean)

Or we can use summarise_each from dplyr after grouping (group_by)

library(dplyr)
df1 %>%
    group_by(id1, id2) %>% 
    summarise_each(funs(mean))

Or using summarise with across (dplyr devel version - ‘0.8.99.9000’)

df1 %>% 
    group_by(id1, id2) %>%
    summarise(across(starts_with('val'), mean))

Or another option is data.table. We convert the 'data.frame' to 'data.table' (setDT(df1), grouped by 'id1' and 'id2', we loop through the subset of data.table (.SD) and get the mean.

library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)]

data

df1 <- structure(list(id1 = c("a", "a", "a", "a", "b", "b", 
"b", "b"
), id2 = c("x", "x", "y", "y", "x", "y", "x", "y"), 
val1 = c(1L, 
2L, 3L, 4L, 1L, 4L, 3L, 2L), val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 
9L, 8L)), .Names = c("id1", "id2", "val1", "val2"), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))

data.table: Group by, then aggregate with custom function returning several new columns

You are already returning a list from the function. You do not need to list them again. So remove the list and have the code like below

mtcars_dt[,
           returnsMultipleColumns(dt_group_all_columns = .SD),
           by = c("mpg", "cyl"),
           .SDcols = colnames(mtcars_dt)
           ]
     mpg cyl     new_column_1     new_column_2
 1: 21.0   6 returned_value_1 returned_value_2
 2: 22.8   4 returned_value_1 returned_value_2
 3: 21.4   6 returned_value_1 returned_value_2
 4: 18.7   8 returned_value_1 returned_value_2

Calculate Multiple Aggregations on Several Variables Using Lapply(.Sd, ...)