Aggregating Sub Totals and Grand Totals with Data.Table

Aggregating sub totals and grand totals with data.table

In recent devel data.table you can use new feature called "grouping sets" to produce sub totals:

library(data.table)
set.seed(1)
DT = data.table(
group=sample(letters[1:2],100,replace=TRUE),
year=sample(2010:2012,100,replace=TRUE),
v=runif(100))

cube(DT, mean(v), by=c("group","year"))
# group year V1
# 1: a 2011 0.4176346
# 2: b 2010 0.5231845
# 3: b 2012 0.4306871
# 4: b 2011 0.4997119
# 5: a 2012 0.4227796
# 6: a 2010 0.2926945
# 7: NA 2011 0.4463616
# 8: NA 2010 0.4278093
# 9: NA 2012 0.4271160
#10: a NA 0.3901875
#11: b NA 0.4835788
#12: NA NA 0.4350153
cube(DT, mean(v), by=c("group","year"), id=TRUE)
# grouping group year V1
# 1: 0 a 2011 0.4176346
# 2: 0 b 2010 0.5231845
# 3: 0 b 2012 0.4306871
# 4: 0 b 2011 0.4997119
# 5: 0 a 2012 0.4227796
# 6: 0 a 2010 0.2926945
# 7: 2 NA 2011 0.4463616
# 8: 2 NA 2010 0.4278093
# 9: 2 NA 2012 0.4271160
#10: 1 a NA 0.3901875
#11: 1 b NA 0.4835788
#12: 3 NA NA 0.4350153

R data.table fast way to append total of columns when doing aggregations across categories?

rollup calculates aggregates at various levels of groupings producing multiple totals(/subtotals):

library(data.table)  # I'm using version 1.11.4
rollup(DT, j = lapply(.SD, sum), by = c("t", "x"))
# t x y v
# <Date> <char> <num> <int>
# 1: 2018-05-01 b 12 9
# 2: 2018-07-01 b 8 6
# 3: 2018-07-01 a 12 24
# 4: 2018-05-01 a 8 16
# 5: 2018-05-01 c 12 39
# 6: 2018-07-01 c 8 26
# 7: 2018-07-01 d 12 54
# 8: 2018-05-01 d 8 36
# 9: 2018-05-01 <NA> 40 100 # <- total for date 2018-05-01
#10: 2018-07-01 <NA> 40 110 # <- total for date 2018-07-01
#11: <NA> <NA> 80 210 # <- grand total

The help file for rollup states:

Calculate aggregates at various levels of groupings producing multiple (sub-)totals. Reflects SQLs GROUPING SETS operations.

SSRS: Table of group aggregates with grand total

You need to mess around a little to get right layout, but you basically need a group header or footer to show the Group aggregates.

if you right click on the lines next to [statusCategory] and select Insert Row > Outside Group - Above it will create a group header for you. Then just add your aggregate expressions in the correct columns like =Count(Fields!projectStatus.Value) then you can just set Row Visibilty on the details row to hide if you dont want to see it.

you should end up with a layout similar to this.

Sample Image

with the middle row (detail) visibility set to false

Add Group Subheader and Subtotal Rows to data.frame or table in R

Instead of applying adorn_totals on the entire summary, use group_modify and then convert to gt

library(dplyr)
library(tidyr)
library(purrr)
library(janitor)
library(gt)
d %>%
group_by(year, sector, subsector) %>%
summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>%
pivot_wider(names_from = year, values_from = sales) %>%
group_by(sector) %>%
group_modify(~ .x %>% adorn_totals(where = "row")) %>%

gt()

-output

Sample Image


An option is also to split the column with expss

library(expss)
library(openxlsx)
out <- d %>%
group_by(year, sector, subsector) %>%
summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>%
pivot_wider(names_from = year, values_from = sales) %>%
group_by(sector) %>%
group_modify(~ .x %>% adorn_totals(where = "row")) %>%
ungroup %>%
split_columns(columns = 1)
wb <- createWorkbook()
sh <- addWorksheet(wb, "Tables")
xl_write(out, wb, sh)
saveWorkbook(wb, file.path(getwd(), "Documents/table1.xlsx"), overwrite = TRUE)

-output

Sample Image

R: pivoting & subtotals in data.table?

Another option using data.table::cube:

cb <- cube(DT, sum(value), by=c("date","category"), id=TRUE)

cb[grouping==0L, .(date, category,

PropByDate = V1 / cb[grouping==1L][.SD, on="date", x.V1],

PropByCategory = V1 / cb[grouping==2L][.SD, on="category", x.V1],

PropByTotal = V1 / cb[grouping==3L, V1]
)]

output:

   date category PropByDate PropByCategory PropByTotal
1: 1 1 0.3333333 0.2500000 0.1
2: 1 2 0.6666667 0.3333333 0.2
3: 2 1 0.4285714 0.7500000 0.3
4: 2 2 0.5714286 0.6666667 0.4

data:

DT <- data.table(date=c(1, 1, 2, 2), category=c(1, 2, 1, 2), value=1:4)

# date category value
#1: 1 1 1
#2: 1 2 2
#3: 2 1 3
#4: 2 2 4

subtotals by group R

Using DF in the Note at the end try this one-liner. The same code works if there are a different number of columns. Also try it without the as.data.frame for wide format. No packages are used.

as.data.frame(addmargins(xtabs(~., DF)))

giving:

  Var1 Var2 Freq
1 a b 1
2 b b 1
3 Sum b 2
4 a c 1
5 b c 0
6 Sum c 1
7 a Sum 2
8 b Sum 1
9 Sum Sum 3

Note

DF in reproducible form is:

DF <- structure(list(Var1 = structure(c(1L, 1L, 2L), .Label = c("a", 
"b"), class = "factor"), Var2 = structure(c(1L, 2L, 1L), .Label = c("b",
"c"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L))

Aggregating multiple subtotals?

If I get your question right, you can use

acast(aqm, day ~ variable ~ month, mean, margins = c("variable", "month"))[,,'(all)']

The acast gets you the summary for each day over each variable over each month. The total aggregate "slice" ([,,'(all)']) has a row for each day, with a column for each variable (averaged over all months) and a '(all)' column averaging each day, over all variables over all months.

Is this what you needed?

Compare Grand total of Group by subtotals in SQL Server

You need to apply a Group Sum on Quantity(MG):

select ITEM, STORE, sumQuantity
from
(
Select ITEM, STORE, SUM(Quantity) as sumQuantity
,SUM(SUM(quantity)) OVER (PARTITION BY ITEM) as groupSum
From....
.........
.........
Group by ITEM, STORE
) as dt
where groupSum > 50


Related Topics



Leave a reply



Submit