Keeping Zero Count Combinations When Aggregating with Data.Table

The most straightforward approach seems to be to explicitly supply all category combinations in a data.table passed to i=, setting by=.EACHI so that .N is computed once for each of them:

library(data.table)

setkey(dt, sex, fruit)   # the join below matches on the key columns sex and fruit
dt[CJ(sex, fruit, unique = TRUE), .N, by = .EACHI]
#    sex  fruit N
# 1:   F  apple 2
# 2:   F orange 0
# 3:   F tomato 2
# 4:   H  apple 3
# 5:   H orange 1
# 6:   H tomato 1
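A self-contained sketch of the same join, using hypothetical toy data that deliberately has no F/orange row, so the CJ() lookup should report that pair with N = 0 instead of dropping it:

```r
library(data.table)

# Hypothetical toy data: there is no row with sex == "F" and fruit == "orange"
dt <- data.table(
  sex   = c("F", "F", "H", "H", "H"),
  fruit = c("apple", "tomato", "apple", "orange", "tomato")
)
setkey(dt, sex, fruit)

# CJ() builds every sex/fruit pair from the values present in dt;
# by = .EACHI evaluates .N once per row of that lookup table,
# so a pair with no matching rows gets N = 0
counts <- dt[CJ(sex, fruit, unique = TRUE), .N, by = .EACHI]
print(counts)
```

Because CJ() here only sees values that occur somewhere in each column, a factor level that never occurs is not included; the levels()-based helper in the next answer covers that case.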

Complete with all combinations after counting on data.table

Here is one possible way to solve your problem. Note that with=FALSE in a data.table call lets you select columns by name using the standard data.frame rules. In the example below, I assume that the columns used to build all combinations are passed to myfun as a character vector.
Keep in mind that your dataset should not contain a column named gcases: the i expression is evaluated inside the data.table, so such a column would mask the local gcases variable. .EACHI in by (here keyby) evaluates j once for each row of i.

library(data.table)

myfun <- function(d, g) {
  # get levels (for factors) and unique values for other types
  fn <- function(x) if (is.factor(x)) levels(x) else unique(x)
  # note: setDT() works by reference, so a data.table passed as d is re-keyed by g
  gcases <- lapply(setDT(d, key = g)[, g, with = FALSE], fn)

  # count based on all combinations
  d[do.call(CJ, gcases), .N, keyby = .EACHI]
}
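Because setDT() re-keys the caller's table by reference, here is a sketch of a non-mutating variant that works on a copy and uses by = .EACHI. The name count_combos and the example data are hypothetical, not from the answer; the data include a factor level, "orange", that never occurs, to show that levels() keeps it.

```r
library(data.table)

# Hypothetical non-mutating variant of myfun(): d is copied, so its key is left alone
count_combos <- function(d, g) {
  d <- copy(as.data.table(d))
  setkeyv(d, g)
  # all levels for factors, unique values for everything else
  fn <- function(x) if (is.factor(x)) levels(x) else unique(x)
  gcases <- lapply(d[, g, with = FALSE], fn)
  # join every combination back to d and count the matching rows of each
  d[do.call(CJ, gcases), .N, by = .EACHI]
}

d <- data.table(
  sex   = c("F", "F", "H"),
  fruit = factor(c("apple", "apple", "tomato"),
                 levels = c("apple", "orange", "tomato"))
)
res <- count_combos(d, c("sex", "fruit"))
print(res)
```

The copy costs memory proportional to d, which is the trade-off for not touching the caller's key.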

`data.table` how to get `keyby` to include all combinations of factors?

Here is a tidyr/dplyr approach:

library(dplyr)
library(tidyr)

dt1 %>% 
  group_by(a, b) %>% 
  summarise(c = n()) %>%   # n() is the number of rows in each group
  ungroup() %>%
  complete(a, b, fill = list(c = 0L))
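For comparison, a sketch of the same count-then-complete idea in data.table itself (an alternative, not part of the original answer): count the observed combinations with keyby, join them onto a CJ() of all factor levels, and replace the resulting NA counts with 0. The dt1 below is hypothetical toy data with factor columns a and b.

```r
library(data.table)

# Hypothetical data: b has a level "r" that never occurs, and (y, q) is also unseen
dt1 <- data.table(
  a = factor(c("x", "x", "y"), levels = c("x", "y")),
  b = factor(c("p", "q", "p"), levels = c("p", "q", "r"))
)

# Count only the combinations that actually occur
cnt <- dt1[, .N, keyby = .(a, b)]

# Join onto every level combination; unmatched combinations get N = NA
full <- cnt[CJ(a = levels(dt1$a), b = levels(dt1$b)), on = .(a, b)]
full[is.na(N), N := 0L]
print(full)
```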

R data table unique record count based on all combination of a given list of values from 2 columns

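In base R, table() counts every combination of the values in the two columns, including combinations that never occur, and data.frame() turns the result into one row per combination: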

data.frame(table(dt))

        Var1       Var2 Freq
1 Col1Value1 Col2Value1    1
2 Col1Value2 Col2Value1    1
3 Col1Value3 Col2Value1    1
4 Col1Value1 Col2Value2    1
5 Col1Value2 Col2Value2    0
6 Col1Value3 Col2Value2    1
7 Col1Value1 Col2Value3    1
8 Col1Value2 Col2Value3    1
9 Col1Value3 Col2Value3    1
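A minimal self-contained sketch on hypothetical data: the pair (b, y) never occurs, so it should appear with Freq 0. When the input has column names, the count columns are named after them.

```r
# Hypothetical two-column data: the pair (b, y) never occurs
df <- data.frame(
  col1 = c("a", "a", "b"),
  col2 = c("x", "y", "x")
)

# Cross-tabulate all combinations, then flatten to one row per combination
counts <- as.data.frame(table(df))
print(counts)
```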

Populating a count matrix with permutations of R data.table rows

Here's a data.table solution that seems to be efficient. Here V1 identifies the group and V3 is the integer id used as the matrix index. We basically do a self join to create the combinations and then count them. Then, similar to what @coldspeed did with NumPy, we just update a zero matrix at those locations with the counts.

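library(data.table)

# a non-equi self join: within each V1 group, pair each V3 with every larger V3
tmp <- dt[dt,
          .(V1, id = x.V3, id2 = V3),
          on = .(V1, V3 < V3),
          nomatch = 0L,
          allow.cartesian = TRUE
        ][, .N, by = .(id, id2)]   # count each (id, id2) pair across groups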

## Create a zero matrix and update by locations
m <- array(0L, rep(max(dt$V3), 2L))
m[cbind(tmp$id, tmp$id2)] <- tmp$N
m + t(m)

#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,]    0    2    0    0    1    0    0    1
# [2,]    2    0    0    0    1    0    0    1
# [3,]    0    0    0    0    0    0    0    0
# [4,]    0    0    0    0    1    0    1    0
# [5,]    1    1    0    1    0    0    1    1
# [6,]    0    0    0    0    0    0    0    0
# [7,]    0    0    0    1    1    0    0    0
# [8,]    1    1    0    0    1    0    0    0
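The "update by locations" step relies on indexing a matrix with a two-column matrix of (row, column) positions. A small sketch with hypothetical pair counts shaped like tmp (id < id2):

```r
# Hypothetical pair counts with id < id2, shaped like tmp above
tmp <- data.frame(
  id  = c(1L, 1L, 2L),
  id2 = c(2L, 4L, 4L),
  N   = c(2L, 1L, 1L)
)

m <- array(0L, rep(4L, 2L))            # 4 by 4 integer zero matrix
m[cbind(tmp$id, tmp$id2)] <- tmp$N     # each cbind() row addresses one cell
m <- m + t(m)                          # mirror the upper triangle into the lower one
print(m)
```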

Alternatively, we could create tmp using data.table::CJ, but that is potentially less memory efficient (thanks to @Frank for the tip) because it creates all possible combinations within each group first, e.g.

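# name the CJ() columns explicitly: current data.table versions name unnamed
# CJ() inputs after the expressions, so CJ(V3, V3) would give two columns called V3
tmp <- dt[, CJ(V1 = V3, V2 = V3)[V1 < V2], by = .(g = V1)][, .N, by = .(V1, V2)]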

## Then, as previously
m <- array(0L, rep(max(dt$V3), 2L))
m[cbind(tmp$V1, tmp$V2)] <- tmp$N
m + t(m)

#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,]    0    2    0    0    1    0    0    1
# [2,]    2    0    0    0    1    0    0    1
# [3,]    0    0    0    0    0    0    0    0
# [4,]    0    0    0    0    1    0    1    0
# [5,]    1    1    0    1    0    0    1    1
# [6,]    0    0    0    0    0    0    0    0
# [7,]    0    0    0    1    1    0    0    0
# [8,]    1    1    0    0    1    0    0    0
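A different route, sketched in base R as an alternative rather than the answer's method: build a 0/1 group-by-item incidence matrix with table() and take crossprod(), whose off-diagonal cell (i, j) counts the groups that contain both items. This assumes, as in the code above, that V1 identifies the group and V3 is a positive integer item id; it counts each pair once per group, so it matches the self join only when an item appears at most once per group. The data below are hypothetical.

```r
# Hypothetical data: V1 is the group, V3 a positive integer item id
dt <- data.frame(
  V1 = c(1, 1, 1, 2, 2),
  V3 = c(1L, 2L, 4L, 1L, 2L)
)

# Group-by-item table; the explicit levels keep unseen ids (here 3) as zero columns
tab <- table(dt$V1, factor(dt$V3, levels = seq_len(max(dt$V3))))
inc <- (unclass(tab) > 0) * 1          # 0/1 incidence matrix

# crossprod(inc) = t(inc) %*% inc: cell (i, j) = number of groups holding both i and j
m <- crossprod(inc)
diag(m) <- 0                           # drop self-pairs, as m + t(m) does above
print(m)
```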

For R data.table, how to use uniqueN() in order to count unique/distinct values in multiple columns?

To answer your question, yes, you can just add both columns to the by argument:

dt[, .(distinct_groups = uniqueN(order_no)), by = c("Name", "Overlimit")]
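A runnable sketch on hypothetical data (the column names follow the question; the values are made up): grouping by both columns makes uniqueN() count the distinct order_no values within each Name/Overlimit pair. If you instead want the number of distinct Name/Overlimit combinations, uniqueN() also accepts a data.table together with a by argument.

```r
library(data.table)

# Hypothetical orders: Name "A" with Overlimit "Y" has order 2 twice
dt <- data.table(
  Name      = c("A", "A", "A", "A", "B"),
  Overlimit = c("Y", "Y", "Y", "N", "Y"),
  order_no  = c(1, 2, 2, 2, 3)
)

# distinct order numbers per Name/Overlimit group
res <- dt[, .(distinct_groups = uniqueN(order_no)), by = c("Name", "Overlimit")]
print(res)

# number of distinct Name/Overlimit combinations
n_pairs <- uniqueN(dt, by = c("Name", "Overlimit"))
```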
