Add Missing Rows to a Data Table

Add missing rows within a table

The short answer: use a join.

One approach is to select the key pairs you expect and then left join the original table onto them. Pay attention to how missing values are handled, since your question does not specify what should happen to the newly created entries.

Test Data

CREATE TABLE test (id INTEGER, doc INTEGER, posi INTEGER, total INTEGER);
INSERT INTO test VALUES (1, 123, 1, 100);
INSERT INTO test VALUES (1, 123, 2, 600);
INSERT INTO test VALUES (1, 123, 3, 200);
INSERT INTO test VALUES (2, 123, 1, 100);
INSERT INTO test VALUES (2, 123, 2, 600);
INSERT INTO test VALUES (2, 123, 3, 200);
INSERT INTO test VALUES (3, 123, 1, 100);
INSERT INTO test VALUES (3, 123, 3, 200);

The possible key combinations can be generated with a cross join:

SELECT DISTINCT a.id, b.posi 
FROM test a, test b
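The self cross join above pairs every row with every other row before DISTINCT collapses the duplicates. A possible variant, sketched here with Python's built-in sqlite3 module purely as a harness for running the SQL, cross joins the distinct id values with the distinct posi values instead. The subquery aliases ids and positions are made up for this illustration, and the sketch assumes the test data shown above.

```python
import sqlite3

# In-memory database holding the test data from this answer.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE test (id INTEGER, doc INTEGER, posi INTEGER, total INTEGER);
INSERT INTO test VALUES
  (1, 123, 1, 100), (1, 123, 2, 600), (1, 123, 3, 200),
  (2, 123, 1, 100), (2, 123, 2, 600), (2, 123, 3, 200),
  (3, 123, 1, 100), (3, 123, 3, 200);
""")

# Variant: deduplicate each key column first, then cross join.
pairs = con.execute("""
SELECT ids.id, positions.posi
FROM (SELECT DISTINCT id FROM test) AS ids
CROSS JOIN (SELECT DISTINCT posi FROM test) AS positions
ORDER BY ids.id, positions.posi
""").fetchall()

# The query from the answer, for comparison.
original = con.execute("""
SELECT DISTINCT a.id, b.posi
FROM test a, test b
ORDER BY a.id, b.posi
""").fetchall()

print(len(pairs))         # 3 ids x 3 positions
print(pairs == original)  # both queries yield the same key pairs
```

On a table this small the difference does not matter; on larger tables, deduplicating before the cross join keeps the intermediate result small.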

And now join the original table:

WITH expected_lines AS (
    SELECT DISTINCT a.id, b.posi 
    FROM test a, test b
)
SELECT el.id, el.posi, t.doc, t.total
FROM expected_lines el
LEFT JOIN test t ON el.id = t.id AND el.posi = t.posi

You did not say what should happen with the columns that are now empty. Note that DOC and TOTAL are NULL for the newly created rows.

My educated guess is that you want to make DOC part of the key and assume a TOTAL of 0. If that's the case, you can go with the following:

WITH expected_lines AS (
    SELECT DISTINCT a.id, b.posi, c.doc 
    FROM test a, test b, test c
)
SELECT el.id, el.posi, el.doc, ifnull(t.total, 0) total
FROM expected_lines el
LEFT JOIN test t ON el.id = t.id AND el.posi = t.posi AND el.doc = t.doc
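To check the final query end to end, it can be run against the test data. The sketch below uses Python's built-in sqlite3 module as an assumed harness and swaps ifnull for COALESCE, the standard SQL spelling, in case your database does not provide ifnull. The variable names rows and added are illustrative only.

```python
import sqlite3

# In-memory database holding the test data from this answer.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE test (id INTEGER, doc INTEGER, posi INTEGER, total INTEGER);
INSERT INTO test VALUES
  (1, 123, 1, 100), (1, 123, 2, 600), (1, 123, 3, 200),
  (2, 123, 1, 100), (2, 123, 2, 600), (2, 123, 3, 200),
  (3, 123, 1, 100), (3, 123, 3, 200);
""")

# Expected (id, posi, doc) keys, left joined to the original rows;
# keys without a match get a total of 0.
rows = con.execute("""
WITH expected_lines AS (
    SELECT DISTINCT a.id, b.posi, c.doc
    FROM test a, test b, test c
)
SELECT el.id, el.posi, el.doc, COALESCE(t.total, 0) AS total
FROM expected_lines el
LEFT JOIN test t ON el.id = t.id AND el.posi = t.posi AND el.doc = t.doc
ORDER BY el.id, el.posi
""").fetchall()

# In this data set no original row has total 0, so a 0 marks a filled-in row.
added = [row for row in rows if row[3] == 0]

for row in rows:
    print(row)
print("added:", added)
```

With this test data the only filled-in row is the one for id 3, posi 2.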


Add missing rows to data.table

As indicated in @Roland's comment, instead of value = value in CJ(), use:

value = seq_len(max(value))

Or specify the range you would like in your value column.

So you only need to change your attempt from:

b = a[CJ(group = group, value = value, unique = TRUE), on = .(group,value)]

to:

b = a[CJ(group = group, value = seq_len(max(value)), unique = TRUE),
      on = .(group,value)]

Add missing rows to data.table according to multiple keyed columns

A couple of possibilities are discussed here: https://github.com/Rdatatable/data.table/pull/814

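# Cross join that accepts data.tables as well as plain vectors
CJ.dt = function(...) {
  # cross join the row indices (data.frames) or element indices (vectors)
  rows = do.call(CJ, lapply(list(...), function(x) if(is.data.frame(x)) seq_len(nrow(x)) else seq_along(x)))
  # pick the indexed rows/elements from each argument and bind them column-wise
  do.call(data.table, Map(function(x, y) x[y], list(...), rows))
}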

setkey(mydata, name, job, sex, from)

mydata[CJ.dt(unique(data.table(name, job, sex)), unique(from))]
#     name     job    sex from      score
# 1: chris  doctor   male  NYT  0.7383247
# 2: chris  doctor   male   BG         NA
# 3: chris  doctor   male TIME         NA
# 4: chris  doctor   male USAT         NA
# 5: chris  lawyer female  NYT         NA
# 6: chris  lawyer female   BG -0.8204684
# 7: chris  lawyer female TIME         NA
# 8: chris  lawyer female USAT         NA
# 9: chris  lawyer   male  NYT  0.4874291
#10: chris  lawyer   male   BG         NA
#11: chris  lawyer   male TIME         NA
#12: chris  lawyer   male USAT         NA
#13:  john teacher   male  NYT -0.6264538
#14:  john teacher   male   BG -0.8356286
#15:  john teacher   male TIME  1.5952808
#16:  john teacher   male USAT  0.1836433
#17:  mary  police female  NYT         NA
#18:  mary  police female   BG         NA
#19:  mary  police female TIME         NA
#20:  mary  police female USAT  0.3295078

Add missing rows to a data table

I'd get the unique values in id1 and id2 and do a join using data.table's cross join function CJ as follows:

# if you've already set the key:
ans <- f[CJ(id1, id2, unique=TRUE)][is.na(v), v := 0L][]

# or, if f is not keyed:
ans <- f[CJ(id1 = id1, id2 = id2, unique=TRUE), on=.(id1, id2)][is.na(v), v := 0L][]

ans

Adding missing observations in data.table

I believe the issue is that CJ(l, l, 1994:1995) has duplicate names. This is hinted at by verbose=TRUE:

DT[CJ(l,l,1994:1995), verbose=TRUE]
# forder.c received a vector type 'character' length 3
# forder.c received a vector type 'character' length 3
# forder.c received a vector type 'integer' length 2
# i.l has same type (character) as x.from. No coercion needed.
# i.l has same type (character) as x.to. No coercion needed.
# i.V3 has same type (integer) as x.year. No coercion needed.
# on= matches existing key, using key
# Starting bmerge ...
# bmerge done in 0.000s elapsed (0.000s cpu) 
# Constructing irows for '!byjoin || nqbyjoin' ... 0.000s elapsed (0.000s cpu)

Whether this counts as a bug is a gray area; arguably it would be better to raise an error than to proceed with potentially wrong results.

Anyway, you can get around this by naming the CJ arguments:

DT[CJ(from = l, to = l, year = 1994:1995)]
#     from to year          g
#  1:    a  a 1994 0.64364200
#  2:    a  a 1995         NA
#  3:    a  b 1994 0.69746294
#  4:    a  b 1995 0.56863539
#  5:    a  c 1994 0.64369566
#  6:    a  c 1995         NA
#  7:    b  a 1994 0.62198311
#  8:    b  a 1995 0.71919139
#  9:    b  b 1994 0.76170866
# 10:    b  b 1995 0.84792449
# 11:    b  c 1994 0.15793127
# 12:    b  c 1995 0.26623733
# 13:    c  a 1994 0.89921463
# 14:    c  a 1995 0.55417635
# 15:    c  b 1994 0.38938166
# 16:    c  b 1995 0.03778206
# 17:    c  c 1994 0.48918988
# 18:    c  c 1995 0.75206221

Note that we could also accomplish this without keys:

setkey(DT, NULL)
# for those more familiar with SQL syntax, this is a NATURAL JOIN;
#   it's equivalent to `on = c("from", "to", "year")`
DT[CJ(from = l, to = l, year = 1994:1995), on = .NATURAL]

How to add missing rows to a data frame

You can get the range of cohort years and use summarize() to expand the dataset, then left join back onto the original:

df<-ungroup(df)

yrs = range(as.numeric(levels(df$cohort)))
unique(df[,c(1,3)]) %>% 
  group_by(var2kreuz,var2use) %>% 
  summarize(cohort = factor(yrs[1]:yrs[2])) %>% 
  left_join(df)

Alternatively, you can use complete() like this:

df %>% mutate(across(c(var2kreuz, var2use),as.character)) %>% 
  complete(var2kreuz, var2use,cohort)

Output:

   var2kreuz var2use cohort  n proportion
1        KKK     yes   2010 10  0.5555556
2        KKK     yes   2011 19  0.5937500
3        KKK     yes   2012 24  0.4615385
4        KKK     yes   2013 19  0.4750000
5        KKK     yes   2014 21  0.5675676
6        KKK     yes   2015 NA         NA
7        KKK     yes   2016 NA         NA
8        KKK     yes   2017 NA         NA
9        KKK     yes   2018 23  0.6388889
10       KKK     yes   2019 38  0.6031746
11       KKK     yes   2020 24  0.4615385
12       KKK      no   2010  8  0.4444444
13       KKK      no   2011 13  0.4062500
14       KKK      no   2012 28  0.5384615
15       KKK      no   2013 21  0.5250000
16       KKK      no   2014 16  0.4324324
17       KKK      no   2015 NA         NA
18       KKK      no   2016 NA         NA
19       KKK      no   2017 NA         NA
20       KKK      no   2018 13  0.3611111
21       KKK      no   2019 25  0.3968254
22       KKK      no   2020 28  0.5384615

Adding row for missing value in data.table

Do the same thing as in the question you linked, but for each ida:

setkey(dt, idb, date)

dt[, .SD[CJ(unique(idb), unique(date))], by = ida][is.na(value), value := 0][]
#   ida idb value       date
#1:   A   2 26600 2004-12-31
#2:   A   2     0 2005-03-31
#3:   A   3     0 2004-12-31
#4:   A   3 19600 2005-03-31
#5:   C   2  8700 2005-12-31
#6:   B   3 18200 2005-06-30
#7:   B   3     0 2005-09-30
#8:   B   4     0 2005-06-30
#9:   B   4  1230 2005-09-30

Insert all missing rows into data table for a range of values for 2 columns

Instead of using only the values that already exist in column 'a', we can pass a range of values for 'a' to CJ:

dt1[CJ(a = 1:7, b, unique = TRUE)]