Fastest Way for Filling-In Missing Dates for Data.Table

Fastest way for filling-in missing dates for data.table (cont.)

On rolling join, one 'normal' join and some column switching, aaaand you're done :)

temp <- dates[, near.date := dt[dates, x.date, on = .(date=ddate), roll = TRUE, mult = "first"]][]
dt[temp, on = .(date = near.date)][, date := ddate][,ddate := NULL][]

#          date gr1 gr2 x
# 1: 2017-01-01   A   a 1
# 2: 2017-02-01   A   b 2
# 3: 2017-02-01   B   a 4
# 4: 2017-03-01   A   b 2
# 5: 2017-03-01   B   a 4
# 6: 2017-04-01   B   a 5
# 7: 2017-05-01   A   b 3
# 8: 2017-06-01   A   b 3

You can (of course) make it a one-liner by integrating the first row into the last.

Fastest way for filling-in missing dates for data.table

Not sure if it's the fastest, but it'll work if there are no NAs in the data:

# just in case these aren't Dates. 
NADayWiseOrders$date <- as.Date(NADayWiseOrders$date)
# all desired dates.
alldates <- data.table(date=seq.Date(min(NADayWiseOrders$date), max(NADayWiseOrders$date), by="day"))
# merge
dt <- merge(NADayWiseOrders, alldates, by="date", all=TRUE)
# now carry forward last observation (alternatively, set NA's to 0)
require(xts)
na.locf(dt)

How to add missing dates and populate values inside groups with 'data.table'

Another option using a rolling join.

setkey(DT, g1, g2, date_obs)

out <- DT[DT[, .(date_obs = seq(first(date_obs),
                                last(date_obs),
                                by = "day")), by=.(g1, g2)],
          on=.(g1, g2, date_obs),
          roll = TRUE]
out
#    g1 g2 status   date_obs
# 1:  1  1      0 2019-01-02
# 2:  1  2      1 2019-01-01
# 3:  1  2      1 2019-01-02
# 4:  1  2      1 2019-01-03
# 5:  1  2      1 2019-01-04
# 6:  1  2      1 2019-01-05
# 7:  1  2      1 2019-01-06
# 8:  1  2      1 2019-01-07
# 9:  1  2      1 2019-01-08
#10:  1  2      1 2019-01-09
#11:  1  2      1 2019-01-10
#12:  1  2      1 2019-01-11
#13:  1  2      1 2019-01-12
#14:  1  2      1 2019-01-13
#15:  1  2      1 2019-01-14
#16:  1  2      1 2019-01-15
#17:  1  2      1 2019-01-16
#18:  1  2      1 2019-01-17
#19:  1  2      1 2019-01-18
#20:  1  2      1 2019-01-19
#21:  1  2      1 2019-01-20
#22:  1  2      1 2019-01-21
#23:  1  2      1 2019-01-22
#24:  1  2      1 2019-01-23
#25:  1  2      1 2019-01-24
#26:  1  2      1 2019-01-25
#27:  1  2      1 2019-01-26
#28:  1  2      1 2019-01-27
#29:  1  2      1 2019-01-28
#30:  1  2      1 2019-01-29
#31:  1  2      1 2019-01-30
#32:  1  3      0 2019-01-15
#33:  2  1      0 2019-01-20
#34:  2  1      0 2019-01-21
#35:  2  1      0 2019-01-22
#36:  2  1      0 2019-01-23
#37:  2  1      1 2019-01-24
#38:  2  3      1 2019-01-12
#    g1 g2 status   date_obs

data

DT <- data.table(
  g1 = c(1, 1, 2, 1, 2, 2, 1),
  g2 = c(2, 1, 3, 3, 1, 1, 2),
  status = c(1, 0, 1, 0, 0, 1, 1),
  date_obs = as.Date(c("2019-01-01", "2019-01-02", "2019-01-12", "2019-01-15",
                       "2019-01-20", "2019-01-24", "2019-01-30")))

Quickest way to fill in missing dates in a sequence? SAS

You could just add an OUTPUT statement in a DO loop. The tricky part is getting the next date. Here is a method using a second SET statement that is offset by one observation.

data want;
  set have ;
  by date;
  set have(firstobs=2 keep=date rename=(date=next_date)) have(obs=0 drop=_all_);
  next_date = coalesce(next_date,date);
  do date=date to next_date;
    output;
  end;
run;

But your real data probably has multiple stocks. So add some BY group processing.

data want;
  set have ;
  by stock date;
  set have(firstobs=2 keep=date rename=(date=next_date)) have(obs=0 drop=_all_);
  if last.stock the next_date=date;
  do date=date to next_date;
    output;
  end;
run;

Is there a data.table way of filling in gaps of time periods?

Just data.table:

If no key is set, then

data2 <- data[CJ(Date, Card, unique = TRUE), on = .(Date, Card)]
data2
#          Date  Card           A
#        <char> <num>       <num>
# 1: 2020-01-01     1  1.37095845
# 2: 2020-01-01     2 -0.56469817
# 3: 2020-01-01     3  0.36312841
# 4: 2020-02-01     1  0.63286260
# 5: 2020-02-01     2          NA
# 6: 2020-02-01     3  0.40426832
# 7: 2020-03-01     1 -0.10612452
# 8: 2020-03-01     2  1.51152200
# 9: 2020-03-01     3 -0.09465904

(updated/simplified, thanks to @sindri_baldur!)

If a key is set, then you can use @Frank's method:

data2 <- data[ do.call(CJ, c(mget(key(data)), unique = TRUE)), ]

And from here, you can use nafill as desired, perhaps

data2[, A := nafill(A, type = "locf"), by = .(Card)]
#          Date  Card           A
#        <char> <num>       <num>
# 1: 2020-01-01     1  1.37095845
# 2: 2020-01-01     2 -0.56469817
# 3: 2020-01-01     3  0.36312841
# 4: 2020-02-01     1  0.63286260
# 5: 2020-02-01     2 -0.56469817
# 6: 2020-02-01     3  0.40426832
# 7: 2020-03-01     1 -0.10612452
# 8: 2020-03-01     2  1.51152200
# 9: 2020-03-01     3 -0.09465904

(How to fill is based on your knowledge of the context of the data; it might just as easily be by=.(Date), or some form of imputation.)

Update: the above does an expansion of possible combinations, which might fill outside of a particular Card's span, in which case one might see:

data <- data[-1,]
data[CJ(Date, Card, unique = TRUE), on = .(Date, Card)]
#          Date  Card           A
#        <char> <num>       <num>
# 1: 2020-01-01     1          NA
# 2: 2020-01-01     2 -0.42225588
# 3: 2020-01-01     3 -0.12235017
# 4: 2020-02-01     1  0.18819303
# 5: 2020-02-01     2          NA
# 6: 2020-02-01     3  0.11916096
# 7: 2020-03-01     1 -0.02509255
# 8: 2020-03-01     2  0.10807273
# 9: 2020-03-01     3 -0.48543524

I think there are two approaches to this:

Doing the above code and then removing leading (and trailing) NAs per group:

data[CJ(Date, Card, unique = TRUE), on = .(Date, Card)
  ][, .SD[ !is.na(A) | !seq_len(.N) %in% c(1, .N),], by = Card]
#     Card       Date           A
#    <num>     <char>       <num>
# 1:     1 2020-02-01  0.18819303
# 2:     1 2020-03-01 -0.02509255
# 3:     2 2020-01-01 -0.42225588
# 4:     2 2020-02-01          NA
# 5:     2 2020-03-01  0.10807273
# 6:     3 2020-01-01 -0.12235017
# 7:     3 2020-02-01  0.11916096
# 8:     3 2020-03-01 -0.48543524

Completely different approach (assuming Date-class, not strictly required above):

data[,Date := as.Date(Date)]
data[data[, .(Date = do.call(seq, c(as.list(range(Date)), by = "month"))), 
          by = .(Card)],
     on = .(Date, Card)]
#          Date  Card           A
#        <Date> <num>       <num>
# 1: 2020-01-01     2 -0.42225588
# 2: 2020-02-01     2          NA
# 3: 2020-03-01     2  0.10807273
# 4: 2020-01-01     3 -0.12235017
# 5: 2020-02-01     3  0.11916096
# 6: 2020-03-01     3 -0.48543524
# 7: 2020-02-01     1  0.18819303
# 8: 2020-03-01     1 -0.02509255

How to fill in missing dates

Here is a query that would work. Start by cross joining all combinations of dates and users (add filters as needed), then left join the users table and calculate quota using the last_value() function (note that if you are using Snowflake, you must specify "rows between unbounded preceding and current row" as documented here):

with all_dates_users as (
--all combinations of dates and users
select date, user
from dates
cross join (select distinct user_email as user from users)
),
joined as (
--left join users table to the previous
select DU.date, DU.user, U.sent_at, U.user_email, U.score, U.quota
from all_dates_users DU
left join users U on U.sent_at = DU.date and U.user_email = DU.user
)
--calculate quota as previous quota using last_value() function
select date, user, nvl(score, 0) as score, last_value(quota) ignore nulls over (partition by user order by date desc rows between unbounded preceding and current row) as quota
from joined
order by date desc;

kdb - how to augment table with missing dates in a dynamic/fast way

Can you come up with a reproducible example as to why aj doesn't work? Normal aj usage should solve this problem:

t1:([]date:.z.D-til 8;sym:`ABC);
t2:`date xasc([]date:.z.D-0 2 3 6 7;sym:`ABC;data:"I"$ssr[;".";""]each string .z.D-0 2 3 6 7);

q)aj[`sym`date;t1;t2]
date       sym data
-----------------------
2020.07.20 ABC 20200720
2020.07.19 ABC 20200718
2020.07.18 ABC 20200718
2020.07.17 ABC 20200717
2020.07.16 ABC 20200714
2020.07.15 ABC 20200714
2020.07.14 ABC 20200714
2020.07.13 ABC 20200713

/If you need your last date to fill "upwards" then use fills:

update fills data by sym from aj[`sym`date;([]date:.z.D-til 9;sym:`ABC);t2]

Fill in missing dates mySQL

You can join both tables and then group the result by month.

SELECT max(c.name) as name, c.shop, d.date, 
  sum(c.amount) as amount
FROM dates d
LEFT JOIN item_changes c ON c.date <= d.date
GROUP BY c.shop, d.date
ORDER BY c.shop, d.date