Fastest Way for Filling-In Missing Dates for Data.Table

Fastest way for filling-in missing dates for data.table (cont.)

On rolling join, one 'normal' join and some column switching, aaaand you're done :)

temp <- dates[, near.date := dt[dates, x.date, on = .(date=ddate), roll = TRUE, mult = "first"]][]
dt[temp, on = .(date = near.date)][, date := ddate][,ddate := NULL][]

# date gr1 gr2 x
# 1: 2017-01-01 A a 1
# 2: 2017-02-01 A b 2
# 3: 2017-02-01 B a 4
# 4: 2017-03-01 A b 2
# 5: 2017-03-01 B a 4
# 6: 2017-04-01 B a 5
# 7: 2017-05-01 A b 3
# 8: 2017-06-01 A b 3

You can (of course) make it a one-liner by integrating the first row into the last.

Fastest way for filling-in missing dates for data.table

Not sure if it's the fastest, but it'll work if there are no NAs in the data:

# just in case these aren't Dates. 
NADayWiseOrders$date <- as.Date(NADayWiseOrders$date)
# all desired dates.
alldates <- data.table(date=seq.Date(min(NADayWiseOrders$date), max(NADayWiseOrders$date), by="day"))
# merge
dt <- merge(NADayWiseOrders, alldates, by="date", all=TRUE)
# now carry forward last observation (alternatively, set NA's to 0)
require(xts)
na.locf(dt)

How to add missing dates and populate values inside groups with 'data.table'

Another option using a rolling join.

setkey(DT, g1, g2, date_obs)

out <- DT[DT[, .(date_obs = seq(first(date_obs),
last(date_obs),
by = "day")), by=.(g1, g2)],
on=.(g1, g2, date_obs),
roll = TRUE]
out
# g1 g2 status date_obs
# 1: 1 1 0 2019-01-02
# 2: 1 2 1 2019-01-01
# 3: 1 2 1 2019-01-02
# 4: 1 2 1 2019-01-03
# 5: 1 2 1 2019-01-04
# 6: 1 2 1 2019-01-05
# 7: 1 2 1 2019-01-06
# 8: 1 2 1 2019-01-07
# 9: 1 2 1 2019-01-08
#10: 1 2 1 2019-01-09
#11: 1 2 1 2019-01-10
#12: 1 2 1 2019-01-11
#13: 1 2 1 2019-01-12
#14: 1 2 1 2019-01-13
#15: 1 2 1 2019-01-14
#16: 1 2 1 2019-01-15
#17: 1 2 1 2019-01-16
#18: 1 2 1 2019-01-17
#19: 1 2 1 2019-01-18
#20: 1 2 1 2019-01-19
#21: 1 2 1 2019-01-20
#22: 1 2 1 2019-01-21
#23: 1 2 1 2019-01-22
#24: 1 2 1 2019-01-23
#25: 1 2 1 2019-01-24
#26: 1 2 1 2019-01-25
#27: 1 2 1 2019-01-26
#28: 1 2 1 2019-01-27
#29: 1 2 1 2019-01-28
#30: 1 2 1 2019-01-29
#31: 1 2 1 2019-01-30
#32: 1 3 0 2019-01-15
#33: 2 1 0 2019-01-20
#34: 2 1 0 2019-01-21
#35: 2 1 0 2019-01-22
#36: 2 1 0 2019-01-23
#37: 2 1 1 2019-01-24
#38: 2 3 1 2019-01-12
# g1 g2 status date_obs

data

DT <- data.table(
g1 = c(1, 1, 2, 1, 2, 2, 1),
g2 = c(2, 1, 3, 3, 1, 1, 2),
status = c(1, 0, 1, 0, 0, 1, 1),
date_obs = as.Date(c("2019-01-01", "2019-01-02", "2019-01-12", "2019-01-15",
"2019-01-20", "2019-01-24", "2019-01-30")))

Quickest way to fill in missing dates in a sequence? SAS

You could just add an OUTPUT statement in a DO loop. The tricky part is getting the next date. Here is a method using a second SET statement that is offset by one observation.

data want;
set have ;
by date;
set have(firstobs=2 keep=date rename=(date=next_date)) have(obs=0 drop=_all_);
next_date = coalesce(next_date,date);
do date=date to next_date;
output;
end;
run;

But your real data probably has multiple stocks. So add some BY group processing.

data want;
set have ;
by stock date;
set have(firstobs=2 keep=date rename=(date=next_date)) have(obs=0 drop=_all_);
if last.stock the next_date=date;
do date=date to next_date;
output;
end;
run;

Is there a data.table way of filling in gaps of time periods?

Just data.table:

If no key is set, then

data2 <- data[CJ(Date, Card, unique = TRUE), on = .(Date, Card)]
data2
# Date Card A
# <char> <num> <num>
# 1: 2020-01-01 1 1.37095845
# 2: 2020-01-01 2 -0.56469817
# 3: 2020-01-01 3 0.36312841
# 4: 2020-02-01 1 0.63286260
# 5: 2020-02-01 2 NA
# 6: 2020-02-01 3 0.40426832
# 7: 2020-03-01 1 -0.10612452
# 8: 2020-03-01 2 1.51152200
# 9: 2020-03-01 3 -0.09465904

(updated/simplified, thanks to @sindri_baldur!)

If a key is set, then you can use @Frank's method:

data2 <- data[ do.call(CJ, c(mget(key(data)), unique = TRUE)), ]

And from here, you can use nafill as desired, perhaps

data2[, A := nafill(A, type = "locf"), by = .(Card)]
# Date Card A
# <char> <num> <num>
# 1: 2020-01-01 1 1.37095845
# 2: 2020-01-01 2 -0.56469817
# 3: 2020-01-01 3 0.36312841
# 4: 2020-02-01 1 0.63286260
# 5: 2020-02-01 2 -0.56469817
# 6: 2020-02-01 3 0.40426832
# 7: 2020-03-01 1 -0.10612452
# 8: 2020-03-01 2 1.51152200
# 9: 2020-03-01 3 -0.09465904

(How to fill is based on your knowledge of the context of the data; it might just as easily be by=.(Date), or some form of imputation.)


Update: the above does an expansion of possible combinations, which might fill outside of a particular Card's span, in which case one might see:

data <- data[-1,]
data[CJ(Date, Card, unique = TRUE), on = .(Date, Card)]
# Date Card A
# <char> <num> <num>
# 1: 2020-01-01 1 NA
# 2: 2020-01-01 2 -0.42225588
# 3: 2020-01-01 3 -0.12235017
# 4: 2020-02-01 1 0.18819303
# 5: 2020-02-01 2 NA
# 6: 2020-02-01 3 0.11916096
# 7: 2020-03-01 1 -0.02509255
# 8: 2020-03-01 2 0.10807273
# 9: 2020-03-01 3 -0.48543524

I think there are two approaches to this:

  1. Doing the above code and then removing leading (and trailing) NAs per group:

    data[CJ(Date, Card, unique = TRUE), on = .(Date, Card)
    ][, .SD[ !is.na(A) | !seq_len(.N) %in% c(1, .N),], by = Card]
    # Card Date A
    # <num> <char> <num>
    # 1: 1 2020-02-01 0.18819303
    # 2: 1 2020-03-01 -0.02509255
    # 3: 2 2020-01-01 -0.42225588
    # 4: 2 2020-02-01 NA
    # 5: 2 2020-03-01 0.10807273
    # 6: 3 2020-01-01 -0.12235017
    # 7: 3 2020-02-01 0.11916096
    # 8: 3 2020-03-01 -0.48543524
  2. Completely different approach (assuming Date-class, not strictly required above):

    data[,Date := as.Date(Date)]
    data[data[, .(Date = do.call(seq, c(as.list(range(Date)), by = "month"))),
    by = .(Card)],
    on = .(Date, Card)]
    # Date Card A
    # <Date> <num> <num>
    # 1: 2020-01-01 2 -0.42225588
    # 2: 2020-02-01 2 NA
    # 3: 2020-03-01 2 0.10807273
    # 4: 2020-01-01 3 -0.12235017
    # 5: 2020-02-01 3 0.11916096
    # 6: 2020-03-01 3 -0.48543524
    # 7: 2020-02-01 1 0.18819303
    # 8: 2020-03-01 1 -0.02509255

How to fill in missing dates

Here is a query that would work. Start by cross joining all combinations of dates and users (add filters as needed), then left join the users table and calculate quota using the last_value() function (note that if you are using Snowflake, you must specify "rows between unbounded preceding and current row" as documented here):

with all_dates_users as (
--all combinations of dates and users
select date, user
from dates
cross join (select distinct user_email as user from users)
),
joined as (
--left join users table to the previous
select DU.date, DU.user, U.sent_at, U.user_email, U.score, U.quota
from all_dates_users DU
left join users U on U.sent_at = DU.date and U.user_email = DU.user
)
--calculate quota as previous quota using last_value() function
select date, user, nvl(score, 0) as score, last_value(quota) ignore nulls over (partition by user order by date desc rows between unbounded preceding and current row) as quota
from joined
order by date desc;

kdb - how to augment table with missing dates in a dynamic/fast way

Can you come up with a reproducible example as to why aj doesn't work? Normal aj usage should solve this problem:

t1:([]date:.z.D-til 8;sym:`ABC);
t2:`date xasc([]date:.z.D-0 2 3 6 7;sym:`ABC;data:"I"$ssr[;".";""]each string .z.D-0 2 3 6 7);

q)aj[`sym`date;t1;t2]
date sym data
-----------------------
2020.07.20 ABC 20200720
2020.07.19 ABC 20200718
2020.07.18 ABC 20200718
2020.07.17 ABC 20200717
2020.07.16 ABC 20200714
2020.07.15 ABC 20200714
2020.07.14 ABC 20200714
2020.07.13 ABC 20200713

/If you need your last date to fill "upwards" then use fills:

update fills data by sym from aj[`sym`date;([]date:.z.D-til 9;sym:`ABC);t2]

Fill in missing dates mySQL

You can join both tables and then group the result by month.

SELECT max(c.name) as name, c.shop, d.date, 
sum(c.amount) as amount
FROM dates d
LEFT JOIN item_changes c ON c.date <= d.date
GROUP BY c.shop, d.date
ORDER BY c.shop, d.date


Related Topics



Leave a reply



Submit