Fastest way for filling-in missing dates for data.table (cont.)
On rolling join, one 'normal' join and some column switching, aaaand you're done :)
temp <- dates[, near.date := dt[dates, x.date, on = .(date=ddate), roll = TRUE, mult = "first"]][]
dt[temp, on = .(date = near.date)][, date := ddate][,ddate := NULL][]
# date gr1 gr2 x
# 1: 2017-01-01 A a 1
# 2: 2017-02-01 A b 2
# 3: 2017-02-01 B a 4
# 4: 2017-03-01 A b 2
# 5: 2017-03-01 B a 4
# 6: 2017-04-01 B a 5
# 7: 2017-05-01 A b 3
# 8: 2017-06-01 A b 3
You can (of course) make it a one-liner by integrating the first row into the last.
Fastest way for filling-in missing dates for data.table
Not sure if it's the fastest, but it'll work if there are no NA
s in the data:
# just in case these aren't Dates.
NADayWiseOrders$date <- as.Date(NADayWiseOrders$date)
# all desired dates.
alldates <- data.table(date=seq.Date(min(NADayWiseOrders$date), max(NADayWiseOrders$date), by="day"))
# merge
dt <- merge(NADayWiseOrders, alldates, by="date", all=TRUE)
# now carry forward last observation (alternatively, set NA's to 0)
require(xts)
na.locf(dt)
How to add missing dates and populate values inside groups with 'data.table'
Another option using a rolling join.
setkey(DT, g1, g2, date_obs)
out <- DT[DT[, .(date_obs = seq(first(date_obs),
last(date_obs),
by = "day")), by=.(g1, g2)],
on=.(g1, g2, date_obs),
roll = TRUE]
out
# g1 g2 status date_obs
# 1: 1 1 0 2019-01-02
# 2: 1 2 1 2019-01-01
# 3: 1 2 1 2019-01-02
# 4: 1 2 1 2019-01-03
# 5: 1 2 1 2019-01-04
# 6: 1 2 1 2019-01-05
# 7: 1 2 1 2019-01-06
# 8: 1 2 1 2019-01-07
# 9: 1 2 1 2019-01-08
#10: 1 2 1 2019-01-09
#11: 1 2 1 2019-01-10
#12: 1 2 1 2019-01-11
#13: 1 2 1 2019-01-12
#14: 1 2 1 2019-01-13
#15: 1 2 1 2019-01-14
#16: 1 2 1 2019-01-15
#17: 1 2 1 2019-01-16
#18: 1 2 1 2019-01-17
#19: 1 2 1 2019-01-18
#20: 1 2 1 2019-01-19
#21: 1 2 1 2019-01-20
#22: 1 2 1 2019-01-21
#23: 1 2 1 2019-01-22
#24: 1 2 1 2019-01-23
#25: 1 2 1 2019-01-24
#26: 1 2 1 2019-01-25
#27: 1 2 1 2019-01-26
#28: 1 2 1 2019-01-27
#29: 1 2 1 2019-01-28
#30: 1 2 1 2019-01-29
#31: 1 2 1 2019-01-30
#32: 1 3 0 2019-01-15
#33: 2 1 0 2019-01-20
#34: 2 1 0 2019-01-21
#35: 2 1 0 2019-01-22
#36: 2 1 0 2019-01-23
#37: 2 1 1 2019-01-24
#38: 2 3 1 2019-01-12
# g1 g2 status date_obs
data
DT <- data.table(
g1 = c(1, 1, 2, 1, 2, 2, 1),
g2 = c(2, 1, 3, 3, 1, 1, 2),
status = c(1, 0, 1, 0, 0, 1, 1),
date_obs = as.Date(c("2019-01-01", "2019-01-02", "2019-01-12", "2019-01-15",
"2019-01-20", "2019-01-24", "2019-01-30")))
Quickest way to fill in missing dates in a sequence? SAS
You could just add an OUTPUT statement in a DO loop. The tricky part is getting the next date. Here is a method using a second SET statement that is offset by one observation.
data want;
set have ;
by date;
set have(firstobs=2 keep=date rename=(date=next_date)) have(obs=0 drop=_all_);
next_date = coalesce(next_date,date);
do date=date to next_date;
output;
end;
run;
But your real data probably has multiple stocks. So add some BY group processing.
data want;
set have ;
by stock date;
set have(firstobs=2 keep=date rename=(date=next_date)) have(obs=0 drop=_all_);
if last.stock the next_date=date;
do date=date to next_date;
output;
end;
run;
Is there a data.table way of filling in gaps of time periods?
Just data.table
:
If no key is set, then
data2 <- data[CJ(Date, Card, unique = TRUE), on = .(Date, Card)]
data2
# Date Card A
# <char> <num> <num>
# 1: 2020-01-01 1 1.37095845
# 2: 2020-01-01 2 -0.56469817
# 3: 2020-01-01 3 0.36312841
# 4: 2020-02-01 1 0.63286260
# 5: 2020-02-01 2 NA
# 6: 2020-02-01 3 0.40426832
# 7: 2020-03-01 1 -0.10612452
# 8: 2020-03-01 2 1.51152200
# 9: 2020-03-01 3 -0.09465904
(updated/simplified, thanks to @sindri_baldur!)
If a key is set, then you can use @Frank's method:
data2 <- data[ do.call(CJ, c(mget(key(data)), unique = TRUE)), ]
And from here, you can use nafill
as desired, perhaps
data2[, A := nafill(A, type = "locf"), by = .(Card)]
# Date Card A
# <char> <num> <num>
# 1: 2020-01-01 1 1.37095845
# 2: 2020-01-01 2 -0.56469817
# 3: 2020-01-01 3 0.36312841
# 4: 2020-02-01 1 0.63286260
# 5: 2020-02-01 2 -0.56469817
# 6: 2020-02-01 3 0.40426832
# 7: 2020-03-01 1 -0.10612452
# 8: 2020-03-01 2 1.51152200
# 9: 2020-03-01 3 -0.09465904
(How to fill is based on your knowledge of the context of the data; it might just as easily be by=.(Date)
, or some form of imputation.)
Update: the above does an expansion of possible combinations, which might fill outside of a particular Card
's span, in which case one might see:
data <- data[-1,]
data[CJ(Date, Card, unique = TRUE), on = .(Date, Card)]
# Date Card A
# <char> <num> <num>
# 1: 2020-01-01 1 NA
# 2: 2020-01-01 2 -0.42225588
# 3: 2020-01-01 3 -0.12235017
# 4: 2020-02-01 1 0.18819303
# 5: 2020-02-01 2 NA
# 6: 2020-02-01 3 0.11916096
# 7: 2020-03-01 1 -0.02509255
# 8: 2020-03-01 2 0.10807273
# 9: 2020-03-01 3 -0.48543524
I think there are two approaches to this:
Doing the above code and then removing leading (and trailing)
NA
s per group:data[CJ(Date, Card, unique = TRUE), on = .(Date, Card)
][, .SD[ !is.na(A) | !seq_len(.N) %in% c(1, .N),], by = Card]
# Card Date A
# <num> <char> <num>
# 1: 1 2020-02-01 0.18819303
# 2: 1 2020-03-01 -0.02509255
# 3: 2 2020-01-01 -0.42225588
# 4: 2 2020-02-01 NA
# 5: 2 2020-03-01 0.10807273
# 6: 3 2020-01-01 -0.12235017
# 7: 3 2020-02-01 0.11916096
# 8: 3 2020-03-01 -0.48543524Completely different approach (assuming
Date
-class, not strictly required above):data[,Date := as.Date(Date)]
data[data[, .(Date = do.call(seq, c(as.list(range(Date)), by = "month"))),
by = .(Card)],
on = .(Date, Card)]
# Date Card A
# <Date> <num> <num>
# 1: 2020-01-01 2 -0.42225588
# 2: 2020-02-01 2 NA
# 3: 2020-03-01 2 0.10807273
# 4: 2020-01-01 3 -0.12235017
# 5: 2020-02-01 3 0.11916096
# 6: 2020-03-01 3 -0.48543524
# 7: 2020-02-01 1 0.18819303
# 8: 2020-03-01 1 -0.02509255
How to fill in missing dates
Here is a query that would work. Start by cross joining all combinations of dates and users (add filters as needed), then left join the users table and calculate quota using the last_value() function (note that if you are using Snowflake, you must specify "rows between unbounded preceding and current row" as documented here):
with all_dates_users as (
--all combinations of dates and users
select date, user
from dates
cross join (select distinct user_email as user from users)
),
joined as (
--left join users table to the previous
select DU.date, DU.user, U.sent_at, U.user_email, U.score, U.quota
from all_dates_users DU
left join users U on U.sent_at = DU.date and U.user_email = DU.user
)
--calculate quota as previous quota using last_value() function
select date, user, nvl(score, 0) as score, last_value(quota) ignore nulls over (partition by user order by date desc rows between unbounded preceding and current row) as quota
from joined
order by date desc;
kdb - how to augment table with missing dates in a dynamic/fast way
Can you come up with a reproducible example as to why aj
doesn't work? Normal aj
usage should solve this problem:
t1:([]date:.z.D-til 8;sym:`ABC);
t2:`date xasc([]date:.z.D-0 2 3 6 7;sym:`ABC;data:"I"$ssr[;".";""]each string .z.D-0 2 3 6 7);
q)aj[`sym`date;t1;t2]
date sym data
-----------------------
2020.07.20 ABC 20200720
2020.07.19 ABC 20200718
2020.07.18 ABC 20200718
2020.07.17 ABC 20200717
2020.07.16 ABC 20200714
2020.07.15 ABC 20200714
2020.07.14 ABC 20200714
2020.07.13 ABC 20200713
/If you need your last date to fill "upwards" then use fills:
update fills data by sym from aj[`sym`date;([]date:.z.D-til 9;sym:`ABC);t2]
Fill in missing dates mySQL
You can join both tables and then group the result by month.
SELECT max(c.name) as name, c.shop, d.date,
sum(c.amount) as amount
FROM dates d
LEFT JOIN item_changes c ON c.date <= d.date
GROUP BY c.shop, d.date
ORDER BY c.shop, d.date
Related Topics
R Install Package Loaded Namespace
How to Do Conditional Grouping of Data in R
Convert Comma Separated String to Integer in R
R: How to Draw a Line with Multiple Arrows in It
Selection of Activity Trace in a Chart and Display in a Data Table in R Shiny
Developing Geographic Thematic Maps with R
Change Size of Axes Title and Labels in Ggplot2
Print String and Variable Contents on the Same Line in R
Remove Data.Frame Row Names When Using Xtable
Transform Only One Axis to Log10 Scale with Ggplot2
How to Merge Two Columns in R with a Specific Symbol
How to Find Row Number of a Value in R Code