Fill In The Date Gaps With Date Table
The problem is that you need all customers for all dates. When you do the left outer join
, you are getting NULL for the customer field.
The following sets up a driver table by cross join
ing the customer names and dates:
SELECT driver.customer, driver.fulldate, o.amount
FROM (select d.fulldate, customer
from datetable d cross join
(select customer
from orders
where year(orderdate) in (2012)
) o
where d.calendaryear IN ( 2012 )
) driver LEFT OUTER JOIN
orders o
ON driver.fulldate = o.orderdate and
driver.customer = o.customer;
Note that this version assumes that calendaryear
is the same as year(orderdate)
.
Google BigQuery SQL: How to fill in gaps in a table with dates?
Consider below
with temp as (
select customer, dates from (
select customer, min(dates) min_date, max(dates) max_date
from `project.dataset.table`
group by customer
), unnest(generate_date_array(min_date, max_date)) dates
)
select customer, dates,
first_value(subscription ignore nulls) over win as subscription
from temp a
left join `project.dataset.table` b
using(customer, dates)
window win as (partition by customer order by dates desc rows between current row and unbounded following)
# order by dates, customer
If to apply to sample data in y our question - output is
How to fill in missing dates
Here is a query that would work. Start by cross joining all combinations of dates and users (add filters as needed), then left join the users table and calculate quota using the last_value() function (note that if you are using Snowflake, you must specify "rows between unbounded preceding and current row" as documented here):
with all_dates_users as (
--all combinations of dates and users
select date, user
from dates
cross join (select distinct user_email as user from users)
),
joined as (
--left join users table to the previous
select DU.date, DU.user, U.sent_at, U.user_email, U.score, U.quota
from all_dates_users DU
left join users U on U.sent_at = DU.date and U.user_email = DU.user
)
--calculate quota as previous quota using last_value() function
select date, user, nvl(score, 0) as score, last_value(quota) ignore nulls over (partition by user order by date desc rows between unbounded preceding and current row) as quota
from joined
order by date desc;
Fill missing gaps in data using a date column
Because SQL Server does not support IGNORE NULLS
in LAG()
this is a bit tricky. I would go for a recursive subquery of the form:
with cte as (
select price, date, dateadd(day, -1, lead(date) over (order by date)) as last_date
from t
union all
select price, dateadd(day, 1, date), last_date
from cte
where date < last_date
)
select price, date
from cte
order by date;
Here is a db<>fiddle.
In SQL Server 2008, you can replace the lead()
with:
with cte as (
select price, date,
(select min(date)
from t t2
where t2.date > t.date
) as last_date
from t
union all
select price, dateadd(day, 1, date), last_date
from cte
where date < last_date
)
select price, date
from cte
order by date;
Fill gaps in SQL Server dates ranges
I don't think this is that complicated. If you expand the periods into individual dates and do a left join
, then this becomes a gaps-and-islands problem:
with dates as (
select periodid, periodstart as dte, periodend
from periods
union all
select periodid, dateadd(day, 1, dte), periodend
from dates
where dte < periodend
)
select userid, activityid, min(dte), max(dte)
from (select d.dte, d.periodid, u.userid, a.activityid,
row_number() over (partition by u.userid, a.activityid order by d.dte) as seqnum
from dates d cross join
(select distinct userid from activities) u left join
activities a
on a.userid = u.userid and
a.activitystart <= d.dte and a.activityend >= d.dte
) da
group by userid, activityid, periodid, dateadd(day, -seqnum, dte)
order by userid, min(dte);
Here is a db<>fiddle.
Note: This produces results for all users and all periods -- which seems reasonable given your description. It is pretty simple to modify to filter out users with no activity during a given period.
Also, this does not go to the end of the month. Instead, it includes the complete periods. I don't see why months would play into this -- except to confuse matters -- consider if two periods have days in the same month, for instance.
View to fill date gaps but only if no records for a given date
You can use a table-valued function, which will allow you to use parameters, so there is no need to hard-code them.
The table returned includes an extra column Ins
; it is 0 for the original data and 1 for the inserted.
Rextester demo
If you prefer a view you can easily extract the logic.
create function myFun(@start date, @end date)
returns @result table (MyName varchar(10), MyDate date, Ins bit) as
begin
;
with dates as (
SELECT DATEADD(DAY, nbr - 1, @start) dt
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY c.object_id ) AS Nbr
FROM sys.columns c
) nbrs
WHERE nbr - 1 <= DATEDIFF(DAY, @start, @end)
), last_avail as (
select dates.dt, max(t.MyDate) prev_dt
from dates join t on dates.dt >= t.MyDate
group by dates.dt
), empty as (
select * from last_avail where dt <> prev_dt
)
insert @result (MyName, MyDate, Ins)
select MyName, MyDate, 0 from t
union all
select t.MyName, x.dt, 1
from empty x join t on x.prev_dt = t.MyDate;
return;
end
Usage example:
declare @start date, @end date;
set @start = '2017-04-08';
set @end = '2017-04-11';
select * from dbo.myFun(@start, @end) order by MyDate desc, MyName asc;
Attribution
The Common Table Expression which plays the role of a calendar table has been found here.
Fill all gaps between starting and ending dates with dplyr
library(dplyr)
user_df %>%
arrange(start_date) %>%
group_by(person_id) %>%
mutate(nextstart = lead(start_date)) %>%
filter(end_date < nextstart) %>%
mutate(start_date = end_date, end_date = nextstart, unemployed = 1L) %>%
select(-nextstart) %>%
bind_rows(mutate(user_df, unemployed = 0L)) %>%
arrange(person_id, start_date) %>%
ungroup()
# # A tibble: 14 x 6
# person_id job_number job_type start_date end_date unemployed
# <int> <int> <chr> <chr> <chr> <int>
# 1 1 1 B 2012-11-01 2014-01-01 0
# 2 1 1 B 2014-01-01 2016-02-01 1
# 3 1 2 A 2016-02-01 2016-10-01 0
# 4 1 2 A 2016-10-01 2016-12-01 1
# 5 1 3 A 2016-12-01 2020-01-01 0
# 6 1 4 B 2020-01-01 2021-01-01 0
# 7 2 1 A 2011-03-01 2012-08-01 0
# 8 2 1 A 2012-08-01 2013-01-01 1
# 9 2 2 B 2013-01-01 2020-01-01 0
# 10 2 3 A 2020-01-01 2021-01-01 0
# 11 2 4 B 2021-01-01 2021-01-17 0
# 12 3 1 A 2005-03-01 2011-03-01 0
# 13 3 1 A 2011-03-01 2012-01-01 1
# 14 3 2 B 2012-01-01 2014-01-01 0
Technically, this is comparing by the alphabetic sort of dates; in this case, its effect is the same (the format is good for that) though it'll be slightly less efficient (integer/numeric sorting is faster than alphabetic sorting).
This works by first creating and then capture just the unemployed periods of time,
user_df %>%
arrange(start_date) %>%
group_by(person_id) %>%
mutate(nextstart = lead(start_date)) %>%
filter(end_date < nextstart)
# # A tibble: 4 x 6
# # Groups: person_id [3]
# person_id job_number job_type start_date end_date nextstart
# <int> <int> <chr> <chr> <chr> <chr>
# 1 3 1 A 2005-03-01 2011-03-01 2012-01-01
# 2 2 1 A 2011-03-01 2012-08-01 2013-01-01
# 3 1 1 B 2012-11-01 2014-01-01 2016-02-01
# 4 1 2 A 2016-02-01 2016-10-01 2016-12-01
then shifting the variables, then adding unemployed
, and then finally returning it to the original dataset. In this case, I added unemployed
to the original mid-bind_rows
; where to do this is mostly preference.
Related Topics
Sqlite Equivalent of Postgresql's Greatest Function
Dynamic Column in Select Statement Postgres
Self-Referencing Constraint in Ms SQL
How to Substitute a String If Record Is Null in T-Sql
How to Count the Number of Words in a String in Oracle
Insert Deleted Values into a Table Before Delete with a Delete Trigger
Why Does Nvl Always Evaluate 2Nd Parameter
Linq Orderby. Does It Always Return the Same Ordered List
Strip Non-Numeric Characters from a String
Transpose Rows into Columns in SQL Server 2008 R2
Microsoft Access Query Should Return True or True and False, Only Returns True
Find Min and Max for Subsets of Consecutive Rows - Gaps and Islands
How to Get the Nth Row in a SQL Server Table
Why Does Running This Query with Execute Immediate Cause It to Fail