How to Fill Date Gaps in MySQL

MySQL how to fill missing dates in range?

MySQL versions before 8.0 have no recursive queries (recursive CTEs only arrived in 8.0), so on those versions you're left with the NUMBERS table trick -

  1. Create a table that only holds incrementing numbers - easy to do using an auto_increment:

    DROP TABLE IF EXISTS `example`.`numbers`;
    CREATE TABLE `example`.`numbers` (
    `id` int(10) unsigned NOT NULL auto_increment,
    PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

  2. Populate the table using:

    INSERT INTO `example`.`numbers`
    ( `id` )
    VALUES
    ( NULL )

    ...for as many values as you need.

  3. Use DATE_ADD to construct a list of dates, increasing the days based on the NUMBERS.id value. Replace "2010-06-06" and "2010-06-14" with your respective start and end dates (but use the same format, YYYY-MM-DD) -

    SELECT `x`.*
    FROM (SELECT DATE_ADD('2010-06-06', INTERVAL `n`.`id` - 1 DAY)
    FROM `numbers` `n`
    WHERE DATE_ADD('2010-06-06', INTERVAL `n`.`id` - 1 DAY) <= '2010-06-14' ) x

  4. LEFT JOIN onto your table of data based on the date (`your_table` stands in for your own table name). Keep `ts` as a real date inside the subquery so the join compares dates with dates, and format it only in the outer SELECT:

    SELECT DATE_FORMAT(`x`.`ts`, '%m/%d/%Y') AS `timestamp`,
           COALESCE(`y`.`score`, 0) AS `cnt`
    FROM (SELECT DATE_ADD('2010-06-06', INTERVAL `n`.`id` - 1 DAY) AS `ts`
          FROM `numbers` `n`
          WHERE DATE_ADD('2010-06-06', INTERVAL `n`.`id` - 1 DAY) <= '2010-06-14') x
    LEFT JOIN `your_table` `y` ON STR_TO_DATE(`y`.`date`, '%d.%m.%Y') = `x`.`ts`

If you want to maintain the date format, use the DATE_FORMAT function:

DATE_FORMAT(`x`.`ts`, '%d.%m.%Y') AS `timestamp`
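
On MySQL 8.0 or later, the same date list can be produced without a NUMBERS table. The following is only a sketch under that version assumption: `your_table` is a placeholder for your data table, and its `date` column is assumed to hold 'DD.MM.YYYY' text as in the query above.

```sql
-- Assumes MySQL 8.0+ (WITH RECURSIVE is not available in 5.x).
-- `your_table`, `date` and `score` are placeholders taken from the example above.
WITH RECURSIVE days (ts) AS (
    SELECT DATE('2010-06-06')              -- start date
    UNION ALL
    SELECT ts + INTERVAL 1 DAY             -- next day
    FROM days
    WHERE ts < '2010-06-14'                -- stop once the end date is reached
)
SELECT d.ts AS `timestamp`,
       COALESCE(y.score, 0) AS cnt        -- 0 for days with no row
FROM days d
LEFT JOIN your_table y
       ON STR_TO_DATE(y.`date`, '%d.%m.%Y') = d.ts
ORDER BY d.ts;
```

Ranges longer than the server's recursion limit (cte_max_recursion_depth, 1000 by default) need that limit raised first.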

How to fill in missing dates

Here is a query that would work. Start by cross joining all combinations of dates and users (add filters as needed), then left join the users table onto that result and fill in quota using the last_value() function. Note that if you are using Snowflake, you must spell out the window frame as "rows between unbounded preceding and current row":

with all_dates_users as (
--all combinations of dates and users
select date, user
from dates
cross join (select distinct user_email as user from users)
),
joined as (
--left join users table to the previous
select DU.date, DU.user, U.sent_at, U.user_email, U.score, U.quota
from all_dates_users DU
left join users U on U.sent_at = DU.date and U.user_email = DU.user
)
--calculate quota as previous quota using last_value() function
select date, user, nvl(score, 0) as score, last_value(quota) ignore nulls over (partition by user order by date desc rows between unbounded preceding and current row) as quota
from joined
order by date desc;

Fill missing dates in mysql query range

The best way that I've found to do this is to simply create (and maintain) a secondary table with a single column, containing all of the dates that you care about. Something like:

CREATE TABLE date_join (
date date not null primary key
);

Then insert records for each date in whatever way is convenient (by hand, if it's a one-off, as part of your daily process, via stored procedure, etc).
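
If the server is MySQL 8.0 or later, one convenient way to fill date_join is a recursive CTE. This is a sketch rather than the only option; the date range below is an arbitrary assumption, so adjust it to the period you report on.

```sql
-- Assumes MySQL 8.0+. The range 2016-01-01 .. 2025-12-31 is illustrative only.
-- The default recursion limit is 1000 rows, so raise it for longer ranges.
SET SESSION cte_max_recursion_depth = 5000;

INSERT INTO date_join (`date`)
WITH RECURSIVE d (dt) AS (
    SELECT DATE('2016-01-01')
    UNION ALL
    SELECT dt + INTERVAL 1 DAY
    FROM d
    WHERE dt < '2025-12-31'
)
SELECT dt FROM d;
```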

At that point, it's simply a left join of date_join and your initial query, with COALESCE to translate NULLs to 0s:

SELECT dj.date, COALESCE(q.numar, 0) AS numar
FROM date_join dj
LEFT JOIN (select date(updated_at) as date, COUNT(id) as numar
from `coupons`
where `user_id` = 5 and `won_by` != 0 and `updated_at` >= '2016-04-01'
group by DATE(updated_at)
) q
ON dj.date = q.date
ORDER BY dj.date;

MySQL - fill missing dates

In general, you can generate a series of N integers in MySQL using:

    select (@i:=@i+1)-1 as `myval` from someTable,(SELECT @i:=0) gen_sub limit N

Note that the table that you join on (someTable) must have at least N rows. The -1 above is to make it base-zero... remove it and you'll get 1,2,3 for N=3.

You can feed those integers into the DATE_ADD function to turn it into a series of dates. To make it easier to follow, let's use some user variables for the dates.

SET @date_min = '2016-03-04';
SET @date_max = '2016-03-10';

select DATE_ADD(@date_min, INTERVAL (@i:=@i+1)-1 DAY) as `date`
from information_schema.columns,(SELECT @i:=0) gen_sub
where DATE_ADD(@date_min,INTERVAL @i DAY) BETWEEN @date_min AND @date_max

That will return rows for those days and every day between them. Now it's just a matter of joining against your table... I haven't tried it since I don't have your db structure, but something like the following should work:

SET @date_min = '2016-03-04';
SET @date_max = '2016-03-10';

SELECT
date_generator.date,
ifnull(SUM(val1),0) as sum_val
from (
select DATE_ADD(@date_min, INTERVAL (@i:=@i+1)-1 DAY) as `date`
from information_schema.columns,(SELECT @i:=0) gen_sub
where DATE_ADD(@date_min,INTERVAL @i DAY) BETWEEN @date_min AND @date_max
) date_generator
left join table1 on table1.date = date_generator.date
GROUP BY date_generator.date;
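
The integer series can also be built without user variables and without depending on another table's row count, which may be preferable on recent MySQL 8.0 releases where assigning variables inside a SELECT is deprecated. The sketch below uses the common "digits" cross join; the aliases `ones`, `tens` and `nums` are made up for illustration, and it covers at most 100 days.

```sql
SET @date_min = '2016-03-04';
SET @date_max = '2016-03-10';

-- nums.n runs from 0 to 99: one digit table for the ones, one for the tens.
SELECT DATE_ADD(@date_min, INTERVAL nums.n DAY) AS `date`
FROM (
    SELECT ones.d + 10 * tens.d AS n
    FROM (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
          UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
          UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) ones
    CROSS JOIN
         (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
          UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
          UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) tens
) nums
WHERE DATE_ADD(@date_min, INTERVAL nums.n DAY) <= @date_max
ORDER BY `date`;
```

Adding a third digit table (multiplied by 100) extends the range to 1000 days.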

Building report with date gaps

On days with no sales there is no sales row, so sales.vendor_id is NULL and there is nothing to JOIN to vendors. To make sure you get all the vendors on all the dates, you need to do a CROSS JOIN from date_generator to vendors, and then LEFT JOIN to sales. Then you need to take the SUM of sales.price and GROUP BY the date and the vendor to get your desired output:

SELECT date_generator.daily_date, vendors.name, SUM(sales.price)
FROM (
SELECT DATE_ADD(@startDate, INTERVAL (@i:=@i + 1) - 1 DAY) AS daily_date
FROM information_schema.columns, (SELECT @i:=0) gen_sub
WHERE DATE_ADD(@startDate, INTERVAL @i DAY) BETWEEN @startDate AND @endDate
) date_generator
CROSS JOIN vendors
LEFT JOIN sales ON DATE(sales.date) = DATE(date_generator.daily_date) AND vendors.id = sales.vendor_id
GROUP BY date_generator.daily_date, vendors.name

Fill missing gaps in data using a date column

Because SQL Server does not support IGNORE NULLS in LAG(), this is a bit tricky. I would go for a recursive CTE of the form:

with cte as (
select price, date, dateadd(day, -1, lead(date) over (order by date)) as last_date
from t
union all
select price, dateadd(day, 1, date), last_date
from cte
where date < last_date
)
select price, date
from cte
order by date;

In SQL Server 2008, you can replace the lead() with:

with cte as (
select price, date,
(select min(date)
from t t2
where t2.date > t.date
) as last_date
from t
union all
select price, dateadd(day, 1, date), last_date
from cte
where date < last_date
)
select price, date
from cte
order by date;
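
One caveat with either version: SQL Server stops a recursive CTE after 100 recursion levels by default, so a gap of more than about 100 days would raise an error. A hedged sketch of the first query with that limit lifted (table `t` and its columns are the ones used above):

```sql
with cte as (
    select price, date,
           dateadd(day, -1, lead(date) over (order by date)) as last_date
    from t
    union all
    select price, dateadd(day, 1, date), last_date
    from cte
    where date < last_date
)
select price, date
from cte
order by date
option (maxrecursion 0);  -- 0 removes the limit; the default is 100
```

Setting an explicit upper bound instead of 0 is safer if the data could contain a bad date that makes the gap enormous.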

Filling Month and Year Gaps in Data

You could try changing your temp_months table to include year like so:

create table temp_months (yr int, mth int, primary key (yr, mth));
insert into temp_months values
(2020, 1), (2020, 2), (2020, 3), (2020, 4), (2020, 5), (2020, 6),
(2019, 7), (2019, 8), (2019, 9), (2019, 10), (2019, 11), (2019, 12);

Assuming your my_table was like so,

create table my_table (created_date date, company_id int, id int);
insert into my_table values
('2020-05-01', 123456, 1),
('2020-02-01', 123456, 1),('2020-02-01', 123456, 1),('2020-02-01', 123456, 1),
('2020-01-01', 123456, 1),
('2019-10-01', 123456, 1),('2019-10-01', 123456, 1),('2019-10-01', 123456, 1),('2019-10-01', 123456, 1),('2019-10-01', 123456, 1),('2019-10-01', 123456, 1),('2019-10-01', 123456, 1),('2019-10-01', 123456, 1),('2019-10-01', 123456, 1),
('2019-08-01', 123456, 1),('2019-08-01', 123456, 1);

You could run this kind of query:

select count(p.id), yr as year, mth as month
from temp_months tm
left join my_table p
on month(created_date)=tm.mth
and year(created_date)=tm.yr
group by yr, mth
order by yr desc, mth desc

Result will be


count(p.id) | year | month
----------: | ---: | ----:
0 | 2020 | 6
1 | 2020 | 5
0 | 2020 | 4
0 | 2020 | 3
3 | 2020 | 2
1 | 2020 | 1
0 | 2019 | 12
0 | 2019 | 11
9 | 2019 | 10
0 | 2019 | 9
2 | 2019 | 8
0 | 2019 | 7

If you want NULL to be displayed, you could use:

with result as (
select count(p.id) as counter, yr as year, mth as month
from temp_months tm
left join my_table p
on month(created_date)=tm.mth
and year(created_date)=tm.yr
group by yr, mth
)
select
case when counter = 0 then NULL else counter end as counter,
year, month
from result
order by year desc, month desc;

Result will be


counter | year | month
------: | ---: | ----:
null | 2020 | 6
1 | 2020 | 5
null | 2020 | 4
null | 2020 | 3
3 | 2020 | 2
1 | 2020 | 1
null | 2019 | 12
null | 2019 | 11
9 | 2019 | 10
null | 2019 | 9
2 | 2019 | 8
null | 2019 | 7

Example: https://dbfiddle.uk/?rdbms=mariadb_10.4&fiddle=2ee3594614494d3397a996d7ff815859

To manually but quickly populate the temp_months table, I type in a year's worth of values like so:

insert into temp_months values
(2019, 1), (2019, 2), (2019, 3), (2019, 4), (2019, 5), (2019, 6),
(2019, 7), (2019, 8), (2019, 9), (2019, 10), (2019, 11), (2019, 12);

Then, I copy that into a text editor, find/replace 2019 with 2020 and execute again, and so on. Within seconds, I'd have years' worth of data in temp_months.

The other option is to create a stored procedure to populate it on demand, along the lines of the answers to "How to populate a table with a range of dates?"
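
A minimal sketch of such a procedure follows. The name fill_temp_months and its two parameters are hypothetical; only the temp_months table and its yr/mth columns come from the example above.

```sql
DELIMITER //

-- Hypothetical helper: inserts all 12 months for every year in the given range.
CREATE PROCEDURE fill_temp_months(IN first_year INT, IN last_year INT)
BEGIN
    DECLARE y INT DEFAULT first_year;
    DECLARE m INT;
    WHILE y <= last_year DO
        SET m = 1;
        WHILE m <= 12 DO
            -- IGNORE skips (yr, mth) pairs that already exist in the primary key
            INSERT IGNORE INTO temp_months (yr, mth) VALUES (y, m);
            SET m = m + 1;
        END WHILE;
        SET y = y + 1;
    END WHILE;
END //

DELIMITER ;

CALL fill_temp_months(2019, 2025);
```

DELIMITER is a command of the mysql command-line client; other clients may need the procedure body submitted differently.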

MySQL how to fill missing hours/dates in range?

Not the prettiest. But it should do the trick if you really can't use temp tables:

select ifnull(count,0) as count,dh.hour_of_day,
dh.day_of_week,date_format((date('2012-01-02') + interval dh.day_of_week day),'%W') as name_of_day
from
(
select day_of_week,hour_of_day
from
(
select 0 as day_of_week union select 1 union select 2 union select 3
union select 4 union select 5 union select 6
) d
join
(
select 0 as hour_of_day
union select 1 union select 2 union select 3 union select 4
union select 5 union select 6 union select 7 union select 8
union select 9 union select 10 union select 11 union select 12
union select 13 union select 14 union select 15 union select 16
union select 17 union select 18 union select 19 union select 20
union select 21 union select 22 union select 23
) h
) dh
left outer join
(
SELECT
count(id) as count,
HOUR(created) as hour_of_day,
WEEKDAY(created) as day_of_week
FROM visitors
GROUP BY day_of_week,hour_of_day
) v on dh.day_of_week = v.day_of_week and dh.hour_of_day = v.hour_of_day
ORDER BY dh.day_of_week,dh.hour_of_day ASC;

Careful with this though! If the data spans multiple weeks, the counts for the same weekday and hour from different weeks are added together. You may want to consider adding an "only this week" predicate. For example, add where yearweek(created) = yearweek(now()) to your original select to get data just for the current week.
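
As a sketch, the inner select on visitors with that filter could look like the following. The mode argument 1 is an addition of mine, not part of the original answer: it makes YEARWEEK() start weeks on Monday, which matches WEEKDAY(), where 0 means Monday.

```sql
SELECT COUNT(id)        AS `count`,
       HOUR(created)    AS hour_of_day,
       WEEKDAY(created) AS day_of_week      -- 0 = Monday ... 6 = Sunday
FROM visitors
WHERE YEARWEEK(created, 1) = YEARWEEK(NOW(), 1)   -- current week only, weeks start on Monday
GROUP BY day_of_week, hour_of_day;
```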


