Determine Contiguous Dates in SQL Gaps and Islands

Detect consecutive dates in SQL and start counter

One way to tackle such Gaps-And-Islands problem is to calculate a rank that's based on the difference with the previous date.

The method to calculate the difference in days between dates however depends on your RDBMS.

This example uses DATEDIFF (MS Sql Server)

SELECT 
Dates,
ROW_NUMBER() OVER (PARTITION BY Rnk ORDER BY Dates) AS Counter
FROM
(
SELECT
Dates,
SUM(CASE
WHEN DATEDIFF(day, prevDate, Dates) = 1 THEN 0
ELSE 1 END) OVER (ORDER BY Dates) AS Rnk
FROM
(
SELECT
Dates,
LAG(Dates) OVER (ORDER BY Dates) AS prevDate
FROM your_table
) q1
) q2
ORDER BY Dates;

SQL query to find gaps within a column of dates

This is a type of gaps-and-islands problem. In this case, subtracting a sequential number from each day is probably the simplest solution for identifying the "islands":

select user, status, count(*) as num_days, min(date), max(date)
from (select t.*,
row_number() over (partition by user, status order by date) as seqnum
from t
) t
group by user, status, date - seqnum * interval '1 day'

SQL - Group rows by contiguous date

For RDBMS supported window functions (example on MS SQL database):

with Test(value, dt) as(
select 100, cast('2000-01-01' as date) union all
select 110, cast('2002-05-01' as date) union all
select 100, cast('2003-10-01' as date) union all
select 100, cast('2004-12-01' as date)
)
select max(value) value, min(dt) startDate, max(end_dt) endDate
from (
select a.*, sum(brk) over(order by dt) grp
from (
select t.*,
case when value!=lag(value) over(order by dt) then 1 else 0 end brk,
DATEADD(DAY,-1,lead(dt,1,cast('2099-01-02' as date)) over(order by dt)) end_dt
from Test t
) a
) b
group by grp
order by startDate

How to form groups of consecutive dates allowing for a given maximum gap?

Counting the gaps (greater than given tolerance) in a second window function forms the group numbers you are after:

SELECT person_id, contact_day
, count(*) FILTER (WHERE gap > 3) OVER (PARTITION BY person_id ORDER BY contact_day) AS dash_group
FROM (
SELECT person_id, contact_day
, contact_day - lag(contact_day) OVER (PARTITION BY person_id ORDER BY contact_day) AS gap
FROM mydata
) sub
ORDER BY person_id, contact_day; -- optional

db<>fiddle here

About the aggregate FILTER clause:

  • Aggregate columns with additional (distinct) filters

It's short and intuitive, and typically fastest. See:

  • For absolute performance, is SUM faster or COUNT?

It's the classic topic of "gaps and islands". Once you know to look for the tag gaps-and-islands, you'll find plenty of related or nigh identical questions and answers like:

  • Select longest continuous sequence
  • How to group timestamps into islands (based on arbitrary gap)?
  • How to label groups in postgresql when group belonging depends on the preceding line?

Etc.

I tagged accordingly now.

Finding a consecutive date ranges in ms sql server

Try the following:

WITH CTE AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY PRODUCT_TIER5_DESC ORDER BY BEGIN_EFFECTIVE_DT) -
ROW_NUMBER() OVER (PARTITION BY PRODUCT_TIER5_DESC,PRODUCT_TIER4_DESC ORDER BY BEGIN_EFFECTIVE_DT) GRP
FROM test_0907
)
SELECT PRODUCT_TIER4_DESC, PRODUCT_TIER5_DESC,
MIN(BEGIN_EFFECTIVE_DT) BEGIN_EFFECTIVE_DT, MAX(END_EFFECTIVE_DT) END_EFFECTIVE_DT
FROM CTE
GROUP BY PRODUCT_TIER4_DESC, PRODUCT_TIER5_DESC,GRP

See a demo.

Group islands of contiguous dates, including missing weekends

I find that is easier to use lag() and a window sum to implement the logic you want:

select employee, min(actionDate) actionStart, max(actionDate) actionEnd, count(*) actionLength
from (
select
a.*, sum(
case when actionDate = dateadd(day, 1, lagActionDate)
or (actionDate = dateadd(day, 3, lagActionDate) and datename(weekday, actionDate) = 'Monday')
then 0 else 1 end
) over(partition by employee order by actionDate) grp
from (
select
a.*,
lag(actionDate) over(partition by employee order by actionDate) lagActionDate
from actions a
) a
) a
group by employee, grp

Demo on DB Fiddle:


employee | actionStart | actionEnd | actionLength
:------- | :---------- | :--------- | -----------:
AA | 2019-01-03 | 2019-01-08 | 4
BB | 2019-08-01 | 2019-08-06 | 6
CC | 2019-09-09 | 2019-09-18 | 8

find start and stop date for contiguous dates in multiple rows

The old answer had a weakness: every row is checked only with the previous to verify if the period are overlapping, if an earlier row have a period that last more the logic will not consider it. For example:

Code | DateStart           | DateFrom            | Overlap
-----+---------------------+---------------------+---------
1006 | 2014-06-18 07:00:00 | 2014-06-18 19:00:00 | 0
1006 | 2014-06-18 08:10:00 | 2014-06-18 10:00:00 | 1
1006 | 2014-06-18 16:00:00 | 2014-06-18 20:30:00 | 0 <- don't overlap with
previous but overlap
with the first

To improve that PrevStop need to become LastStop and have the value of the max of the previous DateFrom for the Code

With N AS (
SELECT Code, DateFrom, DateTo
, LastStop = MAX(DateTo)
OVER (PARTITION BY Code ORDER BY DateFrom, DateTo
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
FROM Table1
), B AS (
SELECT Code, DateFrom, DateTo
, Block = SUM(CASE WHEN LastStop Is Null Then 1
WHEN LastStop < DateFrom Then 1
ELSE 0
END)
OVER (PARTITION BY Code ORDER BY DateFrom, LastStop)
FROM N
)
SELECT Code
, MIN(DateFrom) DateFrom
, MAX(DateTo) DateTo
FROM B
GROUP BY Code, Block
ORDER BY Code, Block

SQLFiddle Demo

ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING is needed to remove the current row from theMAX.


Old Answer

This query works only every the period is not completely inside the precedent.

The idea is to check for every row if it is linked with the next/previous one.

If rows are linked they form a block and they will be grouped together to get the first DateFrom and the last DateTo

With N AS (
SELECT Code, DateFrom, DateTo
, PrevStop = LAG(DateTo, 1, NULL)
OVER (PARTITION BY Code ORDER BY DateFrom)
FROM Table1
), B AS (
SELECT Code, DateFrom, DateTo
, Block = SUM(CASE WHEN PrevStop Is Null Then 1
WHEN PrevStop < DateFrom Then 1
ELSE 0
END)
OVER (PARTITION BY Code ORDER BY PrevStop)
FROM N
)
SELECT Code
, MIN(DateFrom) DateFrom
, MAX(DateTo) DateTo
FROM B
GROUP BY Code, Block
ORDER BY Code, Block

SQLFiddle demo with some data added to check with more block on the same code/day

The query search for the block starter checking every row if they are the first for the code (PrevStop IS NULL) or if they are outside the previous one (PrevStop < DateFrom).

The windowed SUM retrieve only the previous row by the ORDER to create costant value for block of linked data, for example with the test data we will get

Code | DateStart           | DateFrom            | Starter | Block
-----+---------------------+---------------------+---------+------
1006 | 2014-06-18 07:00:00 | 2014-06-18 09:00:00 | 1 | 1
1006 | 2014-06-18 08:10:00 | 2014-06-18 06:00:00 | 0 | 1
1006 | 2014-06-18 08:00:00 | 2014-06-18 08:30:00 | 0 | 1
1006 | 2014-06-18 07:00:00 | 2014-06-18 07:30:00 | 1 | 2
1006 | 2014-06-18 08:00:00 | 2014-06-18 08:30:00 | 1 | 3
1006 | 2014-06-18 08:10:00 | 2014-06-18 09:00:00 | 0 | 3
3006 | 2014-06-18 07:00:00 | 2014-06-18 08:00:00 | 1 | 1
3006 | 2014-06-18 09:00:00 | 2014-06-18 10:00:00 | 1 | 2

grouping by Code and Block get the result



Related Topics



Leave a reply



Submit