Eliminate and Reduce Overlapping Date Ranges

Eliminate and reduce overlapping date ranges

For SQL Server 2005+

-- sample table with data
declare @t table(UserID int, StartDate datetime, EndDate datetime)
insert @t select
1, '20110101', '20110102' union all select
1, '20110101', '20110110' union all select
1, '20110108', '20110215' union all select
1, '20110220', '20110310' union all select
2, '20110101', '20110120' union all select
2, '20110115', '20110125'

-- your query starts below

select UserID, Min(NewStartDate) StartDate, MAX(enddate) EndDate
from
(
select *,
NewStartDate = t.startdate+v.number,
NewStartDateGroup =
dateadd(d,
1- DENSE_RANK() over (partition by UserID order by t.startdate+v.number),
t.startdate+v.number)
from @t t
inner join master..spt_values v
on v.type='P' and v.number <= DATEDIFF(d, startdate, EndDate)
) X
group by UserID, NewStartDateGroup
order by UserID, StartDate

Notes:

  1. Replace @t with your table name

Determine Whether Two Date Ranges Overlap

(StartA <= EndB) and (EndA >= StartB)

Proof:

Let ConditionA Mean that DateRange A Completely After DateRange B

_                        |---- DateRange A ------|
|---Date Range B -----| _

(True if StartA > EndB)

Let ConditionB Mean that DateRange A is Completely Before DateRange B

|---- DateRange A -----|                        _ 
_ |---Date Range B ----|

(True if EndA < StartB)

Then Overlap exists if Neither A Nor B is true -

(If one range is neither completely after the other,

nor completely before the other,
then they must overlap.)

Now one of De Morgan's laws says that:

Not (A Or B) <=> Not A And Not B

Which translates to: (StartA <= EndB) and (EndA >= StartB)


NOTE: This includes conditions where the edges overlap exactly. If you wish to exclude that,

change the >= operators to >, and <= to <


NOTE2. Thanks to @Baodad, see this blog, the actual overlap is least of:

{ endA-startA, endA - startB, endB-startA, endB - startB }

(StartA <= EndB) and (EndA >= StartB)
(StartA <= EndB) and (StartB <= EndA)


NOTE3. Thanks to @tomosius, a shorter version reads:

DateRangesOverlap = max(start1, start2) < min(end1, end2)

This is actually a syntactical shortcut for what is a longer implementation, which includes extra checks to verify that the start dates are on or before the endDates. Deriving this from above:

If start and end dates can be out of order, i.e., if it is possible that startA > endA or startB > endB, then you also have to check that they are in order, so that means you have to add two additional validity rules:

(StartA <= EndB) and (StartB <= EndA) and (StartA <= EndA) and (StartB <= EndB)
or:

(StartA <= EndB) and (StartA <= EndA) and (StartB <= EndA) and (StartB <= EndB)
or,

(StartA <= Min(EndA, EndB) and (StartB <= Min(EndA, EndB))
or:

(Max(StartA, StartB) <= Min(EndA, EndB)

But to implement Min() and Max(), you have to code, (using C ternary for terseness),:

(StartA > StartB? Start A: StartB) <= (EndA < EndB? EndA: EndB)

Prevent creation of overlapping date ranges

In general, if you have two ranges with S1..E1 and S2..E2 as the start and end values of the ranges, then you have an overlap if:

  • S1 < E2 and
  • S2 < E1

This is symmetric, which is good (and reassuring). You will need to decide carefully whether those 'less than' operations should be 'less than or equal'; both can make sense, depending on how you store your data (open vs closed vs half-open or open-closed and closed-open ranges, etc). You can see a diagram of the possibilities on the question 'Determine whether two date ranges overlap'.

In your context, the 'ShowDate' corresponds to the start date, and the 'HideDate' corresponds to the end date.

It sounds rather as if you want to do a 'conditional insert', too. In this case, assuming that your StoreID is 1001, the ShowDate is 2012-03-21 and the HideDate is 2012-03-28, then you might write:

INSERT INTO Calendar(StoreID, ShowDate, HideDate, ...)
SELECT 1001, '2012-03-21', '2012-03-28', ...
FROM Dual
WHERE NOT EXISTS(SELECT * FROM Calendar C2
WHERE C2.StoreID = 1001
AND C2.ShowDate < '2012-03-28'
AND '2012-03-21' < C2.HideDate
);

The values in the SELECT-list are those you want to add to the Calendar table. The sub-select means you get either 0 rows (because there is an overlap) or 1 row (because there is no overlap) with the data for your new value.

Get distinct consecutive date ranges from overlapping date ranges

The logic is:

  • at the beginning of an interval add its value to a cumulative sum
  • at the end of an interval substract its value from this sum
  • but in order to sweep the dateline, we'll have to collect al the (unique) date/time stamps, either start or stop.

So the point is: convert the data from a series of intervals to a series of (start/stop) events, and aggregate over these.


-- \i tmp.sql

create table coinsonperiod(
id serial,
startdate date,
enddate date,
coins integer
);
insert into coinsonperiod (startdate, enddate, coins) values
('2018-01-01','2018-01-31', 80)
, ('2018-01-07','2018-01-10', 10)
, ('2018-01-07','2018-01-31', 10)
, ('2018-01-11','2018-01-31', 5)
, ('2018-01-25','2018-01-27', 5)
, ('2018-02-02','2018-02-23', 100)
;

WITH changes AS (
SELECT startdate AS tickdate , coins
, 1 AS cover
FROM coinsonperiod
UNION ALL
-- add 1 day to convert to half-open intervals
SELECT 1+enddate AS tickdate, -1* coins
, -1 AS cover
FROM coinsonperiod
)
, sumchanges AS (
SELECT tickdate, SUM(coins) AS change, SUM(cover) AS cover
FROM changes
GROUP BY tickdate
)
, aggregated AS (
SELECT
tickdate AS startdate
, lead(tickdate) over www AS enddate
, sum(change) OVER www AS cash
-- number of covered intervals
, sum(cover) OVER www AS cover
FROM sumchanges
WINDOW www AS (ORDER BY tickdate)
)
-- substract one day from enddate to correct back to closed intervals
SELECT startdate, enddate-1 AS enddate, cash, cover
FROM aggregated
WHERE cover > 0
ORDER BY startdate
;

How to remove overlapping rows based on date and keep most recent in sql?

I removed the cursor solution here since it has low perforamnce
a solution without using Cursor is:

WITH ExcludedIds AS (
SELECT DISTINCT T2.episode_id
FROM table1 AS T
INNER JOIN table1 AS T2 ON T.episode_id != T2.episode_id
AND T.patient_num = T2.patient_num
AND T2.discharge_date BETWEEN DATEADD(DAY, -90, T.admit_date ) AND T.discharge_date)

SELECT T.episode_id, T.patient_num, T.admit_date, T.discharge_date
FROM table1 AS T
WHERE T.episode_id NOT IN (SELECT ExcludedIds.episode_id FROM ExcludedIds)

Thought understanding this solution is a bit hard.

Eliminate and reduce overlapping data ranges using SQL

It seems that SELECT with CTE needs to recursively merge undetermined number of rows. In that case I would prefer safe CURSOR based solution:

DECLARE @t TABLE
(
UserId int,
[Date] date,
StartTime time,
EndTime time
);
INSERT INTO @t VALUES
(1, '2011-02-03', '09:30:00', '09:35:00'),
(1, '2011-02-03', '09:31:00', '09:38:00'),
(1, '2011-02-03', '09:36:00', '09:41:00'),
(1, '2011-02-03', '09:40:00', '09:45:00'),
(1, '2011-02-03', '09:42:00', '09:43:00'),
(1, '2011-02-03', '10:03:00', '10:05:00'),
(2, '2011-02-03', '11:02:00', '11:05:00'),
(1, '2011-02-03', '12:00:00', '12:05:00'),
(1, '2011-02-03', '12:04:00', '12:06:00');

------------------
DECLARE @result TABLE
(
UserId int,
[Date] date,
StartTime time,
EndTime time
)

DECLARE cur CURSOR FOR
SELECT UserId, [Date], StartTime, EndTime
FROM @t
ORDER BY UserId, [Date], StartTime;

DECLARE @UserId int
DECLARE @Date date
DECLARE @StartTime time
DECLARE @EndTime time

DECLARE @LastUserId int
DECLARE @LastDate date
DECLARE @LastStartTime time
DECLARE @LastEndTime time

OPEN cur

FETCH NEXT FROM cur INTO @UserId, @Date, @StartTime, @EndTime
SET @LastUserId = @UserId
SET @LastDate = @Date
SET @LastStartTime = @StartTime
SET @LastEndTime = @EndTime
WHILE @@FETCH_STATUS = 0
BEGIN
IF @UserId = @LastUserId AND @Date = @LastDate AND @StartTime <= @LastEndTime
SET @LastEndTime = CASE WHEN @LastEndTime > @EndTime THEN @LastEndTime ELSE @EndTime END
ELSE
BEGIN
INSERT @result(UserId, [Date], StartTime, EndTime) VALUES (@LastUserId, @LastDate, @LastStartTime, @LastEndTime)
SET @LastUserId = @UserId
SET @LastDate = @Date
SET @LastStartTime = @StartTime
SET @LastEndTime = @EndTime
END

FETCH NEXT FROM cur INTO @UserId, @Date, @StartTime, @EndTime
END
INSERT @result(UserId, [Date], StartTime, EndTime) VALUES (@LastUserId, @LastDate, @LastStartTime, @LastEndTime)

CLOSE cur
DEALLOCATE cur

SELECT UserId,
[Date],
StartTime,
EndTime,
CAST(DATEADD(second,DATEDIFF(second,StartTime,EndTime),'2000-01-01') AS time) Diff
FROM @result

which returns

1   2011-02-03  09:30:00.0000000    09:45:00.0000000    00:15:00.0000000
1 2011-02-03 10:03:00.0000000 10:05:00.0000000 00:02:00.0000000
1 2011-02-03 12:00:00.0000000 12:06:00.0000000 00:06:00.0000000
2 2011-02-03 11:02:00.0000000 11:05:00.0000000 00:03:00.0000000


Related Topics



Leave a reply



Submit