What Is a Simple and Efficient Way to Find Rows with Time-Interval Overlaps in SQL

What is a simple and efficient way to find rows with time-interval overlaps in SQL?


SELECT * 
FROM table1,table2
WHERE table2.start <= table1.end
AND (table2.end IS NULL OR table2.end >= table1.start)

Detect overlapping date ranges from the same table

If you already have entries for each day that should work, but if you don't the overhead is significant, and if that query is used often, if will affect performance.

If the data is in this format, you can detect overlaps using simple date arithmetic, because an overlap is simply one interval starting after a given interval, but before the given is finished, something like

select dr1.* from date_ranges dr1
inner join date_ranges dr2
on dr2.start > dr1.start -- start after dr1 is started
and dr2.start < dr1.end -- start before dr1 is finished

If you need special handling for interval that are wholly within another interval, or you need to merge intervals, i.e.

PKey  Start       End         Type
==== ===== === ====
01 01/01/2010 20/01/2010 S
02 15/01/2010 31/01/2010 S

yielding

Start       End         Type
===== === ====
01/01/2010 31/01/2010 S

you will need more complex calculation.

In my experience with this kind of problems, once you get how to do the calculation by hand, it's easy to transfer it into SQL :)

Finding overlapping intervals using SQL

Even simpler:

SELECT id, date_start, date_end 
FROM thetable
WHERE date_start <= p_date_end
AND date_end >= p_date_start

Determine Whether Two Date Ranges Overlap

(StartA <= EndB) and (EndA >= StartB)

Proof:

Let ConditionA Mean that DateRange A Completely After DateRange B

_                        |---- DateRange A ------|
|---Date Range B -----| _

(True if StartA > EndB)

Let ConditionB Mean that DateRange A is Completely Before DateRange B

|---- DateRange A -----|                        _ 
_ |---Date Range B ----|

(True if EndA < StartB)

Then Overlap exists if Neither A Nor B is true -

(If one range is neither completely after the other,

nor completely before the other,
then they must overlap.)

Now one of De Morgan's laws says that:

Not (A Or B) <=> Not A And Not B

Which translates to: (StartA <= EndB) and (EndA >= StartB)


NOTE: This includes conditions where the edges overlap exactly. If you wish to exclude that,

change the >= operators to >, and <= to <


NOTE2. Thanks to @Baodad, see this blog, the actual overlap is least of:

{ endA-startA, endA - startB, endB-startA, endB - startB }

(StartA <= EndB) and (EndA >= StartB)
(StartA <= EndB) and (StartB <= EndA)


NOTE3. Thanks to @tomosius, a shorter version reads:

DateRangesOverlap = max(start1, start2) < min(end1, end2)

This is actually a syntactical shortcut for what is a longer implementation, which includes extra checks to verify that the start dates are on or before the endDates. Deriving this from above:

If start and end dates can be out of order, i.e., if it is possible that startA > endA or startB > endB, then you also have to check that they are in order, so that means you have to add two additional validity rules:

(StartA <= EndB) and (StartB <= EndA) and (StartA <= EndA) and (StartB <= EndB)
or:

(StartA <= EndB) and (StartA <= EndA) and (StartB <= EndA) and (StartB <= EndB)
or,

(StartA <= Min(EndA, EndB) and (StartB <= Min(EndA, EndB))
or:

(Max(StartA, StartB) <= Min(EndA, EndB)

But to implement Min() and Max(), you have to code, (using C ternary for terseness),:

(StartA > StartB? Start A: StartB) <= (EndA < EndB? EndA: EndB)

Find overlapping (date/time) rows within one table


SELECT  m1.meetingID, m1.meetingStart, m1.meetingEnd, m2.meetingID
FROM t_meeting m1, t_meeting m2
WHERE (m2.meetingStart BETWEEN m1.meetingStart AND m1.meetingEnd
OR m2.meetingEnd BETWEEN m1.meetingStart AND m1.meetingEnd)
AND m1.meetingID <> m2.meetingID

This will select each pair twice.

If you want each pair to be selected just once, use:

SELECT  m1.meetingID, m1.meetingStart, m1.meetingEnd, m2.meetingID
FROM t_meeting m1, t_meeting m2
WHERE (m2.meetingStart BETWEEN m1.meetingStart AND m1.meetingEnd
OR m2.meetingEnd BETWEEN m1.meetingStart AND m1.meetingEnd)
AND m2.meetingID > m1.meetingID

Make sure you have indexes on meetingStart and meetingEnd for the query to work efficiently.

MySQL, however, will probably use INDEX MERGE to run this query, which is not very efficient in current implementation.

You also may try to use:

SELECT  m1.*, m2.*
FROM (
SELECT m1.meetingID AS mid1, m2.meetingID AS mid2
FROM t_meeting m1, t_meeting m2
WHERE m2.meetingStart BETWEEN m1.meetingStart AND m1.meetingEnd
AND m2.meetingID <> m1.meetingID
UNION
SELECT m1.meetingID, m2.meetingID
FROM t_meeting m1, t_meeting m2
WHERE m2.meetingEnd BETWEEN m1.meetingStart AND m1.meetingEnd
AND m2.meetingID <> m1.meetingID
) mo, t_meeting m1, t_meeting m2
WHERE m1.meetingID = mid1
AND m2.meetingID = mid2

, which is more complex but will most probably run a little bit faster.

Find max overlapping intervals in periods of time

This is a little complicated, but here is a query:

with t as (
select 1 as id, timestamp('2011-12-19 06:00:00') as startt, timestamp('2011-12-19 08:45:00') as endt union all
select 2 as id, timestamp('2011-12-19 06:15:00') as startt, timestamp('2011-12-19 06:30:00') as endt union all
select 3 as id, timestamp('2011-12-19 06:30:00') as startt, timestamp('2011-12-19 06:45:00') as endt union all
select 4 as id, timestamp('2011-12-19 06:40:00') as startt, timestamp('2011-12-19 07:15:00') as endt union all
select 5 as id, timestamp('2011-12-19 07:15:00') as startt, timestamp('2011-12-19 08:45:00') as endt union all
select 6 as id, timestamp('2011-12-19 07:30:00') as startt, timestamp('2011-12-19 07:50:00') as endt union all
select 7 as id, timestamp('2011-12-19 08:00:00') as startt, timestamp('2011-12-19 08:30:00') as endt union all
select 8 as id, timestamp('2011-12-19 08:00:00') as startt, timestamp('2011-12-19 08:15:00') as endt union all
select 9 as id, timestamp('2011-12-19 08:30:00') as startt, timestamp('2011-12-19 08:45:00') as endt
),
se as (
select id, startt as ts, 1 as inc
from t union all
select id, endt as ts, -1 as inc
from t union all
select null, ts, 0
from unnest(generate_timestamp_array(timestamp('2011-12-19 06:00:00'),
timestamp('2011-12-19 08:00:00'),
interval 1 hour)
) ts
),
p as (
select ts, (inc = 0) as col, sum(inc) as value_at,
countif(inc = 1) as num_starts,
sum(sum(inc)) over (order by ts, max(inc = 0) desc) as active_at,
sum(countif(inc = 0)) over (order by ts, max(inc = 0) desc) as period_grp
from se
group by 1, 2
)
select period_grp, min(ts) as period,
max(active_at) as max_in_period,
(array_agg(active_at order by ts limit 1)[ordinal(1)] +
sum(num_starts)
) as total
from p
group by period_grp;

The key idea is to split the starts and stops into separate rows with an "increment" of +1 or -1. This is then augmented with the hourly breaks that you want.

The code then does the following:

  • Calculate the cumulative sum of the increment to get the number of concurrent ids at each timestamp.
  • Calculates the "period" for each timestamp by taking a cumulative sum of the generated rows.

Then the two calculations you want are:

  1. The max is simply the max of the concurrent in a group by.
  2. The total is the concurrent at the beginning of the time period (not including any that start at the beginning of the time period) plus any starts during the time period.

How do I compare overlapping values within a row?

The correct check would look like this:

SELECT * FROM appts 
WHERE timeStart <='$timeEnd'
AND timeEnd >='$timeStart'
AND dayappt='$boatdate'

Other good explanations have been given but I'll go ahead and update it with an alternative explanation of how I visualize this myself. Most people are looking for each possible overlap, considering two time periods, they are trying to think of each combination of start and end that can make an appointment overlap. I think about it as when do two time periods not overlap which for some reason is easier for me.

Say the time period I am checking for is today, I want to find any time period that does not overlap today. There are really only two scenarios for that, either the time period starts after today (PeriodStart > EndOfToday) or the time period ends before today (PeriodEnd < StartOfToday).

Given that we havea simple test for not overlapping:

(PeriodStart > EndOfToday) OR (PeriodEnd < StartOfToday)

A quick flip around and you have a simple test for overlap:

(PeriodStart <= EndOfToday) AND (PeriodEnd >= StartOfToday)

-Shane



Related Topics



Leave a reply



Submit