Postgresql Query to Detect Overlapping Time Ranges

PostgreSQL query to detect overlapping time ranges

This can indeed be done using range types.

The following selects all those rows that do have overlapping ranges:

select f1.*
from my_features f1
where exists (select 1
from my_features f2
where tsrange(f2.begin_time, f2.end_time, '[]') && tsrange(f1.begin_time, f1.end_time, '[]')
and f2.feature_id = f1.feature_id
and f2.id <> f1.id);

When you change the condition to NOT EXISTS you'll find those that don't have any overlapping ranges.

SQLFiddle example: http://sqlfiddle.com/#!15/40b1e/1

tsrange(f2.begin_time, f2.end_time, '[]') creates a range that includes the upper and lower bounds. You can also create ranges that exclude either one or both.

More details can be found in the manual:

http://www.postgresql.org/docs/current/static/rangetypes.html#RANGETYPES-INCLUSIVITY

The && operator checks if the two ranges overlap: http://www.postgresql.org/docs/current/static/functions-range.html

(I just wish Oracle had something fancy like that...)

Postgres check for timestamp range overlap in table rows

I don't think there will be a really fast solution to that, as it does require comparing every row in the table with each and every other row in the table (or at least every other row in the specified range).

Assuming your table's primary key column is named id you could use Postgres' range function to check for overlapping rows:

with check_period (check_range) as (
values ( tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00') )
)
select id,
start_Time,
end_time,
exists (select *
from the_table t2
cross join check_perioud
where t2.id <> t1.id
and tstzrange(t1.start_time, t1.end_time) && tstzrange(t2.start_time, t2.start_time)
and tstzrange(t2.start_time, t2.start_time) <@ check_range
) has_overlapping_rows
from the_table t1
cross join check_period
where tstzrange(t1.start_time, t1.end_time) <@ check_range;

The CTE check_period is only there, so that the values for time period you want to analyze are not repeated. If you don't care about repeating them, you can remove it:

select id, 
start_Time,
end_time,
exists (select *
from the_table t2
where t2.id <> t1.id
and tstzrange(t1.start_time, t1.end_time) && tstzrange(t2.start_time, t2.start_time)
and tstzrange(t2.start_time, t2.start_time) <@ tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00')
) has_overlapping_rows
from the_table t1
where tstzrange(t1.start_time, t1.end_time) <@ tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00');

You should create an index on the timestamp range to make that quick:

create index on the_table( (tstzrange(start_time, end_time), id );

You can extend the above query to return a count of the overlapping rows rather than a true/false flag:

select id, 
start_Time,
end_time,
(select count(*)
from the_table t2
where t2.id <> t1.id
and tstzrange(t1.start_time, t1.end_time) && tstzrange(t2.start_time, t2.start_time)
and tstzrange(t2.start_time, t2.start_time) <@ tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00')
) has_overlapping_rows
from the_table t1
where tstzrange(t1.start_time, t1.end_time) <@ tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00');

However for rows having many overlapping rows, this will be slower, because the count(*) forces the database to inspect all overlapping rows. The exists() solution can stop at the first row found.

How to find overlapping date ranges in Postgresql between multiple rows?

It won't be fast, as every row has to be matched with every other row:

SELECT a.*, b.*
FROM mytable AS a
JOIN mytable AS b
ON daterange(a.valid_from, a.valid_to) && daterange(b.valid_from, b.valid_to)
WHERE (a.valid_from, a.valid_to) <= (b.valid_from, b.valid_to);

It might be better to have an exclusion constraint on the table that prevents such data from being added in the first place.

How to detect overlapping time ranges in PostgreSQL

  • First: The column-alias (select expression AS somename) is not usable from within its query, it is only visible from outside the query. You can solve this by wrapping it into a (subquery) xx or a view or a CTE

  • Second: don't repeat yourself: if you need to compute the same expression(s) twice, you could be doing too much ...


CREATE TEMP VIEW omg AS
SELECT fx.*,fx.time -(fx.timebuffer::text||' minute')::INTERVAL as startTime
,fx.time+(fx.timebuffer::text||' minute')::INTERVAL as endTime
, fx.locationtype
, fx.ID
-- ... maybe more columns and expressions ...
FROM userlocation fx
;

SELECT f1.startTime, f1.endTime
-- ... maybe more columns and expressions ...
FROM omg f1
WHERE EXISTS (
SELECT 1
FROM omg f2
WHERE tsrange(f2.startTime, f2.endTime, '()') && tsrange(f1.startTime, f1.endTime, '()')
AND f2.locationtype = f1.locationtype
AND f2.locationtype=1
AND f2."ID" <> f1."ID")
;
  • instead of a view you could use a CTE (the view probably performs better)
  • you could probably pull the tsrange into the view or CTE, too
  • I didn't check the logic

For completeness, the CTE version (which almost looks the same)

WITH omg AS (
SELECT fx.*,fx.time -(fx.timebuffer::text||' minute')::INTERVAL as startTime
,fx.time+(fx.timebuffer::text||' minute')::INTERVAL as endTime
, fx.locationtype
, fx.ID
-- ... maybe more columns and expressions ...
FROM userlocation fx
)
SELECT f1.startTime, f1.endTime
-- ... maybe more columns and expressions ...
FROM omg f1
WHERE EXISTS (
SELECT 1
FROM omg f2
WHERE tsrange(f2.startTime, f2.endTime, '()') && tsrange(f1.startTime, f1.endTime, '()')
AND f2.locationtype = f1.locationtype
AND f2.locationtype=1
AND f2."ID" <> f1."ID")
;

Find overlapping date ranges in PostgreSQL

Why not use between without the date part thing:

WHERE datefield BETWEEN '2009-10-10 00:00:00' AND '2009-10-11 00:00:00'

or something like that?

Querying for overlapping time ranges in SQLAlchemy and Postgres

It was much easier than I was making it. Again, this is in a Volunteer instance method.

new_shift = db.session.get(Shift, new_shift_id)

overlapping_shift = (
db.session.query(Shift, ShiftAssignment)
.join(ShiftAssignment)
.filter(ShiftAssignment.volunteer_id == self.id)
.filter(Shift.hours.overlaps(new_shift.hours))
.first()
)

if overlapping_shift:
print("overlap found")

Note that the query returns a (Shift, ShiftAssignment) tuple. We join the two appropriate tables and then filter twice, left with any overlapping shifts assigned to the current volunteer.

Determine Whether Two Date Ranges Overlap

(StartA <= EndB) and (EndA >= StartB)

Proof:

Let ConditionA Mean that DateRange A Completely After DateRange B

_                        |---- DateRange A ------|
|---Date Range B -----| _

(True if StartA > EndB)

Let ConditionB Mean that DateRange A is Completely Before DateRange B

|---- DateRange A -----|                        _ 
_ |---Date Range B ----|

(True if EndA < StartB)

Then Overlap exists if Neither A Nor B is true -

(If one range is neither completely after the other,

nor completely before the other,
then they must overlap.)

Now one of De Morgan's laws says that:

Not (A Or B) <=> Not A And Not B

Which translates to: (StartA <= EndB) and (EndA >= StartB)


NOTE: This includes conditions where the edges overlap exactly. If you wish to exclude that,

change the >= operators to >, and <= to <


NOTE2. Thanks to @Baodad, see this blog, the actual overlap is least of:

{ endA-startA, endA - startB, endB-startA, endB - startB }

(StartA <= EndB) and (EndA >= StartB)
(StartA <= EndB) and (StartB <= EndA)


NOTE3. Thanks to @tomosius, a shorter version reads:

DateRangesOverlap = max(start1, start2) < min(end1, end2)

This is actually a syntactical shortcut for what is a longer implementation, which includes extra checks to verify that the start dates are on or before the endDates. Deriving this from above:

If start and end dates can be out of order, i.e., if it is possible that startA > endA or startB > endB, then you also have to check that they are in order, so that means you have to add two additional validity rules:

(StartA <= EndB) and (StartB <= EndA) and (StartA <= EndA) and (StartB <= EndB)
or:

(StartA <= EndB) and (StartA <= EndA) and (StartB <= EndA) and (StartB <= EndB)
or,

(StartA <= Min(EndA, EndB) and (StartB <= Min(EndA, EndB))
or:

(Max(StartA, StartB) <= Min(EndA, EndB)

But to implement Min() and Max(), you have to code, (using C ternary for terseness),:

(StartA > StartB? Start A: StartB) <= (EndA < EndB? EndA: EndB)

Find and sum date ranges with overlapping records in postgresql

demo:db<>fiddle (uses the old data set with the overlapping A-B-part)

Disclaimer: This works for day intervals not for timestamps. The requirement for ts came later.

SELECT
s.acts,
s.sum,
MIN(a.start) as start,
MAX(a.end) as end
FROM (
SELECT DISTINCT ON (acts)
array_agg(name) as acts,
SUM(count)
FROM
activities, generate_series(start, "end", interval '1 day') gs
GROUP BY gs
HAVING cardinality(array_agg(name)) > 1
) s
JOIN activities a
ON a.name = ANY(s.acts)
GROUP BY s.acts, s.sum
  1. generate_series generates all dates between start and end. So every date an activity exists gets one row with the specific count
  2. Grouping all dates, aggregating all existing activities and sum of their counts
  3. HAVING filters out the dates where only one activity exist
  4. Because there are different days with the same activities we only need one representant: Filter all duplicates with DISTINCT ON
  5. Join this result against the original table to get the start and end. (note that "end" is a reserved word in Postgres, you should better find another column name!). It was more comfortable to lose them before but its possible to get these data within the subquery.
  6. Group this join to get the most early and latest date of each interval.

Here's a version for timestamps:

demo:db<>fiddle

WITH timeslots AS (
SELECT * FROM (
SELECT
tsrange(timepoint, lead(timepoint) OVER (ORDER BY timepoint)),
lead(timepoint) OVER (ORDER BY timepoint) -- 2
FROM (
SELECT
unnest(ARRAY[start, "end"]) as timepoint -- 1
FROM
activities
ORDER BY timepoint
) s
)s WHERE lead IS NOT NULL -- 3
)
SELECT
GREATEST(MAX(start), lower(tsrange)), -- 6
LEAST(MIN("end"), upper(tsrange)),
array_agg(name), -- 5
sum(count)
FROM
timeslots t
JOIN activities a
ON t.tsrange && tsrange(a.start, a.end) -- 4
GROUP BY tsrange
HAVING cardinality(array_agg(name)) > 1

The main idea is to identify possible time slots. So I take every known time (both start and end) and put them into a sorted list. So I can take the first tow known times (17:00 from start A and 18:00 from start B) and check which interval is in it. Then I check it for the 2nd and 3rd, then for 3rd an 4th and so on.

In the first timeslot only A fits. In the second from 18-19 also B is fitting. In the next slot 19-20 also C, from 20 to 20:30 A isn't fitting anymore, only B and C. The next one is 20:30-22 where only B fits, finally 22-23 D is added to B and last but not least only D fits into 23-23:30.

So I take this time list and join it agains the activities table where the intervals intersect. After that its only a grouping by time slot and sum up your count.

  1. this puts both ts of a row into one array whose elements are expanded into one row per element with unnest. So I get all times into one column which can be simply ordered
  2. using the lead window function allows to take the value of the next row into the current one. So I can create a timestamp range out of these both values with tsrange
  3. This filter is necessary because the last row has no "next value". This creates a NULL value which is interpreted by tsrange as infinity. So this would create an incredible wrong time slot. So we need to filter this row out.
  4. Join the time slots against the original table. The && operator checks if two range types overlap.
  5. Grouping by single time slots, aggregating the names and the count. Filter out the time slots with only one activity by using the HAVING clause
  6. A little bit tricky to get the right start and end points. So the start points are either the maximum of the activity start or the beginning of a time slot (which can be get using lower). E.g. Take the 20-20:30 slot: It begins 20h but neither B nor C has its starting point there. Similar the end time.


Related Topics



Leave a reply



Submit