Testing for overlapping arrays in Ruby
Since your subarrays are supposed to represent ranges, it might be a good idea to actually use an array of ranges instead of an array of array.
So your array becomes [100..300, 400..500]
.
For two ranges, we can easily define a method which checks whether two ranges overlap:
def overlap?(r1, r2)
r1.include?(r2.begin) || r2.include?(r1.begin)
end
Now to check whether a range r
overlaps with any range in your array of ranges, you just need to check whether overlap?(r, r2)
is true for any r2
in the array of ranges:
def any_overlap?(r, ranges)
ranges.any? do |r2|
overlap?(r, r2)
end
end
Which can be used like this:
any_overlap?(499..501, [100..300, 400..500])
#=> true
any_overlap?(599..601, [100..300, 400..500])
#=> false
Here any_overlap?
takes O(n)
time. So if you use any_overlap?
every time you add a range to the array, the whole thing will be O(n**2)
.
However there's a way to do what you want without checking each range:
You add all the ranges to the array without checking for overlap. Then you check whether any range in the array overlaps with any other. You can do this efficiently in O(n log n)
time by sorting the array on the beginning of each range and then testing whether two adjacent ranges overlap:
def any_overlap?(ranges)
ranges.sort_by(&:begin).each_cons(2).any? do |r1,r2|
overlap?(r1, r2)
end
end
Check if two ranges overlap in ruby
Two ranges overlap for a given range A when:
- range B starts within range A,
- range B ends within range A or
- range B starts before range A and ends after range A
Examples:
Range A |-----|
|-----| Case 1
|-----| Case 2
|-| Case 1 + 2
|---------| Case 3
Looking closer the rule is: Two ranges overlap when Range B starts before the range A ends and range B ends after the range A starts.
def ranges_overlap?(range_a, range_b)
range_b.begin <= range_a.end && range_a.begin <= range_b.end
end
How to merge overlapping structures with ranges in them
Structure of solution
I have defined
Format = Struct.new(:range, :attributes)
and instances of the class Format
such as
Format.new(10..20, :BAR)
Here @attribute
equals a symbol but it could be any Ruby object.
I will then construct and return an array of instances of Format
such as
Format.new(12..15, [:FOO, :BAR])
This is interpreted as meaning that the original instances of Format
for which @attributes
equal :FOO
and :BAR
have values of @range
that cover the interval 12..15
. Moreover, these instances of Format
(for which the value of @attributes
is an array) have the non-overlapping ranges asked for by the question. The order of the elements in the arrays (the values of @attributes
) is not specified.
The values (arrays) of @attributes
can then be manipulate as desired. For example, [3,5]
might be converted to 3|5
.
Code
def merge_formats(*formats)
fmod = formats.map { |e| Format.new(e.range, [e.attributes]) }.
sort_by { |e| e.range.begin }
a = []
while fmod.any?
b = []
while fmod.any? && (b.empty? || (fmod.first.range.begin == b.first.range.begin))
b << fmod.shift
end
next_end = b.min_by { |f| f.range.end }.range.end
next_end = [next_end, fmod.first.range.begin-1].min if fmod.any?
a << Format.new(b.first.range.begin..next_end, b.map { |f| f.attributes.first })
while b.any?
f = b.shift
fmod.unshift(Format.new(next_end+1..f.range.end, f.attributes)) if
f.range.end > next_end
end
end
a
end
Examples
Format = Struct.new(:range, :attributes)
f1 = Format.new(10..20, :FOO)
f2 = Format.new(12..16, :BAR)
merge_formats(f1, f2)
#=> [#<struct Format range=10..11, attributes=[:FOO]>,
# #<struct Format range=12..16, attributes=[:FOO, :BAR]>,
# #<struct Format range=17..20, attributes=[:FOO]>]
f1 = Format.new(12..16, :BAR)
f2 = Format.new(10..11, :FOO)
merge_formats(f1, f2)
#=> [#<struct Format range=10..11, attributes=[:FOO]>,
# #<struct Format range=12..16, attributes=[:BAR]>]
f1 = Format.new(12..16, :BAR)
f2 = Format.new(10..20, :FOO)
f3 = Format.new(14..24, :BAZ)
f4 = Format.new(15..18, :QUX)
merge_formats(f1, f2, f3, f4)
#=> [#<struct Format range=10..11, attributes=[:FOO]>,
# #<struct Format range=12..13, attributes=[:FOO, :BAR]>,
# #<struct Format range=14..14, attributes=[:BAR, :FOO, :BAZ]>,
# #<struct Format range=15..16, attributes=[:BAZ, :FOO, :BAR, :QUX]>,
# #<struct Format range=17..18, attributes=[:QUX, :FOO, :BAZ]>,
# #<struct Format range=19..20, attributes=[:BAZ, :FOO]>,
# #<struct Format range=21..24, attributes=[:BAZ]>]
Find and sum date ranges with overlapping records in postgresql
demo:db<>fiddle (uses the old data set with the overlapping A-B-part)
Disclaimer: This works for day intervals not for timestamps. The requirement for ts came later.
SELECT
s.acts,
s.sum,
MIN(a.start) as start,
MAX(a.end) as end
FROM (
SELECT DISTINCT ON (acts)
array_agg(name) as acts,
SUM(count)
FROM
activities, generate_series(start, "end", interval '1 day') gs
GROUP BY gs
HAVING cardinality(array_agg(name)) > 1
) s
JOIN activities a
ON a.name = ANY(s.acts)
GROUP BY s.acts, s.sum
generate_series
generates all dates between start and end. So every date an activity exists gets one row with the specificcount
- Grouping all dates, aggregating all existing activities and sum of their counts
HAVING
filters out the dates where only one activity exist- Because there are different days with the same activities we only need one representant: Filter all duplicates with
DISTINCT ON
- Join this result against the original table to get the start and end. (note that "end" is a reserved word in Postgres, you should better find another column name!). It was more comfortable to lose them before but its possible to get these data within the subquery.
- Group this join to get the most early and latest date of each interval.
Here's a version for timestamps:
demo:db<>fiddle
WITH timeslots AS (
SELECT * FROM (
SELECT
tsrange(timepoint, lead(timepoint) OVER (ORDER BY timepoint)),
lead(timepoint) OVER (ORDER BY timepoint) -- 2
FROM (
SELECT
unnest(ARRAY[start, "end"]) as timepoint -- 1
FROM
activities
ORDER BY timepoint
) s
)s WHERE lead IS NOT NULL -- 3
)
SELECT
GREATEST(MAX(start), lower(tsrange)), -- 6
LEAST(MIN("end"), upper(tsrange)),
array_agg(name), -- 5
sum(count)
FROM
timeslots t
JOIN activities a
ON t.tsrange && tsrange(a.start, a.end) -- 4
GROUP BY tsrange
HAVING cardinality(array_agg(name)) > 1
The main idea is to identify possible time slots. So I take every known time (both start and end) and put them into a sorted list. So I can take the first tow known times (17:00 from start A and 18:00 from start B) and check which interval is in it. Then I check it for the 2nd and 3rd, then for 3rd an 4th and so on.
In the first timeslot only A fits. In the second from 18-19 also B is fitting. In the next slot 19-20 also C, from 20 to 20:30 A isn't fitting anymore, only B and C. The next one is 20:30-22 where only B fits, finally 22-23 D is added to B and last but not least only D fits into 23-23:30.
So I take this time list and join it agains the activities table where the intervals intersect. After that its only a grouping by time slot and sum up your count.
- this puts both ts of a row into one array whose elements are expanded into one row per element with
unnest
. So I get all times into one column which can be simply ordered - using the lead window function allows to take the value of the next row into the current one. So I can create a timestamp range out of these both values with
tsrange
- This filter is necessary because the last row has no "next value". This creates a
NULL
value which is interpreted bytsrange
as infinity. So this would create an incredible wrong time slot. So we need to filter this row out. - Join the time slots against the original table. The
&&
operator checks if two range types overlap. - Grouping by single time slots, aggregating the names and the count. Filter out the time slots with only one activity by using the
HAVING
clause - A little bit tricky to get the right start and end points. So the start points are either the maximum of the activity start or the beginning of a time slot (which can be get using
lower
). E.g. Take the 20-20:30 slot: It begins 20h but neither B nor C has its starting point there. Similar the end time.
Determine Whether Two Date Ranges Overlap
(StartA <= EndB) and (EndA >= StartB)
Proof:
Let ConditionA Mean that DateRange A Completely After DateRange B
_ |---- DateRange A ------|
|---Date Range B -----| _
(True if StartA > EndB
)
Let ConditionB Mean that DateRange A is Completely Before DateRange B
|---- DateRange A -----| _
_ |---Date Range B ----|
(True if EndA < StartB
)
Then Overlap exists if Neither A Nor B is true -
(If one range is neither completely after the other,
nor completely before the other,
then they must overlap.)
Now one of De Morgan's laws says that:
Not (A Or B)
<=> Not A And Not B
Which translates to: (StartA <= EndB) and (EndA >= StartB)
NOTE: This includes conditions where the edges overlap exactly. If you wish to exclude that,
change the >=
operators to >
, and <=
to <
NOTE2. Thanks to @Baodad, see this blog, the actual overlap is least of:
{ endA-startA
, endA - startB
, endB-startA
, endB - startB
}
(StartA <= EndB) and (EndA >= StartB)
(StartA <= EndB) and (StartB <= EndA)
NOTE3. Thanks to @tomosius, a shorter version reads:DateRangesOverlap = max(start1, start2) < min(end1, end2)
This is actually a syntactical shortcut for what is a longer implementation, which includes extra checks to verify that the start dates are on or before the endDates. Deriving this from above:
If start and end dates can be out of order, i.e., if it is possible that startA > endA
or startB > endB
, then you also have to check that they are in order, so that means you have to add two additional validity rules:(StartA <= EndB) and (StartB <= EndA) and (StartA <= EndA) and (StartB <= EndB)
or:(StartA <= EndB) and (StartA <= EndA) and (StartB <= EndA) and (StartB <= EndB)
or,(StartA <= Min(EndA, EndB) and (StartB <= Min(EndA, EndB))
or:(Max(StartA, StartB) <= Min(EndA, EndB)
But to implement Min()
and Max()
, you have to code, (using C ternary for terseness),:(StartA > StartB? Start A: StartB) <= (EndA < EndB? EndA: EndB)
Ruby: intersection between two ranges
require 'date'
class Range
def intersection(other)
return nil if (self.max < other.begin or other.max < self.begin)
[self.begin, other.begin].max..[self.max, other.max].min
end
alias_method :&, :intersection
end
p (Date.new(2011,1,1)..Date.new(2011,1,15)) & (Date.new(2011,1,10)..Date.new(2011,2,15))
#<Date: 2011-01-10 ((2455572j,0s,0n),+0s,2299161j)>..#<Date: 2011-01-15 ((2455577j,0s,0n),+0s,2299161j)>
Time span that can detect overlaps
You can use DateTime
ranges and ActiveSupport
's Range#overlaps?
method:
t1 = DateTime.parse("1-1-2014 12:00 AM") #=> Wed, 01 Jan 2014 10:00:00 +0000
d1 = (t1..t1+1.hour) #=> Wed, 01 Jan 2014 10:00:00 +0000..Wed, 01 Jan 2014 11:00:00 +0000
t2 = DateTime.parse("1-1-2014 10:30 AM") #=> Wed, 01 Jan 2014 10:30:00 +0000
d2 = (t2..t2+1.hour+30.minutes) #=> Wed, 01 Jan 2014 10:30:00 +0000..Wed, 01 Jan 2014 12:00:00 +0000
t3 = DateTime.parse("1-1-2014 12:00 PM") #=> Wed, 01 Jan 2014 12:00:00 +0000
d3 = (t3..t3+1.hour) #=> Wed, 01 Jan 2014 12:00:00 +0000..Wed, 01 Jan 2014 13:00:00 +0000
d1.overlaps? d2 #=> true
d1.overlaps? d3 #=> false
Rails Query Filtering by Comparing if Two Date Ranges Overlap
There are only two cases when the record is not to be displayed: when view_start
is greater then event_end
and when view_end
is less then event_start
. (assuming view dates are in correct order). Since we are looking for the negation of that, using DeMorgan's law:
if view_start < view_end
Model.where('event_start < ? AND event_end > ?', view_end, view_start)
else
Model.none
Related Topics
In Ruby or Rails, Why Is "Include" Sometimes Inside the Class and Sometimes Outside the Class
How to Integrate 'Premailer' with Rails
Activerecords Select(:Id).Collect VS. Pluck(:Id) Methods: Why Is Pure Ar "Pluck" Slower
Yielding in an Anonymous Block
Multi Level Block Method Is Generating Issue
Ruby Copy a Paperclip Attachment from One Model to Another
Getting Error "Exceeded Available Parameter Key Space"
Semifixed: Missing 'Secret_Key_Base' for 'Production' Environment
Extending a Class Method in a Module
Get Response Headers from Curb
Getaddrinfo Error with Mechanize
Trying to Set Up Amazon's S3 Bucket: 403 Forbidden Error & Setting Permissions
Rails: Get Values of Http Response
How to Know the Current Rake Task
Uninitialized Constant > Actioncable::Server::Configuration::Applicationcable