Efficient Date Range Overlap Calculation

Efficient date range overlap calculation?

  • Determine the latest of the two start dates and the earliest of the two end dates.
  • Compute the timedelta by subtracting them.
  • If the delta is positive, that is the number of days of overlap.

Here is an example calculation:

>>> from datetime import datetime
>>> from collections import namedtuple
>>> Range = namedtuple('Range', ['start', 'end'])

>>> r1 = Range(start=datetime(2012, 1, 15), end=datetime(2012, 5, 10))
>>> r2 = Range(start=datetime(2012, 3, 20), end=datetime(2012, 9, 15))
>>> latest_start = max(r1.start, r2.start)
>>> earliest_end = min(r1.end, r2.end)
>>> delta = (earliest_end - latest_start).days + 1
>>> overlap = max(0, delta)
>>> overlap
52

Determine Whether Two Date Ranges Overlap

(StartA <= EndB) and (EndA >= StartB)

Proof:

Let ConditionA Mean that DateRange A Completely After DateRange B

_                        |---- DateRange A ------|
|---Date Range B -----| _

(True if StartA > EndB)

Let ConditionB Mean that DateRange A is Completely Before DateRange B

|---- DateRange A -----|                        _ 
_ |---Date Range B ----|

(True if EndA < StartB)

Then Overlap exists if Neither A Nor B is true -

(If one range is neither completely after the other,

nor completely before the other,
then they must overlap.)

Now one of De Morgan's laws says that:

Not (A Or B) <=> Not A And Not B

Which translates to: (StartA <= EndB) and (EndA >= StartB)


NOTE: This includes conditions where the edges overlap exactly. If you wish to exclude that,

change the >= operators to >, and <= to <


NOTE2. Thanks to @Baodad, see this blog, the actual overlap is least of:

{ endA-startA, endA - startB, endB-startA, endB - startB }

(StartA <= EndB) and (EndA >= StartB)
(StartA <= EndB) and (StartB <= EndA)


NOTE3. Thanks to @tomosius, a shorter version reads:

DateRangesOverlap = max(start1, start2) < min(end1, end2)

This is actually a syntactical shortcut for what is a longer implementation, which includes extra checks to verify that the start dates are on or before the endDates. Deriving this from above:

If start and end dates can be out of order, i.e., if it is possible that startA > endA or startB > endB, then you also have to check that they are in order, so that means you have to add two additional validity rules:

(StartA <= EndB) and (StartB <= EndA) and (StartA <= EndA) and (StartB <= EndB)
or:

(StartA <= EndB) and (StartA <= EndA) and (StartB <= EndA) and (StartB <= EndB)
or,

(StartA <= Min(EndA, EndB) and (StartB <= Min(EndA, EndB))
or:

(Max(StartA, StartB) <= Min(EndA, EndB)

But to implement Min() and Max(), you have to code, (using C ternary for terseness),:

(StartA > StartB? Start A: StartB) <= (EndA < EndB? EndA: EndB)

NOTE4. Thanks to Carl for noticing this, but another answer shows an equivalent mathematical expression for this logical expression. Because the product of any two real numbers with opposite sign is negative and with the same sign it is positive, if you convert the datetimes to fractional numbers (and most DBMSs internally use numbers to represent datetimes), the above logical expression can also be evaluated using the following mathematical expression:

(EndA  - StartA) * (StartB - EndB) <= 0

overlapping dates between two date ranges in python

I would suggest generating the dates in each of both ranges, then selecting the intersection between the two set. A snipped doing so may look like this:

from datetime import date, timedelta


def f(d1, d2):
delta = d2 - d1
return set([d1 + timedelta(days=i) for i in range(delta.days + 1)])

range1 = [date(2016, 6, 1), date(2016, 6, 20)]
range2 = [date(2016, 6, 10), date(2016, 6, 13)]

print f(*range1) & f(*range2)

For performance purposes you can also cast d1 + timedelta(days=i) to str while generating the dates in a given range.

Efficiently find overlap of date-time ranges from 2 dataframes

Based on timeit tests, with 100 executions each, the namedtuple approach in the question averaged 15.7314 seconds on my machine, vs. an average of 1.4794 seconds with this approach:

# determine the duration of the events in df2, in seconds
duration = (df2.datetime_end - df2.datetime_start).dt.seconds.values

# create a numpy array with one timestamp for each second
# in which an event occurred
seconds_range = np.repeat(df2.datetime_start.values, duration) + \
np.concatenate(map(np.arange, duration)) * pd.Timedelta('1S')

df1.merge(pd.DataFrame({'datetime_start':seconds_range,
'catg':np.repeat(df2.catg, duration)}). \
groupby(['catg', pd.Grouper(key='datetime_start', freq='30min')]). \
size(). \
unstack(level=0). \
reset_index(),
how="left")

# datetime_end datetime_start a b c d
# 0 2016-09-11 06:30:00 2016-09-11 06:00:00 NaN NaN NaN NaN
# 1 2016-09-11 07:30:00 2016-09-11 07:00:00 NaN NaN NaN NaN
# 2 2016-09-11 08:00:00 2016-09-11 07:30:00 NaN NaN NaN NaN
# 3 2016-09-11 08:30:00 2016-09-11 08:00:00 NaN NaN NaN NaN
# 4 2016-09-11 09:00:00 2016-09-11 08:30:00 687.0 NaN NaN NaN
# 5 2016-09-11 09:30:00 2016-09-11 09:00:00 1800.0 NaN NaN NaN
# 6 2016-09-11 10:00:00 2016-09-11 09:30:00 1048.0 NaN NaN NaN
# 7 2016-09-11 11:00:00 2016-09-11 10:30:00 NaN NaN NaN NaN
# 8 2016-09-11 14:30:00 2016-09-11 14:00:00 NaN 463.0 NaN 701.0
# 9 2016-09-11 15:00:00 2016-09-11 14:30:00 NaN 220.0 NaN NaN
# 10 2016-09-11 15:30:00 2016-09-11 15:00:00 NaN 300.0 NaN 1277.0
# 11 2016-09-11 16:00:00 2016-09-11 15:30:00 1316.0 NaN NaN 89.0
# 12 2016-09-11 16:30:00 2016-09-11 16:00:00 564.0 680.0 NaN NaN
# 13 2016-09-11 17:00:00 2016-09-11 16:30:00 NaN 1654.0 NaN NaN
# 14 2016-09-11 17:30:00 2016-09-11 17:00:00 NaN 389.0 20.0 NaN

Efficiently finding overlap between many date ranges

You can pivot the table using the dates as index and the products as columns, then fill nan's with previous values, convert to daily frequency and look for rows with 0's in all columns.

ptable = (df.pivot(index='date', columns='product', values='stock')
.fillna(method='ffill').asfreq('D', method='ffill'))
cond = ptable.apply(lambda x: (x == 0).all(), axis='columns')
print(ptable.index[cond])

DatetimeIndex(['2016-01-10', '2016-01-11', '2016-01-12', '2016-01-13',
'2016-01-14'],
dtype='datetime64[ns]', name=u'date', freq='D')

Determine Whether Two Date Ranges Overlap, if yes how do I get the overlapped time with new start and end time

You can use this implementation of the DateRange :

public class DateRange : IEquatable<DateRange>
{
public DateTime Start { get; set; }
public DateTime End { get; set; }

public DateRange Intersect(DateRange d)
{
var s = (d.Start > this.Start) ? d.Start : this.Start; // Later Start
var e = (d.End < this.End) ? d.End : this.End; // Earlier ending

if (s < e)
return new DateRange() { Start = s, End = e };
else
return null;
}

public bool Contains(DateTime d)
{
return d >= Start && d <= End;
}

public bool Equals(DateRange obj)
{
return Start.Equals(obj.Start) && End.Equals(obj.End);
}
}

EDIT

var d1 =new DateRange() { Start = DateTime.Now, End = DateTime.Now.AddDays(1) };
var d2 =new DateRange() { Start = DateTime.Now.AddDays(-1), End = DateTime.Now };

Console.WriteLine(d1.Intersect(d2));

EDIT2

    public List<DateRange> Intersect2(DateRange d)
{
var s = (d.Start > this.Start) ? d.Start : this.Start; // Later Start
var e = (d.End < this.End) ? d.End : this.End; // Earlier ending

if (s < e)
return new List<DateRange>()
{
new DateRange() { Start = new DateTime(Math.Min(Start.Ticks, d.Start.Ticks)), End = s },
new DateRange() { Start = e, End = new DateTime(Math.Max(End.Ticks, d.End.Ticks)) }
};
else
return null;
}

Find date range overlap in python

You could just shift the to column and perform a direct subtraction of the datetimes.

df['overlap'] = (df['to'].shift()-df['from']) > timedelta(0)

Applying this while grouping by id may look like

df['overlap'] = (df.groupby('id')
.apply(lambda x: (x['to'].shift() - x['from']) > timedelta(0))
.reset_index(level=0, drop=True))

Demo

>>> df
id from to
0 878 2006-01-01 2007-10-01
1 878 2007-10-02 2008-12-01
2 878 2008-12-02 2010-04-03
3 879 2010-04-04 2199-05-11
4 879 2016-05-12 2199-12-31

>>> df['overlap'] = (df.groupby('id')
.apply(lambda x: (x['to'].shift() - x['from']) > timedelta(0))
.reset_index(level=0, drop=True))

>>> df
id from to overlap
0 878 2006-01-01 2007-10-01 False
1 878 2007-10-02 2008-12-01 False
2 878 2008-12-02 2010-04-03 False
3 879 2010-04-04 2199-05-11 False
4 879 2016-05-12 2199-12-31 True


Related Topics



Leave a reply



Submit