Efficient date range overlap calculation?
- Determine the latest of the two start dates and the earliest of the two end dates.
- Compute the timedelta by subtracting them.
- If the delta is positive, that is the number of days of overlap.
Here is an example calculation:
>>> from datetime import datetime
>>> from collections import namedtuple
>>> Range = namedtuple('Range', ['start', 'end'])
>>> r1 = Range(start=datetime(2012, 1, 15), end=datetime(2012, 5, 10))
>>> r2 = Range(start=datetime(2012, 3, 20), end=datetime(2012, 9, 15))
>>> latest_start = max(r1.start, r2.start)
>>> earliest_end = min(r1.end, r2.end)
>>> delta = (earliest_end - latest_start).days + 1
>>> overlap = max(0, delta)
>>> overlap
52
Determine Whether Two Date Ranges Overlap
(StartA <= EndB) and (EndA >= StartB)
Proof:
Let ConditionA Mean that DateRange A Completely After DateRange B
_ |---- DateRange A ------|
|---Date Range B -----| _
(True if StartA > EndB
)
Let ConditionB Mean that DateRange A is Completely Before DateRange B
|---- DateRange A -----| _
_ |---Date Range B ----|
(True if EndA < StartB
)
Then Overlap exists if Neither A Nor B is true -
(If one range is neither completely after the other,
nor completely before the other,
then they must overlap.)
Now one of De Morgan's laws says that:
Not (A Or B)
<=> Not A And Not B
Which translates to: (StartA <= EndB) and (EndA >= StartB)
NOTE: This includes conditions where the edges overlap exactly. If you wish to exclude that,
change the >=
operators to >
, and <=
to <
NOTE2. Thanks to @Baodad, see this blog, the actual overlap is least of:
{ endA-startA
, endA - startB
, endB-startA
, endB - startB
}
(StartA <= EndB) and (EndA >= StartB)
(StartA <= EndB) and (StartB <= EndA)
NOTE3. Thanks to @tomosius, a shorter version reads:DateRangesOverlap = max(start1, start2) < min(end1, end2)
This is actually a syntactical shortcut for what is a longer implementation, which includes extra checks to verify that the start dates are on or before the endDates. Deriving this from above:
If start and end dates can be out of order, i.e., if it is possible that startA > endA
or startB > endB
, then you also have to check that they are in order, so that means you have to add two additional validity rules:(StartA <= EndB) and (StartB <= EndA) and (StartA <= EndA) and (StartB <= EndB)
or:(StartA <= EndB) and (StartA <= EndA) and (StartB <= EndA) and (StartB <= EndB)
or,(StartA <= Min(EndA, EndB) and (StartB <= Min(EndA, EndB))
or:(Max(StartA, StartB) <= Min(EndA, EndB)
But to implement Min()
and Max()
, you have to code, (using C ternary for terseness),:(StartA > StartB? Start A: StartB) <= (EndA < EndB? EndA: EndB)
NOTE4. Thanks to Carl for noticing this, but another answer shows an equivalent mathematical expression for this logical expression. Because the product of any two real numbers with opposite sign is negative and with the same sign it is positive, if you convert the datetimes to fractional numbers (and most DBMSs internally use numbers to represent datetimes), the above logical expression can also be evaluated using the following mathematical expression:
(EndA - StartA) * (StartB - EndB) <= 0
overlapping dates between two date ranges in python
I would suggest generating the dates in each of both ranges, then selecting the intersection between the two set. A snipped doing so may look like this:
from datetime import date, timedelta
def f(d1, d2):
delta = d2 - d1
return set([d1 + timedelta(days=i) for i in range(delta.days + 1)])
range1 = [date(2016, 6, 1), date(2016, 6, 20)]
range2 = [date(2016, 6, 10), date(2016, 6, 13)]
print f(*range1) & f(*range2)
For performance purposes you can also cast d1 + timedelta(days=i)
to str
while generating the dates in a given range.
Efficiently find overlap of date-time ranges from 2 dataframes
Based on timeit
tests, with 100 executions each, the namedtuple
approach in the question averaged 15.7314
seconds on my machine, vs. an average of 1.4794
seconds with this approach:
# determine the duration of the events in df2, in seconds
duration = (df2.datetime_end - df2.datetime_start).dt.seconds.values
# create a numpy array with one timestamp for each second
# in which an event occurred
seconds_range = np.repeat(df2.datetime_start.values, duration) + \
np.concatenate(map(np.arange, duration)) * pd.Timedelta('1S')
df1.merge(pd.DataFrame({'datetime_start':seconds_range,
'catg':np.repeat(df2.catg, duration)}). \
groupby(['catg', pd.Grouper(key='datetime_start', freq='30min')]). \
size(). \
unstack(level=0). \
reset_index(),
how="left")
# datetime_end datetime_start a b c d
# 0 2016-09-11 06:30:00 2016-09-11 06:00:00 NaN NaN NaN NaN
# 1 2016-09-11 07:30:00 2016-09-11 07:00:00 NaN NaN NaN NaN
# 2 2016-09-11 08:00:00 2016-09-11 07:30:00 NaN NaN NaN NaN
# 3 2016-09-11 08:30:00 2016-09-11 08:00:00 NaN NaN NaN NaN
# 4 2016-09-11 09:00:00 2016-09-11 08:30:00 687.0 NaN NaN NaN
# 5 2016-09-11 09:30:00 2016-09-11 09:00:00 1800.0 NaN NaN NaN
# 6 2016-09-11 10:00:00 2016-09-11 09:30:00 1048.0 NaN NaN NaN
# 7 2016-09-11 11:00:00 2016-09-11 10:30:00 NaN NaN NaN NaN
# 8 2016-09-11 14:30:00 2016-09-11 14:00:00 NaN 463.0 NaN 701.0
# 9 2016-09-11 15:00:00 2016-09-11 14:30:00 NaN 220.0 NaN NaN
# 10 2016-09-11 15:30:00 2016-09-11 15:00:00 NaN 300.0 NaN 1277.0
# 11 2016-09-11 16:00:00 2016-09-11 15:30:00 1316.0 NaN NaN 89.0
# 12 2016-09-11 16:30:00 2016-09-11 16:00:00 564.0 680.0 NaN NaN
# 13 2016-09-11 17:00:00 2016-09-11 16:30:00 NaN 1654.0 NaN NaN
# 14 2016-09-11 17:30:00 2016-09-11 17:00:00 NaN 389.0 20.0 NaN
Efficiently finding overlap between many date ranges
You can pivot the table using the dates as index and the products as columns, then fill nan's with previous values, convert to daily frequency and look for rows with 0's in all columns.
ptable = (df.pivot(index='date', columns='product', values='stock')
.fillna(method='ffill').asfreq('D', method='ffill'))
cond = ptable.apply(lambda x: (x == 0).all(), axis='columns')
print(ptable.index[cond])
DatetimeIndex(['2016-01-10', '2016-01-11', '2016-01-12', '2016-01-13',
'2016-01-14'],
dtype='datetime64[ns]', name=u'date', freq='D')
Determine Whether Two Date Ranges Overlap, if yes how do I get the overlapped time with new start and end time
You can use this implementation of the DateRange :
public class DateRange : IEquatable<DateRange>
{
public DateTime Start { get; set; }
public DateTime End { get; set; }
public DateRange Intersect(DateRange d)
{
var s = (d.Start > this.Start) ? d.Start : this.Start; // Later Start
var e = (d.End < this.End) ? d.End : this.End; // Earlier ending
if (s < e)
return new DateRange() { Start = s, End = e };
else
return null;
}
public bool Contains(DateTime d)
{
return d >= Start && d <= End;
}
public bool Equals(DateRange obj)
{
return Start.Equals(obj.Start) && End.Equals(obj.End);
}
}
EDIT
var d1 =new DateRange() { Start = DateTime.Now, End = DateTime.Now.AddDays(1) };
var d2 =new DateRange() { Start = DateTime.Now.AddDays(-1), End = DateTime.Now };
Console.WriteLine(d1.Intersect(d2));
EDIT2
public List<DateRange> Intersect2(DateRange d)
{
var s = (d.Start > this.Start) ? d.Start : this.Start; // Later Start
var e = (d.End < this.End) ? d.End : this.End; // Earlier ending
if (s < e)
return new List<DateRange>()
{
new DateRange() { Start = new DateTime(Math.Min(Start.Ticks, d.Start.Ticks)), End = s },
new DateRange() { Start = e, End = new DateTime(Math.Max(End.Ticks, d.End.Ticks)) }
};
else
return null;
}
Find date range overlap in python
You could just shift the to
column and perform a direct subtraction of the datetimes.
df['overlap'] = (df['to'].shift()-df['from']) > timedelta(0)
Applying this while grouping by id
may look like
df['overlap'] = (df.groupby('id')
.apply(lambda x: (x['to'].shift() - x['from']) > timedelta(0))
.reset_index(level=0, drop=True))
Demo
>>> df
id from to
0 878 2006-01-01 2007-10-01
1 878 2007-10-02 2008-12-01
2 878 2008-12-02 2010-04-03
3 879 2010-04-04 2199-05-11
4 879 2016-05-12 2199-12-31
>>> df['overlap'] = (df.groupby('id')
.apply(lambda x: (x['to'].shift() - x['from']) > timedelta(0))
.reset_index(level=0, drop=True))
>>> df
id from to overlap
0 878 2006-01-01 2007-10-01 False
1 878 2007-10-02 2008-12-01 False
2 878 2008-12-02 2010-04-03 False
3 879 2010-04-04 2199-05-11 False
4 879 2016-05-12 2199-12-31 True
Related Topics
Converting Two Lists into a Matrix
Finding the Maximum Number of Columns in a File or CSV Using Python
What Is the Correct Format to Write Float Value to File in Python
Strip White Spaces from CSV File
How to Remove a Pandas Dataframe from Another Dataframe
Python: Element Is Not Attached to the Page Document
Only Reading First N Rows of CSV File With CSV Reader in Python
Regex That Matches a Number With Commas for Every Three Digits
Splitting Dictionary Items into Smaller Dictionaries Based on Condition
How to Track the Number of Times a Function Is Called
Robot Framework Using Python, Key Press Without Selecting Any Button or Element in the Page
Replacing Pandas or Numpy Nan With a None to Use With Mysqldb
How to Write a Lambda Function That Is Conditional on Two Variables (Columns) in Python
How to Install Tesseract for Python on Anaconda
How to Split a CSV File Row to Columns in Python
How to Skip Blank Line While Reading CSV File Using Python