Check for Consecutive Dates Within a Set and Return as Range

Bit of a quick answer, so sorry about the lack of implementation, but assuming you are using PHP 5.3 or later and the dates are ordered chronologically, you could convert each date to a DateTime object (if they aren't already) and then iterate over the array, using DateTime::diff() to generate a DateInterval object that compares the current date in the iteration with the previous one. You could group your consecutive dates into sub-arrays and use array_shift() and array_pop() to get the first and last days of each sub-array.

EDIT

I had a think about this. Pretty rough and ready implementation follows, but it should work:

// assuming a chronologically
// ordered array of DateTime objects

$dates = array(
new DateTime('2010-12-30'),
new DateTime('2011-01-01'),
new DateTime('2011-01-02'),
new DateTime('2011-01-03'),
new DateTime('2011-01-06'),
new DateTime('2011-01-07'),
new DateTime('2011-01-10'),
);

// process the array

$lastDate = null;
$ranges = array();
$currentRange = array();

foreach ($dates as $date) {

    if (null === $lastDate) {
        $currentRange[] = $date;
    } else {

        // get the DateInterval object
        $interval = $date->diff($lastDate);

        // DateInterval::$days is the total number of days
        // between the two dates (unlike $d, which is only the
        // day component), so a diff of a month and a day or
        // a year and a day is not mistaken for one day here.

        if ($interval->days === 1) {
            // add this date to the current range
            $currentRange[] = $date;
        } else {
            // store the old range and start anew
            $ranges[] = $currentRange;
            $currentRange = array($date);
        }
    }

    // end of iteration...
    // this date is now the last date
    $lastDate = $date;
}

// messy...
$ranges[] = $currentRange;

// print dates

foreach ($ranges as $range) {

    // there'll always be one array element, so
    // shift that off and create a string from the date object
    $startDate = array_shift($range);
    $str = sprintf('%s', $startDate->format('D j M'));

    // if there are still elements in $range
    // then this is a range. pop off the last
    // element, do the same as above and concatenate
    if (count($range)) {
        $endDate = array_pop($range);
        $str .= sprintf(' to %s', $endDate->format('D j M'));
    }

    echo "<p>$str</p>";
}

Outputs:

Thu 30 Dec
Sat 1 Jan to Mon 3 Jan
Thu 6 Jan to Fri 7 Jan
Mon 10 Jan
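
For comparison, here is a minimal sketch of the same idea in Python rather than PHP. The function name group_consecutive is made up for this illustration, and the sketch assumes the input is already sorted chronologically and contains no duplicates. It extends the current range whenever a date falls exactly one day after the range's end, and starts a new range otherwise.

```python
from datetime import date, timedelta


def group_consecutive(dates):
    """Group chronologically sorted dates into (start, end) ranges.

    Hypothetical helper for illustration; assumes sorted, unique dates.
    """
    ranges = []
    for d in dates:
        # Extend the current range if d is exactly one day after its end.
        if ranges and d - ranges[-1][1] == timedelta(days=1):
            ranges[-1][1] = d
        else:
            # Otherwise start a new single-day range.
            ranges.append([d, d])
    return [tuple(r) for r in ranges]


dates = [
    date(2010, 12, 30),
    date(2011, 1, 1),
    date(2011, 1, 2),
    date(2011, 1, 3),
    date(2011, 1, 6),
    date(2011, 1, 7),
    date(2011, 1, 10),
]

for start, end in group_consecutive(dates):
    if start == end:
        print(f"{start:%a %d %b}")
    else:
        print(f"{start:%a %d %b} to {end:%a %d %b}")
```

Storing only the start and end of each range avoids the sub-arrays and the shift/pop step, at the cost of discarding the intermediate dates.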

Find non consecutive date ranges

You can detect gaps with LAG() and mark them. Then it's easy to select the rows that start after a gap. For example:

select *
from (
select *,
case when dateadd(day, -1, start_date) >
lag(end_date) over(partition by client_id order by start_date)
then 1 else 0 end as i
from t
) x
where i = 1

Or simpler...

select *
from (
select *,
lag(end_date) over(partition by client_id order by start_date) as prev_end
from t
) x
where dateadd(day, -1, start_date) > prev_end
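
To make the LAG() comparison concrete, here is a rough Python sketch of the same check outside the database. The function name find_gap_rows and the sample rows are invented for this illustration; the column order (client_id, start_date, end_date) is an assumption based on the query above. A dictionary keeps the previous end_date per client, which plays the role of LAG() over each partition.

```python
from datetime import date, timedelta


def find_gap_rows(rows):
    """Return rows whose start_date is more than one day after the
    previous end_date of the same client (hypothetical helper)."""
    flagged = []
    prev_end = {}  # last end_date seen per client_id, like LAG() per partition
    # Sort by client and start date, mirroring PARTITION BY / ORDER BY.
    for client_id, start, end in sorted(rows, key=lambda r: (r[0], r[1])):
        previous = prev_end.get(client_id)
        # Same condition as: dateadd(day, -1, start_date) > prev_end
        if previous is not None and start - timedelta(days=1) > previous:
            flagged.append((client_id, start, end))
        prev_end[client_id] = end
    return flagged


# Made-up sample data for illustration only.
rows = [
    (1, date(2021, 1, 1), date(2021, 1, 5)),
    (1, date(2021, 1, 6), date(2021, 1, 9)),    # adjacent to the previous row
    (1, date(2021, 1, 15), date(2021, 1, 20)),  # starts after a gap
    (2, date(2021, 2, 1), date(2021, 2, 3)),    # first row of its client
]

print(find_gap_rows(rows))
```

As in the SQL, the first row of each client is never flagged, because there is no previous end date to compare against.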

SQL Counting Consecutive Days in Date Ranges

This is a gaps-and-islands problem. One option is to use lag() and a window sum() to build groups of adjacent records. You can then aggregate by group to count the consecutive days in each one, and finally keep the longest streak per name:

select name, max(consecutive_days) consecutive_days
from (
    select name, datediff(day, min(start_date), max(end_date)) + 1 consecutive_days
    from (
        select t.*,
            sum(case when start_date = dateadd(day, 1, lag_end_date) then 0 else 1 end)
                over(partition by name order by start_date) grp
        from (
            select t.*,
                lag(end_date) over(partition by name order by start_date) lag_end_date
            from mytable t
        ) t
    ) t
    group by name, grp
) t
group by name

Result (from the demo on DB Fiddle):

name | consecutive_days
:----- | ---------------:
Jenny | 3
Johnny | 9
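
The grouping logic can also be sketched in Python, which may help when checking the query by hand. Everything here is an assumption for illustration: the function name longest_streaks, the (name, start_date, end_date) row layout, and the sample rows, which are invented and are not the data behind the demo. The sketch assumes the ranges of one name do not overlap.

```python
from collections import defaultdict
from datetime import date, timedelta


def longest_streaks(rows):
    """Map each name to its longest run of consecutive days.

    rows: iterable of (name, start_date, end_date); hypothetical layout.
    """
    by_name = defaultdict(list)
    for name, start, end in rows:
        by_name[name].append((start, end))

    result = {}
    for name, spans in by_name.items():
        spans.sort()  # order by start_date within each name
        best = 0
        island_start, island_end = spans[0]
        for start, end in spans[1:]:
            if start == island_end + timedelta(days=1):
                # Adjacent to the current island: extend it.
                island_end = end
            else:
                # Gap: close the island and start a new one.
                best = max(best, (island_end - island_start).days + 1)
                island_start, island_end = start, end
        result[name] = max(best, (island_end - island_start).days + 1)
    return result


# Made-up sample rows for illustration only.
rows = [
    ("Jenny", date(2020, 1, 1), date(2020, 1, 3)),
    ("Jenny", date(2020, 1, 10), date(2020, 1, 11)),
    ("Johnny", date(2020, 1, 1), date(2020, 1, 4)),
    ("Johnny", date(2020, 1, 5), date(2020, 1, 9)),
]

print(longest_streaks(rows))
```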

Group consecutive dates in R

base R

input <- input[order(input$DATE),]
input$grp <- ave(as.integer(input$DATE), input[-4], FUN = function(z) cumsum(c(TRUE, diff(z) > 1)))
input
# FID PID SETTING DATE grp
# 1 1 100001 ST 2021-01-01 1
# 2 1 100001 ST 2021-01-02 1
# 7 2 200001 AB 2021-01-02 1
# 3 1 100001 ST 2021-01-03 1
# 8 2 200001 AB 2021-01-03 1
# 4 1 100002 AB 2021-01-04 1
# 9 2 200001 AB 2021-01-04 1
# 10 2 200002 TK 2021-01-05 1
# 5 1 100001 ST 2021-01-11 2
# 6 1 100001 ST 2021-01-12 2

out <- aggregate(DATE ~ FID + PID + SETTING + grp, data = input,
FUN = function(z) setNames(range(z), c("START","END")))
out <- do.call(data.frame, out)
out[,5:6] <- lapply(out[,5:6], as.Date, origin = "1970-01-01")
out
# FID PID SETTING grp DATE.START DATE.END
# 1 1 100002 AB 1 2021-01-04 2021-01-04
# 2 2 200001 AB 1 2021-01-02 2021-01-04
# 3 1 100001 ST 1 2021-01-01 2021-01-03
# 4 2 200002 TK 1 2021-01-05 2021-01-05
# 5 1 100001 ST 2 2021-01-11 2021-01-12

Walk-through:

  • the ease of cumsum and diff is accomplished assuming that the dates are always ordered; it is not important (here) that the other grouping variables may be misordered;
  • ave(..) assigns groups of non-consecutive (diff over 1) dates, which we use in the next step;
  • aggregate calculates the range within each group, using your three variables plus our new grp grouping variable; each z in the anonymous function is a contiguous vector of dates, so range gives us the start/end dates;
  • unfortunately, aggregate is assigning a matrix as the fifth column instead of two separate columns, so do.call(data.frame, out) fixes that;
  • unfortunately, most base R aggregating functions tend to strip the Date (and POSIXt) class from the vectors, so we need to use as.Date to heal that.
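
The cumsum(c(TRUE, diff(z) > 1)) idiom from the walk-through is not specific to R. As a rough illustration, the same labelling can be written in Python with itertools.accumulate standing in for cumsum. The function name run_ids is made up here, and the sketch assumes the dates are already sorted, as the walk-through does.

```python
from datetime import date
from itertools import accumulate


def run_ids(sorted_dates):
    """Label each date with a run number; a new run starts after a
    gap of more than one day (hypothetical helper)."""
    if not sorted_dates:
        return []
    ordinals = [d.toordinal() for d in sorted_dates]
    # True for the first element and wherever the step from the
    # previous date exceeds one day: c(TRUE, diff(z) > 1) in R.
    breaks = [True] + [b - a > 1 for a, b in zip(ordinals, ordinals[1:])]
    # Running total of the break flags: cumsum(...) in R.
    return list(accumulate(int(b) for b in breaks))


dates = [
    date(2021, 1, 1),
    date(2021, 1, 2),
    date(2021, 1, 3),
    date(2021, 1, 11),
    date(2021, 1, 12),
]

print(run_ids(dates))  # [1, 1, 1, 2, 2]
```

These are the dates of the (FID 1, PID 100001, ST) group above, so the labels correspond to its grp values.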

dplyr

library(dplyr)
input %>%
arrange(DATE) %>%
group_by(FID, PID, SETTING) %>%
mutate(grp = cumsum(c(TRUE, diff(DATE) > 1))) %>%
group_by(FID, PID, SETTING, grp) %>%
summarize(START = min(DATE), END = max(DATE)) %>%
ungroup()
# # A tibble: 5 x 6
# FID PID SETTING grp START END
# <int> <int> <chr> <int> <date> <date>
# 1 1 100001 " ST" 1 2021-01-01 2021-01-03
# 2 1 100001 " ST" 2 2021-01-11 2021-01-12
# 3 1 100002 " AB" 1 2021-01-04 2021-01-04
# 4 2 200001 " AB" 1 2021-01-02 2021-01-04
# 5 2 200002 " TK" 1 2021-01-05 2021-01-05

data.table

library(data.table)
inputDT <- as.data.table(input)
setorder(inputDT, DATE)
inputDT[, grp := cumsum(c(TRUE, diff(DATE) > 1)), by = .(FID, PID, SETTING)
][, .(START = min(DATE), END = max(DATE)), by = .(FID, PID, SETTING, grp)
][]
# FID PID SETTING grp START END
# <int> <int> <char> <int> <Date> <Date>
# 1: 1 100001 ST 1 2021-01-01 2021-01-03
# 2: 2 200001 AB 1 2021-01-02 2021-01-04
# 3: 1 100002 AB 1 2021-01-04 2021-01-04
# 4: 2 200002 TK 1 2021-01-05 2021-01-05
# 5: 1 100001 ST 2 2021-01-11 2021-01-12

Grouping consecutive dates in an array together in PHP

This piece of code groups consecutive dates together and tolerates daylight saving changes.

Array of timestamps:

$dates = array
(
strtotime('2012-10-01'),

strtotime('2012-10-03'),
strtotime('2012-10-04'),
strtotime('2012-10-05'),
strtotime('2012-10-06'),
strtotime('2012-10-07'),

strtotime('2012-10-10'),
strtotime('2012-10-11'),
strtotime('2012-10-12'),
strtotime('2012-10-13'),
strtotime('2012-10-14'),
strtotime('2012-10-15'),
strtotime('2012-10-16'),
strtotime('2012-10-17'),
strtotime('2012-10-18'),
strtotime('2012-10-19'),
strtotime('2012-10-20'),

strtotime('2012-10-23'),
strtotime('2012-10-24'),
strtotime('2012-10-25'),
strtotime('2012-10-26'),
strtotime('2012-10-29'),
strtotime('2012-10-30'),
strtotime('2012-10-31'),
strtotime('2012-11-01'),
strtotime('2012-11-02'),

strtotime('2012-11-04')
);

Code:

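$conseq = array();
$ii = 0; // index of the current group
$max = count($dates);

for ($i = 0; $i < $max; $i++) {
    $conseq[$ii][] = date('Y-m-d', $dates[$i]);

    if ($i + 1 < $max) {
        $dif = $dates[$i + 1] - $dates[$i];
        // 90000 seconds = 25 hours. Consecutive midnights can be
        // 25 hours apart when daylight saving ends, so only a
        // difference larger than that starts a new group.
        if ($dif > 90000) {
            $ii++;
        }
    }
}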

Outputs:

array
0 =>
array
0 => string '2012-10-01' (length=10)
1 =>
array
0 => string '2012-10-03' (length=10)
1 => string '2012-10-04' (length=10)
2 => string '2012-10-05' (length=10)
3 => string '2012-10-06' (length=10)
4 => string '2012-10-07' (length=10)
2 =>
array
0 => string '2012-10-10' (length=10)
1 => string '2012-10-11' (length=10)
2 => string '2012-10-12' (length=10)
3 => string '2012-10-13' (length=10)
4 => string '2012-10-14' (length=10)
5 => string '2012-10-15' (length=10)
6 => string '2012-10-16' (length=10)
7 => string '2012-10-17' (length=10)
8 => string '2012-10-18' (length=10)
9 => string '2012-10-19' (length=10)
10 => string '2012-10-20' (length=10)
3 =>
array
0 => string '2012-10-23' (length=10)
1 => string '2012-10-24' (length=10)
2 => string '2012-10-25' (length=10)
3 => string '2012-10-26' (length=10)
4 =>
array
0 => string '2012-10-29' (length=10)
1 => string '2012-10-30' (length=10)
2 => string '2012-10-31' (length=10)
3 => string '2012-11-01' (length=10)
4 => string '2012-11-02' (length=10)
5 =>
array
0 => string '2012-11-04' (length=10)

Check if dates are continuous with PHP

Using the DateTime::diff function, you can get the difference in days. Then loop over the dates, and if the difference isn't one day, it's not continuous.

$dates = "2017-01-08, 2017-01-09, 2017-01-10";
var_dump(check_continuous_dates($dates)); // true

$dates = "2017-01-08, 2017-01-12, 2017-01-13";
var_dump(check_continuous_dates($dates)); // false

function check_continuous_dates($date) {
    $date = explode(", ", $date); // Convert it to an array we can loop
    $previous = new DateTime($date[0]); // Set the "previous" value
    unset($date[0]); // Unset the value we just set to $previous, so we don't loop it twice

    foreach ($date as $v) { // Loop the array
        $current = new DateTime($v);
        $diff = $current->diff($previous);

        // If the difference is exactly 1 day, it's continuous
        if ($diff->days == 1) {
            $previous = $current;
        } else {
            return false;
        }
    }
    return true;
}

References

  • http://php.net/manual/en/datetime.diff.php

How to detect if dates are consecutive in Python?

Rather than rolling your own consecutive function, you can convert the dates to integers with the .toordinal() method of datetime objects. The dates are consecutive when the difference between the maximum and minimum of the set of ordinal dates is one less than the size of the set:

from datetime import datetime

date_strs = ['07-06-2010', '06-24-2010', '07-05-2010', '06-25-2010']
# date_strs = ['02-29-2012', '02-28-2012', '03-01-2012']
# date_strs = ['01-01-2000']
dates = [datetime.strptime(d, "%m-%d-%Y") for d in date_strs]

date_ints = {d.toordinal() for d in dates}

if len(date_ints) == 1:
    print("unique")
elif max(date_ints) - min(date_ints) == len(date_ints) - 1:
    print("consecutive")
else:
    print("not consecutive")

R: Converting consecutive dates from a single column into a 2-column range

Here I am checking whether the difference with the previous date (diff(Date)) is not 1. If so, start a new group (taking the cumsum of this indicator means g will increase by 1 whenever it's TRUE).

library(data.table)
setDT(df)

df[, Date := as.Date(Date, format = '%m/%d/%Y')]

df[, .(start = min(Date), end = max(Date)),
by = .(Person_ID, Department, g = cumsum(c(0, diff(Date)) != 1))]

# Person_ID Department g start end
# 1: 351581 GH 1 2019-12-01 2019-12-03
# 2: 351581 FR 2 2019-12-02 2019-12-02
# 3: 598168 GH 3 2019-12-16 2019-12-16
# 4: 351581 JE 4 2019-12-08 2019-12-09
# 5: 615418 AB 5 2019-12-20 2019-12-20
# 6: 615418 AB 6 2019-12-22 2019-12-22

If your data is not already ordered by date within (Person_ID, Department) groups, you can add order(Date) to the i part of df[i, j, k] i.e. change the code above to

df[order(Date), .(start = min(Date), end = max(Date)),
by = .(Person_ID, Department, g = cumsum(c(0, diff(Date)) != 1))]

Note that, for this updated example, this is not the same as grouping by Person_ID and Department alone, which would merge the two non-consecutive dates of person 615418 into a single range:

df[, .(start = min(Date), end = max(Date)),
by = .(Person_ID, Department)]

# Person_ID Department start end
# 1: 351581 GH 2019-12-01 2019-12-03
# 2: 351581 FR 2019-12-02 2019-12-02
# 3: 598168 GH 2019-12-16 2019-12-16
# 4: 351581 JE 2019-12-08 2019-12-09
# 5: 615418 AB 2019-12-20 2019-12-22

Data used:

df <- fread('
Person_ID Department Date
351581 GH 12/1/2019
351581 GH 12/2/2019
351581 GH 12/3/2019
351581 FR 12/2/2019
598168 GH 12/16/2019
351581 JE 12/8/2019
351581 JE 12/9/2019
615418 AB 12/20/2019
615418 AB 12/22/2019
')

Find group of consecutive dates in Pandas DataFrame

It seems like you need two boolean masks: one to determine the breaks between groups, and one to determine which dates are in a group in the first place.

There's also one tricky part that can be fleshed out by example. Notice that df below contains an added row that doesn't have any consecutive dates before or after it.

>>> df
DateAnalyzed Val
1 2018-03-18 0.470253
2 2018-03-19 0.470253
3 2018-03-20 0.470253
4 2017-01-20 0.485949 # < watch out for this
5 2018-09-25 0.467729
6 2018-09-26 0.467729
7 2018-09-27 0.467729

>>> df.dtypes
DateAnalyzed datetime64[ns]
Val float64
dtype: object

The answer below assumes that you want to ignore 2017-01-20 completely, without processing it. (See end of answer for a solution if you do want to process this date.)

First:

>>> dt = df['DateAnalyzed']
>>> day = pd.Timedelta('1d')
>>> in_block = ((dt - dt.shift(-1)).abs() == day) | (dt.diff() == day)
>>> in_block
1 True
2 True
3 True
4 False
5 True
6 True
7 True
Name: DateAnalyzed, dtype: bool

Now, in_block will tell you which dates are in a "consecutive" block, but it won't tell you to which groups each date belongs.

The next step is to derive the groupings themselves:

>>> filt = df.loc[in_block]
>>> breaks = filt['DateAnalyzed'].diff() != day
>>> groups = breaks.cumsum()
>>> groups
1 1
2 1
3 1
5 2
6 2
7 2
Name: DateAnalyzed, dtype: int64

Then you can call filt.groupby(groups) with your operation of choice.

>>> for _, frame in filt.groupby(groups):
... print(frame, end='\n\n')
...
DateAnalyzed Val
1 2018-03-18 0.470253
2 2018-03-19 0.470253
3 2018-03-20 0.470253

DateAnalyzed Val
5 2018-09-25 0.467729
6 2018-09-26 0.467729
7 2018-09-27 0.467729

To incorporate this back into df, assign to it and the isolated dates will be NaN:

>>> df['groups'] = groups
>>> df
DateAnalyzed Val groups
1 2018-03-18 0.470253 1.0
2 2018-03-19 0.470253 1.0
3 2018-03-20 0.470253 1.0
4 2017-01-20 0.485949 NaN
5 2018-09-25 0.467729 2.0
6 2018-09-26 0.467729 2.0
7 2018-09-27 0.467729 2.0

If you do want to include the "lone" date, things become a bit more straightforward:

dt = df['DateAnalyzed']
day = pd.Timedelta('1d')
breaks = dt.diff() != day
groups = breaks.cumsum()

Detect consecutive dates ranges using SQL

No joins or recursive CTEs needed. The standard gaps-and-islands solution is to group by (value minus row_number), since that difference is invariant within a consecutive sequence. The start and end dates are just the MIN() and MAX() of the group.

WITH t AS (
SELECT InfoDate d,ROW_NUMBER() OVER(ORDER BY InfoDate) i
FROM @d
GROUP BY InfoDate
)
SELECT MIN(d),MAX(d)
FROM t
GROUP BY DATEDIFF(day,i,d)
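
To see why (value minus row_number) works as a grouping key, here is a small Python sketch of the same trick. The function name date_ranges and the sample dates are made up for this illustration. Each distinct date is numbered in sorted order, and the difference between the date's ordinal and its position stays constant for as long as the dates advance one day at a time.

```python
from datetime import date
from itertools import groupby


def date_ranges(dates):
    """Collapse dates into (start, end) ranges of consecutive days
    (hypothetical helper)."""
    # Distinct dates in order, like GROUP BY InfoDate plus ORDER BY InfoDate.
    unique_sorted = sorted(set(dates))
    ranges = []
    # ordinal - position is constant within a run of consecutive days,
    # so it can serve as the grouping key.
    keyed = groupby(
        enumerate(unique_sorted),
        key=lambda pair: pair[1].toordinal() - pair[0],
    )
    for _, run in keyed:
        run_dates = [d for _, d in run]
        ranges.append((run_dates[0], run_dates[-1]))  # MIN(d), MAX(d)
    return ranges


# Made-up sample dates, including one duplicate.
dates = [
    date(2020, 1, 1),
    date(2020, 1, 2),
    date(2020, 1, 2),
    date(2020, 1, 3),
    date(2020, 1, 7),
    date(2020, 1, 9),
    date(2020, 1, 10),
]

for start, end in date_ranges(dates):
    print(start, end)
```

Removing duplicates first matters for the same reason the query groups by InfoDate: a repeated date would advance the row number without advancing the date, and so split a run.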

