Check for Consecutive Dates Within a Set and Return as Range

Check for consecutive dates within a set and return as range

A quick answer, so apologies for the lack of an implementation. Assuming you are using PHP 5.3 or later and the dates are ordered chronologically, you could convert each date to a DateTime object (if they aren't already) and then iterate over the array, using DateTime::diff() to generate a DateInterval object that compares the current date in the iteration with the previous one. You could group consecutive dates into sub-arrays and use array_shift() and array_pop() to get the first and last days of each sub-array.

EDIT

I had a think about this. Pretty rough and ready implementation follows, but it should work:

// assuming a chronologically
// ordered array of DateTime objects 

$dates = array(
    new DateTime('2010-12-30'), 
    new DateTime('2011-01-01'), 
    new DateTime('2011-01-02'), 
    new DateTime('2011-01-03'), 
    new DateTime('2011-01-06'), 
    new DateTime('2011-01-07'), 
    new DateTime('2011-01-10'),
);

// process the array

$lastDate = null;
$ranges = array();
$currentRange = array();

foreach ($dates as $date) {    

    if (null === $lastDate) {
        $currentRange[] = $date;
    } else {

        // get the DateInterval object
        $interval = $date->diff($lastDate);

        // DateInterval::$days holds the total number of days
        // between the two dates (unlike $d, which is only the
        // day component left after years and months), so a diff
        // of a month and a day, or a year and a day, cannot
        // produce a false match here.

        if ($interval->days === 1) {
            // add this date to the current range
            $currentRange[] = $date;    
        } else {
            // store the old range and start anew
            $ranges[] = $currentRange;
            $currentRange = array($date);
        }
    }

    // end of iteration... 
    // this date is now the last date     
    $lastDate = $date;
}

// messy... 
$ranges[] = $currentRange;

// print dates

foreach ($ranges as $range) {

    // there'll always be one array element, so 
    // shift that off and create a string from the date object 
    $startDate = array_shift($range);
    $str = sprintf('%s', $startDate->format('D j M'));

    // if there are still elements in $range
    // then this is a range. pop off the last 
    // element, do the same as above and concatenate
    if (count($range)) {
        $endDate = array_pop($range);
        $str .= sprintf(' to %s', $endDate->format('D j M'));
    }

    echo "<p>$str</p>";
}

Outputs:

Thu 30 Dec
Sat 1 Jan to Mon 3 Jan
Thu 6 Jan to Fri 7 Jan
Mon 10 Jan
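
For comparison, here is a minimal sketch of the same idea in Python, assuming the input is a chronologically sorted list of datetime.date objects. The function names group_into_ranges and format_range are made up for this illustration.

```python
from datetime import date, timedelta

def group_into_ranges(dates):
    """Group chronologically sorted dates into (start, end) runs of consecutive days."""
    ranges = []
    for d in dates:
        # Extend the current run when d is exactly one day after its end.
        if ranges and d - ranges[-1][1] == timedelta(days=1):
            ranges[-1] = (ranges[-1][0], d)
        else:
            ranges.append((d, d))
    return ranges

def format_range(start, end):
    """Render a run as 'Sat 1 Jan to Mon 3 Jan', or as a single day when start == end."""
    def fmt(d):
        return f"{d:%a} {d.day} {d:%b}"
    return fmt(start) if start == end else f"{fmt(start)} to {fmt(end)}"

dates = [date(2010, 12, 30), date(2011, 1, 1), date(2011, 1, 2), date(2011, 1, 3),
         date(2011, 1, 6), date(2011, 1, 7), date(2011, 1, 10)]
for start, end in group_into_ranges(dates):
    print(format_range(start, end))
```

Subtracting two date objects yields a timedelta holding the total number of days, so a gap of a month and a day is never mistaken for one day. The day and month names come from strftime and therefore depend on the active locale.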

Find non consecutive date ranges

You can detect gaps with LAG() and mark them. Then, it's easy to filter out the rows. For example:

select *
from (
  select *,
    case when dateadd(day, -1, start_date) >
       lag(end_date) over(partition by client_id order by start_date) 
    then 1 else 0 end as i
  from t
) x
where i = 1

Or simpler...

select *
from (
  select *,
    lag(end_date) over(partition by client_id order by start_date) as prev_end
  from t
) x
where dateadd(day, -1, start_date) > prev_end
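
To make the LAG() logic concrete, here is a rough Python sketch of the same comparison. It assumes each row is a (client_id, start_date, end_date) tuple; that layout, the function name rows_after_gap and the sample rows are invented for the example.

```python
from datetime import date, timedelta
from itertools import groupby
from operator import itemgetter

def rows_after_gap(rows):
    """Return rows whose start_date is more than one day after the previous
    end_date of the same client. The first row of each client has no
    predecessor and is never flagged, just as a NULL LAG() fails the filter."""
    flagged = []
    # partition by client_id, order by start_date
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, client_rows in groupby(ordered, key=itemgetter(0)):
        prev_end = None  # plays the role of LAG(end_date)
        for row in client_rows:
            _, start, end = row
            if prev_end is not None and start - timedelta(days=1) > prev_end:
                flagged.append(row)
            prev_end = end
    return flagged

# Hypothetical sample data
rows = [
    (1, date(2021, 1, 1), date(2021, 1, 5)),
    (1, date(2021, 1, 6), date(2021, 1, 9)),    # adjacent to the previous range
    (1, date(2021, 1, 12), date(2021, 1, 15)),  # preceded by a two-day gap
    (2, date(2021, 1, 3), date(2021, 1, 4)),
]
print(rows_after_gap(rows))
```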

SQL Counting Consecutive Days in Date Ranges

This is a gaps-and-islands problem. One option is to use lag() and a window sum() to build groups of adjacent records. You can then aggregate by group and count the number of consecutive days, and finally filter on the greatest streak by name:

select name, max(consecutive_days) consecutive_days
from (
    select name, datediff(day, min(start_date), max(end_date)) + 1 consecutive_days
    from (
        select t.*, 
            sum(case when start_date = dateadd(day, 1, lag_end_date) then 0 else 1 end) over(partition by name order by start_date) grp
        from (
            select t.*, 
                lag(end_date) over(partition by name order by start_date) lag_end_date
            from mytable t
        ) t
    ) t
    group by name, grp
) t
group by name

Demo on DB Fiddle:


name   | consecutive_days
:----- | ---------------:
Jenny  |                3
Johnny |                9
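
The grouping step can be hard to follow in nested SQL, so here is a small Python sketch of the same gaps-and-islands idea. It assumes rows are (name, start_date, end_date) tuples with non-overlapping ranges; the function name and the sample rows are made up and are not the data behind the fiddle.

```python
from collections import defaultdict
from datetime import date, timedelta

def longest_streaks(rows):
    """Map each name to the length in days of its longest island of adjacent ranges."""
    by_name = defaultdict(list)
    for name, start, end in rows:
        by_name[name].append((start, end))

    result = {}
    for name, ranges in by_name.items():
        ranges.sort()  # order by start_date
        best = 0
        island_start, island_end = ranges[0]
        for start, end in ranges[1:]:
            if start == island_end + timedelta(days=1):
                island_end = end  # adjacent: extend the current island
            else:
                best = max(best, (island_end - island_start).days + 1)
                island_start, island_end = start, end  # gap: start a new island
        result[name] = max(best, (island_end - island_start).days + 1)
    return result

# Hypothetical sample data
rows = [
    ("Jenny", date(2021, 1, 1), date(2021, 1, 3)),
    ("Jenny", date(2021, 1, 10), date(2021, 1, 11)),
    ("Johnny", date(2021, 1, 1), date(2021, 1, 4)),
    ("Johnny", date(2021, 1, 5), date(2021, 1, 9)),
]
print(longest_streaks(rows))
```

As in the query, the length of an island is the difference between its first start and last end plus one day.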

Group consecutive dates in R

base R

input <- input[order(input$DATE),]
input$grp <- ave(as.integer(input$DATE), input[-4], FUN = function(z) cumsum(c(TRUE, diff(z) > 1)))
input
#    FID    PID SETTING       DATE grp
# 1    1 100001      ST 2021-01-01   1
# 2    1 100001      ST 2021-01-02   1
# 7    2 200001      AB 2021-01-02   1
# 3    1 100001      ST 2021-01-03   1
# 8    2 200001      AB 2021-01-03   1
# 4    1 100002      AB 2021-01-04   1
# 9    2 200001      AB 2021-01-04   1
# 10   2 200002      TK 2021-01-05   1
# 5    1 100001      ST 2021-01-11   2
# 6    1 100001      ST 2021-01-12   2

out <- aggregate(DATE ~ FID + PID + SETTING + grp, data = input,
                 FUN = function(z) setNames(range(z), c("START","END")))
out <- do.call(data.frame, out)
out[,5:6] <- lapply(out[,5:6], as.Date, origin = "1970-01-01")
out
#   FID    PID SETTING grp DATE.START   DATE.END
# 1   1 100002      AB   1 2021-01-04 2021-01-04
# 2   2 200001      AB   1 2021-01-02 2021-01-04
# 3   1 100001      ST   1 2021-01-01 2021-01-03
# 4   2 200002      TK   1 2021-01-05 2021-01-05
# 5   1 100001      ST   2 2021-01-11 2021-01-12

Walk-through:

the ease of cumsum and diff is accomplished assuming that the dates are always ordered; it is not important (here) that the other grouping variables may be misordered;
ave(..) assigns groups of non-consecutive (diff over 1) dates, which we use in the next step;
aggregate calculates the range within each group, using your three variables plus our new grp grouping variable; each z in the anonymous function is a contiguous vector of dates, so range gives us the start/end dates;
unfortunately, aggregate is assigning a matrix as the fifth column instead of two separate columns, so do.call(data.frame, out) fixes that;
unfortunately, most base R aggregating functions tend to strip the Date (and POSIXt) class from the vectors, so we need to use as.Date to heal that.
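
The cumsum(c(TRUE, diff(z) > 1)) idiom used throughout can be sketched in Python as follows, for a single group of dates that is already sorted. The helper name run_ids is invented for this illustration.

```python
from datetime import date
from itertools import accumulate

def run_ids(sorted_dates):
    """Assign a 1-based run id to each date; the id increments after every gap."""
    if not sorted_dates:
        return []
    ordinals = [d.toordinal() for d in sorted_dates]
    # True marks the start of a new run: the first element,
    # or a jump of more than one day from the previous date.
    starts = [True] + [b - a > 1 for a, b in zip(ordinals, ordinals[1:])]
    # A running sum of the flags gives the run id, like cumsum() in R.
    return list(accumulate(int(s) for s in starts))

# The dates of FID 1, PID 100001, SETTING ST from the example above
dates = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 3),
         date(2021, 1, 11), date(2021, 1, 12)]
print(run_ids(dates))
```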

dplyr

library(dplyr)
input %>%
  arrange(DATE) %>%
  group_by(FID, PID, SETTING) %>%
  mutate(grp = cumsum(c(TRUE, diff(DATE) > 1))) %>%
  group_by(FID, PID, SETTING, grp) %>%
  summarize(START = min(DATE), END = max(DATE)) %>%
  ungroup()
# # A tibble: 5 x 6
#     FID    PID SETTING   grp START      END       
#   <int>  <int> <chr>   <int> <date>     <date>    
# 1     1 100001 " ST"       1 2021-01-01 2021-01-03
# 2     1 100001 " ST"       2 2021-01-11 2021-01-12
# 3     1 100002 " AB"       1 2021-01-04 2021-01-04
# 4     2 200001 " AB"       1 2021-01-02 2021-01-04
# 5     2 200002 " TK"       1 2021-01-05 2021-01-05

data.table

library(data.table)
inputDT <- as.data.table(input)
setorder(inputDT, DATE)
inputDT[, grp := cumsum(c(TRUE, diff(DATE) > 1)), by = .(FID, PID, SETTING)
  ][, .(START = min(DATE), END = max(DATE)), by = .(FID, PID, SETTING, grp)
  ][]
#      FID    PID SETTING   grp      START        END
#    <int>  <int>  <char> <int>     <Date>     <Date>
# 1:     1 100001      ST     1 2021-01-01 2021-01-03
# 2:     2 200001      AB     1 2021-01-02 2021-01-04
# 3:     1 100002      AB     1 2021-01-04 2021-01-04
# 4:     2 200002      TK     1 2021-01-05 2021-01-05
# 5:     1 100001      ST     2 2021-01-11 2021-01-12

Grouping consecutive dates in an array together in PHP

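This piece of code groups consecutive dates together and tolerates daylight saving time by using a threshold of 90000 seconds (25 hours) between timestamps rather than exactly 24 hours.

Array of Unix timestamps: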

$dates = array
(
strtotime('2012-10-01'),

strtotime('2012-10-03'),
strtotime('2012-10-04'),
strtotime('2012-10-05'),
strtotime('2012-10-06'),
strtotime('2012-10-07'),

strtotime('2012-10-10'),
strtotime('2012-10-11'),
strtotime('2012-10-12'),
strtotime('2012-10-13'),
strtotime('2012-10-14'),
strtotime('2012-10-15'),
strtotime('2012-10-16'),
strtotime('2012-10-17'),
strtotime('2012-10-18'),
strtotime('2012-10-19'),
strtotime('2012-10-20'),

strtotime('2012-10-23'),
strtotime('2012-10-24'),
strtotime('2012-10-25'),
strtotime('2012-10-26'),
strtotime('2012-10-29'),
strtotime('2012-10-30'),
strtotime('2012-10-31'),
strtotime('2012-11-01'),
strtotime('2012-11-02'),

strtotime('2012-11-04')
);

Code:

$conseq = array();
$ii = 0;
$max = count($dates);

for ($i = 0; $i < $max; $i++) {
    $conseq[$ii][] = date('Y-m-d', $dates[$i]);

    if ($i + 1 < $max) {
        $dif = $dates[$i + 1] - $dates[$i];
        // 90000 s = 25 h, the midnight-to-midnight span when the
        // clocks go back; only a larger gap starts a new group
        if ($dif > 90000) {
            $ii++;
        }
    }
}

Outputs:

array
  0 => 
    array
      0 => string '2012-10-01' (length=10)
  1 => 
    array
      0 => string '2012-10-03' (length=10)
      1 => string '2012-10-04' (length=10)
      2 => string '2012-10-05' (length=10)
      3 => string '2012-10-06' (length=10)
      4 => string '2012-10-07' (length=10)
  2 => 
    array
      0 => string '2012-10-10' (length=10)
      1 => string '2012-10-11' (length=10)
      2 => string '2012-10-12' (length=10)
      3 => string '2012-10-13' (length=10)
      4 => string '2012-10-14' (length=10)
      5 => string '2012-10-15' (length=10)
      6 => string '2012-10-16' (length=10)
      7 => string '2012-10-17' (length=10)
      8 => string '2012-10-18' (length=10)
      9 => string '2012-10-19' (length=10)
      10 => string '2012-10-20' (length=10)
  3 => 
    array
      0 => string '2012-10-23' (length=10)
      1 => string '2012-10-24' (length=10)
      2 => string '2012-10-25' (length=10)
      3 => string '2012-10-26' (length=10)
  4 => 
    array
      0 => string '2012-10-29' (length=10)
      1 => string '2012-10-30' (length=10)
      2 => string '2012-10-31' (length=10)
      3 => string '2012-11-01' (length=10)
      4 => string '2012-11-02' (length=10)
  5 => 
    array
      0 => string '2012-11-04' (length=10)
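
A rough Python sketch of the same threshold approach, assuming a sorted list of Unix timestamps taken at midnight. The names group_timestamps and max_gap are made up for the example, and this version treats a gap of exactly 25 hours as still consecutive.

```python
from datetime import datetime, timezone

def group_timestamps(timestamps, max_gap=90000):
    """Group sorted Unix timestamps; a gap larger than max_gap seconds starts a new group.
    90000 s is 25 hours, the midnight-to-midnight span on a day when clocks go back."""
    groups = []
    for i, ts in enumerate(timestamps):
        if i == 0 or ts - timestamps[i - 1] > max_gap:
            groups.append([])
        groups[-1].append(ts)
    return groups

# Hypothetical sample: three days, the last one 25 hours after the second,
# followed by a timestamp two days later.
t0 = int(datetime(2012, 10, 1, tzinfo=timezone.utc).timestamp())
stamps = [t0, t0 + 86400, t0 + 86400 + 90000, t0 + 86400 + 90000 + 2 * 86400]
print([len(g) for g in group_timestamps(stamps)])
```

If the time of day does not matter, comparing calendar dates instead of timestamps avoids the threshold altogether.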

Check if dates are continuous with PHP

Using the DateTime::diff function, you can get the difference in days. Then loop over the dates, and if the difference isn't one day, it's not continuous.

$dates = "2017-01-08, 2017-01-09, 2017-01-10";
var_dump(check_continuous_dates($dates)); // true

$dates = "2017-01-08, 2017-01-12, 2017-01-13";
var_dump(check_continuous_dates($dates)); // false

function check_continuous_dates($date) {
    $date = explode(", ", $date); // Convert it to an array we can loop
    $previous = new DateTime($date[0]); // Set the "previous" value
    unset($date[0]); // Unset the value we just set to $previous, so we don't loop it twice

    foreach ($date as $v) { // Loop the array
        $current = new DateTime($v);
        $diff = $current->diff($previous);

        // If the difference is exactly 1 day, it's continuous 
        if ($diff->days == 1) {
            $previous =  new DateTime($v);
        } else {
            return false;
        }
    }
    return true;
}
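
The same check can be sketched in Python. This is an illustration only; it assumes ISO-formatted dates separated by ", " and the function name is_continuous is hypothetical.

```python
from datetime import date, timedelta

def is_continuous(date_string, sep=", "):
    """Return True when every date is exactly one day after the previous one."""
    days = [date.fromisoformat(s) for s in date_string.split(sep)]
    one_day = timedelta(days=1)
    # Compare each date with its predecessor.
    return all(b - a == one_day for a, b in zip(days, days[1:]))

print(is_continuous("2017-01-08, 2017-01-09, 2017-01-10"))
print(is_continuous("2017-01-08, 2017-01-12, 2017-01-13"))
```

Unlike a comparison on the absolute day count, this version also rejects dates listed in descending order.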

References

http://php.net/manual/en/datetime.diff.php

How to detect if dates are consecutive in Python?

Rather than rolling your own consecutive function, you can simply convert the dates to integers using the .toordinal() method of datetime objects. If the dates are consecutive, the difference between the maximum and minimum values of the set of ordinal dates is one less than the size of the set:

from datetime import datetime

date_strs = ['07-06-2010', '06-24-2010', '07-05-2010', '06-25-2010']
# date_strs = ['02-29-2012', '02-28-2012', '03-01-2012']
# date_strs = ['01-01-2000']
dates = [datetime.strptime(d, "%m-%d-%Y") for d in date_strs]

date_ints = {d.toordinal() for d in dates}

if len(date_ints) == 1:
    print("unique")
elif max(date_ints) - min(date_ints) == len(date_ints) - 1:
    print("consecutive")
else:
    print("not consecutive")

R: Converting consecutive dates from a single column into a 2-column range

Here I am checking whether the difference with the previous date (diff(Date)) is not 1. If so, start a new group (taking the cumsum of this indicator means g will increase by 1 whenever it's TRUE).

library(data.table)
setDT(df)

df[, Date := as.Date(Date, format = '%m/%d/%Y')]

df[, .(start = min(Date), end = max(Date)),
   by = .(Person_ID, Department, g = cumsum(c(0, diff(Date)) != 1))]

#    Person_ID Department g      start        end
# 1:    351581         GH 1 2019-12-01 2019-12-03
# 2:    351581         FR 2 2019-12-02 2019-12-02
# 3:    598168         GH 3 2019-12-16 2019-12-16
# 4:    351581         JE 4 2019-12-08 2019-12-09
# 5:    615418         AB 5 2019-12-20 2019-12-20
# 6:    615418         AB 6 2019-12-22 2019-12-22

If your data is not already ordered by date within (Person_ID, Department) groups, you can add order(Date) to the i part of df[i, j, by], i.e. change the code above to

df[order(Date), .(start = min(Date), end = max(Date)),
   by = .(Person_ID, Department, g = cumsum(c(0, diff(Date)) != 1))]

Note that for this updated example, this is not the same as grouping by Person_ID and Department alone:

df[, .(start = min(Date), end = max(Date)),
   by = .(Person_ID, Department)]

#    Person_ID Department      start        end
# 1:    351581         GH 2019-12-01 2019-12-03
# 2:    351581         FR 2019-12-02 2019-12-02
# 3:    598168         GH 2019-12-16 2019-12-16
# 4:    351581         JE 2019-12-08 2019-12-09
# 5:    615418         AB 2019-12-20 2019-12-22

Data used:

df <- fread('
   Person_ID   Department   Date     
   351581      GH           12/1/2019
   351581      GH           12/2/2019
   351581      GH           12/3/2019
   351581      FR           12/2/2019
   598168      GH           12/16/2019
   351581      JE           12/8/2019
   351581      JE           12/9/2019
   615418      AB           12/20/2019
   615418      AB           12/22/2019
')

Find group of consecutive dates in Pandas DataFrame

It seems like you need two boolean masks: one to determine the breaks between groups, and one to determine which dates are in a group in the first place.

There's also one tricky part that can be fleshed out by example. Notice that df below contains an added row that doesn't have any consecutive dates before or after it.

>>> df
  DateAnalyzed       Val
1   2018-03-18  0.470253
2   2018-03-19  0.470253
3   2018-03-20  0.470253
4   2017-01-20  0.485949  # < watch out for this
5   2018-09-25  0.467729
6   2018-09-26  0.467729
7   2018-09-27  0.467729

>>> df.dtypes
DateAnalyzed    datetime64[ns]
Val                    float64
dtype: object

The answer below assumes that you want to ignore 2017-01-20 completely, without processing it. (See end of answer for a solution if you do want to process this date.)

First:

>>> dt = df['DateAnalyzed']
>>> day = pd.Timedelta('1d')
>>> in_block = ((dt - dt.shift(-1)).abs() == day) | (dt.diff() == day)
>>> in_block
1     True
2     True
3     True
4    False
5     True
6     True
7     True
Name: DateAnalyzed, dtype: bool

Now, in_block will tell you which dates are in a "consecutive" block, but it won't tell you to which groups each date belongs.

The next step is to derive the groupings themselves:

>>> filt = df.loc[in_block]
>>> breaks = filt['DateAnalyzed'].diff() != day
>>> groups = breaks.cumsum()
>>> groups
1    1
2    1
3    1
5    2
6    2
7    2
Name: DateAnalyzed, dtype: int64

Then you can call filt.groupby(groups) with your operation of choice.

>>> for _, frame in filt.groupby(groups):
...     print(frame, end='\n\n')
... 
  DateAnalyzed       Val
1   2018-03-18  0.470253
2   2018-03-19  0.470253
3   2018-03-20  0.470253

  DateAnalyzed       Val
5   2018-09-25  0.467729
6   2018-09-26  0.467729
7   2018-09-27  0.467729

To incorporate this back into df, assign to it and the isolated dates will be NaN:

>>> df['groups'] = groups
>>> df
  DateAnalyzed       Val  groups
1   2018-03-18  0.470253     1.0
2   2018-03-19  0.470253     1.0
3   2018-03-20  0.470253     1.0
4   2017-01-20  0.485949     NaN
5   2018-09-25  0.467729     2.0
6   2018-09-26  0.467729     2.0
7   2018-09-27  0.467729     2.0

If you do want to include the "lone" date, things become a bit more straightforward:

dt = df['DateAnalyzed']
day = pd.Timedelta('1d')
breaks = dt.diff() != day
groups = breaks.cumsum()

Detect consecutive dates ranges using SQL

No joins or recursive CTEs are needed. The standard gaps-and-islands solution is to group by (value minus row_number), since that difference is invariant within a consecutive sequence. The start and end dates are just the MIN() and MAX() of each group.

WITH t AS (
  SELECT InfoDate d,ROW_NUMBER() OVER(ORDER BY InfoDate) i
  FROM @d
  GROUP BY InfoDate
)
SELECT MIN(d),MAX(d)
FROM t
GROUP BY DATEDIFF(day,i,d)
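
To see why (value minus row_number) works, here is a small Python sketch of the same trick, using the date's ordinal as the value and the position in the sorted list as the row number. The function name date_islands and the sample dates are made up for the illustration.

```python
from datetime import date
from itertools import groupby

def date_islands(dates):
    """Return (start, end) for each run of consecutive days among the given dates."""
    unique_sorted = sorted(set(dates))  # deduplicate and order, like GROUP BY + ORDER BY
    islands = []
    # ordinal - position stays constant within a run of consecutive days
    # and changes at every gap, so it can serve as the grouping key.
    keyed = groupby(enumerate(unique_sorted), key=lambda p: p[1].toordinal() - p[0])
    for _, run in keyed:
        run_dates = [d for _, d in run]
        islands.append((run_dates[0], run_dates[-1]))  # MIN and MAX of the group
    return islands

# Hypothetical sample data, including a duplicate
sample = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 2), date(2021, 1, 3),
          date(2021, 1, 5), date(2021, 1, 8), date(2021, 1, 9)]
print(date_islands(sample))
```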
