SQL: Finding Longest Date Gap

SQL: finding longest date gap

Here is a database-agnostic approach, something of a variant of richardtallent's answer but without its restrictions. (I'm using SQL Server 2008 here, but it shouldn't matter.)

Starting with this setup:

create table test(id int, userid int, time datetime)
insert into test values (1, 1, '2009-03-11 08:00')
insert into test values (2, 1, '2009-03-11 18:00')
insert into test values (3, 1, '2009-03-13 19:00')
insert into test values (4, 1, '2009-03-14 18:00')

Running this query:

select 
starttime.id as gapid, starttime.time as starttime, endtime.time as endtime,
/* Replace next line with your DB's way of calculating the gap */
DATEDIFF(second, starttime.time, endtime.time) as gap
from
test as starttime
inner join test as endtime on
(starttime.userid = endtime.userid)
and (starttime.time < endtime.time)
left join test as intermediatetime on
(starttime.userid = intermediatetime.userid)
and (starttime.time < intermediatetime.time)
and (intermediatetime.time < endtime.time)
where
(intermediatetime.id is null)

Gives the following:

gapid  starttime                endtime                  gap
1      2009-03-11 08:00:00.000  2009-03-11 18:00:00.000  36000
2      2009-03-11 18:00:00.000  2009-03-13 19:00:00.000  176400
3      2009-03-13 19:00:00.000  2009-03-14 18:00:00.000  82800

You can then just ORDER BY the gap expression descending, and pick the top result.

Some explanation:

  • Like richardtallent's answer, you join the table onto itself to find a 'later' record – this basically pairs all records with ANY of their later records, here pairing {1+2, 1+3, 1+4, 2+3, 2+4, 3+4}.
  • Then there's another self-join, this time a left join, to find rows in between the two previously selected so {1+2+null, 1+3+2, 1+4+2, 1+4+3, 2+3+null, 2+4+3, 3+4+null}.
  • The WHERE clause, though, filters these out (keeps only the rows with no intermediate row), hence keeping only {1+2+null, 2+3+null, 3+4+null}. Taa-daa!
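
For anyone who wants to try this end to end, here is a minimal sketch that runs the same self-join against an in-memory SQLite database from Python and picks the longest gap. It assumes SQLite rather than SQL Server, so the gap is computed with strftime('%s', ...) instead of DATEDIFF, and the output column names gap_start and gap_end are my own.

```python
import sqlite3

# In-memory SQLite database; the schema and rows mirror the setup above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table test(id int, userid int, time text);
    insert into test values (1, 1, '2009-03-11 08:00');
    insert into test values (2, 1, '2009-03-11 18:00');
    insert into test values (3, 1, '2009-03-13 19:00');
    insert into test values (4, 1, '2009-03-14 18:00');
""")

# Same self-join plus left anti-join as above. SQLite has no DATEDIFF,
# so the gap in seconds comes from strftime('%s', ...) (assumption of
# this sketch, not part of the original query).
LONGEST_GAP_SQL = """
    select starttime.id   as gapid,
           starttime.time as gap_start,
           endtime.time   as gap_end,
           cast(strftime('%s', endtime.time) as integer)
             - cast(strftime('%s', starttime.time) as integer) as gap
    from test as starttime
    inner join test as endtime
        on starttime.userid = endtime.userid
       and starttime.time < endtime.time
    left join test as intermediatetime
        on starttime.userid = intermediatetime.userid
       and starttime.time < intermediatetime.time
       and intermediatetime.time < endtime.time
    where intermediatetime.id is null
    order by gap desc
    limit 1
"""

longest_gap = conn.execute(LONGEST_GAP_SQL).fetchone()
print(longest_gap)
```

The ORDER BY ... LIMIT 1 at the end is the "order by the gap descending and pick the top result" step; on SQL Server you would use TOP (1) instead.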

If the same time could appear twice (a 'gap' of 0), you'll need a way to break ties, as Dems points out. If you can use ID as a tie-breaker, then change each time comparison, e.g.

and (starttime.time < intermediatetime.time) 

to

and ((starttime.time < intermediatetime.time) 
or ((starttime.time = intermediatetime.time) and (starttime.id < intermediatetime.id)))

assuming that 'id' is a valid way to break ties.
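
To see what that tie-break does, here is a small Python sketch, assuming hypothetical rows where ids 2 and 3 share a timestamp. The is_before helper is just my name for the modified SQL condition; sorting by (time, id) gives the same ordering, so every row still gets exactly one successor.

```python
from datetime import datetime

# Hypothetical rows (id, time); ids 2 and 3 share the same timestamp.
rows = [
    (1, datetime(2009, 3, 11, 8, 0)),
    (3, datetime(2009, 3, 11, 18, 0)),
    (2, datetime(2009, 3, 11, 18, 0)),
    (4, datetime(2009, 3, 13, 19, 0)),
]

def is_before(a, b):
    """Mirror of the SQL condition: earlier time, or same time and lower id."""
    (a_id, a_time), (b_id, b_time) = a, b
    return a_time < b_time or (a_time == b_time and a_id < b_id)

# Sorting by (time, id) yields the same total order as is_before.
ordered = sorted(rows, key=lambda r: (r[1], r[0]))

# Each row pairs with exactly one successor, including the zero-length gap.
gaps = [
    (a[0], b[0], (b[1] - a[1]).total_seconds())
    for a, b in zip(ordered, ordered[1:])
]
print(gaps)
```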

In fact, if you know that ID will be monotonically increasing (I know you said 'not sequential,' but it's not clear if this means that they don't increase with each row, or just that the IDs of the two relevant entries may not be sequential because e.g. another user has entries in between), you can use ID instead of time in all the comparisons to make this even simpler.

How do I find the longest time period without gaps in SQL?

I'll assume you have access to CTEs and ROW_NUMBER().

First, you need an ordered list of the dates. That is, not two columns, but one. Then you can compare one date to the immediate next date quite simply.

As you have the data in two columns, creating this one ordered list will be relatively expensive. I hope for your sake that you don't have a huge volume of data.

WITH
all_dates
AS
(
SELECT EnrolmentDate AS event_date FROM yourTable GROUP BY EnrolmentDate
UNION
SELECT CompletionDate AS event_date FROM yourTable GROUP BY CompletionDate
)
,
sequenced_dates
AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY event_date) AS id,
event_date
FROM
all_dates
)
SELECT
MAX(DATEDIFF(DAY, first_event.event_date, second_event.event_date)) AS duration
FROM
sequenced_dates AS first_event
INNER JOIN
sequenced_dates AS second_event
ON first_event.id = second_event.id - 1

MySQL query to find the longest sequence of value based on date

You can assign a group by counting the number of 0 values cumulatively. Then just aggregate to see all the groups:

select min(timestamp), max(timestamp)
from (select t.*,
sum(value = 0) over (order by timestamp) as grp
from t
) t
where value = 1
group by grp;

To calculate the difference and take the longest period:

select min(timestamp), max(timestamp),
sec_to_time(to_seconds(max(timestamp)) - to_seconds(min(timestamp)))
from (select t.*,
sum(value = 0) over (order by timestamp) as grp
from t
) t
where value = 1
group by grp
order by to_seconds(max(timestamp)) - to_seconds(min(timestamp)) desc
limit 1;
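
The grouping trick is easier to follow on a tiny example. This is a rough Python sketch with made-up (timestamp, value) rows: the running count of zeros plays the role of grp, and each run of 1s ends up with its own grp value.

```python
from datetime import datetime, timedelta
from itertools import accumulate, groupby

# Hypothetical (timestamp, value) rows standing in for table t, in time order.
start = datetime(2021, 1, 1)
values = [1, 1, 0, 1, 1, 1, 0, 0, 1]
t = [(start + timedelta(minutes=i), v) for i, v in enumerate(values)]

# grp = running count of zeros, like sum(value = 0) over (order by timestamp).
grp = list(accumulate(1 if v == 0 else 0 for _, v in t))

# Keep only the value = 1 rows, then aggregate min/max timestamp per grp.
ones = [(g, ts) for g, (ts, v) in zip(grp, t) if v == 1]
runs = []
for _, rows in groupby(ones, key=lambda x: x[0]):
    stamps = [ts for _, ts in rows]
    runs.append((min(stamps), max(stamps)))

# The longest run is the one with the largest max - min difference.
longest = max(runs, key=lambda r: r[1] - r[0])
print(longest)
```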

find longest duration between times

You can use the analytic functions LAG or LEAD to do this. By performing a DATEDIFF calculation on either the previous or next row (ordering by the time) we can determine the duration between each ordered pair. The only thing left to do is take the MAX.

with Data as (
select a.TheTime
, DateDiff(minute, Lag(a.TheTime, 1, a.TheTime) over(order by a.TheTime asc), a.TheTime) as Duration
from (values
(Convert(time(0), N'00:01:45'))
, (Convert(time(0), N'00:01:55'))
, (Convert(time(0), N'00:02:25'))
, (Convert(time(0), N'00:05:33'))
, (Convert(time(0), N'00:10:45'))
, (Convert(time(0), N'00:11:01'))
, (Convert(time(0), N'00:13:45'))
) as a (TheTime)
)
select Max(a.Duration) as MaxDuration
from Data as a;
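
As a cross-check, here is the same previous-row comparison sketched in Python on the seven times above, measured in elapsed seconds. Keep in mind that SQL Server's DATEDIFF counts datepart boundaries crossed rather than elapsed time, so the query's minute figure is not simply these seconds divided by 60.

```python
from datetime import datetime

# The same seven times as in the VALUES list above.
times = ["00:01:45", "00:01:55", "00:02:25", "00:05:33",
         "00:10:45", "00:11:01", "00:13:45"]
parsed = sorted(datetime.strptime(t, "%H:%M:%S") for t in times)

# LAG analogue: pair each row with the previous one in time order.
durations = [
    (cur - prev).total_seconds()
    for prev, cur in zip(parsed, parsed[1:])
]
max_duration_seconds = max(durations)
print(max_duration_seconds)
```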

How to calculate the longest and latest winning streak without the date gap

You can identify a streak by subtracting an incrementing number from the date. When this is constant, then you have a streak:

select min(date) as winning_streak_start, count(*) as num_days, sum(value) as total
from (select t.*, dateadd(day, -row_number() over (order by date), date) as grp
from t
where result = 'W'
) t
group by grp;

You can then use this to get the longest and the most recent:

with streaks as (
select min(date) as winning_streak_start, count(*) as streak, sum(value) as total
from (select t.*, dateadd(day, -row_number() over (order by date), date) as grp
from t
where result = 'W'
) t
group by grp
)
select s.*
from ((select top (1) streak, total, 'Longest' as RecordType
from streaks
order by streak desc
) union all
(select top (1) streak, total, 'Latest'
from streaks
order by winning_streak_start desc
)
) s;
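
Here is a small Python sketch of why subtracting the row number works, using made-up (date, result) rows. Consecutive winning days all map to the same date-minus-row-number value, so grouping on it yields the streaks.

```python
from datetime import date, timedelta
from itertools import groupby

# Hypothetical (date, result) rows standing in for table t.
t = [
    (date(2022, 1, 1), "W"), (date(2022, 1, 2), "W"), (date(2022, 1, 3), "L"),
    (date(2022, 1, 4), "W"), (date(2022, 1, 5), "W"), (date(2022, 1, 6), "W"),
    (date(2022, 1, 8), "W"),
]

wins = sorted(d for d, result in t if result == "W")

# grp = date minus row number; consecutive days share the same grp value.
keyed = [(d - timedelta(days=n), d) for n, d in enumerate(wins, start=1)]

streaks = []
for _, rows in groupby(keyed, key=lambda kd: kd[0]):
    days = [d for _, d in rows]
    streaks.append({"start": days[0], "streak": len(days)})

longest = max(streaks, key=lambda s: s["streak"])  # like order by streak desc
latest = max(streaks, key=lambda s: s["start"])    # like order by start desc
print(longest, latest)
```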

SQL query to find gaps within a column of dates

This is a type of gaps-and-islands problem. In this case, subtracting a sequential number from each day is probably the simplest solution for identifying the "islands":

select user, status, count(*) as num_days, min(date), max(date)
from (select t.*,
row_number() over (partition by user, status order by date) as seqnum
from t
) t
group by user, status, date - seqnum * interval '1 day'
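
The partitioned version behaves the same way, only with one counter per (user, status) pair. A rough Python sketch with made-up rows (the names seqnum, islands and summary are mine):

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical (user, status, date) rows standing in for table t.
t = [
    ("ann", "active", date(2023, 5, 1)),
    ("ann", "active", date(2023, 5, 2)),
    ("ann", "idle",   date(2023, 5, 3)),
    ("ann", "active", date(2023, 5, 4)),
    ("bob", "active", date(2023, 5, 1)),
]

seqnum = defaultdict(int)     # per-(user, status) row counter
islands = defaultdict(list)   # (user, status, date - seqnum days) -> dates

for user, status, day in sorted(t):
    seqnum[(user, status)] += 1
    key = (user, status, day - timedelta(days=seqnum[(user, status)]))
    islands[key].append(day)

# One output row per island: user, status, num_days, min(date), max(date).
summary = sorted(
    (user, status, len(days), min(days), max(days))
    for (user, status, _), days in islands.items()
)
print(summary)
```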

MySQL get longest gap without activity

Either of the following will work (I prefer the latter):

SELECT 
MAX(DATEDIFF(
(SELECT MIN(s2.dateCompletion)
FROM staff s2
WHERE s2.dateCompletion >= s.dateCompletion AND s2.id != s.id)
, dateCompletion))
from staff s;

In the above example, for each record, you find the next completed project, do a datediff, and then take the max.

In the example below, I use joins to do the same thing. If your dataset is really big, you might be better off creating a temporary table and getting rid of the derived table.

SELECT
MAX(DATEDIFF(s2.dateCompletion, s.dateCompletion))
FROM staff s
JOIN staff s2 ON s2.dateCompletion = (SELECT MIN(s3.dateCompletion)
FROM staff s3
WHERE s3.dateCompletion >= s.dateCompletion
AND s3.id != s.id)

Also, as you're measuring the maximum period of inactivity, you may want to include the date difference between MAX(dateCompletion) and CURDATE() as well. If so, use the following:

SELECT
MAX(DATEDIFF(COALESCE(s2.dateCompletion, CURDATE()), s.dateCompletion))
FROM staff s
LEFT JOIN staff s2 ON s2.dateCompletion = (SELECT MIN(s3.dateCompletion)
FROM staff s3
WHERE s3.dateCompletion >= s.dateCompletion
AND s3.id != s.id)
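
For reference, the "find the next completion, diff, take the max" logic can be sketched in Python as below. The rows are made up and longest_inactivity is a hypothetical helper; passing today corresponds to the COALESCE(..., CURDATE()) variant, where the open-ended gap after the last completion is measured up to the current date.

```python
from datetime import date

# Hypothetical (id, dateCompletion) rows standing in for the staff table.
staff = [
    (1, date(2021, 1, 5)),
    (2, date(2021, 1, 20)),
    (3, date(2021, 3, 1)),
]

def longest_inactivity(rows, today=None):
    """Longest gap in days between a completion and the next one.

    If `today` is given, the gap after the last completion is measured
    up to `today`; otherwise that open-ended gap is ignored.
    """
    gaps = []
    for row_id, completed in rows:
        # Same filter as the subquery: a different row, same or later date.
        later = [d for other_id, d in rows
                 if d >= completed and other_id != row_id]
        next_date = min(later) if later else today
        if next_date is not None:
            gaps.append((next_date - completed).days)
    return max(gaps, default=None)

print(longest_inactivity(staff))
print(longest_inactivity(staff, today=date(2021, 6, 1)))
```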
