Group Consecutive Rows of Same Value Using Time Spans

Group consecutive rows of same value using time spans

The query determines each row's EndTime with a correlated subquery: among all rows with the same CourseName and ClassRoom, it takes the latest EndTime that can be reached without a break. It then uses MIN and GROUP BY to find the earliest StartTime for each resulting EndTime.

The NOT EXISTS part ensures that there are no "breaks" in the merged range. It searches for any row whose EndTime falls between the current row's EndTime (t1) and the candidate EndTime (t2) but which belongs to a different CourseName or ClassRoom; if such a row exists, the candidate is rejected.

SELECT
    t0.ClassRoom,
    t0.CourseName,
    MIN(t0.StartTime) AS StartTime,
    t0.EndTime
FROM (
    SELECT
        t1.ClassRoom,
        t1.CourseName,
        t1.StartTime,
        (
            SELECT MAX(t2.EndTime)
            FROM tableA t2
            WHERE t2.CourseName = t1.CourseName
              AND t2.ClassRoom = t1.ClassRoom
              AND NOT EXISTS (SELECT 1 FROM tableA t3
                              WHERE t3.EndTime < t2.EndTime
                                AND t3.EndTime > t1.EndTime
                                AND (t3.CourseName <> t2.CourseName
                                     OR t3.ClassRoom <> t2.ClassRoom)
                             )
        ) EndTime
    FROM tableA t1
) t0 GROUP BY t0.ClassRoom, t0.CourseName, t0.EndTime

http://www.sqlfiddle.com/#!6/39d4b/9
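
To sanity-check the expected result outside the database, the same grouping can be sketched procedurally. This is not the query above but a swapped-in technique: sort by StartTime and collapse adjacent rows with itertools.groupby. The function name merge_runs and the sample rows are hypothetical, and times are assumed to be "HH:MM" strings so that they sort correctly as text.

```python
from itertools import groupby


def merge_runs(rows):
    """Collapse consecutive rows (ordered by StartTime) that share ClassRoom and CourseName.

    rows: iterable of (ClassRoom, CourseName, StartTime, EndTime) tuples.
    """
    ordered = sorted(rows, key=lambda r: r[2])
    merged = []
    # groupby only merges *adjacent* equal keys, so a different course or room
    # in between starts a new run, like the NOT EXISTS check in the query.
    for (room, course), run in groupby(ordered, key=lambda r: (r[0], r[1])):
        run = list(run)
        merged.append((room, course, run[0][2], max(r[3] for r in run)))
    return merged


# Hypothetical sample data, not taken from the original question.
rows = [
    ("A1", "Math", "08:00", "09:00"),
    ("A1", "Math", "09:00", "10:00"),
    ("A1", "Physics", "10:00", "11:00"),
    ("A1", "Math", "11:00", "12:00"),
]
for row in merge_runs(rows):
    print(row)
```

The two morning Math rows collapse into one 08:00 to 10:00 range, while the later Math row stays separate because Physics sits in between.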

How to group consecutive rows with same values in a result table into groups with date_from and date_until

If you number the rows within each combination of f1, f2, and f3 in date order and then subtract that many days from datum, the result is constant across adjacent days on which the three columns are the same.

The rest is just aggregation:

select f1, f2, f3, min(datum) as date_from, max(datum) as date_until
from (select t1.*,
             row_number() over (partition by f1, f2, f3 order by datum) as seqnum
      from test1 t1
     ) t1
group by f1, f2, f3, datum - seqnum * interval '1 day'
order by min(datum);

Here is a db<>fiddle.
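
To see why the subtraction yields a constant, here is a minimal Python sketch of the same idea. It assumes at most one row per day for each (f1, f2, f3) combination, as the query does; the function name date_islands and the sample rows are made up for illustration.

```python
from datetime import date, timedelta
from itertools import groupby


def date_islands(rows):
    """rows: iterable of (f1, f2, f3, datum) with datum a datetime.date.

    Returns (f1, f2, f3, date_from, date_until) for each run of adjacent days.
    """
    islands = []
    ordered = sorted(rows)  # orders by f1, f2, f3, then datum
    for key, part in groupby(ordered, key=lambda r: r[:3]):
        # seqnum plays the role of row_number() over (partition by f1, f2, f3 order by datum)
        numbered = [(r[3], seqnum) for seqnum, r in enumerate(part, start=1)]
        # datum minus seqnum days stays constant while the days are adjacent
        for _, run in groupby(numbered, key=lambda p: p[0] - timedelta(days=p[1])):
            days = [d for d, _ in run]
            islands.append((*key, days[0], days[-1]))
    return sorted(islands, key=lambda i: i[3])


# Hypothetical sample data.
rows = [
    ("a", 1, "x", date(2020, 1, 1)),
    ("a", 1, "x", date(2020, 1, 2)),
    ("b", 1, "x", date(2020, 1, 3)),
    ("a", 1, "x", date(2020, 1, 4)),
    ("a", 1, "x", date(2020, 1, 5)),
]
for island in date_islands(rows):
    print(island)
```

For the "a" rows the anchor date is 2019-12-31 for January 1 and 2, then 2020-01-01 for January 4 and 5, so the missing January 3 splits them into two islands.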

Condense multiple consecutive rows using first and last row

This is a gaps and islands problem, where you want to group together consecutive rows that have the same status and adjacent periods.

You can use window functions. The idea is to define groups with a window sum that increments whenever the status changes or there is a break between periods (a row's starttime differs from the previous row's endtime):

select min(status) as status, min(starttime) as starttime, max(endtime) as endtime
from (
    select t.*,
        sum(case when starttime = lag_endtime and status = lag_status then 0 else 1 end) over(order by starttime) as grp
    from (
        select t.*,
            lag(endtime) over(order by starttime) lag_endtime,
            lag(status) over(order by starttime) lag_status
        from mytable t
    ) t
) t
group by grp

Demo on DB Fiddle:


status | starttime | endtime
:----- | :---------------------- | :----------------------
State1 | 2020-11-01 13:00:29.000 | 2020-11-01 13:05:29.000
State2 | 2020-11-01 13:05:29.000 | 2020-11-01 13:11:31.000
State2 | 2020-11-01 16:19:35.000 | 2020-11-01 16:19:55.000
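
The query can also be tried locally. The sketch below runs it through Python's built-in sqlite3 module, assuming the bundled SQLite is 3.25 or newer (the first version with window functions). The four input rows are invented so that they produce the three result rows shown above; the original input data is not known. The subquery aliases are renamed and an ORDER BY is added to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (status text, starttime text, endtime text)")
conn.executemany(
    "insert into mytable values (?, ?, ?)",
    [
        # Hypothetical input: the first two rows are one State1 period split in two.
        ("State1", "2020-11-01 13:00:29", "2020-11-01 13:02:29"),
        ("State1", "2020-11-01 13:02:29", "2020-11-01 13:05:29"),
        ("State2", "2020-11-01 13:05:29", "2020-11-01 13:11:31"),
        ("State2", "2020-11-01 16:19:35", "2020-11-01 16:19:55"),
    ],
)

query = """
select min(status) as status, min(starttime) as starttime, max(endtime) as endtime
from (
    select s.*,
           -- running count of "something changed" flags = island number
           sum(case when starttime = lag_endtime and status = lag_status then 0 else 1 end)
               over (order by starttime) as grp
    from (
        select m.*,
               lag(endtime) over (order by starttime) as lag_endtime,
               lag(status)  over (order by starttime) as lag_status
        from mytable m
    ) s
) g
group by grp
order by min(starttime)
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

The first row has no predecessor, so both lag values are NULL, the comparison is not true, and the CASE falls through to 1, which opens the first group.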

How to group consecutive rows?

If I understand the posted problem correctly, your CTE effectively determines time buckets (intervals) for all of your alarms, and your final SELECT joins the actual alarm information to those intervals.
Part of your problem is that your alarm system keeps logging “Alarms Scanned” entries while an alarm remains active for a prolonged period (I assume longer than one alarm scan cycle), which effectively splits a single active alarm across several buckets.
If you have SQL Server 2012 or higher, it is relatively easy to determine whether an alarm event got split: check whether the end time of one alarm equals the start time of the next alarm of the same type on the same machine. The LAG window function, introduced in SQL Server 2012, lets you make that comparison.

The next step is to generate an ID to group by so that the split events can be combined. A running total of the "new event" flag, computed with SUM ... OVER, provides that ID.
The following example shows how this can be achieved:

;WITH AlarmTimeBuckets
AS
(
SELECT EventStart.Ev_Comment AS StartDateTime
,MIN(COALESCE (EventEnd.Ev_Comment, EventStart.Ev_Comment)) AS EndDateTime
,EventStart.Ev_Message As Machine
FROM A EventStart
INNER JOIN A EventEnd ON EventStart.Ev_Comment < EventEnd.Ev_Comment AND EventStart.Ev_Custom1 = 'Alarms Scanned' AND EventEnd.Ev_Custom1 = 'Alarms Scanned' AND EventStart.Ev_Message = EventEnd.Ev_Message
GROUP BY EventStart.Ev_Message, EventStart.Ev_Comment
),
AlarmsByTimeBucket
AS
(
SELECT AlarmTimeBuckets.Machine
,AlarmTimeBuckets.StartDateTime
,AlarmTimeBuckets.EndDateTime
,Alarm.Ev_Custom1 AS Alarm
,(
CASE
WHEN LAG(AlarmTimeBuckets.EndDateTime, 1, NULL) OVER (PARTITION BY Alarm.Ev_Custom1,Alarm.Ev_Message ORDER BY AlarmTimeBuckets.StartDateTime) = AlarmTimeBuckets.StartDateTime THEN 0
ELSE 1
END
) AS IsNewEvent
FROM A Alarm
INNER JOIN AlarmTimeBuckets ON Alarm.Ev_Message = AlarmTimeBuckets.Machine AND Alarm.Ev_Comment = AlarmTimeBuckets.StartDateTime
WHERE (Alarm.Ev_Custom1 <> 'Alarms Scanned')
)
,
AlarmsByGroupingID
AS
(
SELECT Machine
,StartDateTime
,EndDateTime
,Alarm
,SUM(IsNewEvent) OVER (ORDER BY Machine, Alarm, StartDateTime) AS GroupingID
FROM AlarmsByTimeBucket
)
SELECT MAX(Machine) AS Machine
,MIN(StartDateTime) AS StartDateTime
,MAX(EndDateTime) AS EndDateTime
,MAX(Alarm) AS Alarm
FROM AlarmsByGroupingID
GROUP BY GroupingID
ORDER BY StartDateTime

Group rows in data frame based on time difference between consecutive rows

Here is another possibility, which keeps rows in the same group as long as the time difference between consecutive rows is no more than 4 days; a gap of more than 4 days starts a new group.

# create date variable
df$date <- with(df, as.Date(paste(YEAR, MONTH, DAY, sep = "-")))

# calculate successive differences between dates
# and flag gaps larger than 4 days
df$gap <- c(0, diff(df$date) > 4)

# cumulative sum of 'gap' variable
df$group <- cumsum(df$gap) + 1

df
# YEAR MONTH DAY HOUR LON LAT date gap group
# 1 1860 10 3 13 -19.5 3 1860-10-03 0 1
# 2 1860 10 3 17 -19.5 4 1860-10-03 0 1
# 3 1860 10 3 21 -19.5 5 1860-10-03 0 1
# 4 1860 10 5 5 -20.5 6 1860-10-05 0 1
# 5 1860 10 5 13 -21.5 7 1860-10-05 0 1
# 6 1860 10 5 17 -21.5 8 1860-10-05 0 1
# 7 1860 10 6 1 -22.5 9 1860-10-06 0 1
# 8 1860 10 6 5 -22.5 10 1860-10-06 0 1
# 9 1860 12 5 9 -22.5 -7 1860-12-05 1 2
# 10 1860 12 5 18 -23.5 -8 1860-12-05 0 2
# 11 1860 12 5 22 -23.5 -9 1860-12-05 0 2
# 12 1860 12 6 6 -24.5 -10 1860-12-06 0 2
# 13 1860 12 6 10 -24.5 -11 1860-12-06 0 2
# 14 1860 12 6 18 -24.5 -12 1860-12-06 0 2

Disclaimer: the diff & cumsum part is inspired by this Q&A: How to partition a vector into groups of regular, consecutive sequences?.
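
For comparison, the same diff and cumsum idea can be written in plain Python. This is a rough translation rather than part of the original answer; group_by_gap is a hypothetical name, the input must already be sorted, and the sample dates are the distinct dates from the output above.

```python
from datetime import date
from itertools import accumulate


def group_by_gap(dates, max_gap_days=4):
    """Return a 1-based group id per date; a gap above max_gap_days starts a new group.

    dates must be sorted in ascending order.
    """
    if not dates:
        return []
    # Counterpart of c(0, diff(date) > 4): 1 marks the first row after a large gap
    gaps = [0] + [int((b - a).days > max_gap_days) for a, b in zip(dates, dates[1:])]
    # Counterpart of cumsum(gap) + 1
    return [g + 1 for g in accumulate(gaps)]


dates = [
    date(1860, 10, 3),
    date(1860, 10, 5),
    date(1860, 10, 6),
    date(1860, 12, 5),
    date(1860, 12, 6),
]
print(group_by_gap(dates))  # [1, 1, 1, 2, 2]
```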

Grouping consecutive date periods with criteria

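Solution: flag each row with a binary value that says whether it is "consecutive" with the previous row of the same type (in this case, a gap of fewer than 7 days), then turn the flags into group numbers with a running SUM ... OVER using ROWS UNBOUNDED PRECEDING.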

with cte as (
select
*,
COALESCE(DATEDIFF(dd, LAG(EndDate, 1, NULL) OVER (PARTITION BY [Type] ORDER BY StartDate), StartDate),0) AS GapDays
from
#data
),
cte2 as (
select
*,
case when GapDays < 7 then 0 else 1 end as group1
from
cte
),
cte3 as (
select
*,
sum(group1) over (partition by [type] order by startDate, endDate rows unbounded preceding) as group2
from
cte2
)
select
[TYPE],
MIN(StartDate) AS StartDate,
MAX(EndDate) AS EndDate,
SUM(Dollars) AS Dollars,
SUM(CASE WHEN GapDays >= 7 THEN 0 ELSE GapDays END) AS GapDays -- >= 7 matches the group1 flag above
from
cte3
group by
[Type], group2
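
The three CTEs can be mirrored step by step in a short Python sketch, which may help when checking edge cases such as a gap of exactly 7 days. The function name merge_periods and the sample rows are hypothetical. As in the query, each gap is measured against the previous row's EndDate (the LAG value), not against the running maximum.

```python
from datetime import date
from itertools import groupby


def merge_periods(rows, max_gap_days=7):
    """rows: iterable of (type, start_date, end_date, dollars).

    Periods of the same type are merged while the gap to the previous
    period is under max_gap_days; dollars are summed per merged period.
    """
    merged = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for typ, part in groupby(ordered, key=lambda r: r[0]):  # partition by type
        current = None   # [type, start, end, dollars] of the open group
        prev_end = None  # plays the role of LAG(EndDate)
        for _, start, end, dollars in part:
            gap = (start - prev_end).days if prev_end is not None else 0
            if current is not None and gap < max_gap_days:
                current[2] = max(current[2], end)
                current[3] += dollars
            else:
                if current is not None:
                    merged.append(tuple(current))
                current = [typ, start, end, dollars]
            prev_end = end
        merged.append(tuple(current))
    return merged


# Hypothetical sample data.
rows = [
    ("A", date(2020, 1, 1), date(2020, 1, 10), 100),
    ("A", date(2020, 1, 13), date(2020, 1, 20), 50),
    ("A", date(2020, 2, 15), date(2020, 2, 20), 25),
    ("B", date(2020, 1, 5), date(2020, 1, 8), 10),
]
for period in merge_periods(rows):
    print(period)
```

The first two "A" periods are 3 days apart and merge into one with 150 dollars; the third starts 26 days later and opens a new group.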

