Find min and max for subsets of consecutive rows - gaps and islands
You want to find consecutive groups of similar rows and then aggregation. I like the difference of row numbers approach:
select name, act, min(startt) as startt, max(endd) as endd
from (select i.*,
row_number() over (partition by name, act order by rn) as seqnum_na,
row_number() over (partition by name order by rn) as seqnum_n
from input i
) i
group by (seqnum_n - seqnum_na), name, act;
You can see how this works by looking at what the subquery does.
Finding the largest subsets of consecutive rows with a maximum gap size (gaps and islands)
Flag the start of group and aggregate groups
select min(t) start_t, max(t) end_t, count(*) cnt, sum(amt) total
from (
select t, amt, sum(flag) over(order by t) grp
from (
select t, amt, case when t - lag(t, 1, t-11) over(order by t) >= 10 then 1 end flag
from d
) t
) t
group by grp
Grouping subsets within ordered data in SQL
It is a gaps and islands problem, as was commented earlier, so you can check the linked articles to deep dive into the problem. But you need to group your data conditionally after finding islands (where you need to collapse only type = 'C'
items.
Here's the code:
with s as (
select '1001' as order_nbr, '1' as step, 'P' as ex_type, timestamp '2021-01-01 00:00:00' as start_ts, timestamp '2021-01-01 09:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '2' as step, 'C' as ex_type, timestamp '2021-01-04 03:00:00' as start_ts, timestamp '2021-01-04 06:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '3' as step, 'C' as ex_type, timestamp '2021-01-03 07:00:00' as start_ts, timestamp '2021-01-03 08:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '4' as step, 'C' as ex_type, timestamp '2021-01-05 10:00:00' as start_ts, timestamp '2021-01-05 15:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '5' as step, 'Z' as ex_type, timestamp '2021-01-06 00:00:00' as start_ts, timestamp '2021-01-06 06:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '6' as step, 'Z' as ex_type, timestamp '2021-01-06 16:00:00' as start_ts, timestamp '2021-01-06 20:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '7' as step, 'C' as ex_type, timestamp '2021-01-07 08:00:00' as start_ts, timestamp '2021-01-07 09:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '8' as step, 'C' as ex_type, timestamp '2021-01-07 10:00:00' as start_ts, timestamp '2021-01-07 12:00:00' as end_ts from dummy union all
select '1002' as order_nbr, '1' as step, 'P' as ex_type, timestamp '2021-01-04 08:00:00' as start_ts, timestamp '2021-01-04 16:00:00' as end_ts from dummy
)
, num as (
select
s.*
/*Find consecutive rows on ex_type field*/
, row_number() over(partition by order_nbr order by start_ts asc) as r1
, row_number() over(partition by order_nbr, ex_type order by start_ts asc) as r2
from s
)
select
order_nbr
, ex_type
, min(start_ts) as start_ts
, max(end_ts) as end_ts
, string_agg(step, ',' order by start_ts asc) as steps
from num
group by
order_nbr
, ex_type
, case
/*For C use group number, for others - use original row number not to collapse them*/
when ex_type = 'C'
then r1 - r2
else r1
end
order by
order_nbr
, start_ts asc
And here's db<>fiddle on PostgreSQL as HANA-syntactically identical platform for involved functions.
Gaps and islands on 2 columns - if column A consecutive and column B identical
There is not much to change in your query. You basically need to select name
and number
in the subquery and sort in the same order. Then you can group by name, number - rn
in the outer query.
SELECT
min(number) first_number,
max(number) last_number,
count(*) AS no_records,
name
FROM (
SELECT c.*, @rn := @rn + 1 rn
from (
SELECT name, number
FROM `table`
WHERE cc = 1
ORDER BY name, number
LIMIT 99999999999999999
) AS c
CROSS JOIN (SELECT @rn := 0) r
) c
GROUP BY name, number - rn
ORDER BY first_number ASC, name ASC;
Result:
first_number last_number no_records name
1 2 2 Apple
3 3 1 Bean
10 12 3 Hello
14 14 1 Deer
14 14 1 Door
15 15 1 Hello
17 17 1 Hello
db<>fiddle
I usually advocate against the use of session variables in this way. The reason is that such solutions depend on internal implementation, and can be broken by version updates or settings changes. For example: Once MariaDB decided to ignore the ORDER BY clause in subqueries without LIMIT. This is why I included a huge LIMIT.
I also replaced number
with first_number
in the outer ORDER BY clause to avoid problems with ONLY_FULL_GROUP_BY mode.
A more stable way to generate row numbers is to use an AOTO_INCREMENT column in a temporary table:
drop temporary table if exists tmp_tbl;
create temporary table tmp_tbl (
rn int unsigned auto_increment primary key,
name varchar(64) not null,
number int not null
);
insert into tmp_tbl (name, number)
select name, number
from `table`
order by name, number;
The final SELECT query is identical with the outer query above:
SELECT
min(number) first_number,
max(number) last_number,
count(*) AS no_records,
name
FROM tmp_tbl
GROUP BY name, number - rn
ORDER BY first_number ASC, name ASC;
db<>fiddle
In a more recent version (starting from MariaDB 10.2) you can use ROW_NUMBER()
window function instead:
SELECT
min(number) first_number,
max(number) last_number,
count(*) AS no_records,
name
FROM (
SELECT
name,
number,
row_number() OVER (ORDER BY name, number) as rn
FROM `table`
WHERE cc = 1
) c
GROUP BY name, number - rn
ORDER BY first_number ASC, name ASC;
db<>fiddle
Related Topics
Insert Deleted Values into a Table Before Delete with a Delete Trigger
Query JSONb Column Containing Array of JSON Objects
SQL Server Performance for Alter Table Alter Column Change Data Type
Why Does Nvl Always Evaluate 2Nd Parameter
The Difference Between Comma and Join in SQL
SQL Server: How to Imitate Oracle Keep Dense_Rank Query
Linq Orderby. Does It Always Return the Same Ordered List
Stop Access from Using Wrong Identity When Appending to Linked Table on SQL Server
How to Pivot on an Xml Column's Attributes in T-Sql
SQL Access How to Return Between Dates
SQL Convert 'Ddmmyy' to Datetime
Strip Non-Numeric Characters from a String
SQL Server Foreign Key to Multiple Tables
Passing SQL Stored Procedure Entirety of Where Clause
Converting Delimited String to Multiple Values in MySQL