Find Min and Max for Subsets of Consecutive Rows - Gaps and Islands

Find min and max for subsets of consecutive rows - gaps and islands

You want to find consecutive groups of similar rows and then aggregation. I like the difference of row numbers approach:

select name, act, min(startt) as startt, max(endd) as endd
from (select i.*,
             row_number() over (partition by name, act order by rn) as seqnum_na,
             row_number() over (partition by name order by rn) as seqnum_n
      from input i
     ) i
group by (seqnum_n - seqnum_na), name, act;

You can see how this works by looking at what the subquery does.

Finding the largest subsets of consecutive rows with a maximum gap size (gaps and islands)

Flag the start of group and aggregate groups

select min(t) start_t, max(t) end_t, count(*) cnt, sum(amt) total
from (
  select t, amt, sum(flag) over(order by t) grp
     from (
     select t, amt, case when t - lag(t, 1, t-11) over(order by t) >= 10 then 1 end flag
     from d
  ) t
) t
group by grp

Grouping subsets within ordered data in SQL

It is a gaps and islands problem, as was commented earlier, so you can check the linked articles to deep dive into the problem. But you need to group your data conditionally after finding islands (where you need to collapse only type = 'C' items.

Here's the code:

with s as (
  select '1001' as order_nbr, '1' as step, 'P' as ex_type, timestamp '2021-01-01 00:00:00' as start_ts, timestamp '2021-01-01 09:00:00' as end_ts from dummy union all
  select '1001' as order_nbr, '2' as step, 'C' as ex_type, timestamp '2021-01-04 03:00:00' as start_ts, timestamp '2021-01-04 06:00:00' as end_ts from dummy union all
  select '1001' as order_nbr, '3' as step, 'C' as ex_type, timestamp '2021-01-03 07:00:00' as start_ts, timestamp '2021-01-03 08:00:00' as end_ts from dummy union all
  select '1001' as order_nbr, '4' as step, 'C' as ex_type, timestamp '2021-01-05 10:00:00' as start_ts, timestamp '2021-01-05 15:00:00' as end_ts from dummy union all
  select '1001' as order_nbr, '5' as step, 'Z' as ex_type, timestamp '2021-01-06 00:00:00' as start_ts, timestamp '2021-01-06 06:00:00' as end_ts from dummy union all
  select '1001' as order_nbr, '6' as step, 'Z' as ex_type, timestamp '2021-01-06 16:00:00' as start_ts, timestamp '2021-01-06 20:00:00' as end_ts from dummy union all
  select '1001' as order_nbr, '7' as step, 'C' as ex_type, timestamp '2021-01-07 08:00:00' as start_ts, timestamp '2021-01-07 09:00:00' as end_ts from dummy union all
  select '1001' as order_nbr, '8' as step, 'C' as ex_type, timestamp '2021-01-07 10:00:00' as start_ts, timestamp '2021-01-07 12:00:00' as end_ts from dummy union all
  select '1002' as order_nbr, '1' as step, 'P' as ex_type, timestamp '2021-01-04 08:00:00' as start_ts, timestamp '2021-01-04 16:00:00' as end_ts from dummy
)
, num as (
  select
    s.*
    /*Find consecutive rows on ex_type field*/
    , row_number() over(partition by order_nbr order by start_ts asc) as r1
    , row_number() over(partition by order_nbr, ex_type order by start_ts asc) as r2
  from s
)
select
  order_nbr
  , ex_type
  , min(start_ts) as start_ts
  , max(end_ts) as end_ts
  , string_agg(step, ',' order by start_ts asc) as steps
from num
group by
  order_nbr
  , ex_type
  , case
      /*For C use group number, for others - use original row number not to collapse them*/
      when ex_type = 'C'
      then r1 - r2
      else r1
  end
order by 
  order_nbr
  , start_ts asc

And here's db<>fiddle on PostgreSQL as HANA-syntactically identical platform for involved functions.

Gaps and islands on 2 columns - if column A consecutive and column B identical

There is not much to change in your query. You basically need to select name and number in the subquery and sort in the same order. Then you can group by name, number - rn in the outer query.

SELECT
    min(number) first_number,
    max(number) last_number,
    count(*) AS no_records,
    name
FROM (
    SELECT c.*, @rn := @rn + 1 rn
    from (
        SELECT name, number
        FROM `table`
        WHERE cc = 1
        ORDER BY name, number
        LIMIT 99999999999999999
    ) AS c
    CROSS JOIN (SELECT @rn := 0) r
) c
GROUP BY name, number - rn 
ORDER BY first_number ASC, name ASC;

Result:

first_number  last_number  no_records  name
           1            2           2  Apple
           3            3           1  Bean
          10           12           3  Hello
          14           14           1  Deer
          14           14           1  Door
          15           15           1  Hello
          17           17           1  Hello

db<>fiddle

I usually advocate against the use of session variables in this way. The reason is that such solutions depend on internal implementation, and can be broken by version updates or settings changes. For example: Once MariaDB decided to ignore the ORDER BY clause in subqueries without LIMIT. This is why I included a huge LIMIT.

I also replaced number with first_number in the outer ORDER BY clause to avoid problems with ONLY_FULL_GROUP_BY mode.

A more stable way to generate row numbers is to use an AOTO_INCREMENT column in a temporary table:

drop temporary table if exists tmp_tbl;

create temporary table tmp_tbl (
  rn int unsigned auto_increment primary key,
  name varchar(64) not null,
  number int not null
);

insert into tmp_tbl (name, number)
  select name, number
  from `table`
  order by name, number;

The final SELECT query is identical with the outer query above:

SELECT
    min(number) first_number,
    max(number) last_number,
    count(*) AS no_records,
    name
FROM tmp_tbl
GROUP BY name, number - rn 
ORDER BY first_number ASC, name ASC;

db<>fiddle

In a more recent version (starting from MariaDB 10.2) you can use ROW_NUMBER() window function instead:

SELECT
    min(number) first_number,
    max(number) last_number,
    count(*) AS no_records,
    name
FROM (
    SELECT
        name,
        number,
        row_number() OVER (ORDER BY name, number) as rn
    FROM `table`
    WHERE cc = 1
) c
GROUP BY name, number - rn 
ORDER BY first_number ASC, name ASC;

db<>fiddle

Find Min and Max for Subsets of Consecutive Rows - Gaps and Islands