How to Do a Contiguous Group by in MySQL

How can I do a contiguous group by in MySQL?

Use:

   SELECT MIN(t.id) 'mi', 
          t.val, 
          COUNT(*)
     FROM (SELECT x.id, 
                 x.val, 
                 CASE 
                   WHEN xt.val IS NULL OR xt.val != x.val THEN 
                     @rownum := @rownum+1 
                   ELSE 
                     @rownum 
                 END AS grp
            FROM TABLE x
            JOIN (SELECT @rownum := 0) r
       LEFT JOIN (SELECT t.id +1 'id',
                         t.val
                    FROM TABLE t) xt ON xt.id = x.id) t
 GROUP BY t.val, t.grp
 ORDER BY mi

The key here was to create an artificial value that would allow for grouping.

Previously, corrected Guffa's answer:

   SELECT t.id, t.val
     FROM TABLE t
LEFT JOIN TABLE t2 on t2.id + 1 = t.id
    WHERE t2.val IS NULL 
       OR t.val <> t2.val

Contiguous Group By

One way is this:

with the_table(id, name ,   code  ,  time) as(
select 1, 'A',111   ,  1 union all
select 2, 'A',111   ,  2 union all
select 3, 'A',888  ,   3 union all
select 4, 'A',888  ,   4 union all
select 5, 'A',888 ,    5 union all
select 6, 'A',888   ,  6 union all
select 7, 'A',888  ,   7 union all
select 8, 'A',111  ,   8 union all
select 9, 'A',111   ,  9 union all
select 10, 'A',111  ,   10
)

select name, code, min(time) ||','|| max(time)  from (
    select name, code, time, id,
    row_number() over(order  by id)   -
    row_number() over(partition by name , code order  by id) as grp
    from the_table
) t
group by name, code, grp
order by min(id)

(I forgot and just can't find/remember the name of technique, which creates groups grp)

MySQL Group By Consecutive Rows

Here's another version that works with MySQL Variables and doesn't require 3 level nesting deep. The first one pre-sorts the records in order by postID and Date and assigns them a sequential number per group whenever any time a value changes in one of the Post ID, Type and/or action. From that, Its a simple group by... no comparing record version T to T2 to T3... what if you wanted 4 or 5 criteria... would you have to nest even more entries?, or just add 2 more @sql variables to the comparison test...

Your call on which is more efficient...

select
      PreQuery.postID,
      PreQuery.PostType,
      PreQuery.Target,
      PreQuery.Action,
      PreQuery.Title,
      min( PreQuery.Date ) as FirstActionDate,
      max( PreQuery.Date ) as LastActionDate,
      count(*) as ActionEntries,
      group_concat( PreQuery.content ) as Content
   from
      ( select
              t.*,
              @lastSeq := if( t.action = @lastAction
                          AND t.postID = @lastPostID
                          AND t.postType = @lastPostType, @lastSeq, @lastSeq +1 ) as ActionSeq,
              @lastAction := t.action,
              @lastPostID := t.postID,
              @lastPostType := t.PostType
           from
              t,
              ( select @lastAction := ' ',
                       @lastPostID := 0,
                       @lastPostType := ' ',
                       @lastSeq := 0 ) sqlVars
           order by
              t.postid,
              t.date ) PreQuery
   group by
      PreQuery.postID,
      PreQuery.ActionSeq,
      PreQuery.PostType,
      PreQuery.Action

Here's my link to SQLFiddle sample

For the title, you might want to adjust the line...

group_concat( distinct PreQuery.Title ) as Titles,

At least this will give DISTINCT titles concatinated... much tougher to get let without nesting this entire query one more level by having the max query date and other elements to get the one title associated with that max date per all criteria.

Selecting contiguous block of records in mysql

There is a simple trick to collapse consecutive entries into a single group. If you group by (row_number - entry), the entries that are consecutive will end up in the same group. Here is an example demonstrating what I mean:

Query:

SELECT phonenum, @curRow := @curRow + 1 AS row_number, phonenum - @curRow
from phonenums p
join (SELECT @curRow := 0) r

Results:

|    PHONENUM | ROW_NUMBER | PHONENUM - @CURROW |
-------------------------------------------------
| 27100070000 |          1 |        27100069999 |
| 27100070001 |          2 |        27100069999 |
| 27100070002 |          3 |        27100069999 |
| 27100070003 |          4 |        27100069999 |
| 27100070004 |          5 |        27100069999 |
| 27100070005 |          6 |        27100069999 |
| 27100070008 |          7 |        27100070001 |
| 27100070009 |          8 |        27100070001 |
| 27100070012 |          9 |        27100070003 |
| 27100070015 |         10 |        27100070005 |
| 27100070016 |         11 |        27100070005 |
| 27100070040 |         12 |        27100070028 |

Notice how the entries that are consecutive all have the same value for PHONENUM - @CURROW. If we group on that column, and select the min & max of each group, you have the summary (with one exception: you could replace the END value with NULL if START = END if that's a requirement):

Query:

select min(phonenum), max(phonenum) from
(
  SELECT phonenum, @curRow := @curRow + 1 AS row_number
  from phonenums p
  join (SELECT @curRow := 0) r
) p
group by phonenum - row_number

Results:

| MIN(PHONENUM) | MAX(PHONENUM) |
---------------------------------
|   27100070000 |   27100070005 |
|   27100070008 |   27100070009 |
|   27100070012 |   27100070012 |
|   27100070015 |   27100070016 |
|   27100070040 |   27100070040 |

Demo: http://www.sqlfiddle.com/#!2/59b04/5

Group rows by contiguous date ranges for groups of values

You can identify the groups by using the difference of row_numbers(). Consecutive values will have a constant.

select col1, col2, date1, min(date2), max(date2), rate
from (select t.*,
             (row_number() over (partition by col1, col2, date1 order by date2) -
              row_number() over (partition by col1, col2, date1, rate order by date2)
             ) as grp
      from table t
     ) t
group by col1, col2, date1, rate, grp

find contiguous ranges with MySQL

LukStorms solution seems like the right kind of thing - it just seems a little more convoluted than is strictly necessary...

Here's a variation on the theme...

    DROP TABLE IF EXISTS my_table;

    CREATE TABLE my_table
    (id SERIAL PRIMARY KEY
    ,customer_id int NOT NULL
    ,value int NOT NULL
    );

    INSERT INTO my_table VALUES
    (1,         21,    1),
    (2,         21,    2),
    (3,         21,    3),
    (4,         21,   20),
    (5,         21,   21),
    (6,         21,   22),
    (7,         22,    5);

    SELECT customer_id
         , MIN(value) start
         , MAX(value) end 
      FROM 
         ( SELECT *
                , CASE WHEN value = @prev+1 THEN @i:=@i ELSE @i:=@i+1 END grp
                , @prev:=value prev
             FROM my_table
                , (SELECT @prev:= null,@i:=0) vars 
            ORDER 
               BY customer_id
                , id
         ) x 
     GROUP 
        BY customer_id,grp;
    +-------------+-------+------+
    | customer_id | start | end  |
    +-------------+-------+------+
    |          21 |     1 |    3 |
    |          21 |    20 |   22 |
    |          22 |     5 |    5 |
    +-------------+-------+------+

How to group data within a range of contigious timestamps

This is a simplified gaps-and-island problem. Assuming that your RDBMS support window functions, you can approach this with a window sum. When the Start_Timestamp of record is different than the End_Timestamp of the previous record, a new group starts:

select
    t.Row,
    sum(case when Start_Timestamp = lag_End_Timestamp then 0 else 1 end) 
        over(order by End_Timestamp) series,
    t.Start_Timestamp,
    t.End_Timestamp,
    t.Data_Item
from (
    select
        t.*,
        lag(End_Timestamp) over (order by End_Timestamp) lag_End_Timestamp
    from mytable t
) t

Demo on DB Fiddle:


Row | series | Start_Timestamp     | End_Timestamp       | Data_Item
--: | -----: | :------------------ | :------------------ | --------:
  1 |      1 | 2019-08-12 22:07:53 | 2019-08-12 22:09:57 |       100
  2 |      1 | 2019-08-12 22:09:57 | 2019-08-12 22:12:01 |       203
  3 |      1 | 2019-08-12 22:12:01 | 2019-08-12 22:13:03 |       487
  4 |      1 | 2019-08-12 22:13:03 | 2019-08-12 22:16:19 |       113
  5 |      2 | 2019-08-12 22:24:34 | 2019-08-12 22:26:37 |       632
  6 |      2 | 2019-08-12 22:26:37 | 2019-08-12 22:27:40 |       532
  7 |      2 | 2019-08-12 22:27:40 | 2019-08-12 22:28:42 |       543
  8 |      2 | 2019-08-12 22:28:42 | 2019-08-12 22:31:57 |       142
  9 |      3 | 2019-08-13 19:56:06 | 2019-08-13 19:57:08 |       351
 10 |      3 | 2019-08-13 19:57:08 | 2019-08-13 19:58:10 |       982

group' only contiguous data

One method uses lag() to see if the name has changed. Then do a cumulative sum of that value:

select t.*,
       sum(case when name = prev_name then 0 else 1 end) over (order by dt) as grp
from (select t.*,
             lag(name) over (order by dt) as prev_name
      from t1 t
     ) t;

For aggregating to get the minimum and maximum, I prefer the difference of row numbers approach:

select name, min(dt), max(dt)
from (select t.*,
             (row_number() over (order by dt) - 
              row_number() over (partition by name order by dt)
             ) as grp 
      from t1 t
     ) t
 group by grp, name;

To understand what is happening, it is best to run the subquery with the separate row numbers. In my experience, when you see the sequences and their differences, you'll have an "aha" moment.