How can I do a contiguous group by in MySQL?
Use:
SELECT MIN(t.id) 'mi',
t.val,
COUNT(*)
FROM (SELECT x.id,
x.val,
CASE
WHEN xt.val IS NULL OR xt.val != x.val THEN
@rownum := @rownum+1
ELSE
@rownum
END AS grp
FROM TABLE x
JOIN (SELECT @rownum := 0) r
LEFT JOIN (SELECT t.id +1 'id',
t.val
FROM TABLE t) xt ON xt.id = x.id) t
GROUP BY t.val, t.grp
ORDER BY mi
The key here was to create an artificial value that would allow for grouping.
Previously, corrected Guffa's answer:
SELECT t.id, t.val
FROM TABLE t
LEFT JOIN TABLE t2 on t2.id + 1 = t.id
WHERE t2.val IS NULL
OR t.val <> t2.val
Contiguous Group By
One way is this:
with the_table(id, name , code , time) as(
select 1, 'A',111 , 1 union all
select 2, 'A',111 , 2 union all
select 3, 'A',888 , 3 union all
select 4, 'A',888 , 4 union all
select 5, 'A',888 , 5 union all
select 6, 'A',888 , 6 union all
select 7, 'A',888 , 7 union all
select 8, 'A',111 , 8 union all
select 9, 'A',111 , 9 union all
select 10, 'A',111 , 10
)
select name, code, min(time) ||','|| max(time) from (
select name, code, time, id,
row_number() over(order by id) -
row_number() over(partition by name , code order by id) as grp
from the_table
) t
group by name, code, grp
order by min(id)
(I forgot and just can't find/remember the name of technique, which creates groups grp
)
MySQL Group By Consecutive Rows
Here's another version that works with MySQL Variables and doesn't require 3 level nesting deep. The first one pre-sorts the records in order by postID and Date and assigns them a sequential number per group whenever any time a value changes in one of the Post ID, Type and/or action. From that, Its a simple group by... no comparing record version T to T2 to T3... what if you wanted 4 or 5 criteria... would you have to nest even more entries?, or just add 2 more @sql variables to the comparison test...
Your call on which is more efficient...
select
PreQuery.postID,
PreQuery.PostType,
PreQuery.Target,
PreQuery.Action,
PreQuery.Title,
min( PreQuery.Date ) as FirstActionDate,
max( PreQuery.Date ) as LastActionDate,
count(*) as ActionEntries,
group_concat( PreQuery.content ) as Content
from
( select
t.*,
@lastSeq := if( t.action = @lastAction
AND t.postID = @lastPostID
AND t.postType = @lastPostType, @lastSeq, @lastSeq +1 ) as ActionSeq,
@lastAction := t.action,
@lastPostID := t.postID,
@lastPostType := t.PostType
from
t,
( select @lastAction := ' ',
@lastPostID := 0,
@lastPostType := ' ',
@lastSeq := 0 ) sqlVars
order by
t.postid,
t.date ) PreQuery
group by
PreQuery.postID,
PreQuery.ActionSeq,
PreQuery.PostType,
PreQuery.Action
Here's my link to SQLFiddle sample
For the title, you might want to adjust the line...
group_concat( distinct PreQuery.Title ) as Titles,
At least this will give DISTINCT titles concatinated... much tougher to get let without nesting this entire query one more level by having the max query date and other elements to get the one title associated with that max date per all criteria.
Selecting contiguous block of records in mysql
There is a simple trick to collapse consecutive entries into a single group. If you group by (row_number - entry), the entries that are consecutive will end up in the same group. Here is an example demonstrating what I mean:
Query:
SELECT phonenum, @curRow := @curRow + 1 AS row_number, phonenum - @curRow
from phonenums p
join (SELECT @curRow := 0) r
Results:
| PHONENUM | ROW_NUMBER | PHONENUM - @CURROW |
-------------------------------------------------
| 27100070000 | 1 | 27100069999 |
| 27100070001 | 2 | 27100069999 |
| 27100070002 | 3 | 27100069999 |
| 27100070003 | 4 | 27100069999 |
| 27100070004 | 5 | 27100069999 |
| 27100070005 | 6 | 27100069999 |
| 27100070008 | 7 | 27100070001 |
| 27100070009 | 8 | 27100070001 |
| 27100070012 | 9 | 27100070003 |
| 27100070015 | 10 | 27100070005 |
| 27100070016 | 11 | 27100070005 |
| 27100070040 | 12 | 27100070028 |
Notice how the entries that are consecutive all have the same value for PHONENUM - @CURROW
. If we group on that column, and select the min & max of each group, you have the summary (with one exception: you could replace the END value with NULL
if START = END if that's a requirement):
Query:
select min(phonenum), max(phonenum) from
(
SELECT phonenum, @curRow := @curRow + 1 AS row_number
from phonenums p
join (SELECT @curRow := 0) r
) p
group by phonenum - row_number
Results:
| MIN(PHONENUM) | MAX(PHONENUM) |
---------------------------------
| 27100070000 | 27100070005 |
| 27100070008 | 27100070009 |
| 27100070012 | 27100070012 |
| 27100070015 | 27100070016 |
| 27100070040 | 27100070040 |
Demo: http://www.sqlfiddle.com/#!2/59b04/5
Group rows by contiguous date ranges for groups of values
You can identify the groups by using the difference of row_numbers()
. Consecutive values will have a constant.
select col1, col2, date1, min(date2), max(date2), rate
from (select t.*,
(row_number() over (partition by col1, col2, date1 order by date2) -
row_number() over (partition by col1, col2, date1, rate order by date2)
) as grp
from table t
) t
group by col1, col2, date1, rate, grp
find contiguous ranges with MySQL
LukStorms solution seems like the right kind of thing - it just seems a little more convoluted than is strictly necessary...
Here's a variation on the theme...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id SERIAL PRIMARY KEY
,customer_id int NOT NULL
,value int NOT NULL
);
INSERT INTO my_table VALUES
(1, 21, 1),
(2, 21, 2),
(3, 21, 3),
(4, 21, 20),
(5, 21, 21),
(6, 21, 22),
(7, 22, 5);
SELECT customer_id
, MIN(value) start
, MAX(value) end
FROM
( SELECT *
, CASE WHEN value = @prev+1 THEN @i:=@i ELSE @i:=@i+1 END grp
, @prev:=value prev
FROM my_table
, (SELECT @prev:= null,@i:=0) vars
ORDER
BY customer_id
, id
) x
GROUP
BY customer_id,grp;
+-------------+-------+------+
| customer_id | start | end |
+-------------+-------+------+
| 21 | 1 | 3 |
| 21 | 20 | 22 |
| 22 | 5 | 5 |
+-------------+-------+------+
How to group data within a range of contigious timestamps
This is a simplified gaps-and-island problem. Assuming that your RDBMS support window functions, you can approach this with a window sum. When the Start_Timestamp
of record is different than the End_Timestamp
of the previous record, a new group starts:
select
t.Row,
sum(case when Start_Timestamp = lag_End_Timestamp then 0 else 1 end)
over(order by End_Timestamp) series,
t.Start_Timestamp,
t.End_Timestamp,
t.Data_Item
from (
select
t.*,
lag(End_Timestamp) over (order by End_Timestamp) lag_End_Timestamp
from mytable t
) t
Demo on DB Fiddle:
Row | series | Start_Timestamp | End_Timestamp | Data_Item
--: | -----: | :------------------ | :------------------ | --------:
1 | 1 | 2019-08-12 22:07:53 | 2019-08-12 22:09:57 | 100
2 | 1 | 2019-08-12 22:09:57 | 2019-08-12 22:12:01 | 203
3 | 1 | 2019-08-12 22:12:01 | 2019-08-12 22:13:03 | 487
4 | 1 | 2019-08-12 22:13:03 | 2019-08-12 22:16:19 | 113
5 | 2 | 2019-08-12 22:24:34 | 2019-08-12 22:26:37 | 632
6 | 2 | 2019-08-12 22:26:37 | 2019-08-12 22:27:40 | 532
7 | 2 | 2019-08-12 22:27:40 | 2019-08-12 22:28:42 | 543
8 | 2 | 2019-08-12 22:28:42 | 2019-08-12 22:31:57 | 142
9 | 3 | 2019-08-13 19:56:06 | 2019-08-13 19:57:08 | 351
10 | 3 | 2019-08-13 19:57:08 | 2019-08-13 19:58:10 | 982
group' only contiguous data
One method uses lag()
to see if the name has changed. Then do a cumulative sum of that value:
select t.*,
sum(case when name = prev_name then 0 else 1 end) over (order by dt) as grp
from (select t.*,
lag(name) over (order by dt) as prev_name
from t1 t
) t;
For aggregating to get the minimum and maximum, I prefer the difference of row numbers approach:
select name, min(dt), max(dt)
from (select t.*,
(row_number() over (order by dt) -
row_number() over (partition by name order by dt)
) as grp
from t1 t
) t
group by grp, name;
To understand what is happening, it is best to run the subquery with the separate row numbers. In my experience, when you see the sequences and their differences, you'll have an "aha" moment.
Related Topics
How to Find Which Columns Don't Have Any Data (All Values Are Null)
Cast VS Ssis Data Flow Implicit Conversion Difference
Using Object_Id() Function with #Tables
How to Update with Inner Join in Oracle
Date Calculation with Parameter in Ssis Is Not Giving the Correct Result
Identifying Transitive Dependencies
Strip Non-Numeric Characters from a String
How to Check Column Structure in Ssis
How to Skip Comma from CSV Using Double Quotes
How to Do a Contiguous Group by in MySQL
Using Subquery in a Check Statement in Oracle
SQL Identity with Leading Padded Zeros
How to Write Blob from Oracle Column to the File System
Amazon Redshift - Lateral Column Alias Reference