Merge Adjacent Rows in SQL

Merge adjacent rows in SQL?

This article provides quite a few possible solutions to your question

http://www.sqlmag.com/blog/puzzled-by-t-sql-blog-15/tsql/solutions-to-packing-date-and-time-intervals-puzzle-136851

This one seems like the most straight forward:

WITH StartTimes AS
(
SELECT DISTINCT username, starttime
FROM dbo.Sessions AS S1
WHERE NOT EXISTS
(SELECT * FROM dbo.Sessions AS S2
WHERE S2.username = S1.username
AND S2.starttime < S1.starttime
AND S2.endtime >= S1.starttime)
),
EndTimes AS
(
SELECT DISTINCT username, endtime
FROM dbo.Sessions AS S1
WHERE NOT EXISTS
(SELECT * FROM dbo.Sessions AS S2
WHERE S2.username = S1.username
AND S2.endtime > S1.endtime
AND S2.starttime <= S1.endtime)
)
SELECT username, starttime,
(SELECT MIN(endtime) FROM EndTimes AS E
WHERE E.username = S.username
AND endtime >= starttime) AS endtime
FROM StartTimes AS S;

Merge two consecutive rows into a column

Provided mytable contains only valid pairs of in / out events

select EmployeeCode, 
max(case EntryType when 'IN' then TIME end ) IN_TIME,
max(case EntryType when 'OUT' then TIME end ) OUT_TIME
from (
select EmployeeCode, EntryType, TIME,
row_number() over(partition by EmployeeCode, EntryType order by TIME) rn
from mytable
)t
group by EmployeeCode, rn
order by EmployeeCode, rn

Otherwise a kind of clean-up is required first.

Merge groups of consecutive rows in T-SQL and sum values from each group

This is a gaps-and-islands problem, as GMB points out. Because you want to keep the first row, I'm going to suggest a lag() approach instead of the difference of row numbers:

SELECT SessionId, TransactionType, DateTimeEnd,DateStart, sumRank
FROM (SELECT t.*,
SUM(Rank) OVER (PARTITION BY SessionId, TransactionType, grp) as sumRank
FROM (SELECT t.*,
SUM(CASE WHEN prev_st_id = prev_id THEN 0 ELSE 1 END) OVER (ORDER BY id) as grp
FROM (SELECT t.*,
LAG(id) OVER (PARTITION BY SessionId, TransactionType ORDER BY id) as prev_st_id,
LAG(id) OVER (PARTITION BY SessionId ORDER BY id) as prev_id
FROM Transactions t
) t
) t
) t
WHERE prev_st_id <> prev_id OR prev_st_id IS NULL;

What does this do?

  • Innermost subquery calculates the lag of the id both overall and by session/transaction type. This uses id because it seems more stable than date/times (there are duplicate date/time values in one of the columns).
  • When the ids are different, then a new island is identified. The cumulative sum identifies the groups.
  • This grp is then used to calculate values over the entire group, using window functions.
  • The outer query then just filters down to the first row in each group.

Here is a db<>fiddle.

Merge consecutive duplicate rows in SQL Server

You can use row_number()s & find the groups to do aggregation :

select item, price, min(validfrom) as validfrm, max(validto) as valdto
from (select t.*,
row_number() over(partition by item order by validfrom desc) as seq1,
row_number() over(partition by item, price order by validfrom desc) as seq2
from table t
) t
group by item, price, (seq1 - seq2)
order by item, validfrm desc;

Microsoft SQL Server 2012 merge data of exactly two adjacent rows into one row

For creating the table and inserting some values

create table mastertable
(
id int,
Name varchar(20),
ParentID int,
Value varchar(50)
)

insert into mastertable values(86, 'Description1',83,'021541-sds2-45515')
insert into mastertable values(87, 'value1',83, '$247.56')
insert into mastertable values(90, 'Description3',83, '021541-sds2-45567')
insert into mastertable values(91,'value3',83, '$66.72')
insert into mastertable values(94, 'Description5',83, '021541-sd72-47715')
insert into mastertable values(95,'value5',83, '$57.11')

Now, try this query:

with m1 as
(
SELECT *, RN = ROW_NUMBER() OVER (order by id) FROM mastertable
),
m2 as
(
SELECT *, RN = ROW_NUMBER() OVER (order by id) FROM mastertable
)

SELECT m1.id, m1.name, m1.parentid, m1.value as value1, m2.value as value2
FROM m1 left join m2
ON m1.RN = m2.RN-1
WHERE m1.RN%2 != 0

EDIT:
For m2, instead of SELECT *, you could also use:

m2 as
(
SELECT id,value, RN = ROW_NUMBER() OVER (order by id) FROM mastertable
)

Combining consecutive rows based on a type column

Gordon beat me to the answer. Yes, this does fit into an Islands and Gaps methodology. I think a LEAD() would suit this issue quite well. But I also tried a second query with a ROW_NUMBER(), which produced a slightly shorter execution plan. Not sure which would be more efficient at scale. That would require a bit more testing.

NOTE 1: I also added an assumed SessionID and UserID to my query. The additional columns may change your final results.

NOTE 2: SQL Fiddle reports that the ROW_NUMBER version runs much faster with fewer "same time" entries, but LEAD version is faster with many "same time" entries.

SQL Fiddle

MS SQL Server 2014 Schema Setup:

CREATE TABLE foo ( ID int IDENTITY, sessionID int, userid int, AuditType varchar(50), [DateTime] datetime ) ;
INSERT INTO foo ( sessionID, userID, AuditType, [DateTime] )
VALUES
(1,1,'Viewed Record','2017-01-03 11:16:33.000')
, (1,1,'Saved Form','2017-01-04 09:51:36.837')
, (2,2,'Viewed Record','2017-01-04 09:52:00.000')
, (1,1,'Saved Form','2017-01-04 09:52:40.837')
, (1,1,'Saved Form','2017-01-04 09:52:44.837')
, (2,2,'Saved Form','2017-01-04 09:52:45.000')
, (2,2,'Saved Form','2017-01-04 09:52:46.000')
, (2,2,'Saved Form','2017-01-04 09:52:47.000')
, (2,2,'Saved Form','2017-01-04 09:52:48.000')
, (1,1,'Saved Form','2017-01-04 09:52:49.837')
, (1,1,'Saved Form','2017-01-04 09:52:54.837')
, (2,2,'Exported Record','2017-01-04 09:53:00.000')
, (1,1,'Saved Form','2017-01-04 09:54:59.837')
, (1,1,'Exported Record','2017-01-04 09:55:59.837')
, (2,1,'Viewed Record','2017-01-04 10:00:00.000')
, (2,1,'Saved Form','2017-01-04 10:02:00.000')
, (2,1,'Saved Form','2017-01-04 10:04:00.000')
, (2,1,'Saved Form','2017-01-04 10:06:00.000')
, (2,1,'Exported Record','2017-01-04 10:10:00.000')
;

Query 1 (LEAD()):

SELECT s1.sessionID
, s1.userID
, s1.AuditType
, s1.[DateTime]
FROM (
SELECT foo.*
, LEAD(foo.AuditType) OVER ( ORDER BY foo.userID, foo.sessionID, foo.[DateTime] ) AS next_type
FROM foo
) s1
WHERE s1.next_type IS NULL OR s1.next_type <> s1.AuditType
ORDER BY s1.sessionID, s1.userID, s1.[DateTime]

Results:

| sessionID | userID |       AuditType |                 DateTime |
|-----------|--------|-----------------|--------------------------|
| 1 | 1 | Viewed Record | 2017-01-03T11:16:33Z |
| 1 | 1 | Saved Form | 2017-01-04T09:54:59.837Z |
| 1 | 1 | Exported Record | 2017-01-04T09:55:59.837Z |
| 2 | 1 | Viewed Record | 2017-01-04T10:00:00Z |
| 2 | 1 | Saved Form | 2017-01-04T10:06:00Z |
| 2 | 1 | Exported Record | 2017-01-04T10:10:00Z |
| 2 | 2 | Viewed Record | 2017-01-04T09:52:00Z |
| 2 | 2 | Saved Form | 2017-01-04T09:52:48Z |
| 2 | 2 | Exported Record | 2017-01-04T09:53:00Z |

Query 2 (ROW_NUMBER()):

SELECT s1.*
FROM (
SELECT foo.*
, ROW_NUMBER() OVER ( PARTITION BY foo.userID, foo.sessionID, foo.AuditType ORDER BY foo.userID, foo.sessionID, foo.[DateTime] DESC ) AS rn
FROM foo
) s1
WHERE rn = 1
ORDER BY s1.sessionID, s1.userID, s1.[DateTime]

Results:

| ID | sessionID | userid |       AuditType |                 DateTime | rn |
|----|-----------|--------|-----------------|--------------------------|----|
| 1 | 1 | 1 | Viewed Record | 2017-01-03T11:16:33Z | 1 |
| 13 | 1 | 1 | Saved Form | 2017-01-04T09:54:59.837Z | 1 |
| 14 | 1 | 1 | Exported Record | 2017-01-04T09:55:59.837Z | 1 |
| 15 | 2 | 1 | Viewed Record | 2017-01-04T10:00:00Z | 1 |
| 18 | 2 | 1 | Saved Form | 2017-01-04T10:06:00Z | 1 |
| 19 | 2 | 1 | Exported Record | 2017-01-04T10:10:00Z | 1 |
| 3 | 2 | 2 | Viewed Record | 2017-01-04T09:52:00Z | 1 |
| 9 | 2 | 2 | Saved Form | 2017-01-04T09:52:48Z | 1 |
| 12 | 2 | 2 | Exported Record | 2017-01-04T09:53:00Z | 1 |

They should both show:

  1,1,'Viewed Record','2017-01-03 11:16:33.000'
1,1,'Saved Form','2017-01-04 09:54:59.837'
1,1,'Exported Record','2017-01-04 09:55:59.837'

2,1,'Viewed Record','2017-01-04 10:00:00.000'
2,1,'Saved Form','2017-01-04 10:06:00.000'
2,1,'Exported Record','2017-01-04 10:10:00.000'

2,2,'Viewed Record','2017-01-04 09:52:00.000'
2,2,'Saved Form','2017-01-04 09:52:48.000'
2,2,'Exported Record','2017-01-04 09:53:00.000'

Need help with a SQL query that combines adjacent rows into a single row

I don't know whether Russ' answer helps you at all. Here is a link to an article that explains how to add row numbers to the results of a query. (Search for "row_number" to find the most likely example.)

Once you have a query numbering the rows properly, you should be able to throw that into a CTE, then select from it twice -- once for odd numbers, then again for even numbers. Have each result return the even numbered value for joining (odd - 1 = even). At that point, you can join the results of the queries and get two products on one row.

Combining adjacent rows based on comparision of values from different colums

This is a form of gaps-and-islands. You can solve it by finding where the islands start -- say by using lag() and then using a cumulative sum (to identify the groups) and aggregation:

select category, grp, min(date_start), max(date_end),
sum(count)
from (select t.*,
sum(case when prev_date_end >= date_start then 0 else 1 end) over (partition by category order by date_start) as grp
from (select t.*,
lag(date_end) over (partition by category order by date_start) as prev_date_end
from t
) t
) t
group by category, grp;

How to merge consecutive rows of date and time using MySQL

With LAG() window function:

select Data, Date, Time
from (
select *,
concat(Date, ' ', Time) - interval 30 minute <=
lag(concat(Date, ' ', Time)) over (partition by Data order by Date, Time) flag
from tablename
) t
where coalesce(flag, 0) = 0
order by Data, Date, Time

See the demo.

Results:

> Data  | Date       | Time    
> :---- | :--------- | :-------
> Data1 | 2020-08-19 | 13:00:00
> Data1 | 2020-08-21 | 07:00:00
> Data2 | 2020-08-20 | 08:00:00


Related Topics



Leave a reply



Submit