How to Collapse The Rows of a Select into a String

What is the best way to collapse the rows of a SELECT into a string?

You can concatenate using an embedded 'set' statement in a query:

declare @combined varchar(2000)
select @combined = isnull(@combined + ', ','') + isnull(value,'')
from simpleTable

print @combined

(Note that the first isnull() initialises the string, and the second isnull() is especially important if there's any chance of nulls in the 'value' column, because otherwise a single null could wipe out the whole concatenation)

(edited code and explanation after comments)

Edit (ten years later):

SQL Server 2017 introduced the STRING_AGG() function which provides an official way of concatenating strings from different rows. Like other aggregation functions such as COUNT(), it can be used with GROUP BY.

So for the example above you could do:

select string_agg(value, ', ')
from simpleTable

If you had some other column and you wanted to concatenate for values of that column, you would add a 'group by' clause, e.g:

select someCategory, string_agg(value, ', ') as concatValues
from simpleTable
group by someCategory

Note string_agg will only work with SQL 2017 and above.

Collapsing multiple rows containing substrings into a single row

demo:db<>fiddle

For more general cases (e.g. "the difference could be at the 10th letter" or "there is a row with just one char") you need to identify the right groups. So it would be necessary to check rows against the next one: "Is the current row a beginning of the next one?"

With some things like substring you can check a special length at the beginning ("Group all texts beginning with the same 3 letters" But what if you do not have 3 letters? Or the difference is somewhere later?)

That's why I calculated special groups with the help of the lag window function (https://www.postgresql.org/docs/current/static/tutorial-window.html):

SELECT 
max(id) as id, -- C
max(phrase) as phrase,
sum("count") as count
FROM (
SELECT
*,
SUM(is_diff) OVER (ORDER BY id, phrase) as ranked -- B
FROM (
SELECT
*,
-- A:
CASE WHEN phrase LIKE (lag(phrase) over (order by id, phrase)) || '%' THEN 0 ELSE 1 END as is_diff
FROM phrases
)s
) s
GROUP BY ranked
ORDER BY ranked

The main idea is discussed here.

A: The lag function allows to check the value of the next row. So if the phrase of the current row the beginning of the phrase of the next row then they are in the same group. (current_row LIKE (next_row || '%')). This works because the id groups are ordered by phrase texts (and their lengths).

If the rows are not compatible a helper variable is set to 1, otherwise to 0.

B: The helper variables can be added and the groups are generated. (For more details see the link provided above).

C: The rest is simple grouping by the new generated group values.

How to collapse rows into a comma-delimited list in an SQL Query in MySql

Is called group_concat

select group_concat(your_id) from your_table

How to concatenate text from multiple rows into a single text string in SQL Server

If you are on SQL Server 2017 or Azure, see Mathieu Renda answer.

I had a similar issue when I was trying to join two tables with one-to-many relationships. In SQL 2005 I found that XML PATH method can handle the concatenation of the rows very easily.

If there is a table called STUDENTS

SubjectID       StudentName
---------- -------------
1 Mary
1 John
1 Sam
2 Alaina
2 Edward

Result I expected was:

SubjectID       StudentName
---------- -------------
1 Mary, John, Sam
2 Alaina, Edward

I used the following T-SQL:

SELECT Main.SubjectID,
LEFT(Main.Students,Len(Main.Students)-1) As "Students"
FROM
(
SELECT DISTINCT ST2.SubjectID,
(
SELECT ST1.StudentName + ',' AS [text()]
FROM dbo.Students ST1
WHERE ST1.SubjectID = ST2.SubjectID
ORDER BY ST1.SubjectID
FOR XML PATH (''), TYPE
).value('text()[1]','nvarchar(max)') [Students]
FROM dbo.Students ST2
) [Main]

You can do the same thing in a more compact way if you can concat the commas at the beginning and use substring to skip the first one so you don't need to do a sub-query:

SELECT DISTINCT ST2.SubjectID, 
SUBSTRING(
(
SELECT ','+ST1.StudentName AS [text()]
FROM dbo.Students ST1
WHERE ST1.SubjectID = ST2.SubjectID
ORDER BY ST1.SubjectID
FOR XML PATH (''), TYPE
).value('text()[1]','nvarchar(max)'), 2, 1000) [Students]
FROM dbo.Students ST2

collapse rows using sql stuff function

There's no "COLLAPSE" in SQL, the language. What you posted shows grouping and aggregating. For strings, the only meaningful aggregations are MIN, MAX and string concatenation.

SQL Server 2017 provides string concatenation through the STRING_AGG function. Other database products use different names, like GROUP_CONCAT. In earlier SQL Server versions various techniques are used to do the same. What you posted is the XML technique.

The query doesn't have a GROUP BY clause though, which is why multiple rows are returned.

In SQL Server 2017, the query would look like this :

select T1.Id, 
T1.Name,
MIN(T2.Id) as T2_ID,
MIN(T2.T1_Id) as T1_ID,
MIN(T2.Name) as T2_Name,
STRING_AGG(T3.Name,', ') as Test5
from @Test1 T1
inner join @Test2 T2 on T1.Id=T2.T1_Id
inner join @Test3 T3 on T3.Id=T2.T3_Id
GROUP BY T1.ID,T1.Name

Doing the same in earlier versions is trickier. The FOR XML query can only correlate with columns that appear in the GROUP BY clause :

select T1.Id, 
T1.Name,
MIN(T2.Id),
MIN(T2.T1_Id),
MIN(T2.Name),
stuff(
(
SELECT ','+T3.Name
FROM @Test3 T3
inner join @Test2 TT2 on T3.Id=TT2.T3_Id
WHERE TT2.T1_ID=T1.Id
FOR XML PATH('')
),1,1,'') as Test5
from @Test1 T1
inner join @Test2 T2 on T1.Id=T2.T1_Id
GROUP BY T1.ID,T1.Name

Think of the entire STUFF(... FOR XML).. part as a single function that takes as argument one of the grouping columns in the WHERE clause, searches some tables and concatenates the string results.

This means that the correlation clause

WHERE TT2.T1_ID=T1.Id 

can only refer to the grouping columns in the outer query. We need a join with T2 in there because we can't get to T1.ID from T3 directly

If you don't care about the T2 columns, you can get rid of the JOIN in the outer query :

select T1.Id, 
T1.Name,
stuff(
(
select ','+T3.Name
from @Test3 T3
inner join @Test2 TT2 on T3.Id=TT2.T3_Id
where TT2.T1_ID=T1.Id for xml path('')
),1,1,'') as Test5
from @Test1 T1
GROUP BY T1.ID,T1.Name

How to collapse similar rows in SQL query

Use recursive Common Table Expressions (working SQLFiddle example)

;with RecursivePersonTask( Id, Department, Name, Tasks )
as
(
select
a.Id
, a.Department
, a.Name
, a.Task
from
dbo.Task a
left outer join dbo.Task b
on a.Department = b.Department
and a.Name = b.Name
and a.Id = b.Id + 1
where
b.Id is null

union all

select
t.Id
, t.Department
, t.Name
, rpt.Tasks + ', ' + t.Task
from
RecursivePersonTask rpt
inner join dbo.Task t
on rpt.Department = t.Department
and rpt.Name = t.Name
and rpt.Id = t.Id - 1
)
, CombinedPersonTasks( Id, Department, Name, Tasks )
as
(
select
ROW_NUMBER() over ( order by a.Id )
, a.Department
, a.Name
, '(' + a.Tasks + ')'
from
RecursivePersonTask a
left outer join RecursivePersonTask b
on a.Department = b.Department
and a.Name = b.Name
and a.Id = b.Id - 1
where
b.Id is null
)
, RecursiveDepartmentTasks( Id, Department, Tasks )
as
(
select
a.Id
, a.Department
, a.Name + ' ' + a.Tasks
from
CombinedPersonTasks a
left outer join CombinedPersonTasks b
on a.Department = b.Department
and a.Id = b.Id + 1
where
b.Id is null

union all

select
cpt.Id
, cpt.Department
, rdt.Tasks + ' ' + cpt.Name + ' ' + cpt.Tasks
from
RecursiveDepartmentTasks rdt
inner join CombinedPersonTasks cpt
on rdt.Department = cpt.Department
and rdt.Id = cpt.Id - 1
)
, CombinedDepartmentTasks( Id, Department, Tasks )
as
(
select
ROW_NUMBER() over ( order by a.Id )
, a.Department
, a.Tasks
from
RecursiveDepartmentTasks a
left outer join RecursiveDepartmentTasks b
on a.Department = b.Department
and a.Id = b.Id - 1
where
b.Id is null
)

select
*
from
CombinedDepartmentTasks
order by
Id

Collapse character vector into single string with each string on its own row

There is no way to create an object that stores data in the way that you want except when you parse the \n characters. Whitespace in strings are meaningful, i.e. spaces will be preserved when a string is printed. However, how those whitespaces appear in your console will vary for a number of reasons. As it is now, your string already has the characters to tell other functions that will process it to print each substring on its own line.

text <- c("This should be first row",
"This should be second row",
"This should be third row")

# This is what you did
# Notice that you can't assign to an object and it prints to console!
test <- cat(paste(text, collapse = "\n"))
#> This should be first row
#> This should be second row
#> This should be third row

print(test)
#> NULL

# This is what you want: sending the object itself
test2 <- paste(text, collapse = "\n")

print(test2)
#> [1] "This should be first row\nThis should be second row\nThis should be third row"

Collapse multiple rows into a single row based upon a break condition

Note that your test data does not really exercise the algo that well - e.g. you only have one employee, one plan. Also, as you described it, you would end up with 4 rows as there is a change of todate between 7->8, 8->9, 9->10 and 10->11.

But I can see what you are trying to do, so this should at least get you on the right track, and returns the expected 3 rows. I have taken the end of a group to be where either employee/plan/amount has changed, or where todate is not null (or where we reach the end of the data)

CREATE TABLE #data
(
RowID INT,
EmployeeID INT,
AllowancePlan VARCHAR(30),
FromDate DATE,
ToDate DATE,
AllowanceAmount DECIMAL(12,2)
);

INSERT INTO #data(RowID, EmployeeID, AllowancePlan, FromDate, ToDate, AllowanceAmount)
VALUES
(1,200690,'CarAllowance','30/03/2017', NULL, 1000.0),
(2,200690,'CarAllowance','01/08/2017', NULL, 1000.0),
(6,200690,'CarAllowance','23/04/2018', NULL, 1000.0),
(7,200690,'CarAllowance','30/03/2018', NULL, 1000.0),
(8,200690,'CarAllowance','21/06/2018', '01/04/2019', 1000.0),
(9,200690,'CarAllowance','04/11/2021', NULL, 1000.0),
(10,200690,'CarAllowance','30/03/2017', '13/05/2022', 1000.0),
(11,200690,'CarAllowance','14/05/2022', NULL, 850.0);

-- find where the break points are
WITH chg AS
(
SELECT *,
CASE WHEN LAG(EmployeeID, 1, -1) OVER(ORDER BY RowID) != EmployeeID
OR LAG(AllowancePlan, 1, 'X') OVER(ORDER BY RowID) != AllowancePlan
OR LAG(AllowanceAmount, 1, -1) OVER(ORDER BY RowID) != AllowanceAmount
OR LAG(ToDate, 1) OVER(ORDER BY RowID) IS NOT NULL
THEN 1 ELSE 0 END AS NewGroup
FROM #data
),
-- count the number of break points as we go to group the related rows
grp AS
(
SELECT chg.*,
ISNULL(
SUM(NewGroup)
OVER (ORDER BY RowID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
0) AS grpNum
FROM chg
)
SELECT MIN(grp.RowID) AS RowID,
MAX(grp.EmployeeID) AS EmployeeID,
MAX(grp.AllowancePlan) AS AllowancePlan,
MIN(grp.FromDate) AS FromDate,
MAX(grp.ToDate) AS ToDate,
MAX(grp.AllowanceAmount) AS AllowanceAmount
FROM grp
GROUP BY grpNum

How do I collapse multiple rows that shares one identical value into a single row PRESTO SQL

Depending on required logic you can use group by id with corresponding aggregate function, for example max (or sum), which will ignore null values.

-- sample data
WITH dataset (id, cnt1, cnt2) AS (
VALUES (7775, 1, null),
(7775, null, 2)
)

-- query
select id,
max(cnt1) cnt1,
max(cnt2) cnt2
from dataset
group by id

Output:



Leave a reply



Submit