Concatenate multiple result rows of one column into one, group by another column
Simpler with the aggregate function string_agg()
(Postgres 9.0 or later):
SELECT movie, string_agg(actor, ', ') AS actor_list
FROM tbl
GROUP BY 1;
The 1
in GROUP BY 1
is a positional reference and a shortcut for GROUP BY movie
in this case.
string_agg()
expects data type text
as input. Other types need to be cast explicitly (actor::text
) - unless an implicit cast to text
is defined - which is the case for all other string types (varchar
, character
, name
, ...) and some other types.
As isapir commented, you can add an ORDER BY
clause in the aggregate call to get a sorted list - should you need that. Like:
SELECT movie, string_agg(actor, ', ' ORDER BY actor) AS actor_list
FROM tbl
GROUP BY 1;
But it's typically faster to sort rows in a subquery. See:
- Create array in SELECT
Can I concatenate multiple MySQL rows into one field?
You can use GROUP_CONCAT
:
SELECT person_id,
GROUP_CONCAT(hobbies SEPARATOR ', ')
FROM peoples_hobbies
GROUP BY person_id;
As Ludwig stated in his comment, you can add the DISTINCT
operator to avoid duplicates:
SELECT person_id,
GROUP_CONCAT(DISTINCT hobbies SEPARATOR ', ')
FROM peoples_hobbies
GROUP BY person_id;
As Jan stated in their comment, you can also sort the values before imploding it using ORDER BY
:
SELECT person_id,
GROUP_CONCAT(hobbies ORDER BY hobbies ASC SEPARATOR ', ')
FROM peoples_hobbies
GROUP BY person_id;
As Dag stated in his comment, there is a 1024 byte limit on the result. To solve this, run this query before your query:
SET group_concat_max_len = 2048;
Of course, you can change 2048
according to your needs. To calculate and assign the value:
SET group_concat_max_len = CAST(
(SELECT SUM(LENGTH(hobbies)) + COUNT(*) * LENGTH(', ')
FROM peoples_hobbies
GROUP BY person_id) AS UNSIGNED);
How to concatenate text from multiple rows into a single text string in SQL Server
If you are on SQL Server 2017 or Azure, see Mathieu Renda answer.
I had a similar issue when I was trying to join two tables with one-to-many relationships. In SQL 2005 I found that XML PATH
method can handle the concatenation of the rows very easily.
If there is a table called STUDENTS
SubjectID StudentName
---------- -------------
1 Mary
1 John
1 Sam
2 Alaina
2 Edward
Result I expected was:
SubjectID StudentName
---------- -------------
1 Mary, John, Sam
2 Alaina, Edward
I used the following T-SQL
:
SELECT Main.SubjectID,
LEFT(Main.Students,Len(Main.Students)-1) As "Students"
FROM
(
SELECT DISTINCT ST2.SubjectID,
(
SELECT ST1.StudentName + ',' AS [text()]
FROM dbo.Students ST1
WHERE ST1.SubjectID = ST2.SubjectID
ORDER BY ST1.SubjectID
FOR XML PATH (''), TYPE
).value('text()[1]','nvarchar(max)') [Students]
FROM dbo.Students ST2
) [Main]
You can do the same thing in a more compact way if you can concat the commas at the beginning and use substring
to skip the first one so you don't need to do a sub-query:
SELECT DISTINCT ST2.SubjectID,
SUBSTRING(
(
SELECT ','+ST1.StudentName AS [text()]
FROM dbo.Students ST1
WHERE ST1.SubjectID = ST2.SubjectID
ORDER BY ST1.SubjectID
FOR XML PATH (''), TYPE
).value('text()[1]','nvarchar(max)'), 2, 1000) [Students]
FROM dbo.Students ST2
SQL Group Multiple Results from Single Table
Building on the answer from the previous link you can create a cte then execute the query
This will given you the
SELECT Main.UniqueID,
LEFT(Main.ItemTypes,Len(Main.ItemTypes)-1) As "ItemTypes"
FROM
(
SELECT DISTINCT ST2.UniqueID,
(
SELECT ST1.ItemType + ',' AS [text()]
FROM dbo.TheTable ST1
WHERE ST1.UniqueID = ST2.UniqueID
ORDER BY ST1.UniqueID
FOR XML PATH (''), TYPE
).value('text()[1]','nvarchar(max)') ItemTypes
FROM dbo.TheTable ST2
) [Main]
Once you have that you can build this into a cte with the with statement then join back on the table to get the rest of the data.
with ItemTypes as
(
SELECT Main.UniqueID,
LEFT(Main.ItemTypes,Len(Main.ItemTypes)-1) As "ItemTypes"
FROM
(
SELECT DISTINCT ST2.UniqueID,
(
SELECT ST1.ItemType + ',' AS [text()]
FROM dbo.TheTable ST1
WHERE ST1.UniqueID = ST2.UniqueID
ORDER BY ST1.UniqueID
FOR XML PATH (''), TYPE
).value('text()[1]','nvarchar(max)') ItemTypes
FROM dbo.TheTable ST2
) [Main]
)
Select Distinct TheTable.UniqueID, ItemTypes.ItemTypes, TheTable.OtherData
from TheTable join ItemTypes
on (TheTable.UniqueID = ItemTypes.UniqueID)
Results
UniqueID ItemTypes OtherData
--------- -------------------------- --------------------------------
1234 apples,oranges,grapes OtherData
2233 red fish,blue fish OtherData
There are a few expensive operations this will be an expensive query to run. but with 2million rows should be ok with a good server.
Group multiple columns by a single column in pandas and concatenate the rows of each column being grouped
Use nested list comprehension in GroupBy.agg
with filtered columns names in list
:
files_list=["Col1", "Col2", "Col3"]
f = lambda x: [z for y in x for z in y]
df_1 = df_1.groupby('ID', sort=False, as_index=False)[files_list].agg(f)
If performance is not important or small DataFrame is possible use sum
for join lists:
files_list=["Col1", "Col2", "Col3"]
df_1 = df_1.groupby('ID', sort=False, as_index=False)[files_list].agg(sum)
print (df_1)
ID Col1 Col2 Col3
0 S [A, A1] [R, R1] [y, ii1]
1 T [B] [S] []
2 L [B2, C2, D1] [R2, Q2, T1] [m2, i2, p1]
How to merge multiple rows removing duplicates and concatenate other column values
You need to aggregate using collect_set
- it removes duplicates. array_join
adds delimiter between elements. concat
concatenates the result with either ;
or empty string at the end.
from pyspark.sql import functions as F
df = spark.createDataFrame(
[(123, '001', 'ABC', 'DEF'),
(123, '001', 'ABC', 'DEG'),
(256, '002', 'XXY', 'DSA'),
(256, '002', 'XXX', 'DSA')],
['A', 'B', 'C', 'D']
)
df_agg = df.groupBy('A', 'B').agg(
*[F.concat(F.array_join(F.collect_set(c), ';'),
F.when(F.size(F.collect_set(c)) > 1, ';').otherwise('')).alias(c)
for c in {'C', 'D'}]
)
df_agg.show()
# +---+---+--------+--------+
# | A| B| C| D|
# +---+---+--------+--------+
# |123|001| ABC|DEF;DEG;|
# |256|002|XXX;XXY;| DSA|
# +---+---+--------+--------+
SQL Query to concatenate column values from multiple rows in Oracle
There are a few ways depending on what version you have - see the oracle documentation on string aggregation techniques. A very common one is to use LISTAGG
:
SELECT pid, LISTAGG(Desc, ' ') WITHIN GROUP (ORDER BY seq) AS description
FROM B GROUP BY pid;
Then join to A
to pick out the pids
you want.
Note: Out of the box, LISTAGG
only works correctly with VARCHAR2
columns.
Related Topics
SQL - How to Store and Navigate Hierarchies
Doing a Where .. in Subquery in Doctrine 2
Get Join Table as Array of Results With Postgresql/Nodejs
SQL Server - Transactions Roll Back on Error
How to Restore a Dump File from MySQLdump
SQL Query to Get Aggregated Result in Comma Separators Along With Group by Column in SQL Server
Is There Something Wrong With Joins That Don't Use the Join Keyword in SQL or MySQL
How to Make a Recursive SQL Query
Why "Extra Characters After Command" Error Shown for the Sed Command Line Shown
Unknown Column in 'Field List' Error on MySQL Update Query
Postgresql Group by Different from MySQL
Key Value Pairs in Relational Database
Real Life Example, When to Use Outer/Cross Apply in Sql
Group by and Aggregate Sequential Numeric Values