Concatenate Multiple Result Rows of One Column into One, Group by Another Column

Concatenate multiple result rows of one column into one, group by another column

Simpler with the aggregate function string_agg() (Postgres 9.0 or later):

SELECT movie, string_agg(actor, ', ') AS actor_list
FROM   tbl
GROUP  BY 1;

The 1 in GROUP BY 1 is a positional reference and a shortcut for GROUP BY movie in this case.

string_agg() expects data type text as input. Other types need to be cast explicitly (actor::text) - unless an implicit cast to text is defined - which is the case for all other string types (varchar, character, name, ...) and some other types.

As isapir commented, you can add an ORDER BY clause in the aggregate call to get a sorted list - should you need that. Like:

SELECT movie, string_agg(actor, ', ' ORDER BY actor) AS actor_list
FROM   tbl
GROUP  BY 1;

But it's typically faster to sort rows in a subquery. See:

Create array in SELECT

Can I concatenate multiple MySQL rows into one field?

You can use GROUP_CONCAT:

SELECT person_id,
   GROUP_CONCAT(hobbies SEPARATOR ', ')
FROM peoples_hobbies
GROUP BY person_id;

As Ludwig stated in his comment, you can add the DISTINCT operator to avoid duplicates:

SELECT person_id,
   GROUP_CONCAT(DISTINCT hobbies SEPARATOR ', ')
FROM peoples_hobbies
GROUP BY person_id;

As Jan stated in their comment, you can also sort the values before imploding it using ORDER BY:

SELECT person_id, 
       GROUP_CONCAT(hobbies ORDER BY hobbies ASC SEPARATOR ', ')
FROM peoples_hobbies
GROUP BY person_id;

As Dag stated in his comment, there is a 1024 byte limit on the result. To solve this, run this query before your query:

SET group_concat_max_len = 2048;

Of course, you can change 2048 according to your needs. To calculate and assign the value:

SET group_concat_max_len = CAST(
                     (SELECT SUM(LENGTH(hobbies)) + COUNT(*) * LENGTH(', ')
                           FROM peoples_hobbies
                           GROUP BY person_id) AS UNSIGNED);

How to concatenate text from multiple rows into a single text string in SQL Server

If you are on SQL Server 2017 or Azure, see Mathieu Renda answer.

I had a similar issue when I was trying to join two tables with one-to-many relationships. In SQL 2005 I found that XML PATH method can handle the concatenation of the rows very easily.

If there is a table called STUDENTS

SubjectID       StudentName
----------      -------------
1               Mary
1               John
1               Sam
2               Alaina
2               Edward

Result I expected was:

SubjectID       StudentName
----------      -------------
1               Mary, John, Sam
2               Alaina, Edward

I used the following T-SQL:

SELECT Main.SubjectID,
       LEFT(Main.Students,Len(Main.Students)-1) As "Students"
FROM
    (
        SELECT DISTINCT ST2.SubjectID, 
            (
                SELECT ST1.StudentName + ',' AS [text()]
                FROM dbo.Students ST1
                WHERE ST1.SubjectID = ST2.SubjectID
                ORDER BY ST1.SubjectID
                FOR XML PATH (''), TYPE
            ).value('text()[1]','nvarchar(max)') [Students]
        FROM dbo.Students ST2
    ) [Main]

You can do the same thing in a more compact way if you can concat the commas at the beginning and use substring to skip the first one so you don't need to do a sub-query:

SELECT DISTINCT ST2.SubjectID, 
    SUBSTRING(
        (
            SELECT ','+ST1.StudentName  AS [text()]
            FROM dbo.Students ST1
            WHERE ST1.SubjectID = ST2.SubjectID
            ORDER BY ST1.SubjectID
            FOR XML PATH (''), TYPE
        ).value('text()[1]','nvarchar(max)'), 2, 1000) [Students]
FROM dbo.Students ST2

SQL Group Multiple Results from Single Table

Building on the answer from the previous link you can create a cte then execute the query

This will given you the

SELECT Main.UniqueID,
       LEFT(Main.ItemTypes,Len(Main.ItemTypes)-1) As "ItemTypes"
FROM
    (
        SELECT DISTINCT ST2.UniqueID, 
            (
                SELECT ST1.ItemType + ',' AS [text()]
                FROM dbo.TheTable ST1
                WHERE ST1.UniqueID = ST2.UniqueID
                ORDER BY ST1.UniqueID
                FOR XML PATH (''), TYPE
            ).value('text()[1]','nvarchar(max)') ItemTypes
        FROM dbo.TheTable ST2
    ) [Main]

Once you have that you can build this into a cte with the with statement then join back on the table to get the rest of the data.

with ItemTypes as
(
SELECT Main.UniqueID,
       LEFT(Main.ItemTypes,Len(Main.ItemTypes)-1) As "ItemTypes"
FROM
    (
        SELECT DISTINCT ST2.UniqueID, 
            (
                SELECT ST1.ItemType + ',' AS [text()]
                FROM dbo.TheTable ST1
                WHERE ST1.UniqueID = ST2.UniqueID
                ORDER BY ST1.UniqueID
                FOR XML PATH (''), TYPE
            ).value('text()[1]','nvarchar(max)') ItemTypes
        FROM dbo.TheTable ST2
    ) [Main]
) 

Select Distinct TheTable.UniqueID, ItemTypes.ItemTypes, TheTable.OtherData
from TheTable join ItemTypes 
    on (TheTable.UniqueID = ItemTypes.UniqueID)

Results

UniqueID  ItemTypes                  OtherData
--------- -------------------------- --------------------------------
1234      apples,oranges,grapes      OtherData
2233      red fish,blue fish         OtherData

There are a few expensive operations this will be an expensive query to run. but with 2million rows should be ok with a good server.

Group multiple columns by a single column in pandas and concatenate the rows of each column being grouped

Use nested list comprehension in GroupBy.agg with filtered columns names in list:

files_list=["Col1", "Col2", "Col3"]
f = lambda x: [z for y in x for z in y]
df_1 = df_1.groupby('ID', sort=False, as_index=False)[files_list].agg(f)

If performance is not important or small DataFrame is possible use sum for join lists:

files_list=["Col1", "Col2", "Col3"]
df_1 = df_1.groupby('ID', sort=False, as_index=False)[files_list].agg(sum)
print (df_1)
  ID          Col1          Col2          Col3
0  S       [A, A1]       [R, R1]      [y, ii1]
1  T           [B]           [S]            []
2  L  [B2, C2, D1]  [R2, Q2, T1]  [m2, i2, p1]

How to merge multiple rows removing duplicates and concatenate other column values

You need to aggregate using collect_set - it removes duplicates. array_join adds delimiter between elements. concat concatenates the result with either ; or empty string at the end.

from pyspark.sql import functions as F
df = spark.createDataFrame(
    [(123, '001', 'ABC', 'DEF'),
     (123, '001', 'ABC', 'DEG'),
     (256, '002', 'XXY', 'DSA'),
     (256, '002', 'XXX', 'DSA')],
    ['A', 'B', 'C', 'D']
)

df_agg = df.groupBy('A', 'B').agg(
    *[F.concat(F.array_join(F.collect_set(c), ';'),
             F.when(F.size(F.collect_set(c)) > 1, ';').otherwise('')).alias(c)
     for c in {'C', 'D'}]
)
df_agg.show()
# +---+---+--------+--------+
# |  A|  B|       C|       D|
# +---+---+--------+--------+
# |123|001|     ABC|DEF;DEG;|
# |256|002|XXX;XXY;|     DSA|
# +---+---+--------+--------+

SQL Query to concatenate column values from multiple rows in Oracle

There are a few ways depending on what version you have - see the oracle documentation on string aggregation techniques. A very common one is to use LISTAGG:

SELECT pid, LISTAGG(Desc, ' ') WITHIN GROUP (ORDER BY seq) AS description
FROM B GROUP BY pid;

Then join to A to pick out the pids you want.

Note: Out of the box, LISTAGG only works correctly with VARCHAR2 columns.

Concatenate Multiple Result Rows of One Column into One, Group by Another Column