How to Group by and Concatenate Fields in Redshift

How to GROUP BY and CONCATENATE fields in Redshift

I am a little late, but this feature was announced on 3rd Aug 2015.
Redshift has introduced the LISTAGG window function, which makes this possible.
Here is a quick solution to your problem. It may or may not be useful to you, but I am putting it here so that people will know about it.

SELECT COMPANY_ID,
       LISTAGG(EMPLOYEE, ', ')
           WITHIN GROUP (ORDER BY EMPLOYEE)
           OVER (PARTITION BY COMPANY_ID) AS EMPLOYEE
FROM YOUR_TABLE
ORDER BY COMPANY_ID;

I was happy to see this feature, and many of our production scripts are up for upgrade with all the new features Redshift keeps adding.

Here is the documentation about the function
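
The window form above returns one row per input row, so each company's list is repeated once per employee. A minimal sketch of collapsing that to one row per company, assuming a hypothetical temp table company_employees (the table name and sample values are made up for illustration):

```sql
-- Hypothetical sample data. LISTAGG runs on compute nodes only, so the
-- query has to read from a user-created table; a temp table is used here.
CREATE TEMP TABLE company_employees (
    company_id INT,
    employee   VARCHAR(50)
);

INSERT INTO company_employees VALUES
    (1, 'Bob'),
    (1, 'Anna'),
    (2, 'Carl');

-- Window form: three rows come back, and company 1's list appears twice.
SELECT company_id,
       LISTAGG(employee, ', ')
           WITHIN GROUP (ORDER BY employee)
           OVER (PARTITION BY company_id) AS employees
FROM company_employees
ORDER BY company_id;

-- Adding DISTINCT collapses the repeats to (1, 'Anna, Bob') and (2, 'Carl').
SELECT DISTINCT
       company_id,
       LISTAGG(employee, ', ')
           WITHIN GROUP (ORDER BY employee)
           OVER (PARTITION BY company_id) AS employees
FROM company_employees
ORDER BY company_id;
```

If you only need one row per company, the plain aggregate form with GROUP BY company_id gives the same result without the DISTINCT.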

SQL (Redshift) concat multiple rows under one ID

Use string aggregation function listagg():

select product_id, listagg(product_detail_value, ', ') product_details
from mytable
group by product_id

Or if you want to see the results as comma-separated 'name: value' pairs, then:

select
    product_id,
    listagg(product_detail_name || ': ' || product_detail_value, ', ') product_details
from mytable
group by product_id

listagg() also supports an order by clause (using the within group syntax), which is described in the documentation.
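
As a rough sketch of the within group syntax, assuming a hypothetical temp table product_details_demo with the same three columns as the query above: the pairs are sorted by detail name, and NVL stands in for missing values. Without it, a NULL value makes the whole 'name: value' string NULL, and listagg() silently skips NULLs.

```sql
-- Hypothetical sample data for illustration only.
CREATE TEMP TABLE product_details_demo (
    product_id           INT,
    product_detail_name  VARCHAR(50),
    product_detail_value VARCHAR(50)
);

INSERT INTO product_details_demo VALUES
    (1, 'size',  'L'),
    (1, 'color', 'red'),
    (1, 'brand', NULL);

-- Sort the pairs by name inside each group; replace NULL values with 'n/a'
-- so the 'brand' pair is not dropped from the list.
SELECT product_id,
       LISTAGG(product_detail_name || ': ' || NVL(product_detail_value, 'n/a'), ', ')
           WITHIN GROUP (ORDER BY product_detail_name) AS product_details
FROM product_details_demo
GROUP BY product_id;
-- product 1 -> 'brand: n/a, color: red, size: L'
```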

How to combine rows in Amazon Redshift

Redshift provides the LISTAGG() function for this:

SELECT id, name, LISTAGG(Color,' ') AS Colors
FROM yourtable
GROUP BY id, name

For each group in a query, the LISTAGG aggregate function orders the
rows for that group according to the ORDER BY expression, then
concatenates the values into a single string.
http://docs.aws.amazon.com/redshift/latest/dg/r_LISTAGG.html

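To control the order of the concatenated values, add WITHIN GROUP and sort on the column being aggregated (sorting on name would have no effect, because name is constant within each group):

SELECT id, name,
       LISTAGG(Color, ' ') WITHIN GROUP (ORDER BY Color) AS Colors
FROM yourtable
GROUP BY id, name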
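
If the same color can appear more than once per group, LISTAGG also accepts DISTINCT. A small sketch, assuming a hypothetical temp table colors_demo with made-up rows:

```sql
-- Hypothetical sample data: 'red' appears twice for the same id and name.
CREATE TEMP TABLE colors_demo (
    id    INT,
    name  VARCHAR(50),
    color VARCHAR(20)
);

INSERT INTO colors_demo VALUES
    (1, 'shirt', 'red'),
    (1, 'shirt', 'blue'),
    (1, 'shirt', 'red');

-- DISTINCT removes the duplicate before concatenation.
SELECT id, name,
       LISTAGG(DISTINCT color, ' ') WITHIN GROUP (ORDER BY color) AS colors
FROM colors_demo
GROUP BY id, name;
-- (1, 'shirt', 'blue red')
```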

Redshift - Merging two columns

You may use the || concatenation operator as @Mureinik has mentioned. But, we can also use the CONCAT function here:

SELECT
customer_name,
product_name,
CONCAT(CONCAT(customer_name, ' - '), product_name) AS output
FROM yourTable;

My guess as to why CONCAT wasn't working for you is that you were trying to pass more than 2 parameters to it. CONCAT in Redshift only takes two parameters, so we must chain them here to make it work.
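
For comparison, here is a sketch that puts the || operator and the chained CONCAT side by side, assuming a hypothetical temp table orders_demo. Both forms return NULL when either input is NULL, so the last column wraps each input in COALESCE as one possible way around that.

```sql
-- Hypothetical sample data; Bob has no product.
CREATE TEMP TABLE orders_demo (
    customer_name VARCHAR(50),
    product_name  VARCHAR(50)
);

INSERT INTO orders_demo VALUES
    ('Alice', 'Widget'),
    ('Bob',   NULL);

SELECT customer_name,
       product_name,
       -- Operator form: any number of pieces can be chained.
       customer_name || ' - ' || product_name                AS with_operator,
       -- Function form: two arguments at a time, so the calls are nested.
       CONCAT(CONCAT(customer_name, ' - '), product_name)    AS with_concat,
       -- NULL-safe variant: NULL inputs become empty strings.
       COALESCE(customer_name, '') || ' - ' || COALESCE(product_name, '') AS null_safe
FROM orders_demo;
-- Alice: 'Alice - Widget' in all three columns
-- Bob:   NULL, NULL, 'Bob - '
```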

How to Concatenate 2 Columns using SQL in DBeaver connected to Redshift

Redshift calls the function listagg():

SELECT account_no, season,
       LISTAGG(animal, ',') WITHIN GROUP (ORDER BY animal) AS animalList
FROM Animals
GROUP BY account_no, season;

This is not unreasonable, because this is the standard name for the function.

How can I group rows in SQL based on a condition

This is a type of gaps-and-islands problem. Because the dates are arbitrary, let me suggest the following approach:

  • Use a cumulative max to get the maximum end_date among the rows before the current one.
  • Compare it with the current start_date to determine when there is no overlap (i.e. a new period starts).
  • A cumulative sum of the starts provides an identifier for the group.
  • Then aggregate by that identifier.

As SQL:

select id, min(start_date), max(end_date)
from (select u.*,
             sum(case when prev_end_date >= start_date then 0 else 1 end)
                 over (partition by id
                       order by start_date, voucher_code
                       rows between unbounded preceding and current row
                      ) as grp
      from (select u.*,
                   max(end_date) over (partition by id
                                       order by start_date, voucher_code
                                       rows between unbounded preceding and 1 preceding
                                      ) as prev_end_date
            from users u
           ) u
     ) u
group by id, grp;
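
To see how the intermediate columns behave, here is a sketch that runs the two inner steps on a hypothetical temp table users_demo. The table name and the three sample rows are assumptions for illustration; the column names follow the query above.

```sql
-- Hypothetical sample data: vouchers A and B overlap, C starts a new period.
CREATE TEMP TABLE users_demo (
    id           INT,
    voucher_code VARCHAR(10),
    start_date   DATE,
    end_date     DATE
);

INSERT INTO users_demo VALUES
    (1, 'A', '2020-01-01', '2020-01-10'),
    (1, 'B', '2020-01-05', '2020-01-20'),
    (1, 'C', '2020-02-01', '2020-02-10');

-- Show prev_end_date and the running group id next to each row.
SELECT id, voucher_code, start_date, end_date, prev_end_date,
       SUM(CASE WHEN prev_end_date >= start_date THEN 0 ELSE 1 END)
           OVER (PARTITION BY id
                 ORDER BY start_date, voucher_code
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
FROM (SELECT u.*,
             MAX(end_date) OVER (PARTITION BY id
                                 ORDER BY start_date, voucher_code
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_end_date
      FROM users_demo u
     ) u
ORDER BY id, start_date, voucher_code;
-- A: prev_end_date NULL,       grp 1  (NULL comparison falls to ELSE, so a period starts)
-- B: prev_end_date 2020-01-10, grp 1  (overlaps A)
-- C: prev_end_date 2020-01-20, grp 2  (no overlap, new period)
```

Grouping these rows by id and grp then yields two periods for id 1: 2020-01-01 to 2020-01-20 and 2020-02-01 to 2020-02-10.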

Redshift SQL to comma separate a field with group of two fields

Redshift did not always support listagg() as a window function (it does now). Either way, you can join in the result after a separate aggregation:

SELECT t.cust_id, c.orders
FROM your_table t JOIN
     (SELECT cust_id,
             LISTAGG("ORDER"::text, ', ')
                 WITHIN GROUP (ORDER BY "ORDER") AS orders
      FROM your_table
      GROUP BY cust_id
     ) c
     ON t.cust_id = c.cust_id
ORDER BY t.cust_id;

Of course, I see no reason to replicate the data on each row. An aggregation query is probably sufficient, so there is only one row per customer in the result set.
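
The aggregation-only version suggested above might look like the following sketch. The temp table customer_orders_demo and its rows are hypothetical; the column names mirror the query above.

```sql
-- Hypothetical sample data. "order" is a reserved word, so it stays quoted.
CREATE TEMP TABLE customer_orders_demo (
    cust_id INT,
    "order" INT
);

INSERT INTO customer_orders_demo VALUES
    (1, 10),
    (1, 20),
    (2, 30);

-- One row per customer, with that customer's orders as a comma-separated list.
SELECT cust_id,
       LISTAGG("order"::text, ', ')
           WITHIN GROUP (ORDER BY "order") AS orders
FROM customer_orders_demo
GROUP BY cust_id
ORDER BY cust_id;
-- (1, '10, 20') and (2, '30')
```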


