PostgreSQL 9.1: How to concatenate rows in array without duplicates, JOIN another table
Instead of using window functions and partitioning, use a query-level GROUP BY and aggregate with a DISTINCT clause:
SELECT
rnp.grp_id,
array_to_string(array_agg(distinct rnp.cabinets),',') AS cabinets,
array_to_string(array_agg(distinct ips.address),',') AS addresses
FROM rnp JOIN ips ON rnp.grp_id=ips.grp_id GROUP BY rnp.grp_id, ips.grp_id;
Result:
grp_id | cabinets | addresses
--------+-------------------------+-----------
11 | cabs1,cabs2,cabs3,cabs4 | CA,NY
22 | c1,c2 | DC,LA
(2 rows)
This would also work with the window-function approach, except that PostgreSQL (9.1 at least) doesn't support DISTINCT in window functions:
regress=# SELECT DISTINCT
rnp.grp_id,
array_to_string(array_agg(distinct rnp.cabinets)OVER (PARTITION BY rnp.grp_id), ',') AS cabinets,
array_to_string(array_agg(distinct ips.address) OVER (PARTITION BY ips.grp_id), ',') AS addresses
FROM rnp JOIN ips ON rnp.grp_id=ips.grp_id;
ERROR: DISTINCT is not implemented for window functions
LINE 3: array_to_string(array_agg(distinct rnp.cabinets)OVER (PART...
Concatenate multiple result rows of one column into one, group by another column
Simpler with the aggregate function string_agg() (Postgres 9.0 or later):
SELECT movie, string_agg(actor, ', ') AS actor_list
FROM tbl
GROUP BY 1;
The 1 in GROUP BY 1 is a positional reference, a shortcut for GROUP BY movie in this case.
string_agg() expects data type text as input. Other types need to be cast explicitly (actor::text) - unless an implicit cast to text is defined - which is the case for all other string types (varchar, character, name, ...) and some other types.
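For example, for a hypothetical integer column year (not in the original table), the cast might look like:

```sql
-- 'year' is a hypothetical integer column; integers have no implicit
-- cast to text, so cast explicitly before aggregating
SELECT movie, string_agg(year::text, ', ') AS year_list
FROM tbl
GROUP BY 1;
```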
As isapir commented, you can add an ORDER BY clause in the aggregate call to get a sorted list - should you need that. Like:
SELECT movie, string_agg(actor, ', ' ORDER BY actor) AS actor_list
FROM tbl
GROUP BY 1;
But it's typically faster to sort rows in a subquery.
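A sketch of that subquery variant, using the same tbl as above (Postgres does not formally guarantee that a subquery's sort order survives aggregation, but it does in simple cases like this):

```sql
-- pre-sort the rows, then aggregate in the outer query
SELECT movie, string_agg(actor, ', ') AS actor_list
FROM (
    SELECT movie, actor
    FROM tbl
    ORDER BY movie, actor
) sub
GROUP BY 1;
```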
Combining two rows from a table into one
Try this:
select max(bad_id),
split_part(string_agg(a,'__SPLITER__' order by bad_id DESC),'__SPLITER__',1)
,split_part(string_agg(b,'__SPLITER__' order by bad_id DESC),'__SPLITER__',1)
from foo group by real_id
If a and b are timestamps:
select max(bad_id),
split_part(string_agg(a::character varying,'__SPLITER__' order by bad_id DESC),'__SPLITER__',1)::timestamp,
split_part(string_agg(b::character varying,'__SPLITER__' order by bad_id DESC),'__SPLITER__',1)::timestamp
from foo group by real_id
Same for integer: split_part(string_agg(a::character varying ...,1)::integer
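Spelled out in full for a hypothetical integer column a, using the same foo table:

```sql
-- cast to varchar for string_agg(), then cast the extracted piece back
SELECT max(bad_id),
       split_part(string_agg(a::character varying, '__SPLITER__' ORDER BY bad_id DESC),
                  '__SPLITER__', 1)::integer
FROM foo
GROUP BY real_id;
```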
How to remove duplicated rows in Postgresql?
Try this:
delete from your_table where id in
(select max(id) from your_table
group by org_id, phone_number
having count(1) > 1);
Note that this deletes only the row with the highest id in each duplicate group, so if a group can contain more than two copies you'd have to run it repeatedly.
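An alternative sketch that removes every duplicate in one pass (assuming id is unique) keeps only the lowest id per group:

```sql
-- keep the row with the smallest id in each (org_id, phone_number) group
DELETE FROM your_table
WHERE id NOT IN (
    SELECT min(id)
    FROM your_table
    GROUP BY org_id, phone_number
);
```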
Concatenate rows into a text string
Use the string_agg() function:
SELECT row1,
row2,
string_agg(row3, ',') as row3
FROM your_table
GROUP BY row1,
row2
Union of two arrays in PostgreSQL without unnesting
If your problem is that you unnest twice, this will unnest only once:
select array_agg(a order by a)
from (
select distinct unnest(array[1,2,3] || array[2,3,4,5]) as a
) s;
Get array of records based on two keys in same table
This should achieve your output. The trick lies in the conditional GROUP BY clause, which handles cases where secondary_id and tertiary_id are the same for a record that has a matching record on both of those fields.
select array_agg(distinct t1)
from table1 t1
join table1 t2 on
t1.secondary_id = t2.secondary_id
or t1.tertiary_id = t2.tertiary_id
group by
case
when t1.secondary_id is null or t1.tertiary_id is null
then concat(t1.secondary_id,'#',t1.tertiary_id) -- #1
when t1.secondary_id is not null and t1.tertiary_id is not null and t1.secondary_id = t2.secondary_id
then t1.secondary_id::TEXT -- #2
when t1.secondary_id is not null and t1.tertiary_id is not null and t1.tertiary_id = t2.tertiary_id
then t1.tertiary_id::TEXT -- #3
end
order by 1
The standard case is when either of the fields is null, which is branch #1. We need to group by both columns, and we trick it by concatenating the two values with a # mark and grouping by that concatenated value. For #2 and #3 we need to cast the grouping value to type text to make it go through (the types returned by the branches of a CASE expression must match). Branch #2 serves the case when both values are not null and secondary_id matches between the "chosen" rows from the self-join. Branch #3 is analogous, but for a tertiary_id match.
Output:
array_agg
------------------------------------------------------------------------------------------------------------
{"(1,1,2,,data1_1_2_N,data2_1_2_N)","(2,2,2,,data1_2_2_N,data2_2_2_N)"}
{"(3,3,3,5,data1_3_3_5,data2_3_3_5)","(4,4,3,5,data1_4_3_5,data2_4_3_5)"}
{"(5,5,,1,data1_5_N_1,data2_5_N_1)","(6,6,,1,data1_6_N_1,data2_6_N_1)","(7,7,,1,data1_7_N_1,data2_7_N_1)"}
{"(8,8,,2,data1_8_N_2,data2_8_N_2)","(9,9,,2,data1_9_N_2,data2_9_N_2)"}
{"(10,10,,3,data1_10_N_3,data2_10_N_3)"}
{"(11,11,4,4,data1_11_4_4,data2_11_4_4)","(12,12,4,11,data1_12_4_11,data2_12_4_11)"}
If you'd like to get rid of the column id from your record, you could use a CTE that selects all columns but id, and then refer to that CTE in the FROM clause.
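That CTE variant might look like this sketch (the non-id column names are assumed):

```sql
-- strip the id column first, then run the same self-join on the CTE
WITH t AS (
    SELECT secondary_id, tertiary_id, data1, data2  -- every column except id
    FROM table1
)
SELECT array_agg(DISTINCT t1)
FROM t t1
JOIN t t2 ON t1.secondary_id = t2.secondary_id
          OR t1.tertiary_id = t2.tertiary_id
GROUP BY
    CASE
        WHEN t1.secondary_id IS NULL OR t1.tertiary_id IS NULL
            THEN concat(t1.secondary_id, '#', t1.tertiary_id)
        WHEN t1.secondary_id IS NOT NULL AND t1.tertiary_id IS NOT NULL
             AND t1.secondary_id = t2.secondary_id
            THEN t1.secondary_id::TEXT
        WHEN t1.secondary_id IS NOT NULL AND t1.tertiary_id IS NOT NULL
             AND t1.tertiary_id = t2.tertiary_id
            THEN t1.tertiary_id::TEXT
    END
ORDER BY 1;
```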
Concatenate/merge array values during grouping/aggregation
Custom aggregate
Approach 1: define a custom aggregate. Here's one I wrote earlier.
CREATE TABLE my_test(title text, tags text[]);
INSERT INTO my_test(title, tags) VALUES
('ridealong', '{comedy,other}'),
('ridealong', '{comedy,tragedy}'),
('freddyjason', '{horror,silliness}');
CREATE AGGREGATE array_cat_agg(anyarray) (
SFUNC=array_cat,
STYPE=anyarray
);
select title, array_cat_agg(tags) from my_test group by title;
LATERAL query
... or since you don't want to preserve order and want to deduplicate, you could use a LATERAL query like:
SELECT title, array_agg(DISTINCT tag ORDER BY tag)
FROM my_test, unnest(tags) tag
GROUP BY title;
in which case you don't need the custom aggregate. This one is probably a fair bit slower for big data sets due to the deduplication. Removing the ORDER BY, if not required, may help.
How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?
I found this related question (MySQL joining tables group by sum issue) and created a query like this:
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
group by A.id
having sum(cost) > 10
I don't know whether it's more or less efficient than Gordon's, but when I ran both queries this one seemed faster: 27s vs. 2m35s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10