Postgresql 9.1: How to Concatenate Rows in Array Without Duplicates, Join Another Table

PostgreSQL 9.1: How to concatenate rows in array without duplicates, JOIN another table

Instead of using window functions and partitioning, use a query-level GROUP BY and aggregate with a DISTINCT clause:

SELECT         
rnp.grp_id,
array_to_string(array_agg(distinct rnp.cabinets),',') AS cabinets,
array_to_string(array_agg(distinct ips.address),',') AS addresses
FROM rnp JOIN ips ON rnp.grp_id=ips.grp_id GROUP BY rnp.grp_id, ips.grp_id;

Result:

 grp_id |        cabinets         | addresses 
--------+-------------------------+-----------
11 | cabs1,cabs2,cabs3,cabs4 | CA,NY
22 | c1,c2 | DC,LA
(2 rows)

The key here is that instead of using window functions and patitioning, you use a query-level GROUP BY and aggregate with a DISTINCT clause.

This'd work with the window function approach too, except that PostgreSQL (9.1 at least) doesn't support DISTINCT in window functions:

regress=# SELECT DISTINCT
rnp.grp_id,
array_to_string(array_agg(distinct rnp.cabinets)OVER (PARTITION BY rnp.grp_id), ',') AS cabinets,
array_to_string(array_agg(distinct ips.address) OVER (PARTITION BY ips.grp_id), ',') AS addresses
FROM rnp JOIN ips ON rnp.grp_id=ips.grp_id;
ERROR: DISTINCT is not implemented for window functions
LINE 3: array_to_string(array_agg(distinct rnp.cabinets)OVER (PART...

Concatenate multiple result rows of one column into one, group by another column

Simpler with the aggregate function string_agg() (Postgres 9.0 or later):

SELECT movie, string_agg(actor, ', ') AS actor_list
FROM tbl
GROUP BY 1;

The 1 in GROUP BY 1 is a positional reference and a shortcut for GROUP BY movie in this case.

string_agg() expects data type text as input. Other types need to be cast explicitly (actor::text) - unless an implicit cast to text is defined - which is the case for all other string types (varchar, character, name, ...) and some other types.

As isapir commented, you can add an ORDER BY clause in the aggregate call to get a sorted list - should you need that. Like:

SELECT movie, string_agg(actor, ', ' ORDER BY actor) AS actor_list
FROM tbl
GROUP BY 1;

But it's typically faster to sort rows in a subquery. See:

  • Create array in SELECT

Combining two rows from a table into one

try this :

  select max(bad_id),
split_part(string_agg(a,'__SPLITER__' order by bad_id DESC),'__SPLITER__',1)
,split_part(string_agg(b,'__SPLITER__' order by bad_id DESC),'__SPLITER__',1)
from foo group by real_id

If a and b are timestamp :

    select max(bad_id),
split_part(string_agg(a::character varying,'__SPLITER__' order by bad_id DESC),'__SPLITER__',1)::timestamp,
split_part(string_agg(b::character varying,'__SPLITER__' order by bad_id DESC),'__SPLITER__',1)::timestamp
from foo group by real_id

Same for integer : split_part(string_agg(a::character varying ...,1)::integer

How to remove duplicated rows in Postgresql?

Try this:

delete from your_table where id in 
(select max(id) from your_table
group by org_id, phone_number
having count(1) > 1);

Here's a working sample at DB-Fiddle. It's working properly.

Concatenate rows into a text string

Use String_agg function

SELECT row1,
row2,
string_agg(row3, ',') as row3
FROM your_table
GROUP BY row1,
row2

Union of two arrays in PostgreSQL without unnesting

If your problem is to unnest twice this will unnest only once

select array_agg(a order by a)
from (
select distinct unnest(array[1,2,3] || array[2,3,4,5]) as a
) s;

Get array of records based on two keys in same table

This should achieve your output. The trick sticks within conditional group by clause to handle cases where secondary_id and tertiary_id are the same for a record which has a matching record on both of those fields.

select array_agg(distinct t1) 
from table1 t1
join table1 t2 on
t1.secondary_id = t2.secondary_id
or t1.tertiary_id = t2.tertiary_id
group by
case
when t1.secondary_id is null or t1.secondary_id is null
then concat(t1.secondary_id,'#',t1.tertiary_id) -- #1
when t1.secondary_id is not null and t1.tertiary_id is not null and t1.secondary_id = t2.secondary_id
then t1.secondary_id::TEXT -- #2
when t1.secondary_id is not null and t1.tertiary_id is not null and t1.tertiary_id = t2.tertiary_id
then t1.tertiary_id::TEXT -- #3
end
order by 1

Standard case is when any of the fields are null, which stands for #1. We need to group by both columns and we're tricking it by concatenating both values from columns with a # mark and doing a group by this concatenated column.

For #2 and #3 we need to cast the grouping value to type text to make it go through (types returned by CASE statement need to be the same).

Option #2 serves the case when both values are not null and secondary_id matches between those "chosen" rows from selfjoin. Option #3 is analogical, but for tertiary_id match.

Output:

                                                 array_agg
------------------------------------------------------------------------------------------------------------
{"(1,1,2,,data1_1_2_N,data2_1_2_N)","(2,2,2,,data1_2_2_N,data2_2_2_N)"}
{"(3,3,3,5,data1_3_3_5,data2_3_3_5)","(4,4,3,5,data1_4_3_5,data2_4_3_5)"}
{"(5,5,,1,data1_5_N_1,data2_5_N_1)","(6,6,,1,data1_6_N_1,data2_6_N_1)","(7,7,,1,data1_7_N_1,data2_7_N_1)"}
{"(8,8,,2,data1_8_N_2,data2_8_N_2)","(9,9,,2,data1_9_N_2,data2_9_N_2)"}
{"(10,10,,3,data1_10_N_3,data2_10_N_3)"}
{"(11,11,4,4,data1_11_4_4,data2_11_4_4)","(12,12,4,11,data1_12_4_11,data2_12_4_11)"}

If you'd like to get rid of column id from your record, you could use a CTE and select all columns but id and then refer to that CTE in from clause.

Concatenate/merge array values during grouping/aggregation

Custom aggregate

Approach 1: define a custom aggregate. Here's one I wrote earlier.

CREATE TABLE my_test(title text, tags text[]);

INSERT INTO my_test(title, tags) VALUES
('ridealong', '{comedy,other}'),
('ridealong', '{comedy,tragedy}'),
('freddyjason', '{horror,silliness}');

CREATE AGGREGATE array_cat_agg(anyarray) (
SFUNC=array_cat,
STYPE=anyarray
);

select title, array_cat_agg(tags) from my_test group by title;

LATERAL query

... or since you don't want to preserve order and want to deduplicate, you could use a LATERAL query like:

SELECT title, array_agg(DISTINCT tag ORDER BY tag) 
FROM my_test, unnest(tags) tag
GROUP BY title;

in which case you don't need the custom aggregate. This one is probably a fair bit slower for big data sets due to the deduplication. Removing the ORDER BY if not required may help, though.

How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?

I found this (MySQL joining tables group by sum issue) and created a query like this

select * 
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
group by A.id
having sum(cost) > 10

I don't know if it's efficient though. I don't know if it's more or less efficient than Gordon's. I ran both queries and this one seemed faster, 27s vs. 2m35s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10



Related Topics



Leave a reply



Submit