Multiple Array_Agg() Calls in a Single Query

DISTINCT is often applied to repair queries that are rotten from the inside, and that's often expensive and / or incorrect. Don't multiply rows to begin with, then you don't have to fold unwanted duplicates at the end.

Joining to multiple n-tables ("has many") multiplies rows in the result set. That's effectively a CROSS JOIN or Cartesian product by proxy. See:

  • Two SQL LEFT JOINS produce incorrect result
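
To see the multiplication concretely, here is a minimal sketch with made-up sample data: one employee, two addresses, three working days. The inline VALUES lists are assumptions for illustration; only the table and column names come from the queries on this page.

```sql
-- Hypothetical sample data: 1 employee, 2 addresses, 3 working days
WITH employees(id, name) AS (VALUES (1, 'peter'))
, address(employeeid, street) AS (VALUES (1, 'Main St'), (1, 'Oak Ave'))
, workingdays(employeeid, day) AS (VALUES (1, 'Mon'), (1, 'Tue'), (1, 'Wed'))
SELECT e.id
     , count(*)             AS joined_rows  -- 2 x 3 = 6 rows for the one employee
     , array_agg(ad.street) AS streets      -- each street is repeated 3 times
     , array_agg(wd.day)    AS days         -- each day is repeated 2 times
FROM   employees e
JOIN   address     ad ON ad.employeeid = e.id
JOIN   workingdays wd ON wd.employeeid = e.id
GROUP  BY e.id;
```

Both arrays end up with six elements instead of two and three, which is the duplication that DISTINCT is often misused to hide.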

There are various ways to avoid this mistake.

Aggregate first, join later

Technically, the query works as long as you join to one table with multiple rows at a time before you aggregate:

SELECT e.id, e.name, e.age, e.streets, array_agg(wd.day) AS days
FROM  (
   SELECT e.id, e.name, e.age, array_agg(ad.street) AS streets
   FROM   employees e
   JOIN   address ad ON ad.employeeid = e.id
   GROUP  BY e.id  -- PK covers whole row
   ) e
JOIN   workingdays wd ON wd.employeeid = e.id
GROUP  BY e.id, e.name, e.age, e.streets;  -- derived table has no PK: list every non-aggregated column

It's best to include the primary key id and GROUP BY it, because name and age are not necessarily unique. Else you might merge employees by mistake.

But it is better to aggregate in a subquery before the join. That is typically superior when there are no selective WHERE conditions on employees:

SELECT e.id, e.name, e.age, ad.streets, array_agg(wd.day) AS days
FROM   employees e
JOIN  (
   SELECT employeeid, array_agg(street) AS streets  -- no table alias inside the subquery
   FROM   address
   GROUP  BY 1
   ) ad ON ad.employeeid = e.id
JOIN   workingdays wd ON e.id = wd.employeeid
GROUP  BY e.id, ad.streets;

Or aggregate both:

SELECT e.name, e.age, ad.streets, wd.days
FROM   employees e
JOIN  (
   SELECT employeeid, array_agg(street) AS streets
   FROM   address
   GROUP  BY 1
   ) ad ON ad.employeeid = e.id
JOIN  (
   SELECT employeeid, array_agg(day) AS days
   FROM   workingdays
   GROUP  BY 1
   ) wd ON wd.employeeid = e.id;

The last one is typically faster if you retrieve all or most of the rows in the base tables.

Note that using JOIN and not LEFT JOIN removes employees from the result that have no row in address or none in workingdays. That may or may not be intended. Switch to LEFT JOIN to retain all employees in the result.
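
A sketch of the LEFT JOIN variant of the last query, assuming the same tables. Employees without matching rows would otherwise get NULL arrays; the COALESCE to an empty array is an optional addition, not part of the original answer.

```sql
SELECT e.name, e.age
     , COALESCE(ad.streets, '{}') AS streets  -- empty array instead of NULL
     , COALESCE(wd.days, '{}')    AS days
FROM   employees e
LEFT   JOIN (
   SELECT employeeid, array_agg(street) AS streets
   FROM   address
   GROUP  BY 1
   ) ad ON ad.employeeid = e.id
LEFT   JOIN (
   SELECT employeeid, array_agg(day) AS days
   FROM   workingdays
   GROUP  BY 1
   ) wd ON wd.employeeid = e.id;
```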

Correlated subqueries / JOIN LATERAL

For selective filters on employees, consider correlated subqueries instead:

SELECT name, age
, (SELECT array_agg(street) FROM address WHERE employeeid = e.id) AS streets
, (SELECT array_agg(day) FROM workingdays WHERE employeeid = e.id) AS days
FROM employees e
WHERE e.name = 'peter'; -- very selective

Or LATERAL joins in Postgres 9.3 or later:

SELECT e.name, e.age, a.streets, w.days
FROM   employees e
LEFT   JOIN LATERAL (
   SELECT array_agg(street) AS streets  -- no GROUP BY: always exactly one row per employee
   FROM   address
   WHERE  employeeid = e.id
   ) a ON true
LEFT   JOIN LATERAL (
   SELECT array_agg(day) AS days
   FROM   workingdays
   WHERE  employeeid = e.id
   ) w ON true
WHERE  e.name = 'peter'; -- very selective

See:

  • What is the difference between LATERAL JOIN and a subquery in PostgreSQL?

The last two queries retain all qualifying employees in the result.

Merging arrays from array_agg into a single array

Basically just:

SELECT v.video_id, v.video_name, array_agg(l.label_name) AS shot_labels
FROM videos v
JOIN shots s USING (video_id)
JOIN label_occurrences lo USING (shot_id)
JOIN labels l USING (label_id)
GROUP BY 1;

The simplified GROUP BY assumes that videos.video_id is the PRIMARY KEY of its table, so video_name is also covered.

Get all array_agg() values with one matched value Postgres

SELECT
id,
ARRAY_AGG(session_os)
FROM
t
GROUP BY id
HAVING ARRAY_AGG(session_os) && ARRAY['Android']

Your problem is that you first filter the records with session_os = 'Android' and only afterwards aggregate them, so all other values are already gone by then.

You have to aggregate first and then check whether 'Android' is an element of the aggregated array. This can be done using the HAVING clause and the && (overlap) operator, which returns true if the two arrays have at least one element in common.
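
As an alternative sketch, bool_or() expresses the same condition without building the array a second time: it is true if at least one row in the group matches. The inline sample rows are hypothetical; the table and column names are taken from the query above.

```sql
WITH t(id, session_os) AS (
   VALUES (1, 'Android'), (1, 'iOS'), (2, 'iOS')  -- made-up rows
   )
SELECT id, array_agg(session_os ORDER BY session_os) AS all_os
FROM   t
GROUP  BY id
HAVING bool_or(session_os = 'Android');  -- true if any row in the group matches
```

With this sample data only id 1 qualifies, and its array still contains both Android and iOS.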

How to get flat aggregation of two calls to json_agg

I didn't have to call json_agg inside the innermost JOIN. Instead, I can call array_agg(i.images) at the same level where array_agg(DISTINCT name) AS options is called, to get a flat list of images:

SELECT p.name, o.options, o.images
FROM products p
LEFT JOIN (
SELECT "productId", array_agg(DISTINCT name) AS options, array_agg(i.images) AS images
FROM options o
LEFT JOIN (
SELECT "optionId", i."fileName" AS images
FROM images i
) i ON i."optionId" = o.id
GROUP BY 1
) o ON o."productId" = p.id;

 name        | options                       | images
-------------+-------------------------------+------------------------------
 Shampoo     | {"Frizzy Hair","Hair Growth"} | {bee.png,fancy.png,soap.png}
 Conditioner |                               |

How to group multiple columns into a single array or similar?

(t.id, t.tag_name, t.tag_color) is short syntax for ROW(t.id, t.tag_name, t.tag_color) - and a ROW constructor does not preserve nested attribute names. The manual:

By default, the value created by a ROW expression is of an anonymous record type. If necessary, it can be cast to a named composite type —
either the row type of a table, or a composite type created with
CREATE TYPE AS.

Bold emphasis mine. To also get proper key names in the result, cast to a registered composite type as advised in the quote, use a nested subselect, or simply use json_build_object() in Postgres 9.4 or newer (effectively avoiding the ROW constructor a priori):

SELECT trm.target_record_id
, json_agg(json_build_object('id', t.id
, 'tag_name', t.tag_name
, 'tag_color', t.tag_color)) AS tags

FROM tags_record_maps trm
JOIN tags t USING (site_id)
WHERE t.id = trm.tag_id
GROUP BY trm.target_record_id
HAVING count(*) > 1;

I use original column names, but you can choose your key names freely. In your case:

       json_agg(json_build_object('id', t.id
, 'name', t.tag_name
, 'color', t.tag_color)) AS tags

Detailed explanation:

  • Return multiple columns of the same row as JSON array of objects
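
For comparison, a sketch of the nested-subselect alternative mentioned above. The derived table x gives the columns names, which json_agg() then uses as keys. It assumes the same tables and columns as the json_build_object() query.

```sql
SELECT trm.target_record_id
     , json_agg((SELECT x FROM (SELECT t.id, t.tag_name, t.tag_color) AS x)) AS tags
FROM   tags_record_maps trm
JOIN   tags t USING (site_id)
WHERE  t.id = trm.tag_id
GROUP  BY trm.target_record_id
HAVING count(*) > 1;
```

This keeps the original column names as keys; json_build_object() remains the simpler choice when you want to rename them.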

Do multiple aggregations in a query always have the same order?

The order is never guaranteed if you don't provide an ORDER BY.

So if you need a specific order, then specify it:

SELECT ARRAY_AGG(columnA order by some_sort_column), 
ARRAY_AGG(columnB order by some_sort_column)
FROM myTable
GROUP BY columnC
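
A small sketch with made-up rows shows the effect: with the same ORDER BY in both aggregates, elements at the same array position come from the same row. The sample data is hypothetical, and the sort column is assumed to be unique within each group (otherwise add a tie-breaker column to the ORDER BY).

```sql
WITH mytable(columna, columnb, columnc, some_sort_column) AS (
   VALUES ('a1', 'b1', 'x', 2)
        , ('a2', 'b2', 'x', 1)
   )
SELECT columnc
     , array_agg(columna ORDER BY some_sort_column) AS a_values  -- {a2,a1}
     , array_agg(columnb ORDER BY some_sort_column) AS b_values  -- {b2,b1}
FROM   mytable
GROUP  BY columnc;
```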

One-to-Many SQL SELECT concatenated into single row

You are almost there - you just need aggregation:

SELECT
o.id,
o.status,
STRING_AGG(c.text, ',') comments
FROM "Order" o
LEFT JOIN "Comment" c ON o.id = c."order"
GROUP BY o.id, o.status

I would strongly recommend against having a table (and/or a column) called order, because it conflicts with a language keyword. I would also recommend avoiding quoted identifiers as much as possible: they make the queries longer to write, for no benefit.

Note that you can also use a correlated subquery:

SELECT
o.id,
o.status,
(SELECT STRING_AGG(c.text, ',') FROM "Comment" c WHERE c."order" = o.id) comments
FROM "Order" o

How to aggregate values in two columns in multiple records into one

You can unpivot and aggregate:

select firstname, lastname, string_agg(pt, ', ') as points
from (select t.*, v.pt,
row_number() over (partition by firstname, lastname, pt order by pt) as seqnum
from t cross apply
(values (t.startpoint), (t.endpoint)) as v(pt)
) t
where seqnum = 1
group by firstname, lastname;

Unfortunately, string_agg() in SQL Server doesn't support DISTINCT. However, this is easily remedied by using row_number().
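
In Postgres, the dialect used in most of this page, string_agg() does accept DISTINCT, and CROSS APPLY is spelled CROSS JOIN LATERAL. A sketch of the equivalent, assuming the same table t and that startpoint and endpoint are text columns:

```sql
SELECT firstname, lastname
     , string_agg(DISTINCT v.pt, ', ') AS points  -- DISTINCT removes repeated points
FROM   t
CROSS  JOIN LATERAL (VALUES (t.startpoint), (t.endpoint)) AS v(pt)  -- unpivot two columns into rows
GROUP  BY firstname, lastname;
```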

Edit:

If you wanted to identify each separate connected component, then you can use a recursive CTE:

with cte as (
select id, firstname, lastname,
convert(varchar(max), concat(startpoint, ', ', endpoint)) as points,
endpoint
from t
where not exists (select 1 from t t2 where t2.endpoint = t.startpoint)
union all
select cte.id, cte.firstname, cte.lastname,
concat(cte.points, ', ', t.endpoint), t.endpoint
from cte join
t
on t.startpoint = cte.endpoint and t.id = cte.id
)
select *
from cte;


