Set Limit to Array_Agg()

Set limit to array_agg()

select id[1], id[2]
from (
   SELECT array_agg("Esns".id) as id
   FROM public."Esns",
        public."PurchaseOrderItems"
   WHERE "Esns"."PurchaseOrderItemId" = "PurchaseOrderItems".id
   AND   "PurchaseOrderItems"."GradeId" = 2
) s

Or, if you want the output as an array, you can slice it:

SELECT (array_agg("Esns".id))[1:2] as id_array
FROM public."Esns",
     public."PurchaseOrderItems"
WHERE "Esns"."PurchaseOrderItemId" = "PurchaseOrderItems".id
AND   "PurchaseOrderItems"."GradeId" = 2

Adding LIMIT to ARRAY_TO_JSON or ARRAY_AGG

Your function should work like this:

CREATE OR REPLACE FUNCTION words_get_user_chat(in_uid integer)
  RETURNS jsonb
  LANGUAGE sql STABLE AS
$func$
SELECT COALESCE(jsonb_object_agg(gid, y), '{}')
FROM  (
   SELECT gid, jsonb_agg((SELECT j FROM (SELECT created, uid, msg) j)) AS y
   FROM  (
      SELECT DISTINCT gid  -- DISTINCT may be redundant
      FROM   words_games
      WHERE (finished IS NULL
          OR finished > (CURRENT_TIMESTAMP - INTERVAL '1 day'))
      AND    in_uid IN (player1, player2)
      ) g
   CROSS  JOIN LATERAL (
      SELECT EXTRACT(EPOCH FROM created)::int AS created
           , uid
           , msg
      FROM   words_chat c
      WHERE  c.gid = g.gid
      ORDER  BY c.created DESC
      LIMIT  10  -- HERE !!
      ) c
   GROUP  BY 1
   ) x
$func$;

Do not aggregate all rows just to discard the surplus later; that would be a waste. Place the LIMIT after ORDER BY in a subquery.

You need to identify qualifying gid values from words_games first, and then use a LATERAL join to a subquery on words_chat. This should be correct and faster, too.
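
Stripped down to just that pattern (same tables as above, result shape simplified for illustration):

SELECT g.gid, jsonb_agg(c.msg) AS last_msgs
FROM  (SELECT DISTINCT gid FROM words_games) g
CROSS  JOIN LATERAL (
   SELECT msg
   FROM   words_chat c
   WHERE  c.gid = g.gid
   ORDER  BY c.created DESC
   LIMIT  10          -- limit *per game*, before aggregating
   ) c
GROUP  BY g.gid;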

Since c.created is defined NOT NULL, you don't need to add NULLS LAST in the ORDER BY clause. This matching multicolumn index should yield best read performance:

CREATE INDEX ON words_chat(gid, created DESC);

And maybe some index on words_games; that depends on cardinalities and value frequencies.
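
For example, a pair of plain indexes like the following might help the in_uid filter; this is only a sketch, and whether they pay off depends on your data:

-- possible indexes on words_games (sketch, not measured)
CREATE INDEX ON words_games (player1);
CREATE INDEX ON words_games (player2);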

While being at it, I also streamlined the construction of the jsonb result.

Related:

  • Return multiple columns of the same row as JSON array of objects
  • Query with LEFT JOIN not returning rows for count of 0
  • Explain JOIN vs. LEFT JOIN and WHERE condition performance suggestion in more detail

Limitation of array_agg in PostgreSQL 9.1

The issue here was not PostgreSQL, but rather the client I was using.
pgAdmin III doesn't display the contents of an array over a certain size, about 4.5k.
When using psql, one doesn't encounter the same issue.

In the UI of pgAdmin there is an option to set "Max characters per column"; it was set to 256 in my case, which makes little sense.
But if you copy & paste the array that looks empty into Notepad, you'll find that all the data is there.
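
If you want to confirm the data is really there regardless of the client's display limit, a quick check like this works (table and column names are placeholders):

-- compare element count and text length instead of trusting the display
SELECT array_length(array_agg(id), 1) AS n_elements,
       length(array_agg(id)::text)    AS text_length
FROM   some_table;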

Limit length of array_agg in Athena Presto

You can actually use slice to do it:

SELECT key
     , slice(array_agg(value), 1, 100) as "values"
FROM table
GROUP BY key

Please note that the array index starts at 1.
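
For example (runnable as-is in Athena/Presto; the array literal is just for illustration):

SELECT slice(ARRAY[10, 20, 30, 40], 1, 2)  -- returns [10, 20]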

LIMIT within ARRAY_AGG in BigQuery

The issue was the placement of the LIMIT clause: it applied to the whole SELECT statement rather than to the ARRAY_AGG function. This corrected it:

SELECT
  x,
  ARRAY_AGG((SELECT AS STRUCT y) LIMIT 1) y
FROM
  `a`,
  UNNEST(b) b
WHERE
  x = 'abc'
GROUP BY
  1
LIMIT 1

PostgreSQL - string_agg with limited number of elements

I am not aware that you can limit it in the string_agg() function itself, but you can limit it in other ways. Starting from a query like:

select postid, string_agg(distinct(tag), ', ')
from table t
group by postid

Then you can do:

select postid, string_agg(distinct (case when seqnum <= 10 then tag end), ', ')
from (select t.*,
             dense_rank() over (partition by postid order by tag) as seqnum
      from table t
     ) t
group by postid

Open, high, low, close aggregation in BigQuery

An obvious improvement is to use simple MIN and MAX for min_value and max_value:

select date(l) d,
       array_agg(v order by l asc limit 1)[offset(0)] first_value,
       array_agg(v order by l desc limit 1)[offset(0)] last_value,
       max(v) max_value,
       min(v) min_value
from t
group by d

Using array_agg is a good practice here, and using [offset(0)] is important: without it, your outputs will be arrays with one element each, but you most likely want the element itself.
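
A quick standalone illustration (runnable in BigQuery; the data is made up):

SELECT ARRAY_AGG(x ORDER BY x LIMIT 1)            AS as_array,   -- [1]
       ARRAY_AGG(x ORDER BY x LIMIT 1)[OFFSET(0)] AS as_scalar   -- 1
FROM UNNEST([3, 1, 2]) AS x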

One more option, depending on the volume of your data: you can try the approach below, which uses analytic (window) aggregate functions instead of plain aggregate functions.

select distinct * from (
  select date(l) d,
         first_value(v) over(partition by date(l) order by l asc) first_value,
         first_value(v) over(partition by date(l) order by l desc) last_value,
         max(v) over(partition by date(l)) max_value,
         min(v) over(partition by date(l)) min_value
  from t
)

More options to consider: approximate aggregate functions, as in the example below. approx_top_sum(v, weight, 1) returns the value with the largest total weight, so weighting by unix_seconds(l) favors the latest row (last value) and weighting by 1 / unix_seconds(l) favors the earliest (first value).

select extract(date from l) d,
       approx_top_sum(v, 1 / unix_seconds(l), 1)[offset(0)].value first_value,
       approx_top_sum(v, unix_seconds(l), 1)[offset(0)].value last_value,
       max(v) max_value,
       min(v) min_value
from t
group by d

Is it wasteful to use ARRAY_AGG to get the first non-NULL value in a column?

Yes, it's wasteful. I expect this to be faster:

SELECT DISTINCT ON (medic_id)
medic_id
, first_value(first_name) OVER (PARTITION BY medic_id ORDER BY CASE WHEN first_name IS NOT NULL THEN id END) AS first_name
, first_value(last_name) OVER (PARTITION BY medic_id ORDER BY CASE WHEN last_name IS NOT NULL THEN id END) AS last_name
, first_value(age) OVER (PARTITION BY medic_id ORDER BY CASE WHEN age IS NOT NULL THEN id END) AS age
FROM medic_edits;

For descending id value, use instead:

       first_value(first_name) OVER (PARTITION BY medic_id ORDER BY CASE WHEN first_name IS NOT NULL THEN id END DESC NULLS LAST) AS first_name

See:

  • Sort by column ASC, but NULL values first?

But there are probably faster ways yet. It also depends on the exact table definition, cardinalities, and data distribution.

See:

  • Fetch a row that contains the set of last non-NULL values for each column

About DISTINCT ON:

  • Select first row in each GROUP BY group?

This works in a single SELECT because DISTINCT / DISTINCT ON is applied after window functions. See:

  • Best way to get result count before LIMIT was applied

Aside: "age" is going to bit-rot rapidly. It's typically superior to store a birthday.
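
A sketch of that suggestion, assuming a birthdate column and a hypothetical medics table:

-- store a birthdate and derive the age on the fly (names are assumptions)
SELECT medic_id
     , date_part('year', age(birthdate))::int AS age
FROM   medics;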


