Parallel Unnest() and Sort Order in Postgresql

Parallel unnest() and sort order in PostgreSQL

Yes, that is a feature of Postgres and parallel unnesting is guaranteed to be in sync (as long as all arrays have the same number of elements).

Postgres 9.4 adds a clean solution for parallel unnest:

  • Unnest multiple arrays in parallel

The order of resulting rows is not guaranteed, though. Actually, with a statement as simple as:

SELECT unnest(ARRAY[5,3,9]) AS id;

the resulting order of rows is "guaranteed", but Postgres does not assert anything. The query optimizer is free to order rows as it sees fit as long as the order is not explicitly defined. This may have side effects in more complex queries.

If the second query in your question is what you actually want (add an index number to unnested array elements), there is a better way with generate_subscripts():

SELECT unnest(ARRAY[5,3,9]) AS id
, generate_subscripts(ARRAY[5,3,9], 1) AS idx
ORDER BY idx;

Details in this related answer:

  • How to access array internal index with postgreSQL?

You will be interested in WITH ORDINALITY in Postgres 9.4:

  • PostgreSQL unnest() with element number

Then you can use:

SELECT * FROM unnest(ARRAY[5,3,9]) WITH ORDINALITY tbl(id, idx);

Unnest multiple arrays in parallel

You will love this new feature of Postgres 9.4:

unnest(anyarray, anyarray [, ...])

unnest() with the much anticipated (at least by me) capability to unnest multiple arrays in parallel cleanly. The manual:

expand multiple arrays (possibly of different types) to a set of rows. This is only allowed in the FROM clause;

It's a special implementation of the new ROWS FROM feature.

Your function can now just be:

CREATE OR REPLACE FUNCTION multi_unnest(_some_id int
, _amounts numeric[]
, _invoices text[])
RETURNS TABLE (some_id int, amount numeric, invoice text) AS
$func$
SELECT _some_id, u.* FROM unnest(_amounts, _invoices) u;
$func$ LANGUAGE sql;

Call:

SELECT * FROM multi_unnest(123, '{100, 40.5, 76}'::numeric[] 
, '{01-2222-05,01-3333-04,01-4444-08}'::text[]);

Of course, the simple form can be replaced with plain SQL (no additional function):

SELECT 123 AS some_id, *
FROM unnest('{100, 40.5, 76}'::numeric[]
, '{01-2222-05,01-3333-04,01-4444-08}'::text[]) AS u(amount, invoice);

In earlier versions (Postgres 9.3-), you can use the less elegant and less safe form:

SELECT 123 AS some_id
, unnest('{100, 40.5, 76}'::numeric[]) AS amount
, unnest('{01-2222-05,01-3333-04,01-4444-08}'::text[]) AS invoice;

Caveats of the old shorthand form: besides being non-standard to have set-returning function in the SELECT list, the number of rows returned would be the lowest common multiple of each arrays number of elements (with surprising results for unequal numbers). Details in these related answers:

  • Parallel unnest() and sort order in PostgreSQL
  • Is there something like a zip() function in PostgreSQL that combines two arrays?

This behavior has finally been sanitized with Postgres 10. Multiple set-returning functions in the SELECT list produce rows in "lock-step" now. See:

  • What is the expected behaviour for multiple set-returning functions in SELECT clause?

Upsert with parallel unnest raises column does not exist in UPDATE part

You must use the special table EXCLUDED in the UPDATE part of an UPSERT.

CREATE FUNCTION public.sort_category(category_ids integer[], sort integer[])
RETURNS void
LANGUAGE sql VOLATILE STRICT AS
$func$
INSERT INTO upsert (user_id, category_id, sort)
SELECT s.user_id, input.category_id, input.sort_index
FROM unnest($1, $2) AS input(category_id, sort_index)
CROSS JOIN sessions s
WHERE s.session_token = 'a'
ON CONFLICT (user_id, category_id) DO UPDATE
SET sort = EXCLUDED.sort; -- here!
$func$

db<>fiddle here

See:

  • How to UPSERT multiple rows with individual values in one statement?

Pull in duplicates with query

I think you need a query like this.

 SELECT * FROM 
( SELECT unnest(coach_ids) as coach_id FROM
stations WHERE id = 1 ) s WHERE EXISTS ( select id FROM coaches c
where c.id = s.coach_id ) ;

DEMO

EDIT:

you said,

I seem to only get the array IDs come back though, no data.

You should use a JOIN in that case. For maintaining proper ORDER you could use generate_subscripts() along with UNNEST

SELECT c.* FROM 
( SELECT unnest(coach_ids) as coach_id
,generate_subscripts(coach_ids, 1) AS idx FROM
stations WHERE id = 1 ) s JOIN coaches c
ON c.id = s.coach_id
ORDER BY s.idx;

DEMO2

How to respect the order of an array in a PostgreSQL select sentence

For long arrays you typically get (much!) more efficient query plans with unnesting the array and joining to the main table. In simple cases, this even preserves the original order of the array without adding ORDER BY. Rows are processed in order. But there are no guarantees and the order may be broken with more joins or with parallel execution etc. To be sure, add WITH ORDINALITY:

CREATE OR REPLACE FUNCTION list_products (tid int[])  -- VARIADIC?
RETURNS TABLE (
id integer,
reference varchar,
price decimal(13,4)
)
LANGUAGE sql STABLE AS
$func$
SELECT product_id, p.reference, p.price
FROM unnest(tid) WITH ORDINALITY AS t(product_id, ord)
JOIN product p USING (product_id) -- LEFT JOIN ?
ORDER BY t.ord
$func$;

Fast, simple, safe. See:

  • PostgreSQL unnest() with element number
  • Join against the output of an array unnest without creating a temp table

You might want to throw in the modifier VARIADIC, so you can call the function with an array or a list of IDs (max 100 items by default). See:

  • Return rows matching elements of input array in plpgsql function
  • Call a function with composite type as argument from native query in jpa
  • Function to select column values by comparing against comma separated list

I would declare STABLE function volatility.

You might use LEFT JOIN instead of JOIN to make sure that all given IDs are returned - with NULL values if a row with given ID has gone missing.

db<>fiddle here

Note a subtle logic difference with duplicates in the array. While product_id is UNIQUE ...

  • unnest + left join returns exactly one row for every given ID - preserving duplicates in the given IDs if any.
  • product_id = any (tid) folds duplicates. (One of the reasons it typically results in more expensive query plans.)

If there are no dupes in the given array, there is no difference. If there can be duplicates and you want to fold them, your task is ambiguous, as it's undefined which position to keep.

Unnest array by one level

Function

To break out 1-dimensional arrays from n-dimensional arrays - representing leaves of the nested dimensions. (With n >= 1.)

PL/pgSQL

With a FOR loop looping through the array:

CREATE OR REPLACE FUNCTION unnest_nd_1d(a ANYARRAY, OUT a_1d ANYARRAY)
RETURNS SETOF ANYARRAY
LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE STRICT AS
$func$
BEGIN
FOREACH a_1d SLICE 1 IN ARRAY a LOOP
RETURN NEXT;
END LOOP;
END
$func$;

SLICE 1 instructs to take the 1-dimensonal arrays. (SLICE 2 would take 2-dimensional arrays.)

PARALLEL SAFE only for Postgres 9.6 or later.

Later tests revealed this PL/pgSQL function to be fastest.

Related:

  • How to unnest a 2d array into a 1d array quickly in PostgreSQL?

Pure SQL

Only works for 2D arrays:

CREATE OR REPLACE FUNCTION unnest_2d_1d(anyarray)
RETURNS SETOF anyarray
LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT AS
$func$
SELECT array_agg($1[d1][d2])
FROM generate_subscripts($1,1) d1
, generate_subscripts($1,2) d2
GROUP BY d1
ORDER BY d1
$func$;

This is an improved and simplified version of the function Lukas posted.

db<>fiddle here

Old sqlfiddle

Explanation

SELECT (ARRAY[[1,2,3], [4,5,6], [7,8,9]])[0]

returns the same as:

SELECT (ARRAY[[1,2,3], [4,5,6], [7,8,9]])[17]

... which is NULL. The manual:

By default, the lower bound index value of an array's dimensions is
set to one.

0 has no special meaning as array subscript. There's just nothing there for Postgres arrays with default indexes.

Also, with two-dimensional arrays, you need two indexes to get a base element. Like:

SELECT (ARRAY[[1,2,3], [4,5,6], [7,8,9]])[1][2]

Result:

2

The first part of your message is a bit unclear.

SELECT array_dims(ARRAY[[1,2,3], [4,5,6], [7,8,9]]);

Result:

[1:3][1:3]

That's two dimensions with 3 elements (1 to 3) each (9 base elements).

If you want n-1 dimensions then this is a correct result:

SELECT ARRAY (SELECT unnest('{{1,2,3}, {4,5,6}, {7,8,9}}'::int[]))

Result:

{1,2,3,4,5,6,7,8,9}

That's one dimension. unnest() produces one base element per row (regardless of array dimensions). Your example is just another 2-dimensional array with a missing set of curly brackets ... ?

{1,2,3}, {4,5,6}, {7,8,9}

If you want a slice of the array:

SELECT (ARRAY[[1,2,3], [4,5,6], [7,8,9]])[1:2]

Result:

{{1,2,3},{4,5,6}}

Or:

SELECT (ARRAY[[1,2,3], [4,5,6], [7,8,9]])[2:2][1:2]

Result:

{{4,5}}

To flatten the result (get a 1D array):

  • How to select 1d array from 2d array?

Read the manual here.

For very old versions

For Postgres versions < 8.4, array_agg() is not installed by default. Create it first:

CREATE AGGREGATE array_agg(anyelement) (
SFUNC = array_append,
STYPE = anyarray,
INITCOND = '{}'
);

Also, generate_subscripts() is not born, yet. Use instead:

...
FROM generate_series(array_lower($1,1), array_upper($1,1)) d1
, generate_series(array_lower($1,2), array_upper($1,2)) d2
...

Call:

SELECT unnest_2d_1d(ARRAY[[1,2], [3,4], [5,6]]);

Result

{1,2}
{3,4}
{5,6}

What is the expected behaviour for multiple set-returning functions in SELECT clause?

Postgres 10 or newer

pads with null values for smaller set(s). Demo with generate_series():

SELECT generate_series( 1,  2) AS row2
, generate_series(11, 13) AS row3
, generate_series(21, 24) AS row4;

row2 | row3 | row4
-----+------+-----
1 | 11 | 21
2 | 12 | 22
null | 13 | 23
null | null | 24

dbfiddle here

The manual for Postgres 10:

If there is more than one set-returning function in the query's select
list, the behavior is similar to what you get from putting the
functions into a single LATERAL ROWS FROM( ... ) FROM-clause item. For
each row from the underlying query, there is an output row using the
first result from each function, then an output row using the second
result, and so on. If some of the set-returning functions produce
fewer outputs than others, null values are substituted for the missing
data, so that the total number of rows emitted for one underlying row
is the same as for the set-returning function that produced the most
outputs. Thus the set-returning functions run “in lockstep” until they
are all exhausted, and then execution continues with the next
underlying row.

This ends the traditionally odd behavior.

Some other details changed with this rewrite. The release notes:

  • Change the implementation of set-returning functions appearing in a query's SELECT list (Andres Freund)

    Set-returning functions are now evaluated before evaluation of scalar
    expressions in the SELECT list, much as though they had been placed
    in a LATERAL FROM-clause item. This allows saner semantics for cases
    where multiple set-returning functions are present. If they return
    different numbers of rows, the shorter results are extended to match
    the longest result by adding nulls. Previously the results were cycled
    until they all terminated at the same time, producing a number of rows
    equal to the least common multiple of the functions' periods. In
    addition, set-returning functions are now disallowed within CASE and
    COALESCE constructs.
    For more information see Section 37.4.8.

Bold emphasis mine.

Postgres 9.6 or older

The number of result rows (somewhat surprisingly!) is the lowest common multiple of all sets in the same SELECT list. (Only acts like a CROSS JOIN if there is no common divisor to all set-sizes!) Demo:

SELECT generate_series( 1,  2) AS row2
, generate_series(11, 13) AS row3
, generate_series(21, 24) AS row4;

row2 | row3 | row4
-----+------+-----
1 | 11 | 21
2 | 12 | 22
1 | 13 | 23
2 | 11 | 24
1 | 12 | 21
2 | 13 | 22
1 | 11 | 23
2 | 12 | 24
1 | 13 | 21
2 | 11 | 22
1 | 12 | 23
2 | 13 | 24

dbfiddle here

Documented in manual for Postgres 9.6 the chapter SQL Functions Returning Sets, along with the recommendation to avoid it:

Note: The key problem with using set-returning functions in the select
list, rather than the FROM clause, is that putting more than one
set-returning function in the same select list does not behave very
sensibly. (What you actually get if you do so is a number of output
rows equal to the least common multiple of the numbers of rows
produced by each set-returning function.
) The LATERAL syntax produces
less surprising results when calling multiple set-returning functions,
and should usually be used instead.

Bold emphasis mine.

A single set-returning function is OK (but still cleaner in the FROM list), but multiple in the same SELECT list is discouraged now. This was a useful feature before we had LATERAL joins. Now it's merely historical ballast.

Related:

  • Parallel unnest() and sort order in PostgreSQL
  • Unnest multiple arrays in parallel
  • What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?

Multiply elements in an array according to their position

WITH t AS (SELECT ARRAY[10,20,30,40,50]::INT[] AS arr)   -- variable for demo
SELECT ARRAY(
SELECT unnest(array_fill(arr[idx], ARRAY[idx])) AS mult
FROM (SELECT arr, generate_subscripts(arr, 1) AS idx FROM t) sub
);

I would wrap the logic into a simple IMMUTABLE SQL function:

CREATE OR REPLACE FUNCTION f_expand_arr(_arr anyarray)
RETURNS anyarray AS
$func$
SELECT ARRAY(
SELECT unnest(array_fill(_arr[idx], ARRAY[idx]))
FROM (SELECT generate_subscripts(_arr, 1) AS idx) sub
)
$func$ LANGUAGE sql IMMUTABLE;

Works for arrays of any base type due to the polymorphic parameter type anyarray:

How to write a function that returns text or integer values?

The manual on generate_subscripts() and array_fill().

Note: This works with the actual array indexes, which can differ from the ordinal array position in Postgres. You may be interested in @Daniel's method to "normalize" the array index:

Normalize array subscripts for 1-dimensional array so they start with 1

The upcoming Postgres 9.4 (currently beta) provides WITH ORDINALITY:

PostgreSQL unnest() with element number

Allowing for this even more elegant and reliable solution:

CREATE OR REPLACE FUNCTION f_expand_arr(_arr anyarray)
RETURNS anyarray AS
$func$
SELECT ARRAY(
SELECT unnest(array_fill(a, ARRAY[idx]))
FROM unnest(_arr) WITH ORDINALITY AS x (a, idx)
)
$func$ LANGUAGE sql IMMUTABLE;

One might still argue that proper order is not actually guaranteed. I claim it is ...

Parallel unnest() and sort order in PostgreSQL

Call:

SELECT f_expand_arr(ARRAY[10,20,30,40,10]::INT[]) AS a2;

Or for values from a table:

SELECT f_expand_arr(a) AS a2 FROM t;

SQL Fiddle.



Related Topics



Leave a reply



Submit