Postgresql Join with Array Type with Array Elements Order, How to Implement

PostgreSQL JOIN with array type with array elements order, how to implement?

SELECT t.*
FROM unnest(ARRAY[1,2,3,2,3,5]) item_id
LEFT JOIN items t on t.id=item_id

The above query select items from items table with ids: 1,2,3,2,3,5 in that order.

PostgreSQL JOIN with array type with array elements

Your tags should have a INTEGER[] type.

CREATE TABLE users(
id SERIAL,
fname VARCHAR(50),
tags INTEGER[]
);

Then,

SELECT I.id, I.fname, array_agg(J.name) 
FROM users I
LEFT JOIN tags J
ON J.id = ANY(I.tags)
GROUP BY fname,I.id ORDER BY id

should work. See sqlfiddle

This question may help.

Postgres - join on array values

not sure if this is still helpful for anyone, but unnesting the tags is quite a bit slower than letting postgres do the work directly from the array. you can rewrite the query and this is generally more performant because the g.id = ANY(tags) is a simple pkey index scan without the expansion step:

SELECT t.id, t.name, ARRAY_AGG(ARRAY[g.name, g.description])
FROM my_table AS t
LEFT JOIN tags AS g
ON g.id = ANY(tags)
GROUP BY t.id;

SELECT statement using JOIN table with array data type in PostgreSQL

PostgreSQL supports array operators you can use to resolve the identifiers to their respective order item:

SELECT
u.id,
u.name,
CASE WHEN max(o.item) IS NULL THEN NULL ELSE array_agg(o.item) END AS items
FROM tbl_users AS u
LEFT JOIN tbl_orders AS o ON (u.items @> array[o.id])
GROUP BY 1, 2;

Online demo: https://rextester.com/CZDC2260

...would return:

 id | name | items 
----+------+-------
1 | john | {A,B}
2 | kate | {A}
3 | sue |
4 | ted |
(4 rows)

Unnesting array with in Join PostgreSQL

To get each item as a row, you need to unnest your array (essentially normalizing the model on the fly):

select b.itemid, a.orderid, b.name
from table_a a
cross join unnest(a.items) as i(item)
join table_b b on i.item = b.itemid
order by b.itemid;

Online example

Postgres join on json array values

I suggest to use an ARRAY constructor in a LATERAL subquery:

SELECT p.id, j.users
FROM post p
CROSS JOIN LATERAL (
SELECT ARRAY(
SELECT u.name
FROM jsonb_array_elements_text(p.attributes->'user_ids') WITH ORDINALITY j(user_id, ord)
JOIN users u ON u.id = j.user_id::int
ORDER BY j.ord
) AS users
) j
;

db<>fiddle here

Note that null values in the array are ignored.

The CROSS JOIN never eliminates rows in this case because the ARRAY constructor always returns a row, even when the JSON array is empty or NULL.

Related:

  • How to show all dates from a certain date range in horizontal row?
  • What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
  • PostgreSQL unnest() with element number
  • Why is array_agg() slower than the non-aggregate ARRAY() constructor?
  • How to turn JSON array into Postgres array?

Dynamically cast type of array elements to match some expression type in PostgreSQL query

Better query

I want to enhance Ruby on Rails in_order_of feature which is currently implemented via unreadable CASE statement:

For starters, neither the awkward CASE construct nor array_position() are ideal solutions.

SELECT id, title, type
FROM posts
ORDER BY
array_position(ARRAY['SuperSpecial','Special','Ordinary']::varchar[], type),
published_at DESC;

There is a superior solution in Postgres:

SELECT id, title, type
FROM posts
LEFT JOIN unnest(ARRAY['SuperSpecial','Special','Ordinary']::varchar[]) WITH ORDINALITY o(type, ord) USING (type)
ORDER BY o.ord, published_at DESC;

This avoids calling the function array_position() for every row and is cheaper.

Equivalent short syntax with array literal and implicit column name:

SELECT id, title, type
FROM posts
LEFT JOIN unnest('{SuperSpecial,Special,Ordinary}'::varchar[]) WITH ORDINALITY type USING (type)
ORDER BY ordinality, published_at DESC;

db<>fiddle here

Added "benefit": it works with type-mismatch in Postgres 13 - as long as array type and column type are compatible.

The only possible caveat I can think of: If the passed array has duplicate elements, joined rows are duplicated accordingly. That wouldn't happen with array_position(). But duplicates would be nonsense for the expressed purpose in any case. Make sure to pass unique array elements.

See:

  • ORDER BY the IN value list
  • PostgreSQL unnest() with element number

Improved functionality in Postgres 14

The error you report is going away with Postgres 14:

ERROR: function array_position(text[], character varying) does not exist                                                   
HINT: No function matches the given name and argument types. You might need to add explicit type casts.

Quoting the release notes:

  • Allow some array functions to operate on a mix of compatible data
    types (Tom Lane)

    The functions array_append(), array_prepend(), array_cat(),
    array_position(), array_positions(), array_remove(),
    array_replace(), and width_bucket() now take anycompatiblearray
    instead of anyarray arguments. This makes them less fussy about
    exact matches of argument types.

And the manual on anycompatiblearray:

Indicates that a function accepts any array data type, with automatic promotion of multiple arguments to a common data type

So, while this raises the above error msg in Postgres 13:

SELECT array_position(ARRAY['a','b','c']::text[], 'd'::varchar);

.. the same just works in Postgres 14.

(Your query and error msg show flipped positions for text and varchar, but all the same.)

To be clear, calls with compatible types now just work, incompatible types still raise an exception:

SELECT array_position('{1,2,3}'::text[], 3);

(The numeric literal 3 defaults to type integer, which is incompatible with text.)

Answer to actual question

.. which may be irrelevant by now. But as proof of concept:

CREATE OR REPLACE FUNCTION posts_order_by(_order anyarray)
RETURNS SETOF posts
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE format (
$$
SELECT p.*
FROM posts p
LEFT JOIN unnest($1) WITH ORDINALITY o(type, ord) ON (o.type::%s = p.type)
ORDER BY o.ord, published_at DESC
$$
, (SELECT atttypid::regtype
FROM pg_attribute
WHERE attrelid = 'posts'::regclass
AND attname = 'type')
)
USING _order;
END
$func$;

db<>fiddle here

Doesn't make a whole lot of sense, as the type of posts.id should be well-known at the time of writing the function, but there may be special cases ...

Now both of these calls work:

SELECT * FROM posts_order_by('{SuperSpecial,Special,Ordinary}'::varchar[]);    
SELECT * FROM posts_order_by('{1,2,3}'::int[]);

Though the second typically doesn't make sense.

Related, with links to more:

  • Executing queries dynamically in PL/pgSQL

How to respect the order of an array in a PostgreSQL select sentence

For long arrays you typically get (much!) more efficient query plans with unnesting the array and joining to the main table. In simple cases, this even preserves the original order of the array without adding ORDER BY. Rows are processed in order. But there are no guarantees and the order may be broken with more joins or with parallel execution etc. To be sure, add WITH ORDINALITY:

CREATE OR REPLACE FUNCTION list_products (tid int[])  -- VARIADIC?
RETURNS TABLE (
id integer,
reference varchar,
price decimal(13,4)
)
LANGUAGE sql STABLE AS
$func$
SELECT product_id, p.reference, p.price
FROM unnest(tid) WITH ORDINALITY AS t(product_id, ord)
JOIN product p USING (product_id) -- LEFT JOIN ?
ORDER BY t.ord
$func$;

Fast, simple, safe. See:

  • PostgreSQL unnest() with element number
  • Join against the output of an array unnest without creating a temp table

You might want to throw in the modifier VARIADIC, so you can call the function with an array or a list of IDs (max 100 items by default). See:

  • Return rows matching elements of input array in plpgsql function
  • Call a function with composite type as argument from native query in jpa
  • Function to select column values by comparing against comma separated list

I would declare STABLE function volatility.

You might use LEFT JOIN instead of JOIN to make sure that all given IDs are returned - with NULL values if a row with given ID has gone missing.

db<>fiddle here

Note a subtle logic difference with duplicates in the array. While product_id is UNIQUE ...

  • unnest + left join returns exactly one row for every given ID - preserving duplicates in the given IDs if any.
  • product_id = any (tid) folds duplicates. (One of the reasons it typically results in more expensive query plans.)

If there are no dupes in the given array, there is no difference. If there can be duplicates and you want to fold them, your task is ambiguous, as it's undefined which position to keep.

Preserve the order of items in array when doing JOIN in Postgres

Use unnest() WITH ORDINALITY and the ordinality in an aggregate:

SELECT p.id AS pid, array_agg(e.code ORDER BY ordinality) AS code
FROM posts p, unnest(p.embeds) WITH ORDINALITY AS u(embed_id, ordinality)
JOIN embeds e ON e.id = embed_id
GROUP BY p.id;

Read in the documentation:

If the WITH ORDINALITY clause is specified, an additional column of type bigint will be added to the function result columns. This column numbers the rows of the function result set, starting from 1.

How to match elements in an array of composite type?

This works:

SELECT *
FROM element
WHERE (pk1, pk2, pk3) IN (SELECT (unnest(elements)).*
FROM collection
WHERE id = 1);

Or more verbose, but preferable:

SELECT *
FROM element
WHERE (pk1, pk2, pk3) IN (SELECT (e).*
FROM collection c, unnest(c.elements) e
WHERE c.id = 1);

More robust and avoids evaluating unnest() multiple times. See:

  • JOIN on set returning function results

This works, too:

SELECT *
FROM element
WHERE ROW((pk1, pk2, pk3)) IN (SELECT unnest(elements)
FROM collection
WHERE id = 1);

The core of the problem is that IN taking a subquery knows two separate forms. Quoting the manual:

expression IN (subquery)

row_constructor IN (subquery)

Your failing query resolves to the second form, while you (understandably) expect the first. But the second form does this:

The left-hand side of this form of IN is a row constructor, as
described in Section 4.2.13. The right-hand side is a
parenthesized subquery, which must return exactly as many columns as
there are expressions in the left-hand row. The left-hand expressions
are evaluated and compared row-wise to each row of the subquery
result. [...]

My first and second query make it work by decomposing the row type to the right of the operator. So Postgres has three bigint values left and right and is satisfied.

My third query makes it work by nesting the row type to the left in another row constructor. Postgres only decomposes the first level and ends up with a single composite type - matching the single composite type to the right.

Note that the keyword ROW is required for the single field we are wrapping. The manual:

The key word ROW is optional when there is more than one expression in the list.


Your working query is subtly different as it provides a list of values to the right instead of a subquery (set). That's a different implementation taking a different code path. It even gets a separate chapter in the manual. This variant has no special treatment for a ROW constructor to the left. So it just works as expected (by you).

More equivalent (working) syntax variants with = ANY:

SELECT * FROM element 
WHERE (pk1, pk2, pk3) = ANY ('{"(1,2,3)","(2,3,4)"}'::element_pk_t[]);

SELECT * FROM element
WHERE (pk1, pk2, pk3) = ANY (ARRAY[(1,2,3)::element_pk_t,(2,3,4)::element_pk_t]);

SELECT * FROM element
WHERE (pk1, pk2, pk3) = ANY (ARRAY[(1,2,3),(2,3,4)]::element[]);

Also valid with (pk1, pk2, pk3)::element_pk_t or ROW(pk1, pk2, pk3)::element_pk_t

See:

  • IN vs ANY operator in PostgreSQL

Since your source is an array, Daniel's second query with (e.pk1, e.pk2, e.pk3) = ANY(c.elements) lends itself naturally.

But for a bet on the fastest query, my money is on my second variant, because I expect it to use the PK index optimally.

Just as proof of concept. Like a_horse commented: a normalized DB design will probably scale best.



Related Topics



Leave a reply



Submit