PostgreSQL JOIN with array type with array elements order, how to implement?
SELECT t.*
FROM unnest(ARRAY[1,2,3,2,3,5]) item_id
LEFT JOIN items t on t.id=item_id
The above query select items from items
table with ids: 1,2,3,2,3,5 in that order.
PostgreSQL JOIN with array type with array elements
Your tags
should have a INTEGER[]
type.
CREATE TABLE users(
id SERIAL,
fname VARCHAR(50),
tags INTEGER[]
);
Then,
SELECT I.id, I.fname, array_agg(J.name)
FROM users I
LEFT JOIN tags J
ON J.id = ANY(I.tags)
GROUP BY fname,I.id ORDER BY id
should work. See sqlfiddle
This question may help.
Postgres - join on array values
not sure if this is still helpful for anyone, but unnesting the tags is quite a bit slower than letting postgres do the work directly from the array. you can rewrite the query and this is generally more performant because the g.id = ANY(tags)
is a simple pkey index scan without the expansion step:
SELECT t.id, t.name, ARRAY_AGG(ARRAY[g.name, g.description])
FROM my_table AS t
LEFT JOIN tags AS g
ON g.id = ANY(tags)
GROUP BY t.id;
SELECT statement using JOIN table with array data type in PostgreSQL
PostgreSQL supports array operators you can use to resolve the identifiers to their respective order item:
SELECT
u.id,
u.name,
CASE WHEN max(o.item) IS NULL THEN NULL ELSE array_agg(o.item) END AS items
FROM tbl_users AS u
LEFT JOIN tbl_orders AS o ON (u.items @> array[o.id])
GROUP BY 1, 2;
Online demo: https://rextester.com/CZDC2260
...would return:
id | name | items
----+------+-------
1 | john | {A,B}
2 | kate | {A}
3 | sue |
4 | ted |
(4 rows)
Unnesting array with in Join PostgreSQL
To get each item as a row, you need to unnest your array (essentially normalizing the model on the fly):
select b.itemid, a.orderid, b.name
from table_a a
cross join unnest(a.items) as i(item)
join table_b b on i.item = b.itemid
order by b.itemid;
Online example
Postgres join on json array values
I suggest to use an ARRAY constructor in a LATERAL
subquery:
SELECT p.id, j.users
FROM post p
CROSS JOIN LATERAL (
SELECT ARRAY(
SELECT u.name
FROM jsonb_array_elements_text(p.attributes->'user_ids') WITH ORDINALITY j(user_id, ord)
JOIN users u ON u.id = j.user_id::int
ORDER BY j.ord
) AS users
) j
;
db<>fiddle here
Note that null
values in the array are ignored.
The CROSS JOIN
never eliminates rows in this case because the ARRAY constructor always returns a row, even when the JSON array is empty or NULL.
Related:
- How to show all dates from a certain date range in horizontal row?
- What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
- PostgreSQL unnest() with element number
- Why is array_agg() slower than the non-aggregate ARRAY() constructor?
- How to turn JSON array into Postgres array?
Dynamically cast type of array elements to match some expression type in PostgreSQL query
Better query
I want to enhance Ruby on Rails
in_order_of
feature which is currently implemented via unreadableCASE
statement:
For starters, neither the awkward CASE
construct nor array_position()
are ideal solutions.
SELECT id, title, type
FROM posts
ORDER BY
array_position(ARRAY['SuperSpecial','Special','Ordinary']::varchar[], type),
published_at DESC;
There is a superior solution in Postgres:
SELECT id, title, type
FROM posts
LEFT JOIN unnest(ARRAY['SuperSpecial','Special','Ordinary']::varchar[]) WITH ORDINALITY o(type, ord) USING (type)
ORDER BY o.ord, published_at DESC;
This avoids calling the function array_position()
for every row and is cheaper.
Equivalent short syntax with array literal and implicit column name:
SELECT id, title, type
FROM posts
LEFT JOIN unnest('{SuperSpecial,Special,Ordinary}'::varchar[]) WITH ORDINALITY type USING (type)
ORDER BY ordinality, published_at DESC;
db<>fiddle here
Added "benefit": it works with type-mismatch in Postgres 13 - as long as array type and column type are compatible.
The only possible caveat I can think of: If the passed array has duplicate elements, joined rows are duplicated accordingly. That wouldn't happen with array_position()
. But duplicates would be nonsense for the expressed purpose in any case. Make sure to pass unique array elements.
See:
- ORDER BY the IN value list
- PostgreSQL unnest() with element number
Improved functionality in Postgres 14
The error you report is going away with Postgres 14:
ERROR: function array_position(text[], character varying) does not exist
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
Quoting the release notes:
Allow some array functions to operate on a mix of compatible data
types (Tom Lane)The functions
array_append()
,array_prepend()
,array_cat()
,array_position()
,array_positions()
,array_remove()
,array_replace()
, andwidth_bucket()
now takeanycompatiblearray
instead ofanyarray
arguments. This makes them less fussy about
exact matches of argument types.
And the manual on anycompatiblearray
:
Indicates that a function accepts any array data type, with automatic promotion of multiple arguments to a common data type
So, while this raises the above error msg in Postgres 13:
SELECT array_position(ARRAY['a','b','c']::text[], 'd'::varchar);
.. the same just works in Postgres 14.
(Your query and error msg show flipped positions for text
and varchar
, but all the same.)
To be clear, calls with compatible types now just work, incompatible types still raise an exception:
SELECT array_position('{1,2,3}'::text[], 3);
(The numeric literal 3
defaults to type integer
, which is incompatible with text
.)
Answer to actual question
.. which may be irrelevant by now. But as proof of concept:
CREATE OR REPLACE FUNCTION posts_order_by(_order anyarray)
RETURNS SETOF posts
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE format (
$$
SELECT p.*
FROM posts p
LEFT JOIN unnest($1) WITH ORDINALITY o(type, ord) ON (o.type::%s = p.type)
ORDER BY o.ord, published_at DESC
$$
, (SELECT atttypid::regtype
FROM pg_attribute
WHERE attrelid = 'posts'::regclass
AND attname = 'type')
)
USING _order;
END
$func$;
db<>fiddle here
Doesn't make a whole lot of sense, as the type of posts.id
should be well-known at the time of writing the function, but there may be special cases ...
Now both of these calls work:
SELECT * FROM posts_order_by('{SuperSpecial,Special,Ordinary}'::varchar[]);
SELECT * FROM posts_order_by('{1,2,3}'::int[]);
Though the second typically doesn't make sense.
Related, with links to more:
- Executing queries dynamically in PL/pgSQL
How to respect the order of an array in a PostgreSQL select sentence
For long arrays you typically get (much!) more efficient query plans with unnesting the array and joining to the main table. In simple cases, this even preserves the original order of the array without adding ORDER BY
. Rows are processed in order. But there are no guarantees and the order may be broken with more joins or with parallel execution etc. To be sure, add WITH ORDINALITY
:
CREATE OR REPLACE FUNCTION list_products (tid int[]) -- VARIADIC?
RETURNS TABLE (
id integer,
reference varchar,
price decimal(13,4)
)
LANGUAGE sql STABLE AS
$func$
SELECT product_id, p.reference, p.price
FROM unnest(tid) WITH ORDINALITY AS t(product_id, ord)
JOIN product p USING (product_id) -- LEFT JOIN ?
ORDER BY t.ord
$func$;
Fast, simple, safe. See:
- PostgreSQL unnest() with element number
- Join against the output of an array unnest without creating a temp table
You might want to throw in the modifier VARIADIC
, so you can call the function with an array or a list of IDs (max 100 items by default). See:
- Return rows matching elements of input array in plpgsql function
- Call a function with composite type as argument from native query in jpa
- Function to select column values by comparing against comma separated list
I would declare STABLE
function volatility.
You might use LEFT JOIN
instead of JOIN
to make sure that all given IDs are returned - with NULL values if a row with given ID has gone missing.
db<>fiddle here
Note a subtle logic difference with duplicates in the array. While product_id
is UNIQUE
...
- unnest + left join returns exactly one row for every given ID - preserving duplicates in the given IDs if any.
product_id = any (tid)
folds duplicates. (One of the reasons it typically results in more expensive query plans.)
If there are no dupes in the given array, there is no difference. If there can be duplicates and you want to fold them, your task is ambiguous, as it's undefined which position to keep.
Preserve the order of items in array when doing JOIN in Postgres
Use unnest() WITH ORDINALITY
and the ordinality in an aggregate:
SELECT p.id AS pid, array_agg(e.code ORDER BY ordinality) AS code
FROM posts p, unnest(p.embeds) WITH ORDINALITY AS u(embed_id, ordinality)
JOIN embeds e ON e.id = embed_id
GROUP BY p.id;
Read in the documentation:
If the WITH ORDINALITY clause is specified, an additional column of type bigint will be added to the function result columns. This column numbers the rows of the function result set, starting from 1.
How to match elements in an array of composite type?
This works:
SELECT *
FROM element
WHERE (pk1, pk2, pk3) IN (SELECT (unnest(elements)).*
FROM collection
WHERE id = 1);
Or more verbose, but preferable:
SELECT *
FROM element
WHERE (pk1, pk2, pk3) IN (SELECT (e).*
FROM collection c, unnest(c.elements) e
WHERE c.id = 1);
More robust and avoids evaluating unnest()
multiple times. See:
- JOIN on set returning function results
This works, too:
SELECT *
FROM element
WHERE ROW((pk1, pk2, pk3)) IN (SELECT unnest(elements)
FROM collection
WHERE id = 1);
The core of the problem is that IN
taking a subquery knows two separate forms. Quoting the manual:
expression IN (subquery)
row_constructor IN (subquery)
Your failing query resolves to the second form, while you (understandably) expect the first. But the second form does this:
The left-hand side of this form of
IN
is a row constructor, as
described in Section 4.2.13. The right-hand side is a
parenthesized subquery, which must return exactly as many columns as
there are expressions in the left-hand row. The left-hand expressions
are evaluated and compared row-wise to each row of the subquery
result. [...]
My first and second query make it work by decomposing the row type to the right of the operator. So Postgres has three bigint
values left and right and is satisfied.
My third query makes it work by nesting the row type to the left in another row constructor. Postgres only decomposes the first level and ends up with a single composite type - matching the single composite type to the right.
Note that the keyword ROW
is required for the single field we are wrapping. The manual:
The key word
ROW
is optional when there is more than one expression in the list.
Your working query is subtly different as it provides a list of values to the right instead of a subquery (set). That's a different implementation taking a different code path. It even gets a separate chapter in the manual. This variant has no special treatment for a ROW constructor to the left. So it just works as expected (by you).
More equivalent (working) syntax variants with = ANY
:
SELECT * FROM element
WHERE (pk1, pk2, pk3) = ANY ('{"(1,2,3)","(2,3,4)"}'::element_pk_t[]);
SELECT * FROM element
WHERE (pk1, pk2, pk3) = ANY (ARRAY[(1,2,3)::element_pk_t,(2,3,4)::element_pk_t]);
SELECT * FROM element
WHERE (pk1, pk2, pk3) = ANY (ARRAY[(1,2,3),(2,3,4)]::element[]);
Also valid with (pk1, pk2, pk3)::element_pk_t
or ROW(pk1, pk2, pk3)::element_pk_t
See:
- IN vs ANY operator in PostgreSQL
Since your source is an array, Daniel's second query with (e.pk1, e.pk2, e.pk3) = ANY(c.elements)
lends itself naturally.
But for a bet on the fastest query, my money is on my second variant, because I expect it to use the PK index optimally.
Just as proof of concept. Like a_horse commented: a normalized DB design will probably scale best.
Related Topics
Row-Level Trigger VS Statement-Level Trigger
Get Count of Records Affected by Insert or Update in Postgresql
Using Like in an Oracle in Clause
How to Implement SQL Intersect and Minus Operations in Ms Access
Scope_Identity VS Ident_Current
Cannot Bulk Load. the File "C:\Data.Txt" Does Not Exist
The Alter Table Statement Conflicted with the Foreign Key Constraint
Is This a Good Way to Model Address Information in a Relational Database
Designing SQL Database to Represent Oo Class Hierarchy
How to Find Out What Is Locking My Tables
Using Bind Variables with Dynamic Select into Clause in Pl/Sql
How to Select the First N Rows of Each Group
How to Use a Dynamic Parameter in a in Clause of a JPA Named Query
Converting String List into Int List in SQL
SQL Insert Without Specifying Columns. What Happens