Join on Set Returning Function Results

JOIN on set returning function results

In Postgres 9.1:

SELECT name, (f).*  -- note the parentheses!
FROM (SELECT name, calculate_payments(id) AS f FROM person) sub;

This assumes that your function has a well-defined return type with the column names (id, action, amount), and that it always returns the same id it is fed (which is redundant and might be optimized away).

The same in much more verbose form:

SELECT sub.id, sub.name, (sub.f).action, (sub.f).amount  -- parentheses!
FROM (
SELECT p.id, p.name, calculate_payments(p.id) AS f  -- column names come from the function's declared return type
FROM person p
) sub;

A set-returning function in the SELECT list produces multiple rows per input row. But that's a non-standard and somewhat quirky feature. The LATERAL join, new in pg 9.3, is preferable.

You could decompose the row type in the same step:

SELECT *, (calculate_payments(p.id)).*  -- parentheses!
FROM person p

But due to a weakness in the Postgres query planner, this would evaluate the function once per result column:

  • How to avoid multiple function evals with the (func()).* syntax in an SQL query?

Or in your case:

SELECT p.id, p.name
, (calculate_payments(p.id)).action
, (calculate_payments(p.id)).amount
FROM person p

Same problem: repeated evaluation.

In pg 9.3+, use a LATERAL join instead. This form also preserves rows from person for which the function returns 0 rows (the SRF in the SELECT list drops them):

SELECT p.id, p.name, f.action, f.amount
FROM person p
LEFT JOIN LATERAL calculate_payments(p.id) f ON true;

If you don't need those rows, a plain comma join (an implicit CROSS JOIN LATERAL) is simpler in pg 9.3+:

SELECT p.id, p.name, f.action, f.amount
FROM person p, calculate_payments(p.id) f;
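To illustrate the difference between the two LATERAL forms, here is a rough Python model of the row semantics. The `calculate_payments` stand-in and its data are hypothetical; person 2 deliberately gets no payments.

```python
def calculate_payments(person_id):
    """Hypothetical set-returning stand-in: person 2 has no payments."""
    payments = {
        1: [("charge", 100), ("refund", -20)],
        2: [],
    }
    return payments.get(person_id, [])


persons = [(1, "Ann"), (2, "Bob")]


def left_join_lateral(rows):
    # LEFT JOIN LATERAL ... ON true: keep every person, padding with None
    # when the function returns no rows.
    out = []
    for pid, name in rows:
        results = calculate_payments(pid)
        if not results:
            out.append((pid, name, None, None))
        for action, amount in results:
            out.append((pid, name, action, amount))
    return out


def cross_join_lateral(rows):
    # Comma join / CROSS JOIN LATERAL: persons without payments disappear.
    return [(pid, name, action, amount)
            for pid, name in rows
            for action, amount in calculate_payments(pid)]
```

Bob survives only in `left_join_lateral`, with None for the function's columns, just as the LEFT JOIN fills them with NULL.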

Closely related:

  • Record returned from function has columns concatenated

Trying to join on set returning function

It looks like what you want to join against are the individual objects in the array, not the whole row. So use:

SELECT m.*, obj
FROM jsontable t, jsonb_array_elements(t.response -> 'SCS') obj
JOIN message m ON m.msg_id = (obj->>'referenceId')::int AND m.status = 1;

or (a bit more readable imo)

SELECT m.*, obj
FROM jsontable t,
LATERAL jsonb_array_elements(t.response -> 'SCS') obj
JOIN message m ON m.msg_id = (obj->>'referenceId')::int
WHERE m.status = 1;
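What the join does per row can be sketched in Python. The table contents below are hypothetical, shaped after the query (a `response` document with an `SCS` array whose objects carry a `referenceId`). This models the semantics only, not how Postgres executes it.

```python
import json

# Hypothetical sample data shaped like jsontable and message.
jsontable = [
    {"response": json.dumps({"SCS": [{"referenceId": "1"}, {"referenceId": "2"}]})},
    {"response": json.dumps({"SCS": [{"referenceId": "3"}]})},
]
messages = [
    {"msg_id": 1, "status": 1},
    {"msg_id": 2, "status": 0},
    {"msg_id": 3, "status": 1},
]


def join_array_elements(json_rows, message_rows):
    out = []
    for t in json_rows:
        # jsonb_array_elements(t.response -> 'SCS'): one row per array element;
        # a missing key yields no rows, like the function applied to NULL.
        for obj in json.loads(t["response"]).get("SCS", []):
            # JOIN message m ON m.msg_id = (obj->>'referenceId')::int ... status = 1
            for m in message_rows:
                if m["msg_id"] == int(obj["referenceId"]) and m["status"] == 1:
                    out.append((m, obj))
    return out
```

Each array object is paired with the matching message, so message 2 drops out because of its status.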

Join result of set-returning function (json_array_elements) with table column

Use unnest() to get unpacked combinations:

select id, unnest(combination) cid
from user_combination;

 id | cid
----+-----
  6 |   1
  6 |   2
  9 |   2
  9 |   3
(4 rows)

Use unpacked cids to join with colors:

select u.id, color
from (
select id, unnest(combination) cid
from user_combination
) u
join colors c
on cid = c.id;

 id | color
----+--------
  6 | Blue
  6 | Yellow
  9 | Yellow
  9 | Green
(4 rows)

Use an aggregate function (e.g. json_agg()) to get joined colors aggregated for a user:

select u.id, json_agg(color)
from (
select id, unnest(combination) cid
from user_combination
) u
join colors c
on cid = c.id
group by 1;

 id | json_agg
----+---------------------
  9 | ["Yellow", "Green"]
  6 | ["Blue", "Yellow"]
(2 rows)
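The three steps (unnest, join, aggregate) can be mirrored in plain Python. The sample data is reconstructed from the outputs shown above (color ids 1, 2, 3 mapping to Blue, Yellow, Green), so treat it as an assumption.

```python
from collections import defaultdict

# Sample data reconstructed from the query outputs above.
user_combination = [(6, [1, 2]), (9, [2, 3])]
colors = {1: "Blue", 2: "Yellow", 3: "Green"}


def unnest(rows):
    # select id, unnest(combination) cid: one row per array element
    return [(uid, cid) for uid, combination in rows for cid in combination]


def join_colors(pairs, color_table):
    # join colors c on cid = c.id (inner join: unknown ids are dropped)
    return [(uid, color_table[cid]) for uid, cid in pairs if cid in color_table]


def aggregate(pairs):
    # group by id with json_agg(color): collect the colors per user
    grouped = defaultdict(list)
    for uid, color in pairs:
        grouped[uid].append(color)
    return dict(grouped)
```

Unlike this model, Postgres does not guarantee the order of elements inside json_agg() unless you add an ORDER BY inside the aggregate call.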

If combination is of type json you should use json_array_elements() in a lateral join:

select u.id, json_agg(color)
from (
select id, cid
from user_combination,
lateral json_array_elements(combination) cid
) u
join colors c
on cid::text::int = c.id
group by 1;

What is the expected behaviour for multiple set-returning functions in SELECT clause?

Postgres 10 or newer

Rows are padded with null values for the smaller set(s). Demo with generate_series():

SELECT generate_series( 1,  2) AS row2
, generate_series(11, 13) AS row3
, generate_series(21, 24) AS row4;

 row2 | row3 | row4
------+------+------
    1 |   11 |   21
    2 |   12 |   22
 null |   13 |   23
 null | null |   24
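The Postgres 10 rule behaves like zipping the sets and padding the shorter ones, which Python's `itertools.zip_longest` models directly. This is a sketch of the semantics only; `range(a, b + 1)` stands in for `generate_series(a, b)`.

```python
from itertools import zip_longest


def srfs_in_lockstep(*series):
    # Postgres 10+: advance all SRFs together; exhausted ones yield None
    # until the longest one is done.
    return list(zip_longest(*series, fillvalue=None))


rows = srfs_in_lockstep(range(1, 3), range(11, 14), range(21, 25))
```

`rows` reproduces the four rows of the demo, with None in place of null.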

The manual for Postgres 10:

If there is more than one set-returning function in the query's select
list, the behavior is similar to what you get from putting the
functions into a single LATERAL ROWS FROM( ... ) FROM-clause item. For
each row from the underlying query, there is an output row using the
first result from each function, then an output row using the second
result, and so on. If some of the set-returning functions produce
fewer outputs than others, null values are substituted for the missing
data, so that the total number of rows emitted for one underlying row
is the same as for the set-returning function that produced the most
outputs. Thus the set-returning functions run “in lockstep” until they
are all exhausted, and then execution continues with the next
underlying row.

This ends the traditionally odd behavior.

Some other details changed with this rewrite. The release notes:

  • Change the implementation of set-returning functions appearing in a query's SELECT list (Andres Freund)

    Set-returning functions are now evaluated before evaluation of scalar
    expressions in the SELECT list, much as though they had been placed
    in a LATERAL FROM-clause item. This allows saner semantics for cases
    where multiple set-returning functions are present. If they return
    different numbers of rows, the shorter results are extended to match
    the longest result by adding nulls. Previously the results were cycled
    until they all terminated at the same time, producing a number of rows
    equal to the least common multiple of the functions' periods. In
    addition, set-returning functions are now disallowed within CASE and
    COALESCE constructs.
    For more information see Section 37.4.8.

Bold emphasis mine.

Postgres 9.6 or older

The number of result rows (somewhat surprisingly!) is the least common multiple of the row counts of all sets in the same SELECT list. (It only acts like a CROSS JOIN if no two set sizes share a common divisor greater than 1, i.e. if they are pairwise coprime!) Demo:

SELECT generate_series( 1,  2) AS row2
, generate_series(11, 13) AS row3
, generate_series(21, 24) AS row4;

 row2 | row3 | row4
------+------+------
    1 |   11 |   21
    2 |   12 |   22
    1 |   13 |   23
    2 |   11 |   24
    1 |   12 |   21
    2 |   13 |   22
    1 |   11 |   23
    2 |   12 |   24
    1 |   13 |   21
    2 |   11 |   22
    1 |   12 |   23
    2 |   13 |   24
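The pre-10 cycling can be modeled the same way: every set restarts when exhausted, and output stops only when all sets end together, after LCM-many rows. A sketch assuming non-empty sets, again with `range()` standing in for `generate_series()`:

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def srfs_cycled(*series):
    # Postgres 9.6 and older: each SRF restarts when exhausted; output ends
    # when all of them finish at once, i.e. after lcm(len1, len2, ...) rows.
    # Assumes every series is non-empty.
    series = [list(s) for s in series]
    n = reduce(lcm, (len(s) for s in series))
    return [tuple(s[i % len(s)] for s in series) for i in range(n)]


rows = srfs_cycled(range(1, 3), range(11, 14), range(21, 25))
```

The helper also shows why the row count equals the product only for pairwise coprime sizes: sizes 2 and 4 yield 4 rows, not 8, and the sizes 2, 3, 4 of the demo yield 12 rows, not 24.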

This is documented in the manual for Postgres 9.6, in the chapter SQL Functions Returning Sets, along with the recommendation to avoid it:

Note: The key problem with using set-returning functions in the select
list, rather than the FROM clause, is that putting more than one
set-returning function in the same select list does not behave very
sensibly. (What you actually get if you do so is a number of output
rows equal to the least common multiple of the numbers of rows
produced by each set-returning function.) The LATERAL syntax produces
less surprising results when calling multiple set-returning functions,
and should usually be used instead.

Bold emphasis mine.

A single set-returning function is OK (but still cleaner in the FROM list), but multiple SRFs in the same SELECT list are now discouraged. This was a useful feature before we had LATERAL joins. Now it's merely historical ballast.

Related:

  • Parallel unnest() and sort order in PostgreSQL
  • Unnest multiple arrays in parallel
  • What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?

Joining with set-returning function (SRF) and access columns in SQLAlchemy

It turns out this is not directly supported by SA, but the correct behaviour can be achieved with a ColumnClause and a FunctionElement. First import this recipe as described by zzzeek in this SA issue. Then create a special unnest function that includes the WITH ORDINALITY modifier:

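from sqlalchemy.ext.compiler import compiles

# ColumnFunction comes from the recipe mentioned above; it is not part of SQLAlchemy itself.
class unnest_func(ColumnFunction):
    name = 'unnest'
    column_names = ['unnest', 'ordinality']

@compiles(unnest_func)
def _compile_unnest_func(element, compiler, **kw):
    # Render the plain function call, then append the modifier.
    return compiler.visit_function(element, **kw) + " WITH ORDINALITY"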

You can then use it in joins, ordering, etc. like this:

act_ref = unnest_func(Activity.ob_refs)
query = (query
         .add_columns(act_ref.c.unnest, act_ref.c.ordinality)
         .outerjoin(act_ref, sa.true())
         .outerjoin(Subscription, Subscription.ob_ref == act_ref.c.unnest)
         .order_by(act_ref.c.ordinality.desc()))

How do I fix a Postgres 12 Error: set-returning functions are not allowed in CASE

For some reason I wasn't able to use LATERAL, since it just generated other syntax errors (I'll have to work on that in the long run). So I solved my issue by selecting all values first and then moving the CASE higher up in the query, so that generate_series() is no longer inside a CASE expression:

SELECT other_columns, 
CASE
WHEN num_payments > 1 THEN date_2 ELSE date_1
END AS start_date
FROM(
SELECT other_columns,
start_date AS date_1,
generate_series(start_date, start_date + ((payment_interval*(num_payments-1)) || payment_interval2)::interval, (payment_interval::text || payment_interval2)::interval)::date AS date_2
FROM(
-- INNER QUERY
)a
)b

PostgreSQL function to return a join from 2 select statements

  1. You should use the plpgsql language, not sql: a LANGUAGE sql function only returns the result of its last statement
  2. Write RETURN QUERY before every SELECT
  3. Don't forget to put a semicolon at the end of each statement
  4. Read the manual :)
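Point 2 hinges on how RETURN QUERY works: it does not exit the function; it appends the query's rows to the result set, and execution continues with the next statement. The collected set is returned when the function ends. A rough Python model of a function with two RETURN QUERY statements, where the two "queries" are hypothetical callables:

```python
def function_with_two_return_queries(first_query, second_query):
    """Rough model of a PL/pgSQL body with two RETURN QUERY statements."""
    result_set = []
    # RETURN QUERY SELECT ...;  -> append rows, keep executing
    result_set.extend(first_query())
    # RETURN QUERY SELECT ...;  -> append more rows
    result_set.extend(second_query())
    # A bare RETURN (or reaching the end) hands back everything collected.
    return result_set
```

The effect is the same as a UNION ALL of both SELECTs, so both must return columns matching the function's declared return type.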

Call a set-returning function with an array argument multiple times

In Postgres 9.3 or later, it's typically best to use LEFT JOIN LATERAL ... ON true:

SELECT sub.dataid, f.*
FROM (
SELECT dataid, array_agg(data) AS arr
FROM dataset
WHERE dataid = something
GROUP BY 1
) sub
LEFT JOIN LATERAL foo(sub.arr) f ON true;

If the function foo() can return no rows, that's the safe form as it preserves all rows to the left of the join, even when no row is returned to the right.

Otherwise, or if you want to exclude rows for which the function returns no rows, use:

CROSS JOIN LATERAL foo(sub.arr)

or the shorthand:

, foo(sub.arr)

There is an explicit mention in the manual.

Craig's related answer (referenced by Daniel) is updated accordingly:

  • How to avoid multiple function evals with the (func()).* syntax in an SQL query?

PostgreSQL ERROR: set-returning functions must appear at top level of FROM

The idea behind your approach is not quite clear to me; it seems very complicated.

But as for the error you get: jsonb_array_elements() does not return a single value but many (a set of records), so it is a "set-returning function". Such a set cannot be passed directly as an argument to another function in the FROM clause; that is what "at top level" means. Within the FROM clause, a set-returning function can only appear directly as a FROM list element.


Besides this, here is how I would achieve your result:

Getting only the up sums:

SELECT 
sensor,
SUM((elems ->> 'result')::numeric) AS up_sum -- 3
FROM
mytable,
jsonb_array_elements(details) elems -- 1
WHERE elems ->> 'direction' = 'up' -- 2
GROUP BY sensor

  1. Expand the array elements into one row each
  2. Filter these elements by the direction value
  3. SUM the result values

If you want the sums for both directions, you could use conditional aggregation with the FILTER clause:

SELECT 
sensor,
SUM((elems ->> 'result')::numeric)
FILTER (WHERE elems ->> 'direction' = 'up') AS up_sum,
SUM((elems ->> 'result')::numeric)
FILTER (WHERE elems ->> 'direction' = 'down') AS down_sum
FROM
mytable,
jsonb_array_elements(details) elems
GROUP BY sensor
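Both queries follow the same expand, filter, sum pattern, sketched here in Python for the FILTER variant. The `mytable` rows are hypothetical; `Decimal` stands in for the `::numeric` cast, and a sum with no matching elements stays `None`, as SUM(...) FILTER (...) yields NULL when no row passes the filter.

```python
import json
from decimal import Decimal

# Hypothetical rows shaped like mytable(sensor, details jsonb).
mytable = [
    ("s1", json.dumps([{"direction": "up", "result": "1.5"},
                       {"direction": "down", "result": "2"},
                       {"direction": "up", "result": "3"}])),
    ("s2", json.dumps([{"direction": "down", "result": "4.25"}])),
]


def sums_by_direction(rows):
    out = {}
    for sensor, details in rows:
        # jsonb_array_elements(details): one element per array entry
        for elem in json.loads(details):
            # A sensor group exists once it has at least one array element.
            sums = out.setdefault(sensor, {"up_sum": None, "down_sum": None})
            key = {"up": "up_sum", "down": "down_sum"}.get(elem.get("direction"))
            if key is None:
                continue  # matches neither FILTER clause
            value = Decimal(str(elem["result"]))  # (elems ->> 'result')::numeric
            sums[key] = value if sums[key] is None else sums[key] + value
    return out
```

Sensor s2 has no "up" elements, so its `up_sum` stays None, mirroring the NULL from the filtered aggregate.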

Filtering set returning function results

In principle, the optimizer has no clue what a function does – the function body is a string that is handled by the call handler of the function's procedural language.

The one exception is functions written in LANGUAGE sql. If they are simple enough, and inlining them can be proven not to change the semantics of the SQL statement, the planner will inline them.

See the following comment in backend/optimizer/prep/prepjointree.c:

/*
* inline_set_returning_functions
* Attempt to "inline" set-returning functions in the FROM clause.
*
* If an RTE_FUNCTION rtable entry invokes a set-returning function that
* contains just a simple SELECT, we can convert the rtable entry to an
* RTE_SUBQUERY entry exposing the SELECT directly. This is especially
* useful if the subquery can then be "pulled up" for further optimization,
* but we do it even if not, to reduce executor overhead.
*
* This has to be done before we have started to do any optimization of
* subqueries, else any such steps wouldn't get applied to subqueries
* obtained via inlining. However, we do it after pull_up_sublinks
* so that we can inline any functions used in SubLink subselects.
*
* Like most of the planner, this feels free to scribble on its input data
* structure.
*/

There are also two instructive comments in inline_set_returning_function in backend/optimizer/util/clauses.c:

/*
* Forget it if the function is not SQL-language or has other showstopper
* properties. In particular it mustn't be declared STRICT, since we
* couldn't enforce that. It also mustn't be VOLATILE, because that is
* supposed to cause it to be executed with its own snapshot, rather than
* sharing the snapshot of the calling query. (Rechecking proretset is
* just paranoia.)
*/

and

/*
* Make sure the function (still) returns what it's declared to. This
* will raise an error if wrong, but that's okay since the function would
* fail at runtime anyway. Note that check_sql_fn_retval will also insert
* RelabelType(s) and/or NULL columns if needed to make the tlist
* expression(s) match the declared type of the function.
*
* If the function returns a composite type, don't inline unless the check
* shows it's returning a whole tuple result; otherwise what it's
* returning is a single composite column which is not what we need. (Like
* check_sql_fn_retval, we deliberately exclude domains over composite
* here.)
*/

Use EXPLAIN to see if your function is inlined.

An example where it works:

CREATE TABLE a (
"date" date NOT NULL,
other_field text NOT NULL
);

CREATE OR REPLACE FUNCTION a_at_date(date)
RETURNS TABLE ("date" date, other_field text)
LANGUAGE sql STABLE CALLED ON NULL INPUT
AS 'SELECT "date", other_field FROM a WHERE "date" = $1';

EXPLAIN (VERBOSE, COSTS off)
SELECT *
FROM a_at_date(current_date)
WHERE other_field = 'value';

QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on laurenz.a
Output: a.date, a.other_field
Filter: ((a.other_field = 'value'::text) AND (a.date = CURRENT_DATE))
(3 rows)

