How to Avoid Multiple Function Evals With the (Func()).* Syntax in a Query

How to avoid multiple function evals with the (func()).* syntax in a query?

You can wrap it in a subquery, but that's not guaranteed safe without the OFFSET 0 hack. In PostgreSQL 9.3 or later, use LATERAL. The problem is caused by the parser effectively macro-expanding * into a column list.

Workaround

Where:

SELECT (my_func(x)).* FROM some_table;

will evaluate my_func n times for n result columns from the function, this formulation:

SELECT (mf).* FROM (
SELECT my_func(x) AS mf FROM some_table
) sub;

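generally will not, and tends not to add an additional scan at runtime. To guarantee that multiple evaluation won't be performed you can use the OFFSET 0 hack or abuse PostgreSQL's failure to optimise across CTE boundaries (PostgreSQL 12 and later can inline side-effect-free CTEs that are referenced once, so there the fence is only guaranteed with AS MATERIALIZED):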

SELECT (mf).* FROM (
SELECT my_func(x) AS mf FROM some_table OFFSET 0
) sub;

or:

WITH tmp(mf) AS (
SELECT my_func(x) FROM some_table
)
SELECT (mf).* FROM tmp;
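For PostgreSQL 12 or later, here is a minimal sketch of the CTE variant with an explicit MATERIALIZED fence. The names cte_demo_calls and cte_demo_func are hypothetical stand-ins (a scalar function with OUT parameters plays the role of my_func), and the sequence is only there to count invocations:

```sql
-- Hypothetical helper objects, not part of the original answer.
CREATE SEQUENCE cte_demo_calls;

CREATE FUNCTION cte_demo_func(x integer, OUT a integer, OUT b integer) AS $$
BEGIN
    PERFORM nextval('cte_demo_calls');  -- one tick per invocation
    a := x;
    b := x * 2;
END;
$$ LANGUAGE plpgsql;

-- MATERIALIZED (PostgreSQL 12+) keeps the CTE as an optimisation fence,
-- so the function runs once per input row and (mf).* only picks fields.
WITH tmp(mf) AS MATERIALIZED (
    SELECT cte_demo_func(x) FROM generate_series(1, 5) AS x
)
SELECT (mf).* FROM tmp;

-- Number of function calls so far: one per input row.
SELECT last_value AS calls FROM cte_demo_calls;
```

On versions before 12 the MATERIALIZED keyword does not exist; there a plain CTE already acts as a fence.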

In PostgreSQL 9.3 or later you can use LATERAL to get saner behaviour:

SELECT mf.*
FROM some_table
LEFT JOIN LATERAL my_func(some_table.x) AS mf ON true;

LEFT JOIN LATERAL ... ON true retains all rows like the original query, even if the function call returns no row.

Demo

Create a function that isn't inlineable as a demonstration:

CREATE OR REPLACE FUNCTION my_func(integer)
RETURNS TABLE(a integer, b integer, c integer) AS $$
BEGIN
RAISE NOTICE 'my_func(%)',$1;
RETURN QUERY SELECT $1, $1, $1;
END;
$$ LANGUAGE plpgsql;

and a table of dummy data:

CREATE TABLE some_table AS SELECT x FROM generate_series(1,10) x;

Then try the above versions. You'll see that the first raises three notices per invocation; the others raise only one.
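If counting notices by eye gets tedious, one option is to let a sequence do the counting. This is only a sketch: call_counter, counted_func and counted_table are hypothetical names, and the function is a copy of the demo function with the RAISE NOTICE swapped for nextval():

```sql
-- Hypothetical helper objects, not part of the original demo.
CREATE SEQUENCE call_counter;

CREATE FUNCTION counted_func(integer)
RETURNS TABLE(a integer, b integer, c integer) AS $$
BEGIN
    PERFORM nextval('call_counter');  -- one tick per invocation
    RETURN QUERY SELECT $1, $1, $1;
END;
$$ LANGUAGE plpgsql;

CREATE TABLE counted_table AS SELECT x FROM generate_series(1,10) x;

-- LATERAL form: the function is called once per input row.
SELECT mf.*
FROM counted_table
LEFT JOIN LATERAL counted_func(counted_table.x) AS mf ON true;

-- Number of function calls so far.
SELECT last_value AS calls FROM call_counter;
```

To compare formulations, run ALTER SEQUENCE call_counter RESTART; between queries and read last_value again after each one.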

Why?

Good question. It's horrible.

It looks like:

(func(x)).*

is expanded as:

(func(x)).i, (func(x)).j, (func(x)).k, (func(x)).l

in parsing, according to a look at debug_print_parse, debug_print_rewritten and debug_print_plan. The (trimmed) parse tree looks like this:

   :targetList (
{TARGETENTRY
:expr
{FIELDSELECT
:arg
{FUNCEXPR
:funcid 57168
...
}
:fieldnum 1
:resulttype 23
:resulttypmod -1
:resultcollid 0
}
:resno 1
:resname i
...
}
{TARGETENTRY
:expr
{FIELDSELECT
:arg
{FUNCEXPR
:funcid 57168
...
}
:fieldnum 2
:resulttype 20
:resulttypmod -1
:resultcollid 0
}
:resno 2
:resname j
...
}
{TARGETENTRY
:expr
{FIELDSELECT
:arg
{FUNCEXPR
:funcid 57168
...
}
:fieldnum 3
:...
}
:resno 3
:resname k
...
}
{TARGETENTRY
:expr
{FIELDSELECT
:arg
{FUNCEXPR
:funcid 57168
...
}
:fieldnum 4
...
}
:resno 4
:resname l
...
}
)

So basically, we're using a dumb parser hack to expand wildcards by cloning nodes.
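To look at the expansion yourself, something along these lines should work in a psql session. It is a sketch that assumes my_func and some_table from the demo above already exist; the settings are standard PostgreSQL debugging parameters:

```sql
-- Assumes my_func and some_table from the demo above already exist.
SET debug_print_parse = on;     -- dump the parse tree of each statement
SET debug_pretty_print = on;    -- indent the dump
SET client_min_messages = log;  -- send LOG-level output to the client as well

SELECT (my_func(x)).* FROM some_table;

RESET debug_print_parse;
RESET debug_pretty_print;
RESET client_min_messages;
```

The dump should show one TARGETENTRY per result column, each with its own copy of the FUNCEXPR node.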

select function() in postgresql makes too much calls to function()

This is a known issue.

SELECT (f(x)).*

is macro-expanded at parse-time into

SELECT (f(x)).a, (f(x)).b, ...

and PostgreSQL doesn't coalesce multiple calls to the same function down to a single call.

To avoid the issue you can wrap it in another layer of subquery so that the macro-expansion occurs on a simple reference to the function's result rather than the function invocation:

SELECT i, (f).*
FROM (
SELECT i, foo(i) f FROM generate_series(1,2) AS i
) x(i, f);

or use a lateral call in the FROM clause, which is preferred for newer versions:

SELECT i, f.*
FROM generate_series(1,2) AS i
CROSS JOIN LATERAL foo(i) f;

The CROSS JOIN LATERAL may be omitted, using legacy comma joins and an implicit lateral join, but I find it considerably clearer to include it, especially when you're mixing other join types.

Call a set-returning function with an array argument multiple times

In Postgres 9.3 or later, it's typically best to use LEFT JOIN LATERAL ... ON true:

SELECT sub.dataid, f.*
FROM (
SELECT dataid, array_agg(data) AS arr
FROM dataset
WHERE dataid = something
GROUP BY 1
) sub
LEFT JOIN LATERAL foo(sub.arr) f ON true;

If the function foo() can return no rows, that's the safe form as it preserves all rows to the left of the join, even when no row is returned to the right.

Else, or if you want to exclude rows without result from the lateral join, use:

CROSS JOIN LATERAL foo(sub.arr)

or the shorthand:

, foo(sub.arr)

There is an explicit mention in the manual.
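A small sketch of the difference between the join forms, using a hypothetical function maybe_rows() that returns a row only for even input (Postgres 9.3 or later):

```sql
-- Hypothetical function: returns one row for even n, no row for odd n.
CREATE FUNCTION maybe_rows(n integer)
RETURNS TABLE(a integer, b integer) AS $$
    SELECT n, n * 2 WHERE n % 2 = 0;
$$ LANGUAGE sql;

-- LEFT JOIN LATERAL ... ON true keeps every left row:
-- 4 rows; a and b are NULL for x = 1 and x = 3.
SELECT x, f.a, f.b
FROM generate_series(1, 4) AS x
LEFT JOIN LATERAL maybe_rows(x) AS f ON true;

-- CROSS JOIN LATERAL drops left rows without a match:
-- 2 rows, for x = 2 and x = 4.
SELECT x, f.a, f.b
FROM generate_series(1, 4) AS x
CROSS JOIN LATERAL maybe_rows(x) AS f;

-- The comma shorthand gives the same 2 rows as CROSS JOIN LATERAL.
SELECT x, f.a, f.b
FROM generate_series(1, 4) AS x, maybe_rows(x) AS f;
```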

Craig's related answer (referenced by Daniel) is updated accordingly:

  • How to avoid multiple function evals with the (func()).* syntax in an SQL query?

Record returned from function has columns concatenated

Generally, to decompose rows returned from a function and get individual columns:

SELECT * FROM account_servicetier_for_day(20424, '2014-08-12');



As for the query:

Postgres 9.3 or newer

Cleaner with JOIN LATERAL:

SELECT '2014-08-12' AS day, 0 AS inbytes, 0 AS outbytes
, a.username, a.accountid, a.userid
, f.* -- but avoid duplicate column names!
FROM account_tab a
, account_servicetier_for_day(a.accountid, '2014-08-12') f -- <-- HERE
WHERE a.isdsl = 1
AND a.dslservicetypeid IS NOT NULL
AND NOT EXISTS (
SELECT FROM dailyaccounting_tab
WHERE day = '2014-08-12'
AND accountid = a.accountid
)
ORDER BY a.username;

The LATERAL keyword is implicit here, since functions can always refer to earlier FROM items. The manual:

LATERAL can also precede a function-call FROM item, but in this
case it is a noise word, because the function expression can refer to
earlier FROM items in any case.

Related:

  • Insert multiple rows in one table based on number in another table

Short notation with a comma in the FROM list is (mostly) equivalent to a CROSS JOIN LATERAL (same as [INNER] JOIN LATERAL ... ON TRUE) and thus removes rows from the result where the function call returns no row. To retain such rows, use LEFT JOIN LATERAL ... ON TRUE:

...
FROM account_tab a
LEFT JOIN LATERAL account_servicetier_for_day(a.accountid, '2014-08-12') f ON TRUE
...

Also, don't use NOT IN (subquery) when you can avoid it. It's the slowest and trickiest of several ways to do that:

  • Select rows which are not present in other table

I suggest NOT EXISTS instead.

Postgres 9.2 or older

You can call a set-returning function in the SELECT list (which is a Postgres extension of standard SQL). For performance reasons, this is best done in a subquery. Decompose the (well-known!) row type in the outer query to avoid repeated evaluation of the function:

SELECT '2014-08-12' AS day, 0 AS inbytes, 0 AS outbytes
, a.username, a.accountid, a.userid
, (a.rec).* -- but be wary of duplicate column names!
FROM (
SELECT *, account_servicetier_for_day(a.accountid, '2014-08-12') AS rec
FROM account_tab a
WHERE a.isdsl = 1
AND a.dslservicetypeid IS NOT NULL
AND NOT EXISTS (
SELECT FROM dailyaccounting_tab
WHERE day = '2014-08-12'
AND accountid = a.accountid
)
) a
ORDER BY a.username;

Related answer by Craig Ringer, explaining why it's better not to decompose on the same query level:

  • How to avoid multiple function evals with the (func()).* syntax in an SQL query?

Postgres 10 removed some oddities in the behavior of set-returning functions in the SELECT:

  • What is the expected behaviour for multiple set-returning functions in SELECT clause?

Split function-returned record into multiple columns

Postgres 9.3 or later

Best solved with a LATERAL join:

SELECT *
FROM actors a
JOIN movies_actors ma on a.actor_id = ma.movie_id
LEFT JOIN LATERAL hi_lo(a.actor_id, length(a.name), ma.movie_id) x ON true
LIMIT 10;

This avoids repeated evaluation of the function for each column in the output. (The function does have to be called for each input row either way.) See:

  • How to avoid multiple function evals with the (func()).* syntax in an SQL query?

Use LEFT JOIN LATERAL ... ON true to avoid dropping rows from the left side if the function to the right returns no row. See:

  • What is the difference between LATERAL JOIN and a subquery in PostgreSQL?

Addressing your comment:

only the expanded columns produced by the function call

SELECT x.*  -- that's all!
FROM actors a
JOIN movies_actors ma on a.actor_id = ma.movie_id
LEFT JOIN LATERAL hi_lo(a.actor_id, length(a.name), ma.movie_id) x ON true
LIMIT 10;

But since you don't care about other columns, you can simplify to:

SELECT x.*
FROM actors a
JOIN movies_actors ma on a.actor_id = ma.movie_id
, hi_lo(a.actor_id, length(a.name), ma.movie_id) x
LIMIT 10;

Which is an implicit CROSS JOIN LATERAL. If the function can actually return "no row" occasionally, the result can be different: we don't get NULL values for the rows, those rows are just eliminated - and LIMIT does not count them any more.

Older versions (or generally)

You can also just decompose the composite type with the right syntax:

SELECT *, (hi_lo(a.actor_id, length(a.name), ma.movie_id)).*  -- note extra parentheses!
FROM actors a
JOIN movies_actors ma on a.actor_id = ma.movie_id
LIMIT 10;

The drawback is that the function is evaluated once for each column in the function output due to the weakness in the Postgres query planner mentioned at the top. Better move the call into a subquery or CTE and decompose the row type in the outer SELECT. Like:

SELECT actor_id, movie_id, (x).*  -- explicit column names for the rest
FROM (
SELECT *, hi_lo(a.actor_id, length(a.name), ma.movie_id) AS x
FROM actors a
JOIN movies_actors ma on a.actor_id = ma.movie_id
LIMIT 10
) sub;

But you have to name individual columns and can't get away with SELECT * unless you are ok with the row type in the result.

Related:

  • Avoid multiple calls on same function when expanding composite result

Avoid multiple calls on same function when expanding composite result

A CTE is not even necessary. A plain subquery does the job as well (tested with pg 9.3):

SELECT i, (f).*                     -- decompose here
FROM (
SELECT i, (slow_func(i)) AS f -- do not decompose here
FROM generate_series(1, 3) i
) sub;

Be sure not to decompose the composite result of the function in the subquery. Reserve that for the outer query.

Requires a well known type, of course. Would not work with anonymous records.

Or, as @Richard wrote, a LATERAL JOIN works, too. The syntax can be simpler:

SELECT * FROM generate_series(1, 3) i, slow_func(i) f;

  • LATERAL is applied implicitly in Postgres 9.3 or later.
  • A function can stand on its own in the FROM clause; it doesn't have to be wrapped in an additional sub-select. Just imagine a table in its place.

SQL Fiddle with EXPLAIN VERBOSE output for all variants. You can see multiple evaluation of the function if it happens.

COST setting

Generally (should not matter for this particular query), make sure to apply a high cost setting to your function, so the planner knows to avoid evaluating it more often than necessary. Like:

CREATE OR REPLACE FUNCTION slow_func(int)
RETURNS result_t AS
$func$
-- expensive body
$func$ LANGUAGE sql IMMUTABLE COST 100000;

Per documentation:

Larger values cause the planner to try to avoid evaluating the function more often than necessary.
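The cost can also be changed on an existing function without redefining it. A minimal sketch, with demo_slow as a hypothetical stand-in for your expensive function:

```sql
-- Hypothetical function standing in for an expensive one.
CREATE FUNCTION demo_slow(n integer) RETURNS integer AS $$
    SELECT n * 2;
$$ LANGUAGE sql IMMUTABLE;

-- Raise the planner's cost estimate for an existing function.
ALTER FUNCTION demo_slow(integer) COST 100000;

-- The current setting is stored in pg_proc.procost.
SELECT proname, procost
FROM pg_proc
WHERE oid = 'demo_slow(integer)'::regprocedure;
```

The number is in units of cpu_operator_cost and is only an estimate the planner works with; pick whatever roughly reflects how expensive the function is relative to others.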

JOIN on set returning function results

In Postgres 9.1:

SELECT name, (f).*  -- note the parentheses!
FROM (SELECT name, calculate_payments(id) AS f FROM person) sub;

This assumes that your function has a well-defined return type with column names (id, action, amount), and that it always returns the same id it is fed (which is redundant and might be optimized).

The same in much more verbose form:

SELECT sub.id, sub.name, (sub.f).action, (sub.f).amount  -- parentheses!
FROM (
SELECT p.id, p.name, calculate_payments(p.id) AS f  -- f has columns (id, action, amount)
FROM person p
) sub;

Set-returning functions in the SELECT list result in multiple rows. But that's a non-standard and somewhat quirky feature. The new LATERAL feature in pg 9.3+ is preferable.

You could decompose the row type in the same step:

SELECT *, (calculate_payments(p.id)).*  -- parentheses!
FROM person p

But due to a weakness in the Postgres query planner, this would evaluate the function once per result column:

  • How to avoid multiple function evals with the (func()).* syntax in an SQL query?

Or in your case:

SELECT p.id, p.name
, (calculate_payments(p.id)).action
, (calculate_payments(p.id)).amount
FROM person p

Same problem: repeated evaluation.

To be precise, the equivalent of the solution in pg 9.3+ is this, preserving rows in the result where the function returns 0 rows:

SELECT p.id, p.name, f.action, f.amount
FROM person p
LEFT JOIN LATERAL calculate_payments(p.id) f ON true;

If you don't care about this, you can simplify in pg 9.3+:

SELECT p.id, p.name, f.action, f.amount
FROM person p, calculate_payments(p.id) f;

Closely related:

  • Record returned from function has columns concatenated

Pass array variable as parameter to another function

You could fix your function like this:

CREATE OR REPLACE FUNCTION function_1(in_account int
, OUT out_code1 int
, OUT out_message1 text)
RETURNS RECORD AS
$func$
DECLARE
counter int := 1; -- use int and initialize
inv_cur record;
invoice_list text[];
amount_list numeric[];
BEGIN

FOR inv_cur IN
SELECT "ID", "AMOUNT"
FROM "INVOICE"
WHERE "ACCOUNT_ID" = in_account -- !!!
ORDER BY "ID" -- don't you care about sort order?
LOOP
--Adding invoices to invoice array
invoice_list[counter] := inv_cur."ID";

--Adding amounts to amount array
amount_list[counter] := inv_cur."AMOUNT";

--Increasing counter for array indexes
counter := counter + 1;
END LOOP;

-- Calling other function
SELECT f.outfield_1, f.outfield_2 -- replace with actual names!
INTO out_code1, out_message1
FROM function_2(invoice_list, amount_list) f
;

END
$func$ LANGUAGE plpgsql VOLATILE;

Notes

  • The function variable counter must be initialized or it's set to NULL.

  • in_account is interpreted as column name of table "INVOICE" (which it probably isn't). Seems a function parameter is missing.

  • Replace my added ORDER BY "ID" with your actual desired sort order.

  • You need to assign the result of your final SELECT.

  • invoice_list and amount_list are arrays already. Only wrap them into another ARRAY layer if you want to add another array dimension (I doubt that.)

Now the function should work. It's still expensive nonsense ...

Array handling the way you do it is very expensive. Looping is expensive, too. Replace function_1() with this query:

SELECT (f).* -- f is a composite column here, so parentheses are required
FROM (
SELECT function_2(array_agg("ID"), array_agg("AMOUNT")) AS f
FROM (
SELECT "ID", "AMOUNT"
FROM "INVOICE"
WHERE "ACCOUNT_ID" = in_account -- your input here!
ORDER BY "ID"
) t1
) t2;

You could make do with a single query level:

SELECT (function_2(array_agg("ID" ORDER BY "ID")
, array_agg("AMOUNT" ORDER BY "ID"))).*
FROM "INVOICE"
WHERE "ACCOUNT_ID" = in_account;

But performance would be much worse. The version with subqueries has to sort only once and also calls the function only once:

  • How to avoid multiple function evals with the (func()).* syntax in an SQL query?

You can wrap that into an SQL function if need be.
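A sketch of what that wrapper could look like. The table definition and the body of function_2() are hypothetical stand-ins, since neither is shown here; only the shape of function_1() is the point. It uses the subquery form with (f).* so that function_2() is called once:

```sql
-- Hypothetical stand-ins for the real table and function_2().
CREATE TABLE "INVOICE" ("ID" int, "ACCOUNT_ID" int, "AMOUNT" numeric);
INSERT INTO "INVOICE" VALUES (1, 7, 10.5), (2, 7, 4.5), (3, 8, 99);

CREATE FUNCTION function_2(ids int[], amounts numeric[]
                         , OUT out_code int, OUT out_message text) AS $$
    SELECT array_length(ids, 1)
         , 'total ' || (SELECT sum(a) FROM unnest(amounts) AS a)::text;
$$ LANGUAGE sql;

-- The query from above, wrapped into an SQL function.
CREATE FUNCTION function_1(in_account int
                         , OUT out_code1 int, OUT out_message1 text) AS
$func$
SELECT (f).*  -- decompose in the outer query only
FROM (
    SELECT function_2(array_agg("ID"), array_agg("AMOUNT")) AS f
    FROM (
        SELECT "ID", "AMOUNT"
        FROM "INVOICE"
        WHERE "ACCOUNT_ID" = in_account
        ORDER BY "ID"
    ) t1
) t2
$func$ LANGUAGE sql;

-- With the sample data, account 7 has two invoices, so out_code1 is 2.
SELECT * FROM function_1(7);
```

Referencing in_account by name inside an SQL function needs Postgres 9.2 or later; on older versions use $1 instead.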


