PostgreSQL Reusing Computation Result in SELECT Query

PostgreSQL reusing computation result in select query

This could be an alternative you might use:

SELECT foo.c
FROM (
SELECT (a + b) AS c, t FROM table
) AS foo
WHERE foo.c < 5
AND (foo.c * foo.c + foo.t) > 100;

Note that the subquery must also select column t, since the outer WHERE clause references it.

From a performance point of view this may look suboptimal (the subquery foo has no WHERE clause of its own, so it seems to return all table records), but PostgreSQL can flatten such a simple subquery and push the outer conditions down into it, so there is usually no extra cost.
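The pattern itself is portable SQL, so it can be exercised with SQLite through Python's sqlite3. The table, columns, and data below are made up purely for illustration, with t_col standing in for the extra column t used in the outer WHERE:

```python
import sqlite3

# Illustration only: a throwaway table standing in for "table" above,
# with t_col playing the role of the extra column t.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, t_col INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, 2, 100), (2, 1, 200), (3, 9, 0)])

# (a + b) is written once in the inner query and referenced twice outside.
rows = conn.execute("""
    SELECT foo.c
    FROM (SELECT a + b AS c, t_col FROM t) AS foo
    WHERE foo.c < 5
      AND (foo.c * foo.c + foo.t_col) > 100
""").fetchall()
print(rows)  # -> [(3,), (3,)]
```

Only the two rows with c = 3 satisfy both conditions; the third row (c = 12) fails the first filter.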

PostgreSQL reusing value from long calculation in CASE statement

First of all, I would guess that the query optimizer is smart enough to spot identical deterministic expressions and avoid calculating them twice.

If this is not applicable you could use LATERAL:

SELECT *,
CASE column1
WHEN sub.long_calc THEN 10
ELSE sub.long_calc + 2 * 3.14
END AS mycalc
FROM tab t
,LATERAL (VALUES(t.a+t.b+t.c)) AS sub(long_calc);


Output:

╔═════╦══════════╦════╦════╦════╦════════════╦════════╗
║ id ║ column1 ║ a ║ b ║ c ║ long_calc ║ mycalc ║
╠═════╬══════════╬════╬════╬════╬════════════╬════════╣
║ 1 ║ 6 ║ 1 ║ 2 ║ 3 ║ 6 ║ 10 ║
║ 2 ║ 20 ║ 2 ║ 3 ║ 4 ║ 9 ║ 15.28 ║
╚═════╩══════════╩════╩════╩════╩════════════╩════════╝

You could replace VALUES with a simple SELECT or a function call:

-- any query
,LATERAL (SELECT t.a+t.b+t.c) AS sub(long_calc)
-- function
,LATERAL random() AS sub(long_calc)
-- function with parameter passing
,LATERAL sin(t.a) AS sub(long_calc)



EDIT:

SELECT id
,sub2.long_calc_rand -- calculated once
,random() AS rand -- calculated every time
FROM tab t
,LATERAL random() AS sub2(long_calc_rand);


Output:

╔═════╦═════════════════════╦════════════════════╗
║ id ║ long_calc_rand ║ rand ║
╠═════╬═════════════════════╬════════════════════╣
║ 1 ║ 0.3426254219375551 ║ 0.8861959744244814 ║
║ 2 ║ 0.3426254219375551 ║ 0.8792812027968466 ║
║ 3 ║ 0.3426254219375551 ║ 0.8123061805963516 ║
╚═════╩═════════════════════╩════════════════════╝

Reuse computed select value

Test timing

You don't see the evaluation of individual functions per row in the EXPLAIN output.

Test with EXPLAIN ANALYZE to get actual query times to compare overall effectiveness. Run a couple of times to rule out caching artifacts. For simple queries like this, you get more reliable numbers for the total runtime with:

EXPLAIN (ANALYZE, TIMING OFF) SELECT ...

Requires Postgres 9.2+. Per documentation:

TIMING

Include actual startup time and time spent in each node in the output. The overhead of repeatedly reading the system clock can slow
down the query significantly on some systems, so it may be useful to
set this parameter to FALSE when only actual row counts, and not exact
times, are needed. Run time of the entire statement is always
measured, even when node-level timing is turned off with this option.
This parameter may only be used when ANALYZE is also enabled. It
defaults to TRUE.

Prevent repeated evaluation

Normally, expressions in a subquery are evaluated once. But Postgres can collapse trivial subqueries if it thinks that will be faster.

To introduce an optimization barrier, you could use a CTE instead of the subquery. This guarantees that Postgres computes ST_SnapToGrid(geom, 50) once only. (Since PostgreSQL 12, plain CTEs can be inlined by the planner, so write WITH cte AS MATERIALIZED (...) to keep the barrier.)

WITH cte AS (
SELECT ST_SnapToGrid(geom, 50) AS geom1
FROM points
)
SELECT COUNT(*) AS n
, ST_X(geom1) AS x
, ST_Y(geom1) AS y
FROM cte
GROUP BY geom1; -- see below

However, this is probably slower than a subquery due to the additional overhead of a CTE, and the function call is probably very cheap anyway. Generally, Postgres knows best how to optimize a query plan. Only introduce such an optimization barrier if you know better.

Simplify

I changed the name of the computed point in the subquery / CTE to geom1 to clarify that it's different from the original geom. That helps to highlight the more important point here:

GROUP BY geom1

instead of:

GROUP BY x, y

That's obviously cheaper - and may have an influence on whether the function call is repeated. So, this is probably fastest:

SELECT COUNT(*) AS n
, ST_X(ST_SnapToGrid(geom, 50)) AS x
, ST_Y(ST_SnapToGrid(geom, 50)) AS y
FROM points
GROUP BY ST_SnapToGrid(geom, 50); -- same here!

Or maybe this:

SELECT COUNT(*)    AS n
, ST_X(geom1) AS x
, ST_Y(geom1) AS y
FROM (
SELECT ST_SnapToGrid(geom, 50) AS geom1
FROM points
) AS tmp
GROUP BY geom1;

Test all three with EXPLAIN ANALYZE or EXPLAIN (ANALYZE, TIMING OFF) and see for yourself. Testing >> guessing.
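ST_SnapToGrid() is a PostGIS function, so as a rough sketch of the group-by-the-computed-value idea, here is the subquery variant in SQLite via Python's sqlite3, with plain integer arithmetic standing in for the grid snap. All table names and data are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany("INSERT INTO points VALUES (?, ?)",
                 [(10, 10), (20, 30), (60, 70), (80, 90)])

# CAST(x / 50 AS INTEGER) * 50 is a crude stand-in for ST_SnapToGrid(geom, 50):
# it snaps each coordinate down to a 50-unit grid.
rows = conn.execute("""
    SELECT COUNT(*) AS n, gx AS x, gy AS y
    FROM (
        SELECT CAST(x / 50 AS INTEGER) * 50 AS gx,
               CAST(y / 50 AS INTEGER) * 50 AS gy
        FROM points
    ) AS tmp
    GROUP BY gx, gy
    ORDER BY x, y
""").fetchall()
print(rows)  # -> [(2, 0, 0), (2, 50, 50)]
```

Grouping by the snapped point itself (gx, gy) avoids re-deriving it in the GROUP BY clause, exactly as GROUP BY geom1 does above.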

How to reuse a result column in an expression for another result column

Like so:

SELECT
turnover,
cost,
turnover - cost AS profit
FROM (
SELECT
(SELECT SUM(...) FROM ...) AS turnover,
(SELECT SUM(...) FROM ...) AS cost
) AS partial_sums;
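The pattern is: compute each sum once as a scalar subquery in the select list of a derived table, then derive profit outside. A runnable sketch with SQLite via Python's sqlite3, where the sales/expenses tables and figures are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER)")
conn.execute("CREATE TABLE expenses (amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?)", [(100,), (250,)])
conn.executemany("INSERT INTO expenses VALUES (?)", [(80,), (120,)])

# Each SUM is computed once inside the derived table; the outer query
# reuses both aliases to compute profit.
row = conn.execute("""
    SELECT turnover, cost, turnover - cost AS profit
    FROM (
        SELECT (SELECT SUM(amount) FROM sales)    AS turnover,
               (SELECT SUM(amount) FROM expenses) AS cost
    ) AS partial_sums
""").fetchone()
print(row)  # -> (350, 200, 150)
```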

SQL : How to reuse count(*) computed value?

You can't reference the count alias directly in the same SELECT list; you would have to use one of these:

SELECT date_part('year'::text, c.date) AS yyyy,
to_char(c.date, 'MM'::text) AS monthnumber,
to_char(c.date, 'TMMonth'::text) AS monthname,
l.id AS lineID,
n.id AS networkID,
l.name AS lineName,
count(c.*) AS count,
count(distinct(c.date)) AS number_of_journeys,
count(c.*) / count(distinct(c.date)) AS frequentation_moyenne

OR

Select yyyy, monthnumber, monthname, lineID, networkID, lineName, count, number_of_journeys,
count / number_of_journeys AS frequentation_moyenne
from (
SELECT date_part('year'::text, c.date) AS yyyy,
to_char(c.date, 'MM'::text) AS monthnumber,
to_char(c.date, 'TMMonth'::text) AS monthname,
l.id AS lineID,
n.id AS networkID,
l.name AS lineName,
count(c.*) AS count,
count(distinct(c.date)) AS number_of_journeys
FROM ... -- same FROM, JOIN and GROUP BY clauses as the original query
) AS sub
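The derived-table variant can be exercised with SQLite via Python's sqlite3. The journeys table and data are invented, and cnt is used instead of the reserved-looking alias count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journeys (line TEXT, d TEXT)")
conn.executemany("INSERT INTO journeys VALUES (?, ?)",
                 [("A", "2024-01-01"), ("A", "2024-01-01"),
                  ("A", "2024-01-02"), ("B", "2024-01-01")])

# Both counts are computed once per group inside the derived table;
# the outer query reuses the aliases for the ratio.
rows = conn.execute("""
    SELECT line, cnt, number_of_journeys,
           CAST(cnt AS REAL) / number_of_journeys AS frequentation_moyenne
    FROM (
        SELECT line,
               COUNT(*)          AS cnt,
               COUNT(DISTINCT d) AS number_of_journeys
        FROM journeys
        GROUP BY line
    ) AS sub
    ORDER BY line
""").fetchall()
print(rows)  # -> [('A', 3, 2, 1.5), ('B', 1, 1, 1.0)]
```

The CAST to REAL avoids the integer division that COUNT / COUNT would otherwise perform.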

Is it possible to reuse scalar result from a single subquery in insert query in Postgres?

You can use INSERT . . . SELECT, basically moving the VALUES() into the FROM clause:

INSERT INTO my_table (col1, col2, computed_col)
SELECT v.col1, v.col2, x.some_col || v.computed
FROM (SELECT some_col FROM some_table WHERE id = :id
) x CROSS JOIN
(VALUES (:col1Val1, :col2val1, ARRAY[:computed_col1]::bigint[]),
(:col1Val2, :col2val2, ARRAY[:computed_col2]::bigint[])
) v(col1, col2, computed);
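The point is that the single-row scalar lookup is written once in x and cross-joined against every VALUES row. A sketch with SQLite via Python's sqlite3 (which lacks the Postgres `AS v(col1, ...)` column-alias syntax for VALUES, so a UNION ALL stands in); all tables and data are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER, some_col TEXT)")
conn.execute("CREATE TABLE my_table (col1 TEXT, col2 TEXT, computed_col TEXT)")
conn.execute("INSERT INTO some_table VALUES (1, 'prefix-')")

# The scalar lookup appears once (subquery x) and is cross-joined
# against both literal rows, instead of being repeated per row.
conn.execute("""
    INSERT INTO my_table (col1, col2, computed_col)
    SELECT v.col1, v.col2, x.some_col || v.computed
    FROM (SELECT some_col FROM some_table WHERE id = 1) AS x
    CROSS JOIN (
        SELECT 'a' AS col1, 'b' AS col2, 'x' AS computed
        UNION ALL
        SELECT 'c', 'd', 'y'
    ) AS v
""")
rows = conn.execute("SELECT * FROM my_table ORDER BY col1").fetchall()
print(rows)  # -> [('a', 'b', 'prefix-x'), ('c', 'd', 'prefix-y')]
```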

How to re-use result for SELECT, WHERE and ORDER BY clauses?

In the GROUP BY and ORDER BY clauses you can refer to column aliases (output columns) or even the ordinal numbers of SELECT list items. I quote the manual on ORDER BY:

Each expression can be the name or ordinal number of an output column (SELECT list item), or it can be an arbitrary expression formed from input-column values.

Bold emphasis mine.

But in the WHERE and HAVING clauses, you can only refer to columns from the base tables (input columns), so you have to spell out your function call.

SELECT *, earth_distance(ll_to_earth(62.0, 25.0), ll_to_earth(lat, lon)) AS distance
FROM venues
WHERE earth_distance(ll_to_earth(62.0, 25.0), ll_to_earth(lat, lon)) <= radius
ORDER BY distance;

If you want to know if it's faster to pack the calculation into a CTE or subquery, just test it with EXPLAIN ANALYZE. (I doubt it.)

SELECT *
FROM (
SELECT *
,earth_distance(ll_to_earth(62.0, 25.0), ll_to_earth(lat, lon)) AS distance
FROM venues
) x
WHERE distance <= radius
ORDER BY distance;
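earth_distance() and ll_to_earth() come from Postgres's earthdistance extension, so as a sketch of the subquery variant, here is the same shape in SQLite via Python's sqlite3, with a squared planar offset standing in for the distance call. The venues and coordinates are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE venues (name TEXT, lat REAL, lon REAL)")
conn.executemany("INSERT INTO venues VALUES (?, ?, ?)",
                 [("near", 62.5, 25.5), ("far", 40.0, 5.0)])

# Squared planar offset from (62.0, 25.0), a stand-in for earth_distance().
expr = "(lat - 62.0) * (lat - 62.0) + (lon - 25.0) * (lon - 25.0)"

# Inside subquery x the expression is spelled out once; the outer query
# sees distance as an ordinary column, usable in WHERE and ORDER BY.
rows = conn.execute(f"""
    SELECT *
    FROM (
        SELECT name, {expr} AS distance
        FROM venues
    ) AS x
    WHERE distance <= 1.0
    ORDER BY distance
""").fetchall()
print(rows)  # -> [('near', 0.5)]
```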

Like @Mike commented, by declaring a function STABLE (or IMMUTABLE) you inform the query planner that results from a function call can be reused multiple times for identical calls within a single statement. I quote the manual here:

A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a single call.

Bold emphasis mine.


