What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
What is a LATERAL
join?
The feature was introduced with PostgreSQL 9.3. The manual:
Subqueries appearing in
FROM
can be preceded by the key wordLATERAL
. This allows them to reference columns provided by precedingFROM
items. (WithoutLATERAL
, each subquery is evaluated
independently and so cannot cross-reference any otherFROM
item.)Table functions appearing in
FROM
can also be preceded by the key
wordLATERAL
, but for functions the key word is optional; the
function's arguments can contain references to columns provided by
precedingFROM
items in any case.
Basic code examples are given there.
More like a correlated subquery
A LATERAL
join is more like a correlated subquery, not a plain subquery, in that expressions to the right of a LATERAL
join are evaluated once for each row left of it - just like a correlated subquery - while a plain subquery (table expression) is evaluated once only. (The query planner has ways to optimize performance for either, though.)
Related answer with code examples for both side by side, solving the same problem:
- Optimize GROUP BY query to retrieve latest row per user
For returning more than one column, a LATERAL
join is typically simpler, cleaner and faster.
Also, remember that the equivalent of a correlated subquery is LEFT JOIN LATERAL ... ON true
:
- Call a set-returning function with an array argument multiple times
Things a subquery can't do
There are things that a LATERAL
join can do, but a (correlated) subquery cannot (easily). A correlated subquery can only return a single value, not multiple columns and not multiple rows - with the exception of bare function calls (which multiply result rows if they return multiple rows). But even certain set‑returning functions are only allowed in the FROM
clause. Like unnest()
with multiple parameters in Postgres 9.4 or later. The manual:
This is only allowed in the
FROM
clause;
So this works, but cannot (easily) be replaced with a subquery:
CREATE TABLE tbl (a1 int[], a2 int[]);
SELECT * FROM tbl, unnest(a1, a2) u(elem1, elem2); -- implicit LATERAL
The comma (,
) in the FROM
clause is short notation for CROSS JOIN
.LATERAL
is assumed automatically for table functions.
About the special case of UNNEST( array_expression [, ... ] )
:
- How do you declare a set-returning-function to only be allowed in the FROM clause?
Set-returning functions in the SELECT
list
You can also use set-returning functions like unnest()
in the SELECT
list directly. This used to exhibit surprising behavior with more than one such function in the same SELECT
list up to Postgres 9.6. But it has finally been sanitized with Postgres 10 and is a valid alternative now (even if not standard SQL). See:
- What is the expected behaviour for multiple set-returning functions in SELECT clause?
Building on above example:
SELECT *, unnest(a1) AS elem1, unnest(a2) AS elem2
FROM tbl;
Comparison:
dbfiddle for pg 9.6 here
dbfiddle for pg 10 here
Clarify misinformation
The manual:
For the
INNER
andOUTER
join types, a join condition must be
specified, namely exactly one ofNATURAL
,ON
join_condition,
orUSING
(join_column [, ...]). See below for the meaning.
ForCROSS JOIN
, none of these clauses can appear.
So these two queries are valid (even if not particularly useful):
SELECT *
FROM tbl t
LEFT JOIN LATERAL (SELECT * FROM b WHERE b.t_id = t.t_id) t ON TRUE;
SELECT *
FROM tbl t, LATERAL (SELECT * FROM b WHERE b.t_id = t.t_id) t;
While this one is not:
SELECT *
FROM tbl t
LEFT JOIN LATERAL (SELECT * FROM b WHERE b.t_id = t.t_id) t;
That's why Andomar's code example is correct (the CROSS JOIN
does not require a join condition) and Attila's is was not.
Optimize slow query when multiple JOIN LATERAL are present
The main issue is that the query is wrong:
WHERE
qa.quiz_id = q.quiz_id
OR ts.tag_id = 32
The OR
is misplaced here and must be introduced after aggregating the right rows. This WHERE
clause includes all rows with either a matching quiz_id
or tag_id = 32
. So all rows, which is nonsense.
In addition to that, you can conflate multiple LATERAL
subqueries with conditional aggregates like this:
SELECT count(*)
FROM subscriptions q
JOIN LATERAL (
SELECT sum(ts.score) FILTER (WHERE ts.tag_id = 21) AS sum_score21
, sum(ts.score) FILTER (WHERE ts.tag_id = 32) AS sum_score32
, sum(ts.score) FILTER (WHERE ts.tag_id = 35) AS sum_score35
-- , more?
FROM quiz_answers qa
JOIN answers a ON a.id = qa.answer_id
JOIN tag_scores ts ON ts.answer_id = a.id
WHERE qa.quiz_id = q.quiz_id
AND ts.tag_id IN (21, 32, 35) -- more?
) AS t ON t.sum_score21 <= 1
OR t.sum_score32 <= 1
OR t.sum_score35 <= 1
-- AND / OR more?
WHERE q.state = 'subscribed'
AND q.app_id = 4;
Be aware of operator precedence when adding more conditions with AND
or OR
: You may need parentheses as AND
binds before OR
.
About aggregated FILTER
:
- Aggregate columns with additional (distinct) filters
A multicolumn index on subscriptions(app_id, state, quizz_id)
might help (give you index-only scans). But since the table isn't that big, that's not important.
LATERAL
(rather than a plain subquery) still makes sense while the outer filter eliminates most rows from table subscriptions
. A multicolumn index on tag_scores(answer_id, tag_id)
might help.
With more tags in the subquery and/or more subscriptions, the LATERAL
variant and the usefulness of said index fall off.
For comparison, here is a variant with plain subquery:
SELECT count(*)
FROM subscriptions q
JOIN (
SELECT qa.quiz_id
, sum(ts.score) FILTER (WHERE ts.tag_id = 21) AS sum_score21
, sum(ts.score) FILTER (WHERE ts.tag_id = 32) AS sum_score32
, sum(ts.score) FILTER (WHERE ts.tag_id = 35) AS sum_score35
-- , more?
FROM quiz_answers qa
JOIN answers a ON a.id = qa.answer_id
JOIN tag_scores ts ON ts.answer_id = a.id
WHERE ts.tag_id IN (21, 32, 35) -- more?
GROUP BY qa.quiz_id
) AS t USING (quiz_id)
WHERE q.state = 'subscribed'
AND q.app_id = 4
AND (t.sum_score21 <= 1
OR t.sum_score32 <= 1
OR t.sum_score35 <= 1)
-- AND / OR more?
;
Either way, t.sum_score21 <= 1
qualifies if ...
- at least one row with tag 21 is associated
- and the summed score of all with tag 21 for the same quiz is <= 1
Seems like a pretty narrow filter.
Denoise:
If the row in answers
is never missing (referential integrity enforced with FK constraints?), you can cut out the middleman here:
FROM quiz_answers qa
JOIN answers a ON a.id = qa.answer_id
JOIN tag_scores ts ON ts.answer_id = a.id
-->
FROM quiz_answers qa
JOIN tag_scores ts USING (answer_id)
Postgresql - LEFT JOIN LATERAL is too slow than subquery
Why not just use a join
and group by
?
SELECT AA.ID, COUNT(B.ID) as no_tx, min(B.DATE) as fday_tx, max(B.DATE) as lday_tx,
AA.start_date, AA.end_date
FROM (SELECT ID, min(DATE) as start_date, max(DATE) as end_date
FROM MAIN_TABLE
WHERE CODE = 'drugA'
GROUP BY ID
) AA LEFT JOIN
MAIN_TABLE b
ON b.CODE = 'drugB' AND b.DATE > AA.start_date AND b.DATE < AA.end_date
GROUP BY AA.ID, AA.start_date, AA.end_date;
Or, perhaps more efficiently, window functions:
SELECT ID, SUM(CASE WHEN code = 'drugB' THEN 1 ELSE 0 END) as no_tx,
MIN(CASE WHEN code = 'drugB' THEN DATE END) as fday_tx,
MIN(CASE WHEN code = 'drugB' THEN DATE END) as lday_tx,
start_date, end_date
FROM (SELECT t.*,
MIN(CASE WHEN code = 'drugA' THEN date END) as start_date,
MAX(CASE WHEN code = 'drugB' THEN date END) as end_date
FROM MAIN_TABLE t
) t
WHERE code in ('drugA', 'drugB') AND
date between start_date and end_date
GROUP BY t.id;
How to optimize sql query with subqueries, perhaps by lateral join?
It is hard to reproduce the logic of your query without any knowledges about your DB but I will try, so be patient:
SELECT r0."id", r0."name"
FROM "hiking"."routes" AS r0
INNER JOIN "hiking"."hierarchy" AS h1 ON r0."id" = h1."parent"
WHERE
EXISTS (
SELECT 1
FROM "hiking"."segments" AS s0
WHERE (
ST_Intersects(
s0."geom",
ST_SetSrid(ST_MakeBox2D(ST_GeomFromText('POINT(1285982.015631 7217169.814674)', -1), ST_GeomFromText('POINT(2371999.313507 6454022.524275)', -1)),
3857)))
AND array[h1."child"] <@ s0."rels");
There are two points:
- Filtering data by
EXISTS
orNOT EXISTS
sometimes faster then by joining - Instead of unnesting array field to compare its elements with some value you could to use the array comparison operator(s). Having appropriate GIN index it is much faster (docs here and here).
Here is simple example how to use indexes on arrays and how its faster:
create table foo(bar int[]);
insert into foo(bar) select array[1,2,3,x] from generate_series(1,1000000) as x;
create index idx on foo using gin (bar); // Note this
select * from foo where 666 in (select unnest(bar)); // 6936,345 ms on my HW
select * from foo where array[666] <@ bar; // 45,524 ms
Related Topics
How Can Prepared Statements Protect from SQL Injection Attacks
Search All Fields in All Tables For a Specific Value (Oracle)
Need to Return Two Sets of Data With Two Different Where Clauses
Null in MySQL (Performance & Storage)
Delete Duplicate Rows from Small Table
Create a Date from Day Month and Year With T-Sql
T-Sql: Deleting All Duplicate Rows But Keeping One
How to Implement Limit With SQL Server
How to Perform an If...Then in an SQL Select
Recommended SQL Database Design For Tags or Tagging
Why Does Null = Null Evaluate to False in SQL Server
SQL Server 2012 Column Identity Increment Jumping from 6 to 1000+ on 7Th Entry