Join Statement Order of Operation

Join statement order of operation

The placement of the ON clauses controls the logical order of evaluation.

So first the t1 LEFT JOIN t2 ON t1.fk = t2.pk happens. The result of this join is a virtual table containing all the matching rows from t1, t2 and (because it is a left outer join) any non matched t1 rows are also preserved with null values for the t2 columns.

This virtual table then participates in the next join. JOIN t3 ON t2.fk = t3.pk

Any t2 records that do not match rows in t1 are not part of the virtual table output from the first stage so won't appear in the final result. Additionally this inner join on t2.fk = t3.pk will lose any NULL values of t2.fk effectively turning your whole thing back into inner joins.

Logical Query Processing is explained well by Itzik Ben Gan here

WHERE and JOIN order of operation

this would depend on many many things (table size, index, key distribution, etc), you should just check the execution plan:

you don't say which database, but here are some ways:

MySql EXPLAIN

SQL Server SET SHOWPLAN_ALL (Transact-SQL)

Oracle EXPLAIN PLAN

what is explain in teradata?

Teradata Capture and compare plans faster with Visual Explain and XML plan logging

Does the join order matter in SQL?

For INNER joins, no, the order doesn't matter. The queries will return same results, as long as you change your selects from SELECT * to SELECT a.*, b.*, c.*.


For (LEFT, RIGHT or FULL) OUTER joins, yes, the order matters - and (updated) things are much more complicated.

First, outer joins are not commutative, so a LEFT JOIN b is not the same as b LEFT JOIN a

Outer joins are not associative either, so in your examples which involve both (commutativity and associativity) properties:

a LEFT JOIN b 
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id

is equivalent to:

a LEFT JOIN c 
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id

but:

a LEFT JOIN b 
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id
AND c.bc_id = b.bc_id

is not equivalent to:

a LEFT JOIN c 
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id
AND b.bc_id = c.bc_id

Another (hopefully simpler) associativity example. Think of this as (a LEFT JOIN b) LEFT JOIN c:

a LEFT JOIN b 
ON b.ab_id = a.ab_id -- AB condition
LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition

This is equivalent to a LEFT JOIN (b LEFT JOIN c):

a LEFT JOIN  
b LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition
ON b.ab_id = a.ab_id -- AB condition

only because we have "nice" ON conditions. Both ON b.ab_id = a.ab_id and c.bc_id = b.bc_id are equality checks and do not involve NULL comparisons.

You can even have conditions with other operators or more complex ones like: ON a.x <= b.x or ON a.x = 7 or ON a.x LIKE b.x or ON (a.x, a.y) = (b.x, b.y) and the two queries would still be equivalent.

If however, any of these involved IS NULL or a function that is related to nulls like COALESCE(), for example if the condition was b.ab_id IS NULL, then the two queries would not be equivalent.

Which performs first WHERE clause or JOIN clause

The conceptual order of query processing is:

1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY

But this is just a conceptual order. In fact the engine may decide to rearrange clauses. Here is proof. Let's make 2 tables with 1000000 rows each:

CREATE TABLE test1 (id INT IDENTITY(1, 1), name VARCHAR(10))
CREATE TABLE test2 (id INT IDENTITY(1, 1), name VARCHAR(10))

;WITH cte AS(SELECT -1 + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) d FROM
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t1(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t2(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t3(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t4(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t5(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t6(n))

INSERT INTO test1(name) SELECT 'a' FROM cte

Now run 2 queries:

SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id AND t2.id = 100
WHERE t1.id > 1

SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id
WHERE t1.id = 1

Notice that the first query will filter most rows out in the join condition, but the second query filters in the where condition. Look at the produced plans:

1 TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(100)

2 TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(1)

This means that in the first query optimized, the engine decided first to evaluate the join condition to filter out rows. In the second query, it evaluated the where clause first.

SQL order of operations

It depends on the database.

On SQL Server, run: SET SHOWPLAN_ALL ON then run the query, you will get an idea of what happens when it runs.

SQL joining three tables, join precedence

All kinds of outer and normal joins are in the same precedence class and operators take effect left-to-right at a given nesting level of the query. You can put the join expression on the right side in parentheses to cause it to take effect first. Remember that you will have to move the ON clauses around so that they stay with their joins—the join in parentheses takes its ON clause with it into the parentheses, so it now comes textually before the other ON clause which will be after the parentheses in the outer join statement.

(PostgreSQL example)

In

SELECT * FROM a LEFT JOIN b ON (a.id = b.id) JOIN c ON (b.ref = c.id);

the a-b join takes effect first, but we can force the b-c join to take effect first by putting it in parentheses, which looks like:

SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);

Often you can express the same thing without extra parentheses by moving the joins around and changing the direction of the outer joins, e.g.

SELECT * FROM b JOIN c ON (b.ref = c.id) RIGHT JOIN a ON (a.id = b.id);

In what order are MySQL JOINs evaluated?

  1. USING (fieldname) is a shorthand way of saying ON table1.fieldname = table2.fieldname.

  2. SQL doesn't define the 'order' in which JOINS are done because it is not the nature of the language. Obviously an order has to be specified in the statement, but an INNER JOIN can be considered commutative: you can list them in any order and you will get the same results.

    That said, when constructing a SELECT ... JOIN, particularly one that includes LEFT JOINs, I've found it makes sense to regard the third JOIN as joining the new table to the results of the first JOIN, the fourth JOIN as joining the results of the second JOIN, and so on.

    More rarely, the specified order can influence the behaviour of the query optimizer, due to the way it influences the heuristics.

  3. No. The way the query is assembled, it requires that companies and users both have a companyid, jobs has a userid and a jobid and useraccounts has a userid. However, only one of companies or user needs a userid for the JOIN to work.

  4. The WHERE clause is filtering the whole result -- i.e. all JOINed columns -- using a column provided by the jobs table.

Order of join operations: would these two FROM clauses produce the same results?

I have no idea if the 1st is equivalent to the 2nd (first because the queries are unfriendly formatted, to say the least and second because RIGHT joins are kind of confusing, since many are used to write using LEFT joins.) But to answer the question:

Is there a way to write the the first one so that no parenthesis are needed?

Yes, you can simply remove the parentheses from the 1st query.

Keeping the parentheses and formatted with some white space:

FROM 
SALESTAX
RIGHT JOIN
( ITEMS
RIGHT JOIN
( PINVOICE
INNER JOIN
PINVDET
ON PINVOICE.PNV_INVOICENO = PINVDET.PND_INVOICENO
AND PINVOICE.PNV_Site = PINVDET.PND_Site
)
ON ITEMS.ITE_INVNO = PINVDET.PND_INVNO
)
ON SALESTAX.STX_GroupID = PINVDET.PND_TAX1
FULL JOIN
( CUSTMS
RIGHT JOIN
CUSMER
ON CUSTMS.TMS_CODE = CUSMER.CUS_TERM
)
ON PINVDET.PND_CUSTID = CUSMER.CUS_CustID

Without parentheses and white-space formatted:

FROM    
SALESTAX
RIGHT JOIN
ITEMS
RIGHT JOIN
PINVOICE
INNER JOIN
PINVDET
ON PINVOICE.PNV_INVOICENO = PINVDET.PND_INVOICENO
AND PINVOICE.PNV_Site = PINVDET.PND_Site
ON ITEMS.ITE_INVNO = PINVDET.PND_INVNO
ON SALESTAX.STX_GroupID = PINVDET.PND_TAX1
FULL JOIN
CUSTMS
RIGHT JOIN
CUSMER
ON CUSTMS.TMS_CODE = CUSMER.CUS_TERM
ON PINVDET.PND_CUSTID = CUSMER.CUS_CustID

To answer the other question, about the 2nd query, no it isn't equivalent. You missed the table aliases and changed an inner join to left join. This is equivalent to the 1st:

FROM  CUSMER  
LEFT JOIN
CUSTMS ON CUSTMS.TMS_CODE = CUSMER.CUS_TERM
FULL JOIN
PINVDET
INNER JOIN -- this is changed
PINVOICE ON PINVOICE.PNV_INVOICENO = PINVDET.PND_INVOICENO
AND PINVOICE.PNV_Site = PINVDET.PND_Site
LEFT JOIN
ITEMS ON ITEMS.ITE_INVNO = PINVDET.PND_INVNO
LEFT JOIN
SALESTAX ON SALESTAX.STX_GroupID = PINVDET.PND_TAX1
ON PINVDET.PND_CUSTID = CUSMER.CUS_CustID

SQL JOIN - WHERE clause vs. ON clause

They are not the same thing.

Consider these queries:

SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
WHERE Orders.ID = 12345

and

SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
AND Orders.ID = 12345

The first will return an order and its lines, if any, for order number 12345. The second will return all orders, but only order 12345 will have any lines associated with it.

With an INNER JOIN, the clauses are effectively equivalent. However, just because they are functionally the same, in that they produce the same results, does not mean the two kinds of clauses have the same semantic meaning.



Related Topics



Leave a reply



Submit