Does SQL Join Order Affect Performance

Does SQL JOIN order affect performance?

No, the optimizer is free to reorder the joins during optimization.

The only caveat is the OPTION (FORCE ORDER) query hint, which forces the joins to happen in exactly the order you specified.

Does the order of conditions in a join affect query performance?

The order of conditions in the ON clause should not affect performance. Why not? At a high level, there are three steps to SQL execution:

  1. Parse the query
  2. Construct and optimize the "executable" code
  3. Execute the code

The second step optimizes the query and should take into account different methods of executing it. The join conditions are considered together, all at once, as part of this optimization.

In theory, the order of the joins does not matter either, although in a very complex query it could, because the optimizer may not be able to consider every possible ordering.
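As a small illustration of the point about ON conditions, here is a sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for whatever database you use, and the orders and customers tables are made up for the example. It shows that swapping the conditions returns the same rows, and it prints the plan for each version so you can compare them yourself.

```python
import sqlite3

# In-memory SQLite database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, region TEXT);
    CREATE TABLE customers (id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10, 'EU'), (2, 10, 'US'), (3, 20, 'EU');
    INSERT INTO customers VALUES (10, 'EU'), (20, 'EU'), (30, 'US');
""")

# Same join, with the two ON conditions written in opposite order.
q1 = """SELECT o.id, c.id FROM orders o JOIN customers c
        ON o.customer_id = c.id AND o.region = c.region ORDER BY o.id"""
q2 = """SELECT o.id, c.id FROM orders o JOIN customers c
        ON o.region = c.region AND o.customer_id = c.id ORDER BY o.id"""

rows_1 = conn.execute(q1).fetchall()
rows_2 = conn.execute(q2).fetchall()
print(rows_1 == rows_2)

# Print the plan SQLite chose for each version (last column is the detail text).
for q in (q1, q2):
    for step in conn.execute("EXPLAIN QUERY PLAN " + q):
        print(step[3])
```

The result sets are guaranteed to match. Whether the plans match is up to the optimizer of the database in question, so check the plan on your own system rather than relying on this toy case.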

Does INNER JOIN performance depend on the order of tables?

Aliases and the order of the tables in the join (assuming it's an INNER JOIN) don't affect the final outcome, and they generally don't affect performance either, since the optimizer reorders the joins (if needed) when the query is executed.

You can read some more basic concepts about relational algebra here:
http://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators
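To see the "same final outcome" part concretely, here is a minimal sketch with Python's sqlite3 module. The tables a, b and c and their columns are invented for the example, and SQLite stands in for your actual database. The same three tables are inner joined in opposite orders, with an explicit column list so the column order of the output does not depend on the table order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, a_id INTEGER);
    CREATE TABLE c (id INTEGER, b_id INTEGER);
    INSERT INTO a VALUES (1, 'x'), (2, 'y');
    INSERT INTO b VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO c VALUES (100, 10), (101, 12), (102, 99);
""")

cols = "a.id, b.id, c.id"  # explicit column list instead of SELECT *

# a -> b -> c
abc = conn.execute(
    f"SELECT {cols} FROM a JOIN b ON b.a_id = a.id JOIN c ON c.b_id = b.id"
).fetchall()

# c -> b -> a (tables listed in the opposite order)
cba = conn.execute(
    f"SELECT {cols} FROM c JOIN b ON c.b_id = b.id JOIN a ON b.a_id = a.id"
).fetchall()

# Row order is not guaranteed without ORDER BY, so compare as sorted lists.
print(sorted(abc) == sorted(cba))
```

This only demonstrates that the results agree. It says nothing about timing, which depends on the plan your database picks.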

Does inner join order and WHERE have an impact on performance?

The Query Optimizer will almost always filter table c first before joining c to the other two tables. You can verify this by looking at the execution plan and checking how many rows SQL Server takes from table c to participate in the join.

About join order: the Query Optimizer will pick a join order that it thinks will work best for your query. It could be a JOIN b JOIN (filtered c) or (filtered c) JOIN a JOIN b.

If you want to force a certain order, include a hint:

SELECT *
FROM a
INNER JOIN b ON ...
INNER JOIN c ON ...
WHERE c.id = 'x'
OPTION (FORCE ORDER)

This will force SQL Server to do a JOIN b JOIN (filtered c). Standard warning: unless you see a massive performance gain, it is usually better to leave the join order to the Query Optimizer.

Does the order of JOIN vs WHERE in SQL affect performance?

Postgres has a smart optimizer, so the two versions should have similar execution plans in most cases (I'll return to that in a moment).

MySQL has a tendency to materialize subqueries. Although this has gotten better in more recent versions, I still recommend avoiding such subqueries. Materializing subqueries prevents the use of indexes and can have a significant impact on performance.

One caveat: If the subquery is complicated, then it might be better to filter as part of the subquery. For instance, if it is an aggregation, then filtering before aggregating usually results in better performance. That said, Postgres is smart about pushing conditions into the subquery. So, if the outer filtering is on a key used in aggregation, Postgres is smart enough to push the condition into the subquery.
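The caveat about aggregation can be made concrete with a small sketch. It uses Python's sqlite3 module and a made-up sales table, so it is neither Postgres nor MySQL, and it shows only that the two ways of writing the filter are logically equivalent when the filter is on the grouping key. Whether your database pushes the outer condition into the subquery on its own is something to check in its execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (customer_id INTEGER, amount INTEGER);
    INSERT INTO sales VALUES (1, 10), (1, 20), (2, 5), (3, 7), (3, 8);
""")

# Version 1: aggregate every customer, then filter outside the subquery.
outer_filter = conn.execute("""
    SELECT customer_id, total
    FROM (SELECT customer_id, SUM(amount) AS total
          FROM sales
          GROUP BY customer_id) AS t
    WHERE customer_id = 1
""").fetchall()

# Version 2: filter inside the subquery, so only one customer is aggregated.
inner_filter = conn.execute("""
    SELECT customer_id, total
    FROM (SELECT customer_id, SUM(amount) AS total
          FROM sales
          WHERE customer_id = 1
          GROUP BY customer_id) AS t
""").fetchall()

print(outer_filter, inner_filter)
```

Because customer_id is the GROUP BY key, filtering before or after the aggregation keeps the same group. A filter on a non-key column or on the aggregate itself could not be moved this way.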

Order of the tables in a JOIN

The join order seems to be changed for optimization by Spark.

Several join-reordering rules could be involved:

  • Reorder JOIN optimizer
  • Reorder JOIN optimizer - star schema
  • Reorder JOIN optimizer - cost based optimization

The following appears to shed some light on this topic:

https://www.waitingforcode.com/apache-spark-sql/reorder-join-optimizer-star-schema/read
https://www.waitingforcode.com/apache-spark-sql/reorder-join-optimizer/read
https://www.waitingforcode.com/apache-spark-sql/reorder-join-optimizer-cost-based-optimization/read

Does the join order matter in SQL?

For INNER joins, no, the order doesn't matter. The queries will return the same results, as long as you change your selects from SELECT * to SELECT a.*, b.*, c.*.


For (LEFT, RIGHT or FULL) OUTER joins, yes, the order matters - and (updated) things are much more complicated.

First, outer joins are not commutative, so a LEFT JOIN b is not the same as b LEFT JOIN a.

Outer joins are not associative either, so in your examples, which involve both properties (commutativity and associativity):
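A quick way to convince yourself of the non-commutativity is to run both forms on tiny tables. The sketch below uses Python's sqlite3 module, and the data is invented for the example: a has ids 1 and 2, b has ids 2 and 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ab_id INTEGER);
    CREATE TABLE b (ab_id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# Keeps every row of a; b columns are NULL where nothing matches.
a_left_b = conn.execute("""
    SELECT a.ab_id, b.ab_id
    FROM a LEFT JOIN b ON b.ab_id = a.ab_id
    ORDER BY a.ab_id
""").fetchall()

# Keeps every row of b; a columns are NULL where nothing matches.
b_left_a = conn.execute("""
    SELECT a.ab_id, b.ab_id
    FROM b LEFT JOIN a ON b.ab_id = a.ab_id
    ORDER BY b.ab_id
""").fetchall()

print(a_left_b)  # [(1, None), (2, 2)]
print(b_left_a)  # [(2, 2), (None, 3)]
```

Only the matching row (2, 2) appears in both results. Each form additionally preserves the unmatched rows of whichever table is on the left.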

a LEFT JOIN b 
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id

is equivalent to:

a LEFT JOIN c 
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id

but:

a LEFT JOIN b 
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id
AND c.bc_id = b.bc_id

is not equivalent to:

a LEFT JOIN c 
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id
AND b.bc_id = c.bc_id
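The four queries above can be checked on a small data set. This is a sketch using Python's sqlite3 module. The column names follow the examples, while the rows and the run helper are made up so that b and c both match a on their own key but disagree on bc_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ab_id INTEGER, ac_id INTEGER);
    CREATE TABLE b (ab_id INTEGER, bc_id INTEGER);
    CREATE TABLE c (ac_id INTEGER, bc_id INTEGER);
    INSERT INTO a VALUES (1, 1), (2, 2);
    INSERT INTO b VALUES (1, 7);
    INSERT INTO c VALUES (1, 8);
""")

def run(joins):
    """Run 'FROM a <joins>' and return (a.ab_id, b.bc_id, c.bc_id) rows."""
    sql = f"SELECT a.ab_id, b.bc_id, c.bc_id FROM a {joins} ORDER BY a.ab_id"
    return conn.execute(sql).fetchall()

# First pair: each ON clause refers only to a, so swapping b and c is harmless.
b_then_c = run("LEFT JOIN b ON b.ab_id = a.ab_id "
               "LEFT JOIN c ON c.ac_id = a.ac_id")
c_then_b = run("LEFT JOIN c ON c.ac_id = a.ac_id "
               "LEFT JOIN b ON b.ab_id = a.ab_id")

# Second pair: the later join also compares b and c, so the order matters.
b_then_c_extra = run("LEFT JOIN b ON b.ab_id = a.ab_id "
                     "LEFT JOIN c ON c.ac_id = a.ac_id AND c.bc_id = b.bc_id")
c_then_b_extra = run("LEFT JOIN c ON c.ac_id = a.ac_id "
                     "LEFT JOIN b ON b.ab_id = a.ab_id AND b.bc_id = c.bc_id")

print(b_then_c == c_then_b)              # True
print(b_then_c_extra)                    # [(1, 7, None), (2, None, None)]
print(c_then_b_extra)                    # [(1, None, 8), (2, None, None)]
```

In the second pair, whichever table is joined first keeps its match, and the table joined last loses out because the extra bc_id condition fails.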

Another (hopefully simpler) associativity example. Think of this as (a LEFT JOIN b) LEFT JOIN c:

a LEFT JOIN b 
ON b.ab_id = a.ab_id -- AB condition
LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition

This is equivalent to a LEFT JOIN (b LEFT JOIN c):

a LEFT JOIN  
b LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition
ON b.ab_id = a.ab_id -- AB condition

only because we have "nice" ON conditions. Both ON b.ab_id = a.ab_id and c.bc_id = b.bc_id are equality checks and do not involve NULL comparisons.

You can even have conditions with other operators or more complex ones like: ON a.x <= b.x or ON a.x = 7 or ON a.x LIKE b.x or ON (a.x, a.y) = (b.x, b.y) and the two queries would still be equivalent.

If, however, any of these involved IS NULL or a function that is related to nulls like COALESCE(), for example if the BC condition was b.ab_id IS NULL, then the two queries would not be equivalent.
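Here is a sketch of that associativity point, again with Python's sqlite3 module and invented rows. To keep the nested form unambiguous, a LEFT JOIN (b LEFT JOIN c) is written as a derived table named bc; that name and the helper functions are assumptions of this example. The BC condition is passed in as a parameter, so the same two shapes can be compared once with the "nice" equality condition and once with b.ab_id IS NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ab_id INTEGER);
    CREATE TABLE b (ab_id INTEGER, bc_id INTEGER);
    CREATE TABLE c (bc_id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2, 5);
    INSERT INTO c VALUES (5);
""")

def left_first(bc_cond):
    """(a LEFT JOIN b) LEFT JOIN c, with bc_cond as the BC condition."""
    return conn.execute(f"""
        SELECT a.ab_id, b.ab_id, c.bc_id
        FROM a LEFT JOIN b ON b.ab_id = a.ab_id
               LEFT JOIN c ON {bc_cond}
        ORDER BY a.ab_id
    """).fetchall()

def right_first(bc_cond):
    """a LEFT JOIN (b LEFT JOIN c), with the inner join as a derived table."""
    return conn.execute(f"""
        SELECT a.ab_id, bc.b_ab_id, bc.c_bc_id
        FROM a LEFT JOIN (
            SELECT b.ab_id AS b_ab_id, c.bc_id AS c_bc_id
            FROM b LEFT JOIN c ON {bc_cond}
        ) AS bc ON bc.b_ab_id = a.ab_id
        ORDER BY a.ab_id
    """).fetchall()

nice = "c.bc_id = b.bc_id"    # plain equality: rejects NULL-padded b rows
null_aware = "b.ab_id IS NULL"  # true exactly for NULL-padded b rows

print(left_first(nice) == right_first(nice))              # True
print(left_first(null_aware), right_first(null_aware))    # results differ
```

With the IS NULL condition, the row a.ab_id = 1 has no match in b, so in the left-first form its NULL-padded b columns satisfy the condition and c is attached. In the right-first form the condition is evaluated against real b rows only, so c is never attached.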


