Does Inner Join Performance Depend on Order of Tables

Does INNER JOIN performance depend on order of tables?

Aliases and the order of the tables in the join (assuming it's an INNER JOIN) don't affect the final outcome, and thus don't affect performance, since the optimizer reorders the tables (if needed) when the query is executed.

You can read some more basic concepts about relational algebra here:
http://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators

Does the join order matter in SQL?

For INNER joins, no, the order doesn't matter. The queries will return the same results, as long as you change your selects from SELECT * to SELECT a.*, b.*, c.* (with SELECT *, the column order follows the table order).
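As a rough check of this claim, here is a minimal sketch using Python's built-in sqlite3 module. The tables a, b and c, their columns and their rows are made up for illustration, and the check only covers SQLite; it says nothing about the plans other engines pick.

```python
import sqlite3

# Toy in-memory schema; table names, columns and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ab_id INTEGER, ac_id INTEGER);
    CREATE TABLE b (ab_id INTEGER);
    CREATE TABLE c (ac_id INTEGER);
    INSERT INTO a VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO b VALUES (1), (2), (4);
    INSERT INTO c VALUES (10), (20), (40);
""")

# Explicit a.*, b.*, c.* so both queries return columns in the same order.
a_first = conn.execute("""
    SELECT a.*, b.*, c.*
    FROM a
    INNER JOIN b ON b.ab_id = a.ab_id
    INNER JOIN c ON c.ac_id = a.ac_id
""").fetchall()

c_first = conn.execute("""
    SELECT a.*, b.*, c.*
    FROM c
    INNER JOIN a ON a.ac_id = c.ac_id
    INNER JOIN b ON b.ab_id = a.ab_id
""").fetchall()

# Row order is unspecified without ORDER BY, so compare the sorted row lists.
same_rows = sorted(a_first) == sorted(c_first)
print(same_rows)  # True
```

Only the set of rows is compared here; the order in which rows come back is a separate matter and needs an ORDER BY.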


For (LEFT, RIGHT or FULL) OUTER joins, yes, the order matters, and things are much more complicated.

First, outer joins are not commutative, so a LEFT JOIN b is not the same as b LEFT JOIN a.
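A small sketch of the non-commutativity, again with Python's sqlite3 module and two hypothetical single-column tables. The rows are chosen so that each table has one row with no partner in the other.

```python
import sqlite3

# Hypothetical data: a has ids 1 and 2, b has ids 2 and 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ab_id INTEGER);
    CREATE TABLE b (ab_id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# a LEFT JOIN b keeps every row of a: {(1, None), (2, 2)}
a_left_b = set(conn.execute("""
    SELECT a.ab_id, b.ab_id
    FROM a LEFT JOIN b ON b.ab_id = a.ab_id
""").fetchall())

# b LEFT JOIN a keeps every row of b: {(2, 2), (None, 3)}
b_left_a = set(conn.execute("""
    SELECT a.ab_id, b.ab_id
    FROM b LEFT JOIN a ON a.ab_id = b.ab_id
""").fetchall())

print(a_left_b == b_left_a)  # False
```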

Outer joins are not associative either. Your examples involve both properties (commutativity and associativity):

a LEFT JOIN b 
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id

is equivalent to:

a LEFT JOIN c 
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id

but:

a LEFT JOIN b 
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id
AND c.bc_id = b.bc_id

is not equivalent to:

a LEFT JOIN c 
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id
AND b.bc_id = c.bc_id

Another (hopefully simpler) associativity example. Think of this as (a LEFT JOIN b) LEFT JOIN c:

a LEFT JOIN b 
ON b.ab_id = a.ab_id -- AB condition
LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition

This is equivalent to a LEFT JOIN (b LEFT JOIN c):

a LEFT JOIN  
b LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition
ON b.ab_id = a.ab_id -- AB condition

only because we have "nice" ON conditions. Both ON b.ab_id = a.ab_id and c.bc_id = b.bc_id are equality checks and do not involve NULL comparisons.

You can even have conditions with other operators or more complex ones like: ON a.x <= b.x or ON a.x = 7 or ON a.x LIKE b.x or ON (a.x, a.y) = (b.x, b.y) and the two queries would still be equivalent.

If, however, any of these conditions involved IS NULL or a function that is related to nulls, like COALESCE(), for example if the condition was b.ab_id IS NULL, then the two queries would not be equivalent.
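To make the IS NULL case concrete, here is a sketch under stated assumptions: the schema and rows are invented, the second ON condition is b.ab_id IS NULL, and the nested form a LEFT JOIN (b LEFT JOIN c) is written as an explicit derived table (aliased bc, a name chosen here) so that it runs unambiguously in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ab_id INTEGER);
    CREATE TABLE b (ab_id INTEGER, bc_id INTEGER);
    CREATE TABLE c (bc_id INTEGER);
    INSERT INTO a VALUES (1);
    INSERT INTO b VALUES (2, 100);  -- deliberately no match for a.ab_id = 1
    INSERT INTO c VALUES (100);
""")

# (a LEFT JOIN b) LEFT JOIN c: b is NULL-extended first, so the
# IS NULL condition is true and the row from c is attached.
left_first = conn.execute("""
    SELECT a.ab_id, b.ab_id, c.bc_id
    FROM a
    LEFT JOIN b ON b.ab_id = a.ab_id
    LEFT JOIN c ON b.ab_id IS NULL
""").fetchall()

# a LEFT JOIN (b LEFT JOIN c): inside the derived table b.ab_id is 2,
# so IS NULL is false and c never joins; then b fails to match a.
right_first = conn.execute("""
    SELECT a.ab_id, bc.b_ab_id, bc.c_bc_id
    FROM a
    LEFT JOIN (
        SELECT b.ab_id AS b_ab_id, c.bc_id AS c_bc_id
        FROM b
        LEFT JOIN c ON b.ab_id IS NULL
    ) AS bc ON bc.b_ab_id = a.ab_id
""").fetchall()

print(left_first)   # [(1, None, 100)]
print(right_first)  # [(1, None, None)]
```

The difference comes from when the NULL test is evaluated: after b has been NULL-extended in the first query, before it in the second.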

Does SQL JOIN order affect performance?

No, the optimizer is free to change the JOIN order during optimization.

The only caveat is a hint such as SQL Server's OPTION (FORCE ORDER), which forces the joins to happen in the exact order in which you specified them.

Does the order of conditions in a join affect query performance?

The order of conditions in the ON clause should not affect performance. Why not? At a high level, there are three steps to SQL execution:

  1. Parse the query
  2. Construct and optimize the "executable" code
  3. Execute the code

The second step optimizes the query and should take into account the different methods of executing it. The join conditions are part of this optimization, all at once.

In theory, the order of the joins does not matter either, although in a very complex query it could.
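A quick sanity check of the ON-clause point, sketched with Python's sqlite3 module and two invented tables. It only confirms that swapping the conditions leaves the result unchanged in SQLite; it does not measure performance.

```python
import sqlite3

# Hypothetical two-column tables; names and rows are assumptions for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER, y INTEGER);
    CREATE TABLE b (x INTEGER, y INTEGER);
    INSERT INTO a VALUES (1, 1), (1, 2), (2, 2);
    INSERT INTO b VALUES (1, 1), (2, 2), (2, 3);
""")

# Same join, with the two ON conditions written in opposite order.
x_then_y = conn.execute("""
    SELECT a.x, a.y FROM a
    INNER JOIN b ON b.x = a.x AND b.y = a.y
""").fetchall()
y_then_x = conn.execute("""
    SELECT a.x, a.y FROM a
    INNER JOIN b ON b.y = a.y AND b.x = a.x
""").fetchall()

print(sorted(x_then_y) == sorted(y_then_x))  # True
```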

MySQL: In an inner join, does it matter which table comes first?

Instead of the following:

select a.postsTitle
from posts a
inner join bookmarks b
on b.userId = a.userId
and b.userId = :userId

You should consider writing your JOIN like this, using the WHERE clause and proper capitalization:

SELECT p.postsTitle
FROM bookmarks b
INNER JOIN posts p
ON p.userId = b.userId
WHERE b.userId = :userId

While it makes no difference (performance-wise) to MySQL which order you put the tables in with INNER JOIN (MySQL treats them as equal and will optimize them the same way), it's convention to put the table that you are applying the WHERE clause to first. In fact, assuming proper indexes, MySQL will most likely start with the table that has the WHERE clause because it narrows down the result set, and MySQL likes to start with the set that has the fewest rows.

It's also convention to put the joined table's column first in the ON clause. It just reads more logically. While you're at it, use logical table aliases.

The only caveat is if you don't name your columns and instead use SELECT * like the following:

SELECT *
FROM bookmarks b
INNER JOIN posts p
ON p.userId = b.userId
WHERE b.userId = :userId

You'll get the columns in the order they're listed in the query. In this case, you'll get the columns for bookmarks, followed by the columns for posts.

Most would say never use SELECT * in a production query, but if you really must return all columns, and you needed the columns from posts first, you could simply do the following:

SELECT p.*, b.*
FROM bookmarks b
INNER JOIN posts p
ON p.userId = b.userId
WHERE b.userId = :userId

It's always good to be explicit about the returned result set.
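The column-order behaviour can be observed directly. The sketch below uses Python's sqlite3 module rather than MySQL, and the columns bookmarkId and postId are invented to give each table something besides userId; only postsTitle and userId come from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookmarks (bookmarkId INTEGER, userId INTEGER);
    CREATE TABLE posts (postId INTEGER, userId INTEGER, postsTitle TEXT);
""")

def column_names(select_list):
    """Run the join with the given select list and return the result's column names."""
    cur = conn.execute(f"""
        SELECT {select_list}
        FROM bookmarks b
        INNER JOIN posts p ON p.userId = b.userId
        WHERE b.userId = :userId
    """, {"userId": 1})
    return [col[0] for col in cur.description]

star = column_names("*")
posts_first = column_names("p.*, b.*")

print(star)         # ['bookmarkId', 'userId', 'postId', 'userId', 'postsTitle']
print(posts_first)  # ['postId', 'userId', 'postsTitle', 'bookmarkId', 'userId']
```

Note that userId appears twice in both lists, which is one more reason to name columns explicitly.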

Order of tables in INNER JOIN

So does it imply that if statistics gathered from database objects change, then results would also change?

No. The same query will always produce the same results (provided, of course, that the underlying data is the same). What the author is explaining is that the database may choose one strategy or another to process the query (starting from one table or another, using this or that algorithm to join the rows, and so on). That decision is made based on many factors, some of them being based on information that is available in the statistics.

The key point is that SQL is a declarative language, not a procedural language: you don't get to choose how the database handles the query, you just tell it what result you want.

However, regardless of the algorithm that the database chooses, the result is guaranteed to be consistent.

Note that there are edge cases where the database does not guarantee that the results are the same for consecutive executions of the same query (like a query with a row-limiting clause but without an ORDER BY): it's the responsibility of the client to provide a query whose results are properly defined (the language does give you enough rope to hang yourself, if you really want to).
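A row-limiting query becomes well defined once it carries an ORDER BY. The following sketch (Python's sqlite3 module, with a made-up posts table whose rows are inserted out of order) contrasts the two forms; the unordered result is deliberately not printed because no particular rows are promised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (postId INTEGER, postsTitle TEXT);
    INSERT INTO posts VALUES (3, 'c'), (1, 'a'), (2, 'b');
""")

# Without ORDER BY the engine may return any two rows; do not rely on which.
any_two = conn.execute("SELECT postId FROM posts LIMIT 2").fetchall()

# With ORDER BY the two returned rows are fully determined by the data.
first_two = conn.execute(
    "SELECT postId FROM posts ORDER BY postId LIMIT 2"
).fetchall()

print(first_two)  # [(1,), (2,)]
```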

Should SQL JOINs be placed in particular order for performance reasons?

The documentation for MySQL states "The join optimizer calculates the order in which tables should be joined".

This order is determined based on information about the sizes of the tables and other factors, such as the presence of indexes.
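One way to see what an optimizer decided is to ask it for the plan. In MySQL that is the EXPLAIN statement; the sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in, so the output format and the chosen plan are SQLite's, not MySQL's. The schema and index names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookmarks (bookmarkId INTEGER PRIMARY KEY, userId INTEGER);
    CREATE TABLE posts (postId INTEGER PRIMARY KEY, userId INTEGER, postsTitle TEXT);
    CREATE INDEX idx_bookmarks_user ON bookmarks (userId);
    CREATE INDEX idx_posts_user ON posts (userId);
""")

def plan(sql):
    """Return the human-readable steps of SQLite's plan for the given query."""
    # The last column of each EXPLAIN QUERY PLAN row is the description text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

bookmarks_first = plan("""
    SELECT p.postsTitle
    FROM bookmarks b
    INNER JOIN posts p ON p.userId = b.userId
    WHERE b.userId = 1
""")
posts_first = plan("""
    SELECT p.postsTitle
    FROM posts p
    INNER JOIN bookmarks b ON b.userId = p.userId
    WHERE b.userId = 1
""")

# Print both plans; the exact wording depends on the SQLite version.
for step in bookmarks_first + ["--"] + posts_first:
    print(step)
```

Comparing the two printed plans shows which table the engine actually starts from for each way of writing the query.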

You should put the joins in the order that makes the most sense for reading and maintaining the query.


