Best Practices for the Order of Joined Columns in a SQL Join

Best practices for the order of joined columns in a SQL join?

The best practice here is to choose one convention and stick with it across the team. Personally, I prefer FROM a JOIN b ON b.col = a.col because it reads more cleanly to me.
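To make the point concrete that this is purely a style choice, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. The tables a and b and their rows are hypothetical. Because equality is symmetric, ON b.col = a.col and ON a.col = b.col return exactly the same rows.

```python
import sqlite3

# Hypothetical tables a and b that share a column named "col".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, col INTEGER);
    CREATE TABLE b (id INTEGER, col INTEGER);
    INSERT INTO a VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO b VALUES (100, 10), (200, 20), (300, 40);
""")

# The two styles differ only in which operand of "=" is written first.
style_b_first = conn.execute(
    "SELECT a.id, b.id FROM a JOIN b ON b.col = a.col ORDER BY a.id"
).fetchall()
style_a_first = conn.execute(
    "SELECT a.id, b.id FROM a JOIN b ON a.col = b.col ORDER BY a.id"
).fetchall()

print(style_b_first)                   # [(1, 100), (2, 200)]
print(style_a_first == style_b_first)  # True
```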

Does the join order matter in SQL?

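For INNER joins, no, the order doesn't matter. The queries will return the same results, as long as you change your selects from SELECT * to an explicit list such as SELECT a.*, b.*, c.*, because SELECT * returns the columns in the order the tables are joined.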
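A small sketch of this, again using Python's sqlite3 module with hypothetical tables a, b and c linked through a_id and b_id. The fixed column list gives both queries rows of the same shape, and the results are compared after sorting because neither query has an ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id INTEGER, name TEXT);
    CREATE TABLE b (b_id INTEGER, a_id INTEGER);
    CREATE TABLE c (c_id INTEGER, b_id INTEGER);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO b VALUES (10, 1), (20, 2), (30, 2);
    INSERT INTO c VALUES (100, 10), (200, 30), (300, 99);
""")

# Explicit columns in a fixed order instead of SELECT *.
cols = "a.a_id, b.b_id, c.c_id"

abc = conn.execute(f"""
    SELECT {cols} FROM a
    JOIN b ON b.a_id = a.a_id
    JOIN c ON c.b_id = b.b_id
""").fetchall()

cba = conn.execute(f"""
    SELECT {cols} FROM c
    JOIN b ON b.b_id = c.b_id
    JOIN a ON a.a_id = b.a_id
""").fetchall()

# Row order is unspecified without ORDER BY, so compare sorted lists.
print(sorted(abc))                 # [(1, 10, 100), (2, 30, 200)]
print(sorted(abc) == sorted(cba))  # True
```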


For (LEFT, RIGHT or FULL) OUTER joins, yes, the order matters, and things are much more complicated.

First, LEFT and RIGHT outer joins are not commutative, so a LEFT JOIN b is not the same as b LEFT JOIN a.
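A minimal illustration in SQLite (through Python's sqlite3 module) with hypothetical single-column tables: swapping the operands of LEFT JOIN changes which table's unmatched rows are preserved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

a_left_b = conn.execute(
    "SELECT a.id, b.id FROM a LEFT JOIN b ON b.id = a.id ORDER BY a.id"
).fetchall()
b_left_a = conn.execute(
    "SELECT a.id, b.id FROM b LEFT JOIN a ON a.id = b.id ORDER BY b.id"
).fetchall()

print(a_left_b)  # [(1, None), (2, 2)]  every row of a is kept
print(b_left_a)  # [(2, 2), (None, 3)]  every row of b is kept
```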

Outer joins are not associative either, so in examples like the following, which involve both properties (commutativity and associativity):

a LEFT JOIN b 
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id

is equivalent to:

a LEFT JOIN c 
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id

but:

a LEFT JOIN b 
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id
AND c.bc_id = b.bc_id

is not equivalent to:

a LEFT JOIN c 
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id
AND b.bc_id = c.bc_id
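Both pairs can be checked against sample data. This sketch assumes SQLite through Python's sqlite3 module and a hand-picked data set in which the first row of a matches a row of b and a row of c that disagree on bc_id; any data with that property exposes the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ab_id INTEGER, ac_id INTEGER);
    CREATE TABLE b (ab_id INTEGER, bc_id INTEGER);
    CREATE TABLE c (ac_id INTEGER, bc_id INTEGER);
    INSERT INTO a VALUES (1, 1), (2, 2);
    INSERT INTO b VALUES (1, 10), (2, 30);
    INSERT INTO c VALUES (1, 20), (2, 30);
""")

def run(from_clause):
    # Same column list every time so the results are directly comparable.
    return conn.execute(
        "SELECT a.ab_id, b.bc_id, c.bc_id " + from_clause + " ORDER BY a.ab_id"
    ).fetchall()

# Pair 1: both joins hang off a, so swapping them changes nothing.
p1_abc = run("FROM a LEFT JOIN b ON b.ab_id = a.ab_id "
             "LEFT JOIN c ON c.ac_id = a.ac_id")
p1_acb = run("FROM a LEFT JOIN c ON c.ac_id = a.ac_id "
             "LEFT JOIN b ON b.ab_id = a.ab_id")

# Pair 2: the second join also compares b with c, so the order matters.
p2_abc = run("FROM a LEFT JOIN b ON b.ab_id = a.ab_id "
             "LEFT JOIN c ON c.ac_id = a.ac_id AND c.bc_id = b.bc_id")
p2_acb = run("FROM a LEFT JOIN c ON c.ac_id = a.ac_id "
             "LEFT JOIN b ON b.ab_id = a.ab_id AND b.bc_id = c.bc_id")

print(p1_abc == p1_acb)  # True
print(p2_abc)            # [(1, 10, None), (2, 30, 30)]
print(p2_acb)            # [(1, None, 20), (2, 30, 30)]
```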

Another (hopefully simpler) associativity example. Think of this as (a LEFT JOIN b) LEFT JOIN c:

a LEFT JOIN b 
ON b.ab_id = a.ab_id -- AB condition
LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition

This is equivalent to a LEFT JOIN (b LEFT JOIN c):

a LEFT JOIN  
b LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition
ON b.ab_id = a.ab_id -- AB condition

only because we have "nice" ON conditions. Both ON b.ab_id = a.ab_id and c.bc_id = b.bc_id are equality checks and do not involve NULL comparisons, so neither can evaluate to true when a column it compares is NULL.

You can even have conditions with other operators or more complex ones, such as ON a.x <= b.x, ON a.x = 7, ON a.x LIKE b.x or ON (a.x, a.y) = (b.x, b.y), and the two queries would still be equivalent.

If, however, any of these involved IS NULL or a null-related function such as COALESCE() (for example, if the BC condition were b.ab_id IS NULL), then the two queries would not be equivalent.
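The sketch below checks both cases in SQLite via Python's sqlite3 module. Two assumptions: the nested form a LEFT JOIN (b LEFT JOIN c ...) is written as a derived table with aliased columns, to avoid depending on an engine's support for the nested ON syntax, and the IS NULL test is placed in the BC condition, which is where it breaks the equivalence. The tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ab_id INTEGER);
    CREATE TABLE b (ab_id INTEGER, bc_id INTEGER);
    CREATE TABLE c (bc_id INTEGER);
    INSERT INTO a VALUES (1), (2);   -- a.ab_id = 2 has no matching b row
    INSERT INTO b VALUES (1, 10);
    INSERT INTO c VALUES (10), (20);
""")

def left_nested(bc_cond):
    """(a LEFT JOIN b ON AB) LEFT JOIN c ON <bc_cond>."""
    return conn.execute(f"""
        SELECT a.ab_id, b.bc_id, c.bc_id
        FROM a
        LEFT JOIN b ON b.ab_id = a.ab_id
        LEFT JOIN c ON {bc_cond}
        ORDER BY 1, 3
    """).fetchall()

def right_nested(bc_cond):
    """a LEFT JOIN (b LEFT JOIN c ON <bc_cond>) ON AB, as a derived table."""
    return conn.execute(f"""
        SELECT a.ab_id, bc.b_bc, bc.c_bc
        FROM a
        LEFT JOIN (
            SELECT b.ab_id AS ab_id, b.bc_id AS b_bc, c.bc_id AS c_bc
            FROM b LEFT JOIN c ON {bc_cond}
        ) AS bc ON bc.ab_id = a.ab_id
        ORDER BY 1, 3
    """).fetchall()

# "Nice" equality condition: both groupings agree.
print(left_nested("c.bc_id = b.bc_id"))   # [(1, 10, 10), (2, None, None)]
print(right_nested("c.bc_id = b.bc_id"))  # [(1, 10, 10), (2, None, None)]

# IS NULL condition: the row of a without a b match now pulls in every c row
# in the left-nested form, but not in the right-nested form.
print(left_nested("b.ab_id IS NULL"))   # [(1, 10, None), (2, None, 10), (2, None, 20)]
print(right_nested("b.ab_id IS NULL"))  # [(1, 10, None), (2, None, None)]
```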

SQL join performance and operation order

Some common techniques for improving your queries:

  • Index the columns used in joins. If they are foreign keys, some databases (for example MySQL with InnoDB) already index them automatically.
  • Index the columns used in WHERE conditions.
  • Avoid SELECT * and explicitly select only the columns you really need.
  • In most cases the join order won't matter, because the database engine's query optimizer is intelligent enough to decide it.

So it's better to analyze the structure of both joined tables and make sure the right indexes are in place.
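One way to confirm that an index is actually used is to look at the execution plan. A minimal sketch, assuming SQLite (through Python's sqlite3 module) and hypothetical customers/orders tables; other engines expose the same information through their own EXPLAIN variants, with different output formats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")

query = """
    SELECT customers.name, orders.total
    FROM customers
    JOIN orders ON orders.customer_id = customers.id
    WHERE customers.id = 42
"""

def plan(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(query)
# Index the join column on the "many" side of the relationship.
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
after = plan(query)

print(before)  # orders is read without idx_orders_customer_id
print(after)   # orders is searched through idx_orders_customer_id
```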

If anyone is interested in how changing the order of conditions can help get better performance, I've written a detailed answer here: mysql Slow query issue.

Is the order of joining tables indifferent as long as we chose proper join types?

In an inner join, the ordering of the tables in the join doesn't matter - the same rows will make up the result set regardless of the order they are in the join statement.

In either a left or right outer join, the order DOES matter. In A left join B, your result set will contain at least one row for every record in table A (more than one if several rows in B match), irrespective of whether there is a matching row in table B. If there are non-matching rows, this is likely to be a different result set from B left join A.
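A quick sketch of the "at least one row" behavior, using SQLite via Python's sqlite3 module and hypothetical tables: a row of A with two matches in B appears twice, and a row of A with no match appears once, with NULL for B's columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (a_id INTEGER, label TEXT);
    INSERT INTO a VALUES (1), (2), (3);
    -- a.id = 1 has two matches, a.id = 2 has one, a.id = 3 has none
    INSERT INTO b VALUES (1, 'p'), (1, 'q'), (2, 'r');
""")

rows = conn.execute("""
    SELECT a.id, b.label
    FROM a LEFT JOIN b ON b.a_id = a.id
    ORDER BY a.id, b.label
""").fetchall()

print(rows)  # [(1, 'p'), (1, 'q'), (2, 'r'), (3, None)]
```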

In a full outer join, the order again doesn't matter - rows will be produced for each row in each joined table no matter what their order.

Regarding A left join B vs B right join A - these will produce the same results. In simple cases with 2 tables, swapping the tables and changing the direction of the outer join will result in the same result set.

This will also apply to 3 or more tables if all of the outer joins are in the same direction - A left join B left join C will give the same set of results as C right join B right join A.

If you start mixing left and right joins, then you will need to start being more careful. There will almost always be a way to make an equivalent query with re-ordered tables, but at that point sub-queries or bracketing off expressions might be the best way to clarify what you are doing.

As another commenter states, using whatever makes your purpose most clear is usually the best option. The ordering of the tables in your query should make little or no difference performance wise, as the query optimiser should work this out (although the only way to be sure of this would be to check the execution plans for each option with your own queries and data).

Does SQL JOIN order affect performance?

No. The optimizer is free to change the join order during optimization.

The main caveat (in SQL Server) is the query hint OPTION (FORCE ORDER), which forces the joins to be performed in the exact order you specified them.
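FORCE ORDER itself is SQL Server syntax. As a loosely analogous sketch in SQLite (via Python's sqlite3 module, with hypothetical customers/orders tables): with a plain JOIN the planner is free to put orders in the outer loop even though customers is written first, while SQLite's documented CROSS JOIN behavior keeps the left-hand table as the outer loop. This only illustrates the idea of a forced order; it is not a SQL Server example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")

def loop_order(sql):
    """Return the SCAN/SEARCH steps of the plan, outermost loop first."""
    plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return [step for step in plan if step.startswith(("SCAN", "SEARCH"))]

# Plain JOIN: customers is written first, but because customers.id is a
# rowid alias, the planner prefers scanning orders and looking customers up.
reordered = loop_order("""
    SELECT customers.name, orders.total
    FROM customers JOIN orders ON orders.customer_id = customers.id
""")

# CROSS JOIN: SQLite keeps the left-hand table (customers) as the outer loop.
forced = loop_order("""
    SELECT customers.name, orders.total
    FROM customers CROSS JOIN orders
    WHERE orders.customer_id = customers.id
""")

print(reordered)  # outer loop is orders
print(forced)     # outer loop is customers
```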