SQL Filter Criteria in Join Criteria or Where Clause Which Is More Efficient

SQL Filter criteria in join criteria or where clause which is more efficient

I wouldn't use performance as the deciding factor here - and quite honestly, I don't think there's any measurable performance difference between those two cases, really.

I would always use case #2 - why? Because in my opinion, you should only put the actual criteria that establish the JOIN between the two tables into the JOIN clause - everything else belongs in the WHERE clause.

Just a matter of keeping things clean and put things where they belong, IMO.

Obviously, there are cases with LEFT OUTER JOINs where the placement of the criteria does make a difference in terms of what results get returned - those cases would be excluded from my recommendation, of course.

Marc

Which SQL query is faster? Filter on Join criteria or Where clause?

Performance-wise, they are the same (and produce the same plans)

Logically, you should make the operation that still has sense if you replace INNER JOIN with a LEFT JOIN.

In your very case this will look like this:

SELECT  *
FROM TableA a
LEFT JOIN
TableXRef x
ON x.TableAID = a.ID
AND a.ID = 1
LEFT JOIN
TableB b
ON x.TableBID = b.ID

or this:

SELECT  *
FROM TableA a
LEFT JOIN
TableXRef x
ON x.TableAID = a.ID
LEFT JOIN
TableB b
ON b.id = x.TableBID
WHERE a.id = 1

The former query will not return any actual matches for a.id other than 1, so the latter syntax (with WHERE) is logically more consistent.

Difference between filtering queries in JOIN and WHERE?

The answer is NO difference, but:

I will always prefer to do the following.

  • Always keep the Join Conditions in ON clause
  • Always put the filter's in where clause

This makes the query more readable.

So I will use this query:

SELECT value
FROM table1
INNER JOIN table2
ON table1.id = table2.id
WHERE table1.id = 1

However when you are using OUTER JOIN'S there is a big difference in keeping the filter in the ON condition and Where condition.

Logical Query Processing

The following list contains a general form of a query, along with step numbers assigned according to the order in which the different clauses are logically processed.

(5) SELECT (5-2) DISTINCT (5-3) TOP(<top_specification>) (5-1) <select_list>
(1) FROM (1-J) <left_table> <join_type> JOIN <right_table> ON <on_predicate>
| (1-A) <left_table> <apply_type> APPLY <right_table_expression> AS <alias>
| (1-P) <left_table> PIVOT(<pivot_specification>) AS <alias>
| (1-U) <left_table> UNPIVOT(<unpivot_specification>) AS <alias>
(2) WHERE <where_predicate>
(3) GROUP BY <group_by_specification>
(4) HAVING <having_predicate>
(6) ORDER BY <order_by_list>;

Flow diagram logical query processing

Sample Image

  • (1) FROM: The FROM phase identifies the query’s source tables and
    processes table operators. Each table operator applies a series of
    sub phases. For example, the phases involved in a join are (1-J1)
    Cartesian product, (1-J2) ON Filter, (1-J3) Add Outer Rows. The FROM
    phase generates virtual table VT1.

  • (1-J1) Cartesian Product: This phase performs a Cartesian product
    (cross join) between the two tables involved in the table operator,
    generating VT1-J1.

  • (1-J2) ON Filter: This phase filters the rows from VT1-J1 based on
    the predicate that appears in the ON clause (<on_predicate>). Only
    rows for which the predicate evaluates to TRUE are inserted into
    VT1-J2.
  • (1-J3) Add Outer Rows: If OUTER JOIN is specified (as opposed to
    CROSS JOIN or INNER JOIN), rows from the preserved table or tables
    for which a match was not found are added to the rows from VT1-J2 as
    outer rows, generating VT1-J3.
  • (2) WHERE: This phase filters the rows from VT1 based on the
    predicate that appears in the WHERE clause (). Only
    rows for which the predicate evaluates to TRUE are inserted into VT2.
  • (3) GROUP BY: This phase arranges the rows from VT2 in groups based
    on the column list specified in the GROUP BY clause, generating VT3.
    Ultimately, there will be one result row per group.
  • (4) HAVING: This phase filters the groups from VT3 based on the
    predicate that appears in the HAVING clause (<having_predicate>).
    Only groups for which the predicate evaluates to TRUE are inserted
    into VT4.
  • (5) SELECT: This phase processes the elements in the SELECT clause,
    generating VT5.
  • (5-1) Evaluate Expressions: This phase evaluates the expressions in
    the SELECT list, generating VT5-1.
  • (5-2) DISTINCT: This phase removes duplicate rows from VT5-1,
    generating VT5-2.
  • (5-3) TOP: This phase filters the specified top number or percentage
    of rows from VT5-2 based on the logical ordering defined by the ORDER
    BY clause, generating the table VT5-3.
  • (6) ORDER BY: This phase sorts the rows from VT5-3 according to the
    column list specified in the ORDER BY clause, generating the cursor
    VC6.

it is referred from book "T-SQL Querying (Developer Reference)"

WHERE Clause vs ON when using JOIN

No, the query optimizer is smart enough to choose the same execution plan for both examples.

You can use SHOWPLAN to check the execution plan.


Nevertheless, you should put all join connection on the ON clause and all the restrictions on the WHERE clause.

Which is more correct with SQL Functions, Join or in Where clause?

I usually use WHERE EXISTS instead of joining. See here for an explanation.

Is moving a constraint into a join more efficient than join and a where clause?

Not sure why you think scenario 2 would "logically" be more efficient. On an INNER JOIN everything is basically a filter so SQL Server can collapse the logic to the exact same underlying plan shape. Here's an example from AdventureWorks2012 (click to enlarge):

Sample Image

I prefer separating the join criteria from the filter criteria, so will always write the query in the format on the left. However @HLGEM makes a good point, these clauses are interchangeable in this case only because it's an INNER JOIN. For an OUTER JOIN, it is very important to place the filters on the outer table in the join criteria, else you unwittingly end up with an INNER JOIN and drastically change the semantics of the query. So my advice about how the plan can be collapsed only holds true for inner joins.

If you're worried about performance, I'd start by getting rid of SELECT * and only pulling the columns you actually need (and make sure there's a covering index).

Four months later, another answer has emerged claiming that there usually will be a difference in performance, and that putting filter criteria in the ON clause will be better. While I won't dispute that it is certainly plausible that this could happen, I contend that it certainly isn't the norm and shouldn't be something you use as an excuse to always put all filter criteria in the ON clause.

Which is more efficient, SQL join or concurrent SQL queries?

A DBMS is made to get the data you require quickly. It can easily deal with many tables in one query and still produce the results rather fast. It is unlikely that your own code will combine the tables faster than the DBMS (but it's still possible).

Your example, though, is a very simple one. As this is about one customer only, it is one row to retrieve from the customer table and only few rows from the address table. Not much to join there. You already get the data kind of ready to use.

One point even with these queries, though: When querying data from a database there is a dialogue taking place. Your app sends the SQL string to the DBMS. The DBMS parses the string, checks its validity, creates an access plan and then tells your app what columns (names and types) the query is going to return. Your app tells the DBMS "okay, send me the data" and the DBMS sends the first block of data. With two queries this dialogue (over some network possibly) takes place two times. This is called round trips, and this is something we want to reduce in order to get a good performance. Besides the parsing of the query string and building the execution plan also takes a little time in the DBMS, so one time is better than two times.

When it comes to more complex queries, the DBMS will try find the optimal execution plan (and usually be very good at that). Should it read a table sequentially or rather look up some rows via an index? Should it loop through the rows of one table and for each row find the matches in another table in order to join the tables? Or should it rather sort the two tables first and loop through both sorted lists then using a control break algorithm? Or should it create buckets and join the table on hashes? The DBMS usually has indexes available to help with looking up data, and sometimes it is a good idea to use them and sometimes not. The DBMS may even have statistics about your data, knowing the highest number of addresses per customer and the average number of addresses, too. All this is used to query your data in an optimal way.

Of course you know your data, too, usually. So you can also decide to use some hash mechanisms, binary searches, etc. when appropriate. Maybe one technique for one query and another technique for the next one. But this may require a lot of skill and be much work and may still be slower in the end.

But well, it is possible of course that you have a super fast computer on your desk and that database server is an old machine you got from your grandfather. Then it may be a good idea, just to retrieve the raw data and do the processing on your PC. As a rule of thumb, though: Let the DBMS do this for you. It will almost always be the faster approach.

Oracle SQL Query Filter in JOIN ON vs WHERE

There should be no difference. The optimizer should generate the same plan in both cases and should be able to apply the predicate before, after, or during the join in either case based on what is the most efficient approach for that particular query.

Of course, the fact that the optimizer can do something, in general, is no guarantee that the optimizer will actually do something in a particular query. As queries get more complicated, it becomes impossible to exhaustively consider every possible query plan which means that even with perfect information and perfect code, the optimizer may not have time to do everything that you'd like it to do. You'd need to take a look at the actual plans generated for the two queries to see if they are actually identical.



Related Topics



Leave a reply



Submit