SQL JOIN - WHERE Clause vs. ON Clause

They are not the same thing.

Consider these queries:

SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
WHERE Orders.ID = 12345

and

SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
AND Orders.ID = 12345

The first will return an order and its lines, if any, for order number 12345. The second will return all orders, but only order 12345 will have any lines associated with it.
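To make the difference concrete, here is a minimal sketch that runs both queries against an in-memory SQLite database using Python's built-in sqlite3 module. The engine choice and the sample rows (orders 12345 and 67890, plus an assumed Item column on OrderLines) are hypothetical and not from the original question; only the shape of the two queries is taken from above.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (ID INTEGER PRIMARY KEY);
    CREATE TABLE OrderLines (OrderID INTEGER, Item TEXT);
    INSERT INTO Orders (ID) VALUES (12345), (67890);
    INSERT INTO OrderLines (OrderID, Item) VALUES
        (12345, 'widget'), (12345, 'gadget'), (67890, 'gizmo');
""")

# Filter in WHERE: only order 12345 survives, together with its lines.
where_rows = conn.execute("""
    SELECT Orders.ID, OrderLines.Item
    FROM Orders
    LEFT JOIN OrderLines ON OrderLines.OrderID = Orders.ID
    WHERE Orders.ID = 12345
    ORDER BY Orders.ID, OrderLines.Item
""").fetchall()

# Filter in ON: every order is kept, but only 12345 gets matching lines.
on_rows = conn.execute("""
    SELECT Orders.ID, OrderLines.Item
    FROM Orders
    LEFT JOIN OrderLines ON OrderLines.OrderID = Orders.ID
        AND Orders.ID = 12345
    ORDER BY Orders.ID, OrderLines.Item
""").fetchall()

print(where_rows)  # [(12345, 'gadget'), (12345, 'widget')]
print(on_rows)     # [(12345, 'gadget'), (12345, 'widget'), (67890, None)]
```

Note that order 67890 still appears in the second result, but its 'gizmo' line does not, because the extra ON condition fails for it.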

With an INNER JOIN, the clauses are effectively equivalent. However, just because they are functionally the same, in that they produce the same results, does not mean the two kinds of clauses have the same semantic meaning.

WHERE Clause vs ON when using JOIN

No, there is no performance difference: the query optimizer is smart enough to choose the same execution plan for both examples.

You can use SHOWPLAN to check the execution plan.
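SHOWPLAN is a SQL Server feature. As a rough analogue in a different engine (an assumption, not the engine from the question), the sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN to compare an INNER JOIN with a restriction in ON against the same restriction in WHERE. The tables a and b and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE b (a_id INTEGER, value INTEGER);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1, 10), (2, 20), (2, 5);
""")

# Restriction written in the ON clause of an INNER JOIN.
on_sql = """SELECT a.id, b.value FROM a
            INNER JOIN b ON b.a_id = a.id AND b.value > 6
            ORDER BY a.id, b.value"""

# The same restriction written in the WHERE clause.
where_sql = """SELECT a.id, b.value FROM a
               INNER JOIN b ON b.a_id = a.id
               WHERE b.value > 6
               ORDER BY a.id, b.value"""

def plan(sql):
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

on_result = conn.execute(on_sql).fetchall()
where_result = conn.execute(where_sql).fetchall()

print(on_result)     # [(1, 10), (2, 20)]
print(where_result)  # [(1, 10), (2, 20)]
print(plan(on_sql))
print(plan(where_sql))
```

SQLite converts the ON terms of an inner join into extra WHERE terms before planning, so the two printed plans would be expected to match there; other engines format their plans differently.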


Nevertheless, you should put all join conditions in the ON clause and all other restrictions in the WHERE clause.

INNER JOIN ON vs WHERE clause

INNER JOIN is ANSI syntax that you should use.

It is generally considered more readable, especially when you join lots of tables.

It can also be easily replaced with an OUTER JOIN whenever a need arises.

The WHERE syntax is more relational model oriented.

The result of joining two tables is the Cartesian product of the tables, filtered to keep only those rows whose joining columns match.

It's easier to see this with the WHERE syntax.
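A small sketch of that idea, using Python's sqlite3 module with hypothetical tables t1 and t2: first build the Cartesian product, then filter it with WHERE, and compare the result with the ANSI INNER JOIN form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER, colour TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 'red'), (3, 'blue'), (4, 'green');
""")

# Step 1: the Cartesian product has 3 * 3 = 9 rows.
product = conn.execute("SELECT * FROM t1 CROSS JOIN t2").fetchall()

# Step 2: keep only the rows whose joining columns match (WHERE syntax).
filtered = conn.execute("""
    SELECT t1.id, t1.name, t2.colour FROM t1, t2
    WHERE t1.id = t2.id ORDER BY t1.id
""").fetchall()

# The ANSI INNER JOIN ... ON form describes the same result set.
inner = conn.execute("""
    SELECT t1.id, t1.name, t2.colour FROM t1
    INNER JOIN t2 ON t1.id = t2.id ORDER BY t1.id
""").fetchall()

print(len(product))       # 9
print(filtered)           # [(1, 'a', 'red'), (3, 'c', 'blue')]
print(filtered == inner)  # True
```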

As for your example, in MySQL (and in SQL generally) these two queries are synonyms.

Note also that MySQL has a STRAIGHT_JOIN clause.

Using this clause, you can control the JOIN order: which table is scanned in the outer loop and which one is in the inner loop.

You cannot control this in MySQL using WHERE syntax.

INNER JOIN condition in WHERE clause or ON clause?

For inner joins like this, they are logically equivalent. However, you can run into situations where a condition in the join clause means something different from a condition in the where clause.

As a simple illustration, imagine you do a left join like so:

select x.id
from x
left join y
on x.id = y.id
;

Here we're taking all the rows from x, regardless of whether there is a matching id in y. Now let's say our join condition grows - we're not just looking for matches in y based on the id but also on id_type.

select x.id
from x
left join y
on x.id = y.id
and y.id_type = 'some type'
;

Again this gives all the rows in x regardless of whether there is a matching (id, id_type) in y.

This is very different, though:

select x.id
from x
left join y
on x.id = y.id
where y.id_type = 'some type'
;

In this situation, we're picking all the rows of x and trying to match them to rows from y. For rows of x with no match in y, y.id_type will be null. Comparing null with 'some type' does not evaluate to true, so y.id_type = 'some type' isn't satisfied and those unmatched rows are discarded, which effectively turns this into an inner join.
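The following sketch checks this with Python's sqlite3 module, using hypothetical rows for x and y. It shows that a comparison with NULL yields NULL rather than true, and that the WHERE version returns exactly what an INNER JOIN with the same conditions returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER);
    CREATE TABLE y (id INTEGER, id_type TEXT);
    INSERT INTO x VALUES (1), (2), (3);
    INSERT INTO y VALUES (1, 'some type'), (2, 'other type');
""")

# NULL compared with anything is NULL (unknown), which WHERE does not keep.
null_cmp = conn.execute("SELECT NULL = 'some type'").fetchone()[0]

def ids(sql):
    return [row[0] for row in conn.execute(sql)]

# Condition in ON: all rows of x are kept.
in_on = ids("""SELECT x.id FROM x LEFT JOIN y
               ON x.id = y.id AND y.id_type = 'some type' ORDER BY x.id""")

# Condition in WHERE: unmatched and non-matching rows are filtered out.
in_where = ids("""SELECT x.id FROM x LEFT JOIN y
                  ON x.id = y.id WHERE y.id_type = 'some type' ORDER BY x.id""")

# Plain inner join with the same conditions, for comparison.
inner = ids("""SELECT x.id FROM x INNER JOIN y
               ON x.id = y.id AND y.id_type = 'some type' ORDER BY x.id""")

print(null_cmp)           # None
print(in_on)              # [1, 2, 3]
print(in_where)           # [1]
print(in_where == inner)  # True
```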

Long story short: for inner joins it doesn't matter where the conditions go but for outer joins it can.

SQL left join with filter in JOIN condition vs filter in WHERE clause

The big difference with the WHERE condition b.status is null or b.status in (10, 100) shows up when b.status is, say, 1 and b.id = a.id.

In the first query, you will still get the row from table A, with the corresponding B columns as NULL, because the ON condition is not fully satisfied.
In the second query, the JOIN produces a row combining the a and b tables, but that row is then discarded by the WHERE clause.
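The two queries from the original question are not shown here, so this sketch assumes they were a LEFT JOIN with b.status in (10, 100) added to the ON clause, and the same LEFT JOIN with b.status is null or b.status in (10, 100) in the WHERE clause. It uses Python's sqlite3 module and hypothetical rows, including the case where a.id = 1 matches a b row whose status is 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER, status INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    -- a.id = 1 matches a b row whose status is 1 (the case discussed above)
    INSERT INTO b VALUES (1, 1), (2, 10);
""")

# Assumed first query: the status filter is part of the ON condition.
first = conn.execute("""
    SELECT a.id, b.status FROM a
    LEFT JOIN b ON b.id = a.id AND b.status IN (10, 100)
    ORDER BY a.id
""").fetchall()

# Assumed second query: the status filter runs in WHERE after the join.
second = conn.execute("""
    SELECT a.id, b.status FROM a
    LEFT JOIN b ON b.id = a.id
    WHERE b.status IS NULL OR b.status IN (10, 100)
    ORDER BY a.id
""").fetchall()

print(first)   # [(1, None), (2, 10), (3, None)]
print(second)  # [(2, 10), (3, None)]
```

Row a.id = 1 survives only in the first query, with NULL for the B part.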

Which performs first WHERE clause or JOIN clause

The conceptual order of query processing is:

1. FROM (including JOINs and their ON conditions)
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY

But this is just a conceptual order. In practice the engine may rearrange the clauses. Here is a demonstration. Let's make 2 tables with 1,000,000 rows each:

CREATE TABLE test1 (id INT IDENTITY(1, 1), name VARCHAR(10))
CREATE TABLE test2 (id INT IDENTITY(1, 1), name VARCHAR(10))

-- Six cross-joined sets of 10 rows produce 10^6 rows
;WITH cte AS(SELECT -1 + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) d FROM
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t1(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t2(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t3(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t4(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t5(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t6(n))

INSERT INTO test1(name) SELECT 'a' FROM cte

-- A CTE is only in scope for the single statement that follows it, so fill test2 from test1
INSERT INTO test2(name) SELECT name FROM test1

Now run 2 queries:

SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id AND t2.id = 100
WHERE t1.id > 1


SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id
WHERE t1.id = 1

Notice that the first query will filter most rows out in the join condition, but the second query filters in the where condition. Look at the produced plans:

Query 1: TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(100)

Query 2: TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(1)

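This means that when optimizing the first query, the engine decided to evaluate the join condition first to filter out rows. In the second query, it evaluated the where clause first: the predicate t2.id = 1 is derived from t1.id = 1 and t2.id = t1.id, and pushed down into the scan of test2.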

What's the difference between where clause and on clause when table left join?

The where clause applies to the whole result set; the on clause only applies to the join in question.

In the example supplied, all of the additional conditions related to fields on the inner side of the join - so in this example, the two queries are effectively identical.

However, if you had included a condition on a value in the table in the outer side of the join, it would have made a significant difference.

You can get more from this link: http://ask.sqlservercentral.com/questions/80067/sql-data-filter-condition-in-join-vs-where-clause

For example:

select t1.f1,t2.f2 from t1 left join t2 on t1.f1 = t2.f2 and t2.f4=1

select t1.f1,t2.f2 from t1 left join t2 on t1.f1 = t2.f2 where t2.f4=1

These two queries do different things: the former left joins only to t2 records where f4 is 1, while the latter has effectively been turned back into an inner join to t2.

Why and when a LEFT JOIN with condition in WHERE clause is not equivalent to the same LEFT JOIN in ON?

The on clause is used when the join is looking for matching rows. The where clause is used to filter rows after all the joining is done.

An example with Disney toons voting for president:

declare @candidates table (name varchar(50));
insert @candidates values
('Obama'),
('Romney');
declare @votes table (voter varchar(50), voted_for varchar(50));
insert @votes values
('Mickey Mouse', 'Romney'),
('Donald Duck', 'Obama');

select *
from @candidates c
left join
@votes v
on c.name = v.voted_for
and v.voter = 'Donald Duck'

This still returns Romney (with NULL in the vote columns) even though Donald didn't vote for him. If you move the condition from the on clause to the where clause:

select  *
from @candidates c
left join
@votes v
on c.name = v.voted_for
where v.voter = 'Donald Duck'

Romney will no longer be in the result set.
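The same example can be reproduced outside SQL Server. This sketch ports it to Python's sqlite3 module, replacing the T-SQL table variables @candidates and @votes with ordinary tables named candidates and votes (a change made only so it runs in SQLite).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidates (name TEXT);
    INSERT INTO candidates VALUES ('Obama'), ('Romney');
    CREATE TABLE votes (voter TEXT, voted_for TEXT);
    INSERT INTO votes VALUES ('Mickey Mouse', 'Romney'), ('Donald Duck', 'Obama');
""")

# Condition in ON: it only decides which vote rows match; both candidates stay.
on_version = conn.execute("""
    SELECT c.name, v.voter FROM candidates c
    LEFT JOIN votes v ON c.name = v.voted_for AND v.voter = 'Donald Duck'
    ORDER BY c.name
""").fetchall()

# Condition in WHERE: it filters the joined rows, so Romney's row is dropped.
where_version = conn.execute("""
    SELECT c.name, v.voter FROM candidates c
    LEFT JOIN votes v ON c.name = v.voted_for
    WHERE v.voter = 'Donald Duck'
    ORDER BY c.name
""").fetchall()

print(on_version)     # [('Obama', 'Donald Duck'), ('Romney', None)]
print(where_version)  # [('Obama', 'Donald Duck')]
```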

Where vs ON in outer join

As Gordon Linoff pointed out, it's not true: your friend is plain wrong.

I just want to add that developers like to think about SQL the way they think about their language of trade (C++, VB, Java), but those are procedural/imperative languages.
When you code SQL you are in another paradigm. You are just describing a function to be applied to a dataset.

Let's take your own example:

Select a.*, b.* 
From A a
Left outer join B on a.id = b.id
Where b.id is NULL;

If a.Id and b.Id are NOT NULL columns, it's semantically equal to:

Select a.*, null, ..., null
From A a
where not exists (select * from B b where b.Id = a.Id)

Now try running those two queries and profiling them.
In most DBMSs I would expect both queries to run in exactly the same way.

It happens because the engine decides how to implement your "function" over the dataset.

Note that the above example is equivalent, in set mathematics, to:

Give me the set A minus the intersection between A and B.
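A quick way to convince yourself that the three formulations agree is to run them side by side. This sketch uses Python's sqlite3 module with hypothetical tables A and B whose id columns are declared NOT NULL, as the answer assumes; the set form uses EXCEPT and INTERSECT, which also remove duplicates (harmless here because the ids are unique).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER NOT NULL, val TEXT);
    CREATE TABLE B (id INTEGER NOT NULL);
    INSERT INTO A VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO B VALUES (2), (4);
""")

def ids(sql):
    return [row[0] for row in conn.execute(sql)]

# Outer join, keeping only the rows of A that found no partner in B.
left_join_null = ids("""SELECT a.id FROM A a LEFT OUTER JOIN B b ON a.id = b.id
                        WHERE b.id IS NULL ORDER BY a.id""")

# NOT EXISTS formulation.
not_exists = ids("""SELECT a.id FROM A a
                    WHERE NOT EXISTS (SELECT * FROM B b WHERE b.id = a.id)
                    ORDER BY a.id""")

# Set formulation: A minus (A intersect B).
set_difference = ids("""SELECT id FROM A
                        EXCEPT
                        SELECT id FROM (SELECT id FROM A INTERSECT SELECT id FROM B)
                        ORDER BY id""")

print(left_join_null)  # [1, 3]
print(not_exists)      # [1, 3]
print(set_difference)  # [1, 3]
```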

Engines can decide how to implement your query because they have some tricks up their sleeve.
They keep statistics about your tables, indexes, etc., and can use them, for example, to perform joins in a different order than the one you wrote.

IMHO engines today are really good at finding the best way to implement the function you describe and rarely need query hints.
Of course, you can end up describing your function in an overly complicated way, which affects how the engine decides to run it.
The art of better describing functions and sets and managing indexes is what we call query tuning.

inner join on condition or using where?

SQL is not a procedural language. It is a descriptive language. A query describes the result set that you want to produce.

With an inner join, the two queries in your question are identical -- they produce the same result set under all circumstances. Which to prefer is a stylistic preference. MySQL should treat the two the same way from an optimization perspective.

One preference is that filters on a single table are more appropriate in the WHERE clause, while conditions that relate the tables belong in the ON clause.

With an outer join, the two queries are not the same, and you should use the one that expresses your intent.