Why Use a Join Clause Versus a Where Condition

SQL JOIN - WHERE clause vs. ON clause

They are not the same thing.

Consider these queries:

SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
WHERE Orders.ID = 12345

and

SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
AND Orders.ID = 12345

The first will return an order and its lines, if any, for order number 12345. The second will return all orders, but only order 12345 will have any lines associated with it.

With an INNER JOIN, the clauses are effectively equivalent. However, just because they are functionally the same, in that they produce the same results, does not mean the two kinds of clauses have the same semantic meaning.

Why use a JOIN clause versus a WHERE condition?

Some people will say that this style is less readable, but that's a matter of habit. From a performance point of view, it doesn't matter, since the query optimizer takes care of that.

Which performs first WHERE clause or JOIN clause

The conceptual order of query processing is:

1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY

But this is just a conceptual order. In fact the engine may decide to rearrange clauses. Here is proof. Let's make 2 tables with 1000000 rows each:

CREATE TABLE test1 (id INT IDENTITY(1, 1), name VARCHAR(10))
CREATE TABLE test2 (id INT IDENTITY(1, 1), name VARCHAR(10))


;WITH cte AS(SELECT -1 + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) d FROM
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t1(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t2(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t3(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t4(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t5(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t6(n))

INSERT INTO test1(name) SELECT 'a' FROM cte

Now run 2 queries:

SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id AND t2.id = 100
WHERE t1.id > 1


SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id
WHERE t1.id = 1

Notice that the first query will filter most rows out in the join condition, but the second query filters in the where condition. Look at the produced plans:

1 TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(100)

2 TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(1)

This means that in the first query optimized, the engine decided first to evaluate the join condition to filter out rows. In the second query, it evaluated the where clause first.

INNER JOIN ON vs WHERE clause

INNER JOIN is ANSI syntax that you should use.

It is generally considered more readable, especially when you join lots of tables.

It can also be easily replaced with an OUTER JOIN whenever a need arises.

The WHERE syntax is more relational model oriented.

A result of two tables JOINed is a cartesian product of the tables to which a filter is applied which selects only those rows with joining columns matching.

It's easier to see this with the WHERE syntax.

As for your example, in MySQL (and in SQL generally) these two queries are synonyms.

Also, note that MySQL also has a STRAIGHT_JOIN clause.

Using this clause, you can control the JOIN order: which table is scanned in the outer loop and which one is in the inner loop.

You cannot control this in MySQL using WHERE syntax.

WHERE Clause vs ON when using JOIN

No, the query optimizer is smart enough to choose the same execution plan for both examples.

You can use SHOWPLAN to check the execution plan.


Nevertheless, you should put all join connection on the ON clause and all the restrictions on the WHERE clause.

In MySQL queries, why use join instead of where?

Any query involving more than one table requires some form of association to link the results from table "A" to table "B". The traditional (ANSI-89) means of doing this is to:

  1. List the tables involved in a comma separated list in the FROM clause
  2. Write the association between the tables in the WHERE clause

    SELECT *
    FROM TABLE_A a,
    TABLE_B b
    WHERE a.id = b.id

Here's the query re-written using ANSI-92 JOIN syntax:

SELECT *
FROM TABLE_A a
JOIN TABLE_B b ON b.id = a.id

From a Performance Perspective:


Where supported (Oracle 9i+, PostgreSQL 7.2+, MySQL 3.23+, SQL Server 2000+), there is no performance benefit to using either syntax over the other. The optimizer sees them as the same query. But more complex queries can benefit from using ANSI-92 syntax:

  • Ability to control JOIN order - the order which tables are scanned
  • Ability to apply filter criteria on a table prior to joining

From a Maintenance Perspective:


There are numerous reasons to use ANSI-92 JOIN syntax over ANSI-89:

  • More readable, as the JOIN criteria is separate from the WHERE clause
  • Less likely to miss JOIN criteria
  • Consistent syntax support for JOIN types other than INNER, making queries easy to use on other databases
  • WHERE clause only serves as filtration of the cartesian product of the tables joined

From a Design Perspective:


ANSI-92 JOIN syntax is pattern, not anti-pattern:

  • The purpose of the query is more obvious; the columns used by the application is clear
  • It follows the modularity rule about using strict typing whenever possible. Explicit is almost universally better.

Conclusion


Short of familiarity and/or comfort, I don't see any benefit to continuing to use the ANSI-89 WHERE clause instead of the ANSI-92 JOIN syntax. Some might complain that ANSI-92 syntax is more verbose, but that's what makes it explicit. The more explicit, the easier it is to understand and maintain.

Difference between and and where in joins

Firstly there is a semantic difference. When you have a join, you are saying that the relationship between the two tables is defined by that condition. So in your first example you are saying that the tables are related by cd.Company = table2.Name AND table2.Id IN (2728). When you use the WHERE clause, you are saying that the relationship is defined by cd.Company = table2.Name and that you only want the rows where the condition table2.Id IN (2728) applies. Even though these give the same answer, it means very different things to a programmer reading your code.

In this case, the WHERE clause is almost certainly what you mean so you should use it.

Secondly there is actually difference in the result in the case that you use a LEFT JOIN instead of an INNER JOIN. If you include the second condition as part of the join, you will still get a result row if the condition fails - you will get values from the left table and nulls for the right table. If you include the condition as part of the WHERE clause and that condition fails, you won't get the row at all.

Here is an example to demonstrate this.

Query 1 (WHERE):

SELECT DISTINCT field1
FROM table1 cd
LEFT JOIN table2
ON cd.Company = table2.Name
WHERE table2.Id IN (2728);

Result:

field1
200

Query 2 (AND):

SELECT DISTINCT field1
FROM table1 cd
LEFT JOIN table2
ON cd.Company = table2.Name
AND table2.Id IN (2728);

Result:

field1
100
200

Test data used:

CREATE TABLE table1 (Company NVARCHAR(100) NOT NULL, Field1 INT NOT NULL);
INSERT INTO table1 (Company, Field1) VALUES
('FooSoft', 100),
('BarSoft', 200);

CREATE TABLE table2 (Id INT NOT NULL, Name NVARCHAR(100) NOT NULL);
INSERT INTO table2 (Id, Name) VALUES
(2727, 'FooSoft'),
(2728, 'BarSoft');

INNER JOIN condition in WHERE clause or ON clause?

For inner joins like this they are logically equivalent. However, you can run in to situations where a condition in the join clause means something different than a condition in the where clause.

As a simple illustration, imagine you do a left join like so;

select x.id
from x
left join y
on x.id = y.id
;

Here we're taking all the rows from x, regardless of whether there is a matching id in y. Now let's say our join condition grows - we're not just looking for matches in y based on the id but also on id_type.

select x.id
from x
left join y
on x.id = y.id
and y.id_type = 'some type'
;

Again this gives all the rows in x regardless of whether there is a matching (id, id_type) in y.

This is very different, though:

select x.id
from x
left join y
on x.id = y.id
where y.id_type = 'some type'
;

In this situation, we're picking all the rows of x and trying to match to rows from y. Now for rows for which there is no match in y, y.id_type will be null. Because of that, y.id_type = 'some type' isn't satisfied, so those rows where there is no match are discarded, which effectively turned this in to an inner join.

Long story short: for inner joins it doesn't matter where the conditions go but for outer joins it can.

Explain JOIN vs. LEFT JOIN and WHERE condition performance suggestion in more detail

Consider the following example. We have two tables, DEPARTMENTS and EMPLOYEES.

Some departments do not yet have any employees.

This query uses an inner join that finds the department employee 999 works at, if any, otherwise it shows nothing (not even the employee or his or her name):

select a.department_id, a.department_desc, b.employee_id, b.employee_name
from departments a
join employees b
on a.department_id = b.department_id
where b.employee_id = '999'

This next query uses an outer join (left between departments and employees) and finds the department that employee 999 works for. However it too will not show the employee's ID or his or her name, if they do not work at any departments. That is because of the outer joined table being used in the WHERE clause. If there is no matching department, it will be null (not 999, even though 999 exists in employees).

select a.department_id, a.department_desc, b.employee_id, b.employee_name
from departments a
left join employees b
on a.department_id = b.department_id
where b.employee_id = '999'

But consider this query:

select a.department_id, a.department_desc, b.employee_id, b.employee_name
from departments a
left join employees b
on a.department_id = b.department_id
and b.employee_id= '999'

Now the criteria is in the on clause. So even if this employee works at no departments, he will still be returned (his ID and name). The department columns will be null, but we get a result (the employee side).

You might think you would never want to use the outer joined table in the WHERE clause, but that is not necessarily the case. Normally it is, for the reason described above, though.

Suppose you want all departments with no employees. Then you could run the following, which does use an outer join, and the outer joined table is used in the where clause:

select a.department_id, a.department_desc, b.employee_id
from departments a
left join employees b
on a.department_id = b.department_id
where b.employee_id is null

^^ Shows departments with no employees.

The above is likely the only legitimate reason you would want to use an outer joined table in the WHERE clause rather than the ON clause (which I think is what your question is; the difference between inner and outer joins is an entirely different topic).

A good way to look at is this: You use outer joins to allow nulls. Why would you then use an outer join and say that a field should not be null and should be equal to 'XYZ'? If a value has to be 'XYZ' (not null), then why instruct the database to allow nulls to come back? It's like saying one thing and then overriding it later.



Related Topics



Leave a reply



Submit