Condition within JOIN or WHERE
Relational algebra allows the predicates in the WHERE clause and the INNER JOIN to be interchanged, so even INNER JOIN queries with WHERE clauses can have their predicates rearranged by the optimizer so that rows may already be excluded during the JOIN process.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
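To check the equivalence on a small case, here is a minimal sketch that runs both forms against an in-memory SQLite database through Python's sqlite3 module. SQLite and the sample rows are assumptions made for illustration; only the table and column names come from the queries above.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the queries above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, State TEXT);
    CREATE TABLE Accounts (AccountID INTEGER PRIMARY KEY, Status INTEGER);
    CREATE TABLE CustomerAccounts (CustomerID INTEGER, AccountID INTEGER);
    INSERT INTO Customers VALUES (1, 'NY'), (2, 'NY'), (3, 'CA');
    INSERT INTO Accounts VALUES (10, 1), (11, 0), (12, 1);
    INSERT INTO CustomerAccounts VALUES (1, 10), (2, 11), (3, 12);
""")

# Filters written inside the ON clauses.
filters_in_on = """
    SELECT c.CustomerID, a.AccountID
    FROM Customers c
    INNER JOIN CustomerAccounts ca
        ON ca.CustomerID = c.CustomerID AND c.State = 'NY'
    INNER JOIN Accounts a
        ON ca.AccountID = a.AccountID AND a.Status = 1
    ORDER BY c.CustomerID, a.AccountID
"""

# Same filters moved to the WHERE clause.
filters_in_where = """
    SELECT c.CustomerID, a.AccountID
    FROM Customers c
    INNER JOIN CustomerAccounts ca
        ON ca.CustomerID = c.CustomerID
    INNER JOIN Accounts a
        ON ca.AccountID = a.AccountID
    WHERE c.State = 'NY' AND a.Status = 1
    ORDER BY c.CustomerID, a.AccountID
"""

rows_on = conn.execute(filters_in_on).fetchall()
rows_where = conn.execute(filters_in_where).fetchall()
print(rows_on)                # [(1, 10)]
print(rows_on == rows_where)  # True
```

Both forms return the single NY customer with an active account, so for inner joins the choice can be made on readability alone.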
INNER JOIN condition in WHERE clause or ON clause?
For inner joins like this they are logically equivalent. However, you can run into situations where a condition in the join clause means something different than a condition in the where clause.
As a simple illustration, imagine you do a left join like so:
select x.id
from x
left join y
on x.id = y.id
;
Here we're taking all the rows from x, regardless of whether there is a matching id in y. Now let's say our join condition grows - we're not just looking for matches in y based on the id but also on id_type.
select x.id
from x
left join y
on x.id = y.id
and y.id_type = 'some type'
;
Again this gives all the rows in x regardless of whether there is a matching (id, id_type) in y.
This is very different, though:
select x.id
from x
left join y
on x.id = y.id
where y.id_type = 'some type'
;
In this situation, we're picking all the rows of x and trying to match them to rows from y. For rows with no match in y, y.id_type will be null. Because of that, y.id_type = 'some type' isn't satisfied, so the unmatched rows are discarded, which effectively turns this into an inner join.
Long story short: for inner joins it doesn't matter where the conditions go but for outer joins it can.
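The difference is easy to see on three rows. The sketch below is only an illustration: it assumes SQLite (via Python's sqlite3 module) and made-up contents for x and y, and it runs the last two left-join queries from above.

```python
import sqlite3

# Hypothetical contents for x and y: id 1 matches on type, id 2 matches
# on id only, id 3 has no row in y at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER);
    CREATE TABLE y (id INTEGER, id_type TEXT);
    INSERT INTO x VALUES (1), (2), (3);
    INSERT INTO y VALUES (1, 'some type'), (2, 'other type');
""")

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Condition in the ON clause: every row of x survives.
condition_in_on = ids("""
    SELECT x.id FROM x
    LEFT JOIN y ON x.id = y.id AND y.id_type = 'some type'
    ORDER BY x.id
""")

# Condition in the WHERE clause: unmatched rows have a NULL id_type
# and are filtered out after the join.
condition_in_where = ids("""
    SELECT x.id FROM x
    LEFT JOIN y ON x.id = y.id
    WHERE y.id_type = 'some type'
    ORDER BY x.id
""")

print(condition_in_on)     # [1, 2, 3]
print(condition_in_where)  # [1]
```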
SQL JOIN - WHERE clause vs. ON clause
They are not the same thing.
Consider these queries:
SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
WHERE Orders.ID = 12345
and
SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
AND Orders.ID = 12345
The first will return an order and its lines, if any, for order number 12345. The second will return all orders, but only order 12345 will have any lines associated with it.
With an INNER JOIN, the clauses are effectively equivalent. However, just because they are functionally the same, in that they produce the same results, does not mean the two kinds of clauses have the same semantic meaning.
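As a rough check of the two LEFT JOIN queries, the following sketch runs them in an in-memory SQLite database from Python. The second order (12346), the Item column, and the use of SQLite are assumptions added for illustration; the queries select two named columns instead of * so the output stays short.

```python
import sqlite3

# Hypothetical data: two orders, each with one line.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (ID INTEGER PRIMARY KEY);
    CREATE TABLE OrderLines (OrderID INTEGER, Item TEXT);
    INSERT INTO Orders VALUES (12345), (12346);
    INSERT INTO OrderLines VALUES (12345, 'widget'), (12346, 'gadget');
""")

# Filter on the left table in WHERE: only order 12345 is returned.
filter_in_where = conn.execute("""
    SELECT Orders.ID, OrderLines.Item
    FROM Orders
    LEFT JOIN OrderLines ON OrderLines.OrderID = Orders.ID
    WHERE Orders.ID = 12345
    ORDER BY Orders.ID
""").fetchall()

# Same filter in ON: all orders are returned, but only 12345 gets lines.
filter_in_on = conn.execute("""
    SELECT Orders.ID, OrderLines.Item
    FROM Orders
    LEFT JOIN OrderLines ON OrderLines.OrderID = Orders.ID
        AND Orders.ID = 12345
    ORDER BY Orders.ID
""").fetchall()

print(filter_in_where)  # [(12345, 'widget')]
print(filter_in_on)     # [(12345, 'widget'), (12346, None)]
```

Note that order 12346 does have a line in the table; the ON condition simply prevents it from being attached.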
inner join on condition or using where?
SQL is not a procedural language. It is a descriptive language. A query describes the result set that you want to produce.
With an inner join, the two queries in your question are identical -- they produce the same result set under all circumstances. Which to prefer is a stylistic preference. MySQL should treat the two the same way from an optimization perspective.
One preference is that filters on a single table are more appropriate for WHERE than ON.
With an outer join, the two queries are not the same, and you should use the one that expresses your intent.
SQL LEFT JOIN: difference between WHERE and condition inside AND
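With a left join there is a difference:
with the condition in the ON clause of the left join, rows that do not satisfy col > 10 will still be there, with the joined columns filled with NULLs;
with the condition in the WHERE clause, those rows will be filtered out.
With an inner join there is no difference.
Example: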
example:
declare @t table (id int, dummy varchar(20))
declare @a table (id int, age int, col int)
insert into @t
select * from (
values
(1, 'pippo' ),
(2, 'pluto' ),
(3, 'paperino' ),
(4, 'ciccio' ),
(5, 'caio' ),
(5, 'sempronio')
) x (c1,c2)
insert into @a
select * from (
values
(1, 38, 2 ),
(2, 26, 5 ),
(3, 41, 12),
(4, 15, 11),
(5, 39, 7 )
) x (c1,c2,c3)
select t.*, a.age
from @t t
left join @a a on t.ID = a.ID and a.col > 10
Outputs:
id dummy age
1 pippo NULL
2 pluto NULL
3 paperino 41
4 ciccio 15
5 caio NULL
5 sempronio NULL
While
select t.*, a.age
from @t t
left join @a a on t.ID = a.ID
where a.col > 10
Outputs:
id dummy age
3 paperino 41
4 ciccio 15
So with LEFT JOIN you will ALWAYS get all the rows from the first table.
If the join condition is true, you will get the columns from the joined table filled with their values; if the condition is false, those columns will be NULL.
With a WHERE condition you will get only the rows that match the condition.
left join and where condition in joining condition
You should not put a condition on a column of the left-joined table (table2 here) in the WHERE clause, because that makes the query work as an INNER JOIN. Move the condition on the left-joined table into the related ON clause:
select *
FROM table1 t1
left join table2 t2
ON t1.id = t2.fk_id AND t2.id_number = 12174
WHERE t1.code = 'CODE1' ;
A WHERE condition on the joined table acts like part of an INNER JOIN condition, which is the reason you see this behavior.
Adding the condition to the ON clause means that the added condition is also applied as part of the outer join.
SQL left join with filter in JOIN condition vs filter in WHERE clause
The big difference with the WHERE condition b.status is null or b.status in (10, 100) shows up when b.status is, say, 1 and b.id = a.id.
In the first query you will still get the row from table A, with the corresponding B part as NULL, because the ON condition is not fully satisfied.
In the second query the JOIN produces the row with both the a and b parts, and that row is then lost in the WHERE clause.
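The two queries being compared are not shown here, so the sketch below reconstructs them as an assumption: the first keeps the status filter in the ON clause, the second uses the WHERE condition quoted above. The tables a and b, their rows, and the use of SQLite through Python are all hypothetical.

```python
import sqlite3

# Hypothetical data: a.id 1 has status 10, a.id 2 has status 1,
# a.id 3 has no row in b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER, status INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1, 10), (2, 1);
""")

# Assumed "first query": status filter inside the join condition.
first_query = conn.execute("""
    SELECT a.id, b.status
    FROM a
    LEFT JOIN b ON b.id = a.id AND b.status IN (10, 100)
    ORDER BY a.id
""").fetchall()

# Assumed "second query": plain join, filter in WHERE with an IS NULL escape.
second_query = conn.execute("""
    SELECT a.id, b.status
    FROM a
    LEFT JOIN b ON b.id = a.id
    WHERE b.status IS NULL OR b.status IN (10, 100)
    ORDER BY a.id
""").fetchall()

print(first_query)   # [(1, 10), (2, None), (3, None)]
print(second_query)  # [(1, 10), (3, None)]
```

Row 2 is the case described above: it matches on id with status 1, so the first form keeps it with a NULL status while the second form drops it.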
Which performs first WHERE clause or JOIN clause
The conceptual order of query processing is:
1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY
But this is just a conceptual order. In practice the engine may decide to rearrange how the clauses are evaluated. Here is a demonstration. Let's make 2 tables with 1000000 rows each:
CREATE TABLE test1 (id INT IDENTITY(1, 1), name VARCHAR(10))
CREATE TABLE test2 (id INT IDENTITY(1, 1), name VARCHAR(10))
;WITH cte AS(SELECT -1 + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) d FROM
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t1(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t2(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t3(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t4(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t5(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t6(n))
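INSERT INTO test1(name) SELECT 'a' FROM cte
-- The CTE is only in scope for the statement above, so fill test2 from test1
INSERT INTO test2(name) SELECT name FROM test1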
Now run 2 queries:
SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id AND t2.id = 100
WHERE t1.id > 1
SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id
WHERE t1.id = 1
Notice that the first query will filter most rows out in the join condition, but the second query filters in the where condition. Look at the produced plans:
1 TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(100)
2 TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(1)
This means that when optimizing the first query, the engine decided to evaluate the join condition first to filter out rows. In the second query, it evaluated the where clause first.
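The plans above come from SQL Server. As a hedged illustration of how to inspect this kind of behavior yourself, the sketch below uses SQLite's EXPLAIN QUERY PLAN from Python on a much smaller, hypothetical version of the tables (1000 rows, with id as the primary key). The plan text depends on the SQLite version, so it is printed rather than assumed; what can be relied on is that both placements return the same row.

```python
import sqlite3

# Small stand-in for the test tables above (assumption: 1000 rows each).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE test2 (id INTEGER PRIMARY KEY, name TEXT);
""")
rows = [(i, "a") for i in range(1, 1001)]
conn.executemany("INSERT INTO test1 VALUES (?, ?)", rows)
conn.executemany("INSERT INTO test2 VALUES (?, ?)", rows)

queries = {
    "filter in ON": """
        SELECT t1.id, t2.id FROM test1 t1
        JOIN test2 t2 ON t2.id = t1.id AND t2.id = 100
    """,
    "filter in WHERE": """
        SELECT t1.id, t2.id FROM test1 t1
        JOIN test2 t2 ON t2.id = t1.id
        WHERE t2.id = 100
    """,
}

results = {}
plans = {}
for label, sql in queries.items():
    results[label] = conn.execute(sql).fetchall()
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    plans[label] = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(label, results[label])
    for step in plans[label]:
        print("   ", step)
```

Comparing the printed steps for the two labels shows where the engine applied the filter, regardless of which clause it was written in.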