SQL Joins: Future of the SQL ANSI Standard (where vs join)?
I doubt that "where joins" would ever be unsupported. It's just not possible to not support them, because they are based on Cartesian products and simple filtering. They actually aren't joins.
But there are many reasons to use the newer join syntax. Among others:
- Readability
- Maintainability
- Easier change to outer joins
SQL JOIN: is there a difference between USING, ON or WHERE?
There is no difference in performance.
However, the first style is ANSI-89 and will get your legs broken in some shops. Including mine. The second style is ANSI-92 and is much clearer.
Examples:
Which is the JOIN, which is the filter?
FROM T1,T2,T3....
WHERE T1.ID = T2.ID AND
T1.foo = 'bar' AND T2.fish = 42 AND
T1.ID = T3.ID
FROM T1
INNER JOIN T2 ON T1.ID = T2.ID
INNER JOIN T3 ON T1.ID = T3.ID
WHERE
T1.foo = 'bar' AND T2.fish = 42
If you have OUTER JOINs (=*
, *=
) then the 2nd style will work as advertised. The first most likely won't and is also deprecated in SQL Server 2005+
The ANSI-92 style is harder to bollix too. With the older style you can easily end up with a Cartesian product (cross join) if you miss a condition. You'll get a syntax error with ANSI-92.
Edit: Some more clarification
- The reason for not using "join the where" (implicit) is the dodgy results with outer joins.
- If you use explicit OUTER JOINs + implicit INNER JOINs you'll still get dodgy results + you have inconsistency in usage
It isn't just syntax: it's about having a semantically correct query
Edit, Dec 2011
SQL Server logical query processing order is FROM, ON, JOIN, WHERE...
So if you mix "implicit WHERE inner joins" and "explicit FROM outer joins" you most likely won't get expected results because the query is ambiguous...
Simple SQL Join Understanding?
The second version, with the explicit JOIN
and join condition is standardized SQL.
The implicit join syntax with a WHERE
clause is deprecated syntax (or, rather, considered bad) - partially because it is easy to forget the WHERE
clause and cause a Cartesian product.
INNER JOIN ON vs WHERE clause
INNER JOIN
is ANSI syntax that you should use.
It is generally considered more readable, especially when you join lots of tables.
It can also be easily replaced with an OUTER JOIN
whenever a need arises.
The WHERE
syntax is more relational model oriented.
A result of two tables JOIN
ed is a cartesian product of the tables to which a filter is applied which selects only those rows with joining columns matching.
It's easier to see this with the WHERE
syntax.
As for your example, in MySQL (and in SQL generally) these two queries are synonyms.
Also, note that MySQL also has a STRAIGHT_JOIN
clause.
Using this clause, you can control the JOIN
order: which table is scanned in the outer loop and which one is in the inner loop.
You cannot control this in MySQL using WHERE
syntax.
Oracle Joins - Comparison between conventional syntax VS ANSI Syntax
Grouping answers together
- Use explicit JOINs rather than implicit (regardless whether they are outer joins or not) is that it's much easier to accidently create a cartesian product with the implicit joins. With explicit JOINs you cannot "by accident" create one. The more tables are involved the higher the risk is that you miss one join condition.
- Basically (+) is severely limited compared to ANSI joins. Furthermore it is only available in Oracle whereas the ANSI join syntax is supported by all major DBMS
- SQL will not start to perform better after migration to ANSI syntax - it's just different syntax.
- Oracle strongly recommends that you use the more flexible FROM clause join syntax shown in the former example. In the past there were some bugs with ANSI syntax but if you go with latest 11.2 or 12.1 that should be fixed already.
- Using the JOIN operators ensure your SQL code is ANSI compliant, and thus would allow a front-end application to be more easily ported for other database platforms.
- Join conditions have a very low selectivity on each table and a high selectivity on the tuples in the theoretical cross product. Conditions in the where statement usually have a much higher selectivity.
- Oracle internally converts ANSI syntax to the (+) syntax, you can see this happening in the execution plan's Predicate Information section.
Possible Pitfall in using ANSI syntax on 12c engine
Including a possibility of bug in JOIN in 12c. See here
FOLLOW UP:
Quest SQL optimizer tool
rewrites the SQL to ANSI syntax.
Is it better to do an equi join in the from clause or where clause
It's a style matter. Generally, you'd want to put the conditions that define the "shape" of the result set in the FROM clause (i.e. those that control which rows from each table should join together to produce a result), whereas those conditions which filter the result set should be in the WHERE clause. For INNER JOINs, the effects are equal, but once OUTER JOINs (LEFT, RIGHT) are involved, it feels a lot clearer.
In your first example, I'm left asking "what has this got to do with Table B?" when I encounter this odd condition in the JOIN. Whereas in the second, I can skip over the FROM clause (and all JOINs) if I'm not interested, and just see the conditions which determine whether rows are going to be returned in the WHERE clause.
Inner join vs Where
No! The same execution plan, look at these two tables:
CREATE TABLE table1 (
id INT,
name VARCHAR(20)
);
CREATE TABLE table2 (
id INT,
name VARCHAR(20)
);
The execution plan for the query using the inner join:
-- with inner join
EXPLAIN PLAN FOR
SELECT * FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.id;
SELECT *
FROM TABLE (DBMS_XPLAN.DISPLAY);
-- 0 select statement
-- 1 hash join (access("T1"."ID"="T2"."ID"))
-- 2 table access full table1
-- 3 table access full table2
And the execution plan for the query using a WHERE clause.
-- with where clause
EXPLAIN PLAN FOR
SELECT * FROM table1 t1, table2 t2
WHERE t1.id = t2.id;
SELECT *
FROM TABLE (DBMS_XPLAN.DISPLAY);
-- 0 select statement
-- 1 hash join (access("T1"."ID"="T2"."ID"))
-- 2 table access full table1
-- 3 table access full table2
Joins with WHERE - splitting WHERE clauses
Think of JOIN
s as a cross-product of two tables, which is filtered using the conditions specified in the ON
clause. Your WHERE
clause is then applied on the result set, and not on the individual tables participating in the join.
If you want to apply WHERE
on only one of the joined tables, you'll have to use a sub-query. The filtered result of that sub-query will then be treated as a normal table and joined with a real table using JOIN
again.
If you are doing this for performance, remember though that a join is almost always faster on standard JOIN
s compared to sub-queries, for properly indexed tables. You'll find that queries using JOIN will be orders of magnitude faster than the ones using sub-queries, except for rare cases.
Is there something wrong with joins that don't use the JOIN keyword in SQL or MySQL?
Filtering joins solely using WHERE
can be extremely inefficient in some common scenarios. For example:
SELECT * FROM people p, companies c
WHERE p.companyID = c.id AND p.firstName = 'Daniel'
Most databases will execute this query quite literally, first taking the Cartesian product of the people
and companies
tables and then filtering by those which have matching companyID
and id
fields. While the fully-unconstrained product does not exist anywhere but in memory and then only for a moment, its calculation does take some time.
A better approach is to group the constraints with the JOIN
s where relevant. This is not only subjectively easier to read but also far more efficient. Thusly:
SELECT * FROM people p JOIN companies c ON p.companyID = c.id
WHERE p.firstName = 'Daniel'
It's a little longer, but the database is able to look at the ON
clause and use it to compute the fully-constrained JOIN
directly, rather than starting with everything and then limiting down. This is faster to compute (especially with large data sets and/or many-table joins) and requires less memory.
I change every query I see which uses the "comma JOIN
" syntax. In my opinion, the only purpose for its existence is conciseness. Considering the performance impact, I don't think this is a compelling reason.
Related Topics
Extracting Several Math Operations Outputs from Single Select Query
Postgresql:JSON Array to Rows Using Lateral Join
How to Find 11Th Entry in SQL Access Database Table
MySQL Deadlock Explanation Needed
Compare Strings Ignoring Accents in SQL (Oracle)
Calculate the Last Day of the Prior Quarter
Format a Number with Commas But Without Decimals in SQL Server 2008 R2
How to Emulate Repeat() in SQLite
Row_Number Simulation in SQL Server 2000
To Prevent the Use of Duplicate Tags in a Database
Calculate the Sum of the Column Which Has Time Datatype:
Transpose Rows into Columns in SQL Server 2008 R2
SQL Server:Pivot with Custom Column Names
What Does It Mean to Have Jobs with a Null Stop Date
Ssis - Performing a Lookup on Another Table to Get Related Column