SQL Joins: Future of the SQL Ansi Standard (Where VS Join)

SQL Joins: Future of the SQL ANSI Standard (where vs join)?

I doubt that "where joins" would ever be unsupported. It's just not possible to not support them, because they are based on Cartesian products and simple filtering. They actually aren't joins.

But there are many reasons to use the newer join syntax. Among others:

  • Readability
  • Maintainability
  • Easier change to outer joins

SQL JOIN: is there a difference between USING, ON or WHERE?

There is no difference in performance.

However, the first style is ANSI-89 and will get your legs broken in some shops. Including mine. The second style is ANSI-92 and is much clearer.

Examples:

Which is the JOIN, which is the filter?

FROM T1,T2,T3....
WHERE T1.ID = T2.ID AND
T1.foo = 'bar' AND T2.fish = 42 AND
T1.ID = T3.ID

FROM T1
INNER JOIN T2 ON T1.ID = T2.ID
INNER JOIN T3 ON T1.ID = T3.ID
WHERE
T1.foo = 'bar' AND T2.fish = 42

If you have OUTER JOINs (=*, *=) then the 2nd style will work as advertised. The first most likely won't and is also deprecated in SQL Server 2005+

The ANSI-92 style is harder to bollix too. With the older style you can easily end up with a Cartesian product (cross join) if you miss a condition. You'll get a syntax error with ANSI-92.

Edit: Some more clarification

  • The reason for not using "join the where" (implicit) is the dodgy results with outer joins.
  • If you use explicit OUTER JOINs + implicit INNER JOINs you'll still get dodgy results + you have inconsistency in usage

It isn't just syntax: it's about having a semantically correct query

Edit, Dec 2011

SQL Server logical query processing order is FROM, ON, JOIN, WHERE...

So if you mix "implicit WHERE inner joins" and "explicit FROM outer joins" you most likely won't get expected results because the query is ambiguous...

Simple SQL Join Understanding?

The second version, with the explicit JOIN and join condition is standardized SQL.

The implicit join syntax with a WHERE clause is deprecated syntax (or, rather, considered bad) - partially because it is easy to forget the WHERE clause and cause a Cartesian product.

INNER JOIN ON vs WHERE clause

INNER JOIN is ANSI syntax that you should use.

It is generally considered more readable, especially when you join lots of tables.

It can also be easily replaced with an OUTER JOIN whenever a need arises.

The WHERE syntax is more relational model oriented.

A result of two tables JOINed is a cartesian product of the tables to which a filter is applied which selects only those rows with joining columns matching.

It's easier to see this with the WHERE syntax.

As for your example, in MySQL (and in SQL generally) these two queries are synonyms.

Also, note that MySQL also has a STRAIGHT_JOIN clause.

Using this clause, you can control the JOIN order: which table is scanned in the outer loop and which one is in the inner loop.

You cannot control this in MySQL using WHERE syntax.

Oracle Joins - Comparison between conventional syntax VS ANSI Syntax

Grouping answers together

  1. Use explicit JOINs rather than implicit (regardless whether they are outer joins or not) is that it's much easier to accidently create a cartesian product with the implicit joins. With explicit JOINs you cannot "by accident" create one. The more tables are involved the higher the risk is that you miss one join condition.
  2. Basically (+) is severely limited compared to ANSI joins. Furthermore it is only available in Oracle whereas the ANSI join syntax is supported by all major DBMS
  3. SQL will not start to perform better after migration to ANSI syntax - it's just different syntax.
  4. Oracle strongly recommends that you use the more flexible FROM clause join syntax shown in the former example. In the past there were some bugs with ANSI syntax but if you go with latest 11.2 or 12.1 that should be fixed already.
  5. Using the JOIN operators ensure your SQL code is ANSI compliant, and thus would allow a front-end application to be more easily ported for other database platforms.
  6. Join conditions have a very low selectivity on each table and a high selectivity on the tuples in the theoretical cross product. Conditions in the where statement usually have a much higher selectivity.
  7. Oracle internally converts ANSI syntax to the (+) syntax, you can see this happening in the execution plan's Predicate Information section.

Possible Pitfall in using ANSI syntax on 12c engine

Including a possibility of bug in JOIN in 12c. See here

FOLLOW UP:

Quest SQL optimizer tool rewrites the SQL to ANSI syntax.

Is it better to do an equi join in the from clause or where clause

It's a style matter. Generally, you'd want to put the conditions that define the "shape" of the result set in the FROM clause (i.e. those that control which rows from each table should join together to produce a result), whereas those conditions which filter the result set should be in the WHERE clause. For INNER JOINs, the effects are equal, but once OUTER JOINs (LEFT, RIGHT) are involved, it feels a lot clearer.


In your first example, I'm left asking "what has this got to do with Table B?" when I encounter this odd condition in the JOIN. Whereas in the second, I can skip over the FROM clause (and all JOINs) if I'm not interested, and just see the conditions which determine whether rows are going to be returned in the WHERE clause.

Inner join vs Where

No! The same execution plan, look at these two tables:

CREATE TABLE table1 (
id INT,
name VARCHAR(20)
);

CREATE TABLE table2 (
id INT,
name VARCHAR(20)
);

The execution plan for the query using the inner join:

-- with inner join

EXPLAIN PLAN FOR
SELECT * FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.id;

SELECT *
FROM TABLE (DBMS_XPLAN.DISPLAY);

-- 0 select statement
-- 1 hash join (access("T1"."ID"="T2"."ID"))
-- 2 table access full table1
-- 3 table access full table2

And the execution plan for the query using a WHERE clause.

-- with where clause

EXPLAIN PLAN FOR
SELECT * FROM table1 t1, table2 t2
WHERE t1.id = t2.id;

SELECT *
FROM TABLE (DBMS_XPLAN.DISPLAY);

-- 0 select statement
-- 1 hash join (access("T1"."ID"="T2"."ID"))
-- 2 table access full table1
-- 3 table access full table2

Joins with WHERE - splitting WHERE clauses

Think of JOINs as a cross-product of two tables, which is filtered using the conditions specified in the ON clause. Your WHERE clause is then applied on the result set, and not on the individual tables participating in the join.

Sample Image

If you want to apply WHERE on only one of the joined tables, you'll have to use a sub-query. The filtered result of that sub-query will then be treated as a normal table and joined with a real table using JOIN again.

If you are doing this for performance, remember though that a join is almost always faster on standard JOINs compared to sub-queries, for properly indexed tables. You'll find that queries using JOIN will be orders of magnitude faster than the ones using sub-queries, except for rare cases.

Is there something wrong with joins that don't use the JOIN keyword in SQL or MySQL?

Filtering joins solely using WHERE can be extremely inefficient in some common scenarios. For example:

SELECT * FROM people p, companies c 
WHERE p.companyID = c.id AND p.firstName = 'Daniel'

Most databases will execute this query quite literally, first taking the Cartesian product of the people and companies tables and then filtering by those which have matching companyID and id fields. While the fully-unconstrained product does not exist anywhere but in memory and then only for a moment, its calculation does take some time.

A better approach is to group the constraints with the JOINs where relevant. This is not only subjectively easier to read but also far more efficient. Thusly:

SELECT * FROM people p JOIN companies c ON p.companyID = c.id
WHERE p.firstName = 'Daniel'

It's a little longer, but the database is able to look at the ON clause and use it to compute the fully-constrained JOIN directly, rather than starting with everything and then limiting down. This is faster to compute (especially with large data sets and/or many-table joins) and requires less memory.

I change every query I see which uses the "comma JOIN" syntax. In my opinion, the only purpose for its existence is conciseness. Considering the performance impact, I don't think this is a compelling reason.



Related Topics



Leave a reply



Submit