Is There Something Wrong With Joins That Don't Use the Join Keyword in SQL or MySQL

Is there something wrong with joins that don't use the JOIN keyword in SQL or MySQL?

Filtering joins solely using WHERE can be extremely inefficient in some common scenarios. For example:

SELECT * FROM people p, companies c 
WHERE p.companyID = c.id AND p.firstName = 'Daniel'

Most databases will execute this query quite literally, first taking the Cartesian product of the people and companies tables and then filtering by those which have matching companyID and id fields. While the fully-unconstrained product does not exist anywhere but in memory and then only for a moment, its calculation does take some time.

A better approach is to group the constraints with the JOINs where relevant. This is not only subjectively easier to read but also far more efficient. Thusly:

SELECT * FROM people p JOIN companies c ON p.companyID = c.id
WHERE p.firstName = 'Daniel'

It's a little longer, but the database is able to look at the ON clause and use it to compute the fully-constrained JOIN directly, rather than starting with everything and then limiting down. This is faster to compute (especially with large data sets and/or many-table joins) and requires less memory.

I change every query I see which uses the "comma JOIN" syntax. In my opinion, the only purpose for its existence is conciseness. Considering the performance impact, I don't think this is a compelling reason.

In MySQL queries, why use join instead of where?

Any query involving more than one table requires some form of association to link the results from table "A" to table "B". The traditional (ANSI-89) means of doing this is to:

  1. List the tables involved in a comma separated list in the FROM clause
  2. Write the association between the tables in the WHERE clause

    SELECT *
    FROM TABLE_A a,
    TABLE_B b
    WHERE a.id = b.id

Here's the query re-written using ANSI-92 JOIN syntax:

SELECT *
FROM TABLE_A a
JOIN TABLE_B b ON b.id = a.id

From a Performance Perspective:


Where supported (Oracle 9i+, PostgreSQL 7.2+, MySQL 3.23+, SQL Server 2000+), there is no performance benefit to using either syntax over the other. The optimizer sees them as the same query. But more complex queries can benefit from using ANSI-92 syntax:

  • Ability to control JOIN order - the order which tables are scanned
  • Ability to apply filter criteria on a table prior to joining

From a Maintenance Perspective:


There are numerous reasons to use ANSI-92 JOIN syntax over ANSI-89:

  • More readable, as the JOIN criteria is separate from the WHERE clause
  • Less likely to miss JOIN criteria
  • Consistent syntax support for JOIN types other than INNER, making queries easy to use on other databases
  • WHERE clause only serves as filtration of the cartesian product of the tables joined

From a Design Perspective:


ANSI-92 JOIN syntax is pattern, not anti-pattern:

  • The purpose of the query is more obvious; the columns used by the application is clear
  • It follows the modularity rule about using strict typing whenever possible. Explicit is almost universally better.

Conclusion


Short of familiarity and/or comfort, I don't see any benefit to continuing to use the ANSI-89 WHERE clause instead of the ANSI-92 JOIN syntax. Some might complain that ANSI-92 syntax is more verbose, but that's what makes it explicit. The more explicit, the easier it is to understand and maintain.

How to implement SQL joins without using JOIN?

The basic INNER JOIN is easy to implement.
The following:

SELECT L.XCol, R.YCol
FROM LeftTable AS L
INNER JOIN RightTable AS R
ON L.IDCol=R.IDCol;

is equivalent to:

SELECT L.XCol, R.YCol
FROM LeftTable AS L, RightTable AS R
WHERE L.IDCol=R.IDCol;

In order to extend this to a LEFT/RIGHT/FULL OUTER JOIN, you only need to UNION the rows with no match, along with NULL in the correct columns, to the previous INNER JOIN.

For a LEFT OUTER JOIN, add:

UNION ALL
SELECT L.XCol, NULL /* cast the NULL as needed */
FROM LeftTable AS L
WHERE NOT EXISTS (
SELECT * FROM RightTable AS R
WHERE L.IDCol=R.IDCol)

For a RIGHT OUTER JOIN, add:

UNION ALL
SELECT NULL, R.YCol /* cast the NULL as needed */
FROM RightTable AS R
WHERE NOT EXISTS (
SELECT * FROM LeftTable AS L
WHERE L.IDCol=R.IDCol)

For a FULL OUTER JOIN, add both of the above.

What's the difference between comma separated joins and join on syntax in MySQL?

There is no difference at all.

First representation makes query more readable and makes it look very clear as to which join corresponds to which condition.

SQL Join Issue, Doing Something Really Wrong With Self Joins?

Your problem is that when you say , Products.Subscriptions AS SubscriptionsB you're doing a cross join, so it's going to output NxN rows where N is the total number of subscriptions. I doubt you need that join at all, since you don't appear to be using any values from that instance i.e this should work just fine:

SELECT ZoneA.ZoneDescription
,TrackerA.Zone
,MasterListA.Name
,SubscriptionsA.ActivationDate
,SubscriptionsA.TerminationDate
,ParentA.ProductParentType
,MONTH(SubscriptionsA.ActivationDate) AS ActivationMonth
,MONTH(SubscriptionsA.TerminationDate) AS TerminationMonth
,QuartersA.CompanyFiscalQuarter AS ActivationQuarter
,QuartersB.CompanyFiscalQuarter AS TerminationQuarter
FROM BalanceTracker.Zone AS ZoneA INNER JOIN
BalanceTracker.TrackerAccounts AS TrackerA ON ZoneA.ZoneID = TrackerA.Zone INNER JOIN
BalanceTracker.SweepAccounts AS SweepA ON TrackerA.TrackedAccount = SweepA.SweepAccount INNER JOIN
Fed.MasterAccountList AS MasterListA ON SweepA.MasterAccountListID = MasterListA.MasterAccountListID INNER JOIN
Products.Subscriptions AS SubscriptionsA ON MasterListA.MasterAccountListID = SubscriptionsA.SnlId INNER JOIN
Products.ProductParent AS ParentA ON SubscriptionsA.ProductCode = ParentA.ProductCode INNER JOIN
Calendar.Quarters AS QuartersA ON SubscriptionsA.ActivationMonth = QuartersA.MonthNumber LEFT JOIN
Calendar.Quarters AS QuartersB ON SubscriptionsA.TerminationMonth = QuartersB.CompanyFiscalQuarter
ORDER BY TrackerA.Zone, SubscriptionsA.ActivationDate

What are JOINs in SQL (for)?

You've probably been using it without knowing it, if you've ever queried more than one table at a time.

If you do a query like

SELECT comment_text
FROM users, comments
WHERE users.user_id = 'sabwufer'
AND comments.user_id = users.user_id <-- this line is your JOIN condition

Then you are, in fact, doing a join (an INNER JOIN) even though you aren't using the JOIN keyword.

(Aside: the query above is the equivalent of

SELECT comment_text
FROM users
JOIN comments ON (comments.user_id = users.user_id)
WHERE users.user_id = 'sabwufer'

)

If all you do are INNER JOINS, then you don't need to use the JOIN keyword, though it can make what you're doing more clear.

However, there are other useful types of joins (see the tutorial XpiritO linked, for example), and they are worth understanding.

SQL JOIN and different types of JOINs

What is SQL JOIN ?

SQL JOIN is a method to retrieve data from two or more database tables.

What are the different SQL JOINs ?

There are a total of five JOINs. They are :

  1. JOIN or INNER JOIN
2. OUTER JOIN

2.1 LEFT OUTER JOIN or LEFT JOIN
2.2 RIGHT OUTER JOIN or RIGHT JOIN
2.3 FULL OUTER JOIN or FULL JOIN

3. NATURAL JOIN
4. CROSS JOIN
5. SELF JOIN

1. JOIN or INNER JOIN :

In this kind of a JOIN, we get all records that match the condition in both tables, and records in both tables that do not match are not reported.

In other words, INNER JOIN is based on the single fact that: ONLY the matching entries in BOTH the tables SHOULD be listed.

Note that a JOIN without any other JOIN keywords (like INNER, OUTER, LEFT, etc) is an INNER JOIN. In other words, JOIN is
a Syntactic sugar for INNER JOIN (see: Difference between JOIN and INNER JOIN).

2. OUTER JOIN :

OUTER JOIN retrieves

Either,
the matched rows from one table and all rows in the other table
Or,
all rows in all tables (it doesn't matter whether or not there is a match).

There are three kinds of Outer Join :

2.1 LEFT OUTER JOIN or LEFT JOIN

This join returns all the rows from the left table in conjunction with the matching rows from the
right table. If there are no columns matching in the right table, it returns NULL values.

2.2 RIGHT OUTER JOIN or RIGHT JOIN

This JOIN returns all the rows from the right table in conjunction with the matching rows from the
left table. If there are no columns matching in the left table, it returns NULL values.

2.3 FULL OUTER JOIN or FULL JOIN

This JOIN combines LEFT OUTER JOIN and RIGHT OUTER JOIN. It returns rows from either table when the conditions are met and returns NULL value when there is no match.

In other words, OUTER JOIN is based on the fact that: ONLY the matching entries in ONE OF the tables (RIGHT or LEFT) or BOTH of the tables(FULL) SHOULD be listed.

Note that `OUTER JOIN` is a loosened form of `INNER JOIN`.

3. NATURAL JOIN :

It is based on the two conditions :

  1. the JOIN is made on all the columns with the same name for equality.
  2. Removes duplicate columns from the result.

This seems to be more of theoretical in nature and as a result (probably) most DBMS
don't even bother supporting this.

4. CROSS JOIN :

It is the Cartesian product of the two tables involved. The result of a CROSS JOIN will not make sense
in most of the situations. Moreover, we won't need this at all (or needs the least, to be precise).

5. SELF JOIN :

It is not a different form of JOIN, rather it is a JOIN (INNER, OUTER, etc) of a table to itself.

JOINs based on Operators

Depending on the operator used for a JOIN clause, there can be two types of JOINs. They are

  1. Equi JOIN
  2. Theta JOIN

1. Equi JOIN :

For whatever JOIN type (INNER, OUTER, etc), if we use ONLY the equality operator (=), then we say that
the JOIN is an EQUI JOIN.

2. Theta JOIN :

This is same as EQUI JOIN but it allows all other operators like >, <, >= etc.

Many consider both EQUI JOIN and Theta JOIN similar to INNER, OUTER
etc JOINs. But I strongly believe that its a mistake and makes the
ideas vague. Because INNER JOIN, OUTER JOIN etc are all connected with
the tables and their data whereas EQUI JOIN and THETA JOIN are only
connected with the operators we use in the former.

Again, there are many who consider NATURAL JOIN as some sort of
"peculiar" EQUI JOIN. In fact, it is true, because of the first
condition I mentioned for NATURAL JOIN. However, we don't have to
restrict that simply to NATURAL JOINs alone. INNER JOINs, OUTER JOINs
etc could be an EQUI JOIN too.

Does the join order matter in SQL?

For INNER joins, no, the order doesn't matter. The queries will return same results, as long as you change your selects from SELECT * to SELECT a.*, b.*, c.*.


For (LEFT, RIGHT or FULL) OUTER joins, yes, the order matters - and (updated) things are much more complicated.

First, outer joins are not commutative, so a LEFT JOIN b is not the same as b LEFT JOIN a

Outer joins are not associative either, so in your examples which involve both (commutativity and associativity) properties:

a LEFT JOIN b 
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id

is equivalent to:

a LEFT JOIN c 
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id

but:

a LEFT JOIN b 
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id
AND c.bc_id = b.bc_id

is not equivalent to:

a LEFT JOIN c 
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id
AND b.bc_id = c.bc_id

Another (hopefully simpler) associativity example. Think of this as (a LEFT JOIN b) LEFT JOIN c:

a LEFT JOIN b 
ON b.ab_id = a.ab_id -- AB condition
LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition

This is equivalent to a LEFT JOIN (b LEFT JOIN c):

a LEFT JOIN  
b LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition
ON b.ab_id = a.ab_id -- AB condition

only because we have "nice" ON conditions. Both ON b.ab_id = a.ab_id and c.bc_id = b.bc_id are equality checks and do not involve NULL comparisons.

You can even have conditions with other operators or more complex ones like: ON a.x <= b.x or ON a.x = 7 or ON a.x LIKE b.x or ON (a.x, a.y) = (b.x, b.y) and the two queries would still be equivalent.

If however, any of these involved IS NULL or a function that is related to nulls like COALESCE(), for example if the condition was b.ab_id IS NULL, then the two queries would not be equivalent.



Related Topics



Leave a reply



Submit