Inner Join VS Multiple Table Names in "From"

INNER JOIN vs multiple table names in FROM

There is no reason to ever use an implicit join (the one with the commas). Yes, for inner joins it will return the same results. However, it is prone to inadvertent cross joins, especially in complex queries, and it is harder to maintain because the old left/right outer join syntax (deprecated in SQL Server, where it doesn't work correctly anyway) differs from vendor to vendor. Since you shouldn't mix implicit and explicit joins in the same query (you can get wrong results), needing to change one join to a left join means rewriting the entire query.
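Here is a minimal sketch of the "inadvertent cross join" risk. It uses Python's built-in sqlite3 module as a stand-in engine, which is an assumption since the answer doesn't name a DBMS. The Company/Department/Employee tables and their rows are hypothetical. Dropping one condition from a comma-style WHERE clause raises no error. The row count simply multiplies.

```python
import sqlite3

# Hypothetical sample data: 2 companies, 3 departments, 4 employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company    (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Department (ID INTEGER PRIMARY KEY, CompanyID INTEGER, Name TEXT);
CREATE TABLE Employee   (ID INTEGER PRIMARY KEY, DepartmentID INTEGER);
INSERT INTO Company    VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO Department VALUES (1, 1, 'Sales'), (2, 1, 'Research'), (3, 2, 'Support');
INSERT INTO Employee   VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

# Implicit (comma) joins with every link condition present.
complete = conn.execute("""
    SELECT COUNT(*) FROM Company, Department, Employee
    WHERE Company.ID = Department.CompanyID
      AND Department.ID = Employee.DepartmentID
""").fetchone()[0]

# Same query with the Department-Employee condition forgotten:
# every matching (company, department) pair is paired with every employee.
forgotten = conn.execute("""
    SELECT COUNT(*) FROM Company, Department, Employee
    WHERE Company.ID = Department.CompanyID
""").fetchone()[0]

# Explicit joins: each link condition sits right next to the table it links.
explicit = conn.execute("""
    SELECT COUNT(*) FROM Company
    JOIN Department ON Company.ID = Department.CompanyID
    JOIN Employee   ON Department.ID = Employee.DepartmentID
""").fetchone()[0]

print(complete, forgotten, explicit)  # 4 12 4
```

Explicit syntax makes a missing condition easier to spot, but it does not guarantee an error. SQLite and MySQL, for example, still accept a bare JOIN without ON and treat it as a cross join.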

SQL left join vs multiple tables on FROM line?

The old syntax, which just lists the tables and uses the WHERE clause to specify the join criteria, is being deprecated in most modern databases, at least in its outer-join forms.

It's not just for show: the old syntax can be ambiguous when you use both INNER and OUTER joins in the same query.

Let me give you an example.

Let's suppose you have 3 tables in your system:

Company
Department
Employee

Each table contains numerous rows, linked together. You have multiple companies, each company can have multiple departments, and each department can have multiple employees.

Ok, so now you want to do the following:

List all the companies, and include all their departments, and all their employees. Note that some companies don't have any departments yet, but make sure you include them as well. Make sure you only retrieve departments that have employees, but always list all companies.

So you do this:

SELECT * -- for simplicity
FROM Company, Department, Employee
WHERE Company.ID *= Department.CompanyID
AND Department.ID = Employee.DepartmentID

Note that the last condition there is an inner join, in order to fulfill the criterion that you only want departments with people.

Ok, so what happens now? Well, the problem is that it depends on the database engine, the query optimizer, indexes, and table statistics. Let me explain.

If the query optimizer determines that the way to do this is to first take a company, then find the departments, and then do an inner join with employees, you're not going to get any companies that don't have departments.

The reason for this is that the WHERE clause determines which rows end up in the final result, not individual parts of the rows.

And in this case, due to the left join, the Department.ID column will be NULL for a company without departments. So when it comes to the INNER JOIN to Employee, there's no way to fulfill that constraint, and the company's row won't appear.

On the other hand, if the query optimizer decides to tackle the department-employee join first, and then do a left join with the companies, you will see them.

So the old syntax is ambiguous. There's no way to specify what you want, without dealing with query hints, and some databases have no way at all.

Enter the new syntax, with this you can choose.

For instance, if you want all companies, as the problem description stated, this is what you would write:

SELECT *
FROM Company
LEFT JOIN (
Department INNER JOIN Employee ON Department.ID = Employee.DepartmentID
) ON Company.ID = Department.CompanyID

Here you specify that you want the department-employee join to be done as one join, and then left join the results of that with the companies.
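The effect of the grouping can be checked with a small sketch, again using Python's sqlite3 as an assumed stand-in engine and hypothetical data. To stay portable, the sketch expresses the grouping as a derived table (a subquery) rather than the parenthesized join shown above. That substitution is our choice, not the answer's. The chained form applies the INNER JOIN after the LEFT JOIN and loses Initech, which has no departments. The grouped form keeps it.

```python
import sqlite3

# Hypothetical data: Initech has no departments, and Globex's Legal
# department has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company    (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Department (ID INTEGER PRIMARY KEY, CompanyID INTEGER, Name TEXT);
CREATE TABLE Employee   (ID INTEGER PRIMARY KEY, DepartmentID INTEGER);
INSERT INTO Company    VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO Department VALUES (1, 1, 'Sales'), (2, 1, 'Research'),
                              (3, 2, 'Support'), (4, 2, 'Legal');
INSERT INTO Employee   VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

# Chained: the INNER JOIN runs on the LEFT JOIN's result, so a company
# whose Department.ID is NULL can never match an employee and drops out.
chained = conn.execute("""
    SELECT DISTINCT Company.Name
    FROM Company
    LEFT JOIN Department ON Company.ID = Department.CompanyID
    JOIN Employee ON Department.ID = Employee.DepartmentID
    ORDER BY Company.Name
""").fetchall()

# Grouped: join departments to employees first, then LEFT JOIN that
# result to the companies, so every company survives.
grouped = conn.execute("""
    SELECT DISTINCT Company.Name, de.DeptName
    FROM Company
    LEFT JOIN (
        SELECT Department.CompanyID, Department.Name AS DeptName
        FROM Department
        JOIN Employee ON Department.ID = Employee.DepartmentID
    ) AS de ON Company.ID = de.CompanyID
    ORDER BY Company.Name, de.DeptName
""").fetchall()

print(chained)  # [('Acme',), ('Globex',)]
print(grouped)  # [('Acme', 'Research'), ('Acme', 'Sales'), ('Globex', 'Support'), ('Initech', None)]
```

In both forms, Legal (no employees) never shows up. Only the grouped form lists Initech, with NULL in the department column.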

Additionally, let's say you only want departments that contain the letter X in their name. Again, with old-style joins you risk losing the company as well if it doesn't have any departments with an X in their name, but with the new syntax you can do this:

SELECT *
FROM Company
LEFT JOIN (
Department INNER JOIN Employee ON Department.ID = Employee.DepartmentID
) ON Company.ID = Department.CompanyID AND Department.Name LIKE '%X%'

This extra clause is used for the joining, but is not a filter for the entire row. So the row might appear with company information, but might have NULLs in all the department and employee columns for that row, because there is no department with an X in its name for that company. This is hard with the old syntax.
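The difference between filtering in ON and filtering in WHERE can be sketched like this. It uses Python's sqlite3 as an assumed stand-in engine. The tables are simplified to Company and Department, and the rows are hypothetical. Note that SQLite's LIKE is case-insensitive for ASCII by default, so '%X%' also matches a lowercase x.

```python
import sqlite3

# Hypothetical data: only Acme has a department with an X in its name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company    (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Department (ID INTEGER PRIMARY KEY, CompanyID INTEGER, Name TEXT);
INSERT INTO Company    VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO Department VALUES (1, 1, 'Sales'), (2, 1, 'Xport'), (3, 2, 'Research');
""")

# Filter in the ON clause: it only decides which departments get joined,
# so Globex still appears, with NULL department columns.
in_on = conn.execute("""
    SELECT Company.Name, Department.Name
    FROM Company
    LEFT JOIN Department
      ON Company.ID = Department.CompanyID AND Department.Name LIKE '%X%'
    ORDER BY Company.Name
""").fetchall()

# Same filter in WHERE: it is applied after the join and removes whole rows.
in_where = conn.execute("""
    SELECT Company.Name, Department.Name
    FROM Company
    LEFT JOIN Department ON Company.ID = Department.CompanyID
    WHERE Department.Name LIKE '%X%'
    ORDER BY Company.Name
""").fetchall()

print(in_on)     # [('Acme', 'Xport'), ('Globex', None)]
print(in_where)  # [('Acme', 'Xport')]
```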

This is why Microsoft, among other vendors, has deprecated the old outer join syntax (but not the old inner join syntax) since SQL Server 2005. The only way to use the old-style outer join syntax against a database running on Microsoft SQL Server 2005 or 2008 is to set that database to compatibility level 80 (aka SQL Server 2000).

Additionally, the old way of throwing a bunch of tables at the query optimizer, with a bunch of WHERE clauses, was akin to saying "here you are, do the best you can". With the new syntax, the query optimizer has less work to do to figure out which parts go together.

So there you have it.

LEFT and INNER JOIN are the wave of the future.

JOIN vs Multiple FROM Tables

An INNER JOIN (as in your first example) will always return the same data as a cartesian join with a WHERE filter that uses the same join criteria (your second example).

However, note that this is not true for OUTER JOINs. With a cartesian join plus a WHERE filter as the join criteria, the rows that an outer join would pad with NULLs are filtered out.
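Both claims can be checked with a short sketch using Python's sqlite3 as an assumed stand-in engine. The tables `a` and `b` and their rows are hypothetical, since the original question's queries aren't shown here. The inner join and the comma-plus-WHERE form return identical rows. The LEFT JOIN additionally returns the unmatched row of `a`, padded with NULL.

```python
import sqlite3

# Hypothetical tables: a.id = 3 has no matching row in b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY);
CREATE TABLE b (a_id INTEGER, val TEXT);
INSERT INTO a VALUES (1), (2), (3);
INSERT INTO b VALUES (1, 'x'), (2, 'y');
""")

inner = conn.execute(
    "SELECT a.id, b.val FROM a INNER JOIN b ON a.id = b.a_id ORDER BY a.id").fetchall()
comma = conn.execute(
    "SELECT a.id, b.val FROM a, b WHERE a.id = b.a_id ORDER BY a.id").fetchall()
outer = conn.execute(
    "SELECT a.id, b.val FROM a LEFT JOIN b ON a.id = b.a_id ORDER BY a.id").fetchall()

print(inner == comma)  # True
print(outer)           # [(1, 'x'), (2, 'y'), (3, None)]
```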

What is the difference between CROSS JOIN and multiple tables in one FROM?

The first with the comma is an old style from the previous century.

The second with the CROSS JOIN is in newer ANSI JOIN syntax.

And those 2 queries will indeed give the same results.

They both link every record of table "a" against every record of table "b".

So if table "a" has 10 rows and table "b" has 100 rows, the result would be 10 * 100 = 1000 records.
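The 10 * 100 arithmetic can be verified with a minimal sketch using Python's sqlite3 as an assumed stand-in engine. The single-column tables `a` and `b` are hypothetical. The comma form and CROSS JOIN return the same 1000 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (n INTEGER)")
conn.execute("CREATE TABLE b (n INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(10)])   # 10 rows
conn.executemany("INSERT INTO b VALUES (?)", [(i,) for i in range(100)])  # 100 rows

comma = conn.execute("SELECT COUNT(*) FROM a, b").fetchone()[0]
cross = conn.execute("SELECT COUNT(*) FROM a CROSS JOIN b").fetchone()[0]

# Compare the actual row sets too, not just the counts.
same_rows = (sorted(conn.execute("SELECT a.n, b.n FROM a, b"))
             == sorted(conn.execute("SELECT a.n, b.n FROM a CROSS JOIN b")))

print(comma, cross, same_rows)  # 1000 1000 True
```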

But why does that first, outdated style still exist in some DBMS?

Mostly for backward compatibility, so that older SQL code doesn't suddenly break.

Most SQL specialists these days would frown upon someone who still uses the outdated comma syntax (although it's often forgiven for an intentional cartesian product).

A CROSS JOIN is a cartesian product JOIN that lacks the ON clause defining the relationship between the 2 tables.

In the ANSI JOIN syntax there are also the OUTER joins: LEFT JOIN, RIGHT JOIN, FULL JOIN.

And the normal JOIN, aka the INNER JOIN.


But those normally require the ON clause, while a CROSS JOIN doesn't.

An example of a query using different JOIN types:

SELECT *
FROM jars
JOIN apples ON apples.jar_id = jars.id
LEFT JOIN peaches ON peaches.jar_id = jars.id
CROSS JOIN bananas AS bnns
RIGHT JOIN crates ON crates.id = jars.crate_id
FULL JOIN nuts ON nuts.jar_id = jars.id
WHERE jars.name = 'FruityMix'

The nice thing about the JOIN syntax is that the link criteria and the search criteria are separated.

In the old comma style that difference is harder to notice, so it's easier to forget a link criterion.

SELECT *
FROM crates, jars, apples, peaches, bananas, nuts
WHERE apples.jar_id = jars.id
AND jars.name = 'NuttyFruitBomb'
AND peaches.jar_id = jars.id(+)
AND crates.id(+) = jars.crate_id;

Did you notice that the first query has 1 cartesian product join (the CROSS JOIN to bananas), but the second has 2 (bananas and nuts are never linked)? That's why the 2nd is rather nutty.

Multiple Table Select vs. JOIN (performance)

They are the same, just with a different syntax, so you shouldn't expect any performance difference between the two. However, the latter syntax (ANSI SQL-92) is the recommended one; see these for more details:

  • Bad habits to kick : using old-style JOINs.
  • SQL JOIN: is there a difference between USING, ON or WHERE?
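One way to check the "no performance difference" claim on a given engine is to compare execution plans. Here is a sketch using Python's sqlite3 and SQLite's `EXPLAIN QUERY PLAN`. The schema is hypothetical, and other engines have their own EXPLAIN facilities with different output. The check keeps only the plan's human-readable detail column (the fourth column of each plan row) and compares the comma form against the JOIN form.

```python
import sqlite3

# Hypothetical schema with an index on the join column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company    (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Department (ID INTEGER PRIMARY KEY, CompanyID INTEGER, Name TEXT);
CREATE INDEX idx_department_company ON Department (CompanyID);
""")

def plan(sql):
    """Return the 'detail' text of each row of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

comma_plan = plan("""
    SELECT Company.Name, Department.Name FROM Company, Department
    WHERE Company.ID = Department.CompanyID""")
join_plan = plan("""
    SELECT Company.Name, Department.Name FROM Company
    JOIN Department ON Company.ID = Department.CompanyID""")

print(comma_plan == join_plan)  # True
print(comma_plan)               # exact wording depends on the SQLite version
```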

1 table query vs join multiple tables query performance

The first method will usually be faster for reads, and the second one will help you maintain data integrity and usually will be faster for writes.

The transition from the latter form to the former is called denormalization and is usually used in data warehouses, while operational ("live") databases usually prefer the latter form (the second method).
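The read/write trade-off can be sketched with Python's sqlite3 as an assumed stand-in engine. Both the normalized tables and the flat `EmployeeFlat` table are hypothetical. The flat table answers the same report without a join. Renaming the company, however, touches one row in the normalized design and one row per employee in the flat one. Every copy is a chance for the data to drift out of sync.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: the company name is stored once.
CREATE TABLE Company  (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Employee (ID INTEGER PRIMARY KEY, CompanyID INTEGER);
-- Denormalized (hypothetical): the company name is copied onto every employee row.
CREATE TABLE EmployeeFlat (ID INTEGER PRIMARY KEY, CompanyName TEXT);
INSERT INTO Company  VALUES (1, 'Acme');
INSERT INTO Employee VALUES (1, 1), (2, 1), (3, 1);
INSERT INTO EmployeeFlat VALUES (1, 'Acme'), (2, 'Acme'), (3, 'Acme');
""")

# Reads: the flat table produces the same report without a join.
joined = conn.execute("""
    SELECT Employee.ID, Company.Name FROM Employee
    JOIN Company ON Company.ID = Employee.CompanyID ORDER BY Employee.ID
""").fetchall()
flat = conn.execute(
    "SELECT ID, CompanyName FROM EmployeeFlat ORDER BY ID").fetchall()

# Writes: renaming the company touches one row vs. one row per employee.
renamed_normalized = conn.execute(
    "UPDATE Company SET Name = 'Acme Corp' WHERE ID = 1").rowcount
renamed_flat = conn.execute(
    "UPDATE EmployeeFlat SET CompanyName = 'Acme Corp' WHERE CompanyName = 'Acme'").rowcount

print(joined == flat)                    # True
print(renamed_normalized, renamed_flat)  # 1 3
```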

JOIN queries vs multiple queries

This is way too vague to give you an answer relevant to your specific case. It depends on a lot of things. Jeff Atwood (co-founder of Stack Overflow) actually wrote about this. For the most part, though, if you have the right indexes and you write your JOINs properly, it is usually going to be faster to do 1 trip than several.
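Here is a sketch of the "1 trip vs. several" comparison using Python's sqlite3 and hypothetical tables. An in-memory SQLite database has no network latency, so this shows only how many statements each approach issues, not how long they take. The per-company loop is often called the "N+1 queries" pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company    (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Department (ID INTEGER PRIMARY KEY, CompanyID INTEGER, Name TEXT);
CREATE INDEX idx_department_company ON Department (CompanyID);
INSERT INTO Company    VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO Department VALUES (1, 1, 'Sales'), (2, 1, 'Research'), (3, 2, 'Support');
""")

def many_queries():
    """One query for the companies, then one more query per company."""
    trips = 1
    rows = []
    for cid, cname in conn.execute("SELECT ID, Name FROM Company ORDER BY ID").fetchall():
        trips += 1
        for (dname,) in conn.execute(
                "SELECT Name FROM Department WHERE CompanyID = ? ORDER BY Name", (cid,)):
            rows.append((cname, dname))
    return rows, trips

def one_join():
    """A single statement that lets the database do the matching."""
    rows = conn.execute("""
        SELECT Company.Name, Department.Name
        FROM Company JOIN Department ON Company.ID = Department.CompanyID
        ORDER BY Company.ID, Department.Name
    """).fetchall()
    return rows, 1

rows_many, trips_many = many_queries()
rows_one, trips_one = one_join()
print(rows_many == rows_one)  # True
print(trips_many, trips_one)  # 3 1
```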

MySQL --- Explicit INNER JOIN with selection from multiple tables

You can simply replace the comma in your implicit join with the word JOIN:

SELECT Name, Language, Percentage
FROM Country
JOIN CountryLanguage
WHERE Code = CountryCode

and the query will work fine. You can also replace WHERE with ON, and again it will work fine. Finally, if you want to explicitly name the tables the columns come from (the preferred approach), you would use:

SELECT c.Name, cl.Language, cl.Percentage
FROM Country c
JOIN CountryLanguage cl
ON c.Code = cl.CountryCode
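The three rewrites described above can be checked side by side. This sketch uses Python's sqlite3 as a stand-in for MySQL, which is an assumption; SQLite, like MySQL, accepts JOIN without ON. It uses hypothetical rows in the Country and CountryLanguage tables, and the percentages are made up for illustration.

```python
import sqlite3

# Hypothetical rows; the percentages are illustrative, not real statistics.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (Code TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE CountryLanguage (CountryCode TEXT, Language TEXT, Percentage REAL);
INSERT INTO Country VALUES ('NLD', 'Netherlands'), ('BEL', 'Belgium');
INSERT INTO CountryLanguage VALUES
    ('NLD', 'Dutch', 90.0), ('BEL', 'Dutch', 60.0), ('BEL', 'French', 40.0);
""")

queries = [
    # 1. Implicit join with a comma.
    """SELECT Name, Language, Percentage FROM Country, CountryLanguage
       WHERE Code = CountryCode ORDER BY Name, Language""",
    # 2. Comma replaced by JOIN, condition still in WHERE.
    """SELECT Name, Language, Percentage FROM Country JOIN CountryLanguage
       WHERE Code = CountryCode ORDER BY Name, Language""",
    # 3. Preferred: aliases, qualified columns, condition in ON.
    """SELECT c.Name, cl.Language, cl.Percentage FROM Country c
       JOIN CountryLanguage cl ON c.Code = cl.CountryCode
       ORDER BY c.Name, cl.Language""",
]
results = [conn.execute(q).fetchall() for q in queries]

print(all(r == results[0] for r in results))  # True
print(results[0][0])                          # ('Belgium', 'Dutch', 60.0)
```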

MySql JOIN syntax with multiple tables

The parameters to the FROM keyword are what the MySQL documentation calls table_references, and their syntax is described in the JOIN clause section of the MySQL manual. There are lots of recursive references in the grammar, and I think this is what allows that syntax. I've copied what I think are the relevant excerpts from the BNF.

table_references:
    escaped_table_reference [, escaped_table_reference] ...

escaped_table_reference: {
    table_reference
  | { OJ table_reference }
}

table_reference: {
    table_factor
  | joined_table
}

joined_table: {
    table_reference {[INNER | CROSS] JOIN | STRAIGHT_JOIN} table_factor [join_specification]
  | table_reference {LEFT|RIGHT} [OUTER] JOIN table_reference join_specification
  | table_reference NATURAL [INNER | {LEFT|RIGHT} [OUTER]] JOIN table_factor
}

The nested

    Person_Fear
    INNER JOIN Fears
    ON Person_Fear.FearID = Fears.FearID

is a joined_table, which can be used as the table_reference in the first LEFT JOIN.
