In MySQL queries, why use join instead of where?
Any query involving more than one table requires some form of association to link the results from table "A" to table "B". The traditional (ANSI-89) means of doing this is to:
- List the tables involved in a comma separated list in the FROM clause
- Write the association between the tables in the WHERE clause
SELECT *
FROM TABLE_A a,
TABLE_B b
WHERE a.id = b.id
Here's the query re-written using ANSI-92 JOIN syntax:
SELECT *
FROM TABLE_A a
JOIN TABLE_B b ON b.id = a.id
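To see that the two forms return the same rows, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for MySQL here, and the columns and sample rows are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; columns and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, note TEXT);
    INSERT INTO table_a VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO table_b VALUES (2, 'b2'), (3, 'b3'), (4, 'b4');
""")

# ANSI-89: comma-separated tables, association written in WHERE
implicit_rows = conn.execute(
    "SELECT a.id, a.label, b.note "
    "FROM table_a a, table_b b "
    "WHERE a.id = b.id ORDER BY a.id"
).fetchall()

# ANSI-92: explicit JOIN ... ON
explicit_rows = conn.execute(
    "SELECT a.id, a.label, b.note "
    "FROM table_a a "
    "JOIN table_b b ON b.id = a.id ORDER BY a.id"
).fetchall()

print(implicit_rows)  # [(2, 'a2', 'b2'), (3, 'a3', 'b3')]
print(explicit_rows)  # [(2, 'a2', 'b2'), (3, 'a3', 'b3')]
```

This only checks the result sets; it says nothing about how MySQL's optimizer plans either query.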
From a Performance Perspective:
Where supported (Oracle 9i+, PostgreSQL 7.2+, MySQL 3.23+, SQL Server 2000+), there is no performance benefit to using either syntax over the other. The optimizer sees them as the same query. But more complex queries can benefit from using ANSI-92 syntax:
- Ability to control JOIN order - the order in which tables are scanned
- Ability to apply filter criteria on a table prior to joining
From a Maintenance Perspective:
There are numerous reasons to use ANSI-92 JOIN syntax over ANSI-89:
- More readable, as the JOIN criteria are separate from the WHERE clause
- Less likely to miss JOIN criteria
- Consistent syntax support for JOIN types other than INNER, making queries easy to use on other databases
- WHERE clause only serves as filtration of the cartesian product of the tables joined
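The "less likely to miss JOIN criteria" point can be sketched like this. The three tables (authors, books, reviews) are hypothetical, and SQLite via Python's sqlite3 stands in for MySQL. In the comma version one association is forgotten in the WHERE clause and the row count quietly inflates; in the JOIN version each table carries its own ON right next to it, so a gap is easier to spot.

```python
import sqlite3

# Assumption: hypothetical schema; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books   (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE TABLE reviews (id INTEGER PRIMARY KEY, book_id INTEGER, stars INTEGER);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bo');
    INSERT INTO books   VALUES (10, 1, 'X'), (11, 2, 'Y');
    INSERT INTO reviews VALUES (100, 10, 5), (101, 11, 3), (102, 11, 4);
""")

# ANSI-89 with the reviews-to-books condition accidentally left out:
# every author/book pair is combined with every review.
buggy_rows = conn.execute(
    "SELECT a.name, b.title, r.stars "
    "FROM authors a, books b, reviews r "
    "WHERE b.author_id = a.id"
).fetchall()

# ANSI-92: each joined table has its own ON clause.
joined_rows = conn.execute(
    "SELECT a.name, b.title, r.stars "
    "FROM authors a "
    "JOIN books b   ON b.author_id = a.id "
    "JOIN reviews r ON r.book_id = b.id"
).fetchall()

print(len(buggy_rows))   # 6 (2 author/book pairs x 3 reviews)
print(len(joined_rows))  # 3 (one row per review)
```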
From a Design Perspective:
ANSI-92 JOIN syntax is a pattern, not an anti-pattern:
- The purpose of the query is more obvious; the columns used by the application are clear
- It follows the modularity rule about using strict typing whenever possible. Explicit is almost universally better.
Conclusion
Short of familiarity and/or comfort, I don't see any benefit to continuing to use the ANSI-89 WHERE clause instead of the ANSI-92 JOIN syntax. Some might complain that ANSI-92 syntax is more verbose, but that's what makes it explicit. The more explicit, the easier it is to understand and maintain.
Use JOIN instead of WHERE OR... IN sub query
The OR operations are probably going to hurt your speed the most; ORs tend to kill index performance.
This converts it to the most succinct join equivalent:
SELECT t1.id
FROM t1 INNER JOIN t2 ON t2.id IN (t1.id2, t1.id3, t1.id4)
WHERE t1.date1 >= (UTC_TIMESTAMP() + INTERVAL - 20 Hour)
OR t1.date2 >= (UTC_TIMESTAMP() + INTERVAL - 20 Hour)
;
IN is logically an OR, but I think more recent MySQL versions have optimized it a bit; if not, this might have better performance:
SELECT t1.id
FROM t1
LEFT JOIN t2 AS t2_2 ON t2_2.id = t1.id2
LEFT JOIN t2 AS t2_3 ON t2_3.id = t1.id3
LEFT JOIN t2 AS t2_4 ON t2_4.id = t1.id4
WHERE (t1.date1 >= (UTC_TIMESTAMP() + INTERVAL - 20 Hour)
OR t1.date2 >= (UTC_TIMESTAMP() + INTERVAL - 20 Hour)
)
AND (t2_2.id IS NOT NULL OR t2_3.id IS NOT NULL OR t2_4.id IS NOT NULL)
;
It avoids the OR's on the id fields, so may take advantage of indexing on those fields; but JOINs three times as much, so could still end up being slower.
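One thing worth checking before swapping one form for the other is whether they return the same number of rows. Here is a rough sketch with made-up data, using Python's sqlite3 (SQLite standing in for MySQL). The date filter is left out because UTC_TIMESTAMP() is MySQL-specific, and t2.id is assumed to be unique. When a t1 row matches t2 through more than one of its id columns, the IN join repeats that row while the LEFT JOIN version returns it once.

```python
import sqlite3

# Assumptions: invented rows, date filter omitted, t2.id unique,
# SQLite standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2 (id INTEGER PRIMARY KEY);
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, id2 INTEGER, id3 INTEGER, id4 INTEGER);
    INSERT INTO t2 VALUES (1), (2), (3);
    INSERT INTO t1 VALUES
        (100, 1, 2, 9),   -- matches t2 twice (via id2 and id3)
        (101, 9, 9, 3),   -- matches t2 once (via id4)
        (102, 8, 8, 8);   -- no match
""")

in_join_ids = [row[0] for row in conn.execute(
    "SELECT t1.id FROM t1 "
    "INNER JOIN t2 ON t2.id IN (t1.id2, t1.id3, t1.id4) "
    "ORDER BY t1.id"
)]

left_join_ids = [row[0] for row in conn.execute(
    "SELECT t1.id FROM t1 "
    "LEFT JOIN t2 AS t2_2 ON t2_2.id = t1.id2 "
    "LEFT JOIN t2 AS t2_3 ON t2_3.id = t1.id3 "
    "LEFT JOIN t2 AS t2_4 ON t2_4.id = t1.id4 "
    "WHERE t2_2.id IS NOT NULL OR t2_3.id IS NOT NULL OR t2_4.id IS NOT NULL "
    "ORDER BY t1.id"
)]

# Adding DISTINCT to the IN join removes the repeated row.
distinct_ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT t1.id FROM t1 "
    "INNER JOIN t2 ON t2.id IN (t1.id2, t1.id3, t1.id4) "
    "ORDER BY t1.id"
)]

print(in_join_ids)    # [100, 100, 101]
print(left_join_ids)  # [100, 101]
print(distinct_ids)   # [100, 101]
```

If duplicates matter for your query, a DISTINCT on the IN version (or the LEFT JOIN form) keeps the results aligned under these assumptions.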
MySQL join with where clause
You need to put it in the join clause, not the where:
SELECT *
FROM categories
LEFT JOIN user_category_subscriptions ON
user_category_subscriptions.category_id = categories.category_id
and user_category_subscriptions.user_id =1
See, with an inner join, putting a clause in the join or the where is equivalent. However, with an outer join, they are vastly different.
As a join condition, you specify the rowset that you will be joining to the table. This means that it evaluates user_id = 1 first, and takes the subset of user_category_subscriptions with a user_id of 1 to join to all of the rows in categories. This will give you all of the rows in categories, while only the categories that this particular user has subscribed to will have any information in the user_category_subscriptions columns. Of course, all other categories will be populated with null in the user_category_subscriptions columns.
Conversely, a where clause does the join, and then reduces the rowset. So, this does all of the joins and then eliminates all rows where user_id doesn't equal 1. You're left with an inefficient way to get an inner join.
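Here is a small sketch of that difference, run through Python's sqlite3 (SQLite standing in for MySQL). The table and column names follow the query above; the category names and subscription rows are invented.

```python
import sqlite3

# Assumption: sample rows are invented; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_category_subscriptions (user_id INTEGER, category_id INTEGER);
    INSERT INTO categories VALUES (1, 'News'), (2, 'Sports'), (3, 'Music');
    INSERT INTO user_category_subscriptions VALUES (1, 1), (2, 1), (2, 2);
""")

# Filter in the join clause: every category survives, with NULLs
# where user 1 has no subscription.
filter_in_on = conn.execute(
    "SELECT c.category_id, s.user_id "
    "FROM categories c "
    "LEFT JOIN user_category_subscriptions s "
    "  ON s.category_id = c.category_id AND s.user_id = 1 "
    "ORDER BY c.category_id"
).fetchall()

# Filter in the where clause: NULL-extended rows fail user_id = 1
# and are dropped, so it behaves like an inner join.
filter_in_where = conn.execute(
    "SELECT c.category_id, s.user_id "
    "FROM categories c "
    "LEFT JOIN user_category_subscriptions s "
    "  ON s.category_id = c.category_id "
    "WHERE s.user_id = 1 "
    "ORDER BY c.category_id"
).fetchall()

print(filter_in_on)     # [(1, 1), (2, None), (3, None)]
print(filter_in_where)  # [(1, 1)]
```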
Hopefully this helps!
SQL JOIN - WHERE clause vs. ON clause
They are not the same thing.
Consider these queries:
SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
WHERE Orders.ID = 12345
and
SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
AND Orders.ID = 12345
The first will return an order and its lines, if any, for order number 12345. The second will return all orders, but only order 12345 will have any lines associated with it.
With an INNER JOIN, the clauses are effectively equivalent. However, just because they are functionally the same, in that they produce the same results, does not mean the two kinds of clauses have the same semantic meaning.
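The two LEFT JOIN queries can be tried out with a sketch like the following, using Python's sqlite3 (SQLite standing in for MySQL). The LineID and Item columns and all rows are assumptions made for the example; only Orders.ID and OrderLines.OrderID come from the queries above.

```python
import sqlite3

# Assumption: LineID/Item columns and all rows are invented;
# SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (ID INTEGER PRIMARY KEY);
    CREATE TABLE OrderLines (LineID INTEGER PRIMARY KEY, OrderID INTEGER, Item TEXT);
    INSERT INTO Orders VALUES (12345), (12346);
    INSERT INTO OrderLines VALUES
        (1, 12345, 'widget'), (2, 12345, 'gadget'), (3, 12346, 'gizmo');
""")

# Condition in WHERE: only order 12345 comes back.
where_rows = conn.execute(
    "SELECT Orders.ID, OrderLines.Item "
    "FROM Orders "
    "LEFT JOIN OrderLines ON OrderLines.OrderID = Orders.ID "
    "WHERE Orders.ID = 12345 "
    "ORDER BY Orders.ID, OrderLines.LineID"
).fetchall()

# Condition in ON: all orders come back, but only 12345 gets its lines.
on_rows = conn.execute(
    "SELECT Orders.ID, OrderLines.Item "
    "FROM Orders "
    "LEFT JOIN OrderLines ON OrderLines.OrderID = Orders.ID "
    "  AND Orders.ID = 12345 "
    "ORDER BY Orders.ID, OrderLines.LineID"
).fetchall()

print(where_rows)  # [(12345, 'widget'), (12345, 'gadget')]
print(on_rows)     # [(12345, 'widget'), (12345, 'gadget'), (12346, None)]
```

Note that order 12346 does have a line in the sample data, yet it shows up with NULL because the ON condition never holds for it.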
MySQL: JOIN vs WHERE performance
With a minor bit of investigation.
I have sympathy with the point above that readability is important, although I find the join to be readable while sub queries I find less readable (although in this case the sub query is quite simple so not a major issue either way).
Normally I would hope that MySQL would manage to optimise a non-correlated sub query away and execute it just as efficiently as if it were a join. This sub query at first glance does appear to be non-correlated (i.e., its results do not depend on the containing query).
However playing on SQL fiddle this doesn't appear to be the case:-
http://www.sqlfiddle.com/#!2/7696c/2
Using the sub query the explain says it is an UNCACHEABLE SUBQUERY which from the manual is :-
A subquery for which the result cannot be cached and must be re-evaluated for each row of the outer query
Doing much the same sub query by specifying the value rather than passing it in as a variable gives a different explain and just describes it as a SUBQUERY. This I suspect is just as efficient as the join.
My feeling is that MySQL is confused by the use of the variable, and has planned the query on the assumption that the value of the variable can change between rows. Hence it needs to re execute the sub query for every row. It hasn't managed to recognise that there is nothing in the query that modifies the value of the variable.
If you want to try yourself here are the details to set up the test:-
CREATE TABLE `table`
(
id INT,
PRIMARY KEY id(id)
);
CREATE TABLE another_table
(
id INT,
table_id_fk INT,
PRIMARY KEY id (id),
INDEX table_id_fk (table_id_fk)
);
INSERT INTO `table`
VALUES
(1),
(2),
(3),
(4),
(5),
(6),
(7),
(8);
INSERT INTO another_table
VALUES
(11,1),
(12,3),
(13,5),
(14,7),
(15,9),
(16,11),
(17,13),
(18,15);
SQL to execute:-
SET @id:=13;
SELECT t.id
FROM `table` t
WHERE id = (
SELECT table_id_fk
FROM another_table
WHERE id = @id
);
SELECT t.id
FROM `table` t
JOIN another_table at
ON t.id = at.table_id_fk
WHERE at.id = @id;
SELECT t.id
FROM `table` t
WHERE id = (
SELECT table_id_fk
FROM another_table
WHERE id = 13
);
EXPLAIN SELECT t.id
FROM `table` t
WHERE id = (
SELECT table_id_fk
FROM another_table
WHERE id = @id
);
EXPLAIN SELECT t.id
FROM `table` t
JOIN another_table at
ON t.id = at.table_id_fk
WHERE at.id = @id;
EXPLAIN SELECT t.id
FROM `table` t
WHERE id = (
SELECT table_id_fk
FROM another_table
WHERE id = 13
);
Explain results:-
ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA
1 | PRIMARY | t | index | (null) | PRIMARY | 4 | (null) | 8 | Using where; Using index
2 | UNCACHEABLE SUBQUERY | another_table | const | PRIMARY | PRIMARY | 4 | const | 1 |

ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA
1 | SIMPLE | at | const | PRIMARY,table_id_fk | PRIMARY | 4 | const | 1 |
1 | SIMPLE | t | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index

ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA
1 | PRIMARY | t | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index
2 | SUBQUERY | another_table | const | PRIMARY | PRIMARY | 4 | | 1 |
When should I prefer JOIN over WHERE in MySQL queries?
Explicitly mentioning the join is generally supposed to be better (and easier to read), besides being the ANSI standard, but with modern optimizers I don't think there is any marked difference in performance between the two versions.
Note: the two queries you mentioned are not equivalent - if you replace the left join with an inner join, they become equivalent, in which case there is no noticeable difference in performance.
An inner join is generally faster than a left join.
INNER JOIN ON vs WHERE clause
INNER JOIN is ANSI syntax that you should use.
It is generally considered more readable, especially when you join lots of tables.
It can also be easily replaced with an OUTER JOIN whenever a need arises.
The WHERE syntax is more relational model oriented.
A result of two tables JOINed is a cartesian product of the tables to which a filter is applied which selects only those rows with joining columns matching.
It's easier to see this with the WHERE syntax.
As for your example, in MySQL (and in SQL generally) these two queries are synonyms.
Also, note that MySQL also has a STRAIGHT_JOIN clause.
Using this clause, you can control the JOIN order: which table is scanned in the outer loop and which one is in the inner loop.
You cannot control this in MySQL using WHERE syntax.
Join vs. sub-query
Taken from the MySQL manual (13.2.10.11 Rewriting Subqueries as Joins):
A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone.
So subqueries can be slower than LEFT [OUTER] JOIN, but in my opinion their strength is slightly higher readability.
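As an illustration of what such a rewrite looks like (my own example with hypothetical products and sales tables, not text from the manual), here is a NOT IN subquery next to a LEFT JOIN ... IS NULL form. It runs on SQLite through Python's sqlite3, so it only shows that the results agree, not which one MySQL executes faster. It also assumes sales.product_id is never NULL, since NOT IN behaves differently when the subquery returns NULLs.

```python
import sqlite3

# Assumptions: hypothetical tables and rows; sales.product_id is NOT NULL;
# SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (product_id INTEGER NOT NULL);
    INSERT INTO products VALUES (1, 'ink'), (2, 'pen'), (3, 'pad');
    INSERT INTO sales VALUES (1), (1), (3);
""")

# Subquery form: products that never appear in sales.
subquery_rows = conn.execute(
    "SELECT p.id, p.name FROM products p "
    "WHERE p.id NOT IN (SELECT product_id FROM sales) "
    "ORDER BY p.id"
).fetchall()

# Join form: keep only products whose LEFT JOIN found no sales row.
join_rows = conn.execute(
    "SELECT p.id, p.name FROM products p "
    "LEFT JOIN sales s ON s.product_id = p.id "
    "WHERE s.product_id IS NULL "
    "ORDER BY p.id"
).fetchall()

print(subquery_rows)  # [(2, 'pen')]
print(join_rows)      # [(2, 'pen')]
```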
In SQL / MySQL, what is the difference between ON and WHERE in a join statement?
WHERE is a part of the SELECT query as a whole, ON is a part of each individual join.
ON can only refer to the fields of previously used tables.
When a record in the left table has no actual match in the right table, LEFT JOIN still returns that record once, with all fields from the right table set to NULL. The WHERE clause then evaluates and filters this.
In your query, only the records from gifts without a match in sentgifts are returned.
Here's the example
gifts
1 Teddy bear
2 Flowers
sentgifts
1 Alice
1 Bob
---
SELECT *
FROM gifts g
LEFT JOIN
sentgifts sg
ON g.giftID = sg.giftID
---
1 Teddy bear 1 Alice
1 Teddy bear 1 Bob
2 Flowers NULL NULL -- no match in sentgifts
---
SELECT *
FROM gifts g
LEFT JOIN
sentgifts sg
ON g.giftID = sg.giftID
WHERE sg.giftID IS NULL
---
2 Flowers NULL NULL -- no match in sentgifts
As you can see, no actual match can leave a NULL in sentgifts.giftID, so only the gifts that had not ever been sent are returned.