Can an INNER JOIN offer better performance than EXISTS
Generally speaking, INNER JOIN
and EXISTS
are different things.
The former returns duplicates and columns from both tables, the latter returns one record and, being a predicate, returns records from only one table.
If you do an inner join on a UNIQUE
column, they exhibit same performance.
If you do an inner join on a recordset with DISTINCT
applied (to get rid of the duplicates), EXISTS
is usually faster.
IN
and EXISTS
clauses (with an equijoin correlation) usually employ one of the several SEMI JOIN
algorithms which are usually more efficient than a DISTINCT
on one of the tables.
See this article in my blog:
- IN vs. JOIN vs. EXISTS
Inner Join versus Exists() while avoiding duplicate rows
I would say go with the EXISTS approach since you are not really needing anything from the available_employees other than checking for the existence.
Having said that it depends on your data as well and how your database query optimizer optmizes it. I would suggest you to see the query plan for each approach and see which one is less expensive.
Check these links as well http://dotnetvj.blogspot.com/2009/07/why-we-should-use-exists-instead-of.html Can an INNER JOIN offer better performance than EXISTS
EXISTS vs JOIN and use of EXISTS clause
EXISTS
is used to return a boolean value, JOIN
returns a whole other table
EXISTS
is only used to test if a subquery returns results, and short circuits as soon as it does. JOIN
is used to extend a result set by combining it with additional fields from another table to which there is a relation.
In your example, the queries are semantically equivalent.
In general, use EXISTS
when:
- You don't need to return data from the related table
- You have dupes in the related table (
JOIN
can cause duplicate rows if values are repeated) - You want to check existence (use instead of
LEFT OUTER JOIN...NULL
condition)
If you have proper indexes, most of the time the EXISTS
will perform identically to the JOIN
. The exception is on very complicated subqueries, where it is normally quicker to use EXISTS
.
If your JOIN
key is not indexed, it may be quicker to use EXISTS
but you will need to test for your specific circumstance.
JOIN
syntax is easier to read and clearer normally as well.
INNER JOIN vs LEFT JOIN performance in SQL Server
A LEFT JOIN
is absolutely not faster than an INNER JOIN
. In fact, it's slower; by definition, an outer join (LEFT JOIN
or RIGHT JOIN
) has to do all the work of an INNER JOIN
plus the extra work of null-extending the results. It would also be expected to return more rows, further increasing the total execution time simply due to the larger size of the result set.
(And even if a LEFT JOIN
were faster in specific situations due to some difficult-to-imagine confluence of factors, it is not functionally equivalent to an INNER JOIN
, so you cannot simply go replacing all instances of one with the other!)
Most likely your performance problems lie elsewhere, such as not having a candidate key or foreign key indexed properly. 9 tables is quite a lot to be joining so the slowdown could literally be almost anywhere. If you post your schema, we might be able to provide more details.
Edit:
Reflecting further on this, I could think of one circumstance under which a LEFT JOIN
might be faster than an INNER JOIN
, and that is when:
- Some of the tables are very small (say, under 10 rows);
- The tables do not have sufficient indexes to cover the query.
Consider this example:
CREATE TABLE #Test1
(
ID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL
)
INSERT #Test1 (ID, Name) VALUES (1, 'One')
INSERT #Test1 (ID, Name) VALUES (2, 'Two')
INSERT #Test1 (ID, Name) VALUES (3, 'Three')
INSERT #Test1 (ID, Name) VALUES (4, 'Four')
INSERT #Test1 (ID, Name) VALUES (5, 'Five')
CREATE TABLE #Test2
(
ID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL
)
INSERT #Test2 (ID, Name) VALUES (1, 'One')
INSERT #Test2 (ID, Name) VALUES (2, 'Two')
INSERT #Test2 (ID, Name) VALUES (3, 'Three')
INSERT #Test2 (ID, Name) VALUES (4, 'Four')
INSERT #Test2 (ID, Name) VALUES (5, 'Five')
SELECT *
FROM #Test1 t1
INNER JOIN #Test2 t2
ON t2.Name = t1.Name
SELECT *
FROM #Test1 t1
LEFT JOIN #Test2 t2
ON t2.Name = t1.Name
DROP TABLE #Test1
DROP TABLE #Test2
If you run this and view the execution plan, you'll see that the INNER JOIN
query does indeed cost more than the LEFT JOIN
, because it satisfies the two criteria above. It's because SQL Server wants to do a hash match for the INNER JOIN
, but does nested loops for the LEFT JOIN
; the former is normally much faster, but since the number of rows is so tiny and there's no index to use, the hashing operation turns out to be the most expensive part of the query.
You can see the same effect by writing a program in your favourite programming language to perform a large number of lookups on a list with 5 elements, vs. a hash table with 5 elements. Because of the size, the hash table version is actually slower. But increase it to 50 elements, or 5000 elements, and the list version slows to a crawl, because it's O(N) vs. O(1) for the hashtable.
But change this query to be on the ID
column instead of Name
and you'll see a very different story. In that case, it does nested loops for both queries, but the INNER JOIN
version is able to replace one of the clustered index scans with a seek - meaning that this will literally be an order of magnitude faster with a large number of rows.
So the conclusion is more or less what I mentioned several paragraphs above; this is almost certainly an indexing or index coverage problem, possibly combined with one or more very small tables. Those are the only circumstances under which SQL Server might sometimes choose a worse execution plan for an INNER JOIN
than a LEFT JOIN
.
Join vs. sub-query
Taken from the MySQL manual (13.2.10.11 Rewriting Subqueries as Joins):
A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone.
So subqueries can be slower than LEFT [OUTER] JOIN
, but in my opinion their strength is slightly higher readability.
SQL Server IN vs. EXISTS Performance
EXISTS
will be faster because once the engine has found a hit, it will quit looking as the condition has proved true.
With IN
, it will collect all the results from the sub-query before further processing.
SQL performance on LEFT OUTER JOIN vs NOT EXISTS
Joe's link is a good starting point. Quassnoi covers this too.
In general, if your fields are properly indexed, OR if you expect to filter out more records (i.e. have a lots of rows EXIST
in the subquery) NOT EXISTS
will perform better.
EXISTS
and NOT EXISTS
both short circuit - as soon as a record matches the criteria it's either included or filtered out and the optimizer moves on to the next record.
LEFT JOIN
will join ALL RECORDS regardless of whether they match or not, then filter out all non-matching records. If your tables are large and/or you have multiple JOIN
criteria, this can be very very resource intensive.
I normally try to use NOT EXISTS
and EXISTS
where possible. For SQL Server, IN
and NOT IN
are semantically equivalent and may be easier to write. These are among the only operators you will find in SQL Server that are guaranteed to short circuit.
Is a JOIN faster than a WHERE?
Theoretically, no, it shouldn't be any faster. The query optimizer should be able to generate an identical execution plan. However, some database engines can produce better execution plans for one of them (not likely to happen for such a simple query but for complex enough ones). You should test both and see (on your database engine).
Related Topics
Access - Compare Two Tables and Update or Insert Data in First Table
Any Way to Achieve Fulltext-Like Search on Innodb
Calculate Hours Based on Business Hours in Oracle SQL
SQL Create Logon - Can't Use @Parameter as Username
Aggregate a Single Column in Query with Many Columns
Postgresql Column Not Found, But Shows in Describe
Update with Join Query in Oracle
What Is the Equivalent of 'Describe Table' in SQL Server
When to Denormalize a Database Design
Display Names of All Constraints for a Table in Oracle SQL
Why Is Iterating Through a Large Django Queryset Consuming Massive Amounts of Memory
How to Alter a Postgresql Table and Make a Column Unique
Execute Dynamic Query with Go in SQL
Ora 00904 Error:Invalid Identifier
Postgresql Order by Issue - Natural Sort
SQL Error: Ora-00933: SQL Command Not Properly Ended
Why Does Using an Underscore Character in a Like Filter Give Me All the Results