Join Versus Exists Performance

JOIN versus EXISTS performance

NOT EXISTS is more efficient than using a LEFT OUTER JOIN to exclude records that are missing from the participating table using an IS NULL condition because the optimizer will elect to use an EXCLUSION MERGE JOIN with the NOT EXISTS predicate.

While your second test did not yield impressive results for the data sets you were using the performance increase from NOT EXISTS over a LEFT JOIN is very noticeable as your data volumes increase. Keep in mind that the tables will need to be hash distributed by the columns that participate in the NOT EXISTS join just like they would in the LEFT JOIN. Therefore, data skew can impact the performance of the EXCLUSION MERGE JOIN.

EDIT:

Typically, I would defer to EXISTS as a replacement for IN instead of using it for re-writing a join solution. This is especially true when the column(s) participating in the logical comparison can be NULL. That's not to say you couldn't use EXISTS in place of an INNER JOIN. Instead of an EXCLUSION JOIN you will end up with an INCLUSION JOIN. The INNER JOIN is in essence an inclusion join to begin with. I'm sure there are some nuances that I am overlooking but you can find those in the manuals if you wish to take the time to read them.

EXISTS vs JOIN and use of EXISTS clause

EXISTS is used to return a boolean value, JOIN returns a whole other table

EXISTS is only used to test if a subquery returns results, and short circuits as soon as it does. JOIN is used to extend a result set by combining it with additional fields from another table to which there is a relation.

In your example, the queries are semantically equivalent.

In general, use EXISTS when:

  • You don't need to return data from the related table
  • You have dupes in the related table (JOIN can cause duplicate rows if values are repeated)
  • You want to check existence (use instead of LEFT OUTER JOIN...NULL condition)

If you have proper indexes, most of the time the EXISTS will perform identically to the JOIN. The exception is on very complicated subqueries, where it is normally quicker to use EXISTS.

If your JOIN key is not indexed, it may be quicker to use EXISTS but you will need to test for your specific circumstance.

JOIN syntax is easier to read and clearer normally as well.

Can an INNER JOIN offer better performance than EXISTS

Generally speaking, INNER JOIN and EXISTS are different things.

The former returns duplicates and columns from both tables, the latter returns one record and, being a predicate, returns records from only one table.

If you do an inner join on a UNIQUE column, they exhibit same performance.

If you do an inner join on a recordset with DISTINCT applied (to get rid of the duplicates), EXISTS is usually faster.

IN and EXISTS clauses (with an equijoin correlation) usually employ one of the several SEMI JOIN algorithms which are usually more efficient than a DISTINCT on one of the tables.

See this article in my blog:

  • IN vs. JOIN vs. EXISTS

SQL JOIN vs IN performance?

Generally speaking, IN and JOIN are different queries that can yield different results.

SELECT  a.*
FROM a
JOIN b
ON a.col = b.col

is not the same as

SELECT  a.*
FROM a
WHERE col IN
(
SELECT col
FROM b
)

, unless b.col is unique.

However, this is the synonym for the first query:

SELECT  a.*
FROM a
JOIN (
SELECT DISTINCT col
FROM b
)
ON b.col = a.col

If the joining column is UNIQUE and marked as such, both these queries yield the same plan in SQL Server.

If it's not, then IN is faster than JOIN on DISTINCT.

See this article in my blog for performance details:

  • IN vs. JOIN vs. EXISTS

SQL performance on LEFT OUTER JOIN vs NOT EXISTS

Joe's link is a good starting point. Quassnoi covers this too.

In general, if your fields are properly indexed, OR if you expect to filter out more records (i.e. have a lots of rows EXIST in the subquery) NOT EXISTS will perform better.

EXISTS and NOT EXISTS both short circuit - as soon as a record matches the criteria it's either included or filtered out and the optimizer moves on to the next record.

LEFT JOIN will join ALL RECORDS regardless of whether they match or not, then filter out all non-matching records. If your tables are large and/or you have multiple JOIN criteria, this can be very very resource intensive.

I normally try to use NOT EXISTS and EXISTS where possible. For SQL Server, IN and NOT IN are semantically equivalent and may be easier to write. These are among the only operators you will find in SQL Server that are guaranteed to short circuit.

JOIN or Correlated subquery with exists clause, which one is better

Generally, the EXISTS clause because you may need DISTINCT for a JOIN for it to give the expected output. For example, if you have multiple Department rows for a ContactInformation row.

In your example above, the SELECT *:

  • means different output too so they are not actually equivalent
  • less chance of a index being used because you are pulling all columns out

Saying that, even with a limited column list, they will give the same plan: until you need DISTINCT... which is why I say "EXISTS"



Related Topics



Leave a reply



Submit