SQL Performance on Left Outer Join VS Not Exists

SQL performance on LEFT OUTER JOIN vs NOT EXISTS

Joe's link is a good starting point. Quassnoi covers this too.

In general, if your fields are properly indexed, OR if you expect to filter out more records (i.e. have a lots of rows EXIST in the subquery) NOT EXISTS will perform better.

EXISTS and NOT EXISTS both short circuit - as soon as a record matches the criteria it's either included or filtered out and the optimizer moves on to the next record.

LEFT JOIN will join ALL RECORDS regardless of whether they match or not, then filter out all non-matching records. If your tables are large and/or you have multiple JOIN criteria, this can be very very resource intensive.

I normally try to use NOT EXISTS and EXISTS where possible. For SQL Server, IN and NOT IN are semantically equivalent and may be easier to write. These are among the only operators you will find in SQL Server that are guaranteed to short circuit.

Performance difference between NOT Exists and LEFT JOIN IN SQL Server

Go for NOT EXISTS generally.

It is more efficient than NOT IN if the columns on either side are nullable (and has the semantics you probably desire anyway)

Left join ... Null sometimes does the whole join with a later filter to preserve the rows matching the is null and can be much less efficient.

An example demonstrating this is below. Notice the extra operators in the NOT IN plan and how the outer join plan blows up to create a join of over 1 million rows going into the filter.

Not Exists

Sample Image

Outer Join ... NULL

Sample Image

Not In

Sample Image

CREATE TABLE Table1 (
     IdColumn INT IDENTITY PRIMARY KEY,
     Column1  INT NULL,
     Filler   CHAR(8000) NULL,
     UNIQUE(Column1, IdColumn) );

CREATE TABLE Table2 (
     IdColumn INT IDENTITY PRIMARY KEY,
     Column2  INT NULL,
     Filler   CHAR(8000) NULL,
     UNIQUE(Column2, IdColumn) );

INSERT INTO Table2 (Column2)
OUTPUT      INSERTED.Column2
INTO Table1(Column1)
SELECT number % 5
FROM   master..spt_values

SELECT *
FROM   Table1 t1
WHERE  NOT EXISTS (SELECT *
                   FROM   Table2 t2
                   WHERE  t2.Column2 = t1.Column1)

SELECT *
FROM   Table1
WHERE  Column1 NOT IN (SELECT Column2
                       FROM   Table2)

SELECT Table1.*
FROM   Table1
       LEFT JOIN Table2
         ON Table1.Column1 = Table2.Column2
WHERE  Table2.IdColumn IS NULL

DROP TABLE Table1, Table2

SQL Server Performance: LEFT JOIN vs NOT IN

Potentially the second is faster if the tables are indexed. So if orders has an index on customer ID, then NOT IN will mean that you aren't bringing back the entire ORDERS table.

But as Erwin said, a lot depends on how things are set up. I'd tend to go for the second option as I don't like bringing in tables unless I need data from them.

What's the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL?

NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL

In a nutshell:

NOT IN is a little bit different: it never matches if there is but a single NULL in the list.

In MySQL, NOT EXISTS is a little bit less efficient
In SQL Server, LEFT JOIN / IS NULL is less efficient
In PostgreSQL, NOT IN is less efficient
In Oracle, all three methods are the same.

Are the SQL concepts LEFT OUTER JOIN and WHERE NOT EXISTS basically the same?

No they are not the same thing, as they will not return the same rowset in the most simplistic use case.

The LEFT OUTER JOIN will return all rows from the left table, both where rows exist in the related table and where they does not. The WHERE NOT EXISTS() subquery will only return rows where the relationship is not met.

However, if you did a LEFT OUTER JOIN and looked for IS NULL on the foreign key column in the WHERE clause, you can make equivalent behavior to the WHERE NOT EXISTS.

For example this:

SELECT 
  t_main.*
FROM 
   t_main
   LEFT OUTER JOIN t_related ON t_main.id = t_related.id
/* IS NULL in the WHERE clause */
WHERE t_related.id IS NULL

Is equivalent to this:

SELECT
  t_main.*
FROM t_main 
WHERE 
  NOT EXISTS (
    SELECT t_related.id 
    FROM t_related 
    WHERE t_main.id = t_related.id
  )

But this one is not equivalent:

It will return rows from t_main both having and not having related rows in t_related.

SELECT 
  t_main.*
FROM
  t_main
  LEFT OUTER JOIN t_related ON t_main.id = t_related.id
/* WHERE clause does not exclude NULL foreign keys */

Note This does not speak to how the queries are compiled and executed, which differs as well -- this only addresses a comparison of the rowsets they return.

Where not exists vs left outer join in oracle sql

You can use this:

select Table1.*
from (select * from SomeTable) Table1
left outer join SomeOtherTable sot
    on Table1.columnB = sot.columnB
where sot.columnB is null;

For the performance it is important to have indexes on columnB on both tables.

JOIN versus EXISTS performance

NOT EXISTS is more efficient than using a LEFT OUTER JOIN to exclude records that are missing from the participating table using an IS NULL condition because the optimizer will elect to use an EXCLUSION MERGE JOIN with the NOT EXISTS predicate.

While your second test did not yield impressive results for the data sets you were using the performance increase from NOT EXISTS over a LEFT JOIN is very noticeable as your data volumes increase. Keep in mind that the tables will need to be hash distributed by the columns that participate in the NOT EXISTS join just like they would in the LEFT JOIN. Therefore, data skew can impact the performance of the EXCLUSION MERGE JOIN.

EDIT:

Typically, I would defer to EXISTS as a replacement for IN instead of using it for re-writing a join solution. This is especially true when the column(s) participating in the logical comparison can be NULL. That's not to say you couldn't use EXISTS in place of an INNER JOIN. Instead of an EXCLUSION JOIN you will end up with an INCLUSION JOIN. The INNER JOIN is in essence an inclusion join to begin with. I'm sure there are some nuances that I am overlooking but you can find those in the manuals if you wish to take the time to read them.

Query performance - 'Left join is null' vs 'Not exists select'

It seems to be a close race between the two formulations. (Some other example may show a clearer winner.)

From the HANDLER values: Query 1 did more read_keys, and some writing (which goes along with MATERIALIZED). The other numbers were about same. So, I conclude that Query 1 is slower -- although possibly not enough slower to make much difference.

I vote for LEFT JOIN as the better query pattern (in this case)

SQL Performance on Left Outer Join VS Not Exists