What's the Difference Between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL

What's the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL?

  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server

  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL

  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle

  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL

In a nutshell:

NOT IN is semantically different: it never evaluates to true if the list contains even a single NULL, so such a query returns no rows.

  • In MySQL, NOT EXISTS is a little bit less efficient

  • In SQL Server, LEFT JOIN / IS NULL is less efficient

  • In PostgreSQL, NOT IN is less efficient

  • In Oracle, all three methods are the same.
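The NULL behaviour of NOT IN described above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for the engines listed (an assumption: it shows standard NULL semantics only, not any engine's performance), and the tables outer_t and inner_t are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server/PostgreSQL/Oracle/MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outer_t (val INTEGER);
    CREATE TABLE inner_t (val INTEGER);
    INSERT INTO outer_t (val) VALUES (1), (2), (3);
    -- inner_t contains a single NULL alongside a real value
    INSERT INTO inner_t (val) VALUES (1), (NULL);
""")

queries = {
    "NOT IN": """
        SELECT val FROM outer_t
        WHERE val NOT IN (SELECT val FROM inner_t)
        ORDER BY val""",
    "NOT EXISTS": """
        SELECT o.val FROM outer_t o
        WHERE NOT EXISTS (SELECT 1 FROM inner_t i WHERE i.val = o.val)
        ORDER BY o.val""",
    "LEFT JOIN / IS NULL": """
        SELECT o.val FROM outer_t o
        LEFT JOIN inner_t i ON i.val = o.val
        WHERE i.val IS NULL
        ORDER BY o.val""",
}

# Collect the first column of each result set, keyed by the pattern name.
results = {name: [row[0] for row in conn.execute(sql)] for name, sql in queries.items()}
for name, rows in results.items():
    print(f"{name:20} -> {rows}")
```

NOT IN comes back empty while the other two return [2, 3]: `2 NOT IN (1, NULL)` evaluates to UNKNOWN rather than TRUE, so the row is discarded.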

SQL Server Performance: LEFT JOIN vs NOT IN

Potentially the second query (NOT IN) is faster if the tables are indexed: if ORDERS has an index on customer ID, NOT IN means you aren't bringing back the entire ORDERS table.

But as Erwin said, a lot depends on how things are set up. I'd tend to go for the second option as I don't like bringing in tables unless I need data from them.

Performance difference between NOT EXISTS and LEFT JOIN in SQL Server

Go for NOT EXISTS generally.

It is more efficient than NOT IN if the columns on either side are nullable (and it has the semantics you probably want anyway).

LEFT JOIN ... IS NULL sometimes performs the whole join and applies the IS NULL filter afterwards, keeping only the unmatched rows, which can be much less efficient.
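To see what "nullable on either side" means for NOT IN, here is a sketch (SQLite through Python's sqlite3, with hypothetical tables t1 and t2) that adds the NULL guards NOT IN needs before it returns the same rows as NOT EXISTS. It illustrates semantics only; it says nothing about how SQL Server's plans differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (col INTEGER);  -- nullable "outer" column
    CREATE TABLE t2 (col INTEGER);  -- nullable "inner" column
    INSERT INTO t1 (col) VALUES (1), (2), (NULL);
    INSERT INTO t2 (col) VALUES (1), (NULL);
""")

def run(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

not_exists = run("""
    SELECT t1.col FROM t1
    WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.col = t1.col)
    ORDER BY t1.col""")

naive_not_in = run("""
    SELECT col FROM t1
    WHERE col NOT IN (SELECT col FROM t2)
    ORDER BY col""")

# Both NULL cases handled explicitly: drop NULLs from the subquery and keep
# outer NULLs, which NOT EXISTS also keeps because they never find a match.
guarded_not_in = run("""
    SELECT col FROM t1
    WHERE col IS NULL
       OR col NOT IN (SELECT col FROM t2 WHERE col IS NOT NULL)
    ORDER BY col""")

print(not_exists)      # [None, 2]
print(naive_not_in)    # []
print(guarded_not_in)  # [None, 2]
```

SQLite sorts NULLs first in ascending order, which is why None leads the lists.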

An example demonstrating this is below. Notice the extra operators in the NOT IN plan and how the outer join plan blows up to create a join of over 1 million rows going into the filter.

[Execution plan screenshots not reproduced: Not Exists; Outer Join ... NULL; Not In]

-- T-SQL (SQL Server). Two identical tables with a wide CHAR(8000) filler column.
CREATE TABLE Table1 (
    IdColumn INT IDENTITY PRIMARY KEY,
    Column1 INT NULL,
    Filler CHAR(8000) NULL,
    UNIQUE(Column1, IdColumn) );

CREATE TABLE Table2 (
    IdColumn INT IDENTITY PRIMARY KEY,
    Column2 INT NULL,
    Filler CHAR(8000) NULL,
    UNIQUE(Column2, IdColumn) );

-- Load Table2 with the values 0-4 and copy the same values into Table1.
INSERT INTO Table2 (Column2)
OUTPUT INSERTED.Column2
INTO Table1(Column1)
SELECT number % 5
FROM master..spt_values;

-- Every Column1 value exists in Table2, so each query below returns no rows;
-- the point is to compare the execution plans.

-- NOT EXISTS
SELECT *
FROM Table1 t1
WHERE NOT EXISTS (SELECT *
                  FROM Table2 t2
                  WHERE t2.Column2 = t1.Column1);

-- NOT IN
SELECT *
FROM Table1
WHERE Column1 NOT IN (SELECT Column2
                      FROM Table2);

-- LEFT JOIN ... IS NULL
SELECT Table1.*
FROM Table1
LEFT JOIN Table2
    ON Table1.Column1 = Table2.Column2
WHERE Table2.IdColumn IS NULL;

DROP TABLE Table1, Table2;
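The "over 1 million rows" comes from the logical size of the outer join before the IS NULL filter. A scaled-down SQLite analogue (via Python's sqlite3; 100 generated rows replace master..spt_values and the CHAR(8000) filler is dropped, both simplifications of ours) makes that count visible.

```python
import sqlite3

# Scaled-down SQLite analogue of the T-SQL demo above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (IdColumn INTEGER PRIMARY KEY, Column1 INTEGER)")
conn.execute("CREATE TABLE Table2 (IdColumn INTEGER PRIMARY KEY, Column2 INTEGER)")
values = [(n % 5,) for n in range(100)]  # each of 0..4 appears 20 times
conn.executemany("INSERT INTO Table2 (Column2) VALUES (?)", values)
conn.executemany("INSERT INTO Table1 (Column1) VALUES (?)", values)

def count(sql):
    """Run a COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]

# Logical size of the outer join before any IS NULL filter is applied:
# 5 values x 20 matching rows x 20 matching rows = 2000 rows.
joined_rows = count("""
    SELECT COUNT(*) FROM Table1
    LEFT JOIN Table2 ON Table1.Column1 = Table2.Column2""")

# The three anti-join forms all return nothing, since every value matches.
not_exists_rows = count("""
    SELECT COUNT(*) FROM Table1 t1
    WHERE NOT EXISTS (SELECT * FROM Table2 t2 WHERE t2.Column2 = t1.Column1)""")
not_in_rows = count("""
    SELECT COUNT(*) FROM Table1
    WHERE Column1 NOT IN (SELECT Column2 FROM Table2)""")
left_join_rows = count("""
    SELECT COUNT(*) FROM Table1
    LEFT JOIN Table2 ON Table1.Column1 = Table2.Column2
    WHERE Table2.IdColumn IS NULL""")

print(joined_rows, not_exists_rows, not_in_rows, left_join_rows)  # 2000 0 0 0
```

With n rows spread evenly over 5 values the unfiltered join has 5 × (n/5)² = n²/5 rows, so a table of a few thousand rows can feed over a million rows into the filter. Whether an engine actually builds that intermediate result depends on its plan; SQLite's planner may well avoid it.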

NOT EXISTS Vs. Left Outer Join

You need to correlate the EXISTS subquery with the outer query. Here is one way:

SELECT tl.pkLeadID, tl.fkMasterPersonID
FROM dbo.tblPhoneLead tl
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.tblInternetMasterPerson mp
    -- correlation: compare each person with the current lead row
    WHERE mp.MasterPersonID = tl.fkMasterPersonID
);
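The effect of forgetting the correlation shows up in a small sketch using Python's sqlite3 module. The rows are hypothetical, and the dbo. prefix is dropped because SQLite has no dbo schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblPhoneLead (pkLeadID INTEGER PRIMARY KEY, fkMasterPersonID INTEGER);
    CREATE TABLE tblInternetMasterPerson (MasterPersonID INTEGER PRIMARY KEY);
    INSERT INTO tblPhoneLead VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO tblInternetMasterPerson VALUES (10), (30);
""")

# Correlated: the inner WHERE refers to the outer row (tl), so the test is per lead.
correlated = conn.execute("""
    SELECT tl.pkLeadID, tl.fkMasterPersonID
    FROM tblPhoneLead tl
    WHERE NOT EXISTS (
        SELECT 1
        FROM tblInternetMasterPerson mp
        WHERE mp.MasterPersonID = tl.fkMasterPersonID)
    ORDER BY tl.pkLeadID""").fetchall()

# Uncorrelated: the subquery ignores the outer row, so NOT EXISTS is false for
# every lead as soon as tblInternetMasterPerson contains any row at all.
uncorrelated = conn.execute("""
    SELECT tl.pkLeadID, tl.fkMasterPersonID
    FROM tblPhoneLead tl
    WHERE NOT EXISTS (SELECT 1 FROM tblInternetMasterPerson mp)
    ORDER BY tl.pkLeadID""").fetchall()

print(correlated)    # [(2, 20)]
print(uncorrelated)  # []
```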

The old IN vs. Exists vs. Left Join (Where ___ Is or Is Not Null); Performance

Assuming TSQL to mean SQL Server, have you seen this article comparing NOT IN, NOT EXISTS, and LEFT JOIN / IS NULL? In summary, as long as the columns being compared cannot be NULL, NOT IN and NOT EXISTS are more efficient than LEFT JOIN / IS NULL...

Something to keep in mind about the difference between IN and EXISTS: EXISTS is a boolean operator and returns true as soon as the condition is first satisfied. Even though the syntax is a correlated subquery, EXISTS has performed better than IN...

Also, IN and EXISTS only check whether a matching value exists. This means there's no duplication of records like you find when JOINing...
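The duplication point can be checked with a sketch in Python's sqlite3 module; the customers and orders tables are hypothetical. A customer with three orders appears three times in the JOIN but once with EXISTS or IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers (id) VALUES (1), (2), (3);
    -- customer 1 has three orders, customer 3 has one, customer 2 has none
    INSERT INTO orders (customer_id) VALUES (1), (1), (1), (3);
""")

def ids(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

joined = ids("""
    SELECT c.id FROM customers c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id""")
with_exists = ids("""
    SELECT c.id FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id""")
with_in = ids("""
    SELECT id FROM customers
    WHERE id IN (SELECT customer_id FROM orders)
    ORDER BY id""")

print(joined)       # [1, 1, 1, 3]
print(with_exists)  # [1, 3]
print(with_in)      # [1, 3]
```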

It really depends, so if you're really out to find what performs best you'll have to test & compare what the query plans are doing...

SQL performance on LEFT OUTER JOIN vs NOT EXISTS

Joe's link is a good starting point. Quassnoi covers this too.

In general, if your fields are properly indexed, OR if you expect to filter out more records (i.e. many rows EXIST in the subquery), NOT EXISTS will perform better.

EXISTS and NOT EXISTS both short circuit: as soon as a record matches the criteria, it is either included or filtered out, and the engine moves on to the next record.

LEFT JOIN can join ALL RECORDS regardless of whether they match, and only then filter out every record that found a match. If your tables are large and/or you have multiple JOIN criteria, this can be very resource intensive.

I normally try to use NOT EXISTS and EXISTS where possible. For SQL Server, IN is semantically equivalent to EXISTS and may be easier to write, but NOT IN is equivalent to NOT EXISTS only when neither column can be NULL. EXISTS and NOT EXISTS are among the few operators in SQL Server that are guaranteed to short circuit.
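Whether IN and NOT IN really match EXISTS and NOT EXISTS once NULLs appear can be tested with a sketch in Python's sqlite3 module (hypothetical tables a and b, both with a nullable column). The positive forms agree; the negated forms do not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (col INTEGER);
    CREATE TABLE b (col INTEGER);
    INSERT INTO a (col) VALUES (1), (2), (NULL);
    INSERT INTO b (col) VALUES (1), (NULL);
""")

def rows(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

in_rows = rows("SELECT col FROM a WHERE col IN (SELECT col FROM b) ORDER BY col")
exists_rows = rows("""
    SELECT a.col FROM a
    WHERE EXISTS (SELECT 1 FROM b WHERE b.col = a.col) ORDER BY a.col""")
not_in_rows = rows("SELECT col FROM a WHERE col NOT IN (SELECT col FROM b) ORDER BY col")
not_exists_rows = rows("""
    SELECT a.col FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.col = a.col) ORDER BY a.col""")

print(in_rows, exists_rows)          # [1] [1]
print(not_in_rows, not_exists_rows)  # [] [None, 2]
```

IN only needs one TRUE comparison, so an extra NULL in the list cannot change its result; NOT IN needs every comparison to be FALSE, and a NULL makes that impossible.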

Are the SQL concepts LEFT OUTER JOIN and WHERE NOT EXISTS basically the same?

No, they are not the same thing: even in the simplest use case, they do not return the same rowset.

The LEFT OUTER JOIN returns all rows from the left table, both where rows exist in the related table and where they do not. The WHERE NOT EXISTS() subquery returns only rows where the relationship is not met.

However, if you do a LEFT OUTER JOIN and test the related table's join column for IS NULL in the WHERE clause, you get behavior equivalent to WHERE NOT EXISTS.

For example this:

SELECT 
t_main.*
FROM
t_main
LEFT OUTER JOIN t_related ON t_main.id = t_related.id
/* IS NULL in the WHERE clause */
WHERE t_related.id IS NULL

Is equivalent to this:

SELECT
t_main.*
FROM t_main
WHERE
NOT EXISTS (
SELECT t_related.id
FROM t_related
WHERE t_main.id = t_related.id
)

But this one is not equivalent, because it returns rows from t_main both with and without related rows in t_related:

SELECT 
t_main.*
FROM
t_main
LEFT OUTER JOIN t_related ON t_main.id = t_related.id
/* WHERE clause does not exclude NULL foreign keys */

Note: this does not speak to how the queries are compiled and executed, which also differs; it only compares the rowsets they return.
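The three t_main queries can be run side by side in a sketch using Python's sqlite3 module; the rows are hypothetical, with id 2 deliberately lacking a related row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_main (id INTEGER PRIMARY KEY);
    CREATE TABLE t_related (id INTEGER);
    INSERT INTO t_main (id) VALUES (1), (2), (3);
    INSERT INTO t_related (id) VALUES (1), (3);  -- no related row for id 2
""")

def ids(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

left_join_is_null = ids("""
    SELECT t_main.* FROM t_main
    LEFT OUTER JOIN t_related ON t_main.id = t_related.id
    WHERE t_related.id IS NULL
    ORDER BY t_main.id""")
not_exists = ids("""
    SELECT t_main.* FROM t_main
    WHERE NOT EXISTS (SELECT t_related.id FROM t_related
                      WHERE t_main.id = t_related.id)
    ORDER BY t_main.id""")
plain_left_join = ids("""
    SELECT t_main.* FROM t_main
    LEFT OUTER JOIN t_related ON t_main.id = t_related.id
    ORDER BY t_main.id""")

print(left_join_is_null)  # [2]
print(not_exists)         # [2]
print(plain_left_join)    # [1, 2, 3]
```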

Query performance - 'Left join is null' vs 'Not exists select'

It seems to be a close race between the two formulations. (Some other example may show a clearer winner.)

From the HANDLER values: Query 1 did more read_keys and some writing (which goes along with MATERIALIZED). The other numbers were about the same. So I conclude that Query 1 is slower, although possibly not enough slower to make much difference.

I vote for LEFT JOIN as the better query pattern (in this case).


