What's the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL?
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
In a nutshell:
NOT IN is a little bit different: it never matches if there is but a single NULL in the list.
In MySQL, NOT EXISTS is a little bit less efficient.
In SQL Server, LEFT JOIN / IS NULL is less efficient.
In PostgreSQL, NOT IN is less efficient.
In Oracle, all three methods are the same.
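The NULL behaviour of NOT IN is easy to check for yourself. Here is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine (an assumption: SQLite is not one of the four databases above, though it follows the same three-valued logic here). The customers and orders tables are hypothetical.

```python
import sqlite3

# SQLite is only a convenient stand-in engine (assumption);
# the customers/orders tables and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (customer_id INTEGER);  -- nullable on purpose
    INSERT INTO customers (id) VALUES (1), (2), (3);
    INSERT INTO orders (customer_id) VALUES (1), (NULL);
""")

def ids(sql):
    """Run a single-column query and return its values sorted."""
    return sorted(row[0] for row in conn.execute(sql))

# One NULL in the subquery result makes every NOT IN comparison
# evaluate to FALSE or NULL, so no row passes the WHERE clause.
not_in = ids("""
    SELECT c.id FROM customers c
    WHERE c.id NOT IN (SELECT o.customer_id FROM orders o)
""")

# NOT EXISTS only asks whether a matching row exists; the NULL row never matches.
not_exists = ids("""
    SELECT c.id FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
""")

# LEFT JOIN / IS NULL keeps the customers that found no partner row.
left_join = ids("""
    SELECT c.id FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.customer_id IS NULL
""")

print("NOT IN:    ", not_in)
print("NOT EXISTS:", not_exists)
print("LEFT JOIN: ", left_join)
```

With this data, NOT IN returns nothing, while the other two return customers 2 and 3.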
SQL Server Performance: LEFT JOIN vs NOT IN
Potentially the second is faster if the tables are indexed. So if orders has an index on customer ID, then NOT IN will mean that you aren't bringing back the entire ORDERS table.
But as Erwin said, a lot depends on how things are set up. I'd tend to go for the second option as I don't like bringing in tables unless I need data from them.
Performance difference between NOT Exists and LEFT JOIN IN SQL Server
Go for NOT EXISTS generally.
It is more efficient than NOT IN if the columns on either side are nullable (and has the semantics you probably desire anyway).
LEFT JOIN ... IS NULL sometimes does the whole join with a later filter to preserve the rows matching the IS NULL and can be much less efficient.
An example demonstrating this is below. Notice the extra operators in the NOT IN plan and how the outer join plan blows up to create a join of over 1 million rows going into the filter.
[Execution plan images: Not Exists; Outer Join ... NULL; Not In]
    -- Two wide tables whose comparison columns are nullable
    CREATE TABLE Table1 (
        IdColumn INT IDENTITY PRIMARY KEY,
        Column1  INT NULL,
        Filler   CHAR(8000) NULL,
        UNIQUE (Column1, IdColumn)
    );

    CREATE TABLE Table2 (
        IdColumn INT IDENTITY PRIMARY KEY,
        Column2  INT NULL,
        Filler   CHAR(8000) NULL,
        UNIQUE (Column2, IdColumn)
    );

    -- Fill Table2 and, via OUTPUT ... INTO, copy the same values into Table1
    INSERT INTO Table2 (Column2)
    OUTPUT INSERTED.Column2
    INTO Table1 (Column1)
    SELECT number % 5
    FROM master..spt_values;

    -- NOT EXISTS
    SELECT *
    FROM Table1 t1
    WHERE NOT EXISTS (SELECT *
                      FROM Table2 t2
                      WHERE t2.Column2 = t1.Column1);

    -- NOT IN
    SELECT *
    FROM Table1
    WHERE Column1 NOT IN (SELECT Column2
                          FROM Table2);

    -- LEFT JOIN ... IS NULL
    SELECT Table1.*
    FROM Table1
    LEFT JOIN Table2
        ON Table1.Column1 = Table2.Column2
    WHERE Table2.IdColumn IS NULL;

    DROP TABLE Table1, Table2;
NOT EXISTS Vs. Left Outer Join
You need to correlate the exists subquery to the outer query. Here is one way:
    SELECT tl.pkLeadID, tl.fkMasterPersonID
    FROM dbo.tblPhoneLead tl
    WHERE NOT EXISTS (
        SELECT 1
        FROM dbo.tblInternetMasterPerson mp
        WHERE mp.MasterPersonID = tl.fkMasterPersonID
    );
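To see why the correlation matters, here is a small sketch run against SQLite through Python's sqlite3 module (an assumption: the real tables live in SQL Server, and the column lists below are cut down to the bare minimum, with the dbo. prefix dropped because SQLite has no such schema). Without the correlating WHERE, the subquery is the same for every outer row, so NOT EXISTS is false for all of them as soon as the inner table holds a single row.

```python
import sqlite3

# Simplified, hypothetical versions of the two tables; SQLite is a stand-in engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblPhoneLead (
        pkLeadID INTEGER PRIMARY KEY,
        fkMasterPersonID INTEGER
    );
    CREATE TABLE tblInternetMasterPerson (MasterPersonID INTEGER PRIMARY KEY);
    INSERT INTO tblPhoneLead VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO tblInternetMasterPerson VALUES (10);
""")

# Uncorrelated: the subquery ignores the outer row entirely.
uncorrelated = conn.execute("""
    SELECT tl.pkLeadID
    FROM tblPhoneLead tl
    WHERE NOT EXISTS (SELECT 1 FROM tblInternetMasterPerson mp)
    ORDER BY tl.pkLeadID
""").fetchall()

# Correlated: the subquery is evaluated against each lead's own person ID.
correlated = conn.execute("""
    SELECT tl.pkLeadID
    FROM tblPhoneLead tl
    WHERE NOT EXISTS (
        SELECT 1
        FROM tblInternetMasterPerson mp
        WHERE mp.MasterPersonID = tl.fkMasterPersonID
    )
    ORDER BY tl.pkLeadID
""").fetchall()

print("uncorrelated:", uncorrelated)
print("correlated:  ", correlated)
```

The uncorrelated form returns no leads at all, while the correlated form returns leads 2 and 3, the ones with no matching master person.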
The old IN vs. Exists vs. Left Join (Where ___ Is or Is Not Null); Performance
Assuming TSQL to mean SQL Server, have you seen this link regarding a comparison of NOT IN, NOT EXISTS, and LEFT JOIN IS NULL? In summary, as long as the columns being compared cannot be NULL, NOT IN and NOT EXISTS are more efficient than LEFT JOIN/IS NULL...
Something to keep in mind about the difference between IN and EXISTS - EXISTS is a boolean operator, and returns true the first time the criteria are satisfied. Though you see a correlated subquery in syntax, EXISTS has performed better than IN...
Also, IN and EXISTS only check for the existence of the value comparison. This means there's no duplication of records like you find when JOINing...
It really depends, so if you're really out to find what performs best you'll have to test & compare what the query plans are doing...
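The duplication point is easy to show on a toy data set. This is a sketch only, using SQLite via Python's sqlite3 module as a stand-in engine and hypothetical customers/orders tables (both are assumptions, not taken from the question).

```python
import sqlite3

# Hypothetical tables; customer 1 has two orders, customer 2 has one, customer 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers (id) VALUES (1), (2), (3);
    INSERT INTO orders (id, customer_id) VALUES (100, 1), (101, 1), (102, 2);
""")

def ids(sql):
    """Run a single-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]

# A JOIN produces one output row per matching pair, so customer 1 appears twice.
via_join = ids("""
    SELECT c.id FROM customers c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id
""")

# EXISTS and IN only test for existence: each customer appears at most once.
via_exists = ids("""
    SELECT c.id FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id
""")
via_in = ids("""
    SELECT c.id FROM customers c
    WHERE c.id IN (SELECT o.customer_id FROM orders o)
    ORDER BY c.id
""")

print("JOIN:  ", via_join)
print("EXISTS:", via_exists)
print("IN:    ", via_in)
```

To get the same rowset from the JOIN you would need a DISTINCT or GROUP BY on top, which is extra work the existence checks avoid.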
SQL performance on LEFT OUTER JOIN vs NOT EXISTS
Joe's link is a good starting point. Quassnoi covers this too.
In general, if your fields are properly indexed, OR if you expect to filter out more records (i.e. lots of rows EXIST in the subquery), NOT EXISTS will perform better.
EXISTS and NOT EXISTS both short circuit - as soon as a record matches the criteria it's either included or filtered out and the optimizer moves on to the next record.
LEFT JOIN will join ALL RECORDS regardless of whether they match or not, and only then filter the joined rows (with IS NULL, discarding the ones that matched). If your tables are large and/or you have multiple JOIN criteria, this can be very resource intensive.
I normally try to use NOT EXISTS and EXISTS where possible. For SQL Server, IN and NOT IN are semantically equivalent as long as the compared columns cannot be NULL, and may be easier to write. These are among the only operators you will find in SQL Server that are guaranteed to short circuit.
Are the SQL concepts LEFT OUTER JOIN and WHERE NOT EXISTS basically the same?
No, they are not the same thing, as they will not return the same rowset in the most simplistic use case.
The LEFT OUTER JOIN will return all rows from the left table, both where rows exist in the related table and where they do not. The WHERE NOT EXISTS() subquery will only return rows where the relationship is not met.
However, if you did a LEFT OUTER JOIN and looked for IS NULL on the foreign key column in the WHERE clause, you can get behavior equivalent to the WHERE NOT EXISTS.
For example this:
    SELECT
        t_main.*
    FROM
        t_main
        LEFT OUTER JOIN t_related ON t_main.id = t_related.id
    /* IS NULL in the WHERE clause */
    WHERE t_related.id IS NULL
Is equivalent to this:
    SELECT
        t_main.*
    FROM t_main
    WHERE
        NOT EXISTS (
            SELECT t_related.id
            FROM t_related
            WHERE t_main.id = t_related.id
        )
But this one is not equivalent. It will return rows from t_main both having and not having related rows in t_related:
    SELECT
        t_main.*
    FROM
        t_main
        LEFT OUTER JOIN t_related ON t_main.id = t_related.id
    /* WHERE clause does not exclude NULL foreign keys */
Note: This does not speak to how the queries are compiled and executed, which differs as well -- this only addresses a comparison of the rowsets they return.
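If you want to check the rowsets rather than take them on trust, the three queries can be run side by side. This is a rough sketch against SQLite through Python's sqlite3 module (an assumption, chosen only because it needs no server); the single-column t_main and t_related tables and their sample rows are made up for the illustration.

```python
import sqlite3

# Made-up sample data: only t_main.id = 2 has a related row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_main (id INTEGER PRIMARY KEY);
    CREATE TABLE t_related (id INTEGER);
    INSERT INTO t_main (id) VALUES (1), (2), (3);
    INSERT INTO t_related (id) VALUES (2);
""")

def ids(sql):
    """Run a single-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]

left_join_is_null = ids("""
    SELECT t_main.id
    FROM t_main
    LEFT OUTER JOIN t_related ON t_main.id = t_related.id
    WHERE t_related.id IS NULL
    ORDER BY t_main.id
""")

not_exists = ids("""
    SELECT t_main.id
    FROM t_main
    WHERE NOT EXISTS (
        SELECT t_related.id FROM t_related WHERE t_main.id = t_related.id
    )
    ORDER BY t_main.id
""")

# No IS NULL filter: matched and unmatched rows both come back.
plain_left_join = ids("""
    SELECT t_main.id
    FROM t_main
    LEFT OUTER JOIN t_related ON t_main.id = t_related.id
    ORDER BY t_main.id
""")

print("LEFT JOIN + IS NULL:", left_join_is_null)
print("NOT EXISTS:         ", not_exists)
print("plain LEFT JOIN:    ", plain_left_join)
```

The first two queries agree on rows 1 and 3, while the unfiltered join returns all three rows.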
Query performance - 'Left join is null' vs 'Not exists select'
It seems to be a close race between the two formulations. (Some other example may show a clearer winner.)
From the HANDLER values: Query 1 did more read_keys, and some writing (which goes along with MATERIALIZED). The other numbers were about the same. So, I conclude that Query 1 is slower -- although possibly not enough slower to make much difference.
I vote for LEFT JOIN as the better query pattern (in this case).