SQL performance on LEFT OUTER JOIN vs NOT EXISTS
Joe's link is a good starting point. Quassnoi covers this too.
In general, if your fields are properly indexed, OR if you expect to filter out more records (i.e. have a lots of rows EXIST
in the subquery) NOT EXISTS
will perform better.
EXISTS
and NOT EXISTS
both short circuit - as soon as a record matches the criteria it's either included or filtered out and the optimizer moves on to the next record.
LEFT JOIN
will join ALL RECORDS regardless of whether they match or not, then filter out all non-matching records. If your tables are large and/or you have multiple JOIN
criteria, this can be very very resource intensive.
I normally try to use NOT EXISTS
and EXISTS
where possible. For SQL Server, IN
and NOT IN
are semantically equivalent and may be easier to write. These are among the only operators you will find in SQL Server that are guaranteed to short circuit.
Performance difference between NOT Exists and LEFT JOIN IN SQL Server
Go for NOT EXISTS
generally.
It is more efficient than NOT IN
if the columns on either side are nullable (and has the semantics you probably desire anyway)
Left join ... Null sometimes does the whole join with a later filter to preserve the rows matching the is null
and can be much less efficient.
An example demonstrating this is below. Notice the extra operators in the NOT IN
plan and how the outer join plan blows up to create a join of over 1 million rows going into the filter.
Not Exists
Outer Join ... NULL
Not In
CREATE TABLE Table1 (
IdColumn INT IDENTITY PRIMARY KEY,
Column1 INT NULL,
Filler CHAR(8000) NULL,
UNIQUE(Column1, IdColumn) );
CREATE TABLE Table2 (
IdColumn INT IDENTITY PRIMARY KEY,
Column2 INT NULL,
Filler CHAR(8000) NULL,
UNIQUE(Column2, IdColumn) );
INSERT INTO Table2 (Column2)
OUTPUT INSERTED.Column2
INTO Table1(Column1)
SELECT number % 5
FROM master..spt_values
SELECT *
FROM Table1 t1
WHERE NOT EXISTS (SELECT *
FROM Table2 t2
WHERE t2.Column2 = t1.Column1)
SELECT *
FROM Table1
WHERE Column1 NOT IN (SELECT Column2
FROM Table2)
SELECT Table1.*
FROM Table1
LEFT JOIN Table2
ON Table1.Column1 = Table2.Column2
WHERE Table2.IdColumn IS NULL
DROP TABLE Table1, Table2
SQL Server Performance: LEFT JOIN vs NOT IN
Potentially the second is faster if the tables are indexed. So if orders has an index on customer ID, then NOT IN will mean that you aren't bringing back the entire ORDERS table.
But as Erwin said, a lot depends on how things are set up. I'd tend to go for the second option as I don't like bringing in tables unless I need data from them.
What's the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL?
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
In a nutshell:
NOT IN
is a little bit different: it never matches if there is but a single NULL
in the list.
In
MySQL
,NOT EXISTS
is a little bit less efficientIn
SQL Server
,LEFT JOIN / IS NULL
is less efficientIn
PostgreSQL
,NOT IN
is less efficientIn
Oracle
, all three methods are the same.
Are the SQL concepts LEFT OUTER JOIN and WHERE NOT EXISTS basically the same?
No they are not the same thing, as they will not return the same rowset in the most simplistic use case.
The LEFT OUTER JOIN
will return all rows from the left table, both where rows exist in the related table and where they does not. The WHERE NOT EXISTS()
subquery will only return rows where the relationship is not met.
However, if you did a LEFT OUTER JOIN
and looked for IS NULL
on the foreign key column in the WHERE
clause, you can make equivalent behavior to the WHERE NOT EXISTS
.
For example this:
SELECT
t_main.*
FROM
t_main
LEFT OUTER JOIN t_related ON t_main.id = t_related.id
/* IS NULL in the WHERE clause */
WHERE t_related.id IS NULL
Is equivalent to this:
SELECT
t_main.*
FROM t_main
WHERE
NOT EXISTS (
SELECT t_related.id
FROM t_related
WHERE t_main.id = t_related.id
)
But this one is not equivalent:
It will return rows from t_main
both having and not having related rows in t_related
.
SELECT
t_main.*
FROM
t_main
LEFT OUTER JOIN t_related ON t_main.id = t_related.id
/* WHERE clause does not exclude NULL foreign keys */
Note This does not speak to how the queries are compiled and executed, which differs as well -- this only addresses a comparison of the rowsets they return.
Where not exists vs left outer join in oracle sql
You can use this:
select Table1.*
from (select * from SomeTable) Table1
left outer join SomeOtherTable sot
on Table1.columnB = sot.columnB
where sot.columnB is null;
For the performance it is important to have indexes on columnB
on both tables.
JOIN versus EXISTS performance
NOT EXISTS is more efficient than using a LEFT OUTER JOIN to exclude records that are missing from the participating table using an IS NULL condition because the optimizer will elect to use an EXCLUSION MERGE JOIN with the NOT EXISTS predicate.
While your second test did not yield impressive results for the data sets you were using the performance increase from NOT EXISTS over a LEFT JOIN is very noticeable as your data volumes increase. Keep in mind that the tables will need to be hash distributed by the columns that participate in the NOT EXISTS join just like they would in the LEFT JOIN. Therefore, data skew can impact the performance of the EXCLUSION MERGE JOIN.
EDIT:
Typically, I would defer to EXISTS as a replacement for IN instead of using it for re-writing a join solution. This is especially true when the column(s) participating in the logical comparison can be NULL. That's not to say you couldn't use EXISTS in place of an INNER JOIN. Instead of an EXCLUSION JOIN you will end up with an INCLUSION JOIN. The INNER JOIN is in essence an inclusion join to begin with. I'm sure there are some nuances that I am overlooking but you can find those in the manuals if you wish to take the time to read them.
Query performance - 'Left join is null' vs 'Not exists select'
It seems to be a close race between the two formulations. (Some other example may show a clearer winner.)
From the HANDLER values: Query 1 did more read_keys, and some writing (which goes along with MATERIALIZED). The other numbers were about same. So, I conclude that Query 1 is slower -- although possibly not enough slower to make much difference.
I vote for LEFT JOIN as the better query pattern (in this case)
Related Topics
Query With Left Join Not Returning Rows For Count of 0
Table-Less Union Query in Ms Access (Jet/Ace)
Check Constraint in MySQL Is Not Working
SQL Server - Transactions Roll Back on Error
Delete Column from Sqlite Table
Getting Result of Dynamic SQL into a Variable For Sql-Server
Can Table Columns With a Foreign Key Be Null
SQL Server Indexes - Ascending or Descending, What Difference Does It Make
Calculating Difference Between Two Timestamps in Oracle in Milliseconds
SQL Standard to Escape Column Names
Return Multiple Columns of the Same Row as Json Array of Objects
How to Create a Sequence in MySQL
SQL to Json - Array of Objects to Array of Values in SQL 2016
How to Store Only Time; Not Date and Time
How to Count Unique Items in Field in Access Query