NOT IN vs NOT EXISTS
I always default to NOT EXISTS.
The execution plans may be the same at the moment, but if either column is altered in the future to allow NULLs, the NOT IN version will need to do more work (even if no NULLs are actually present in the data), and the semantics of NOT IN if NULLs are present are unlikely to be the ones you want anyway.
When neither Products.ProductID nor [Order Details].ProductID allows NULLs, the NOT IN will be treated identically to the following query.
SELECT ProductID,
       ProductName
FROM   Products p
WHERE  NOT EXISTS (SELECT *
                   FROM   [Order Details] od
                   WHERE  p.ProductId = od.ProductId)
The exact plan may vary but for my example data I get the following.
A reasonably common misconception seems to be that correlated sub queries are always "bad" compared to joins. They certainly can be when they force a nested loops plan (sub query evaluated row by row) but this plan includes an anti semi join logical operator. Anti semi joins are not restricted to nested loops but can use hash or merge (as in this example) joins too.
/*Not valid syntax but better reflects the plan*/
SELECT p.ProductID,
       p.ProductName
FROM   Products p
       LEFT ANTI SEMI JOIN [Order Details] od
         ON p.ProductId = od.ProductId
If [Order Details].ProductID is NULL-able the query then becomes
SELECT ProductID,
       ProductName
FROM   Products p
WHERE  NOT EXISTS (SELECT *
                   FROM   [Order Details] od
                   WHERE  p.ProductId = od.ProductId)
       AND NOT EXISTS (SELECT *
                       FROM   [Order Details]
                       WHERE  ProductId IS NULL)
The reason for this is that the correct semantics if [Order Details] contains any NULL ProductIds is to return no results. An extra anti semi join and a row count spool are added to the plan to check for this.
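The equivalence described above can be checked on a small scale. The following is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server, so it only demonstrates the result semantics, not the plan shapes. The table OrderDetails (without the space) and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductID INTEGER, ProductName TEXT);
CREATE TABLE OrderDetails (ProductID INTEGER);  -- nullable, hypothetical stand-in
INSERT INTO Products VALUES (1, 'Chai'), (2, 'Chang'), (3, 'Syrup');
INSERT INTO OrderDetails VALUES (1);
""")

NOT_IN = """
SELECT ProductID FROM Products
WHERE ProductID NOT IN (SELECT ProductID FROM OrderDetails)
ORDER BY ProductID
"""

# The two-part NOT EXISTS form that the text says NOT IN is treated as
# once the inner column is nullable.
REWRITE = """
SELECT ProductID FROM Products p
WHERE NOT EXISTS (SELECT * FROM OrderDetails od
                  WHERE p.ProductID = od.ProductID)
  AND NOT EXISTS (SELECT * FROM OrderDetails
                  WHERE ProductID IS NULL)
ORDER BY ProductID
"""

def run(sql):
    return [row[0] for row in conn.execute(sql)]

# No NULLs yet: both forms return the unordered products.
before = (run(NOT_IN), run(REWRITE))

# A single NULL in the inner column empties both result sets.
conn.execute("INSERT INTO OrderDetails VALUES (NULL)")
after = (run(NOT_IN), run(REWRITE))

print(before, after)
```

Both queries agree before and after the NULL row is inserted, which is the behaviour the extra anti semi join has to guarantee.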
If Products.ProductID is also changed to become NULL-able the query then becomes
SELECT ProductID,
       ProductName
FROM   Products p
WHERE  NOT EXISTS (SELECT *
                   FROM   [Order Details] od
                   WHERE  p.ProductId = od.ProductId)
       AND NOT EXISTS (SELECT *
                       FROM   [Order Details]
                       WHERE  ProductId IS NULL)
       AND NOT EXISTS (SELECT *
                       FROM   (SELECT TOP 1 *
                               FROM   [Order Details]) S
                       WHERE  p.ProductID IS NULL)
The reason for that one is that a NULL Products.ProductId should not be returned in the results unless the NOT IN sub query returns no rows at all (i.e. the [Order Details] table is empty), in which case it should be. In the plan for my sample data this is implemented by adding another anti semi join.
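The empty-table edge case is easy to get wrong, so here is a small sketch of it, again using Python's sqlite3 module as a stand-in for SQL Server. The table name OrderDetails and the sample rows are hypothetical, and `LIMIT 1` replaces the T-SQL `TOP 1`, which SQLite does not support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductID INTEGER, ProductName TEXT);
CREATE TABLE OrderDetails (ProductID INTEGER);
INSERT INTO Products VALUES (1, 'Chai'), (NULL, 'Unnumbered');
""")

NOT_IN = """
SELECT ProductName FROM Products
WHERE ProductID NOT IN (SELECT ProductID FROM OrderDetails)
ORDER BY ProductName
"""

# Three-part form from the text; the third branch removes rows with a
# NULL outer key as soon as the inner table has at least one row.
REWRITE = """
SELECT ProductName FROM Products p
WHERE NOT EXISTS (SELECT * FROM OrderDetails od
                  WHERE p.ProductID = od.ProductID)
  AND NOT EXISTS (SELECT * FROM OrderDetails
                  WHERE ProductID IS NULL)
  AND NOT EXISTS (SELECT * FROM (SELECT * FROM OrderDetails LIMIT 1) s
                  WHERE p.ProductID IS NULL)
ORDER BY ProductName
"""

def run(sql):
    return [row[0] for row in conn.execute(sql)]

# Inner table empty: the row with the NULL key is returned.
empty_inner = (run(NOT_IN), run(REWRITE))

# Inner table non-empty (and no match): the NULL-keyed row disappears.
conn.execute("INSERT INTO OrderDetails VALUES (2)")
nonempty_inner = (run(NOT_IN), run(REWRITE))

print(empty_inner, nonempty_inner)
```

A plain single NOT EXISTS would keep the 'Unnumbered' row in both cases, which is why the optimizer cannot simply substitute it.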
The effect of this is shown in the blog post already linked by Buckley. In the example there the number of logical reads increases from around 400 to 500,000.
Additionally, the fact that a single NULL can reduce the row count to zero makes cardinality estimation very difficult. If SQL Server assumes that this will happen but in fact there are no NULL rows in the data, the rest of the execution plan may be catastrophically worse if this is just part of a larger query, with inappropriate nested loops causing repeated execution of an expensive sub tree, for example.
This is not the only possible execution plan for a NOT IN on a NULL-able column, however. This article shows another one for a query against the AdventureWorks2008 database.
For the NOT IN on a NOT NULL column or the NOT EXISTS against either a nullable or non nullable column it gives the following plan.
When the column changes to NULL-able the NOT IN plan now looks like
It adds an extra inner join operator to the plan. This apparatus is explained here. It is all there to convert the previous single correlated index seek on Sales.SalesOrderDetail.ProductID = <correlated_product_id> to two seeks per outer row. The additional one is on WHERE Sales.SalesOrderDetail.ProductID IS NULL.
As this is under an anti semi join, if that one returns any rows the second seek will not occur. However, if Sales.SalesOrderDetail does not contain any NULL ProductIDs it will double the number of seek operations required.
Converting From NOT EXISTS to NOT IN
In order to get a list of sailors who have reserved every boat, I'll use this script:
Solution 1:
;WITH k AS
(
    SELECT b.sname, COUNT(DISTINCT a.bname) AS coun
    FROM boat a
    INNER JOIN reservation b
        ON a.bname = b.bname
    GROUP BY b.sname
)
SELECT k.sname
FROM k
WHERE coun = (SELECT COUNT(*) FROM boat AS b)
Solution 2:
SELECT s.sname
FROM   sailor AS s
WHERE  s.sname NOT IN (SELECT DISTINCT a.sname
                       FROM   (SELECT s.sname,
                                      b.bname
                               FROM   sailor AS s
                                      CROSS JOIN boat AS b
                               WHERE  b.color = 'Red') a
                       WHERE  a.sname + a.bname
                              NOT IN (SELECT r.sname + r.bname
                                      FROM   reservation AS r
                                      WHERE  r.sname IS NOT NULL
                                             AND r.bname IS NOT NULL));
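For comparison, the same "reserved every boat" question can be written with two nested NOT EXISTS predicates ("there is no boat this sailor has not reserved"). The sketch below runs both that form and a count-based form through Python's sqlite3 module. The three tables are reduced to the columns used here and the sample rows are invented. Unlike Solution 2, the correlated form compares sname and bname separately, so it avoids false matches from concatenated strings (for example 'ab' + 'c' and 'a' + 'bc').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sailor (sname TEXT);
CREATE TABLE boat (bname TEXT);
CREATE TABLE reservation (sname TEXT, bname TEXT);
INSERT INTO sailor VALUES ('Ann'), ('Bob');
INSERT INTO boat VALUES ('Interlake'), ('Clipper');
INSERT INTO reservation VALUES
    ('Ann', 'Interlake'), ('Ann', 'Clipper'), ('Bob', 'Clipper');
""")

# Double negation: keep a sailor if no boat exists without a
# matching reservation by that sailor.
DOUBLE_NOT_EXISTS = """
SELECT s.sname
FROM sailor AS s
WHERE NOT EXISTS (SELECT 1
                  FROM boat AS b
                  WHERE NOT EXISTS (SELECT 1
                                    FROM reservation AS r
                                    WHERE r.sname = s.sname
                                      AND r.bname = b.bname))
ORDER BY s.sname
"""

# Count-based form, in the spirit of Solution 1.
COUNT_BASED = """
SELECT r.sname
FROM reservation AS r
JOIN boat AS b ON b.bname = r.bname
GROUP BY r.sname
HAVING COUNT(DISTINCT b.bname) = (SELECT COUNT(*) FROM boat)
ORDER BY r.sname
"""

via_not_exists = [row[0] for row in conn.execute(DOUBLE_NOT_EXISTS)]
via_count = [row[0] for row in conn.execute(COUNT_BASED)]
print(via_not_exists, via_count)
```

One difference worth knowing: if the boat table is empty, the NOT EXISTS form returns every sailor, while the count-based form returns none.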
How can I convert a NOT IN statement to a NOT EXISTS statement in SQL?
You have heard the rule incompletely. This is false:
I heard that NOT IN should be avoided at all costs, . . .
This is much closer to being true:
I heard that NOT IN with a subquery should be avoided at all costs, . . .
There are two reasons for this. By far the more important has to do with the handling of NULL values. If any value returned by the subquery is NULL, then NOT IN never returns TRUE. That is, the query returns no rows (if this is the only condition).
On the other hand, NOT EXISTS does what you expect in this case, essentially ignoring NULL values in the subquery.
This is not an issue with explicit lists, because it is unlikely that you will include a NULL value in an explicit list.
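The three-valued logic behind this can be checked directly. The sketch below uses Python's sqlite3 module and literal lists, which follow the same rule as a subquery result: a NULL in the list turns "no match" into "unknown" rather than true. The helper name `scalar` is just for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def scalar(expr):
    """Evaluate a single SQL expression and return its value."""
    return conn.execute("SELECT " + expr).fetchone()[0]

# No NULL in the list and no match: NOT IN is true.
no_null_no_match = scalar("2 NOT IN (1, 3)")

# NULL in the list and no match: the result is NULL (unknown), not true.
null_no_match = scalar("2 NOT IN (1, NULL)")

# NULL in the list but a definite match: NOT IN is false.
null_with_match = scalar("1 NOT IN (1, NULL)")

# A WHERE clause only keeps rows where the predicate is true,
# so the unknown case filters the row out.
rows_kept = scalar(
    "COUNT(*) FROM (SELECT 2 AS x) AS t WHERE x NOT IN (1, NULL)"
)

print(no_null_no_match, null_no_match, null_with_match, rows_kept)
```

Since NOT IN can only ever be false or unknown once a NULL is in the list, no row can pass the filter.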
The second issue is performance. Some databases will optimize NOT EXISTS with a subquery much better than NOT IN, particularly if the appropriate indexes are available.
Convert not in to not exists
SELECT id,
       NAME,
       cat
FROM   posts p
WHERE  NOT EXISTS (SELECT 1
                   FROM   (SELECT 1 AS col
                           FROM   dual
                           UNION
                           SELECT 100 AS col
                           FROM   dual) a
                   WHERE  p.id = a.col);
Your current query is good. But if you still want it using NOT EXISTS, give this a try.
SQL, how do I convert to logic ('not in' to 'not exists')
Not in and not exists do not always have the same meaning. I assume you want to convert because "not in" tends to be slow. Here is another way where the logic will always match.
Delete From StudentTb
Where StudentType in (1, 2)
  and StudentI in
  (
      select StudentI
      from StudentTb
      except
      Select StudentI From StudentLog
  )
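To see how the EXCEPT form behaves when the log contains a NULL, here is a small sketch using Python's sqlite3 module (SQLite also supports EXCEPT). The column layout and sample rows are assumptions made for the example. It first counts what a NOT IN version would match, then runs the delete above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StudentTb (StudentI INTEGER, StudentType INTEGER);
CREATE TABLE StudentLog (StudentI INTEGER);
INSERT INTO StudentTb VALUES (1, 1), (2, 2), (3, 1), (4, 3);
INSERT INTO StudentLog VALUES (1), (NULL);
""")

# With a NULL in StudentLog, the NOT IN predicate matches nothing.
not_in_matches = conn.execute("""
SELECT COUNT(*) FROM StudentTb
WHERE StudentType IN (1, 2)
  AND StudentI NOT IN (SELECT StudentI FROM StudentLog)
""").fetchone()[0]

# The EXCEPT form still finds students 2 and 3, which have no log row.
conn.execute("""
DELETE FROM StudentTb
WHERE StudentType IN (1, 2)
  AND StudentI IN (SELECT StudentI FROM StudentTb
                   EXCEPT
                   SELECT StudentI FROM StudentLog)
""")

remaining = [row[0] for row in
             conn.execute("SELECT StudentI FROM StudentTb ORDER BY StudentI")]
print(not_in_matches, remaining)
```

Student 1 survives because it has a log row, and student 4 survives because its StudentType is outside (1, 2).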
Replacing NOT IN with NOT EXISTS and an OUTER JOIN in Oracle Database 12c
The query plan will tell you. It will depend on the data and tables. In the case of OUTER JOIN and NOT EXISTS they are the same.
However, to your opening sentence, NOT IN and NOT EXISTS are not the same if NULL is accepted on model. In this case you say model cannot be null, so you might find they all have the same plan anyway. However, when making this assumption, the database must be told there cannot be nulls (using NOT NULL), as opposed to there simply not being any. If you don't, it will make different plans for each query, which may result in different performance depending on your actual data. This is generally true and particularly true for Oracle, whose B-tree indexes do not store entries where every indexed column is NULL.
Check out EXPLAIN PLAN