EXISTS vs JOIN and use of EXISTS clause
EXISTS
is used to return a boolean value, JOIN
returns a whole other table
EXISTS
is only used to test if a subquery returns results, and short circuits as soon as it does. JOIN
is used to extend a result set by combining it with additional fields from another table to which there is a relation.
In your example, the queries are semantically equivalent.
In general, use EXISTS
when:
- You don't need to return data from the related table
- You have dupes in the related table (
JOIN
can cause duplicate rows if values are repeated) - You want to check existence (use instead of
LEFT OUTER JOIN...NULL
condition)
If you have proper indexes, most of the time the EXISTS
will perform identically to the JOIN
. The exception is on very complicated subqueries, where it is normally quicker to use EXISTS
.
If your JOIN
key is not indexed, it may be quicker to use EXISTS
but you will need to test for your specific circumstance.
JOIN
syntax is easier to read and clearer normally as well.
Fastest way to determine if record exists
SELECT TOP 1 products.id FROM products WHERE products.id = ?;
will outperform all of your suggestions as it will terminate execution after it finds the first record.
Most efficient way to SELECT rows WHERE the ID EXISTS IN a second table
Summary:
IN and EXISTS performed similarly in all scenarios.. Below are the parameters used to validate..
Execution cost,Time:
Same for both and optimizer produced same plan.
Memory Grant:
Same for both queries
Cpu Time,Logical reads :
Exists seems to outperform IN by little bit margin in terms of CPU Time,though reads are same..
I ran each query 10 times each using below test data set..
- A very large subquery result set (100000 rows)
- Duplicate rows
- Null rows
For all the above scenarios, both IN
and EXISTS
performed in identical manner.
Some info about Performance V3 database used for testing.
20000 customers having 1000000 orders, so each customer is randomly duplicated (in a range of 10 to 100) in the orders table.
Execution cost,Time:
Below is screenshot of both queries running. Observe each query relative cost.
Memory Cost:
Memory grant for the two queries is also same..I Forced MDOP 1 so as not to spill them to TEMPDB..
CPU Time ,Reads:
For Exists:
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customers'. Scan count 1, logical reads 109, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 1, logical reads 3855, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 469 ms, elapsed time = 595 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
For IN:
(20000 row(s) affected)
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customers'. Scan count 1, logical reads 109, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 1, logical reads 3855, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 547 ms, elapsed time = 669 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
In each case, the optimizer is smart enough to rearrange the queries.
I tend to use EXISTS
only though (my opinion). One use case to use EXISTS
is when you don't want to return a second table result set.
Update as per queries from Martin Smith:
I ran the below queries to find the most effective way to get rows from the first table for which a reference exists in the second table.
SELECT DISTINCT c.*
FROM Customers c
JOIN Orders o ON o.custid = c.custid
SELECT c.*
FROM Customers c
INNER JOIN (SELECT DISTINCT custid FROM Orders) AS o ON o.custid = c.custid
SELECT *
FROM Customers C
WHERE EXISTS(SELECT 1 FROM Orders o WHERE o.custid = c.custid)
SELECT *
FROM Customers c
WHERE custid IN (SELECT custid FROM Orders)
All the above queries share the same cost with the exception of 2nd INNER JOIN
, Plan being the same for the rest.
Memory Grant:
This query
SELECT DISTINCT c.*
FROM Customers c
JOIN Orders o ON o.custid = c.custid
required memory grant of
This query
SELECT c.*
FROM Customers c
INNER JOIN (SELECT DISTINCT custid FROM Orders) AS o ON o.custid = c.custid
required memory grant of ..
CPU Time,Reads:
For Query :
SELECT DISTINCT c.*
FROM Customers c
JOIN Orders o ON o.custid = c.custid
(20000 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 48, logical reads 1344, physical reads 96, read-ahead reads 1248, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 5, logical reads 3929, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customers'. Scan count 5, logical reads 322, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1453 ms, elapsed time = 781 ms.
For Query:
SELECT c.*
FROM Customers c
INNER JOIN (SELECT DISTINCT custid FROM Orders) AS o ON o.custid = c.custid
(20000 row(s) affected)
Table 'Customers'. Scan count 5, logical reads 322, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 5, logical reads 3929, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1499 ms, elapsed time = 403 ms.
Joining a list of values with table rows in SQL
If in Microsoft SQL Server 2008 or later, then you can use Table Value Constructor
Select v.valueId, m.name
From (values (1), (2), (3), (4), (5)) v(valueId)
left Join otherTable m
on m.id = v.valueId
Postgres also has this construction VALUES Lists:
SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) AS t (num,letter)
Also note the possible Common Table Expression syntax which can be handy to make joins:
WITH my_values(num, str) AS (
VALUES (1, 'one'), (2, 'two'), (3, 'three')
)
SELECT num, txt FROM my_values
With Oracle it's possible, though heavier From ASK TOM:
with id_list as (
select 10 id from dual union all
select 20 id from dual union all
select 25 id from dual union all
select 70 id from dual union all
select 90 id from dual
)
select * from id_list;
Using a left join and checking if the row existed along with another check in where clause
According to this answer, in SQL-Server using NOT EXISTS
is more efficient than LEFT JOIN/IS NULL
SELECT *
FROM Users u
WHERE u.IsActive = 1
AND u.Status <> 'disabled'
AND NOT EXISTS (SELECT 1 FROM Banned b WHERE b.UserID = u.UserID)
EDIT
For the sake of completeness this is how I would do it with a LEFT JOIN
:
SELECT *
FROM Users u
LEFT JOIN Banned b
ON b.UserID = u.UserID
WHERE u.IsActive = 1
AND u.Status <> 'disabled'
AND b.UserID IS NULL -- EXCLUDE ROWS WITH A MATCH IN `BANNED`
How to select rows with no matching entry in another table?
Here's a simple query:
SELECT t1.ID
FROM Table1 t1
LEFT JOIN Table2 t2 ON t1.ID = t2.ID
WHERE t2.ID IS NULL
The key points are:
LEFT JOIN
is used; this will return ALL rows fromTable1
, regardless of whether or not there is a matching row inTable2
.The
WHERE t2.ID IS NULL
clause; this will restrict the results returned to only those rows where the ID returned fromTable2
is null - in other words there is NO record inTable2
for that particular ID fromTable1
.Table2.ID
will be returned as NULL for all records fromTable1
where the ID is not matched inTable2
.
Find records where join doesn't exist
Use an EXISTS
expression:
WHERE NOT EXISTS (
SELECT FROM votes v -- SELECT list can be empty
WHERE v.some_id = base_table.some_id
AND v.user_id = ?
)
The difference
... between NOT EXISTS()
(Ⓔ) and NOT IN()
(Ⓘ) is twofold:
Performance
Ⓔ is generally faster. It stops processing the subquery as soon as the first match is found. The manual:
The subquery will generally only be executed long enough to determine
whether at least one row is returned, not all the way to completion.Ⓘ can also be optimized by the query planner, but to a lesser extent since
NULL
handling makes it more complex.Correctness
If one of the resulting values in the subquery expression is
NULL
, the result of Ⓘ isNULL
, while common logic would expectTRUE
- and Ⓔ will returnTRUE
. The manual:If all the per-row results are either unequal or null, with at least
one null, then the result ofNOT IN
is null.
Essentially, (NOT) EXISTS
is the better choice in most cases.
Example
Your query can look like this:
SELECT *
FROM questions q
WHERE NOT EXISTS (
SELECT FROM votes v
WHERE v.question_id = q.id
AND v.user_id = ?
);
Do not join to votes
in the base query. That would void the effort.
Besides NOT EXISTS
and NOT IN
there are additional syntax options with LEFT JOIN / IS NULL
and EXCEPT
. See:
- Select rows which are not present in other table
Find records from one table which don't exist in another
There's several different ways of doing this, with varying efficiency, depending on how good your query optimiser is, and the relative size of your two tables:
This is the shortest statement, and may be quickest if your phone book is very short:
SELECT *
FROM Call
WHERE phone_number NOT IN (SELECT phone_number FROM Phone_book)
alternatively (thanks to Alterlife)
SELECT *
FROM Call
WHERE NOT EXISTS
(SELECT *
FROM Phone_book
WHERE Phone_book.phone_number = Call.phone_number)
or (thanks to WOPR)
SELECT *
FROM Call
LEFT OUTER JOIN Phone_Book
ON (Call.phone_number = Phone_book.phone_number)
WHERE Phone_book.phone_number IS NULL
(ignoring that, as others have said, it's normally best to select just the columns you want, not '*
')
How to select all records from one table that do not exist in another table?
SELECT t1.name
FROM table1 t1
LEFT JOIN table2 t2 ON t2.name = t1.name
WHERE t2.name IS NULL
Q: What is happening here?
A: Conceptually, we select all rows from table1
and for each row we attempt to find a row in table2
with the same value for the name
column. If there is no such row, we just leave the table2
portion of our result empty for that row. Then we constrain our selection by picking only those rows in the result where the matching row does not exist. Finally, We ignore all fields from our result except for the name
column (the one we are sure that exists, from table1
).
While it may not be the most performant method possible in all cases, it should work in basically every database engine ever that attempts to implement ANSI 92 SQL
Related Topics
Pros/Cons of Storing Serialized Hash VS. Key/Value Database Object in Activerecord
How to Detect If a String Contains Special Characters
Rotate/Pivot Table with Aggregation in Oracle
Is Activerecord's "Order" Method Vulnerable to SQL Injection
When to Open and Close Brackets Surrounding Joins in Ms Access SQL
SQL in Query Produces Strange Result
How to Drop a Foreign Key Constraint Only If It Exists in SQL Server
What Is the Purpose of Order by 1 in SQL Select Statement
When Should I Use Primary Key or Index
View or Temporary Table - Which to Use in Ms SQL Server
How to Get First and Last Record from a SQL Query
Update Multiple Rows with Different Values in a Single SQL Query
Show Create Table' Equivalent in Oracle SQL