Is using “NOT EXISTS” considered to be bad SQL practice?
In MySQL, Oracle, SQL Server and PostgreSQL, NOT EXISTS is as efficient as, or even more efficient than, LEFT JOIN / IS NULL.
While it may seem that "the inner query should be executed for each record from the outer query" (which seems bad for NOT EXISTS and even worse for NOT IN, since the latter query is not even correlated), such a query can be optimized just as well as any other query, using appropriate anti-join methods.
In SQL Server, LEFT JOIN / IS NULL may actually be less efficient than NOT EXISTS / NOT IN when the column in the inner table is unindexed or has low cardinality.
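A minimal sketch of the result equivalence described above, using Python's built-in sqlite3 module as a stand-in engine (not one of the four databases named above). The table names outer_t and inner_t are hypothetical. With no NULLs in the inner column, all three anti-join spellings return the same rows; the sketch says nothing about which plan any particular engine chooses.

```python
import sqlite3


def anti_join_variants():
    """Run the three anti-join spellings against the same hypothetical data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE outer_t (id INTEGER PRIMARY KEY);
        -- NOT NULL on purpose: without NULLs, NOT IN behaves like the other two
        CREATE TABLE inner_t (val INTEGER NOT NULL);
        INSERT INTO outer_t (id) VALUES (1), (2), (3), (4);
        INSERT INTO inner_t (val) VALUES (2), (4), (4);
    """)
    queries = {
        "not_exists": """
            SELECT o.id FROM outer_t o
            WHERE NOT EXISTS (SELECT NULL FROM inner_t i WHERE i.val = o.id)
            ORDER BY o.id""",
        "left_join_is_null": """
            SELECT o.id FROM outer_t o
            LEFT JOIN inner_t i ON i.val = o.id
            WHERE i.val IS NULL
            ORDER BY o.id""",
        "not_in": """
            SELECT o.id FROM outer_t o
            WHERE o.id NOT IN (SELECT val FROM inner_t)
            ORDER BY o.id""",
    }
    results = {name: [row[0] for row in conn.execute(sql)]
               for name, sql in queries.items()}
    conn.close()
    return results


print(anti_join_variants())  # every variant returns [1, 3]
```

Note that the LEFT JOIN produces two joined rows for id 4 (it appears twice in inner_t), but both are removed by the IS NULL filter, so duplicates in the inner table do not leak into the anti-join result.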
It is often heard that MySQL is "especially bad at treating subqueries".
This stems from the fact that MySQL is not capable of any join method other than nested loops (at the time these answers were written; MySQL 8.0.18 later added hash joins), which severely limits its optimization abilities.
The only case where a query would benefit from rewriting a subquery as a join is this:
SELECT *
FROM big_table
WHERE big_table_column IN
(
SELECT small_table_column
FROM small_table
)
small_table will not be queried completely for each record in big_table: though the subquery does not seem to be correlated, the query optimizer will implicitly correlate it and in fact rewrite it to an EXISTS (using index_subquery to search for the first match, if needed, when small_table_column is indexed).
But big_table would always be leading, which makes the query complete in big * LOG(small) rather than small * LOG(big) reads.
This could be rewritten as
SELECT DISTINCT bt.*
FROM small_table st
JOIN big_table bt
ON bt.big_table_column = st.small_table_column
However, this won't improve NOT IN (as opposed to IN). In MySQL, NOT EXISTS and LEFT JOIN / IS NULL are almost the same, since with nested loops the left table must always be leading in a LEFT JOIN.
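The IN-to-DISTINCT-JOIN rewrite described above can be sanity-checked for result equivalence with a small sketch. It uses Python's built-in sqlite3 module as a stand-in engine with made-up rows, so it says nothing about MySQL's join order or read counts. Because small_table_column contains a duplicate value, the plain JOIN repeats big_table rows, which is why the rewrite needs DISTINCT. DISTINCT bt.* is only equivalent to IN when big_table rows are unique; here that is guaranteed by an id primary key, which is an assumption not present in the original query.

```python
import sqlite3


def in_vs_distinct_join():
    """Compare IN, a plain JOIN and a DISTINCT JOIN on hypothetical data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- id is an assumed primary key that keeps big_table rows distinct
        CREATE TABLE big_table (id INTEGER PRIMARY KEY, big_table_column INTEGER);
        CREATE TABLE small_table (small_table_column INTEGER);
        INSERT INTO big_table (id, big_table_column) VALUES (1, 10), (2, 20), (3, 30), (4, 20);
        INSERT INTO small_table (small_table_column) VALUES (20), (20), (30);
    """)
    in_rows = conn.execute("""
        SELECT * FROM big_table
        WHERE big_table_column IN (SELECT small_table_column FROM small_table)
        ORDER BY id""").fetchall()
    join_sql = """
        SELECT {distinct} bt.* FROM small_table st
        JOIN big_table bt ON bt.big_table_column = st.small_table_column
        ORDER BY bt.id"""
    plain_join = conn.execute(join_sql.format(distinct="")).fetchall()
    distinct_join = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()
    conn.close()
    return in_rows, plain_join, distinct_join


in_rows, plain_join, distinct_join = in_vs_distinct_join()
print(in_rows == distinct_join)  # True
print(len(plain_join))           # 5: rows 2 and 4 appear twice
```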
You may want to read these articles:
- NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
- NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL
- NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
- NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
- IN vs. JOIN vs. EXISTS: Oracle
- IN vs. JOIN vs. EXISTS (SQL Server)
Is NOT EXISTS bad SQL practice?
Given that:
- Any reasonable query optimizer will be able to convert between “not exists”, “exists” and "joins", so there is normally no performance difference these days.
- “Not exists” can often be easier to read than joins.
Therefore I don’t consider “Not exists” to be bad practice in the general case.
SQL using NOT EXISTS
You have to relate your not exists subquery to the outer query. For example:
select c.studentemail -- once a table is aliased, refer to it by its alias
from clients c
join invoices i
on c.clientid = i.clientid
where i.dateposted > '2013-04-01'
and not exists
(
select *
from appointments a
where c.clientid = a.clientid -- Relates outer to inner query
and a.servicedirection = 'delivery'
and a.date > '2013-07-01'
)
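To see why the correlation matters, here is a sketch that runs the query above on made-up rows in SQLite, via Python's sqlite3 module. The table contents and column types are assumptions; only the table and column names come from the query. Dropping the correlated predicate turns the subquery into a constant: if any late delivery exists for any client, NOT EXISTS is false for every row.

```python
import sqlite3

CORRELATED = """
    SELECT c.studentemail
    FROM clients c
    JOIN invoices i ON c.clientid = i.clientid
    WHERE i.dateposted > '2013-04-01'
      AND NOT EXISTS (
          SELECT NULL
          FROM appointments a
          WHERE c.clientid = a.clientid          -- relates outer to inner query
            AND a.servicedirection = 'delivery'
            AND a.date > '2013-07-01')
    ORDER BY c.studentemail"""

# Same query without the correlation line: the subquery no longer depends on c.
UNCORRELATED = """
    SELECT c.studentemail
    FROM clients c
    JOIN invoices i ON c.clientid = i.clientid
    WHERE i.dateposted > '2013-04-01'
      AND NOT EXISTS (
          SELECT NULL
          FROM appointments a
          WHERE a.servicedirection = 'delivery'
            AND a.date > '2013-07-01')
    ORDER BY c.studentemail"""


def run(sql):
    """Execute sql against a fresh in-memory database with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE clients (clientid INTEGER PRIMARY KEY, studentemail TEXT);
        CREATE TABLE invoices (clientid INTEGER, dateposted TEXT);
        CREATE TABLE appointments (clientid INTEGER, servicedirection TEXT, date TEXT);
        INSERT INTO clients VALUES (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
        INSERT INTO invoices VALUES (1, '2013-05-01'), (2, '2013-05-02'), (3, '2013-03-01');
        INSERT INTO appointments VALUES
            (1, 'delivery', '2013-06-01'),
            (1, 'pickup',   '2013-08-01'),
            (2, 'delivery', '2013-08-01');
    """)
    rows = [r[0] for r in conn.execute(sql)]
    conn.close()
    return rows


print(run(CORRELATED))    # ['a@example.com']
print(run(UNCORRELATED))  # []
```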
Usage of exists and not exists in SQL
If the subquery returns at least one row, the result of EXISTS is true. If the subquery returns no rows, the result of EXISTS is false.
https://www.postgresqltutorial.com/postgresql-exists/
Both subqueries return at least one row, and neither refers to the main query's rows, so both main queries return all rows:
select null -> 1 row
select customer_id
from customer
where residence = 'los angeles'
and age > 20 and age < 40
-> some rows
If you want to select a subset, just use WHERE in your main query; there is no need to use EXISTS.
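A small sketch of the point above, using SQLite through Python's sqlite3 module. The customer table follows the column names in the example query; the rows are made up. Both EXISTS forms keep every customer, while a plain WHERE returns the intended subset.

```python
import sqlite3


def exists_demo():
    """Show that an uncorrelated EXISTS keeps every row of the main query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, residence TEXT, age INTEGER);
        INSERT INTO customer VALUES
            (1, 'los angeles', 25),
            (2, 'new york', 30),
            (3, 'los angeles', 50);
    """)
    queries = {
        # SELECT NULL yields one row, so EXISTS is true for every customer.
        "exists_null": """
            SELECT customer_id FROM customer
            WHERE EXISTS (SELECT NULL)
            ORDER BY customer_id""",
        # Uncorrelated: the subquery ignores the outer row and always finds customer 1.
        "exists_uncorrelated": """
            SELECT customer_id FROM customer
            WHERE EXISTS (SELECT customer_id FROM customer
                          WHERE residence = 'los angeles' AND age > 20 AND age < 40)
            ORDER BY customer_id""",
        # Filtering the main query directly returns the intended subset.
        "plain_where": """
            SELECT customer_id FROM customer
            WHERE residence = 'los angeles' AND age > 20 AND age < 40
            ORDER BY customer_id""",
    }
    results = {name: [row[0] for row in conn.execute(sql)]
               for name, sql in queries.items()}
    conn.close()
    return results


print(exists_demo())
# {'exists_null': [1, 2, 3], 'exists_uncorrelated': [1, 2, 3], 'plain_where': [1]}
```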
Is it bad practice to check if something exists before inserting
The best approach is to make use of a unique constraint. Either let the database handle the constraint violation (for example, with MySQL's ON DUPLICATE KEY UPDATE) or catch and handle the error.
You can also do a check before inserting into the table. This might let you customize the error message better (for instance, if you had multiple unique constraints on the table). But you don't want to rely on such checks, because they are prone to race conditions. Multiple inserts at the same time could end up inserting the same row, because each "saw" that the table did not contain that row.
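A minimal sketch of the constraint-first approach, using SQLite through Python's sqlite3 module; the users table and email column are hypothetical. INSERT OR IGNORE is SQLite's spelling (MySQL has INSERT IGNORE and ON DUPLICATE KEY UPDATE, PostgreSQL has ON CONFLICT DO NOTHING). In both functions it is the UNIQUE constraint, not a prior SELECT, that rejects the duplicate, so concurrent inserts cannot both succeed.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # The UNIQUE constraint is what actually guarantees there are no duplicates.
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")
    return conn


def insert_or_report(conn, email):
    """Attempt the insert and handle the violation instead of checking first."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


def insert_ignoring_duplicates(conn, email):
    """SQLite's INSERT OR IGNORE silently skips rows that would violate the constraint."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO users (email) VALUES (?)", (email,))


conn = make_db()
print(insert_or_report(conn, "a@example.com"))  # True
print(insert_or_report(conn, "a@example.com"))  # False: duplicate rejected
insert_ignoring_duplicates(conn, "a@example.com")  # silently skipped
insert_ignoring_duplicates(conn, "b@example.com")
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 2
```

Handling the error in the application is the better fit when you want a custom message per constraint; the IGNORE form is simpler when duplicates are expected and harmless.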
Performance of SQL EXISTS usage variants
The truth about EXISTS is that the SELECT list inside it is not evaluated. You could try:
SELECT *
FROM tableA
WHERE EXISTS (SELECT 1/0
FROM tableB
WHERE tableA.x = tableB.y)
...and you might expect a divide-by-zero error, but you won't get one, because the select list is not evaluated. This is why my habit is to specify NULL in an EXISTS, to show that the SELECT can be ignored:
SELECT *
FROM tableA
WHERE EXISTS (SELECT NULL
FROM tableB
WHERE tableA.x = tableB.y)
All that matters in an EXISTS clause are the FROM clause and everything after it: WHERE, GROUP BY, HAVING, etc.
This question wasn't tagged with a specific database, and it should have been, because vendors handle things differently. So test, and check the explain/execution plans to confirm. It is also possible that behavior changes between versions.
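In the spirit of "test it", here is a sketch that runs the tableA/tableB query with different select lists in SQLite via Python's sqlite3 module (the rows are made up). SQLite returns NULL for 1/0 instead of raising an error, so it cannot reproduce the divide-by-zero experiment; the sketch only shows that the select list has no effect on which rows EXISTS keeps.

```python
import sqlite3


def exists_select_list_variants():
    """Run the same EXISTS filter with several select lists and collect the results."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tableA (x INTEGER);
        CREATE TABLE tableB (y INTEGER);
        INSERT INTO tableA VALUES (1), (2), (3);
        INSERT INTO tableB VALUES (2), (3), (3);
    """)
    results = {}
    # The select lists come from a fixed tuple, so string formatting is safe here.
    for select_list in ("NULL", "1", "*", "1/0"):
        sql = f"""
            SELECT x FROM tableA
            WHERE EXISTS (SELECT {select_list} FROM tableB WHERE tableA.x = tableB.y)
            ORDER BY x"""
        results[select_list] = [row[0] for row in conn.execute(sql)]
    conn.close()
    return results


print(exists_select_list_variants())  # every variant returns [2, 3]
```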
Error in if not exists query in SQL Server
As a best practice, you should always define the list of columns you're inserting into when using INSERT; that helps avoid a lot of problems!
And also: for the dates, to be independent of any language and regional settings, try to use the ISO-8601 format: YYYYMMDD for just dates (no time), or YYYY-MM-DDTHH:MM:SS for date and time.
So try this code:
INSERT INTO chennai_metro_data(col1, col2, ...., colN)
VALUES (2021700002, '20170123', '09:00', 1, 0, 555555)
and replace col1 through colN with the actual names of the columns in that table that you want to insert data into.
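Since the question was about an "if not exists" insert, here is a sketch that combines an explicit column list with a NOT EXISTS guard in a single INSERT ... SELECT, run in SQLite through Python's sqlite3 module rather than SQL Server. The column names are hypothetical stand-ins for col1 through colN, and treating (card_no, travel_date, travel_time) as the key is also an assumption. As the answer on checking before inserting notes, such a check can still race under concurrency; a unique constraint is the reliable guard.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical column names standing in for col1 ... colN.
    conn.execute("""
        CREATE TABLE chennai_metro_data (
            card_no     INTEGER,
            travel_date TEXT,     -- stored as 'YYYYMMDD'
            travel_time TEXT,
            entry_flag  INTEGER,
            exit_flag   INTEGER,
            amount      INTEGER)""")
    return conn


def insert_if_missing(conn, row):
    """Insert with an explicit column list, skipping rows whose assumed key already exists."""
    card_no, travel_date, travel_time = row[:3]
    with conn:  # commits on success, rolls back on error
        conn.execute("""
            INSERT INTO chennai_metro_data
                (card_no, travel_date, travel_time, entry_flag, exit_flag, amount)
            SELECT ?, ?, ?, ?, ?, ?
            WHERE NOT EXISTS (
                SELECT NULL FROM chennai_metro_data
                WHERE card_no = ? AND travel_date = ? AND travel_time = ?)""",
            (*row, card_no, travel_date, travel_time))


conn = make_db()
row = (2021700002, '20170123', '09:00', 1, 0, 555555)
insert_if_missing(conn, row)
insert_if_missing(conn, row)  # second call inserts nothing
print(conn.execute("SELECT COUNT(*) FROM chennai_metro_data").fetchone()[0])  # 1
```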
What's the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL?
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
In a nutshell:
- NOT IN is a little bit different: it never matches if there is even a single NULL in the list.
- In MySQL, NOT EXISTS is a little bit less efficient.
- In SQL Server, LEFT JOIN / IS NULL is less efficient.
- In PostgreSQL, NOT IN is less efficient.
- In Oracle, all three methods are the same.
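The NULL caveat for NOT IN is easy to demonstrate. This sketch uses SQLite through Python's sqlite3 module with hypothetical tables outer_t and inner_t; the inner column is deliberately nullable. For ids 1 and 3, id NOT IN (2, NULL) evaluates to NULL rather than true, so NOT IN returns nothing, while NOT EXISTS still returns the unmatched rows. Filtering NULLs out of the subquery restores the expected result.

```python
import sqlite3


def null_in_subquery_demo():
    """Compare NOT IN and NOT EXISTS when the subquery contains a NULL."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE outer_t (id INTEGER PRIMARY KEY);
        CREATE TABLE inner_t (val INTEGER);  -- nullable on purpose
        INSERT INTO outer_t (id) VALUES (1), (2), (3);
        INSERT INTO inner_t (val) VALUES (2), (NULL);
    """)
    queries = {
        # id NOT IN (2, NULL) is NULL, not true, for ids 1 and 3: nothing matches.
        "not_in": """
            SELECT id FROM outer_t
            WHERE id NOT IN (SELECT val FROM inner_t)
            ORDER BY id""",
        # Excluding NULLs from the subquery gives the intended anti-join.
        "not_in_filtered": """
            SELECT id FROM outer_t
            WHERE id NOT IN (SELECT val FROM inner_t WHERE val IS NOT NULL)
            ORDER BY id""",
        # NOT EXISTS compares row by row, so the NULL simply never matches.
        "not_exists": """
            SELECT o.id FROM outer_t o
            WHERE NOT EXISTS (SELECT NULL FROM inner_t i WHERE i.val = o.id)
            ORDER BY o.id""",
    }
    results = {name: [row[0] for row in conn.execute(sql)]
               for name, sql in queries.items()}
    conn.close()
    return results


print(null_in_subquery_demo())
# {'not_in': [], 'not_in_filtered': [1, 3], 'not_exists': [1, 3]}
```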