Is Using "Not Exists" Considered to Be Bad SQL Practice

Is using “NOT EXISTS” considered to be bad SQL practice?

In MySQL, Oracle, SQL Server and PostgreSQL, NOT EXISTS is as efficient as, or even more efficient than, LEFT JOIN / IS NULL.

While it may seem that the inner query must be executed once for each record of the outer query (which looks bad for NOT EXISTS and even worse for NOT IN, since the latter's subquery is not even correlated), such a query can be optimized just as well as any other, using appropriate anti-join methods.

In SQL Server, LEFT JOIN / IS NULL may actually be less efficient than NOT EXISTS / NOT IN when the column in the inner table is unindexed or has low cardinality.

It is often heard that MySQL is "especially bad in treating subqueries".

This stems from the fact that MySQL is not capable of any join method other than nested loops, which severely limits its optimization abilities.

The only case where a query would benefit from rewriting a subquery as a join is this one:

SELECT *
FROM big_table
WHERE big_table_column IN
(
    SELECT small_table_column
    FROM small_table
)

small_table will not be scanned completely for each record in big_table: though the subquery does not appear to be correlated, the query optimizer implicitly correlates it and in fact rewrites it as an EXISTS (using index_subquery to search for the first match if small_table_column is indexed).

However, big_table will always be the leading table, so the query completes in big * LOG(small) reads rather than small * LOG(big).
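To make the asymmetry concrete, here is the arithmetic for purely illustrative sizes that are not from the original answer: 1,000,000 rows in big_table, 1,000 rows in small_table, and base-2 logarithms. Real index lookups use the B-tree fan-out as the base, which shrinks both terms, but the ratio between the two costs (500 here) does not depend on the base.

```latex
% Illustrative sizes only: N_big = 10^6, N_small = 10^3, logs base 2
\begin{aligned}
\text{big\_table leading:}   \quad & N_{\text{big}} \log_2 N_{\text{small}} = 10^6 \cdot \log_2 10^3 \approx 10^6 \cdot 9.97 \approx 1.0 \times 10^7 \\
\text{small\_table leading:} \quad & N_{\text{small}} \log_2 N_{\text{big}} = 10^3 \cdot \log_2 10^6 \approx 10^3 \cdot 19.93 \approx 2.0 \times 10^4
\end{aligned}
```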

This could be rewritten as

SELECT DISTINCT bt.*
FROM small_table st
JOIN big_table bt
    ON bt.big_table_column = st.small_table_column
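The equivalence of the two forms can be checked on a toy dataset. The sketch below uses Python's built-in sqlite3 module with SQLite as a stand-in engine, so it says nothing about MySQL's plans or performance, only that both queries return the same rows. The data is hypothetical, and big_table is assumed to have a primary key: without one, DISTINCT bt.* would collapse genuinely duplicate big_table rows that the IN version keeps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; big_table has a primary key (assumption),
# so DISTINCT bt.* cannot merge rows that IN would return separately.
conn.executescript("""
    CREATE TABLE big_table (id INTEGER PRIMARY KEY, big_table_column INTEGER);
    CREATE TABLE small_table (small_table_column INTEGER);
    INSERT INTO big_table (id, big_table_column) VALUES (1, 10), (2, 20), (3, 20), (4, 30);
    -- Duplicate 20: without DISTINCT the join would emit rows 2 and 3 twice.
    INSERT INTO small_table (small_table_column) VALUES (20), (20), (30);
""")

# Original form: semi-join via IN.
in_rows = conn.execute("""
    SELECT * FROM big_table
    WHERE big_table_column IN (SELECT small_table_column FROM small_table)
    ORDER BY id
""").fetchall()

# Rewritten form: join led by small_table, duplicates removed with DISTINCT.
join_rows = conn.execute("""
    SELECT DISTINCT bt.*
    FROM small_table st
    JOIN big_table bt ON bt.big_table_column = st.small_table_column
    ORDER BY bt.id
""").fetchall()

print(in_rows == join_rows, in_rows)
```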

However, this rewrite does not help NOT IN (as opposed to IN). In MySQL, NOT EXISTS and LEFT JOIN / IS NULL perform almost the same, since with nested loops the left table must always lead in a LEFT JOIN.

You may want to read these articles:

  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL
  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
  • IN vs. JOIN vs. EXISTS: Oracle
  • IN vs. JOIN vs. EXISTS: SQL Server

Is NOT EXISTS bad SQL practice?

Given that:

  • Any reasonable query optimizer will be able to convert between “not exists”, “exists” and “joins”, so there is normally no performance difference these days.

  • “Not exists” can often be easier to read than joins.

Therefore I don’t consider “Not exists” to be bad practice in the general case.
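For comparison, here is the same anti-join ("customers with no orders") written both ways. This is a sketch using Python's sqlite3 module with SQLite as a stand-in engine and hypothetical customers/orders tables; it only shows that the two forms return the same rows, not how any particular optimizer plans them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: customers and the orders some of them placed.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers (id, name) VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 1), (12, 3);
""")

# NOT EXISTS: reads almost like the requirement itself.
not_exists = conn.execute("""
    SELECT c.name FROM customers c
    WHERE NOT EXISTS (SELECT NULL FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id
""").fetchall()

# LEFT JOIN / IS NULL: keep only outer rows that found no match.
# o.id is a primary key, so it is NULL only when no order row matched.
left_join = conn.execute("""
    SELECT c.name FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.id IS NULL
    ORDER BY c.id
""").fetchall()

print(not_exists, left_join)
```

Note that the LEFT JOIN version relies on testing a column that cannot be NULL in a matched row; the NOT EXISTS version has no such subtlety, which is part of why it is often easier to read.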

SQL using NOT EXISTS

You have to correlate your NOT EXISTS subquery with the outer query. For example:

select c.studentemail
from clients c
join invoices i
    on c.clientid = i.clientid
where i.dateposted > '2013-04-01'
and not exists
(
    select *
    from appointments a
    where c.clientid = a.clientid -- Relates outer to inner query
    and a.servicedirection = 'delivery'
    and a.date > '2013-07-01'
)
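The query above can be exercised end to end on a tiny in-memory database. This sketch uses Python's sqlite3 module with SQLite as a stand-in engine; the schemas contain only the columns the query touches, and all rows are hypothetical. Dates are stored as ISO 'YYYY-MM-DD' text, so string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schemas and rows.
conn.executescript("""
    CREATE TABLE clients (clientid INTEGER PRIMARY KEY, studentemail TEXT);
    CREATE TABLE invoices (clientid INTEGER, dateposted TEXT);
    CREATE TABLE appointments (clientid INTEGER, servicedirection TEXT, date TEXT);
    INSERT INTO clients (clientid, studentemail)
        VALUES (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
    INSERT INTO invoices (clientid, dateposted)
        VALUES (1, '2013-05-01'), (2, '2013-05-02'), (3, '2013-03-01');
    INSERT INTO appointments (clientid, servicedirection, date)
        VALUES (1, 'delivery', '2013-08-01'), (2, 'pickup', '2013-08-01');
""")

rows = conn.execute("""
    SELECT c.studentemail
    FROM clients c
    JOIN invoices i ON c.clientid = i.clientid
    WHERE i.dateposted > '2013-04-01'
      AND NOT EXISTS (
          SELECT *
          FROM appointments a
          WHERE a.clientid = c.clientid   -- correlation with the outer row
            AND a.servicedirection = 'delivery'
            AND a.date > '2013-07-01'
      )
""").fetchall()

# Client 1 has a later delivery appointment, client 3's invoice is too old.
print(rows)
```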

Usage of exists and not exists in SQL

If the subquery returns at least one row, the result of EXISTS is true. If the subquery returns no rows, the result of EXISTS is false.

https://www.postgresqltutorial.com/postgresql-exists/

Both subqueries return at least one row and neither is correlated with the main query, so EXISTS is true for every row and both main queries return all rows:

  • select null -> 1 row

  • select customer_id
    from customer
    where residence = 'los angeles'
    and age > 20 and age < 40
    -> some rows

If you want to select a subset, just put the condition in the WHERE clause of your main query; there is no need for EXISTS.
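A quick way to see this is to run both uncorrelated EXISTS queries next to a plain WHERE filter. This is a sketch with Python's sqlite3 module and SQLite as a stand-in engine; the customer rows are hypothetical, chosen so that only one of them matches the Los Angeles / age condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; only customer 1 matches the filter.
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, residence TEXT, age INTEGER);
    INSERT INTO customer (customer_id, residence, age)
        VALUES (1, 'los angeles', 25), (2, 'new york', 30), (3, 'los angeles', 50);
""")

def ids(sql: str) -> list:
    """Run a query and return the first column of every row."""
    return [r[0] for r in conn.execute(sql)]

# Uncorrelated EXISTS: the subquery returns a row, so every outer row passes.
all_via_null = ids("SELECT customer_id FROM customer WHERE EXISTS (SELECT NULL) ORDER BY customer_id")
all_via_filter = ids("""
    SELECT customer_id FROM customer
    WHERE EXISTS (SELECT customer_id FROM customer
                  WHERE residence = 'los angeles' AND age > 20 AND age < 40)
    ORDER BY customer_id
""")

# Filtering the main query directly gives the intended subset.
subset = ids("""
    SELECT customer_id FROM customer
    WHERE residence = 'los angeles' AND age > 20 AND age < 40
    ORDER BY customer_id
""")

print(all_via_null, all_via_filter, subset)
```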

Is it bad practice to check if something exists before inserting

The best approach is to make use of a unique constraint. Then either ignore the constraint violation (using MySQL's ON DUPLICATE KEY UPDATE) or handle the error.

You can also do a check before inserting into the table. This might let you customize the error message better (for instance, if you had multiple unique constraints on the table). But you don't want to rely on such checks, because they are prone to race conditions. Multiple inserts at the same time could end up inserting the same row, because each "saw" that the table did not contain that row.
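Here is a sketch of the constraint-based approach, using Python's sqlite3 module with SQLite as a stand-in engine and a hypothetical users table. The helper insert_email is hypothetical. SQLite does not have MySQL's ON DUPLICATE KEY UPDATE; the sketch instead shows catching the error, and SQLite's own INSERT OR IGNORE for the "ignore" variant.

```python
import sqlite3

def insert_email(conn: sqlite3.Connection, email: str) -> bool:
    """Hypothetical helper: insert email, return False if it already exists."""
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected the duplicate; no racy pre-check needed.
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT NOT NULL UNIQUE)")

first = insert_email(conn, "ann@example.com")
second = insert_email(conn, "ann@example.com")

# SQLite-specific alternative: silently skip rows that would violate the constraint.
with conn:
    conn.execute("INSERT OR IGNORE INTO users (email) VALUES (?)", ("ann@example.com",))

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(first, second, count)
```

Because the database enforces uniqueness, two concurrent callers cannot both succeed, which is exactly the race a "check, then insert" sequence cannot rule out.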

Performance of SQL EXISTS usage variants

The truth about the EXISTS clause is that its SELECT list is not evaluated. You could try:

SELECT * 
FROM tableA
WHERE EXISTS (SELECT 1/0
FROM tableB
WHERE tableA.x = tableB.y)

...and should expect a divide by zero error, but you won't because it's not evaluated. This is why my habit is to specify NULL in an EXISTS to demonstrate that the SELECT can be ignored:

SELECT * 
FROM tableA
WHERE EXISTS (SELECT NULL
FROM tableB
WHERE tableA.x = tableB.y)

All that matters in an EXISTS clause is the FROM and beyond clauses - WHERE, GROUP BY, HAVING, etc.

This question wasn't tagged with a specific database, and it should be, because vendors handle things differently. So test, and check the EXPLAIN output or execution plans to confirm. Behavior may also change between versions.
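In that spirit, here is a small test harness that runs the same EXISTS query with different select lists. It uses Python's sqlite3 module, so SQLite is the engine being tested. One caveat: SQLite returns NULL for 1/0 instead of raising an error, so it cannot reproduce the divide-by-zero point; it only shows that the select list does not change which rows survive, even when it yields a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (x INTEGER);
    CREATE TABLE tableB (y INTEGER);
    INSERT INTO tableA (x) VALUES (1), (2), (3);
    INSERT INTO tableB (y) VALUES (2), (3), (3);
""")

# Only fixed literals below are substituted into the SQL, never user input.
template = """
    SELECT x FROM tableA
    WHERE EXISTS (SELECT {select_list} FROM tableB WHERE tableA.x = tableB.y)
    ORDER BY x
"""

# EXISTS only asks whether a matching row exists; its contents are irrelevant.
results = {
    select_list: [r[0] for r in conn.execute(template.format(select_list=select_list))]
    for select_list in ("*", "1", "NULL", "1/0")
}
print(results)
```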

Error in if not exists query in SQL Server

As a best practice, you should always define the list of columns you're inserting into when using INSERT; that helps avoid a lot of problems!

And also: for dates, to be independent of any language and regional settings, use the ISO-8601 formats: YYYYMMDD for dates only (no time), or YYYY-MM-DDTHH:MM:SS for date and time.

So try this code:

INSERT INTO chennai_metro_data(col1, col2, ...., colN)
VALUES (2021700002, '20170123', '09:00', 1, 0, 555555)

and replace col1 through colN with the actual names of the columns in that table that you want to insert data into.
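To illustrate why the column list matters, the sketch below uses Python's sqlite3 module with SQLite as a stand-in engine (not SQL Server). The table readings and all of its column names are hypothetical, since the real columns of chennai_metro_data are not given; the extra note column stands in for any column added after the INSERT was written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: six data columns plus an extra column with a default.
conn.execute("""
    CREATE TABLE readings (
        card_id INTEGER, trip_date TEXT, trip_time TEXT,
        flag_a INTEGER, flag_b INTEGER, station_id INTEGER,
        note TEXT DEFAULT ''
    )
""")

row = (2021700002, "20170123", "09:00", 1, 0, 555555)

# Without a column list, VALUES must cover every column in table order.
try:
    conn.execute("INSERT INTO readings VALUES (?, ?, ?, ?, ?, ?)", row)
    positional_ok = True
except sqlite3.Error:
    positional_ok = False  # 7 columns, 6 values: rejected

# With an explicit column list, the unlisted column takes its default.
conn.execute(
    "INSERT INTO readings (card_id, trip_date, trip_time, flag_a, flag_b, station_id) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    row,
)
stored = conn.execute("SELECT trip_date, note FROM readings").fetchone()
print(positional_ok, stored)
```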

What's the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL?

  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server

  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL

  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle

  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL

In a nutshell:

NOT IN is semantically different: if the list contains even a single NULL, NOT IN never matches.

  • In MySQL, NOT EXISTS is a little bit less efficient

  • In SQL Server, LEFT JOIN / IS NULL is less efficient

  • In PostgreSQL, NOT IN is less efficient

  • In Oracle, all three methods are the same.
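The NULL behavior of NOT IN is easy to reproduce. This sketch uses Python's sqlite3 module with SQLite as a stand-in engine and hypothetical tables t and s; it demonstrates semantics only and does not reproduce the per-engine efficiency differences listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: the inner table contains a NULL.
conn.executescript("""
    CREATE TABLE t (id INTEGER);
    CREATE TABLE s (id INTEGER);
    INSERT INTO t (id) VALUES (1), (2), (3);
    INSERT INTO s (id) VALUES (2), (NULL);
""")

def ids(sql: str) -> list:
    """Run a query and return the first column of every row."""
    return [r[0] for r in conn.execute(sql)]

# NOT IN: 1 <> NULL is UNKNOWN, so no outer row can qualify.
not_in = ids("SELECT id FROM t WHERE id NOT IN (SELECT id FROM s) ORDER BY id")

# NOT EXISTS and LEFT JOIN / IS NULL are unaffected by the NULL.
not_exists = ids("""
    SELECT id FROM t
    WHERE NOT EXISTS (SELECT NULL FROM s WHERE s.id = t.id)
    ORDER BY id
""")
left_join = ids("""
    SELECT t.id FROM t
    LEFT JOIN s ON s.id = t.id
    WHERE s.id IS NULL
    ORDER BY t.id
""")

# Excluding NULLs from the subquery makes NOT IN agree with the other two.
not_in_fixed = ids(
    "SELECT id FROM t WHERE id NOT IN (SELECT id FROM s WHERE id IS NOT NULL) ORDER BY id"
)
print(not_in, not_exists, left_join, not_in_fixed)
```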
