What Is Easier to Read in Exists Subqueries

MySQL EXISTS vs IN -- correlated subquery vs subquery?

This is an RDBMS-agnostic answer, but may help nonetheless. In my understanding, the correlated (aka dependent) subquery is perhaps the most often falsely accused culprit for bad performance.

The problem (as it is most often described) is that it processes the inner query for every row of the outer query. Therefore, if the outer query returns 1,000 rows, and the inner query returns 10,000, then your query has to slog through 10,000,000 rows (outer×inner) to produce a result. Compared to the 11,000 rows (outer+inner) from a non-correlated query over the same result sets, that ain't good.

However, this is just the worst-case scenario. In many cases, the DBMS will be able to exploit indexes to drastically reduce the row count. Even if only the inner query can use an index, scanning the 10,000 inner rows becomes ~13 seeks per outer row (roughly log2 of 10,000), which drops the total down to about 13,000 (1,000 × 13).
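The arithmetic behind those figures can be checked with a short sketch. It assumes each index lookup costs about log2(n) steps, which is a simplification: real B-tree indexes have a much higher fan-out, so the actual seek count is usually lower. The variable names are mine, purely for illustration.

```python
import math

outer_rows = 1_000
inner_rows = 10_000

# Worst case: the inner query is fully scanned once per outer row.
nested_scan = outer_rows * inner_rows

# Non-correlated case: each result set is read once.
single_pass = outer_rows + inner_rows

# Indexed inner query: assume ~log2(n) steps per lookup (a simplification).
seeks_per_lookup = int(math.log2(inner_rows))
indexed_lookup = outer_rows * seeks_per_lookup

print(nested_scan, single_pass, seeks_per_lookup, indexed_lookup)
```

The gap between the first and last figure is the whole point: the correlated form is only expensive when each inner lookup degenerates into a scan.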

The EXISTS operator can stop processing rows after the first match, cutting down the query cost further, especially when most outer rows match at least one inner row.

In some rare cases, I have seen SQL Server 2008R2 optimise correlated subqueries to a merge join (which traverses both sets only once - best possible scenario) where a suitable index can be found in both inner and outer queries.

The real culprit for bad performance is not necessarily correlated subqueries, but nested scans.

Subqueries with EXISTS vs IN - MySQL

An EXPLAIN plan would have shown you exactly why you should use EXISTS. Usually the question comes up as EXISTS vs COUNT(*). EXISTS is faster. Why?

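  • With regard to the challenges presented by NULL: when the subquery returns a NULL, an IN comparison that finds no match evaluates to NULL (unknown) rather than false, so NOT IN can end up filtering out every row. So you need to handle that as well. With EXISTS, the result is only ever true or false, which is much easier to cope with. Simply put, IN has to compare values against NULL, whereas EXISTS only checks whether a row exists.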

  • e.g. Exists (Select * from yourtable where bla = 'blabla'); you get true/false the moment one hit is found/matched.

  • In this case, IN sort of takes the position of COUNT(*): it selects ALL matching rows based on the WHERE, because it's comparing all values.
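The NULL point is easiest to see with the negated forms. Here is a minimal sketch using SQLite through Python's standard sqlite3 module (an assumption on my part, chosen only because it runs anywhere; MySQL follows the same three-valued logic here). The table names outer_t and inner_t are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outer_t (id INTEGER);
    CREATE TABLE inner_t (id INTEGER);
    INSERT INTO outer_t VALUES (1), (2), (3);
    INSERT INTO inner_t VALUES (1), (NULL);
""")

# 2 NOT IN (1, NULL) evaluates to NULL (unknown), so no row qualifies.
not_in_rows = conn.execute(
    "SELECT id FROM outer_t WHERE id NOT IN (SELECT id FROM inner_t) ORDER BY id"
).fetchall()

# NOT EXISTS only asks whether a matching row exists: strictly true or false.
not_exists_rows = conn.execute(
    """
    SELECT o.id FROM outer_t o
    WHERE NOT EXISTS (SELECT 1 FROM inner_t i WHERE i.id = o.id)
    ORDER BY o.id
    """
).fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(2,), (3,)]
```

A single NULL in the subquery silently empties the NOT IN result, while NOT EXISTS returns the rows you would expect.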

But don't forget this either:

  • EXISTS tends to be faster than IN when the subquery result is very large.
  • IN tends to be faster than EXISTS when the subquery result is very small.

References for more details:

  • subquery using IN.
  • IN - subquery optimization
  • Join vs. sub-query.

What do I have to SELECT in a WHERE EXIST clause?

It doesn't matter. A good practice is to use SELECT 1 to indicate it is a non-data returning subquery.

The select list is not evaluated and doesn't matter. In SQL Server you can even put a SELECT 1/0 in the EXISTS subquery and it will not throw a divide-by-zero error.

Related: What is easier to read in EXISTS subqueries?
https://dba.stackexchange.com/questions/159413/exists-select-1-vs-exists-select-one-or-the-other

For the non-believers:

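-- Two table variables: @table1 holds ids 1-5, @table2 holds ids 1-3.
DECLARE @table1 TABLE (id INT);
DECLARE @table2 TABLE (id INT);

INSERT INTO @table1 (id)
VALUES (1), (2), (3), (4), (5);

INSERT INTO @table2 (id)
VALUES (1), (2), (3);

-- The select list of the EXISTS subquery is never evaluated,
-- so 1/0 does not raise a divide-by-zero error here.
SELECT *
FROM @table1 t1
WHERE EXISTS (
    SELECT 1/0
    FROM @table2 t2
    WHERE t1.id = t2.id
);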

EXISTS subquery: SELECT 1 or SELECT * FROM X performant in Postgres?

Per the documentation:

Since the result depends only on whether any rows are returned, and
not on the contents of those rows, the output list of the subquery is
normally unimportant.
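A quick way to convince yourself that the output list is irrelevant is to run the same EXISTS query with different select lists and compare the results. This sketch uses SQLite via Python's sqlite3 module rather than Postgres (my assumption, for portability); table1 and table2 are illustrative names only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3), (4), (5);
    INSERT INTO table2 VALUES (1), (2), (3);
""")

# Same query each time; only the select list inside EXISTS changes.
template = """
    SELECT t1.id FROM table1 t1
    WHERE EXISTS (SELECT {select_list} FROM table2 t2 WHERE t2.id = t1.id)
    ORDER BY t1.id
"""

results = {
    select_list: conn.execute(template.format(select_list=select_list)).fetchall()
    for select_list in ("1", "*", "NULL")
}
for select_list, rows in results.items():
    print(select_list, rows)
```

All three variants return the same rows, including SELECT NULL, because EXISTS only cares whether a row comes back.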

Join vs. sub-query

Taken from the MySQL manual (13.2.10.11 Rewriting Subqueries as Joins):

A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone.

So subqueries can be slower than LEFT [OUTER] JOIN, but in my opinion their strength is slightly higher readability.
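To make the comparison concrete, here is a sketch of one subquery and a LEFT JOIN rewrite that returns the same rows (an anti-join: customers with no orders). It runs on SQLite via Python's sqlite3 module, which is my assumption for portability, and says nothing about which form MySQL optimizes better. The customers and orders tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Subquery form: reads close to the plain-English question.
subquery_form = conn.execute("""
    SELECT c.name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.name
""").fetchall()

# Join form: keep only the outer rows that found no partner.
join_form = conn.execute("""
    SELECT c.name FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.id IS NULL
    ORDER BY c.name
""").fetchall()

print(subquery_form)  # [('Bob',)]
print(join_form)      # [('Bob',)]
```

Both return the same answer; which one reads better is a matter of taste, and which one runs faster is something to check in the execution plan of your own DBMS.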

JOIN or Correlated subquery with exists clause, which one is better

Generally, the EXISTS clause, because a JOIN may need DISTINCT to give the expected output; for example, when there are multiple Department rows for one ContactInformation row.

In your example above, the SELECT *:

  • means different output too, so they are not actually equivalent
  • makes it less likely that an index will be used, because you are pulling all columns out

That said, even with a limited column list, they will give the same plan until you need DISTINCT, which is why I say "EXISTS".
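The duplicate-row issue can be reproduced in a few lines. This is a sketch on SQLite via Python's sqlite3 module (my assumption, for portability); the ContactInformation and Department names come from the question, but the columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ContactInformation (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Department (id INTEGER PRIMARY KEY, contact_id INTEGER);
    INSERT INTO ContactInformation VALUES (1, 'Ann'), (2, 'Bob');
    -- Two Department rows point at the same ContactInformation row.
    INSERT INTO Department VALUES (100, 1), (101, 1);
""")

# Plain JOIN: one output row per matching Department row.
join_rows = conn.execute("""
    SELECT c.name FROM ContactInformation c
    JOIN Department d ON d.contact_id = c.id
""").fetchall()

# JOIN + DISTINCT: duplicates removed after the fact.
distinct_rows = conn.execute("""
    SELECT DISTINCT c.name FROM ContactInformation c
    JOIN Department d ON d.contact_id = c.id
""").fetchall()

# EXISTS: each ContactInformation row is tested once, so no duplicates arise.
exists_rows = conn.execute("""
    SELECT c.name FROM ContactInformation c
    WHERE EXISTS (SELECT 1 FROM Department d WHERE d.contact_id = c.id)
""").fetchall()

print(join_rows)      # [('Ann',), ('Ann',)]
print(distinct_rows)  # [('Ann',)]
print(exists_rows)    # [('Ann',)]
```

The plain JOIN returns Ann twice; EXISTS gives the intended result without the extra DISTINCT step.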


