MySQL - Difference Between IN and EXISTS

EXISTS

EXISTS literally checks for the existence of rows matching the specified criteria. In current standard SQL, it lets you specify more than one criterion for the comparison (i.e., when you want to know whether both col_a and col_b match), which makes it a little more powerful than the IN clause. MySQL's IN supports tuples, but that syntax is not portable, so EXISTS is a better choice for both readability and portability.
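A small sketch of the two multi-column forms, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite 3.15+ also accepts row-value IN, so the comparison runs without a MySQL server). The tables `a` and `b` are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); tables a and b are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (col_a INTEGER, col_b INTEGER);
    CREATE TABLE b (col_a INTEGER, col_b INTEGER);
    INSERT INTO a VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO b VALUES (1, 10), (2, 99), (3, 30);
""")

# Tuple (row-value) IN: accepted by MySQL and SQLite, but not by every database.
tuple_in = conn.execute("""
    SELECT col_a, col_b FROM a
    WHERE (col_a, col_b) IN (SELECT col_a, col_b FROM b)
    ORDER BY col_a
""").fetchall()

# Correlated EXISTS: each column comparison is an ordinary predicate.
exists_multi = conn.execute("""
    SELECT a.col_a, a.col_b FROM a
    WHERE EXISTS (SELECT 1 FROM b
                  WHERE b.col_a = a.col_a AND b.col_b = a.col_b)
    ORDER BY a.col_a
""").fetchall()

print(tuple_in)      # [(1, 10), (3, 30)]
print(exists_multi)  # [(1, 10), (3, 30)]
```

Row (2, 20) is excluded by both forms because only col_a matches.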

The other thing to be aware of with EXISTS is how it operates: EXISTS returns a boolean, and it can return that boolean as soon as it finds the first match. So if you're dealing with duplicates/multiple matching rows, EXISTS can be faster to execute than IN or JOINs, depending on the data and your needs.

IN

IN is syntactic sugar for a chain of OR comparisons. While it's very accommodating, it can run into problems when the list of values for the comparison gets long (north of 1,000).
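To illustrate the OR equivalence, here is a minimal sketch, again with SQLite standing in for MySQL and a hypothetical table `t`; both predicates select the same rows.

```python
import sqlite3

# SQLite as a stand-in for MySQL (assumption); table t is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (x INTEGER);
    INSERT INTO t VALUES (1), (2), (3), (4), (5);
""")

# IN with a literal list...
in_list = conn.execute(
    "SELECT x FROM t WHERE x IN (2, 4, 5) ORDER BY x").fetchall()

# ...is logically the same as the equivalent chain of OR comparisons.
or_chain = conn.execute(
    "SELECT x FROM t WHERE x = 2 OR x = 4 OR x = 5 ORDER BY x").fetchall()

print(in_list)   # [(2,), (4,), (5,)]
print(or_chain)  # [(2,), (4,), (5,)]
```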

NOT

The NOT operator just reverses the logic.

Subqueries vs JOINs

The mantra "always use joins" is flawed, because a JOIN risks inflating the result set when there is more than one child record per parent. Yes, you can use DISTINCT or GROUP BY to deal with this, but it's very likely that doing so renders the performance benefit of using a JOIN moot. Know your data, and what you want for a result set: these are key to writing SQL that performs well.
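The duplicate-row problem can be seen with two hypothetical tables, `customers` and `orders`, in SQLite (standing in for MySQL). The sketch shows only the result sets, not timings.

```python
import sqlite3

# SQLite as a stand-in for MySQL (assumption); tables and rows are hypothetical.
# Customer 1 has three orders, customer 2 has one, customer 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders (customer_id) VALUES (1), (1), (1), (2);
""")

# A plain JOIN repeats the parent once per matching child row.
joined = conn.execute("""
    SELECT c.id, c.name FROM customers c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

# DISTINCT removes the duplicates, but only after the JOIN produced them.
joined_distinct = conn.execute("""
    SELECT DISTINCT c.id, c.name FROM customers c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

# EXISTS returns each parent at most once by definition.
exists_rows = conn.execute("""
    SELECT c.id, c.name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id
""").fetchall()

print(joined)           # [(1, 'Ann'), (1, 'Ann'), (1, 'Ann'), (2, 'Bob')]
print(joined_distinct)  # [(1, 'Ann'), (2, 'Bob')]
print(exists_rows)      # [(1, 'Ann'), (2, 'Bob')]
```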

To reiterate the importance of knowing what to use and why: LEFT JOIN/IS NULL is the fastest exclusion approach in MySQL if the compared columns are NOT nullable; otherwise, NOT IN/NOT EXISTS are better choices.
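A sketch of the three exclusion forms against NOT nullable columns, with SQLite as a stand-in for MySQL and hypothetical tables. It only demonstrates that the forms return the same rows here; which one is fastest in MySQL is the performance claim above and is not measured.

```python
import sqlite3

# SQLite as a stand-in for MySQL (assumption); tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (customer_id INTEGER NOT NULL);
    INSERT INTO customers VALUES (1), (2), (3), (4);
    INSERT INTO orders VALUES (1), (3);
""")

# Three ways to ask "which customers have no orders?"
queries = {
    "left_join_is_null": """
        SELECT c.id FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        WHERE o.customer_id IS NULL
        ORDER BY c.id""",
    "not_in": """
        SELECT c.id FROM customers c
        WHERE c.id NOT IN (SELECT o.customer_id FROM orders o)
        ORDER BY c.id""",
    "not_exists": """
        SELECT c.id FROM customers c
        WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
        ORDER BY c.id""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)  # each form returns [(2,), (4,)]
```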

Reference:

  • MySQL: LEFT JOIN/IS NULL, NOT IN, NOT EXISTS on nullable columns
  • MySQL: LEFT JOIN/IS NULL, NOT IN, NOT EXISTS on NOT nullable columns

Using in vs exists on mysql

Your two queries are very different. The first query is:

select e.lastname, e.firstname
from employees e
where e.officecode in (select o.officecode from offices o where o.country = 'USA');

(Note that I qualified all the column names.)

This gets employees where the corresponding office is in the USA.

This query is quite different:

select e.lastname, e.firstname
from employees e
where exists (select o.officecode from offices o where o.country = 'USA');

It is an all-or-nothing query. It returns all employees if any office is in the USA. It returns nothing otherwise.

To be equivalent to the first query, you need a correlation clause. This connects the inner query to the outer query:

select e.lastname, e.firstname
from employees e
where exists (select 1
from offices o
where o.officecode = e.officecode and o.country = 'USA'
);

With this change, the two queries should produce identical results.
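The three queries can be compared on a tiny data set, with SQLite standing in for MySQL (an assumption; the rows are invented for illustration). The last step removes the only USA office to show the all-or-nothing behavior of the uncorrelated EXISTS.

```python
import sqlite3

# SQLite as a stand-in for MySQL (assumption); sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE offices (officecode TEXT PRIMARY KEY, country TEXT);
    CREATE TABLE employees (lastname TEXT, firstname TEXT, officecode TEXT);
    INSERT INTO offices VALUES ('1', 'USA'), ('4', 'France');
    INSERT INTO employees VALUES ('Murphy', 'Diane', '1'), ('Bondur', 'Gerard', '4');
""")

in_query = """
    select e.lastname, e.firstname
    from employees e
    where e.officecode in (select o.officecode from offices o where o.country = 'USA')
    order by e.lastname"""

uncorrelated_exists = """
    select e.lastname, e.firstname
    from employees e
    where exists (select o.officecode from offices o where o.country = 'USA')
    order by e.lastname"""

correlated_exists = """
    select e.lastname, e.firstname
    from employees e
    where exists (select 1
                  from offices o
                  where o.officecode = e.officecode and o.country = 'USA')
    order by e.lastname"""

results = [conn.execute(sql).fetchall()
           for sql in (in_query, uncorrelated_exists, correlated_exists)]
for rows in results:
    print(rows)
# [('Murphy', 'Diane')]
# [('Bondur', 'Gerard'), ('Murphy', 'Diane')]
# [('Murphy', 'Diane')]

# All-or-nothing: with no USA office left, the uncorrelated EXISTS returns nothing.
conn.execute("UPDATE offices SET country = 'Canada' WHERE officecode = '1'")
after_update = conn.execute(uncorrelated_exists).fetchall()
print(after_update)  # []
```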

Subqueries with EXISTS vs IN - MySQL

An EXPLAIN plan would have shown you exactly why you should use EXISTS. The question usually comes up as EXISTS vs COUNT(*). EXISTS is faster. Why?

  • With regard to the challenges presented by NULL: when the subquery returns a NULL and no match is found, IN evaluates to NULL (unknown) rather than false, so you need to handle that as well. With EXISTS, it's merely false, which is much easier to cope with. Put simply, IN can't compare anything with NULL, whereas EXISTS only checks whether a row exists.

  • For example, with EXISTS (SELECT * FROM yourtable WHERE bla = 'blabla'), you get true the moment one matching row is found, and false if none is.

  • In this case, IN sort of takes the position of COUNT(*): it selects ALL rows matching the WHERE clause, because it compares against all values.
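The NULL behavior described in the first bullet can be checked directly. This sketch uses SQLite as a stand-in for MySQL and a hypothetical one-column table holding 1 and NULL; Python's sqlite3 returns SQL NULL as None.

```python
import sqlite3

# SQLite as a stand-in for MySQL (assumption); table t is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (v INTEGER);
    INSERT INTO t VALUES (1), (NULL);
""")

# 1 IN (1, NULL) is true; 5 IN (1, NULL) is NULL (unknown), not false.
in_hit = conn.execute("SELECT 1 IN (SELECT v FROM t)").fetchone()[0]
in_miss = conn.execute("SELECT 5 IN (SELECT v FROM t)").fetchone()[0]

# EXISTS only asks whether a matching row exists, so it is always 1 or 0.
exists_hit = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM t WHERE v = 1)").fetchone()[0]
exists_miss = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM t WHERE v = 5)").fetchone()[0]

print(in_hit, in_miss)          # 1 None
print(exists_hit, exists_miss)  # 1 0
```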

But don't forget this either:

  • EXISTS tends to be faster than IN when the subquery result is very large.
  • IN tends to be faster than EXISTS when the subquery result is very small.

References for more details:

  • Subquery using IN
  • IN subquery optimization
  • Join vs. sub-query

Which is the better option: the IN operator or EXISTS?

The two queries do different things. You probably intend a correlated subquery for EXISTS:

SELECT c.Customer_ID
FROM Customers c
WHERE EXISTS (SELECT 1 FROM Sales s WHERE s.Cust_ID = c.Customer_ID);

Both methods are fine for expressing your logic. I tend to prefer EXISTS for two reasons:

  • NOT EXISTS is generally a better choice than NOT IN because of the way NOT IN handles NULL values. This does not apply to EXISTS/IN, but the preference spills over.
  • EXISTS is generally no worse than IN from a performance perspective.
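A sketch of the NOT IN pitfall with a nullable column, reusing the Customers/Sales names from the query above with invented rows, and SQLite standing in for MySQL (an assumption). A single NULL in the subquery makes NOT IN filter out every customer, while NOT EXISTS returns the customers without sales.

```python
import sqlite3

# SQLite as a stand-in for MySQL (assumption); rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Customer_ID INTEGER PRIMARY KEY);
    CREATE TABLE Sales (Cust_ID INTEGER);  -- nullable on purpose
    INSERT INTO Customers VALUES (1), (2), (3);
    INSERT INTO Sales VALUES (1), (NULL);
""")

# 2 NOT IN (1, NULL) evaluates to NULL, so no row ever passes the filter.
not_in = conn.execute("""
    SELECT c.Customer_ID FROM Customers c
    WHERE c.Customer_ID NOT IN (SELECT s.Cust_ID FROM Sales s)
    ORDER BY c.Customer_ID
""").fetchall()

# NOT EXISTS only checks for a matching row, so the NULL is harmless.
not_exists = conn.execute("""
    SELECT c.Customer_ID FROM Customers c
    WHERE NOT EXISTS (SELECT 1 FROM Sales s WHERE s.Cust_ID = c.Customer_ID)
    ORDER BY c.Customer_ID
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```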

`EXISTS` and `IN` give two different results in MySQL

You don't need the JOIN operation in the subquery of the EXISTS operator:

SELECT * FROM commodity c
WHERE EXISTS (SELECT c.*
FROM specifications s
WHERE s.id < 600 AND c.id = s.cid);

The two queries are now equivalent, provided that id is a NOT NULL column.

What's the difference between 'ANY' and 'EXISTS' in SQL Server

The two queries are quite different.

The first query returns either all rows or no rows, depending on whether the subquery returns any rows at all.

You intend a correlated subquery:

select code from account where exists (select 1 from store where store.account = account.code)

These should be equivalent.

What's the difference between 'not in' and 'not exists'?

I think they serve the same purpose.

not in can also take literal values, whereas not exists needs a subquery to compare the results with.
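The literal-list difference, sketched with hypothetical `products` and `excluded_statuses` tables in SQLite (standing in for MySQL, an assumption): NOT IN takes the values inline, while NOT EXISTS has to get them from a subquery.

```python
import sqlite3

# SQLite as a stand-in for MySQL (assumption); tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE excluded_statuses (status TEXT NOT NULL);
    INSERT INTO products (status) VALUES ('active'), ('retired'), ('draft');
    INSERT INTO excluded_statuses VALUES ('retired'), ('draft');
""")

# NOT IN accepts a literal list directly.
not_in_literals = conn.execute("""
    SELECT id, status FROM products
    WHERE status NOT IN ('retired', 'draft')
    ORDER BY id
""").fetchall()

# NOT EXISTS always needs a subquery; here the same values come from a table.
not_exists_subquery = conn.execute("""
    SELECT p.id, p.status FROM products p
    WHERE NOT EXISTS (SELECT 1 FROM excluded_statuses x WHERE x.status = p.status)
    ORDER BY p.id
""").fetchall()

print(not_in_literals)      # [(1, 'active')]
print(not_exists_subquery)  # [(1, 'active')]
```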

EDIT: not exists can be a good choice because it can be correlated with the outer query and can make use of an index if the criteria use an indexed column.

EDIT2: See this question as well.

EDIT3: Let me take the above things back.

See this link. I think it all depends on how the database translates the query, and on the database, indexes, etc.

Mysql Exists vs IN -- correlated subquery vs subquery?

This is an RDBMS-agnostic answer, but it may help nonetheless. In my understanding, the correlated (aka dependent) subquery is perhaps the culprit most often falsely accused of causing bad performance.

The problem (as it is most often described) is that it processes the inner query for every row of the outer query. Therefore, if the outer query returns 1,000 rows, and the inner query returns 10,000, then your query has to slog through 10,000,000 rows (outer×inner) to produce a result. Compared to the 11,000 rows (outer+inner) from a non-correlated query over the same result sets, that ain't good.

However, this is just the worst-case scenario. In many cases, the DBMS will be able to exploit indexes to drastically reduce the row count. Even if only the inner query can use an index, each 10,000-row scan becomes ~13 seeks (log₂ 10,000 ≈ 13), which drops the total down to 13,000 (1,000 × 13).
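The row counts above can be reproduced with a back-of-the-envelope model. The function names are hypothetical, and the index-seek cost is assumed to be about log₂ of the inner row count, as in the answer; real B-tree indexes have a much higher fanout, so actual seek costs in page reads are usually lower.

```python
import math


def nested_scan_cost(outer_rows: int, inner_rows: int) -> int:
    """Worst case: scan the whole inner set once per outer row."""
    return outer_rows * inner_rows


def uncorrelated_cost(outer_rows: int, inner_rows: int) -> int:
    """Non-correlated subquery: each set is read once."""
    return outer_rows + inner_rows


def indexed_inner_cost(outer_rows: int, inner_rows: int) -> int:
    """One binary-search-style seek (~log2(inner) steps) per outer row (assumption)."""
    return outer_rows * int(math.log2(inner_rows))


print(nested_scan_cost(1_000, 10_000))    # 10000000
print(uncorrelated_cost(1_000, 10_000))   # 11000
print(indexed_inner_cost(1_000, 10_000))  # 13000
```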

The exists operator can stop processing rows after the first, cutting down the query cost further, especially when most outer rows match at least one inner row.

In some rare cases, I have seen SQL Server 2008R2 optimise correlated subqueries to a merge join (which traverses both sets only once - best possible scenario) where a suitable index can be found in both inner and outer queries.

The real culprit for bad performance is not necessarily correlated subqueries, but nested scans.


