MySQL OR vs IN performance

The accepted answer doesn't explain the reason.

The following is quoted from High Performance MySQL, 3rd Edition:

In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists).
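To make the quoted complexity claim concrete, here is a minimal Python sketch. It is not MySQL's actual implementation; it only contrasts a linear scan (what a chain of OR comparisons amounts to) with a binary search over a sorted list. The function names and the comparison counter are illustrative assumptions.

```python
from bisect import bisect_left


def or_chain_matches(value, candidates):
    """Mimic `col = a OR col = b OR ...`: compare one by one, O(n)."""
    comparisons = 0
    for candidate in candidates:
        comparisons += 1
        if value == candidate:
            return True, comparisons
    return False, comparisons


def in_list_matches(value, sorted_candidates):
    """Mimic a sorted IN() list probed by binary search, O(log n)."""
    index = bisect_left(sorted_candidates, value)
    return index < len(sorted_candidates) and sorted_candidates[index] == value


# Hypothetical IN list of 1,000 even ids.
candidates = list(range(0, 2000, 2))
sorted_candidates = sorted(candidates)  # sorted once, reused for every row

# Worst case for the linear scan: the value is the last one in the list.
found_linear, comparisons = or_chain_matches(1998, candidates)
found_binary = in_list_matches(1998, sorted_candidates)
```

With 1,000 values the linear scan needs up to 1,000 comparisons per row, while the binary search needs about ten, which is why the gap only matters once the list gets long.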

MySQL query WHERE IN vs OR

Use IN.

IN will use an index.

OR on a single indexed column can (afaik) also use an index, but an OR across different columns often cannot use one efficiently.

Also, and this point is not to be sneezed at, the IN version:

  • uses less code
  • is easier to maintain
  • is easier to understand

For those reasons alone I would be prepared to accept a small performance cost to gain code quality, but you actually gain performance too.

MySQL Performance - IN Clause vs. Equals (=) for a Single Value

Neither of them really matters in the grand scheme of things. The network latency in communicating with the database will far outweigh either the count($object_ids) overhead or the = vs IN overhead. I would call this a case of premature optimization.

You should profile and load-test your application to learn where the real bottlenecks are.

Is there a performance difference between BETWEEN and IN with MySQL or in SQL in general?

BETWEEN should outperform IN in this case (but do measure and check execution plans, too!), especially as n grows and as long as statistics are still accurate. Let's assume:

  • m is the size of your table
  • n is the size of your range

Index can be used (n is tiny compared to m)

  • In theory, BETWEEN can be implemented with a single "range scan" (Oracle speak) on the primary key index, and then traverse at most n index leaf nodes. The complexity will be O(n + log m)

  • IN is usually implemented as a series (loop) of n "range scans" on the primary key index. With m being the size of the table, the complexity will be O(n * log m), which is always worse (though the difference is negligible for very small tables m or very small ranges n)

Index cannot be used (n is a significant portion of m)

In any case, you'll get a full table scan and evaluate the predicate on each row:

  • BETWEEN needs to evaluate two predicates: One for the lower and one for the upper bound. The complexity is O(m)

  • IN needs to evaluate at most n predicates. The complexity is O(m * n) ... which is again always worse, or perhaps O(m) if the database can optimise the IN list to be a hashmap, rather than a list of predicates.
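The full-scan case can be sketched by counting predicate evaluations in plain Python. This is only a model of the reasoning above, not how any particular database executes the query; the function names and the sizes m = 1000 and n = 100 are assumptions chosen for illustration.

```python
def scan_between(rows, low, high):
    """Full scan with BETWEEN: two bound checks per row, O(m)."""
    evaluations = 0
    matches = []
    for row in rows:
        evaluations += 2  # one check per bound
        if low <= row <= high:
            matches.append(row)
    return matches, evaluations


def scan_in_list(rows, values):
    """Full scan with IN as a list of predicates: up to n checks per row."""
    evaluations = 0
    matches = []
    for row in rows:
        for candidate in values:
            evaluations += 1
            if row == candidate:
                matches.append(row)
                break  # stop at the first matching predicate
    return matches, evaluations


def scan_in_hashed(rows, values):
    """Full scan with the IN list turned into a hash set: one probe per row."""
    lookup = set(values)
    matches = [row for row in rows if row in lookup]
    return matches, len(rows)


rows = list(range(1000))             # m = 1000 rows
in_values = list(range(300, 400))    # n = 100 values

between_matches, between_evals = scan_between(rows, 300, 399)
list_matches, list_evals = scan_in_list(rows, in_values)
hash_matches, hash_evals = scan_in_hashed(rows, in_values)
```

All three variants return the same rows; only the evaluation counts differ, growing roughly as 2m, m * n, and m respectively.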

SQL Performance UNION vs OR

Either the article you read used a bad example, or you misinterpreted their point.

select username from users where company = 'bbc' or company = 'itv';

This is equivalent to:

select username from users where company IN ('bbc', 'itv');

MySQL can use an index on company for this query just fine. There's no need to do any UNION.

The more tricky case is where you have an OR condition that involves two different columns.

select username from users where company = 'bbc' or city = 'London';

Suppose there's an index on company and a separate index on city. Given that MySQL usually uses only one index per table in a given query, which index should it use? If it uses the index on company, it would still have to do a table-scan to find rows where city is London. If it uses the index on city, it would have to do a table-scan for rows where company is bbc.

The UNION solution is for this type of case.

select username from users where company = 'bbc' 
union
select username from users where city = 'London';

Now each subquery can use the index for its search, and the results of the two subqueries are combined by the UNION.
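As a quick sanity check that the OR form and the UNION form return the same usernames, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows and index names are made up, and SQLite's optimizer differs from MySQL's, so this only demonstrates that the results match, not which indexes MySQL would pick.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (username TEXT, company TEXT, city TEXT);
    CREATE INDEX idx_users_company ON users (company);
    CREATE INDEX idx_users_city ON users (city);
    INSERT INTO users VALUES
        ('alice', 'bbc', 'London'),
        ('bob',   'bbc', 'Leeds'),
        ('carol', 'itv', 'London'),
        ('dave',  'sky', 'Cardiff');
""")

# OR across two different columns.
or_rows = conn.execute(
    "SELECT username FROM users WHERE company = 'bbc' OR city = 'London'"
).fetchall()

# The same search split into two single-column lookups.
union_rows = conn.execute(
    "SELECT username FROM users WHERE company = 'bbc' "
    "UNION "
    "SELECT username FROM users WHERE city = 'London'"
).fetchall()

or_names = sorted(name for (name,) in or_rows)
union_names = sorted(name for (name,) in union_rows)
```

Note that 'alice' matches both conditions but appears once in each result. The two forms agree here because usernames are unique in the sample data; UNION removes duplicates on the selected columns, so with repeated usernames the OR form could return more rows.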


An anonymous user proposed an edit to my answer above, but a moderator rejected the edit. It should have been a comment, not an edit. The claim of the proposed edit was that UNION has to sort the result set to eliminate duplicate rows. This makes the query run slower, and the index optimization is therefore a wash.

My response is that the indexes help to reduce the result set to a small number of rows before the UNION happens. UNION does in fact eliminate duplicates, but to do that it only has to sort the small result set. There might be cases where the WHERE clauses match a significant portion of the table, and sorting during UNION is as expensive as simply doing the table-scan. But it's more common for the result set to be reduced by the indexed searches, so the sorting is much less costly than the table-scan.

The difference depends on the data in the table, and the terms being searched. The only way to determine the best solution for a given query is to try both methods in the MySQL query profiler and compare their performance.

MySQL IN operator performance on (large?) number of values

Generally speaking, if the IN list gets too large (for some ill-defined value of 'too large' that is usually in the region of 100 or smaller), it becomes more efficient to use a join, creating a temporary table if need be to hold the numbers.
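The temporary-table approach might look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL (where the statement would be CREATE TEMPORARY TABLE); the table names items and wanted and the id list are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 1001)],
)

# Hypothetical list of ids that would otherwise go into a long IN (...).
wanted_ids = list(range(300, 900, 3))

# Load the ids into a temporary table with its own primary key index.
conn.execute("CREATE TEMP TABLE wanted (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted VALUES (?)", [(i,) for i in wanted_ids])

# Join instead of filtering with WHERE id IN (...).
joined = conn.execute(
    "SELECT items.id FROM items JOIN wanted ON wanted.id = items.id "
    "ORDER BY items.id"
).fetchall()
joined_ids = [row[0] for row in joined]
```

Whether this actually beats the IN list depends on the server version and the data, so it is worth timing both forms on the real table.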

If the numbers are a dense set (no gaps - which the sample data suggests), then you can do even better with WHERE id BETWEEN 300 AND 3000.

However, presumably there are gaps in the set, at which point it may be better to go with the list of valid values after all. If the gaps are relatively few in number, you could instead use:

WHERE id BETWEEN 300 AND 3000 AND id NOT BETWEEN 742 AND 836

Or whatever the gaps are.
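A small check, again using Python's sqlite3 module as a stand-in for MySQL, that the range-minus-gap predicate selects exactly the ids an explicit list would. The items table and its contents are assumptions for illustration; the bounds are the ones from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 4001)])

# The explicit list of valid ids: 300..3000 without the gap 742..836.
valid_ids = [i for i in range(300, 3001) if not 742 <= i <= 836]

# The same set expressed as one range minus one excluded range.
range_rows = conn.execute(
    "SELECT id FROM items "
    "WHERE id BETWEEN 300 AND 3000 AND id NOT BETWEEN 742 AND 836 "
    "ORDER BY id"
).fetchall()
range_ids = [row[0] for row in range_rows]
```

Both BETWEEN bounds are inclusive, so the excluded gap here covers 742 and 836 themselves.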

MySQL WHEN vs. WHERE in performance?

This is a bit long for a comment.

Your two queries are quite different. The first will affect only rows that have the three values for id. The second will affect all rows, setting the description to NULL for rows that have any other value for id. To be equivalent, the second query should be:

UPDATE category
SET description = (CASE id
WHEN 1 THEN 'good'
WHEN 2 THEN 'bad'
WHEN 3 THEN 'ugly'
ELSE description
END);

These two queries are still semantically different, although the effect on the data is the same. For instance, this version would call an update trigger on all rows, whereas the first version would only call it on rows that match the WHERE condition.
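The trigger difference can be observed with a small experiment. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, and assumes the first query restricts the update with WHERE id IN (1, 2, 3); the update_log table, the trigger name, and the sample rows are all made up for illustration.

```python
import sqlite3

CASE_EXPR = (
    "CASE id WHEN 1 THEN 'good' WHEN 2 THEN 'bad' "
    "WHEN 3 THEN 'ugly' ELSE description END"
)


def make_db():
    """Create a fresh category table with a trigger that logs every update."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE category (id INTEGER PRIMARY KEY, description TEXT);
        CREATE TABLE update_log (category_id INTEGER);
        CREATE TRIGGER log_update AFTER UPDATE ON category
        BEGIN
            INSERT INTO update_log VALUES (NEW.id);
        END;
        INSERT INTO category VALUES
            (1, 'a'), (2, 'b'), (3, 'c'), (4, 'keep'), (5, 'keep');
    """)
    return conn


def trigger_calls(conn):
    """Number of rows for which the update trigger fired."""
    return conn.execute("SELECT COUNT(*) FROM update_log").fetchone()[0]


# Version 1: only the three target rows are touched.
with_where = make_db()
with_where.execute(
    f"UPDATE category SET description = {CASE_EXPR} WHERE id IN (1, 2, 3)"
)

# Version 2: every row is updated, most of them to their own value.
without_where = make_db()
without_where.execute(f"UPDATE category SET description = {CASE_EXPR}")

same_data = (
    with_where.execute("SELECT * FROM category ORDER BY id").fetchall()
    == without_where.execute("SELECT * FROM category ORDER BY id").fetchall()
)
```

The resulting data is identical, but in this sketch the trigger fires only for the three matching rows in the WHERE version and for all five rows in the version without WHERE.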

You should use the WHERE condition if you care about the performance and maintainability of the query.


