MySQL OR vs IN performance
The accepted answer doesn't explain the reason.
The following is quoted from High Performance MySQL, 3rd Edition:
In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists).
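To make the quoted difference concrete, here is a minimal Python sketch that counts comparisons for a binary search over a sorted list versus a left-to-right scan. It only illustrates the two strategies; it is not MySQL's implementation, and the function names and sample values are made up for this example.

```python
def in_list_sorted(sorted_values, needle):
    """Binary-search membership test. Returns (found, loop iterations)."""
    lo, hi, steps = 0, len(sorted_values), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_values[mid] < needle:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_values) and sorted_values[lo] == needle
    return found, steps


def or_chain(values, needle):
    """Linear scan, like testing v = a OR v = b OR ... Returns (found, comparisons)."""
    steps = 0
    for v in values:
        steps += 1
        if v == needle:
            return True, steps
    return False, steps


values = list(range(0, 2000, 2))   # 1000 hypothetical list entries
sorted_values = sorted(values)     # sorted once, before any lookups

# Worst case for the linear scan: the needle is the last entry.
found_bin, steps_bin = in_list_sorted(sorted_values, 1998)
found_lin, steps_lin = or_chain(values, 1998)
print(steps_bin, steps_lin)
```

With 1000 entries the binary search needs about ten iterations, while the scan can need up to 1000 comparisons.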
MYSQL query WHERE IN vs OR
Use IN.
IN will use an index.
OR will (afaik) not use an index.
Also, and this point is not to be sneezed at, the IN version:
- uses less code
- is easier to maintain
- is easier to understand
For those reasons alone I would be prepared to suffer a small performance hit to gain code quality, but you actually gain performance too.
MySQL Performance - IN Clause vs. Equals (=) for a Single Value
Neither of them really matters in the big scheme of things. The network latency in communicating with the database will far outweigh either the count($object_ids) overhead or the = vs IN overhead. I would call this a case of premature optimization.
You should profile and load-test your application to learn where the real bottlenecks are.
Is there a performance difference between BETWEEN and IN with MySQL or in SQL in general?
BETWEEN should outperform IN in this case (but do measure and check execution plans, too!), especially as n grows and as long as statistics are still accurate. Let's assume:
- m is the size of your table
- n is the size of your range
Index can be used (n is tiny compared to m)
In theory:
- BETWEEN can be implemented with a single "range scan" (Oracle speak) on the primary key index, which then traverses at most n index leaf nodes. The complexity will be O(n + log m).
- IN is usually implemented as a series (loop) of n "range scans" on the primary key index. With m being the size of the table, the complexity will always be O(n * log m), which is always worse (negligible for very small tables m or very small ranges n).
Index cannot be used (n is a significant portion of m)
In any case, you'll get a full table scan and evaluate the predicate on each row:
- BETWEEN needs to evaluate two predicates: one for the lower and one for the upper bound. The complexity is O(m).
- IN needs to evaluate at most n predicates. The complexity is O(m * n), which is again always worse, or perhaps O(m) if the database can optimise the IN list to be a hashmap, rather than a list of predicates.
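As a rough worked example of the complexity estimates above, the following Python sketch plugs sample sizes into the four formulas. The function names and the values of m and n are assumptions chosen for illustration, constants are ignored, and real optimizers may behave differently, so treat the numbers as orders of magnitude only.

```python
import math


def between_index_cost(n, m):
    # One index descent (log m) plus a walk over n leaf entries.
    return math.log2(m) + n


def in_index_cost(n, m):
    # n separate index descents, each costing about log m.
    return n * math.log2(m)


def between_scan_cost(m):
    # Full scan, two bound checks per row.
    return 2 * m


def in_scan_cost(n, m):
    # Full scan, up to n comparisons per row (no hashing of the list).
    return n * m


m, n = 1_000_000, 1_000   # hypothetical table size and range size

print(round(between_index_cost(n, m)), round(in_index_cost(n, m)))
print(between_scan_cost(m), in_scan_cost(n, m))
```

For these sizes the indexed IN estimate is roughly twenty times the indexed BETWEEN estimate, which is why the answer still recommends checking the actual execution plan.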
SQL Performance UNION vs OR
Either the article you read used a bad example, or you misinterpreted their point.
select username from users where company = 'bbc' or company = 'itv';
This is equivalent to:
select username from users where company IN ('bbc', 'itv');
MySQL can use an index on company for this query just fine. There's no need to do any UNION.
The more tricky case is where you have an OR condition that involves two different columns.
select username from users where company = 'bbc' or city = 'London';
Suppose there's an index on company and a separate index on city. Given that MySQL usually uses only one index per table in a given query, which index should it use? If it uses the index on company, it would still have to do a table-scan to find rows where city is London. If it uses the index on city, it would have to do a table-scan for rows where company is bbc.
The UNION solution is for this type of case.
select username from users where company = 'bbc'
union
select username from users where city = 'London';
Now each sub-query can use the index for its search, and the results of the sub-queries are combined by the UNION.
An anonymous user proposed an edit to my answer above, but a moderator rejected the edit. It should have been a comment, not an edit. The claim of the proposed edit was that UNION has to sort the result set to eliminate duplicate rows. This makes the query run slower, and the index optimization is therefore a wash.
My response is that the indexes help to reduce the result set to a small number of rows before the UNION happens. UNION does in fact eliminate duplicates, but to do that it only has to sort the small result set. There might be cases where the WHERE clauses match a significant portion of the table, and sorting during UNION is as expensive as simply doing the table-scan. But it's more common for the result set to be reduced by the indexed searches, so the sorting is much less costly than the table-scan.
The difference depends on the data in the table, and the terms being searched. The only way to determine the best solution for a given query is to try both methods in the MySQL query profiler and compare their performance.
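Before comparing timings, it helps to confirm that the two forms return the same rows. The sketch below does that with Python's sqlite3 module as a stand-in for MySQL. The sample rows and index names are invented for illustration, and it says nothing about MySQL's performance, which you would check with EXPLAIN and profiling on the real server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (username TEXT, company TEXT, city TEXT);
    CREATE INDEX idx_company ON users (company);
    CREATE INDEX idx_city ON users (city);
    INSERT INTO users VALUES
        ('alice', 'bbc', 'London'),
        ('bob',   'bbc', 'Leeds'),
        ('carol', 'itv', 'London'),
        ('dave',  'sky', 'Cardiff');
""")

# OR across two different columns.
or_rows = conn.execute(
    "SELECT username FROM users WHERE company = 'bbc' OR city = 'London'"
).fetchall()

# The UNION rewrite: one indexed lookup per branch, duplicates removed.
union_rows = conn.execute(
    "SELECT username FROM users WHERE company = 'bbc' "
    "UNION "
    "SELECT username FROM users WHERE city = 'London'"
).fetchall()

same_result = sorted(or_rows) == sorted(union_rows)
print(same_result)
```

Note that 'alice' matches both conditions and still appears once in each result. UNION removes duplicates on the selected columns, so if two different rows shared a username the two forms could differ; selecting a unique key alongside the username avoids that.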
MySQL IN operator performance on (large?) number of values
Generally speaking, if the IN list gets too large (for some ill-defined value of 'too large' that is usually in the region of 100 or smaller), it becomes more efficient to use a join, creating a temporary table if need be to hold the numbers.
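A minimal sketch of the temporary-table approach, run through Python's sqlite3 module as a stand-in for MySQL. The table names items and wanted and the id values are hypothetical; on MySQL you would issue the same CREATE TEMPORARY TABLE, INSERT and JOIN statements through your own client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 1001)],
)

# Hypothetical long list of ids that would otherwise go into IN (...).
wanted_ids = list(range(300, 400, 3))

# Load the ids into a temporary table, then join instead of using IN.
conn.execute("CREATE TEMPORARY TABLE wanted (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted VALUES (?)", [(i,) for i in wanted_ids])

joined = conn.execute(
    "SELECT items.id FROM items "
    "JOIN wanted ON wanted.id = items.id "
    "ORDER BY items.id"
).fetchall()
joined_ids = [row[0] for row in joined]
print(len(joined_ids))
```

Whether the join actually beats the IN list depends on the list size and the server, so it is worth measuring both.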
If the numbers are a dense set (no gaps, which the sample data suggests), then you can do even better with WHERE id BETWEEN 300 AND 3000.
However, presumably there are gaps in the set, at which point it may be better to go with the list of valid values after all. If the gaps are relatively few in number, you could instead use:
WHERE id BETWEEN 300 AND 3000 AND id NOT BETWEEN 742 AND 836
or whatever the gaps are.
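The following sketch checks that the range-with-gap predicate selects exactly the ids an explicit list would. It uses Python's sqlite3 module as a stand-in for MySQL, and the table t and its contents are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 3501)])

# The explicit list of valid ids: 300..3000 with one gap at 742..836.
valid_ids = [i for i in range(300, 3001) if not 742 <= i <= 836]

# The same set expressed as a range minus the gap.
range_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM t "
        "WHERE id BETWEEN 300 AND 3000 AND id NOT BETWEEN 742 AND 836 "
        "ORDER BY id"
    )
]
print(range_ids == valid_ids)
```

Both bounds of BETWEEN are inclusive, so the gap removes ids 742 through 836 themselves.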
MySQL WHEN vs. WHERE in performance?
This is a bit long for a comment.
Your two queries are quite different. The first will affect only rows that have the three values for id. The second will affect all rows, setting the description to NULL for rows that have any other value for id. To be equivalent, the second query should be:
UPDATE category
SET description = (CASE id
WHEN 1 THEN 'good'
WHEN 2 THEN 'bad'
WHEN 3 THEN 'ugly'
ELSE description
END);
These two queries are still semantically different, although the effect on the data is the same. For instance, this version would call an update trigger on all rows, whereas the first version would only call it on rows that match the WHERE condition.
You should use the WHERE condition if you care about performance and maintainability of the query.
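The trigger difference can be observed with a small experiment. The sketch below uses Python's sqlite3 module as a stand-in for MySQL and logs every firing of an AFTER UPDATE trigger. The WHERE version is a guess at the asker's first query, which is not shown here, and the table contents, trigger name and log table are invented for illustration.

```python
import sqlite3


def run_update(update_sql):
    """Run one UPDATE on a fresh table; return (trigger firings, final rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE category (id INTEGER PRIMARY KEY, description TEXT);
        INSERT INTO category VALUES (1,'a'), (2,'b'), (3,'c'), (4,'d'), (5,'e');
        CREATE TABLE update_log (id INTEGER);
        CREATE TRIGGER category_au AFTER UPDATE ON category
        BEGIN
            INSERT INTO update_log VALUES (NEW.id);
        END;
    """)
    conn.execute(update_sql)
    fired = conn.execute("SELECT COUNT(*) FROM update_log").fetchone()[0]
    rows = conn.execute(
        "SELECT id, description FROM category ORDER BY id"
    ).fetchall()
    conn.close()
    return fired, rows


# Assumed shape of the first query: the update is restricted by WHERE.
WHERE_VERSION = (
    "UPDATE category SET description = CASE id "
    "WHEN 1 THEN 'good' WHEN 2 THEN 'bad' WHEN 3 THEN 'ugly' END "
    "WHERE id IN (1, 2, 3)"
)

# The corrected second query: no WHERE, ELSE keeps the old value.
CASE_VERSION = (
    "UPDATE category SET description = CASE id "
    "WHEN 1 THEN 'good' WHEN 2 THEN 'bad' WHEN 3 THEN 'ugly' "
    "ELSE description END"
)

fired_where, rows_where = run_update(WHERE_VERSION)
fired_case, rows_case = run_update(CASE_VERSION)
print(fired_where, fired_case, rows_where == rows_case)
```

In this SQLite run the final data is identical, but the trigger fires only for the matching rows with WHERE and for every row without it.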