Is There a Performance Difference Between BETWEEN and IN with MySQL or in SQL in General?

Is there a performance difference between BETWEEN and IN with MySQL or in SQL in general?

BETWEEN should outperform IN in this case (but do measure and check execution plans, too!), especially as n grows, and provided the statistics are still accurate. Let's assume:

  • m is the size of your table
  • n is the size of your range

Index can be used (n is tiny compared to m)

  • In theory, BETWEEN can be implemented with a single "range scan" (Oracle speak) on the primary key index, and then traverse at most n index leaf nodes. The complexity will be O(n + log m)

  • IN is usually implemented as a series (loop) of n "range scans" on the primary key index. With m being the size of the table, the complexity will be O(n * log m), which is always worse (though negligibly so for very small tables m or very small ranges n)
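As a rough illustration of the two access patterns above, here is a minimal Python sketch. A sorted list stands in for the primary key index and `bisect` stands in for the index descent; the function names and the data are made up for the example, and a real engine may well do something smarter.

```python
import bisect

def between_lookup(index, lo, hi):
    """One descent to find the lower bound, then walk neighbouring entries."""
    start = bisect.bisect_left(index, lo)            # O(log m) descent
    end = start
    while end < len(index) and index[end] <= hi:     # O(n) walk along the "leaves"
        end += 1
    return index[start:end]

def in_lookup(index, values):
    """One independent descent per listed value."""
    found = []
    for v in values:                                 # n iterations
        pos = bisect.bisect_left(index, v)           # O(log m) each time
        if pos < len(index) and index[pos] == v:
            found.append(v)
    return found

# Hypothetical index: sorted even keys only.
index = list(range(0, 1_000_000, 2))

between_result = between_lookup(index, 100, 120)
in_result = in_lookup(index, range(100, 121))
print(between_result == in_result)
```

Both functions return the same keys; the difference is only in how many descents are needed to get there.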

Index cannot be used (n is a significant portion of m)

In any case, you'll get a full table scan and evaluate the predicate on each row:

  • BETWEEN needs to evaluate two predicates: One for the lower and one for the upper bound. The complexity is O(m)

  • IN needs to evaluate at most n predicates. The complexity is O(m * n) ... which is again always worse, or perhaps O(m) if the database can optimise the IN list to be a hashmap, rather than a list of predicates.
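The per-row cost in the full scan case can be sketched in Python like this. The row values are hypothetical, and whether a given database really turns an IN list into a hash lookup is something you would have to check for your engine.

```python
rows = list(range(1, 1001))        # hypothetical column values, m = 1000
wanted_list = list(range(1, 7))    # IN (1, 2, 3, 4, 5, 6) as a plain list, n = 6
wanted_set = set(wanted_list)      # the same values as a hash set

# BETWEEN: two comparisons per row, O(m) overall.
between_count = sum(1 for x in rows if 1 <= x <= 6)

# IN as a list of predicates: up to n comparisons per row, O(m * n).
in_list_count = sum(1 for x in rows if any(x == v for v in wanted_list))

# IN as a hash lookup: one probe per row on average, O(m).
in_set_count = sum(1 for x in rows if x in wanted_set)

print(between_count, in_list_count, in_set_count)
```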

SQL: BETWEEN and IN (which is faster)

  • If your ids are always consecutive you should use BETWEEN.
  • If your ids may or may not be consecutive then use IN.

Performance shouldn't really be the deciding factor here. Having said that, BETWEEN seems to be faster in all examples that I have tested. For example:

Without indexes, checking a table with a million rows where every row has x = 1:


SELECT COUNT(*) FROM table1 WHERE x IN (1, 2, 3, 4, 5, 6);
Time taken: 0.55s

SELECT COUNT(*) FROM table1 WHERE x BETWEEN 1 AND 6;
Time taken: 0.54s

Without indexes, checking a table with a million rows where x has unique values:


SELECT COUNT(*) FROM table1 WHERE x IN (1, 2, 3, 4, 5, 6);
Time taken: 0.65s

SELECT COUNT(*) FROM table1 WHERE x BETWEEN 1 AND 6;
Time taken: 0.36s

A more realistic example though is that the id column is unique and indexed. When you do this the performance of both queries becomes close to instant.


SELECT COUNT(*) FROM table2 WHERE x IN (1, 2, 3, 4, 5, 6);
Time taken: 0.00s

SELECT COUNT(*) FROM table2 WHERE x BETWEEN 1 AND 6;
Time taken: 0.00s

So I'd say concentrate on writing a clear SQL statement rather than worrying about minor differences in execution speed. And make sure that the table is correctly indexed because that will make the biggest difference.

Note: Tests were performed on SQL Server Express 2008 R2. Results may be different on other systems.
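If you want to repeat this kind of check without a database server, a small sketch using Python's built-in sqlite3 module could look like the following. This is SQLite, not SQL Server or MySQL, so the plans and timings will differ from the numbers above; the index name and the row count are arbitrary choices for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (x INTEGER)")
conn.executemany(
    "INSERT INTO table2 (x) VALUES (?)",
    ((i,) for i in range(1, 100_001)),
)
# Hypothetical index name; the point is only that x is unique and indexed.
conn.execute("CREATE UNIQUE INDEX idx_table2_x ON table2 (x)")

queries = {
    "IN": "SELECT COUNT(*) FROM table2 WHERE x IN (1, 2, 3, 4, 5, 6)",
    "BETWEEN": "SELECT COUNT(*) FROM table2 WHERE x BETWEEN 1 AND 6",
}

counts = {}
plans = {}
for name, sql in queries.items():
    counts[name] = conn.execute(sql).fetchone()[0]
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    plans[name] = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(name, counts[name], plans[name])
```

Reading the printed plan descriptions tells you whether each form is using the index on your build.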

Performance differences between equal (=) and IN with one literal value

There is no difference between those two statements, and the optimiser will transform the IN to the = when IN has just one element in it.

That said, when you have a question like this, just run both statements, look at their execution plans, and compare. Here you won't find any differences.

After a big search online, I found a document on SQL to support this (I assume it applies to all DBMS):

If there is only one value inside the parenthesis, this commend [sic] is equivalent to,

WHERE "column_name" = 'value1'

Here is the execution plan of both queries in Oracle (most DBMS will process this the same):

EXPLAIN PLAN FOR
select * from dim_employees t
where t.identity_number = '123456789';

Plan hash value: 2312174735
-----------------------------------------------------
| Id  | Operation                   | Name          |
-----------------------------------------------------
|   0 | SELECT STATEMENT            |               |
|   1 |  TABLE ACCESS BY INDEX ROWID| DIM_EMPLOYEES |
|   2 |   INDEX UNIQUE SCAN         | SYS_C0029838  |
-----------------------------------------------------

And for IN() :

EXPLAIN PLAN FOR
select * from dim_employees t
where t.identity_number in('123456789');

Plan hash value: 2312174735
-----------------------------------------------------
| Id  | Operation                   | Name          |
-----------------------------------------------------
|   0 | SELECT STATEMENT            |               |
|   1 |  TABLE ACCESS BY INDEX ROWID| DIM_EMPLOYEES |
|   2 |   INDEX UNIQUE SCAN         | SYS_C0029838  |
-----------------------------------------------------

As you can see, both are identical. This is on an indexed column. Same goes for an unindexed column (just full table scan).
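The same comparison can be tried in any database you have to hand. Below is a sketch using Python's sqlite3 module, so it shows SQLite's behaviour rather than Oracle's; the table layout and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only identity_number comes from the queries above.
conn.execute(
    "CREATE TABLE dim_employees (identity_number TEXT PRIMARY KEY, name TEXT)"
)
conn.executemany(
    "INSERT INTO dim_employees VALUES (?, ?)",
    [("123456789", "hypothetical employee A"),
     ("987654321", "hypothetical employee B")],
)

eq_sql = "SELECT * FROM dim_employees WHERE identity_number = '123456789'"
in_sql = "SELECT * FROM dim_employees WHERE identity_number IN ('123456789')"

eq_rows = conn.execute(eq_sql).fetchall()
in_rows = conn.execute(in_sql).fetchall()

# Print both plans side by side so they can be compared by eye.
for label, sql in (("=", eq_sql), ("IN", in_sql)):
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(label, plan)
```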

MySQL performance difference between JOIN and IN

In general, a query using a join will perform better than an equivalent query using IN (...), because the former can take advantage of indexes while the latter can't; the entire IN list must be scanned for each row which might be returned.

(Do note that some database engines perform better than others in this case; for example, SQL Server can produce equivalent performance for both types of queries.)

You can see what the MySQL query optimizer intends to do with a given SELECT query by prepending EXPLAIN to the query and running it. This will give you, among other things, a count of rows the engine will have to examine for each step in a query; multiply these counts to get the overall number of rows the engine will have to visit, which can serve as a rough estimate of likely performance.
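The multiplication itself is trivial, but a tiny Python sketch makes the idea concrete. The row estimates here are invented; in practice you would read them from the rows column of your own EXPLAIN output.

```python
import math

# Hypothetical "rows" estimates, one per line of EXPLAIN output for a three-table join.
estimated_rows_per_step = [1200, 3, 1]

# Rough upper bound on the rows the engine has to visit.
rows_visited = math.prod(estimated_rows_per_step)
print(rows_visited)
```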

IN vs OR in the SQL WHERE clause

I assume you want to know the performance difference between the following:

WHERE foo IN ('a', 'b', 'c')
WHERE foo = 'a' OR foo = 'b' OR foo = 'c'

According to the MySQL manual, if the values are all constants, IN sorts the list and then uses a binary search. I would imagine that OR evaluates them one by one in no particular order. So IN is faster in some circumstances.
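To make the difference between the two strategies concrete, here is a Python sketch of each. It only mimics the idea (sorted constants plus binary search versus a sequential chain of equality tests); it is not how MySQL is actually implemented, and the function names are made up.

```python
import bisect

def in_sorted(values):
    """Sort the constants once, then binary-search them for every row."""
    ordered = sorted(values)
    def matches(x):
        pos = bisect.bisect_left(ordered, x)      # O(log n) per row
        return pos < len(ordered) and ordered[pos] == x
    return matches

def or_chain(values):
    """Test the constants one by one, like a chain of OR-ed equalities."""
    def matches(x):
        for v in values:                          # up to n comparisons per row
            if x == v:
                return True
        return False
    return matches

constants = list(range(1000, 10000, 1000))        # 1000, 2000, ..., 9000
column = range(1, 20001)                          # hypothetical unindexed column

in_match = in_sorted(constants)
or_match = or_chain(constants)
in_count = sum(1 for x in column if in_match(x))
or_count = sum(1 for x in column if or_match(x))
print(in_count, or_count)
```

Both predicates select the same rows; only the work done per row differs, and that gap widens as the list of constants grows.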

The best way to know is to profile both on your database with your specific data to see which is faster.

I tried both on a MySQL table with 1,000,000 rows. When the column is indexed there is no discernible difference in performance - both are nearly instant. When the column is not indexed I got these results:

SELECT COUNT(*) FROM t_inner WHERE val IN (1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000);
1 row fetched in 0.0032 (1.2679 seconds)

SELECT COUNT(*) FROM t_inner WHERE val = 1000 OR val = 2000 OR val = 3000 OR val = 4000 OR val = 5000 OR val = 6000 OR val = 7000 OR val = 8000 OR val = 9000;
1 row fetched in 0.0026 (1.7385 seconds)

So in this case the method using OR is about 37% slower (1.7385 s versus 1.2679 s). Adding more terms makes the difference larger. Results may vary on other databases and on other data.

Performance difference using IF function vs AND in MySQL WHERE clause

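Add an index on column1 and use UNION ALL to combine the two conditions. The two branches cannot return the same row (column1 is either > 500 or <= 500), so no duplicate elimination is needed.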

Even better might be to have a composite index on (column1, created), so both parts of the condition can be done entirely within the index.

SELECT *
FROM table
WHERE column1 > 500 AND created > NOW() - INTERVAL 365 DAY

UNION ALL

SELECT *
FROM table
WHERE column1 <= 500 AND created > NOW() - INTERVAL 750 DAY
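To see that splitting the condition into two branches selects the same rows, here is a small Python sketch. It assumes the original single-query form picked the cutoff with an IF on column1 (365 days when column1 > 500, otherwise 750 days); that form is not shown above, so treat it as an assumption. The rows and the fixed "now" are made up.

```python
from datetime import datetime, timedelta

now = datetime(2024, 1, 1)   # fixed stand-in for NOW() so the example is reproducible

# Hypothetical rows: (column1, created)
rows = [
    (600, now - timedelta(days=100)),   # > 500, within 365 days   -> kept
    (600, now - timedelta(days=400)),   # > 500, older than 365    -> dropped
    (100, now - timedelta(days=400)),   # <= 500, within 750 days  -> kept
    (100, now - timedelta(days=800)),   # <= 500, older than 750   -> dropped
]

def single_condition(row):
    """Assumed original form: the cutoff depends on column1."""
    column1, created = row
    days = 365 if column1 > 500 else 750
    return created > now - timedelta(days=days)

# The two branches of the UNION ALL query.
branch_a = [r for r in rows if r[0] > 500 and r[1] > now - timedelta(days=365)]
branch_b = [r for r in rows if r[0] <= 500 and r[1] > now - timedelta(days=750)]
union_all = branch_a + branch_b

if_form = [r for r in rows if single_condition(r)]
print(sorted(union_all) == sorted(if_form))
```

Because each branch has a plain range condition on column1, an index on (column1, created) can serve it, which is harder when the cutoff is hidden inside a function call.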

Difference between performance of the two sql queries?

Picking up from your comment: "I just want to know if a starts with match is diff from an ends with match".

Firstly - remember that we are not looking for the best algorithm to match a string. We are looking for the best algorithm to find all matching strings in a set of N rows. We want to do better than 'Do algorithm X, N times'.

If fieldname is NOT indexed, then there will be very little difference in performance between the two queries - the SQL engine is just going to do a match on the first 3 or last 3 bytes of the string, which is simply a matter of offsetting to the right memory location.

If the fieldname IS indexed, there will be a huge difference in performance between the two searches, because rather than examining all N rows, we can discard most of the data.

For example, for the "xyz%" version, we can use a binary search.

We start at the middle element, which happens to be 'peter'. We can immediately discard everything before 'peter' and get the middle element on the remainder - 'samantha', and so on, until we find the entries starting 'xyz'.

With the "%xyz" version we cannot do this: ANY string could potentially match at the end, so we need to look at every string.
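The contrast can be sketched in Python with a sorted list standing in for the index. The names are sample data, and the upper bound trick (appending the highest code point to the prefix) is just one simple way to delimit the range for the example.

```python
import bisect

# Hypothetical indexed values, kept in sorted order like an index.
names = sorted(["alice", "peter", "samantha", "xyzal", "xyzzy", "zoe"])

def prefix_search(sorted_names, prefix):
    """'xyz%': two binary searches bound the matching range."""
    lo = bisect.bisect_left(sorted_names, prefix)
    # Strings starting with the prefix sort below prefix + the highest code point.
    hi = bisect.bisect_left(sorted_names, prefix + "\U0010ffff")
    return sorted_names[lo:hi]

def suffix_search(sorted_names, suffix):
    """'%zy': sort order does not help, so every entry is examined."""
    return [n for n in sorted_names if n.endswith(suffix)]

prefix_matches = prefix_search(names, "xyz")
suffix_matches = suffix_search(names, "zy")
print(prefix_matches, suffix_matches)
```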

As the size of our table expands, the difference between these two approaches becomes large.

The solution of creating a field/index for the reverse of fieldname allows us to use the binary search technique again. (In some databases it is actually possible to do this without creating an extra field, by using particular index types, virtual columns, etc.)
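Here is a minimal Python sketch of the reversed-field idea, again with a sorted list playing the role of the index. The sample names and the `ends_with` helper are made up for illustration.

```python
import bisect

names = ["alice", "peter", "samantha", "abcxyz", "qxyz", "zoe"]

# Hypothetical extra "column": every value reversed, kept sorted like an index.
reversed_index = sorted(n[::-1] for n in names)

def ends_with(suffix):
    """An ends-with search becomes a starts-with search on the reversed values."""
    prefix = suffix[::-1]
    lo = bisect.bisect_left(reversed_index, prefix)
    hi = bisect.bisect_left(reversed_index, prefix + "\U0010ffff")
    # Reverse the hits back to their original form.
    return sorted(r[::-1] for r in reversed_index[lo:hi])

matches = ends_with("xyz")
print(matches)
```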

This is simplified a lot - for detail on the actual implementation of database indexes, look into B-Tree and B*Tree indexes.

Difference between local MySQL query and production server query execution times

Sorry for taking your time guys... It was a rookie mistake in which I didn't read the error messages when importing the database.

When I generated the mysqldump, some table names were incorrectly generated with lowercase-only letters and that caused an error when importing.

Since the index definitions all came after the erroneous statements, they never got executed, so I was basically doing non-indexed full table scans, and that's why it took like forever to load results.

I corrected my SQL file and created the database again and it worked like a charm. Sorry for wasting your time guys.

PS: I actually boosted the server to 16 GB of RAM and 6 vCPUs and it made no difference whatsoever.

SQL Performance UNION vs OR

Either the article you read used a bad example, or you misinterpreted their point.

select username from users where company = 'bbc' or company = 'itv';

This is equivalent to:

select username from users where company IN ('bbc', 'itv');

MySQL can use an index on company for this query just fine. There's no need to do any UNION.

The more tricky case is where you have an OR condition that involves two different columns.

select username from users where company = 'bbc' or city = 'London';

Suppose there's an index on company and a separate index on city. Given that MySQL usually uses only one index per table in a given query, which index should it use? If it uses the index on company, it would still have to do a table-scan to find rows where city is London. If it uses the index on city, it would have to do a table-scan for rows where company is bbc.

The UNION solution is for this type of case.

select username from users where company = 'bbc' 
union
select username from users where city = 'London';

Now each sub-query can use the index for its search, and the results of the subquery are combined by the UNION.
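A quick way to convince yourself that the two forms select the same users is to run both on a toy table. The sketch below uses Python's sqlite3 module rather than MySQL, with invented rows and index names; it checks the results only and says nothing about MySQL's plan choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, company TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("ann", "bbc", "London"),    # matches both conditions
        ("bob", "bbc", "Leeds"),     # matches company only
        ("cat", "itv", "London"),    # matches city only
        ("dan", "itv", "Glasgow"),   # matches neither
    ],
)
# Hypothetical index names: one index per column, as in the scenario above.
conn.execute("CREATE INDEX idx_users_company ON users (company)")
conn.execute("CREATE INDEX idx_users_city ON users (city)")

or_rows = conn.execute(
    "SELECT username FROM users WHERE company = 'bbc' OR city = 'London'"
).fetchall()
union_rows = conn.execute(
    "SELECT username FROM users WHERE company = 'bbc' "
    "UNION "
    "SELECT username FROM users WHERE city = 'London'"
).fetchall()

or_names = sorted(r[0] for r in or_rows)
union_names = sorted(r[0] for r in union_rows)
print(or_names, union_names)
```

Note that 'ann' satisfies both conditions but appears only once in the UNION result, because UNION removes duplicates. The results match here because usernames are unique in the sample data.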


An anonymous user proposed an edit to my answer above, but a moderator rejected the edit. It should have been a comment, not an edit. The claim of the proposed edit was that UNION has to sort the result set to eliminate duplicate rows. This makes the query run slower, and the index optimization is therefore a wash.

My response is that the indexes help to reduce the result set to a small number of rows before the UNION happens. UNION does in fact eliminate duplicates, but to do that it only has to sort the small result set. There might be cases where the WHERE clauses match a significant portion of the table, and sorting during UNION is as expensive as simply doing the table-scan. But it's more common for the result set to be reduced by the indexed searches, so the sorting is much less costly than the table-scan.

The difference depends on the data in the table, and the terms being searched. The only way to determine the best solution for a given query is to try both methods in the MySQL query profiler and compare their performance.


