Why is UNION faster than an OR statement
The reason is that using OR in a query will often cause the Query Optimizer to abandon index seeks and revert to scans. If you look at the execution plans for your two queries, you'll most likely see scans where you are using OR and seeks where you are using UNION. Without seeing your query, it's not really possible to suggest how you might restructure the OR condition. But you may find that inserting the rows into a temporary table and joining onto it yields a positive result.
Also, it is generally best to use UNION ALL rather than UNION if you want all results, as you avoid the cost of matching and removing duplicate rows.
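A small sketch (SQLite via Python's sqlite3 module; tables `a` and `b` are hypothetical) showing that UNION behaves like UNION ALL followed by a DISTINCT. That extra duplicate-removal step is the row-matching cost that UNION ALL skips.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (v INTEGER)")
conn.execute("CREATE TABLE b (v INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (2,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,)])

# UNION ALL just appends the second result set to the first.
all_rows = conn.execute("SELECT v FROM a UNION ALL SELECT v FROM b").fetchall()

# UNION additionally removes every duplicate row, even within one branch.
union_rows = conn.execute("SELECT v FROM a UNION SELECT v FROM b").fetchall()

# The same result, written as UNION ALL plus an explicit DISTINCT.
distinct_rows = conn.execute(
    "SELECT DISTINCT v FROM (SELECT v FROM a UNION ALL SELECT v FROM b)"
).fetchall()

print(len(all_rows), sorted(union_rows), sorted(distinct_rows))
```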
Why using an OR condition instead of UNION caused a performance issue
Using UNION ALL to replace OR is actually one of the well-known optimization tricks. The best reference and explanation is in this article: Index Union.
The gist of it is that OR predicates that could be satisfied by two index seeks cannot be reliably detected by the query optimizer (the reason being that it is impossible to predict whether the two sides of the OR produce disjoint sets). So when the same condition is expressed as a UNION ALL, the optimizer has no problem creating a plan that does two short seeks and unions the results. The important thing to realize is that a=1 or b=2 can be different from a=1 union all b=2, because the first query returns rows that satisfy both conditions once, while the latter returns them twice. When you write the query as UNION ALL, you are telling the compiler that you understand that and have no problem with it.
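To see the difference concretely, here is a small sketch using SQLite through Python's sqlite3 module (table `t` and its data are hypothetical). It also shows one common way to keep a UNION ALL rewrite equivalent to the OR: exclude the first branch's rows from the second branch. That exclusion is a standard rewrite, not something taken from the referenced article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.execute("CREATE INDEX idx_a ON t (a)")
conn.execute("CREATE INDEX idx_b ON t (b)")
# Row 3 satisfies both a = 1 and b = 2.
conn.executemany("INSERT INTO t (id, a, b) VALUES (?, ?, ?)",
                 [(1, 1, 0), (2, 0, 2), (3, 1, 2), (4, 0, 0)])

or_ids = [r[0] for r in conn.execute(
    "SELECT id FROM t WHERE a = 1 OR b = 2 ORDER BY id")]

# Naive rewrite: row 3 matches both branches, so it comes back twice.
naive_ids = sorted(r[0] for r in conn.execute(
    "SELECT id FROM t WHERE a = 1 "
    "UNION ALL "
    "SELECT id FROM t WHERE b = 2"))

# Excluding the first branch's rows keeps the two sets disjoint.
# "a IS NOT 1" is SQLite's null-safe inequality, so rows where a is NULL
# still reach the second branch.
disjoint_ids = sorted(r[0] for r in conn.execute(
    "SELECT id FROM t WHERE a = 1 "
    "UNION ALL "
    "SELECT id FROM t WHERE b = 2 AND a IS NOT 1"))

print(or_ids, naive_ids, disjoint_ids)
```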
For further reference see How to analyse SQL Server performance.
SQL Performance UNION vs OR
Either the article you read used a bad example, or you misinterpreted their point.
select username from users where company = 'bbc' or company = 'itv';
This is equivalent to:
select username from users where company IN ('bbc', 'itv');
MySQL can use an index on company for this query just fine. There's no need to do any UNION.
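A quick check of that equivalence, using SQLite through Python's sqlite3 module as a stand-in for MySQL (the sample rows and the index name `idx_company` are made up). Both spellings return the same users, and the query plan for the IN form searches the company index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, company TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_company ON users (company)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    ("alice", "bbc", "London"),
    ("bob", "itv", "Leeds"),
    ("carol", "sky", "London"),
    ("dave", "bbc", "Cardiff"),
])

or_rows = conn.execute(
    "SELECT username FROM users WHERE company = 'bbc' OR company = 'itv' "
    "ORDER BY username").fetchall()
in_rows = conn.execute(
    "SELECT username FROM users WHERE company IN ('bbc', 'itv') "
    "ORDER BY username").fetchall()

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = [row[-1] for row in conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT username FROM users WHERE company IN ('bbc', 'itv')")]
for step in plan:
    print(step)
print(or_rows == in_rows, in_rows)
```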
The more tricky case is where you have an OR condition that involves two different columns.
select username from users where company = 'bbc' or city = 'London';
Suppose there's an index on company and a separate index on city. Given that MySQL usually uses only one index per table in a given query, which index should it use? If it uses the index on company, it would still have to do a table-scan to find rows where city is London. If it uses the index on city, it would have to do a table-scan for rows where company is bbc.
The UNION solution is for this type of case.
select username from users where company = 'bbc'
union
select username from users where city = 'London';
Now each subquery can use its own index for the search, and the results of the subqueries are combined by the UNION.
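A sketch of the two-column case, again with SQLite via Python's sqlite3 module standing in for MySQL (sample data and index names are assumptions). The plan for the UNION form shows one index search per branch. Note that UNION also collapses two different users who happen to share a username, which the OR query would return separately, so the results are compared as sets here. SQLite can sometimes turn such an OR into a multi-index lookup on its own, so this illustrates the plan shape rather than timing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, company TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_company ON users (company)")
conn.execute("CREATE INDEX idx_city ON users (city)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    ("alice", "bbc", "London"),
    ("bob", "itv", "Leeds"),
    ("carol", "sky", "London"),
    ("dave", "bbc", "Cardiff"),
])

or_sql = ("SELECT username FROM users "
          "WHERE company = 'bbc' OR city = 'London'")
union_sql = ("SELECT username FROM users WHERE company = 'bbc' "
             "UNION "
             "SELECT username FROM users WHERE city = 'London'")

or_names = {r[0] for r in conn.execute(or_sql)}
union_names = {r[0] for r in conn.execute(union_sql)}

# Each branch of the UNION gets its own index search.
plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + union_sql)]
for step in plan:
    print(step)
print(or_names == union_names, sorted(union_names))
```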
An anonymous user proposed an edit to my answer above, but a moderator rejected the edit. It should have been a comment, not an edit. The claim of the proposed edit was that UNION has to sort the result set to eliminate duplicate rows. This makes the query run slower, and the index optimization is therefore a wash.
My response is that the indexes help to reduce the result set to a small number of rows before the UNION happens. UNION does in fact eliminate duplicates, but to do that it only has to sort the small result set. There might be cases where the WHERE clauses match a significant portion of the table, and sorting during UNION is as expensive as simply doing the table-scan. But it's more common for the indexed searches to reduce the result set, so the sorting is much less costly than the table-scan.
The difference depends on the data in the table, and the terms being searched. The only way to determine the best solution for a given query is to try both methods in the MySQL query profiler and compare their performance.
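In the same spirit, a rough timing harness. It uses SQLite through Python's sqlite3 module rather than the MySQL query profiler mentioned above, and the 20,000-row table is synthetic. It times both forms on the same data and checks that they return the same set of usernames. Timings from an in-memory toy database say little about a production server, so treat the numbers as a way to compare the two forms on your own data, not as a verdict.

```python
import sqlite3
import time


def time_query(conn, sql, repeat=5):
    """Run sql `repeat` times; return (rows, best elapsed time in seconds)."""
    best = float("inf")
    rows = []
    for _ in range(repeat):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return rows, best


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, "
             "company TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_company ON users (company)")
conn.execute("CREATE INDEX idx_city ON users (city)")
# Synthetic data: unique usernames spread over 50 companies and 47 cities.
conn.executemany(
    "INSERT INTO users (username, company, city) VALUES (?, ?, ?)",
    [(f"user{i}", f"co{i % 50}", f"city{i % 47}") for i in range(20000)])

or_sql = "SELECT username FROM users WHERE company = 'co7' OR city = 'city3'"
union_sql = ("SELECT username FROM users WHERE company = 'co7' "
             "UNION SELECT username FROM users WHERE city = 'city3'")

or_rows, or_time = time_query(conn, or_sql)
union_rows, union_time = time_query(conn, union_sql)
print("same rows:", set(or_rows) == set(union_rows))
print(f"OR: {or_time * 1000:.2f} ms, UNION: {union_time * 1000:.2f} ms")
```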
Why is a union of two queries faster than a single one of the unioned queries?
It seems the timings I was using came from PgAdmin. After SSHing into the actual database network/server, I can see that the difference between the two variants was negligible, and it was the same as what appears in the EXPLAIN ANALYSE output.
So it's actually not quicker doing a UNION than running the queries individually.
Use A Union Or A Join - What Is Faster
A UNION will be faster, as it simply runs the first SELECT statement, then runs the second SELECT statement and appends its results to the end of the output.
A JOIN has to go through the rows of both tables and find matches in the other table, so it needs a lot more processing because it searches for matching rows for each and every row.
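A minimal sketch of what each operation does, using SQLite through Python's sqlite3 module (tables `t1` and `t2` are hypothetical): UNION ALL stacks the rows of both queries without comparing them, while a JOIN has to match each row of one table against the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE t2 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(2, "x"), (3, "y"), (4, "z")])

# UNION ALL: concatenate the two result sets; no t1 row is compared with t2.
stacked = conn.execute(
    "SELECT id, name FROM t1 UNION ALL SELECT id, name FROM t2").fetchall()

# JOIN: for each t1 row, look up t2 rows with a matching id.
joined = conn.execute(
    "SELECT t1.id, t1.name, t2.name FROM t1 JOIN t2 ON t1.id = t2.id "
    "ORDER BY t1.id").fetchall()

print(len(stacked), joined)
```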
EDIT
By Union, I mean UNION ALL, as it seemed adequate for what you were trying to achieve. That said, a normal UNION is generally faster than a JOIN too.
EDIT 2 (Reply to @seebiscuit's comment)
I don't agree with him. Technically speaking, no matter how good your join is, a JOIN is still more expensive than a pure concatenation. I wrote a blog post to prove it on my blog, codePERF[dot]net. Practically speaking, they serve two completely different purposes, and it is more important to make sure your indexing is right and that you are using the right tool for the job.
Technically, I think it can be summed up using the following two execution plans taken from my blog post:
[Execution plan images not reproduced: UNION ALL, JOIN]
Practical Results
Practically speaking, the difference on a clustered index lookup is negligible.
Why is UNION much faster than LEFT JOIN with OR?
I managed to solve the problem by adding an index to the pivot table:
ALTER TABLE `location_address` ADD INDEX `location_id_index` (`location_id` ASC);
Run time: 0.188 seconds
It's slightly faster than using the UNION method.
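For comparison, a sketch of the same fix in SQLite through Python's sqlite3 module (the answer uses MySQL). Only `location_address` and its `location_id` column come from the answer; the `location` table, the `address_id` column, and the query itself are hypothetical. With the index in place, the plan looks up matching `location_address` rows through `location_id_index` instead of scanning the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema around the pivot table named in the answer.
conn.execute("CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE location_address "
             "(location_id INTEGER, address_id INTEGER)")
# SQLite spelling of the MySQL ALTER TABLE ... ADD INDEX statement above.
conn.execute("CREATE INDEX location_id_index "
             "ON location_address (location_id ASC)")

sql = ("SELECT l.id, la.address_id FROM location AS l "
       "LEFT JOIN location_address AS la ON la.location_id = l.id")

# The last column of each EXPLAIN QUERY PLAN row describes one step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
for step in plan:
    print(step)
```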
Performance of UNION versus UNION ALL in SQL Server
UNION ALL will perform better than UNION when you're not concerned about eliminating duplicate records, because you avoid an expensive distinct sort operation. See: SQL SERVER – Difference Between Union vs. Union All – Optimal Performance Comparison
What is the difference between UNION and UNION ALL?
UNION removes duplicate records (where all columns in the results are the same); UNION ALL does not.
There is a performance hit when using UNION instead of UNION ALL, since the database server must do additional work to remove the duplicate rows, but usually you do not want the duplicates (especially when developing reports).
To identify duplicates, records must be of comparable types as well as compatible types. This depends on the SQL system. For example, the system may truncate all long text fields to short text fields for comparison (MS Jet), or may refuse to compare binary fields (Oracle).
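As one illustration of system-specific comparison rules, here is a sketch using SQLite through Python's sqlite3 module (other engines may behave differently). SQLite treats the integer 1 and the text '1' as different values, so UNION keeps both rows; casting both sides to the same type makes them duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# INTEGER 1 and TEXT '1' compare as different values in SQLite,
# so UNION keeps both rows even though they print alike.
rows = conn.execute("SELECT 1 AS v UNION SELECT '1' AS v").fetchall()

# Casting both sides to the same type makes them compare equal.
cast_rows = conn.execute(
    "SELECT CAST(1 AS TEXT) AS v UNION SELECT '1' AS v").fetchall()

print(rows, cast_rows)
```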
UNION Example:
SELECT 'foo' AS bar UNION SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
+-----+
1 row in set (0.00 sec)
UNION ALL example:
SELECT 'foo' AS bar UNION ALL SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
| foo |
+-----+
2 rows in set (0.00 sec)