UNION ALL vs OR condition in SQL Server query
The query plan is also affected by the number of rows in your tables. How many rows are there in table t?
You could also try:
SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(
    SELECT 1 FROM t
    WHERE Data1 = t.Col1 AND Data2 = t.Col2
)
AND NOT EXISTS
(
    SELECT 1 FROM t
    WHERE Data1 = t.Col2 AND Data2 = t.Col1
)
Or (corrected for SQL Server) this version, which will use the index:
WITH tt AS   -- a CTE holding (up to) 2 rows: both orderings of the pair
( SELECT Data1 AS Col1, Data2 AS Col2
  UNION
  SELECT Data2 AS Col1, Data1 AS Col2
)
SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(
    SELECT 1
    FROM t
    JOIN tt
        ON tt.Col1 = t.Col1 AND tt.Col2 = t.Col2
)
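As a rough check that the two forms above return the same answer, here is a minimal sketch that runs both against SQLite through Python's sqlite3 module. SQLite is only a stand-in for SQL Server, so this says nothing about index use or plans. The table contents, the one-row dummyTable, and the named parameters :d1/:d2 (standing in for Data1/Data2) are all hypothetical.

```python
import sqlite3

# Hypothetical table t holding pairs (Col1, Col2); values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (Col1 TEXT, Col2 TEXT);
    CREATE INDEX ix_t_col1_col2 ON t (Col1, Col2);
    INSERT INTO t VALUES ('x', 'y'), ('p', 'q');
    CREATE TABLE dummyTable (id INTEGER);  -- one-row helper table
    INSERT INTO dummyTable VALUES (1);
""")

# Form 1: two NOT EXISTS checks, one per orientation of the pair.
TWO_NOT_EXISTS = """
    SELECT 1 FROM dummyTable
    WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.Col1 = :d1 AND t.Col2 = :d2)
      AND NOT EXISTS (SELECT 1 FROM t WHERE t.Col1 = :d2 AND t.Col2 = :d1)
"""

# Form 2: build both orientations in a CTE and join it to t once.
CTE_JOIN = """
    WITH tt AS (
        SELECT :d1 AS Col1, :d2 AS Col2
        UNION
        SELECT :d2 AS Col1, :d1 AS Col2
    )
    SELECT 1 FROM dummyTable
    WHERE NOT EXISTS (
        SELECT 1 FROM t JOIN tt ON tt.Col1 = t.Col1 AND tt.Col2 = t.Col2
    )
"""

def pair_is_new(query, d1, d2):
    """True when neither (d1, d2) nor (d2, d1) is already stored in t."""
    return conn.execute(query, {"d1": d1, "d2": d2}).fetchone() is not None

for d1, d2 in [("x", "y"), ("y", "x"), ("a", "b")]:
    print(d1, d2, pair_is_new(TWO_NOT_EXISTS, d1, d2), pair_is_new(CTE_JOIN, d1, d2))
```

Both forms report the stored pair as existing in either orientation and only the unseen pair as new.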
Why using an OR condition instead of UNION caused a performance issue
Using UNION ALL to replace OR is actually one of the well-known optimization tricks. The best reference and explanation is in this article: Index Union.
The gist of it is that OR predicates that could be satisfied by two index seeks cannot be reliably detected by the query optimizer (the reason being that it cannot predict whether the two sides of the OR produce disjoint sets). When the same condition is expressed as a UNION ALL, the optimizer has no problem creating a plan that does two short seeks and unions the results. The important thing is to realize that a=1 or b=2 can be different from a=1 union all b=2, because the first query returns rows that satisfy both conditions once, while the latter returns them twice. When you write the query as UNION ALL, you are telling the compiler that you understand that and have no problem with it.
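The "returned once vs. returned twice" difference can be seen with a small sketch. It uses SQLite via Python's sqlite3 as a stand-in (so it shows result semantics only, not SQL Server plans), and the demo table and its rows are hypothetical. The last query shows one common way to keep the two-branch UNION ALL shape without the duplicates: exclude from the second branch the rows the first branch already returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demo (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER);
    CREATE INDEX ix_demo_a ON demo (a);
    CREATE INDEX ix_demo_b ON demo (b);
    INSERT INTO demo (id, a, b) VALUES
        (1, 1, 0),   -- matches a = 1 only
        (2, 0, 2),   -- matches b = 2 only
        (3, 1, 2),   -- matches BOTH predicates
        (4, 0, 0);   -- matches neither
""")

or_ids = [r[0] for r in conn.execute(
    "SELECT id FROM demo WHERE a = 1 OR b = 2 ORDER BY id")]

union_all_ids = sorted(r[0] for r in conn.execute(
    "SELECT id FROM demo WHERE a = 1 UNION ALL SELECT id FROM demo WHERE b = 2"))

# Second branch skips rows already returned by the first branch.
# (a <> 1 OR a IS NULL) keeps rows where a is NULL, which NOT (a = 1) would drop.
dedup_ids = sorted(r[0] for r in conn.execute("""
    SELECT id FROM demo WHERE a = 1
    UNION ALL
    SELECT id FROM demo WHERE b = 2 AND (a <> 1 OR a IS NULL)
"""))

print(or_ids)         # [1, 2, 3]
print(union_all_ids)  # [1, 2, 3, 3]
print(dedup_ids)      # [1, 2, 3]
```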
For further reference see How to analyse SQL Server performance.
Why is UNION faster than an OR statement
The reason is that using OR in a query will often cause the Query Optimizer to abandon the use of index seeks and revert to scans. If you look at the execution plans for your two queries, you'll most likely see scans where you are using the OR and seeks where you are using the UNION. Without seeing your query, it's not really possible to give you any ideas on how you might be able to restructure the OR condition. But you may find that inserting the rows into a temporary table and joining onto it yields a positive result.
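One reading of the temporary-table suggestion is to put the OR'ed search values into a temp table and join on it. The sketch below, assuming SQLite via Python's sqlite3 and a hypothetical orders/wanted schema, only checks that both shapes return the same rows; whether the join is actually faster has to be confirmed from the execution plan on your own server. The PRIMARY KEY on the temp table keeps a repeated search value from duplicating output rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE INDEX ix_orders_status ON orders (status);
    INSERT INTO orders (status) VALUES
        ('new'), ('paid'), ('shipped'), ('cancelled'), ('paid');
""")

# Original shape: a chain of OR predicates on one column.
or_rows = conn.execute("""
    SELECT id FROM orders
    WHERE status = 'new' OR status = 'paid' OR status = 'shipped'
    ORDER BY id
""").fetchall()

# Alternative: load the search values into a temporary table and join on it.
conn.execute("CREATE TEMP TABLE wanted (status TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO wanted VALUES (?)", [("new",), ("paid",), ("shipped",)])
join_rows = conn.execute("""
    SELECT o.id FROM orders o
    JOIN wanted w ON w.status = o.status
    ORDER BY o.id
""").fetchall()

print(or_rows == join_rows)  # True
```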
Also, it is generally best to use UNION ALL rather than UNION if you want all results, as you avoid the cost of duplicate elimination.
SQL Performance UNION vs OR
Either the article you read used a bad example, or you misinterpreted their point.
select username from users where company = 'bbc' or company = 'itv';
This is equivalent to:
select username from users where company IN ('bbc', 'itv');
MySQL can use an index on company for this query just fine. There's no need to do any UNION.
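A quick way to convince yourself that the two forms are equivalent is to run both and compare the rows. This sketch uses SQLite through Python's sqlite3 rather than MySQL, with a hypothetical users table, so it demonstrates only that the results match, not which index MySQL picks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (username TEXT, company TEXT, city TEXT);
    CREATE INDEX ix_users_company ON users (company);
    INSERT INTO users VALUES
        ('alice', 'bbc', 'London'),
        ('bob',   'itv', 'Leeds'),
        ('carol', 'sky', 'London'),
        ('dave',  'sky', 'Paris');
""")

with_or = conn.execute(
    "SELECT username FROM users WHERE company = 'bbc' OR company = 'itv' ORDER BY username"
).fetchall()
with_in = conn.execute(
    "SELECT username FROM users WHERE company IN ('bbc', 'itv') ORDER BY username"
).fetchall()

print(with_or)             # [('alice',), ('bob',)]
print(with_or == with_in)  # True
```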
The more tricky case is where you have an OR condition that involves two different columns.
select username from users where company = 'bbc' or city = 'London';
Suppose there's an index on company and a separate index on city. Given that MySQL usually uses only one index per table in a given query, which index should it use? If it uses the index on company, it would still have to do a table-scan to find rows where city is London. If it uses the index on city, it would have to do a table-scan for rows where company is bbc.
The UNION solution is for this type of case.
select username from users where company = 'bbc'
union
select username from users where city = 'London';
Now each subquery can use the index for its search, and the results of the subqueries are combined by the UNION.
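The sketch below compares the OR form with the UNION rewrite, and with UNION ALL for contrast, using SQLite via Python's sqlite3 as a stand-in for MySQL. The users rows are hypothetical; 'alice' is chosen to match both conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (username TEXT, company TEXT, city TEXT);
    CREATE INDEX ix_users_company ON users (company);
    CREATE INDEX ix_users_city ON users (city);
    INSERT INTO users VALUES
        ('alice', 'bbc', 'London'),  -- matches both conditions
        ('bob',   'itv', 'Leeds'),   -- matches neither
        ('carol', 'sky', 'London'),  -- city only
        ('dave',  'bbc', 'Paris');   -- company only
""")

def usernames(sql):
    """Run a query and return its usernames sorted, for easy comparison."""
    return sorted(row[0] for row in conn.execute(sql))

with_or = usernames(
    "SELECT username FROM users WHERE company = 'bbc' OR city = 'London'")
with_union = usernames("""
    SELECT username FROM users WHERE company = 'bbc'
    UNION
    SELECT username FROM users WHERE city = 'London'""")
with_union_all = usernames("""
    SELECT username FROM users WHERE company = 'bbc'
    UNION ALL
    SELECT username FROM users WHERE city = 'London'""")

print(with_or)         # ['alice', 'carol', 'dave']
print(with_union)      # ['alice', 'carol', 'dave']
print(with_union_all)  # ['alice', 'alice', 'carol', 'dave']
```

UNION removes duplicates across all selected columns, so if two different users shared a username, the UNION form would collapse them into one row while the OR form would return both. Selecting a key column as well avoids that.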
An anonymous user proposed an edit to my answer above, but a moderator rejected the edit. It should have been a comment, not an edit. The claim of the proposed edit was that UNION has to sort the result set to eliminate duplicate rows. This makes the query run slower, and the index optimization is therefore a wash.
My response is that the indexes help to reduce the result set to a small number of rows before the UNION happens. UNION does in fact eliminate duplicates, but to do that it only has to sort the small result set. There might be cases where the WHERE clauses match a significant portion of the table, and sorting during UNION is as expensive as simply doing the table-scan. But it's more common for the result set to be reduced by the indexed searches, so the sorting is much less costly than the table-scan.
The difference depends on the data in the table, and the terms being searched. The only way to determine the best solution for a given query is to try both methods in the MySQL query profiler and compare their performance.
What is the difference between UNION and UNION ALL?
UNION removes duplicate records (where all columns in the results are the same); UNION ALL does not.
There is a performance hit when using UNION instead of UNION ALL, since the database server must do additional work to remove the duplicate rows, but usually you do not want the duplicates (especially when developing reports).
To identify duplicates, records must be comparable types as well as compatible types. This will depend on the SQL system. For example, the system may truncate all long text fields to make short text fields for comparison (MS Jet), or may refuse to compare binary fields (Oracle).
UNION example:
SELECT 'foo' AS bar UNION SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
+-----+
1 row in set (0.00 sec)
UNION ALL example:
SELECT 'foo' AS bar UNION ALL SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
| foo |
+-----+
2 rows in set (0.00 sec)
Microsoft SQL Server view of multiple conditional statements with UNION
You seem to want:
create view myview as
select 'a' cat, name from table1
union all select 'b', name from table2
union all select 'c', name from table3
Then you can query the view like so:
select * from myview where cat in ('a', 'b');
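To see the category-literal view in action, here is a minimal sketch using SQLite via Python's sqlite3 in place of SQL Server; the three tables and their single rows are hypothetical. The literal in each branch tags every row with its source, so the outer WHERE cat IN (...) picks which branches contribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT);
    CREATE TABLE table2 (name TEXT);
    CREATE TABLE table3 (name TEXT);
    INSERT INTO table1 VALUES ('ann');
    INSERT INTO table2 VALUES ('ben');
    INSERT INTO table3 VALUES ('cal');

    -- Each branch tags its rows with a literal category.
    CREATE VIEW myview AS
    SELECT 'a' AS cat, name FROM table1
    UNION ALL SELECT 'b', name FROM table2
    UNION ALL SELECT 'c', name FROM table3;
""")

rows = conn.execute(
    "SELECT * FROM myview WHERE cat IN ('a', 'b') ORDER BY cat").fetchall()
print(rows)  # [('a', 'ann'), ('b', 'ben')]
```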
Where clause between union all in sql?
I imagine you want all of the rows for a CID sorted by _row_ord, with the rows from the first table before the ones from the second table, and CID should be the outermost sort criterion.
If that's right, you can select literals from your tables. Let the literal for the first table be less than that of the second table. Then sort first by CID, then by that literal, and finally by _row_ord.
SELECT cid,
_data
FROM (SELECT 1 s,
_row_ord,
cid,
_data
FROM #temp1
UNION ALL
SELECT 2 s,
_row_ord,
cid,
_data
FROM #temp2) x
ORDER BY cid,
s,
_row_ord;
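The effect of the sort literal can be checked with a small sketch. It assumes SQLite via Python's sqlite3, which has no #-prefixed temp tables, so ordinary tables named temp1 and temp2 stand in for #temp1 and #temp2; the rows are hypothetical and deliberately inserted out of order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp1 (_row_ord INTEGER, cid INTEGER, _data TEXT);
    CREATE TABLE temp2 (_row_ord INTEGER, cid INTEGER, _data TEXT);
    INSERT INTO temp1 VALUES (2, 10, 't1-b'), (1, 10, 't1-a'), (1, 20, 't1-c');
    INSERT INTO temp2 VALUES (1, 10, 't2-a'), (1, 20, 't2-b');
""")

# s = 1 for temp1 rows, s = 2 for temp2 rows, so within each cid
# the temp1 rows sort first, each group ordered by _row_ord.
rows = conn.execute("""
    SELECT cid, _data
    FROM (SELECT 1 AS s, _row_ord, cid, _data FROM temp1
          UNION ALL
          SELECT 2 AS s, _row_ord, cid, _data FROM temp2) AS x
    ORDER BY cid, s, _row_ord
""").fetchall()

for row in rows:
    print(row)
```

For cid 10 this yields t1-a, t1-b, then t2-a; for cid 20, t1-c then t2-b.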