Use a Union or a Join - What Is Faster

Use A Union Or A Join - What Is Faster

Union will be faster, as it simply passes the first SELECT statement, and then parses the second SELECT statement and adds the results to the end of the output table.

The Join will go through each row of both tables, finding matches in the other table therefore needing a lot more processing due to searching for matching rows for each and every row.

EDIT

By Union, I mean Union All as it seemed adequate for what you were trying to achieve. Although a normal Union is generally faster then Join.

EDIT 2 (Reply to @seebiscuit 's comment)

I don't agree with him. Technically speaking no matter how good your join is, a "JOIN" is still more expensive than a pure concatenation. I made a blog post to prove it at my blog codePERF[dot]net. Practically speaking they serve 2 completely different purposes and it is more important to ensure your indexing is right and using the right tool for the job.

Technically, I think it can be summed using the following 2 execution plans taken from my blog post:

UNION ALL Execution Plan

UNION ALL Execution Plan

JOIN Execution Plan

JOIN Execution Plan

Practical Results

Practically speaking the difference on a clustered index lookup is negligible:

Benchmark Results

Can UNION ALL be faster than JOINs or do my JOINs just suck?

The UNION ALL version would probably be satisfied quite easily by 2 index seeks. OR can lead to scans. What do the execution plans look like?

Also have you tried this to avoid accessing Notes twice?

;WITH J AS
(
SELECT UniqueID FROM Leads WHERE LeadID = @LeadID
UNION ALL
SELECT UniqueID FROM Quotes WHERE LeadID = @LeadID
)

SELECT N.* /*Don't use * though!*/
FROM Notes N
JOIN J ON N.TargetUniqueID = J.UniqueID

Full outer joins or Union . which is faster ( My table has a million rows )

Not sure what your query looks like however, add indexes on the table if these tables are newly being created.

However, answering your question using UNION ALL will be faster, as it simply passes the first SELECT statement, and then parses the second SELECT statement and adds the results to the end of the output table. Even a Normal UNION is faster than a join.

The UNION's will make better use of indexes which could result in a faster query.

Performance of two left joins versus union

Your two queries do not do the same things. In particular, the first will return duplicate rows if values are duplicated in either table.

If you are looking for rows in Table1 that are in either of the other two tables, I would suggest using exists:

select t1.*
from Table1 t1
where exists (select 1 from Table2 t2 where t2.Table1Id = t1.id) or
exists (select 1 from Table3 t3 where t3.Table1Id = t1.id);

And, create indexes on Table1Id in both Table2 and Table3.

Which of your original queries is faster depends a lot on the data. The second has an extra step to remove duplicates (union verses union all). On the other hand, the first might end up creating many duplicate rows.

Performance of JOIN then UNION vs. UNION then JOIN

The SQL optimizer has more information on "bare" tables than on "computed" tables. So, it is easier to optimize the two CTEs.

In a database that uses indexes, this might affect index usage. In Redshift, this might incur additional data movement.

In this particular case, though, I suspect the issue might have to do with filtering via the JOIN operation. The UNION is incurring overhead to remove duplicates. By filtering before the UNION, duplicate removal is faster than filtering afterwards.

In addition, the UNION may affect where the data is located, so the second version might require additional data movement.

Why is UNION much faster than LEFT JOIN with OR?

I managed to solve the problem by adding an index to the pivot table:

ALTER TABLE `location_address` ADD INDEX `location_id_index` (`location_id` ASC);

Run time: 0.188 seconds

It's slightly faster than using the UNION method.

What is the difference between JOIN and UNION?

UNION puts lines from queries after each other, while JOIN makes a cartesian product and subsets it -- completely different operations. Trivial example of UNION:

mysql> SELECT 23 AS bah
-> UNION
-> SELECT 45 AS bah;
+-----+
| bah |
+-----+
| 23 |
| 45 |
+-----+
2 rows in set (0.00 sec)

similary trivial example of JOIN:

mysql> SELECT * FROM 
-> (SELECT 23 AS bah) AS foo
-> JOIN
-> (SELECT 45 AS bah) AS bar
-> ON (33=33);
+-----+-----+
| foo | bar |
+-----+-----+
| 23 | 45 |
+-----+-----+
1 row in set (0.01 sec)

Why is UNION faster than an OR statement

The reason is that using OR in a query will often cause the Query Optimizer to abandon use of index seeks and revert to scans. If you look at the execution plans for your two queries, you'll most likely see scans where you are using the OR and seeks where you are using the UNION. Without seeing your query it's not really possible to give you any ideas on how you might be able to restructure the OR condition. But you may find that inserting the rows into a temporary table and joining on to it may yield a positive result.

Also, it is generally best to use UNION ALL rather than UNION if you want all results, as you remove the cost of row-matching.

Why using OR condition instead of Union caused a performance Issue

Using UNION ALL to replace OR is actually one of the well known optimization tricks. The best reference and explanation is in this article: Index Union.

The gist of it is that OR predicates that could be be satisfied by two index seeks cannot be reliably detected by the query optimizer (the reason being impossibility to predict the disjoint sets from the two sides of the OR). So when expressing the same condition as an UNION ALL then the optimizer has no problem creating a plan that does two short seeks and unions the results. The important thing is to realize that a=1 or b=2 can be different from a=1 union all b=2 because the first query returns rows that satisfy both conditions once, while the later returns them twice. When you write the query as UNION ALL you are telling the compiler that you understand that and you have no problem with it.

For further reference see How to analyse SQL Server performance.

Is it better do to a union in SQL or separate queries and then use php array_merge?

The most definitive answer is to test each method, however the UNION is most likely to be faster as only one query is run by MySQL as opposed to 4 for each part of the union.

You also remove the overhead of reading the data into memory in PHP and concatenating it. Instead, you can just do a while() or foreach() or whatever on one result.



Related Topics



Leave a reply



Submit