remove duplicates from sql union
UNION will remove duplicates. UNION ALL does not.
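A minimal sketch of the difference, using Python's standard sqlite3 module and two hypothetical one-column tables t1 and t2 (illustrative assumptions; other engines behave the same way for this case):

```python
import sqlite3


def union_vs_union_all():
    """Return (UNION result, UNION ALL result) for two inputs that share the value 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (x INTEGER)")
    conn.execute("CREATE TABLE t2 (x INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO t2 VALUES (?)", [(2,), (3,)])
    # UNION removes duplicate rows from the combined result
    union = [r[0] for r in conn.execute(
        "SELECT x FROM t1 UNION SELECT x FROM t2 ORDER BY x")]
    # UNION ALL simply appends the second result to the first
    union_all = [r[0] for r in conn.execute(
        "SELECT x FROM t1 UNION ALL SELECT x FROM t2 ORDER BY x")]
    conn.close()
    return union, union_all


print(union_vs_union_all())  # ([1, 2, 3], [1, 2, 2, 3])
```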
SQL UNION ALL to eliminate duplicates
But in the example, the first query has a condition on column a, whereas the second query has a condition on column b. This probably came from a query that's hard to optimize:
SELECT * FROM mytable WHERE a=X OR b=Y
This query is hard to optimize with simple B-tree indexing. Does the engine search an index on column a? Or on column b? Either way, searching the other term requires a table-scan.
Hence the trick of using UNION to split it into two queries, one for each term. Each subquery can use the best index for its search term. Then combine the results using UNION.
But the two subsets may overlap, because some rows where b=Y may also have a=X, in which case such rows occur in both subsets. Therefore you have to do duplicate elimination, or else you will see some rows twice in the final result.
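To see the overlap concretely, here is a small sketch with Python's sqlite3 module. The table layout (an id primary key plus columns a and b), the sample rows, and the values X=1, Y=2 are all hypothetical. The row with a=1 AND b=2 satisfies both branches, so UNION ALL returns it twice, while UNION and the original OR query return it once.

```python
import sqlite3

# Hypothetical sample data: (id, a, b). Row 2 matches both a=1 and b=2.
ROWS = [(1, 1, 9), (2, 1, 2), (3, 5, 2), (4, 7, 8)]


def ids(sql, params):
    """Run sql against a fresh in-memory copy of mytable and return the sorted ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
    conn.execute("CREATE INDEX mytable_a ON mytable (a)")
    conn.execute("CREATE INDEX mytable_b ON mytable (b)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    result = sorted(r[0] for r in conn.execute(sql, params))
    conn.close()
    return result


X, Y = 1, 2
# id is a primary key, so deduplicating on id is the same as deduplicating whole rows
or_ids = ids("SELECT id FROM mytable WHERE a = ? OR b = ?", (X, Y))
union_all_ids = ids("SELECT id FROM mytable WHERE a = ? "
                    "UNION ALL SELECT id FROM mytable WHERE b = ?", (X, Y))
union_ids = ids("SELECT id FROM mytable WHERE a = ? "
                "UNION SELECT id FROM mytable WHERE b = ?", (X, Y))
print(or_ids, union_all_ids, union_ids)  # [1, 2, 3] [1, 2, 2, 3] [1, 2, 3]
```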
SELECT * FROM mytable WHERE a=X
UNION DISTINCT
SELECT * FROM mytable WHERE b=Y
UNION DISTINCT (the explicit spelling of plain UNION) is expensive because typical implementations sort or hash the rows to find duplicates, just as they do for SELECT DISTINCT.
We also tend to perceive it as even more "wasted" work if the two subsets of rows you are unioning have many rows in common: that's a lot of rows to eliminate.
But there's no need to eliminate duplicates if you can guarantee that the two sets of rows are already distinct. That is, if you guarantee there is no overlap. If you can rely on that, then it would always be a no-op to eliminate duplicates, and therefore the query can skip that step, and therefore skip the costly sorting.
If you change the queries so that they are guaranteed to select non-overlapping subsets of rows, that's a win.
SELECT * FROM mytable WHERE a=X
UNION ALL
SELECT * FROM mytable WHERE b=Y AND a!=X
These two sets are guaranteed to have no overlap. If the first set has rows where a=X and the second set has rows where a!=X, then there can be no row that is in both sets.
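The second query therefore catches only some of the rows where b=Y, but any row where a=X AND b=Y is already included in the first set.
So the query achieves an optimized search for two OR terms, without producing duplicates, and requiring no UNION DISTINCT operation.
One caveat: if column a is nullable, a!=X is not true for rows where a IS NULL (it evaluates to NULL), so those rows are silently dropped from the second query even though the original OR query returns them. In that case, write the second condition as b=Y AND (a!=X OR a IS NULL).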
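A sketch comparing the original OR query, the disjoint UNION ALL rewrite, and a NULL-safe variant of that rewrite, again with Python's sqlite3 module and hypothetical sample data. Column a is assumed nullable here, to show that the plain a != X branch loses the row where a IS NULL.

```python
import sqlite3

# Hypothetical sample data: (id, a, b). Row 5 has a NULL in column a.
ROWS = [(1, 1, 9), (2, 1, 2), (3, 5, 2), (4, 7, 8), (5, None, 2)]


def ids(sql, params):
    """Run sql against a fresh in-memory copy of mytable and return the sorted ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    result = sorted(r[0] for r in conn.execute(sql, params))
    conn.close()
    return result


X, Y = 1, 2
or_ids = ids("SELECT id FROM mytable WHERE a = ? OR b = ?", (X, Y))
# Disjoint rewrite: the second branch excludes rows already returned by the first
disjoint_ids = ids("SELECT id FROM mytable WHERE a = ? "
                   "UNION ALL "
                   "SELECT id FROM mytable WHERE b = ? AND a != ?", (X, Y, X))
# NULL-safe variant: a != X is NULL (not true) when a IS NULL, so keep those rows explicitly
null_safe_ids = ids("SELECT id FROM mytable WHERE a = ? "
                    "UNION ALL "
                    "SELECT id FROM mytable WHERE b = ? AND (a != ? OR a IS NULL)",
                    (X, Y, X))
print(or_ids, disjoint_ids, null_safe_ids)  # [1, 2, 3, 5] [1, 2, 3] [1, 2, 3, 5]
```

Neither UNION ALL variant produces a duplicate, so no sort or hash step for duplicate elimination is needed; only the NULL-safe one matches the OR query on nullable data.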
Fastest way to remove duplicates from UNION ALL (without using UNION) on PostgreSQL?
If there are many duplicates among fruits and/or veggies, but not so many between fruits and veggies (as the names in your example suggest), and since you have an index on both of them, emulating an index skip scan (a.k.a. loose index scan) will work wonders:
WITH RECURSIVE fruit AS (
(
SELECT fruit
FROM recipes
ORDER BY 1
LIMIT 1
)
UNION ALL
SELECT (SELECT fruit
FROM recipes
WHERE fruit > t.fruit
ORDER BY 1
LIMIT 1)
FROM fruit t
WHERE t.fruit IS NOT NULL
)
, veggie AS (
(
SELECT veggie
FROM recipes
ORDER BY 1
LIMIT 1
)
UNION ALL
SELECT (SELECT veggie
FROM recipes
WHERE veggie > t.veggie
ORDER BY 1
LIMIT 1)
FROM veggie t
WHERE t.veggie IS NOT NULL
)
SELECT DISTINCT healthy_food
FROM (
SELECT fruit AS healthy_food FROM fruit
UNION ALL
SELECT veggie AS healthy_food FROM veggie
) sub
WHERE healthy_food IS NOT NULL;
Use plain DISTINCT instead of DISTINCT ON (as you tried) in the outer SELECT, since we are dealing with a single column.
See:
- SELECT DISTINCT is slower than expected on my table in PostgreSQL
- Optimize GROUP BY query to retrieve latest row per user
You might as well use UNION instead of UNION ALL + DISTINCT in the outer SELECT. I only avoided that because you explicitly asked for it, but I don't see the point.
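For experimenting without a PostgreSQL server, here is an assumed port of the recursive skip-scan query above to SQLite via Python's sqlite3 module, with hypothetical sample rows. Two adjustments are assumptions about SQLite's dialect: a compound SELECT there does not accept a parenthesized member with its own ORDER BY/LIMIT, so each seed row comes from a scalar subquery; and SQLite sorts NULLs first in ascending order, so the seed skips NULLs explicitly. It illustrates the technique rather than being a tuned solution.

```python
import sqlite3

SKIP_SCAN_SQL = """
WITH RECURSIVE
  fruit(val) AS (
    -- seed: smallest non-NULL fruit
    SELECT (SELECT fruit FROM recipes WHERE fruit IS NOT NULL ORDER BY fruit LIMIT 1)
    UNION ALL
    -- step: jump to the next larger value via the index; NULL once past the last one
    SELECT (SELECT fruit FROM recipes WHERE fruit > t.val ORDER BY fruit LIMIT 1)
    FROM fruit AS t
    WHERE t.val IS NOT NULL
  ),
  veggie(val) AS (
    SELECT (SELECT veggie FROM recipes WHERE veggie IS NOT NULL ORDER BY veggie LIMIT 1)
    UNION ALL
    SELECT (SELECT veggie FROM recipes WHERE veggie > t.val ORDER BY veggie LIMIT 1)
    FROM veggie AS t
    WHERE t.val IS NOT NULL
  )
SELECT DISTINCT healthy_food
FROM (
  SELECT val AS healthy_food FROM fruit
  UNION ALL
  SELECT val AS healthy_food FROM veggie
) AS sub
WHERE healthy_food IS NOT NULL
ORDER BY healthy_food
"""


def distinct_healthy_food(rows):
    """Load hypothetical (fruit, veggie) rows into recipes and run the skip scan."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE recipes (fruit TEXT, veggie TEXT)")
    conn.execute("CREATE INDEX recipes_fruit ON recipes (fruit)")
    conn.execute("CREATE INDEX recipes_veggie ON recipes (veggie)")
    conn.executemany("INSERT INTO recipes VALUES (?, ?)", rows)
    result = [r[0] for r in conn.execute(SKIP_SCAN_SQL)]
    conn.close()
    return result


sample = [("apple", "carrot"), ("apple", "tomato"), ("banana", "carrot"),
          ("tomato", None), (None, "pea")]
print(distinct_healthy_food(sample))  # ['apple', 'banana', 'carrot', 'pea', 'tomato']
```

Each recursive step reads one index entry per distinct value, which is why the approach pays off when there are many duplicates per value.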
Remove duplicates after UNION in SQL
You can use the same process of removing duplicates as you have used for both tables.
It would look something like this:
-- the CTEs are declared up front: some engines (e.g. SQL Server) reject a WITH nested inside a subquery
WITH cte1 AS (
    SELECT v1, v2, v3, v4,
           ROW_NUMBER() OVER (PARTITION BY V1, V2 ORDER BY V1) AS rn
    FROM T1
),
cte2 AS (
    SELECT v1, v2, v3, v4,
           ROW_NUMBER() OVER (PARTITION BY V1, V2 ORDER BY V1) AS rn
    FROM T2
),
cteUnion AS (
    -- "union" is a reserved word, so the derived table gets a different alias
    SELECT u.*, ROW_NUMBER() OVER (PARTITION BY V1, V2, V3 ORDER BY V1) AS rn
    FROM (
        SELECT V1, V2, V3, V4 FROM cte1 WHERE rn = 1
        UNION
        SELECT V1, V2, V3, V4 FROM cte2 WHERE rn = 1
    ) AS u
)
SELECT *
FROM cteUnion
WHERE rn = 1
Or you can use DISTINCT if you just want columns V1, V2, V3:
WITH cte1 AS (
    SELECT v1, v2, v3, v4,
           ROW_NUMBER() OVER (PARTITION BY V1, V2 ORDER BY V1) AS rn
    FROM T1
),
cte2 AS (
    SELECT v1, v2, v3, v4,
           ROW_NUMBER() OVER (PARTITION BY V1, V2 ORDER BY V1) AS rn
    FROM T2
)
SELECT DISTINCT V1, V2, V3
FROM (
    SELECT V1, V2, V3, V4 FROM cte1 WHERE rn = 1
    UNION
    SELECT V1, V2, V3, V4 FROM cte2 WHERE rn = 1
) AS u
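A runnable sketch of the two-level ROW_NUMBER() approach in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer, an assumption about the Python build). The sample rows are hypothetical, and unlike the answer it orders each partition by v4 instead of V1: ordering by a column the partition is already grouped on leaves the surviving row arbitrary, which would make the result nondeterministic.

```python
import sqlite3

DEDUP_SQL = """
WITH cte1 AS (
    SELECT v1, v2, v3, v4,
           ROW_NUMBER() OVER (PARTITION BY v1, v2 ORDER BY v4) AS rn
    FROM T1
),
cte2 AS (
    SELECT v1, v2, v3, v4,
           ROW_NUMBER() OVER (PARTITION BY v1, v2 ORDER BY v4) AS rn
    FROM T2
),
cteUnion AS (
    SELECT u.v1, u.v2, u.v3, u.v4,
           ROW_NUMBER() OVER (PARTITION BY u.v1, u.v2, u.v3 ORDER BY u.v4) AS rn
    FROM (
        SELECT v1, v2, v3, v4 FROM cte1 WHERE rn = 1
        UNION
        SELECT v1, v2, v3, v4 FROM cte2 WHERE rn = 1
    ) AS u
)
SELECT v1, v2, v3, v4 FROM cteUnion WHERE rn = 1 ORDER BY v1
"""


def dedup(t1_rows, t2_rows):
    """Load hypothetical rows into T1 and T2, then keep one row per (v1, v2, v3)."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("T1", t1_rows), ("T2", t2_rows)):
        conn.execute(f"CREATE TABLE {name} (v1 INTEGER, v2 INTEGER, v3 TEXT, v4 TEXT)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(DEDUP_SQL).fetchall()
    conn.close()
    return result


t1 = [(1, 1, "a", "p"), (1, 1, "b", "q"), (2, 2, "c", "r")]
t2 = [(1, 1, "a", "s"), (3, 3, "d", "t")]
print(dedup(t1, t2))  # [(1, 1, 'a', 'p'), (2, 2, 'c', 'r'), (3, 3, 'd', 't')]
```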
How to remove duplicate using union all
Use this IF and ONLY IF you have to use UNION ALL; otherwise I would go with Handoko Chen's solution.
Select Distinct *
From (
Select * From Employee1
Union All
Select * From Employee2
) A
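A quick sketch, with Python's sqlite3 module and hypothetical Employee1/Employee2 rows, confirming that SELECT DISTINCT over a UNION ALL returns the same rows as a plain UNION:

```python
import sqlite3


def compare_dedup():
    """Return (DISTINCT over UNION ALL, plain UNION) for two overlapping tables."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("Employee1", [(1, "Ann"), (2, "Bob")]),
                       ("Employee2", [(2, "Bob"), (3, "Cid")])):
        conn.execute(f"CREATE TABLE {name} (id INTEGER, name TEXT)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)
    distinct_all = conn.execute(
        "SELECT DISTINCT * FROM ("
        "  SELECT * FROM Employee1 UNION ALL SELECT * FROM Employee2"
        ") AS A ORDER BY id").fetchall()
    plain_union = conn.execute(
        "SELECT * FROM Employee1 UNION SELECT * FROM Employee2 ORDER BY id").fetchall()
    conn.close()
    return distinct_all, plain_union


print(compare_dedup())
```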
How to remove duplicates out of the UNION but ignore one column
You can use row_number(). If you want one row per first name (what your question implies), then:
select t.*
from (select t.*,
             row_number() over (partition by firstname order by department) as seqnum
      from MyTable t
     ) t
where seqnum = 1;
This will choose "Accounting" over "Finance" for duplicates.
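A minimal check of that query in SQLite via Python's sqlite3 module (window functions assumed available, i.e. SQLite 3.25+). The rows are hypothetical, and the outer select lists the two columns explicitly so that seqnum is not returned:

```python
import sqlite3

FIRST_PER_NAME_SQL = """
SELECT firstname, department
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY firstname ORDER BY department) AS seqnum
      FROM MyTable t
     ) t
WHERE seqnum = 1
ORDER BY firstname
"""


def first_per_name(rows):
    """Keep one row per firstname, preferring the alphabetically first department."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (firstname TEXT, department TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
    result = conn.execute(FIRST_PER_NAME_SQL).fetchall()
    conn.close()
    return result


print(first_per_name([("John", "Finance"), ("John", "Accounting"), ("Mary", "Sales")]))
# [('John', 'Accounting'), ('Mary', 'Sales')]
```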
UNION not removing duplicates
Your result set actually has no duplicates. Duplicates are rows where all columns have equal values, and no row in your result set meets that definition.
Presumably, you want aggregation in the outer query:
select column1, max(column2) column2
from (
select part as column1, null as column2 from soldparts where part like ?
union all
select part as column1, vin as column2 from vindata where part like ?
) t
group by column1
order by column1 asc
Note that I modified your query to use bind parameters (?); for the sake of security and efficiency, you should learn to use parameterized queries rather than concatenating variables into the query string.
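A sketch of the same query run from Python's sqlite3 module, with the LIKE pattern passed as a bind parameter for each ? placeholder. The table contents are hypothetical. MAX ignores NULLs, so a part that also appears in vindata gets its VIN, while a part only in soldparts keeps NULL:

```python
import sqlite3

PARTS_SQL = """
SELECT column1, MAX(column2) AS column2
FROM (
    SELECT part AS column1, NULL AS column2 FROM soldparts WHERE part LIKE ?
    UNION ALL
    SELECT part AS column1, vin AS column2 FROM vindata WHERE part LIKE ?
) AS t
GROUP BY column1
ORDER BY column1 ASC
"""


def parts_with_vin(pattern):
    """Return one (part, vin-or-None) row per matching part."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE soldparts (part TEXT)")
    conn.execute("CREATE TABLE vindata (part TEXT, vin TEXT)")
    conn.executemany("INSERT INTO soldparts VALUES (?)", [("AB-1",), ("AB-2",), ("XY-9",)])
    conn.execute("INSERT INTO vindata VALUES (?, ?)", ("AB-1", "VIN123"))
    # the same pattern is bound once per placeholder, never concatenated into the SQL
    result = conn.execute(PARTS_SQL, (pattern, pattern)).fetchall()
    conn.close()
    return result


print(parts_with_vin("AB%"))  # [('AB-1', 'VIN123'), ('AB-2', None)]
```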
Union two selects removing duplicates based on some columns (not the full row)
Try:
select * from table1
union
select * from table2
where not exists(select 1 from table1
where table2.parent = table1.parent
and table2.type = table1.type)
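A sketch of that query in SQLite via Python's sqlite3 module, with hypothetical rows and an extra column named note so that rows sharing (parent, type) are not full duplicates. The NOT EXISTS filter applies only to the second SELECT, so on a (parent, type) conflict the table1 row wins:

```python
import sqlite3


def union_on_key():
    """Union table1 and table2, skipping table2 rows whose (parent, type) is in table1."""
    conn = sqlite3.connect(":memory:")
    for name in ("table1", "table2"):
        conn.execute(f"CREATE TABLE {name} (parent INTEGER, type TEXT, note TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                     [(1, "x", "from t1"), (2, "y", "from t1")])
    conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                     [(1, "x", "from t2"), (3, "z", "from t2")])
    rows = conn.execute("""
        SELECT * FROM table1
        UNION
        SELECT * FROM table2
        WHERE NOT EXISTS (SELECT 1 FROM table1
                          WHERE table2.parent = table1.parent
                            AND table2.type = table1.type)
    """).fetchall()
    conn.close()
    return sorted(rows)


print(union_on_key())
# [(1, 'x', 'from t1'), (2, 'y', 'from t1'), (3, 'z', 'from t2')]
```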
UNION does not remove duplicates in result
You can try the following approach, which aggregates on top of your query:
select OmrådesNr,OmrådesBenämning,max(Antal) as Antal
from
(
SELECT Cams_Global.dbo.A960.OmrådesNr, Cams_Global.dbo.A960.OmrådesBenämning, 0 AS Antal
FROM A406 INNER JOIN
Cams_Global.dbo.A960 ON A406.OmrådesNr = Cams_Global.dbo.A960.OmrådesNr
WHERE (A406.Kund IN (5566321537))
UNION
SELECT A960_1.OmrådesNr, A960_1.OmrådesBenämning, COUNT(A806.Aordernr) AS Antal
FROM A806 INNER JOIN
A406 AS A406_1 ON A806.Ställeid = A406_1.Ställeid INNER JOIN
A400 ON A806.Objektid = A400.Objektid INNER JOIN
A402 ON A400.ObjGrupp = A402.Objgrupp INNER JOIN
Cams_Global.dbo.A957 ON A806.LevId = Cams_Global.dbo.A957.LevId RIGHT OUTER JOIN
Cams_Global.dbo.A960 AS A960_1 ON A406_1.OmrådesNr = A960_1.OmrådesNr
WHERE (A806.Beställning = 0) AND (Cams_Global.dbo.A957.LevIdGrupp IN (1001, 1000)) AND
(A806.ProtokollSparad = 0) AND (A406_1.Kund = 5566321537)
GROUP BY A960_1.OmrådesBenämning, A960_1.OmrådesNr
)A group by OmrådesNr,OmrådesBenämning
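The idea behind that query, reduced to a hypothetical two-table schema (areas and orders are illustrative names, not the original A960/A806 tables) and run with Python's sqlite3 module: the first branch lists every area with a count of 0, the second adds real counts, so UNION keeps two rows for any area with orders, and MAX(...) grouped by area collapses them to one.

```python
import sqlite3

INNER_UNION = """
SELECT nr, name, 0 AS cnt FROM areas
UNION
SELECT a.nr, a.name, COUNT(o.order_id)
FROM orders o JOIN areas a ON o.area_nr = a.nr
GROUP BY a.nr, a.name
"""


def area_counts():
    """Return (raw UNION rows, rows after MAX-aggregation) for hypothetical data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE areas (nr INTEGER, name TEXT)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, area_nr INTEGER)")
    conn.executemany("INSERT INTO areas VALUES (?, ?)", [(1, "North"), (2, "South")])
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(101, 1), (102, 1)])
    # without aggregation, area 1 appears twice: once with 0 and once with its count
    raw = conn.execute(INNER_UNION + " ORDER BY 1, 3").fetchall()
    counted = conn.execute(
        "SELECT nr, name, MAX(cnt) AS cnt FROM (" + INNER_UNION + ") AS x "
        "GROUP BY nr, name ORDER BY nr").fetchall()
    conn.close()
    return raw, counted


print(area_counts())
```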
Remove duplicates from union in SQL Views (Toad)
UNION eliminates duplicates between the two views across all columns. There are two possible reasons for the remaining duplicates on (ITEM, CODE):
- Either the records where this tuple of columns is duplicated have different values in the other columns: since these records are not true duplicates, UNION does not eliminate them. In this case, your solution with the CTE is the correct approach.
- Or one of the views (maybe both) already contains such partial duplicates on (ITEM, CODE). UNION removes true duplicates from the whole result, even when both copies come from the same view, but partial duplicates survive whether they sit in one view or span both.
In both cases, as an optimization, you could possibly skip the intermediate view by deduplicating the result of the UNION query directly: wrap the entire UNION query in a CTE and use ROW_NUMBER() to remove the partial duplicates, as shown in your existing code. True duplicates need no extra step, because UNION already removes them; you would only need a SELECT DISTINCT over the combined result if you switched to UNION ALL.
A final thought: since you noticed that UNION does not actually remove any records, you would be better off using UNION ALL instead (as long as the two views can never return identical rows): this means less work for your RDBMS, and will make queries against the final view more efficient.
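A small sketch with Python's sqlite3 module showing both effects on hypothetical views (ATTR stands in for "the other columns"): UNION collapses the true duplicate inside view1 but keeps the partial duplicate on (ITEM, CODE), and ROW_NUMBER() over the UNION then keeps one row per (ITEM, CODE). Which row survives depends on the ORDER BY inside the window, here an arbitrary choice of ATTR.

```python
import sqlite3


def item_code_demo():
    """Return (rows after UNION, rows after ROW_NUMBER dedup on ITEM, CODE)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src1 (ITEM TEXT, CODE TEXT, ATTR TEXT)")
    conn.execute("CREATE TABLE src2 (ITEM TEXT, CODE TEXT, ATTR TEXT)")
    # src1 holds a true duplicate; src2 holds a partial duplicate of it on (ITEM, CODE)
    conn.executemany("INSERT INTO src1 VALUES (?, ?, ?)",
                     [("I1", "C1", "red"), ("I1", "C1", "red")])
    conn.executemany("INSERT INTO src2 VALUES (?, ?, ?)",
                     [("I1", "C1", "blue"), ("I2", "C2", "green")])
    conn.execute("CREATE VIEW view1 AS SELECT ITEM, CODE, ATTR FROM src1")
    conn.execute("CREATE VIEW view2 AS SELECT ITEM, CODE, ATTR FROM src2")
    union_rows = conn.execute(
        "SELECT * FROM view1 UNION SELECT * FROM view2 ORDER BY ITEM, ATTR").fetchall()
    deduped = conn.execute("""
        WITH u AS (
            SELECT * FROM view1 UNION SELECT * FROM view2
        ), ranked AS (
            SELECT ITEM, CODE, ATTR,
                   ROW_NUMBER() OVER (PARTITION BY ITEM, CODE ORDER BY ATTR) AS rn
            FROM u
        )
        SELECT ITEM, CODE, ATTR FROM ranked WHERE rn = 1 ORDER BY ITEM
    """).fetchall()
    conn.close()
    return union_rows, deduped


print(item_code_demo())
```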