SQL -- Remove Duplicate Pairs

Removing Mirrored Pairs from SQL Join

Assuming you do not care which of the two rows survives, (ben, will) or (will, ben), my preferred solution is the following:

DELETE p2
FROM Pairs p1
INNER JOIN Pairs p2
    ON p1.Name1 = p2.Name2
    AND p1.Name2 = p2.Name1
    AND p1.Interest = p2.Interest
    -- match only one of the two pairs
    AND p1.Name1 > p1.Name2

Because Name1 and Name2 are never equal within a row, exactly one row of each mirrored pair has Name1 greater than Name2. The join matches only that row as p1, so its mirror p2 (the row with Name1 less than Name2) is the one that gets deleted.

This is even simpler if the relationship has a surrogate key: you can compare the keys of p1 and p2 instead of the names, and the requirement for Name1 and Name2 to be unequal goes away.

Edit: if you don't want to remove them from the table, but just from the results of a specific query, use the same pattern with SELECT rather than DELETE.
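
To try the idea without a database server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no DELETE ... JOIN syntax, so the join is rewritten as a correlated EXISTS; that rewrite, the sample rows, and the helper name delete_mirrored_pairs are assumptions for illustration, not part of the answer above.

```python
import sqlite3


def delete_mirrored_pairs(conn):
    """Delete the row with Name1 < Name2 whenever its mirrored row exists."""
    conn.execute(
        """
        DELETE FROM Pairs
        WHERE Name1 < Name2
          AND EXISTS (
              SELECT 1
              FROM Pairs AS p1
              WHERE p1.Name1 = Pairs.Name2
                AND p1.Name2 = Pairs.Name1
                AND p1.Interest = Pairs.Interest
          )
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Pairs (Name1 TEXT, Name2 TEXT, Interest TEXT)")
conn.executemany(
    "INSERT INTO Pairs VALUES (?, ?, ?)",
    [
        ("ben", "will", "chess"),   # mirrored with the next row
        ("will", "ben", "chess"),
        ("amy", "ben", "golf"),     # no mirror, must survive
    ],
)
delete_mirrored_pairs(conn)
remaining_pairs = conn.execute(
    "SELECT Name1, Name2, Interest FROM Pairs ORDER BY Name1"
).fetchall()
print(remaining_pairs)
```

As in the original DELETE, the surviving row of a mirrored pair is the one with Name1 greater than Name2, here (will, ben).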

SQL Remove duplicate combination

If you have other columns and the pairs only appear once (in either direction):

select t.*
from t
where t.x1 <= t.x2
union all
select t.*
from t
where t.x1 > t.x2 and
not exists (select 1 from t t2 where t2.x1 = t.x2 and t2.x2 = t.x1);
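
A small sketch of the same query run against an in-memory SQLite database from Python. The table layout, the extra note column, and the sample rows are made up for illustration.

```python
import sqlite3

# Same shape as the query above: keep every "ascending" row, and keep a
# "descending" row only when its mirror is absent.
UNION_QUERY = """
SELECT t.* FROM t WHERE t.x1 <= t.x2
UNION ALL
SELECT t.* FROM t
WHERE t.x1 > t.x2
  AND NOT EXISTS (SELECT 1 FROM t AS t2 WHERE t2.x1 = t.x2 AND t2.x2 = t.x1)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x1 INTEGER, x2 INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 2, "a"),  # kept by the first branch
        (2, 1, "b"),  # mirror of (1, 2), dropped
        (4, 3, "c"),  # descending but has no mirror, kept by the second branch
    ],
)
union_rows = sorted(conn.execute(UNION_QUERY).fetchall())
print(union_rows)
```

The other columns come through untouched, which is the point of this variant: rows are filtered rather than rebuilt.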

Postgresql remove duplicate reversed pairs

One method uses a correlated EXISTS subquery:

select origin, destination,
       (case when exists (select 1
                          from t t2
                          where t2.origin = t.destination and t2.destination = t.origin
                         )
             then 0 else 1
        end) as one_way
from t
where origin < destination
union all
select origin, destination, 1
from t
where origin > destination and
      not exists (select 1
                  from t t2
                  where t2.origin = t.destination and t2.destination = t.origin
                 );

An alternative method uses window functions:

select origin, destination, (cnt = 1)::int as one_way
from (select t.*,
count(*) over (partition by least(origin, destination), greatest(origin, destination)) as cnt
from t
) t
where origin < destination or
(origin > destination and cnt = 1);
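
The window-function version can be sketched in SQLite from Python as well. Assumptions here: SQLite 3.25 or newer (needed for window functions), the two-argument scalar min() and max() standing in for least() and greatest(), a plain comparison instead of the Postgres ::int cast, and invented sample routes.

```python
import sqlite3

# Count how many rows share the same unordered (origin, destination) pair,
# then keep one representative per pair and flag it as one-way or not.
WINDOW_QUERY = """
SELECT origin, destination, (cnt = 1) AS one_way
FROM (
    SELECT t.*,
           COUNT(*) OVER (
               PARTITION BY min(origin, destination), max(origin, destination)
           ) AS cnt
    FROM t
) AS counted
WHERE origin < destination
   OR (origin > destination AND cnt = 1)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (origin TEXT, destination TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("A", "B"), ("B", "A"), ("C", "D"), ("F", "E")],
)
route_rows = sorted(conn.execute(WINDOW_QUERY).fetchall())
print(route_rows)
```

With this data, A/B appears once with one_way = 0, while C/D and F/E each appear once with one_way = 1.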

How do I remove duplicate rows based on two Columns? (E.g. A Pair Value Set)

You probably want to keep the data in the table, because "Cassandra likes Gabriel" and "Gabriel likes Cassandra" are different facts. So I suggest removing the mirrored rows only in the query:

WITH cte AS (
    SELECT hs.NAME AS Highschooler,
           hs.grade AS inGrade1,
           hs2.NAME AS likes,
           hs2.grade AS inGrade2,
           ROW_NUMBER() OVER (
               PARTITION BY CASE WHEN l.id1 < l.id2 THEN l.id1 ELSE l.id2 END,
                            CASE WHEN l.id1 < l.id2 THEN l.id2 ELSE l.id1 END
               ORDER BY (SELECT NULL)
           ) AS rn
    FROM highschooler hs
    JOIN likes l ON hs.id = l.id1
    JOIN highschooler hs2 ON hs2.id = l.id2
)
SELECT * FROM cte WHERE rn = 1

Here is a demonstration:

DECLARE @t TABLE ( id1 INT, id2 INT )

INSERT INTO @t
VALUES ( 1, 2 ),
( 2, 1 ),
( 1, 3 ),
( 5, 6 ),
( 6, 5 ),
( 7, 8 );
WITH cte AS(SELECT * ,
ROW_NUMBER() OVER (PARTITION BY CASE WHEN id1 < id2 THEN id1
ELSE id2 END,
CASE WHEN id1 < id2 THEN id2
ELSE id1 END
ORDER BY (SELECT NULL)) rn
FROM @t)
SELECT * FROM cte WHERE rn = 1

Output (because of ORDER BY (SELECT NULL), which row of a mirrored pair is kept is not guaranteed):

id1 id2 rn
1 2 1
1 3 1
5 6 1
7 8 1
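
For readers without SQL Server at hand, the demonstration can be approximated in SQLite from Python. This is a sketch under assumptions: SQLite 3.25 or newer, scalar min()/max() in place of the CASE expressions, a plain table t instead of the table variable, and ORDER BY id1 swapped in for ORDER BY (SELECT NULL) so that the surviving row is deterministic.

```python
import sqlite3

# Number the rows inside each unordered (id1, id2) pair and keep the first.
ROW_NUMBER_QUERY = """
WITH cte AS (
    SELECT id1, id2,
           ROW_NUMBER() OVER (
               PARTITION BY min(id1, id2), max(id1, id2)
               ORDER BY id1
           ) AS rn
    FROM t
)
SELECT id1, id2 FROM cte WHERE rn = 1 ORDER BY id1, id2
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id1 INTEGER, id2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 2), (2, 1), (1, 3), (5, 6), (6, 5), (7, 8)],  # same rows as the demo
)
deduped_ids = conn.execute(ROW_NUMBER_QUERY).fetchall()
print(deduped_ids)
```

Ordering by id1 inside each partition means the row with the smaller id1 is the one kept, which reproduces the four pairs listed in the output above.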

Find and remove duplicate entries where values can be swapped between two columns

I'd use something like

select  min(id),
least (node_from_id, node_to_id) node_from_id,
greatest(node_from_id, node_to_id) node_to_id
from relationships
group by
least (node_from_id, node_to_id) ,
greatest(node_from_id, node_to_id)
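
A minimal sketch of the same grouping in SQLite from Python, where the scalar min() and max() play the role of least() and greatest(). The sample rows and the output aliases keep_id, node_a and node_b are assumptions for illustration.

```python
import sqlite3

# Normalise each edge to (smaller, larger) and keep the lowest id per edge.
GROUP_QUERY = """
SELECT MIN(id) AS keep_id,
       min(node_from_id, node_to_id) AS node_a,
       max(node_from_id, node_to_id) AS node_b
FROM relationships
GROUP BY min(node_from_id, node_to_id),
         max(node_from_id, node_to_id)
ORDER BY keep_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relationships ("
    "id INTEGER PRIMARY KEY, node_from_id INTEGER, node_to_id INTEGER)"
)
conn.executemany(
    "INSERT INTO relationships VALUES (?, ?, ?)",
    [(1, 10, 20), (2, 20, 10), (3, 30, 40)],  # ids 1 and 2 are the same edge
)
kept_edges = conn.execute(GROUP_QUERY).fetchall()
print(kept_edges)
```

The ids returned in keep_id are the rows to keep; every other id in relationships is a swapped duplicate.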

How to remove reverse duplicates from a SQL query

If you don't care about the ordering in the final result set, you can do:

select distinct least(ip_src, ip_dst) as ip_src, greatest(ip_src, ip_dst) as ip_dst
from link ;

Note: this can produce pairs that are not in the original table. For example, if only (b, a) exists, the result contains (a, b).
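
The effect described in the note is easy to see in a small sketch, here with SQLite driven from Python. The addresses are invented, the aliases lo and hi are hypothetical, min()/max() replace least()/greatest(), and the comparison is plain text comparison of the strings.

```python
import sqlite3

DISTINCT_QUERY = """
SELECT DISTINCT min(ip_src, ip_dst) AS lo, max(ip_src, ip_dst) AS hi
FROM link
ORDER BY lo, hi
"""

original_links = [
    ("10.0.0.2", "10.0.0.1"),  # stored in one direction only
    ("10.0.0.1", "10.0.0.3"),  # stored in both directions
    ("10.0.0.3", "10.0.0.1"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (ip_src TEXT, ip_dst TEXT)")
conn.executemany("INSERT INTO link VALUES (?, ?)", original_links)
distinct_links = conn.execute(DISTINCT_QUERY).fetchall()

# Pairs in the result that never appeared in that orientation in the table.
invented_links = [pair for pair in distinct_links if pair not in original_links]
print(distinct_links, invented_links)
```

The pair stored only as (10.0.0.2, 10.0.0.1) comes back flipped, which is why this form only suits cases where direction does not matter.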

If you do care about ordering:

select ip_src, ip_dst
from link l
where ip_src <= ip_dst
union all
select ip_src, ip_dst
from link l
where ip_src > ip_dst and
not exists (select 1 from link l2 where l2.ip_src = l.ip_dst and l2.ip_dst = l.ip_src);

Note: this uses UNION ALL, so rows that appear more than once in the table in the same direction are not removed. Use UNION instead if you also want to remove those exact duplicates.

How can I remove duplicate rows?

Assuming no nulls, you GROUP BY the unique columns and SELECT the MIN (or MAX) RowId as the row to keep. Then delete every row whose RowId is not among the kept ones:

DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
    SELECT MIN(RowId) as RowId, Col1, Col2, Col3
    FROM MyTable
    GROUP BY Col1, Col2, Col3
) as KeepRows ON
    MyTable.RowId = KeepRows.RowId
WHERE
    KeepRows.RowId IS NULL

In case you have a GUID instead of an integer, you can replace

MIN(RowId)

with

CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))
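
The keep-the-minimum-RowId idea can be sketched in SQLite from Python. Since SQLite does not support joins in DELETE, this version swaps the LEFT OUTER JOIN for NOT IN over the same grouped subquery; the sample rows are invented.

```python
import sqlite3

# Keep the lowest RowId in each (Col1, Col2, Col3) group, delete the rest.
DELETE_DUPLICATES = """
DELETE FROM MyTable
WHERE RowId NOT IN (
    SELECT MIN(RowId)
    FROM MyTable
    GROUP BY Col1, Col2, Col3
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable ("
    "RowId INTEGER PRIMARY KEY, Col1 TEXT, Col2 TEXT, Col3 TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (1, "a", "b", "c"),
        (2, "a", "b", "c"),  # duplicate of row 1
        (3, "x", "y", "z"),
    ],
)
conn.execute(DELETE_DUPLICATES)
kept_rows = conn.execute("SELECT * FROM MyTable ORDER BY RowId").fetchall()
print(kept_rows)
```

Like the original, this relies on the grouped columns containing no nulls being treated specially: GROUP BY puts nulls in one group, so rows that differ only by null placement are still handled consistently here.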

