SQL -- Remove Duplicate Pairs

Removing Mirrored Pairs from SQL Join

Assuming you do not care which of the two mirrored rows is kept, (ben, will) or (will, ben), my preferred solution is the following:

DELETE p2 
FROM Pairs p1 
INNER JOIN Pairs p2 
    on p1.Name1 = p2.Name2 
    and p1.Name2 = p2.Name1 
    and p1.Interest = p2.Interest
    -- match only one of the two pairs
    and p1.Name1 > p1.Name2

Because Name1 and Name2 are never equal within a row, exactly one row of each mirrored pair has Name1 less than Name2. The condition p1.Name1 > p1.Name2 makes p1 the other row, so p2, the row with Name1 < Name2, is the one that gets deleted.

This is even simpler if the relationship has a surrogate key: the tie can be broken on the key instead (for example, keep the row with the smaller key), so the requirement that Name1 and Name2 be unequal goes away.

Edit: if you don't want to remove them from the table, but just from the results of a specific query, use the same pattern with SELECT rather than DELETE.
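
A minimal sketch of the delete pattern, run against an in-memory SQLite database from Python so it can be tried without a server. SQLite does not accept the DELETE ... JOIN form used above, so here the self-join is rewritten as a correlated EXISTS (my substitution, not the original syntax). The sample rows and Interest values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Pairs (Name1 TEXT, Name2 TEXT, Interest TEXT);
    INSERT INTO Pairs VALUES
        ('ben',  'will', 'chess'),   -- mirrored with the next row
        ('will', 'ben',  'chess'),
        ('amy',  'zed',  'golf');    -- no mirror, must survive
""")

# Delete the row with Name1 < Name2 whenever its mirror exists.
# This is the same row the join version removes as p2.
conn.execute("""
    DELETE FROM Pairs
    WHERE Name1 < Name2
      AND EXISTS (SELECT 1
                  FROM Pairs p1
                  WHERE p1.Name1 = Pairs.Name2
                    AND p1.Name2 = Pairs.Name1
                    AND p1.Interest = Pairs.Interest)
""")

remaining = conn.execute(
    "SELECT Name1, Name2, Interest FROM Pairs ORDER BY Name1"
).fetchall()
print(remaining)
```

Only one row of the ben/will pair is left, and the unmirrored amy/zed row is untouched.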

SQL Remove duplicate combination

If you have other columns and the pairs only appear once (in either direction):

select t.*
from t
where t.x1 <= t.x2
union all
select t.*
from t
where t.x1 > t.x2 and
      not exists (select 1 from t t2 where t2.x1 = t.x2 and t2.x2 = t.x1);
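
To see which rows each branch lets through, here is a small sketch using Python's sqlite3 module with an in-memory table. The note column and the sample values are assumptions added for illustration; the query itself is the one above with an ORDER BY added so the result is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (x1 INTEGER, x2 INTEGER, note TEXT);
    INSERT INTO t VALUES
        (1, 2, 'kept: x1 <= x2'),
        (2, 1, 'dropped: mirror of (1, 2)'),
        (4, 3, 'kept: x1 > x2 but no mirror row'),
        (5, 5, 'kept: x1 = x2');
""")

rows = conn.execute("""
    SELECT t.* FROM t WHERE t.x1 <= t.x2
    UNION ALL
    SELECT t.* FROM t
    WHERE t.x1 > t.x2
      AND NOT EXISTS (SELECT 1 FROM t t2
                      WHERE t2.x1 = t.x2 AND t2.x2 = t.x1)
    ORDER BY x1, x2
""").fetchall()

for row in rows:
    print(row)
```

The other columns come through unchanged, which is the reason to prefer this form over a least/greatest rewrite when the table has more than the two pair columns.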

Postgresql remove duplicate reversed pairs

One method uses union all with exists:

select origin, destination,
       (case when exists (select 1
                          from t t2
                          where t2.origin = t.destination and t2.destination = t.origin
                         )
             then 0 else 1
        end) as one_way
from t
where origin < destination
union all
select origin, destination, 1
from t
where origin > destination and
      not exists (select 1
                  from t t2
                  where t2.origin = t.destination and t2.destination = t.origin
                 );

An alternative method uses window functions:

select origin, destination, (cnt = 1)::int as one_way
from (select t.*,
             count(*) over (partition by least(origin, destination), greatest(origin, destination)) as cnt
      from t
     ) t
where origin < destination or
      (origin > destination and cnt = 1);
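
The window-function version can be tried in SQLite from Python with two substitutions, both mine: SQLite's two-argument min() and max() stand in for least() and greatest(), and the boolean (cnt = 1) is used directly because SQLite already returns it as 0 or 1. Window functions need SQLite 3.25 or newer. The route names are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (origin TEXT, destination TEXT);
    INSERT INTO t VALUES ('A', 'B'), ('B', 'A'),   -- round trip
                         ('A', 'C'),               -- one way, origin < destination
                         ('D', 'C');               -- one way, origin > destination
""")

rows = conn.execute("""
    SELECT origin, destination, (cnt = 1) AS one_way
    FROM (SELECT t.*,
                 COUNT(*) OVER (PARTITION BY min(origin, destination),
                                             max(origin, destination)) AS cnt
          FROM t) AS s
    WHERE origin < destination
       OR (origin > destination AND cnt = 1)
    ORDER BY origin, destination
""").fetchall()

for row in rows:
    print(row)
```

The round trip A/B collapses to a single row with one_way = 0, while both one-way routes keep their original direction with one_way = 1.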

How do I remove duplicate rows based on two Columns? (E.g. A Pair Value Set)

You probably want to keep the data in the table, because "Cassandra likes Gabriel" and "Gabriel likes Cassandra" are different actions. So I suggest filtering the mirrored rows out in the query instead:

WITH cte AS(SELECT hs.NAME Highschooler ,
                   hs.grade inGrade1 ,
                   hs2.NAME likes ,
                   hs2.grade inGrade2 ,
                   ROW_NUMBER() OVER (PARTITION BY CASE WHEN l.id1 < l.id2 THEN l.id1 
                                                        ELSE l.id2 END,
                                                   CASE WHEN l.id1 < l.id2 THEN l.id2 
                                                        ELSE l.id1 END 
                                      ORDER BY (SELECT NULL)) rn
             FROM  highschooler hs
             JOIN likes l ON hs.id = l.id1
             JOIN highschooler hs2 ON hs2.id = l.id2)
SELECT * FROM cte WHERE rn = 1

Here is a demonstration:

DECLARE @t TABLE ( id1 INT, id2 INT )

INSERT  INTO @t
VALUES  ( 1, 2 ),
        ( 2, 1 ),
        ( 1, 3 ),
        ( 5, 6 ),
        ( 6, 5 ),
        ( 7, 8 );
WITH cte AS(SELECT * ,
                   ROW_NUMBER() OVER (PARTITION BY CASE WHEN id1 < id2 THEN id1 
                                                        ELSE id2 END,
                                                   CASE WHEN id1 < id2 THEN id2 
                                                        ELSE id1 END 
                                      ORDER BY (SELECT NULL)) rn
            FROM @t)
SELECT * FROM cte WHERE rn = 1

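Output: four rows with rn = 1, one for each unordered pair ({1, 2}, {1, 3}, {5, 6} and {7, 8}). Because of ORDER BY (SELECT NULL), which row of a mirrored pair is kept is not guaranteed.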
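
For readers without SQL Server at hand, the same demonstration can be approximated in SQLite from Python. This is a sketch under two assumptions of mine: min() and max() replace the CASE expressions, and ORDER BY id1 replaces ORDER BY (SELECT NULL) so that the surviving row is deterministic. It needs SQLite 3.25 or newer for ROW_NUMBER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id1 INTEGER, id2 INTEGER);
    INSERT INTO t VALUES (1, 2), (2, 1), (1, 3), (5, 6), (6, 5), (7, 8);
""")

rows = conn.execute("""
    WITH cte AS (
        SELECT id1, id2,
               ROW_NUMBER() OVER (
                   PARTITION BY min(id1, id2), max(id1, id2)
                   ORDER BY id1   -- assumption: keep the row with the smaller id1
               ) AS rn
        FROM t
    )
    SELECT id1, id2 FROM cte WHERE rn = 1
    ORDER BY id1, id2
""").fetchall()

print(rows)
```

Each mirrored pair shares a partition, so rn = 1 keeps exactly one row per unordered pair.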

Find and remove duplicate entries where values can be swapped between two columns

I'd use something like

select  min(id),
        least (node_from_id, node_to_id) node_from_id,
        greatest(node_from_id, node_to_id) node_to_id
from    relationships
group   by         
        least (node_from_id, node_to_id) ,
        greatest(node_from_id, node_to_id)
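
A quick sketch of what this grouping returns, using SQLite from Python. SQLite has no least() or greatest(), so its two-argument min() and max() are used instead; the aliases keep_id, node_lo and node_hi and the sample rows are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE relationships (
        id           INTEGER PRIMARY KEY,
        node_from_id INTEGER,
        node_to_id   INTEGER
    );
    INSERT INTO relationships VALUES
        (1, 10, 20), (2, 20, 10),   -- same edge in both directions
        (3, 10, 30), (4, 30, 10);   -- another swapped duplicate
""")

# One row per unordered pair, carrying the smallest id of the group.
keep = conn.execute("""
    SELECT min(id)                       AS keep_id,
           min(node_from_id, node_to_id) AS node_lo,
           max(node_from_id, node_to_id) AS node_hi
    FROM relationships
    GROUP BY min(node_from_id, node_to_id),
             max(node_from_id, node_to_id)
    ORDER BY keep_id
""").fetchall()

print(keep)
```

The ids returned here identify the rows to keep; any id not in this list belongs to a swapped duplicate.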

How to remove reverse duplicates from a SQL query

If you don't care about the ordering in the final result set, you can do:

select distinct least(ip_src, ip_dst) as ip_src, greatest(ip_src, ip_dst) as ip_dst
from link ;

Note: this can produce pairs that are not in the original table. For example, if only (b, a) is stored, the result contains (a, b).
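
The caveat is easy to reproduce. The sketch below uses SQLite from Python, with min() and max() in place of least() and greatest(), the aliases ip_lo and ip_hi chosen by me, and made-up addresses that are compared as plain text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE link (ip_src TEXT, ip_dst TEXT);
    INSERT INTO link VALUES
        ('10.0.0.2', '10.0.0.1'),   -- stored in this direction only
        ('10.0.0.3', '10.0.0.4'),
        ('10.0.0.4', '10.0.0.3');
""")

original = set(conn.execute("SELECT ip_src, ip_dst FROM link").fetchall())

rows = conn.execute("""
    SELECT DISTINCT min(ip_src, ip_dst) AS ip_lo,
                    max(ip_src, ip_dst) AS ip_hi
    FROM link
    ORDER BY ip_lo, ip_hi
""").fetchall()

# Pairs in the result that never appeared in the table in that direction.
invented = [r for r in rows if r not in original]
print(rows)
print(invented)
```

The first link comes back with source and destination swapped, which matters if the direction carries meaning.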

If you do care about ordering:

select ip_src, ip_dst
from link l
where ip_src <= ip_dst
union all
select ip_src, ip_dst
from link l
where ip_src > ip_dst and
      not exists (select 1 from link l2 where l2.ip_src = l.ip_dst and l2.ip_dst = l.ip_src);

Note: this uses union all, so exact duplicate rows in the table are not removed. Use union instead to remove them as well.
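
The difference between the two operators shows up as soon as the table holds an exact duplicate. A small sketch with SQLite from Python; the single-letter addresses are placeholder sample data and the ORDER BY is added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE link (ip_src TEXT, ip_dst TEXT);
    INSERT INTO link VALUES ('a', 'b'), ('a', 'b'),   -- exact duplicate
                            ('b', 'a'),               -- reverse duplicate
                            ('d', 'c');               -- no reverse row
""")

query = """
    SELECT ip_src, ip_dst FROM link l WHERE ip_src <= ip_dst
    {op}
    SELECT ip_src, ip_dst FROM link l
    WHERE ip_src > ip_dst
      AND NOT EXISTS (SELECT 1 FROM link l2
                      WHERE l2.ip_src = l.ip_dst AND l2.ip_dst = l.ip_src)
    ORDER BY ip_src, ip_dst
"""

with_union_all = conn.execute(query.format(op="UNION ALL")).fetchall()
with_union = conn.execute(query.format(op="UNION")).fetchall()

print(with_union_all)
print(with_union)
```

Both variants drop the reversed row; only union also collapses the repeated ('a', 'b').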

How can I remove duplicate rows?

Assuming no nulls, GROUP BY the columns that define a duplicate and SELECT the MIN (or MAX) RowId as the row to keep. Then delete every row whose RowId is not among the kept ones:

DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
   SELECT MIN(RowId) as RowId, Col1, Col2, Col3 
   FROM MyTable 
   GROUP BY Col1, Col2, Col3
) as KeepRows ON
   MyTable.RowId = KeepRows.RowId
WHERE
   KeepRows.RowId IS NULL
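
The keep-the-smallest-RowId idea can be checked in SQLite from Python. SQLite does not support a join inside DELETE, so this sketch expresses the same anti-join with NOT IN (my substitution); the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (
        RowId INTEGER PRIMARY KEY,
        Col1  TEXT, Col2 TEXT, Col3 TEXT
    );
    INSERT INTO MyTable VALUES
        (1, 'a', 'b', 'c'),
        (2, 'a', 'b', 'c'),   -- duplicate of row 1
        (3, 'x', 'y', 'z'),
        (4, 'a', 'b', 'c');   -- duplicate of row 1
""")

# Keep the smallest RowId of every (Col1, Col2, Col3) group, delete the rest.
conn.execute("""
    DELETE FROM MyTable
    WHERE RowId NOT IN (SELECT MIN(RowId)
                        FROM MyTable
                        GROUP BY Col1, Col2, Col3)
""")

kept = conn.execute(
    "SELECT RowId, Col1, Col2, Col3 FROM MyTable ORDER BY RowId"
).fetchall()
print(kept)
```

As in the original, this relies on the grouped columns containing no nulls being compared as distinct by other means; GROUP BY itself puts nulls in one group.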

In case you have a GUID instead of an integer, you can replace

MIN(RowId)

with

CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))