Remove Reverse Duplicates from an SQL Query

Remove reverse duplicates from an SQL query

First of all, welcome to 2012. We have migrated away from relating tables using commas. It was introdued in ANSI 89 but is severely lacking. Nowaways, the correct way is to write queries using the ANSI 92/99/2003 JOIN syntax.

The solution to your problem is to turn your bidirectional inequality <> into a unidirectional inequality, either < or > whichever you prefer.

select e.column3, f.column3
from example as e
join example as f on e.column2 = f.column2 and e.column3 < f.column3

How to remove reverse duplicates from a SQL query

If you don't care about the ordering in the final result set, you can do:

select distinct least(ip_src, ip_dst) as ip_src, greatest(ip_src, ip_dst) as ip_dst
from link ;

Note: this can result in pairs not in the original table.

If you do care about ordering:

select ip_src, ip_dst
from link l
where ip_src <= ip_dst
union all
select ip_src, ip_dst
from link l
where ip_src > ip_dst and
      not exists (select 1 from link l2 where l2.ip_src = l.ip_dst and l2.ip_dst = l.ip_src);

Note: This uses union all, so it does not remove duplicates. You can use union to remove duplicates.

Delete reverse duplicate rows using sql

One approach is to identify only the valid rows e.g.

SELECT column1, column2
  FROM T
 WHERE column1 <= column2
UNION
SELECT column2 AS column1, column1 AS column2
  FROM T
 WHERE column1 > column2;

...then delete rows that aren't in the set of valid rows:

DELETE 
  FROM T
 WHERE NOT EXISTS (
                   SELECT *
                     FROM (
                           SELECT column1, column2
                             FROM T
                            WHERE column1 <= column2
                           UNION
                           SELECT column2 AS column1, column1 AS column2
                             FROM T
                            WHERE column1 > column2
                          ) AS DT1
                    WHERE DT1.column1 = T.column1 
                          AND DT1.column2 = T.column2
                  );

Alternatively, the DELETE may be simplified to target only the invalid rows:

DELETE 
  FROM T
 WHERE column1 > column2
       AND EXISTS (
                   SELECT *
                     FROM T AS T1
                    WHERE T1.column1 = T.column2 
                          AND T1.column2 = T.column1
                  );

How to drop reverse duplicates from a table?

from my merged table, how can I keep only unique records about source and destination? – user_12

DELETE t1.*
FROM merged_table t1
JOIN merged_table t2 ON t1.src = t2.dst
                    AND t1.dst = t2.src
                    AND t1.src > t2.src;

FIDDLE with some explanational queries.

Remove Duplicates based on two fields which can have reverse values in MySql

If your Couples table had an id column.
You could use this query to remove duplicates(considering that 'Bob','Alice' is consider a duplicate of 'Alice','Bob')

DELETE FROM Couples
WHERE ID NOT IN (SELECT * FROM 
                   (SELECT MIN(id) as min_id
                    FROM Couples
                    GROUP BY LEAST(person1,person2),
                          GREATEST(person1,person2)
                   )T
                );

http://sqlfiddle.com/#!9/fe81f/1
What the subquery does is it gets a minimum id for each grouping by LEAST(person1,person2),GREATEST(person1,person2)..which means Bob,Alice and Alice,Bob is considered the same group.
Then the outer delete statement delete rows that are not in the list of minimum ids (the duplicated rows)... The extra SELECT * is just to trick Mysql into deleting from the same Table in its subquery.

UPDATE For millions of rows using IN or NOT IN might be causing speed issue. So try the below query it does the same but using joins instead of NOT IN and see if it's any faster

DELETE C1
FROM Couples C1
INNER JOIN
     (SELECT MIN(id) as min_id,
              LEAST(person1,person2) as per1,
              GREATEST(person1,person2) as per2
      FROM Couples
      GROUP BY per1,per2
      )C2
ON C1.id > C2.min_id
AND LEAST(C1.person1,C1.person2) = C2.per1
AND GREATEST(C1.person1,C1.person2) = C2.per2
;

http://www.sqlfiddle.com/#!9/f243e/1

How to remove duplicate/inverted rows from INNER JOIN

Simply change:

a.value <> b.value

to:

a.value < b.value

I would put the condition in the ON clause, like this:

SELECT a.name, a.value, a.num, b.name, b.value, b.num
FROM MY_TABLE a INNER JOIN
     MY_TABLE b
     ON a.name = b.name AND a.value < b.value;

Remove Reverse Duplicates from an SQL Query