Remove reverse duplicates from an SQL query
First of all, welcome to 2012. We have migrated away from relating tables using commas. That syntax was introduced in ANSI 89 but is severely lacking. Nowadays, the preferred way is to write queries using the ANSI 92/99/2003 JOIN syntax.
The solution to your problem is to turn your bidirectional inequality <> into a unidirectional inequality, either < or >, whichever you prefer.
select e.column3, f.column3
from example as e
join example as f on e.column2 = f.column2 and e.column3 < f.column3
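To see why the one-directional comparison helps, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration; only the table and column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table example (column2 text, column3 text)")
# Hypothetical sample data: three rows sharing the same column2 value.
conn.executemany(
    "insert into example values (?, ?)",
    [("x", "a"), ("x", "b"), ("x", "c")],
)

def pairs(op):
    # op is the comparison placed between e.column3 and f.column3.
    sql = (
        "select e.column3, f.column3 "
        "from example as e "
        "join example as f on e.column2 = f.column2 "
        f"and e.column3 {op} f.column3 "
        "order by 1, 2"
    )
    return conn.execute(sql).fetchall()

both_directions = pairs("<>")  # contains (a, b) as well as (b, a)
one_direction = pairs("<")     # each unordered pair appears once
print(len(both_directions), len(one_direction))
```

With <>, every pair shows up twice, once in each order; with <, only the copy whose left value sorts first survives.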
How to remove reverse duplicates from a SQL query
If you don't care about the ordering in the final result set, you can do:
select distinct least(ip_src, ip_dst) as ip_src, greatest(ip_src, ip_dst) as ip_dst
from link ;
Note: this can return pairs that are not in the original table, because a row stored as (b, a) is reported as (a, b).
If you do care about ordering:
select ip_src, ip_dst
from link l
where ip_src <= ip_dst
union all
select ip_src, ip_dst
from link l
where ip_src > ip_dst and
not exists (select 1 from link l2 where l2.ip_src = l.ip_dst and l2.ip_dst = l.ip_src);
Note: This uses union all, so it does not remove duplicates. You can use union to remove duplicates.
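The difference between the two variants is easier to see on a small table. The sketch below runs both through Python's sqlite3 module, with two assumptions: the rows are invented, and SQLite's scalar min()/max() stand in for least()/greatest(), which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table link (ip_src text, ip_dst text)")
# Hypothetical rows: A->B has a reverse B->A; D->C has no reverse.
conn.executemany(
    "insert into link values (?, ?)",
    [("A", "B"), ("B", "A"), ("D", "C")],
)

# Variant 1: normalise every pair (min/max replace least/greatest in SQLite).
normalised = conn.execute(
    "select distinct min(ip_src, ip_dst), max(ip_src, ip_dst) "
    "from link order by 1, 2"
).fetchall()

# Variant 2: keep rows as stored, dropping a reversed row only when
# its mirror image exists in the table.
preserved = conn.execute(
    "select ip_src, ip_dst from link l where ip_src <= ip_dst "
    "union all "
    "select ip_src, ip_dst from link l "
    "where ip_src > ip_dst and not exists ("
    "  select 1 from link l2 "
    "  where l2.ip_src = l.ip_dst and l2.ip_dst = l.ip_src) "
    "order by 1, 2"
).fetchall()

print(normalised)
print(preserved)
```

On this data the first variant reports (C, D), a pair that was never stored, while the second keeps (D, C) exactly as it appears in the table.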
Delete reverse duplicate rows using sql
One approach is to identify only the valid rows e.g.
SELECT column1, column2
FROM T
WHERE column1 <= column2
UNION
SELECT column2 AS column1, column1 AS column2
FROM T
WHERE column1 > column2;
...then delete rows that aren't in the set of valid rows:
DELETE
FROM T
WHERE NOT EXISTS (
SELECT *
FROM (
SELECT column1, column2
FROM T
WHERE column1 <= column2
UNION
SELECT column2 AS column1, column1 AS column2
FROM T
WHERE column1 > column2
) AS DT1
WHERE DT1.column1 = T.column1
AND DT1.column2 = T.column2
);
Alternatively, the DELETE may be simplified to target only the invalid rows:
DELETE
FROM T
WHERE column1 > column2
AND EXISTS (
SELECT *
FROM T AS T1
WHERE T1.column1 = T.column2
AND T1.column2 = T.column1
);
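As a rough check, both DELETE statements can be run against a throwaway table with Python's sqlite3 module. The sample rows are hypothetical, and the SQL is the same as above apart from whitespace. It is worth trying because the two forms may not behave identically for a row whose mirror image is absent.

```python
import sqlite3

ROWS = [(1, 2), (2, 1), (4, 3), (5, 5)]  # hypothetical sample data

DELETE_NOT_VALID = """
DELETE FROM T
WHERE NOT EXISTS (
    SELECT * FROM (
        SELECT column1, column2 FROM T WHERE column1 <= column2
        UNION
        SELECT column2 AS column1, column1 AS column2 FROM T WHERE column1 > column2
    ) AS DT1
    WHERE DT1.column1 = T.column1 AND DT1.column2 = T.column2
)
"""

DELETE_MIRRORED = """
DELETE FROM T
WHERE column1 > column2
  AND EXISTS (
    SELECT * FROM T AS T1
    WHERE T1.column1 = T.column2 AND T1.column2 = T.column1
)
"""

def rows_after(delete_sql):
    # Build a fresh table, run one DELETE, and return what is left.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (column1 INTEGER, column2 INTEGER)")
    conn.executemany("INSERT INTO T VALUES (?, ?)", ROWS)
    conn.execute(delete_sql)
    return conn.execute("SELECT column1, column2 FROM T ORDER BY 1, 2").fetchall()

after_not_valid = rows_after(DELETE_NOT_VALID)
after_mirrored = rows_after(DELETE_MIRRORED)
print(after_not_valid, after_mirrored)
```

On this sample, the first form also removes (4, 3), since the set of valid rows only holds it as (3, 4), whereas the second form leaves it in place because no (3, 4) row exists. Which behaviour is wanted depends on whether unmatched reversed rows should be kept.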
How to drop reverse duplicates from a table?
from my merged table, how can I keep only unique records about source and destination? – user_12
DELETE t1.*
FROM merged_table t1
JOIN merged_table t2 ON t1.src = t2.dst
AND t1.dst = t2.src
AND t1.src > t2.src;
Remove Duplicates based on two fields which can have reverse values in MySql
If your Couples table has an id column, you could use this query to remove duplicates (considering that 'Bob','Alice' is a duplicate of 'Alice','Bob'):
DELETE FROM Couples
WHERE ID NOT IN (SELECT * FROM
(SELECT MIN(id) as min_id
FROM Couples
GROUP BY LEAST(person1,person2),
GREATEST(person1,person2)
)T
);
http://sqlfiddle.com/#!9/fe81f/1
The subquery gets the minimum id for each group defined by LEAST(person1,person2), GREATEST(person1,person2), which means Bob,Alice and Alice,Bob are considered the same group.
Then the outer DELETE statement deletes the rows whose id is not in the list of minimum ids (the duplicated rows). The extra SELECT * is just there to trick MySQL into allowing a delete from the same table that its subquery reads.
UPDATE: For millions of rows, using IN or NOT IN might cause speed issues. So try the query below, which does the same thing using a join instead of NOT IN, and see if it's any faster:
DELETE C1
FROM Couples C1
INNER JOIN
(SELECT MIN(id) as min_id,
LEAST(person1,person2) as per1,
GREATEST(person1,person2) as per2
FROM Couples
GROUP BY per1,per2
)C2
ON C1.id > C2.min_id
AND LEAST(C1.person1,C1.person2) = C2.per1
AND GREATEST(C1.person1,C1.person2) = C2.per2
;
http://www.sqlfiddle.com/#!9/f243e/1
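The keep-the-minimum-id idea can be tried out locally with Python's sqlite3 module. This is only a sketch under some assumptions: the rows are invented, SQLite's two-argument min()/max() replace LEAST()/GREATEST(), and the extra SELECT * wrapper is omitted because SQLite does not need it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Couples (id integer primary key, person1 text, person2 text)"
)
# Hypothetical rows: ids 1, 2 and 4 all describe the same couple.
conn.executemany(
    "insert into Couples (person1, person2) values (?, ?)",
    [("Alice", "Bob"), ("Bob", "Alice"), ("Carol", "Dave"), ("Alice", "Bob")],
)

# Keep the lowest id per unordered pair and delete everything else.
# min(id) is the aggregate; min(a, b) / max(a, b) are SQLite's scalar forms.
conn.execute(
    "delete from Couples where id not in ("
    "  select min(id) from Couples "
    "  group by min(person1, person2), max(person1, person2))"
)

remaining = conn.execute(
    "select id, person1, person2 from Couples order by id"
).fetchall()
print(remaining)
```

Only the first row of each couple is left, regardless of the order in which the two names were stored.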
How to remove duplicate/inverted rows from INNER JOIN
Simply change:
a.value <> b.value
to:
a.value < b.value
I would put the condition in the ON clause, like this:
SELECT a.name, a.value, a.num, b.name, b.value, b.num
FROM MY_TABLE a INNER JOIN
MY_TABLE b
ON a.name = b.name AND a.value < b.value;