Delete All Duplicate Rows Except For One in MySQL

Delete all Duplicate Rows except for One in MySQL?

Editor warning: This solution is computationally inefficient and may bring down your connection for a large table.

NB - You need to do this first on a test copy of your table!

When I did it, I found that unless I also included AND n1.id <> n2.id, it deleted every row in the table.

  1. If you want to keep the row with the lowest id value:

    DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
  2. If you want to keep the row with the highest id value:

    DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name

I used this method in MySQL 5.1

Not sure about other versions.


Update: Since people Googling for removing duplicates end up here

Although the OP's question is about DELETE, please be advised that using INSERT and DISTINCT is much faster. For a database with 8 million rows, the below query took 13 minutes, while using DELETE, it took more than 2 hours and yet didn't complete.

INSERT INTO tempTableName(cellId,attributeId,entityRowId,value)
SELECT DISTINCT cellId,attributeId,entityRowId,value
FROM tableName;

Delete all duplicates except first one mysql

Assuming that the primary key of your table is id, you could phrase this as a delete/join query, like:

delete tm
from trademark_merge tm
inner join (
select serial_number, min(id) id
from trademark_merge
group by serial_number
) tm1 on tm.serial_number = tm1.serial_number and tm.id > tm1.id

Delete all Duplicate Rows except for One in MySQL

You can adapt that SQL to your table:

DELETE u1 FROM oc_product u1, oc_product u2 WHERE u1.product_id < u2.product_id AND u1.sku = u2.sku;

MySQL delete duplicate records but keep latest

Imagine your table test contains the following data:

  select id, email
from test;

ID EMAIL
---------------------- --------------------
1 aaa
2 bbb
3 ccc
4 bbb
5 ddd
6 eee
7 aaa
8 aaa
9 eee

So, we need to find all repeated emails and delete all of them, but the latest id.

In this case, aaa, bbb and eee are repeated, so we want to delete IDs 1, 7, 2 and 6.

To accomplish this, first we need to find all the repeated emails:

      select email 
from test
group by email
having count(*) > 1;

EMAIL
--------------------
aaa
bbb
eee

Then, from this dataset, we need to find the latest id for each one of these repeated emails:

  select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email;

LASTID EMAIL
---------------------- --------------------
8 aaa
4 bbb
9 eee

Finally we can now delete all of these emails with an Id smaller than LASTID. So the solution is:

delete test
from test
inner join (
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email
) duplic on duplic.email = test.email
where test.id < duplic.lastId;

I don't have mySql installed on this machine right now, but should work

Update

The above delete works, but I found a more optimized version:

 delete test
from test
inner join (
select max(id) as lastId, email
from test
group by email
having count(*) > 1) duplic on duplic.email = test.email
where test.id < duplic.lastId;

You can see that it deletes the oldest duplicates, i.e. 1, 7, 2, 6:

select * from test;
+----+-------+
| id | email |
+----+-------+
| 3 | ccc |
| 4 | bbb |
| 5 | ddd |
| 8 | aaa |
| 9 | eee |
+----+-------+

Another version, is the delete provived by Rene Limon

delete from test
where id not in (
select max(id)
from test
group by email)

How to remove all duplicate rows except one with latest date from MySQL table?

Try this:

DELETE FROM tableName 
WHERE (a, b, c, d, dte) NOT IN (SELECT a, b, c, d, dte
FROM (SELECT a, b, c, d, MAX(dte) dte
FROM tableName GROUP BY a, b, c, d
) AS A );

Check this SQL FIDDLE DEMO

Mysql delete similar rows according to specific columns except the ones with highest id

  • You can modify your sub-select query, to get maximum value of id for each duplication combination.
  • Now, while joining to the main table, simply put a condition that id value will not be equal to the maximum id value.
  • You can now Delete from this result-set.

Try the following:

DELETE t 
FROM mytable AS t
JOIN
(SELECT MAX(id) as max_id,
sub_id,
spec_id,
ex_time,
COUNT(*) as NumDuplicates
FROM mytable
GROUP BY sub_id, spec_id, ex_time
HAVING NumDuplicates > 1
) AS tsum
ON t.sub_id = tsum.sub_id AND
t.spec_id = tsum.spec_id AND
t.ex_time = tsum.ex_time AND
t.id <> tsum.max_id


Related Topics



Leave a reply



Submit