Delete all Duplicate Rows except for One in MySQL?
Editor warning: This solution is computationally inefficient and may bring down your connection for a large table.
NB - You need to do this first on a test copy of your table!
When I did it, I found that unless I also included AND n1.id <> n2.id
, it deleted every row in the table.
If you want to keep the row with the lowest
id
value:DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
If you want to keep the row with the highest
id
value:DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name
I used this method in MySQL 5.1
Not sure about other versions.
Update: Since people Googling for removing duplicates end up here
Although the OP's question is about DELETE
, please be advised that using INSERT
and DISTINCT
is much faster. For a database with 8 million rows, the below query took 13 minutes, while using DELETE
, it took more than 2 hours and yet didn't complete.
INSERT INTO tempTableName(cellId,attributeId,entityRowId,value)
SELECT DISTINCT cellId,attributeId,entityRowId,value
FROM tableName;
Delete all duplicates except first one mysql
Assuming that the primary key of your table is id
, you could phrase this as a delete/join query, like:
delete tm
from trademark_merge tm
inner join (
select serial_number, min(id) id
from trademark_merge
group by serial_number
) tm1 on tm.serial_number = tm1.serial_number and tm.id > tm1.id
Delete all Duplicate Rows except for One in MySQL
You can adapt that SQL to your table:
DELETE u1 FROM oc_product u1, oc_product u2 WHERE u1.product_id < u2.product_id AND u1.sku = u2.sku;
MySQL delete duplicate records but keep latest
Imagine your table test
contains the following data:
select id, email
from test;
ID EMAIL
---------------------- --------------------
1 aaa
2 bbb
3 ccc
4 bbb
5 ddd
6 eee
7 aaa
8 aaa
9 eee
So, we need to find all repeated emails and delete all of them, but the latest id.
In this case, aaa
, bbb
and eee
are repeated, so we want to delete IDs 1, 7, 2 and 6.
To accomplish this, first we need to find all the repeated emails:
select email
from test
group by email
having count(*) > 1;
EMAIL
--------------------
aaa
bbb
eee
Then, from this dataset, we need to find the latest id for each one of these repeated emails:
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email;
LASTID EMAIL
---------------------- --------------------
8 aaa
4 bbb
9 eee
Finally we can now delete all of these emails with an Id smaller than LASTID. So the solution is:
delete test
from test
inner join (
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email
) duplic on duplic.email = test.email
where test.id < duplic.lastId;
I don't have mySql installed on this machine right now, but should work
Update
The above delete works, but I found a more optimized version:
delete test
from test
inner join (
select max(id) as lastId, email
from test
group by email
having count(*) > 1) duplic on duplic.email = test.email
where test.id < duplic.lastId;
You can see that it deletes the oldest duplicates, i.e. 1, 7, 2, 6:
select * from test;
+----+-------+
| id | email |
+----+-------+
| 3 | ccc |
| 4 | bbb |
| 5 | ddd |
| 8 | aaa |
| 9 | eee |
+----+-------+
Another version, is the delete provived by Rene Limon
delete from test
where id not in (
select max(id)
from test
group by email)
How to remove all duplicate rows except one with latest date from MySQL table?
Try this:
DELETE FROM tableName
WHERE (a, b, c, d, dte) NOT IN (SELECT a, b, c, d, dte
FROM (SELECT a, b, c, d, MAX(dte) dte
FROM tableName GROUP BY a, b, c, d
) AS A );
Check this SQL FIDDLE DEMO
Mysql delete similar rows according to specific columns except the ones with highest id
- You can modify your sub-select query, to get maximum value of
id
for each duplication combination. - Now, while joining to the main table, simply put a condition that
id
value will not be equal to the maximumid
value. - You can now
Delete
from this result-set.
Try the following:
DELETE t
FROM mytable AS t
JOIN
(SELECT MAX(id) as max_id,
sub_id,
spec_id,
ex_time,
COUNT(*) as NumDuplicates
FROM mytable
GROUP BY sub_id, spec_id, ex_time
HAVING NumDuplicates > 1
) AS tsum
ON t.sub_id = tsum.sub_id AND
t.spec_id = tsum.spec_id AND
t.ex_time = tsum.ex_time AND
t.id <> tsum.max_id
Related Topics
Calculate a Running Total in SQL Server
How to Do an Update Statement With Join in SQL Server
Why Does Oracle 9I Treat an Empty String as Null
How to Implement One-To-One, One-To-Many and Many-To-Many Relationships While Designing Tables
How to Get Multiple Counts With One SQL Query
Function Vs. Stored Procedure in SQL Server
Subquery Using Exists 1 or Exists *
Are "From Table1 Left Join Table2" and "From Table2 Right Join Table1" Interchangeable
Update Statement With Inner Join on Oracle
Simple Way to Transpose Columns and Rows in Sql
Calculate a Running Total in MySQL
Best Way to Select Random Rows Postgresql
MySQL Error: Key Specification Without a Key Length
How to Update Only One Field Using Entity Framework