Delete all Duplicate Rows except for One in MySQL?
Editor warning: This solution is computationally inefficient and may bring down your connection for a large table.
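NB: Run this on a test copy of your table first!
When I did it, I found that unless I also included AND n1.id <> n2.id, it deleted every row in the table. Without an id comparison, every row matches itself on name. The strict comparisons in the queries below (n1.id > n2.id and n1.id < n2.id) rule out that self-match as well.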
If you want to keep the row with the lowest id value:
DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name;
If you want to keep the row with the highest id value:
DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name;
I used this method in MySQL 5.1; I am not sure about other versions.
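The two self-join deletes above rely on MySQL's multi-table DELETE syntax, which SQLite does not support. The following is a minimal sketch that uses Python's built-in sqlite3 module as a throwaway stand-in for a test copy of the table. It runs the same self-join to pick the ids to delete and removes them with an IN subquery. The `names` table and its rows are hypothetical sample data. The third case shows the pitfall described above: with no id comparison, every row joins to itself and everything is deleted.

```python
import sqlite3

# Hypothetical sample rows: "alice" appears three times, "bob" twice.
ROWS = [(1, "alice"), (2, "bob"), (3, "alice"), (4, "carol"), (5, "bob"), (6, "alice")]


def fresh_table() -> sqlite3.Connection:
    """Build an in-memory stand-in for a test copy of the `names` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO names (id, name) VALUES (?, ?)", ROWS)
    return conn


def delete_matches(conn: sqlite3.Connection, join_condition: str) -> None:
    # SQLite has no multi-table DELETE, so the self-join selects the ids of
    # the n1 rows to remove and an IN subquery deletes them.
    # join_condition is one of the fixed strings below, never user input.
    conn.execute(
        "DELETE FROM names WHERE id IN ("
        "SELECT n1.id FROM names n1 JOIN names n2 ON " + join_condition + ")"
    )


def remaining_ids(conn: sqlite3.Connection) -> list:
    return [row[0] for row in conn.execute("SELECT id FROM names ORDER BY id")]


CASES = {
    # Keep the lowest id: delete n1 if a same-name row n2 has a smaller id.
    "keep_lowest": "n1.id > n2.id AND n1.name = n2.name",
    # Keep the highest id: delete n1 if a same-name row n2 has a larger id.
    "keep_highest": "n1.id < n2.id AND n1.name = n2.name",
    # Pitfall: with no id comparison every row joins to itself.
    "no_id_check": "n1.name = n2.name",
}

results = {}
for label, condition in CASES.items():
    conn = fresh_table()
    delete_matches(conn, condition)
    results[label] = remaining_ids(conn)
    print(label, results[label])
# keep_lowest [1, 2, 4]
# keep_highest [4, 5, 6]
# no_id_check []
```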
Update: Since people searching for how to remove duplicates end up here, note that although the OP's question is about DELETE, using INSERT and DISTINCT is much faster. For a table with 8 million rows, the query below took 13 minutes, while the DELETE approach ran for more than 2 hours without completing. The query copies each distinct row once into tempTableName, which must already exist with the same columns:
INSERT INTO tempTableName(cellId,attributeId,entityRowId,value)
SELECT DISTINCT cellId,attributeId,entityRowId,value
FROM tableName;
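The copy-and-swap approach can be sketched end to end with Python's sqlite3 module standing in for MySQL. The sample rows are hypothetical. SQLite has no `CREATE TABLE ... LIKE`, so the empty copy is declared explicitly. In MySQL you would typically use `CREATE TABLE tempTableName LIKE tableName` and swap the tables with `RENAME TABLE`. DISTINCT compares entire rows across the listed columns, so a unique auto-increment id must stay out of the column list; otherwise no two rows would ever be identical.

```python
import sqlite3

COLUMNS = "(cellId INTEGER, attributeId INTEGER, entityRowId INTEGER, value TEXT)"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName " + COLUMNS)
conn.executemany(
    "INSERT INTO tableName VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100, "a"),
        (1, 10, 100, "a"),  # exact duplicate of the row above
        (1, 10, 100, "b"),  # differs only in `value`, so it is kept
        (2, 20, 200, "c"),
        (2, 20, 200, "c"),  # exact duplicate
    ],
)

# Empty table with the same columns (MySQL: CREATE TABLE ... LIKE ...).
conn.execute("CREATE TABLE tempTableName " + COLUMNS)
conn.execute(
    """INSERT INTO tempTableName (cellId, attributeId, entityRowId, value)
       SELECT DISTINCT cellId, attributeId, entityRowId, value
       FROM tableName"""
)

# Swap the de-duplicated copy in place of the original table.
conn.execute("DROP TABLE tableName")
conn.execute("ALTER TABLE tempTableName RENAME TO tableName")

rows = conn.execute("SELECT * FROM tableName ORDER BY cellId, value").fetchall()
print(rows)  # [(1, 10, 100, 'a'), (1, 10, 100, 'b'), (2, 20, 200, 'c')]
```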
Remove duplicate rows in MySQL
A really easy way to do this is to add a UNIQUE index on the three columns that define a duplicate. When you write the ALTER statement, include the IGNORE keyword, like so:
ALTER IGNORE TABLE jobs
ADD UNIQUE INDEX idx_name (site_id, title, company);
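This keeps one row from each duplicate group and drops the rest. As an added benefit, future INSERTs that are duplicates will error out. Note that the IGNORE clause of ALTER TABLE was deprecated in MySQL 5.6 and removed in MySQL 5.7, so this only works on MySQL 5.6 and earlier. As always, you may want to take a backup before running something like this.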
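Because ALTER IGNORE TABLE no longer exists on MySQL 5.7 and later, a common workaround is to create an empty copy that already has the unique index, copy the rows with INSERT IGNORE, and swap the tables. The following is a minimal sketch of that workaround using Python's sqlite3 module, where SQLite's `INSERT OR IGNORE` plays the role of MySQL's `INSERT IGNORE`. The `jobs_new` table and the sample rows are hypothetical. It also checks that later duplicate inserts are rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
SCHEMA = "(id INTEGER PRIMARY KEY, site_id INTEGER, title TEXT, company TEXT)"
conn.execute("CREATE TABLE jobs " + SCHEMA)
conn.executemany(
    "INSERT INTO jobs (site_id, title, company) VALUES (?, ?, ?)",
    [
        (1, "Engineer", "Acme"),  # id 1
        (1, "Engineer", "Acme"),  # id 2: duplicate on (site_id, title, company)
        (2, "Engineer", "Acme"),  # id 3
        (1, "Designer", "Acme"),  # id 4
    ],
)

# Empty copy of the table that already carries the unique index.
conn.execute("CREATE TABLE jobs_new " + SCHEMA)
conn.execute("CREATE UNIQUE INDEX idx_name ON jobs_new (site_id, title, company)")

# Rows that would violate the index are skipped; copying in id order
# means the lowest id of each group is the one that survives.
conn.execute("INSERT OR IGNORE INTO jobs_new SELECT * FROM jobs ORDER BY id")
conn.execute("DROP TABLE jobs")
conn.execute("ALTER TABLE jobs_new RENAME TO jobs")

ids = [r[0] for r in conn.execute("SELECT id FROM jobs ORDER BY id")]
print(ids)  # [1, 3, 4]

# Later duplicate inserts now fail instead of silently succeeding.
try:
    conn.execute("INSERT INTO jobs (site_id, title, company) VALUES (1, 'Engineer', 'Acme')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```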
MySQL delete duplicate records but keep latest
Imagine your table test contains the following data:
select id, email
from test;
ID EMAIL
---------------------- --------------------
1 aaa
2 bbb
3 ccc
4 bbb
5 ddd
6 eee
7 aaa
8 aaa
9 eee
So we need to find all repeated emails and delete every row for them except the one with the latest id.
In this case, aaa, bbb and eee are repeated, so we want to delete IDs 1, 7, 2 and 6.
To accomplish this, first we need to find all the repeated emails:
select email
from test
group by email
having count(*) > 1;
EMAIL
--------------------
aaa
bbb
eee
Then, from this dataset, we need to find the latest id for each one of these repeated emails:
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email;
LASTID EMAIL
---------------------- --------------------
8 aaa
4 bbb
9 eee
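The two lookup steps can be checked against the sample data with this sketch, which uses Python's sqlite3 module as a stand-in for MySQL. The ORDER BY clauses are an addition for deterministic output; they are not part of the original queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, email TEXT)")
# The sample data from the table above.
conn.executemany(
    "INSERT INTO test (id, email) VALUES (?, ?)",
    [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "bbb"), (5, "ddd"),
     (6, "eee"), (7, "aaa"), (8, "aaa"), (9, "eee")],
)

# Step 1: emails that occur more than once.
repeated = [r[0] for r in conn.execute(
    "SELECT email FROM test GROUP BY email HAVING COUNT(*) > 1 ORDER BY email")]

# Step 2: the latest (largest) id for each repeated email.
latest = conn.execute(
    """SELECT MAX(id) AS lastId, email FROM test
       WHERE email IN (SELECT email FROM test GROUP BY email HAVING COUNT(*) > 1)
       GROUP BY email ORDER BY email"""
).fetchall()

print(repeated)  # ['aaa', 'bbb', 'eee']
print(latest)    # [(8, 'aaa'), (4, 'bbb'), (9, 'eee')]
```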
Finally, we can delete every row for these emails whose id is smaller than LASTID. So the solution is:
delete test
from test
inner join (
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email
) duplic on duplic.email = test.email
where test.id < duplic.lastId;
I don't have MySQL installed on this machine right now, but it should work.
Update
The above delete works, but I found a more optimized version:
delete test
from test
inner join (
select max(id) as lastId, email
from test
group by email
having count(*) > 1) duplic on duplic.email = test.email
where test.id < duplic.lastId;
You can see that it deletes the oldest duplicates, i.e. 1, 7, 2, 6:
select * from test;
+----+-------+
| id | email |
+----+-------+
| 3 | ccc |
| 4 | bbb |
| 5 | ddd |
| 8 | aaa |
| 9 | eee |
+----+-------+
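Another version is the delete provided by Rene Limon. MySQL rejects a DELETE whose subquery selects from the same table it deletes from (error 1093), so the subquery has to be wrapped in a derived table:
delete from test
where id not in (
select keep_id from (
select max(id) as keep_id
from test
group by email) as keep_ids
);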
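The following minimal sketch checks that the JOIN-based delete and the NOT IN delete remove the same rows. It uses Python's sqlite3 module as a stand-in for MySQL and loads the sample data from above. SQLite has no DELETE ... JOIN, so the join version is translated into an IN subquery over the same join. The NOT IN version is written with its subquery wrapped in a derived table, which MySQL requires when the subquery reads the table being deleted from; SQLite accepts that form too.

```python
import sqlite3

SAMPLE = [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "bbb"), (5, "ddd"),
          (6, "eee"), (7, "aaa"), (8, "aaa"), (9, "eee")]

# SQLite translation of the optimized DELETE ... INNER JOIN: collect the ids
# smaller than their email's max id, then delete them.
JOIN_STYLE = """
DELETE FROM test WHERE id IN (
    SELECT t.id FROM test t
    INNER JOIN (SELECT MAX(id) AS lastId, email FROM test
                GROUP BY email HAVING COUNT(*) > 1) duplic
        ON duplic.email = t.email
    WHERE t.id < duplic.lastId)
"""

# NOT IN version: keep only the max id of every email group.
NOT_IN_STYLE = """
DELETE FROM test WHERE id NOT IN (
    SELECT keep_id FROM (SELECT MAX(id) AS keep_id FROM test GROUP BY email) AS keep_ids)
"""


def run(statement: str):
    """Load the sample data, run one delete, report (rows deleted, rows left)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO test (id, email) VALUES (?, ?)", SAMPLE)
    deleted = conn.execute(statement).rowcount
    remaining = conn.execute("SELECT id, email FROM test ORDER BY id").fetchall()
    return deleted, remaining


join_result = run(JOIN_STYLE)
not_in_result = run(NOT_IN_STYLE)
print(join_result)  # (4, [(3, 'ccc'), (4, 'bbb'), (5, 'ddd'), (8, 'aaa'), (9, 'eee')])
print(not_in_result == join_result)  # True
```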
How do I delete all the duplicate records in a MySQL table without temp tables
Add a unique index to your table (ALTER IGNORE TABLE only works on MySQL 5.6 and earlier):
ALTER IGNORE TABLE `TableA`
ADD UNIQUE INDEX (`member_id`, `quiz_num`, `question_num`, `answer_num`);
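Another way to do this:
Add a primary key (for example an AUTO_INCREMENT id column) to your table; then you can remove duplicates with the following query:
DELETE FROM member
WHERE id IN (SELECT *
FROM (SELECT MAX(id) FROM member
GROUP BY member_id, quiz_num, question_num, answer_num HAVING (COUNT(*) > 1)
) AS A
);
Each run deletes only the highest id of every duplicate group, so repeat it until it reports 0 rows affected. MAX(id) is used instead of a bare id: with ONLY_FULL_GROUP_BY (enabled by default since MySQL 5.7), a bare id is rejected, and without that mode it picks an arbitrary row per group.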
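Because this delete removes only one row per duplicate group per run, a group with three copies needs two passes. The following sketch demonstrates this with Python's sqlite3 module as a stand-in for MySQL. It repeats the statement until `rowcount` is 0. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member (id INTEGER PRIMARY KEY, member_id INTEGER, "
    "quiz_num INTEGER, question_num INTEGER, answer_num INTEGER)"
)
conn.executemany(
    "INSERT INTO member (member_id, quiz_num, question_num, answer_num) VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 1), (1, 1, 1, 1), (1, 1, 1, 1),  # ids 1-3: three copies
        (1, 1, 2, 1), (1, 1, 2, 1),                # ids 4-5: two copies
        (2, 1, 1, 1),                              # id 6: unique
    ],
)

DELETE_ONE_PER_GROUP = """
DELETE FROM member
WHERE id IN (SELECT * FROM (SELECT MAX(id) FROM member
                            GROUP BY member_id, quiz_num, question_num, answer_num
                            HAVING COUNT(*) > 1) AS A)
"""

# Each pass removes only the highest id of every group that still has
# copies, so keep running the statement until nothing is deleted.
passes = 0
while conn.execute(DELETE_ONE_PER_GROUP).rowcount > 0:
    passes += 1

remaining = [r[0] for r in conn.execute("SELECT id FROM member ORDER BY id")]
print(passes, remaining)  # 2 [1, 4, 6]
```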
Find and Delete Duplicate rows in MySQL
A self join on store_id, product_id and 'is older', in combination with DISTINCT, should give you all rows for which a newer version also exists:
> SHOW CREATE TABLE stock;
CREATE TABLE `stock` (
`stock_id` int(11) NOT NULL,
`product_id` int(11) DEFAULT NULL,
`store_id` int(11) DEFAULT NULL,
`stock_qty` int(11) DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`stock_id`)
)
> select * from stock;
+----------+------------+----------+-----------+---------------------+
| stock_id | product_id | store_id | stock_qty | updated_at |
+----------+------------+----------+-----------+---------------------+
| 1 | 1 | 1 | 1 | 2001-01-01 12:00:00 |
| 2 | 2 | 2 | 1 | 2001-01-01 12:00:00 |
| 3 | 2 | 2 | 1 | 2002-01-01 12:00:00 |
+----------+------------+----------+-----------+---------------------+
> SELECT DISTINCT s1.stock_id, s1.store_id, s1.product_id, s1.updated_at
FROM stock s1 JOIN stock s2
ON s1.store_id = s2.store_id
AND s1.product_id = s2.product_id
AND s1.updated_at < s2.updated_at;
+----------+----------+------------+---------------------+
| stock_id | store_id | product_id | updated_at |
+----------+----------+------------+---------------------+
| 2 | 2 | 2 | 2001-01-01 12:00:00 |
+----------+----------+------------+---------------------+
> DELETE stock FROM stock
JOIN stock s2 ON stock.store_id = s2.store_id
AND stock.product_id = s2.product_id
AND stock.updated_at < s2.updated_at;
Query OK, 1 row affected (0.02 sec)
> select * from stock;
+----------+------------+----------+-----------+---------------------+
| stock_id | product_id | store_id | stock_qty | updated_at |
+----------+------------+----------+-----------+---------------------+
| 1 | 1 | 1 | 1 | 2001-01-01 12:00:00 |
| 3 | 2 | 2 | 1 | 2002-01-01 12:00:00 |
+----------+------------+----------+-----------+---------------------+
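The same self-join delete can be reproduced with Python's sqlite3 module as a stand-in for MySQL, using the rows from the example above. SQLite has no DELETE ... JOIN, so the join selects the ids of older rows and an IN subquery deletes them. Timestamps are stored as ISO-formatted text, which compares in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, product_id INTEGER, "
    "store_id INTEGER, stock_qty INTEGER, updated_at TEXT)"
)
# Same rows as the example above.
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, "2001-01-01 12:00:00"),
        (2, 2, 2, 1, "2001-01-01 12:00:00"),
        (3, 2, 2, 1, "2002-01-01 12:00:00"),
    ],
)

# Delete every row for which a newer row with the same store and product exists.
deleted = conn.execute(
    """DELETE FROM stock WHERE stock_id IN (
           SELECT s1.stock_id FROM stock s1 JOIN stock s2
             ON s1.store_id = s2.store_id
            AND s1.product_id = s2.product_id
            AND s1.updated_at < s2.updated_at)"""
).rowcount

remaining = [r[0] for r in conn.execute("SELECT stock_id FROM stock ORDER BY stock_id")]
print(deleted, remaining)  # 1 [1, 3]
```

Duplicates that share the same updated_at, or have a NULL updated_at, are never "older" than each other, so this delete leaves them in place. A tie-breaker such as stock_id would be needed to remove them.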