Remove Duplicates Using Only a MySQL Query

How to remove duplicate MySQL records (but only leave one)

You can use this to keep the row with the lowest id value:

DELETE e1 FROM contacts e1, contacts e2 WHERE e1.id > e2.id AND e1.email = e2.email;


Or change > to < to keep the row with the highest id:

DELETE e1 FROM contacts e1, contacts e2 WHERE e1.id < e2.id AND e1.email = e2.email;


MySQL use DELETE FROM to remove duplicates rows

First, this is a very bad way of implementing this code. But I guess you get what you pay for.

Second, simply run the query as a select:

SELECT p1.*, p2.*
FROM Person p1 JOIN
Person p2
ON p1.Email = p2.Email AND p1.Id > p2.Id;

(Note that I've rewritten the logic as a JOIN. You should always use proper, explicit, standard, readable JOIN syntax, but the two methods are functionally equivalent.)

On your second example, the results of this query are:

table1 email        table1 id   table2 id
john@example.com    2           1
john@example.com    3           1
john@example.com    3           2

What is notable is that id = 1 is never in the second column -- and that is the column that determines which ids are deleted. In other words, all but the smallest id for each email get deleted because there is a smaller id.

This also hints at why this is a really bad solution. MySQL has to deal with two rows for id = 3. Perhaps it attempts to delete both. Perhaps it has to just deal with extra data. Either way, there is extra work. And the more rows with the same email in the data the more extra duplicates are created.
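To see the extra pairs concretely, here is a minimal sketch that runs the same self-join against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here, and the three sample rows are hypothetical data chosen to match the result shown above.

```python
import sqlite3

# Hypothetical sample data: three rows sharing one email.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (Id INTEGER PRIMARY KEY, Email TEXT)")
conn.executemany(
    "INSERT INTO Person (Id, Email) VALUES (?, ?)",
    [(1, "john@example.com"), (2, "john@example.com"), (3, "john@example.com")],
)

# Same self-join as the SELECT above: each row is paired with every
# row that has the same email and a smaller id.
pairs = conn.execute(
    """
    SELECT p1.Email, p1.Id, p2.Id
    FROM Person p1
    JOIN Person p2 ON p1.Email = p2.Email AND p1.Id > p2.Id
    ORDER BY p1.Id, p2.Id
    """
).fetchall()

# The p1 ids are the rows the DELETE would target; id 3 shows up twice.
ids_to_delete = sorted({p1_id for _, p1_id, _ in pairs})

for row in pairs:
    print(row)
print("ids to delete:", ids_to_delete)
```

With n rows sharing an email, this join yields n(n-1)/2 pairs while only n-1 rows actually need deleting, which is where the extra work comes from.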

An alternative method, such as:

delete p
from person p join
(select email, min(id) as min_id
from person p2
group by email
) p2
on p.email = p2.email and p.id > p2.min_id;

Does not have this problem and, in my opinion, the intent is clearer.
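As a rough check of that claim, the sketch below runs the join from the DELETE as a SELECT, again using SQLite through Python's sqlite3 as a stand-in for MySQL. The sample rows and the alias m are hypothetical. SQLite has no DELETE ... JOIN syntax, so the delete step uses an IN subquery over the same join instead of the MySQL statement itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO person (id, email) VALUES (?, ?)",
    [
        (1, "john@example.com"),
        (2, "john@example.com"),
        (3, "john@example.com"),
        (4, "jane@example.com"),
    ],
)

# The join from the DELETE above, run as a SELECT: each deletable row
# matches exactly one (email, min_id) row, so no id is repeated.
matches = conn.execute(
    """
    SELECT p.id, p.email, m.min_id
    FROM person p
    JOIN (SELECT email, MIN(id) AS min_id FROM person GROUP BY email) m
      ON p.email = m.email AND p.id > m.min_id
    ORDER BY p.id
    """
).fetchall()

# Stand-in for MySQL's DELETE ... JOIN: delete the ids matched by the join.
conn.execute(
    """
    DELETE FROM person
    WHERE id IN (
        SELECT p.id
        FROM person p
        JOIN (SELECT email, MIN(id) AS min_id FROM person GROUP BY email) m
          ON p.email = m.email AND p.id > m.min_id
    )
    """
)
remaining = conn.execute("SELECT id, email FROM person ORDER BY id").fetchall()

print(matches)
print(remaining)
```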

Delete all Duplicate Rows except for One in MySQL?


Editor warning: This solution is computationally inefficient and may bring down your connection for a large table.

NB - You need to do this first on a test copy of your table!

When I did it, I found that unless I also included AND n1.id <> n2.id, it deleted every row in the table.

  1. If you want to keep the row with the lowest id value:

    DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name;

  2. If you want to keep the row with the highest id value:

    DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name;

I used this method in MySQL 5.1

Not sure about other versions.


Update: Since people Googling for removing duplicates end up here

Although the OP's question is about DELETE, please be advised that using INSERT with SELECT DISTINCT is much faster. On a table with 8 million rows, the query below took 13 minutes, while the DELETE ran for more than 2 hours without completing.

INSERT INTO tempTableName(cellId,attributeId,entityRowId,value)
SELECT DISTINCT cellId,attributeId,entityRowId,value
FROM tableName;
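A small sketch of the copy-with-DISTINCT approach, using an in-memory SQLite database through Python's sqlite3 as a stand-in for MySQL. The column types and sample rows are assumptions; only the table and column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed for illustration.
conn.executescript(
    """
    CREATE TABLE tableName (
        cellId INTEGER, attributeId INTEGER, entityRowId INTEGER, value TEXT
    );
    CREATE TABLE tempTableName (
        cellId INTEGER, attributeId INTEGER, entityRowId INTEGER, value TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO tableName VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100, "a"),
        (1, 10, 100, "a"),  # exact duplicate
        (2, 20, 200, "b"),
        (1, 10, 100, "a"),  # exact duplicate
    ],
)

# Copy one instance of each distinct row into the new table.
conn.execute(
    """
    INSERT INTO tempTableName (cellId, attributeId, entityRowId, value)
    SELECT DISTINCT cellId, attributeId, entityRowId, value
    FROM tableName
    """
)
deduped = conn.execute(
    "SELECT cellId, attributeId, entityRowId, value FROM tempTableName ORDER BY cellId"
).fetchall()
print(deduped)
```

Note that DISTINCT compares only the listed columns, so a unique id column would have to be left out of the SELECT list for rows to count as duplicates.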

How to delete duplicates from one table, but keeping only one record?

I finally found the exact cause of the issue I was facing.
I followed the comment by @Malakiyasanjay.
You can find it here: How to keep only one row of a table, removing duplicate rows?

I tried the following. It worked for me, but the query took a long time to run on 30,000 rows:

delete from myTable
where id not in
(select min(id) as min from (select * from myTable) as x group by title)

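The problem was that MySQL would not let me reference 'myTable', the target of the DELETE, directly inside the subquery (this is error 1093, "You can't specify target table ... for update in FROM clause"). So I wrapped it as (select * from myTable) as x, and that solved it.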

I am sorry I can't explain this in more detail, because I am not very familiar with MySQL queries. But you should note that:

MySQL does not allow the direct use of the target table inside a subquery like the one you use with NOT IN, but you can overcome this limitation by enclosing the subquery inside another one.
(Please see @forpas's answer.)

But be aware that this takes a very long time and might cause a timeout error. I ran this query on a table with about 600,000 rows and it did not respond for several days. So I conclude that this approach only suits small tables.
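For a small table, the pattern can be tried out as follows. This is a sketch on an in-memory SQLite database through Python's sqlite3, standing in for MySQL, with hypothetical sample rows. SQLite does not have MySQL's target-table restriction, so the (select * from myTable) as x wrapper is not needed there; it is kept only to mirror the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO myTable (id, title) VALUES (?, ?)",
    [(1, "Alpha"), (2, "Alpha"), (3, "Beta"), (4, "Alpha"), (5, "Beta")],
)

# Keep the lowest id per title and delete every other row.
conn.execute(
    """
    DELETE FROM myTable
    WHERE id NOT IN (
        SELECT MIN(id) FROM (SELECT * FROM myTable) AS x GROUP BY title
    )
    """
)
remaining = conn.execute("SELECT id, title FROM myTable ORDER BY id").fetchall()
print(remaining)
```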

I hope this is helpful for everyone! :)

mysql removing duplicates with where clause

Do you need something like

DELETE t1
FROM dmf_product_match_unmatches t1
JOIN dmf_product_match_unmatches t2 USING (hid, flag)
WHERE flag = 1
AND t1.id < t2.id;

?

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=a5e9e95335573ebedd45cdcd577b5602

How to Remove Duplicate with precedence of a particular field in mysql select query?

I suggest selecting max(status) as 'Status'. Assuming the status values are strings such as 'Success' and 'Error', 'Success' sorts last alphabetically, so max(status) returns it whenever a group contains one. Then use group by job_logid, job_name to collapse the duplicates into one row per job: a job that succeeded is reported as 'Success', and the rows that recorded an error before or after that success are not selected.
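A minimal sketch of that idea, run on SQLite through Python's sqlite3 as a stand-in for MySQL. The table name job_log, the status value 'Error', and the sample rows are all assumptions made for illustration; only job_logid, job_name, status and 'Success' come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# job_log is a hypothetical table name.
conn.execute("CREATE TABLE job_log (job_logid INTEGER, job_name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO job_log VALUES (?, ?, ?)",
    [
        (1, "import", "Error"),
        (1, "import", "Success"),  # same job also succeeded
        (1, "import", "Error"),
        (2, "export", "Error"),    # this job never succeeded
    ],
)

# One row per job; MAX picks 'Success' over 'Error' because it sorts later.
rows = conn.execute(
    """
    SELECT job_logid, job_name, MAX(status) AS Status
    FROM job_log
    GROUP BY job_logid, job_name
    ORDER BY job_logid
    """
).fetchall()
print(rows)
```

This relies purely on string ordering, so it would stop working if a status value that sorts after 'Success' were introduced.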

How to remove duplicate rows from the output of multiple joins of MYSQL query?

You could add group by to have unique rows.

CREATE TABLE client(
cid INT,
client_name varchar(10) );

insert into client values
(1,'SAM'),
(2,'JOE'),
(1,'SAM');

CREATE TABLE purchase (
purchase_id INT,
cid INT,
product_id int );

insert into purchase values
(1,1,1),
(2,1,2),
(3,2,1);

CREATE TABLE product (
product_id INT,
product_name varchar(10) );

insert into product values
(1,'JAM'),
(2,'BREAD'),
(1,'JAM');


SELECT C.Cid, C.client_name, Pr.product_id, Pr.product_name
FROM client C
JOIN purchase Pu ON C.cid = Pu.cid
JOIN product Pr ON Pu.product_id = Pr.product_id
group by C.Cid, C.client_name, Pr.product_id, Pr.product_name;

Result:

cid  client_name  product_id  product_name
2    JOE          1           JAM
1    SAM          1           JAM
1    SAM          2           BREAD

I added some duplicate values to the sample data to show the difference.


