MySQL Remove Duplicates from Big Database Quick

MySQL remove duplicates from big database quick

I believe this will do it, using INSERT ... ON DUPLICATE KEY UPDATE together with IFNULL():

create table tmp like yourtable;

alter table tmp add unique (text1, text2);

insert into tmp select * from yourtable
on duplicate key update text3=ifnull(text3, values(text3));

rename table yourtable to deleteme, tmp to yourtable;

drop table deleteme;

This should be much faster than anything that requires GROUP BY, DISTINCT, a subquery, or even ORDER BY. It doesn't even need a filesort, which would kill performance on a large temporary table. It still requires a full scan of the original table, but there's no avoiding that.
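
Before running the rename table and drop table statements above, it may be worth a quick sanity check on tmp. This is only a sketch, assuming the same table and column names as in the answer (yourtable, tmp, text1, text2):

```sql
-- Compare row counts before and after deduplication.
SELECT
    (SELECT COUNT(*) FROM yourtable) AS original_rows,
    (SELECT COUNT(*) FROM tmp)       AS deduplicated_rows;

-- Should return no rows if the unique key collapsed every duplicate group.
SELECT text1, text2, COUNT(*) AS copies
FROM tmp
GROUP BY text1, text2
HAVING COUNT(*) > 1;
```

If the second query does return rows, the likely cause is NULL in text1 or text2: a MySQL unique index permits multiple NULLs, so those rows are never treated as duplicates of each other.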

Remove duplicates in large MySql table

This will populate NEW_TABLE with unique values, keeping the lowest id from each group of duplicates:

INSERT INTO NEW_TABLE
SELECT MIN(ot.id),
ot.city,
ot.post_code,
ot.short_ccode
FROM OLD_TABLE ot
GROUP BY ot.city, ot.post_code, ot.short_ccode;

If you want the highest id value per bunch:

INSERT INTO NEW_TABLE
SELECT MAX(ot.id),
ot.city,
ot.post_code,
ot.short_ccode
FROM OLD_TABLE ot
GROUP BY ot.city, ot.post_code, ot.short_ccode;
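
Both statements assume NEW_TABLE already exists with a matching column layout. A minimal sketch of the whole sequence might look like this, assuming OLD_TABLE has exactly the four columns shown (id, city, post_code, short_ccode) and that OLD_TABLE_backup is an unused name; neither assumption comes from the question:

```sql
-- Empty copy of the original, including its column definitions and indexes.
CREATE TABLE NEW_TABLE LIKE OLD_TABLE;

-- Naming the target columns avoids relying on column order.
INSERT INTO NEW_TABLE (id, city, post_code, short_ccode)
SELECT MIN(ot.id), ot.city, ot.post_code, ot.short_ccode
FROM OLD_TABLE ot
GROUP BY ot.city, ot.post_code, ot.short_ccode;

-- Swap the tables in a single atomic statement; the old data is kept as a backup.
RENAME TABLE OLD_TABLE TO OLD_TABLE_backup, NEW_TABLE TO OLD_TABLE;
```

Keeping the original under a backup name instead of dropping it right away leaves room to compare the two tables before the old one is removed.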

How to remove duplicate items in MySQL with a dataset of 20 million rows?

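You may try this:

ALTER IGNORE TABLE my_tablename ADD UNIQUE INDEX idx_name (text1, text2);

That is, add a UNIQUE index on the columns that define a duplicate. With IGNORE, the ALTER keeps one row per key and discards the rest.

A further advantage is that the index stays in place, so no new duplicate rows can be inserted in the future. Be aware that the IGNORE clause of ALTER TABLE was deprecated in MySQL 5.6.17 and removed in 5.7.4, so this only works on older servers.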
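
Since the ALTER discards rows permanently, it can help to look at what counts as a duplicate first. A small sketch, reusing the my_tablename, text1 and text2 names from above (the LIMIT of 20 is an arbitrary choice):

```sql
-- List the most frequently repeated (text1, text2) pairs.
SELECT text1, text2, COUNT(*) AS copies
FROM my_tablename
GROUP BY text1, text2
HAVING COUNT(*) > 1
ORDER BY copies DESC
LIMIT 20;
```

On a 20-million-row table this query is itself expensive, so it is probably best run once, or against a sample copy.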

Deleting duplicates from a large table

I think you can use this query to delete the duplicate records from the table:

ALTER IGNORE TABLE table_name ADD UNIQUE (location_id, datetime);

Test it on some sample data first, and only then run it on the real table.

Note: On version 5.5, it works on MyISAM but not InnoDB.
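
A workaround often reported for the InnoDB case on 5.5 is to switch the session to the older table-copy ALTER algorithm through the old_alter_table system variable. I have not checked this on every 5.5 release, so treat it as a sketch to try on a copy first:

```sql
-- Use the older table-copy ALTER algorithm for this session only.
SET SESSION old_alter_table = 1;

ALTER IGNORE TABLE table_name ADD UNIQUE (location_id, datetime);

-- Restore the default behaviour afterwards.
SET SESSION old_alter_table = 0;
```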

Removing duplicates MySQL database


ALTER IGNORE TABLE `table_name` ADD UNIQUE (`hash`);

Remove duplicate rows in MySQL

A really easy way to do this is to add a UNIQUE index on the 3 columns. When you write the ALTER statement, include the IGNORE keyword. Like so:

ALTER IGNORE TABLE jobs
ADD UNIQUE INDEX idx_name (site_id, title, company);

This will drop all the duplicate rows, keeping the first row found for each (site_id, title, company) combination. As an added benefit, future INSERTs that are duplicates will error out. As always, take a backup before running something like this.
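
On servers that reject ALTER IGNORE TABLE (the IGNORE clause is gone from newer MySQL releases), a similar result can be had by copying into a new table that already carries the unique index. A minimal sketch, where jobs_dedup and jobs_old are hypothetical names chosen for illustration:

```sql
-- Empty copy of jobs with the same columns and indexes.
CREATE TABLE jobs_dedup LIKE jobs;

-- Add the unique index before loading any data.
ALTER TABLE jobs_dedup
    ADD UNIQUE INDEX idx_name (site_id, title, company);

-- IGNORE downgrades duplicate-key errors to warnings, so one row per key
-- is kept (whichever is read first) and the later ones are skipped.
INSERT IGNORE INTO jobs_dedup
SELECT * FROM jobs;

-- Swap the tables atomically; jobs_old doubles as the backup.
RENAME TABLE jobs TO jobs_old, jobs_dedup TO jobs;
```

Which of several duplicate rows survives depends on the order in which MySQL reads jobs, so this suits cases where the duplicates are interchangeable.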


