Remove Duplicate Records Except the First Record in SQL


Use a CTE (I have several of these in production).

;WITH duplicateRemoval AS (
    SELECT
        [name]
        ,ROW_NUMBER() OVER (PARTITION BY [name] ORDER BY [name]) AS ranked
    FROM #myTable
    -- no ORDER BY here: SQL Server rejects ORDER BY inside a CTE unless TOP, OFFSET or FOR XML is also used
)
DELETE
FROM duplicateRemoval
WHERE ranked > 1;

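Explanation: The CTE reads all of your records and numbers the rows within each group of identical [name] values. The first row in a group gets 1, and each additional duplicate gets the next number, so deleting rows where ranked > 1 keeps one row per name. Replace the DELETE with a SELECT * to see the numbering before you delete anything.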
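To preview that numbering without a SQL Server instance, here is a minimal sketch in Python using the standard-library sqlite3 module. The table my_table and its sample rows are made up for illustration, and the sketch assumes the bundled SQLite is version 3.25 or newer, which window functions require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT)")
conn.executemany(
    "INSERT INTO my_table (name) VALUES (?)",
    [("ann",), ("bob",), ("ann",), ("ann",)],
)

# Same ROW_NUMBER() expression as the CTE, run as a plain SELECT.
ranked = conn.execute(
    """
    SELECT name,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS ranked
    FROM my_table
    ORDER BY name, ranked
    """
).fetchall()

# Rows the DELETE would remove: everything numbered above 1.
to_delete = [row for row in ranked if row[1] > 1]

print(ranked)
print(to_delete)
```

Each name keeps exactly one row numbered 1; only the rows numbered 2 and above are candidates for deletion.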

T-SQL: Deleting all duplicate rows but keeping one

You didn't say what version you were using, but in SQL Server 2005 and above, you can use a common table expression with the OVER clause. It goes something like this:

WITH cte AS (
    SELECT [foo], [bar],
        ROW_NUMBER() OVER (PARTITION BY foo, bar ORDER BY baz) AS [rn]
    FROM TABLE  -- TABLE is a placeholder; substitute your own table name
)
DELETE cte WHERE [rn] > 1;

Play around with it and see what you get.

(Edit: In an attempt to be helpful, someone edited the ORDER BY clause within the CTE. To be clear, you can order by anything you want here; it needn't be one of the columns returned by the CTE. In fact, a common use case is that "foo, bar" are the group identifier and "baz" is some sort of timestamp. To keep the latest row, you'd use ORDER BY baz DESC.)
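The "keep the latest" case can be sketched as follows, again in Python with sqlite3 rather than SQL Server. The table t, its id column, and the sample dates are assumptions for illustration. SQLite does not accept a CTE as the target of a DELETE, so this sketch matches the numbered rows back to the table by id; on SQL Server you would delete through the CTE as shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, foo TEXT, bar TEXT, baz TEXT)"
)
conn.executemany(
    "INSERT INTO t (id, foo, bar, baz) VALUES (?, ?, ?, ?)",
    [
        (1, "a", "x", "2024-01-01"),
        (2, "a", "x", "2024-03-01"),
        (3, "a", "x", "2024-02-01"),
        (4, "b", "y", "2024-01-15"),
    ],
)

# Number rows newest-first within each (foo, bar) group, then delete
# everything except the row numbered 1 (the latest baz).
conn.execute(
    """
    DELETE FROM t
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY foo, bar ORDER BY baz DESC
                   ) AS rn
            FROM t
        )
        WHERE rn > 1
    )
    """
)

kept = conn.execute("SELECT id, foo, bar, baz FROM t ORDER BY id").fetchall()
print(kept)
```

Switching DESC to ASC in the window's ORDER BY would keep the oldest row in each group instead.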

Delete all Duplicate Rows except for One in MySQL?

Editor warning: This solution is computationally inefficient and may bring down your connection for a large table.

NB - You need to do this first on a test copy of your table!

When I did it, I found that unless I also included AND n1.id <> n2.id, it deleted every row in the table.

  1. If you want to keep the row with the lowest id value:

    DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name;

  2. If you want to keep the row with the highest id value:

    DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name;

I used this method in MySQL 5.1

Not sure about other versions.
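To see why the id comparison matters, the sketch below runs the same self-join as a SELECT and lists which n1 rows each condition would hit. It uses Python's sqlite3 module with made-up sample rows; SQLite has no multi-table DELETE, so this only previews the rows and does not delete them. The helper matched_ids is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO names (id, name) VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (3, "ann"), (4, "ann")],
)


def matched_ids(condition):
    """Return the n1.id values that a self-join with this condition selects."""
    rows = conn.execute(
        "SELECT DISTINCT n1.id FROM names n1, names n2 "
        f"WHERE {condition} ORDER BY n1.id"
    ).fetchall()
    return [row[0] for row in rows]


# Name match only: every row pairs with itself, so all ids are selected.
name_only = matched_ids("n1.name = n2.name")

# With n1.id > n2.id: only rows that have an earlier duplicate are selected.
keep_lowest = matched_ids("n1.id > n2.id AND n1.name = n2.name")

# With n1.id < n2.id: only rows that have a later duplicate are selected.
keep_highest = matched_ids("n1.id < n2.id AND n1.name = n2.name")

print(name_only, keep_lowest, keep_highest)
```

Without any id condition the join matches each row to itself, which is consistent with the warning above that the whole table was deleted.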


Update: Since people Googling for how to remove duplicates end up here:
Although the OP's question is about DELETE, please be advised that using INSERT with SELECT DISTINCT is much faster. For a database with 8 million rows, the query below took 13 minutes, while the DELETE ran for more than 2 hours without completing.

INSERT INTO tempTableName(cellId,attributeId,entityRowId,value)
SELECT DISTINCT cellId,attributeId,entityRowId,value
FROM tableName;
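A small runnable sketch of the same idea, using Python's sqlite3 module. The column types and sample rows are assumptions; only the table and column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
columns = "cellId INTEGER, attributeId INTEGER, entityRowId INTEGER, value TEXT"
conn.execute(f"CREATE TABLE tableName ({columns})")
conn.execute(f"CREATE TABLE tempTableName ({columns})")
conn.executemany(
    "INSERT INTO tableName VALUES (?, ?, ?, ?)",
    [(1, 10, 100, "a"), (1, 10, 100, "a"), (2, 20, 200, "b"), (1, 10, 100, "a")],
)

# Copy one instance of each distinct row into the empty table.
conn.execute(
    """
    INSERT INTO tempTableName (cellId, attributeId, entityRowId, value)
    SELECT DISTINCT cellId, attributeId, entityRowId, value
    FROM tableName
    """
)

deduped = conn.execute(
    "SELECT cellId, attributeId, entityRowId, value "
    "FROM tempTableName ORDER BY cellId"
).fetchall()
print(deduped)
```

Note that DISTINCT compares every listed column, so this only collapses rows that are identical across all of them.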

Delete all duplicates except first one mysql

Assuming that the primary key of your table is id, you could phrase this as a delete/join query, like:

delete tm
from trademark_merge tm
inner join (
select serial_number, min(id) id
from trademark_merge
group by serial_number
) tm1 on tm.serial_number = tm1.serial_number and tm.id > tm1.id

Deleting all duplicate records except the first record in a MySQL table

ALTER IGNORE TABLE is no longer available in MySQL 5.7, as it caused replication issues.

https://dev.mysql.com/worklog/task/?id=7395

I would suggest adding a numeric auto-increment PK field to the table, to help with finding the first row. (Primary keys are critical for performance, so consider keeping it long term too.)

Step 1:

ALTER TABLE tbl ADD pkCol INT(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

Step 2:

Write a subquery to find the first Primary Key, grouping by all the fields that would make the row duplicate. Then you can delete all rows in the outer query that join on the fields but don't match the first PK.

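If a column that you are joining on may be NULL, you will need to wrap it in IFNULL, because NULL = NULL is never true in a join condition. GROUP BY does place NULLs together in a single group, so the IFNULL in the GROUP BY clause is there only to keep the grouping consistent with the join. Pick a replacement value (0 here) that cannot occur as real data in the column; otherwise rows holding a genuine 0 will be treated as duplicates of rows holding NULL.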
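The effect of NULL on the join can be checked with a short sketch in Python's sqlite3 module, which also supports IFNULL. The two-column table, its sample rows, and the helper duplicate_pairs are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (pkCol INTEGER PRIMARY KEY, col1 INTEGER, col2 TEXT)"
)
conn.executemany(
    "INSERT INTO tbl (pkCol, col1, col2) VALUES (?, ?, ?)",
    [(1, None, "x"), (2, None, "x"), (3, 5, "y")],
)


def duplicate_pairs(col1_match):
    """Count pairs of different rows that the given col1 comparison matches."""
    return conn.execute(
        "SELECT COUNT(*) FROM tbl t1 JOIN tbl t2 "
        f"ON {col1_match} AND t1.col2 = t2.col2 "
        "WHERE t1.pkCol <> t2.pkCol"
    ).fetchone()[0]


# NULL = NULL is not true, so rows 1 and 2 are never paired.
plain = duplicate_pairs("t1.col1 = t2.col1")

# With IFNULL both NULLs become 0, and the rows pair up in both directions.
wrapped = duplicate_pairs("IFNULL(t1.col1, 0) = IFNULL(t2.col1, 0)")

print(plain, wrapped)
```

With the plain comparison the duplicate NULL rows go undetected, so a DELETE built on that join would leave them in place.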

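DELETE t1
FROM tbl t1
JOIN (
    -- one row per duplicate group, carrying the lowest (first) primary key;
    -- only grouped expressions are selected, so ONLY_FULL_GROUP_BY accepts it
    SELECT IFNULL(col1,0) AS col1_key, col2, col3, MIN(pkCol) AS first_pkCol
    FROM tbl
    GROUP BY IFNULL(col1,0), col2, col3
) t2 ON (IFNULL(t1.col1,0) = t2.col1_key
    AND t1.col2 = t2.col2
    AND t1.col3 = t2.col3)
WHERE t1.pkCol <> t2.first_pkCol;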

How to remove all duplicate rows except one with latest date from MySQL table?

Try this:

DELETE FROM tableName
WHERE (a, b, c, d, dte) NOT IN (
    SELECT a, b, c, d, dte
    FROM (
        SELECT a, b, c, d, MAX(dte) AS dte
        FROM tableName
        GROUP BY a, b, c, d
    ) AS A
);
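A reduced sketch of this row-value NOT IN technique, using Python's sqlite3 module. It uses two grouping columns instead of four and made-up rows, and it omits the extra derived-table wrapper because SQLite does not need it. It assumes SQLite 3.15 or newer for row values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName (a INTEGER, b INTEGER, dte TEXT)")
conn.executemany(
    "INSERT INTO tableName (a, b, dte) VALUES (?, ?, ?)",
    [
        (1, 1, "2024-01-01"),
        (1, 1, "2024-02-01"),
        (2, 2, "2024-01-05"),
    ],
)

# Delete every row whose (a, b, dte) is not the latest date of its group.
conn.execute(
    """
    DELETE FROM tableName
    WHERE (a, b, dte) NOT IN (
        SELECT a, b, MAX(dte) FROM tableName GROUP BY a, b
    )
    """
)

remaining = conn.execute(
    "SELECT a, b, dte FROM tableName ORDER BY a"
).fetchall()
print(remaining)
```

One consequence of matching on the date itself: if several rows in a group share the latest date, all of them survive, so this approach does not remove exact duplicates.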


Delete all but one duplicate record

ANSI SQL Solution

Use group by in a subquery:

delete from my_tab where id not in 
(select min(id) from my_tab group by profile_id, visitor_id);

You need some kind of unique identifier (here, I'm using id).
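The statement above runs unchanged on SQLite, so it can be tried with a short Python sketch using the sqlite3 module. The column types and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_tab ("
    "id INTEGER PRIMARY KEY, profile_id INTEGER, visitor_id INTEGER)"
)
conn.executemany(
    "INSERT INTO my_tab (id, profile_id, visitor_id) VALUES (?, ?, ?)",
    [(1, 7, 70), (2, 7, 70), (3, 8, 80), (4, 7, 70), (5, 8, 81)],
)

# Keep the lowest id in each (profile_id, visitor_id) group, delete the rest.
conn.execute(
    """
    DELETE FROM my_tab
    WHERE id NOT IN (
        SELECT MIN(id) FROM my_tab GROUP BY profile_id, visitor_id
    )
    """
)

survivors = [row[0] for row in conn.execute("SELECT id FROM my_tab ORDER BY id")]
print(survivors)
```

Swapping MIN(id) for MAX(id) would keep the most recently inserted row of each group instead, assuming ids increase over time.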

MySQL Solution

As pointed out by @JamesPoulson, this fails in MySQL, which does not let a DELETE select from its own target table in a subquery. The workaround (as shown in James' answer) is to wrap the subquery in a derived table:

delete from `my_tab` where id not in
( SELECT * FROM
(select min(id) from `my_tab` group by profile_id, visitor_id) AS temp_tab
);

