Delete Duplicate Records from a SQL Table Without a Primary Key

Delete duplicate records from a SQL table without a primary key

Add a Primary Key (code below)

Run the correct delete (code below)

Consider WHY you woudln't want to keep that primary key.


Assuming MSSQL or compatible:

ALTER TABLE Employee ADD EmployeeID int identity(1,1) PRIMARY KEY;

WHILE EXISTS (SELECT COUNT(*) FROM Employee GROUP BY EmpID, EmpSSN HAVING COUNT(*) > 1)
BEGIN
DELETE FROM Employee WHERE EmployeeID IN
(
SELECT MIN(EmployeeID) as [DeleteID]
FROM Employee
GROUP BY EmpID, EmpSSN
HAVING COUNT(*) > 1
)
END

SQL Delete duplicate rows in the table without primary key on SQL Server

You just change your select to a delete, basically:

WITH tmp AS (
SELECT Code, ROW_NUMBER() OVER(PARTITION BY Code ORDER BY Code) AS ROWNUMBER
FROM CouponCode
)
DELETE tmp
WHERE ROWNUMBER > 1;

Delete duplicate records from a Postgresql table without a primary key?

Copy distinct data to work table fk_payment1_copy. The simplest way to do that is to use into

SELECT max(id),settlement_ref_no ... 
INTO fk_payment1_copy
from fk_payment1
GROUP BY settlement_ref_no ...

delete all rows from fk_payment1

delete from fk_payment1

and copy data from fk_payment1_copy table to fk_payment1

insert into fk_payment1
select id,settlement_ref_no ...
from fk_payment1_copy

Remove duplicate entries without primary key in SQL

Use this query

DELETE a FROM (
SELECT row_number() over(partition by EntityNo order by EntityNo) as RowNo
FROM Entity
) AS a WHERE RowNo > 1

How do I delete all duplicate rows without a primary key?

The generic SQL approach is to store the data, truncate the table, and reinsert the data. The syntax varies a bit by database, but here is an example:

create table TempTable as
select distinct * from MyTable;

truncate table MyTable;

insert into MyTable
select * from TempTable;

There are other approaches that don't require a temporary table, but they are even more database-dependent.

How to delete duplicate rows without unique identifier

I like @erwin-brandstetter 's solution, but wanted to show a solution with the USING keyword:

DELETE   FROM table_with_dups T1
USING table_with_dups T2
WHERE T1.ctid < T2.ctid -- delete the "older" ones
AND T1.name = T2.name -- list columns that define duplicates
AND T1.address = T2.address
AND T1.zipcode = T2.zipcode;

If you want to review the records before deleting them, then simply replace DELETE with SELECT * and USING with a comma ,, i.e.

SELECT * FROM table_with_dups T1
, table_with_dups T2
WHERE T1.ctid < T2.ctid -- select the "older" ones
AND T1.name = T2.name -- list columns that define duplicates
AND T1.address = T2.address
AND T1.zipcode = T2.zipcode;

Update: I tested some of the different solutions here for speed. If you don't expect many duplicates, then this solution performs much better than the ones that have a NOT IN (...) clause as those generate a lot of rows in the subquery.

If you rewrite the query to use IN (...) then it performs similarly to the solution presented here, but the SQL code becomes much less concise.

Update 2: If you have NULL values in one of the key columns (which you really shouldn't IMO), then you can use COALESCE() in the condition for that column, e.g.

  AND COALESCE(T1.col_with_nulls, '[NULL]') = COALESCE(T2.col_with_nulls, '[NULL]')

How to delete duplicate rows that are exactly the same in SQL Server

You could use an updatable CTE for this.

If you want to delete rows that are exact duplicates on the three columns (as shown in your sample data and explained in the question):

with cte as (
select row_number() over(partition by name, age, gender order by (select null)) rn
from people
)
delete from cte where rn > 1

If you want to delete duplicates on name only (as shown in your existing query):

with cte as (
select row_number() over(partition by name order by (select null)) rn
from people
)
delete from cte where rn > 1

How to delete a duplicate record without using primary key

You've got a couple of options here.

If they don't mind you dropping the table you could SELECT DISTINCT * from the table in question and then INSERT this into a new table, DROPping the old table as you go. This obviously won't be usable in a Production database but can be useful for where someone has mucked up a routine that's populating a data warehouse for example.

Alternatively you could effectively create a temporary index by using the row number as per this answer. That answer shows you how to use the built in row_number() function in SQL server but could be replicated in other RDBMS' (not sure which but MySQL certainly) by declaring a variable called @row_num or equivalent and then using it in your SELECT statement as:

SET @row_num=0;
SELECT @row_num:=@row_num+1 AS row_num, [REMAINING COLUMNS GO HERE]


Related Topics



Leave a reply



Submit