Delete duplicate records from a SQL table without a primary key
1. Add a primary key (code below).
2. Run the correct delete (code below).
3. Consider WHY you wouldn't want to keep that primary key.
Assuming MSSQL or compatible:
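-- Add a surrogate key so every row can be addressed individually
ALTER TABLE Employee ADD EmployeeID int identity(1,1) PRIMARY KEY;
-- Start a new batch so the loop below compiles against the new column
GO
-- Repeat while any (EmpID, EmpSSN) pair still occurs more than once
WHILE EXISTS (SELECT COUNT(*) FROM Employee GROUP BY EmpID, EmpSSN HAVING COUNT(*) > 1)
BEGIN
-- Each pass deletes the lowest EmployeeID in every duplicated group
DELETE FROM Employee WHERE EmployeeID IN
(
SELECT MIN(EmployeeID) as [DeleteID]
FROM Employee
GROUP BY EmpID, EmpSSN
HAVING COUNT(*) > 1
)
END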
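To try the same loop without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite, where every ordinary table already has an implicit rowid, so rowid stands in for the added EmployeeID column; the Name column and the sample rows are made up for illustration.

```python
import sqlite3


def dedupe_employees(conn: sqlite3.Connection) -> None:
    """Mirror of the WHILE loop above; SQLite's implicit rowid plays the
    role of the EmployeeID identity column (an assumption of this sketch)."""
    dup_check = """
        SELECT 1 FROM Employee
        GROUP BY EmpID, EmpSSN
        HAVING COUNT(*) > 1
        LIMIT 1
    """
    # Keep looping while at least one (EmpID, EmpSSN) group has duplicates.
    while conn.execute(dup_check).fetchone() is not None:
        # Each pass deletes the lowest rowid from every duplicated group.
        conn.execute("""
            DELETE FROM Employee WHERE rowid IN (
                SELECT MIN(rowid) FROM Employee
                GROUP BY EmpID, EmpSSN
                HAVING COUNT(*) > 1
            )
        """)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpID INTEGER, EmpSSN TEXT, Name TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [
        (1, "111", "Ann"), (1, "111", "Ann"), (1, "111", "Ann"),
        (2, "222", "Bob"),
        (3, "333", "Cy"), (3, "333", "Cy"),
    ],
)
dedupe_employees(conn)
rows = conn.execute("SELECT EmpID, EmpSSN FROM Employee ORDER BY EmpID").fetchall()
print(rows)  # [(1, '111'), (2, '222'), (3, '333')]
```

Because each pass removes only one row per group, a group with n copies needs n - 1 passes, and the copy with the highest rowid is the one that survives.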
SQL: Delete duplicate rows in a table without a primary key on SQL Server
You just change your SELECT to a DELETE, basically:
WITH tmp AS (
SELECT Code, ROW_NUMBER() OVER(PARTITION BY Code ORDER BY Code) AS ROWNUMBER
FROM CouponCode
)
DELETE tmp
WHERE ROWNUMBER > 1;
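SQLite cannot delete through a CTE the way SQL Server can, so a rough equivalent has the CTE expose each row's rowid alongside its ROW_NUMBER() and then deletes by rowid. This is a sketch assuming SQLite 3.25 or later (needed for window functions) run through Python's sqlite3 module; the sample codes are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CouponCode (Code TEXT)")
conn.executemany(
    "INSERT INTO CouponCode VALUES (?)",
    [("A1",), ("A1",), ("B2",), ("C3",), ("C3",), ("C3",)],
)

# The CTE numbers the rows within each Code group and carries the rowid;
# the DELETE then removes every row numbered 2 or higher.
conn.execute("""
    WITH tmp AS (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY Code ORDER BY rowid) AS rownumber
        FROM CouponCode
    )
    DELETE FROM CouponCode
    WHERE rowid IN (SELECT rid FROM tmp WHERE rownumber > 1)
""")
conn.commit()

codes = [code for (code,) in conn.execute("SELECT Code FROM CouponCode ORDER BY Code")]
print(codes)  # ['A1', 'B2', 'C3']
```

Ordering the window by rowid makes the kept copy deterministic (the first one inserted), whereas ORDER BY Code inside a PARTITION BY Code leaves the choice arbitrary.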
Delete duplicate records from a PostgreSQL table without a primary key?
Copy the distinct data to a work table, fk_payment1_copy. The simplest way to do that is to use SELECT INTO:
-- Alias max(id) so the work table gets a column named id
SELECT max(id) AS id, settlement_ref_no ...
INTO fk_payment1_copy
from fk_payment1
GROUP BY settlement_ref_no ...
Delete all rows from fk_payment1:
delete from fk_payment1
Then copy the data from the fk_payment1_copy table back to fk_payment1:
insert into fk_payment1
select id,settlement_ref_no ...
from fk_payment1_copy
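Here is a minimal sketch of the same three steps in SQLite via Python's sqlite3 module, wrapped in one explicit transaction so a failure part-way does not leave fk_payment1 empty. It assumes a hypothetical two-column fk_payment1 (the real column list is elided above), and since SQLite has no SELECT ... INTO, CREATE TABLE ... AS fills that role.

```python
import sqlite3

# isolation_level=None turns off implicit transactions so BEGIN/COMMIT are ours.
conn = sqlite3.connect(":memory:", isolation_level=None)
# Hypothetical two-column version of fk_payment1, for illustration only.
conn.execute("CREATE TABLE fk_payment1 (id INTEGER, settlement_ref_no TEXT)")
conn.executemany(
    "INSERT INTO fk_payment1 VALUES (?, ?)",
    [(1, "S-1"), (2, "S-1"), (3, "S-2"), (4, "S-3"), (5, "S-3")],
)

conn.execute("BEGIN")
try:
    # Step 1: keep one row per settlement_ref_no (the one with the highest id).
    conn.execute("""
        CREATE TABLE fk_payment1_copy AS
        SELECT MAX(id) AS id, settlement_ref_no
        FROM fk_payment1
        GROUP BY settlement_ref_no
    """)
    # Step 2: empty the original table.
    conn.execute("DELETE FROM fk_payment1")
    # Step 3: copy the de-duplicated rows back, then drop the work table.
    conn.execute("""
        INSERT INTO fk_payment1 (id, settlement_ref_no)
        SELECT id, settlement_ref_no FROM fk_payment1_copy
    """)
    conn.execute("DROP TABLE fk_payment1_copy")
    conn.execute("COMMIT")
except Exception:
    conn.execute("ROLLBACK")
    raise

rows = conn.execute("SELECT id, settlement_ref_no FROM fk_payment1 ORDER BY id").fetchall()
print(rows)  # [(2, 'S-1'), (3, 'S-2'), (5, 'S-3')]
```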
Remove duplicate entries without primary key in SQL
Use this query (SQL Server syntax):
DELETE a FROM (
SELECT row_number() over(partition by EntityNo order by EntityNo) as RowNo
FROM Entity
) AS a WHERE RowNo > 1
How do I delete all duplicate rows without a primary key?
The generic SQL approach is to store the data, truncate the table, and reinsert the data. The syntax varies a bit by database, but here is an example:
create table TempTable as
select distinct * from MyTable;
truncate table MyTable;
insert into MyTable
select * from TempTable;
There are other approaches that don't require a temporary table, but they are even more database-dependent.
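As one concrete instance of the store/empty/reinsert approach, here is a sketch for SQLite run through Python's sqlite3 module. SQLite has no TRUNCATE TABLE, so DELETE FROM without a WHERE clause takes its place; the helper name dedupe_exact, the temp table dedupe_tmp, and the sample data are all hypothetical.

```python
import sqlite3


def dedupe_exact(conn: sqlite3.Connection, table: str) -> None:
    """Keep one copy of each fully identical row in `table` (hypothetical helper).

    `table` is pasted into the SQL text, so it must be a trusted name.
    Expects a connection opened with isolation_level=None.
    """
    conn.execute("BEGIN")
    try:
        conn.execute(f'CREATE TEMP TABLE dedupe_tmp AS SELECT DISTINCT * FROM "{table}"')
        conn.execute(f'DELETE FROM "{table}"')  # SQLite has no TRUNCATE TABLE
        conn.execute(f'INSERT INTO "{table}" SELECT * FROM dedupe_tmp')
        conn.execute("DROP TABLE dedupe_tmp")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE MyTable (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "x"), (1, "x"), (2, "y"), (1, "z"), (None, "n"), (None, "n")],
)
dedupe_exact(conn, "MyTable")
rows = conn.execute("SELECT a, b FROM MyTable ORDER BY a, b").fetchall()
print(rows)  # [(None, 'n'), (1, 'x'), (1, 'z'), (2, 'y')]
```

SELECT DISTINCT treats NULLs in the same column as duplicates of each other, so the two (None, 'n') rows collapse to one here; a comparison written with = would not match them.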
How to delete duplicate rows without unique identifier
I like @erwin-brandstetter's solution, but wanted to show a solution with the USING keyword (PostgreSQL, since it relies on the ctid system column):
DELETE FROM table_with_dups T1
USING table_with_dups T2
WHERE T1.ctid < T2.ctid -- delete every row that has a duplicate with a higher ctid
AND T1.name = T2.name -- list columns that define duplicates
AND T1.address = T2.address
AND T1.zipcode = T2.zipcode;
If you want to review the records before deleting them, simply replace DELETE with SELECT * and USING with a comma, i.e.:
SELECT * FROM table_with_dups T1
, table_with_dups T2
WHERE T1.ctid < T2.ctid -- select the rows the DELETE would remove
AND T1.name = T2.name -- list columns that define duplicates
AND T1.address = T2.address
AND T1.zipcode = T2.zipcode;
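SQLite has neither ctid nor DELETE ... USING, but its implicit rowid serves a similar purpose and a correlated EXISTS expresses the same self-join. The following is a sketch under those assumptions, run through Python's sqlite3 module with invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_with_dups (name TEXT, address TEXT, zipcode TEXT)")
conn.executemany(
    "INSERT INTO table_with_dups VALUES (?, ?, ?)",
    [
        ("Ann", "1 Main St", "10001"),
        ("Ann", "1 Main St", "10001"),
        ("Bob", "2 Oak Ave", "20002"),
        ("Ann", "1 Main St", "10001"),
    ],
)

# Delete every row for which an identical row with a higher rowid exists,
# so only the highest-rowid copy in each group survives.
conn.execute("""
    DELETE FROM table_with_dups
    WHERE EXISTS (
        SELECT 1 FROM table_with_dups AS t2
        WHERE table_with_dups.rowid < t2.rowid
          AND table_with_dups.name = t2.name
          AND table_with_dups.address = t2.address
          AND table_with_dups.zipcode = t2.zipcode
    )
""")
conn.commit()

rows = conn.execute("SELECT rowid, name FROM table_with_dups ORDER BY rowid").fetchall()
print(rows)  # [(3, 'Bob'), (4, 'Ann')]
```

Swapping DELETE for SELECT * (keeping the same WHERE EXISTS clause) previews the rows that would be removed, much like the review query above.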
Update: I tested some of the different solutions here for speed. If you don't expect many duplicates, this solution performs much better than the ones that use a NOT IN (...) clause, since those generate a lot of rows in the subquery.
If you rewrite the query to use IN (...), it performs similarly to the solution presented here, but the SQL code becomes much less concise.
Update 2: If you have NULL values in one of the key columns (which you really shouldn't, IMO), a plain = comparison never matches them, because NULL = NULL is not true. In that case you can use COALESCE() in the condition for that column, e.g.:
AND COALESCE(T1.col_with_nulls, '[NULL]') = COALESCE(T2.col_with_nulls, '[NULL]')
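The sketch below shows the NULL problem concretely by counting how many rows the self-join would delete under three versions of the zipcode condition. It assumes SQLite via Python's sqlite3 module, where a IS b is the NULL-safe equality test (PostgreSQL spells it IS NOT DISTINCT FROM); the table, the count_deletable helper, and the data are hypothetical.

```python
import sqlite3


def count_deletable(conn: sqlite3.Connection, condition: str) -> int:
    # Count the rows the self-join delete would remove for a given
    # zipcode comparison (hypothetical helper for this illustration).
    sql = f"""
        SELECT COUNT(*) FROM table_with_dups
        WHERE EXISTS (
            SELECT 1 FROM table_with_dups AS t2
            WHERE table_with_dups.rowid < t2.rowid
              AND table_with_dups.name = t2.name
              AND {condition}
        )
    """
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_with_dups (name TEXT, zipcode TEXT)")
conn.executemany(
    "INSERT INTO table_with_dups VALUES (?, ?)",
    [("Ann", None), ("Ann", None), ("Bob", "20002"), ("Bob", "20002")],
)

plain = count_deletable(conn, "table_with_dups.zipcode = t2.zipcode")
coalesced = count_deletable(
    conn, "COALESCE(table_with_dups.zipcode, '[NULL]') = COALESCE(t2.zipcode, '[NULL]')"
)
null_safe = count_deletable(conn, "table_with_dups.zipcode IS t2.zipcode")
print(plain, coalesced, null_safe)  # 1 2 2
```

The plain comparison misses the duplicated Ann rows entirely. One caveat of the COALESCE trick: a real value equal to the sentinel '[NULL]' would be treated as a duplicate of a NULL, which the NULL-safe operators avoid.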
How to delete duplicate rows that are exactly the same in SQL Server
You could use an updatable CTE for this.
If you want to delete rows that are exact duplicates on the three columns (as shown in your sample data and explained in the question):
with cte as (
select row_number() over(partition by name, age, gender order by (select null)) rn
from people
)
delete from cte where rn > 1
If you want to delete duplicates on name only (as shown in your existing query):
with cte as (
select row_number() over(partition by name order by (select null)) rn
from people
)
delete from cte where rn > 1
How to delete a duplicate record without using primary key
You've got a couple of options here.
If they don't mind you dropping the table, you could SELECT DISTINCT * from the table in question, INSERT the result into a new table, and DROP the old table as you go. This obviously isn't suitable for a production database, but it can be useful when, for example, someone has mucked up a routine that populates a data warehouse.
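A minimal sketch of that copy, drop, and rename sequence, assuming SQLite via Python's sqlite3 module and a hypothetical warehouse staging table named staging_sales:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
# Hypothetical staging table that a faulty load filled with repeated rows.
conn.execute("CREATE TABLE staging_sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO staging_sales VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-01-01", 10.0), ("2024-01-02", 5.5)],
)

conn.execute("BEGIN")
try:
    # Copy one instance of each distinct row into a new table...
    conn.execute("CREATE TABLE staging_sales_new AS SELECT DISTINCT * FROM staging_sales")
    # ...drop the polluted original and give the copy its name.
    conn.execute("DROP TABLE staging_sales")
    conn.execute("ALTER TABLE staging_sales_new RENAME TO staging_sales")
    conn.execute("COMMIT")
except Exception:
    conn.execute("ROLLBACK")
    raise

rows = conn.execute("SELECT sale_date, amount FROM staging_sales ORDER BY sale_date").fetchall()
print(rows)  # [('2024-01-01', 10.0), ('2024-01-02', 5.5)]
```

In SQLite a table created with CREATE TABLE ... AS has no PRIMARY KEY or other constraints, and dropping the original also drops its indexes and triggers, which is part of why this suits a scratch table better than a production one.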
Alternatively, you could effectively create a temporary row identifier by using the row number, as per this answer. That answer shows how to use the built-in row_number() function in SQL Server, but the idea can be replicated in other RDBMSs (not sure which, but certainly MySQL; MySQL 8.0 and later also provide ROW_NUMBER() directly) by declaring a variable called @row_num or equivalent and then using it in your SELECT statement as:
SET @row_num = 0;
SELECT @row_num := @row_num + 1 AS row_num, [REMAINING COLUMNS GO HERE]
FROM [TABLE NAME GOES HERE];