Delete Duplicate Rows (Don't Delete All Duplicate)

Delete duplicate rows (don't delete all duplicate)

Try the steps described in this article: Removing duplicates from a PostgreSQL database.

It describes a situation when you have to deal with huge amount of data which isn't possible to group by.

A simple solution would be this:

DELETE FROM foo
WHERE id NOT IN (SELECT min(id) --or max(id)
FROM foo
GROUP BY hash)

Where hash is something that gets duplicated.

How to delete duplicate rows that are exactly the same in SQL Server

You could use an updatable CTE for this.

If you want to delete rows that are exact duplicates on the three columns (as shown in your sample data and explained in the question):

with cte as (
select row_number() over(partition by name, age, gender order by (select null)) rn
from people
)
delete from cte where rn > 1

If you want to delete duplicates on name only (as shown in your existing query):

with cte as (
select row_number() over(partition by name order by (select null)) rn
from people
)
delete from cte where rn > 1

How to delete duplicate rows without unique identifier

I like @erwin-brandstetter 's solution, but wanted to show a solution with the USING keyword:

DELETE   FROM table_with_dups T1
USING table_with_dups T2
WHERE T1.ctid < T2.ctid -- delete the "older" ones
AND T1.name = T2.name -- list columns that define duplicates
AND T1.address = T2.address
AND T1.zipcode = T2.zipcode;

If you want to review the records before deleting them, then simply replace DELETE with SELECT * and USING with a comma ,, i.e.

SELECT * FROM table_with_dups T1
, table_with_dups T2
WHERE T1.ctid < T2.ctid -- select the "older" ones
AND T1.name = T2.name -- list columns that define duplicates
AND T1.address = T2.address
AND T1.zipcode = T2.zipcode;

Update: I tested some of the different solutions here for speed. If you don't expect many duplicates, then this solution performs much better than the ones that have a NOT IN (...) clause as those generate a lot of rows in the subquery.

If you rewrite the query to use IN (...) then it performs similarly to the solution presented here, but the SQL code becomes much less concise.

Update 2: If you have NULL values in one of the key columns (which you really shouldn't IMO), then you can use COALESCE() in the condition for that column, e.g.

  AND COALESCE(T1.col_with_nulls, '[NULL]') = COALESCE(T2.col_with_nulls, '[NULL]')

T-SQL: Deleting all duplicate rows but keeping one

You didn't say what version you were using, but in SQL 2005 and above, you can use a common table expression with the OVER Clause. It goes a little something like this:

WITH cte AS (
SELECT[foo], [bar],
row_number() OVER(PARTITION BY foo, bar ORDER BY baz) AS [rn]
FROM TABLE
)
DELETE cte WHERE [rn] > 1

Play around with it and see what you get.

(Edit: In an attempt to be helpful, someone edited the ORDER BY clause within the CTE. To be clear, you can order by anything you want here, it needn't be one of the columns returned by the cte. In fact, a common use-case here is that "foo, bar" are the group identifier and "baz" is some sort of time stamp. In order to keep the latest, you'd do ORDER BY baz desc)

Removing duplicate rows from table in Oracle

Use the rowid pseudocolumn.

DELETE FROM your_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM your_table
GROUP BY column1, column2, column3);

Where column1, column2, and column3 make up the identifying key for each record. You might list all your columns.

How can I remove duplicate rows?

Assuming no nulls, you GROUP BY the unique columns, and SELECT the MIN (or MAX) RowId as the row to keep. Then, just delete everything that didn't have a row id:

DELETE FROM MyTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, Col1, Col2, Col3
FROM MyTable
GROUP BY Col1, Col2, Col3
) as KeepRows ON
MyTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL

In case you have a GUID instead of an integer, you can replace

MIN(RowId)

with

CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))

How do I delete all duplicate rows without a primary key?

The generic SQL approach is to store the data, truncate the table, and reinsert the data. The syntax varies a bit by database, but here is an example:

create table TempTable as
select distinct * from MyTable;

truncate table MyTable;

insert into MyTable
select * from TempTable;

There are other approaches that don't require a temporary table, but they are even more database-dependent.

How to delete rows that have duplicate column combination

You can try to use delete JOIN

DELETE t1
FROM [Table] t1
INNER JOIN (
SELECT Column1,
Column2,
max(RefDate) as MaxDate
FROM [Table]
GROUP BY Column1, Column2
) t2
ON t1.Column1 = t2.Column1
AND t1.Column2 = t2.Column2
AND t1.RefDate <> t2.MaxDate

or use EXISTS subquery.

DELETE t1
FROM [Table] t1
WHERE EXISTS (
SELECT 1
FROM [Table] t2
WHERE t1.Column1 = t2.Column1
AND t1.Column2 = t2.Column2
HAVING max(t2.RefDate) <> t1.RefDate
)

sqlfiddle



Related Topics



Leave a reply



Submit