Delete duplicate rows (don't delete all duplicate)
Try the steps described in this article: Removing duplicates from a PostgreSQL database.
It describes a situation when you have to deal with huge amount of data which isn't possible to group by
.
A simple solution would be this:
DELETE FROM foo
WHERE id NOT IN (SELECT min(id) --or max(id)
FROM foo
GROUP BY hash)
Where hash
is something that gets duplicated.
How to delete duplicate rows that are exactly the same in SQL Server
You could use an updatable CTE for this.
If you want to delete rows that are exact duplicates on the three columns (as shown in your sample data and explained in the question):
with cte as (
select row_number() over(partition by name, age, gender order by (select null)) rn
from people
)
delete from cte where rn > 1
If you want to delete duplicates on name
only (as shown in your existing query):
with cte as (
select row_number() over(partition by name order by (select null)) rn
from people
)
delete from cte where rn > 1
How to delete duplicate rows without unique identifier
I like @erwin-brandstetter 's solution, but wanted to show a solution with the USING
keyword:
DELETE FROM table_with_dups T1
USING table_with_dups T2
WHERE T1.ctid < T2.ctid -- delete the "older" ones
AND T1.name = T2.name -- list columns that define duplicates
AND T1.address = T2.address
AND T1.zipcode = T2.zipcode;
If you want to review the records before deleting them, then simply replace DELETE
with SELECT *
and USING
with a comma ,
, i.e.
SELECT * FROM table_with_dups T1
, table_with_dups T2
WHERE T1.ctid < T2.ctid -- select the "older" ones
AND T1.name = T2.name -- list columns that define duplicates
AND T1.address = T2.address
AND T1.zipcode = T2.zipcode;
Update: I tested some of the different solutions here for speed. If you don't expect many duplicates, then this solution performs much better than the ones that have a NOT IN (...)
clause as those generate a lot of rows in the subquery.
If you rewrite the query to use IN (...)
then it performs similarly to the solution presented here, but the SQL code becomes much less concise.
Update 2: If you have NULL
values in one of the key columns (which you really shouldn't IMO), then you can use COALESCE()
in the condition for that column, e.g.
AND COALESCE(T1.col_with_nulls, '[NULL]') = COALESCE(T2.col_with_nulls, '[NULL]')
T-SQL: Deleting all duplicate rows but keeping one
You didn't say what version you were using, but in SQL 2005 and above, you can use a common table expression with the OVER Clause. It goes a little something like this:
WITH cte AS (
SELECT[foo], [bar],
row_number() OVER(PARTITION BY foo, bar ORDER BY baz) AS [rn]
FROM TABLE
)
DELETE cte WHERE [rn] > 1
Play around with it and see what you get.
(Edit: In an attempt to be helpful, someone edited the ORDER BY
clause within the CTE. To be clear, you can order by anything you want here, it needn't be one of the columns returned by the cte. In fact, a common use-case here is that "foo, bar" are the group identifier and "baz" is some sort of time stamp. In order to keep the latest, you'd do ORDER BY baz desc
)
Removing duplicate rows from table in Oracle
Use the rowid
pseudocolumn.
DELETE FROM your_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM your_table
GROUP BY column1, column2, column3);
Where column1
, column2
, and column3
make up the identifying key for each record. You might list all your columns.
How can I remove duplicate rows?
Assuming no nulls, you GROUP BY
the unique columns, and SELECT
the MIN (or MAX)
RowId as the row to keep. Then, just delete everything that didn't have a row id:
DELETE FROM MyTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, Col1, Col2, Col3
FROM MyTable
GROUP BY Col1, Col2, Col3
) as KeepRows ON
MyTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
In case you have a GUID instead of an integer, you can replace
MIN(RowId)
with
CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))
How do I delete all duplicate rows without a primary key?
The generic SQL approach is to store the data, truncate the table, and reinsert the data. The syntax varies a bit by database, but here is an example:
create table TempTable as
select distinct * from MyTable;
truncate table MyTable;
insert into MyTable
select * from TempTable;
There are other approaches that don't require a temporary table, but they are even more database-dependent.
How to delete rows that have duplicate column combination
You can try to use delete JOIN
DELETE t1
FROM [Table] t1
INNER JOIN (
SELECT Column1,
Column2,
max(RefDate) as MaxDate
FROM [Table]
GROUP BY Column1, Column2
) t2
ON t1.Column1 = t2.Column1
AND t1.Column2 = t2.Column2
AND t1.RefDate <> t2.MaxDate
or use EXISTS
subquery.
DELETE t1
FROM [Table] t1
WHERE EXISTS (
SELECT 1
FROM [Table] t2
WHERE t1.Column1 = t2.Column1
AND t1.Column2 = t2.Column2
HAVING max(t2.RefDate) <> t1.RefDate
)
sqlfiddle
Related Topics
Does an Empty SQL Table Have a Superkey? Does Every SQL Table Have One
Exporting Data in SQL Server as Insert Into
When to Use Common Table Expression (Cte)
Single Fixed Table with Multiple Columns VS Flexible Abstract Tables
SQL Server: Extract Table Meta-Data (Description, Fields and Their Data Types)
Deleting Duplicate Rows from SQLite Database
Return 0 If Field Is Null in MySQL
How to Use Oracle Order by and Rownum Correctly
Group by Month and Year in MySQL
Delete Duplicate Records from a SQL Table Without a Primary Key
What's the Best Way to Select the Minimum Value from Several Columns
Calculating Number of Full Months Between Two Dates in SQL
T-SQL Split String Based on Delimiter
SQL Server:Transpose Rows to Columns
Convert Excel Date Serial Number to Regular Date