Delete Duplicate Rows in SQL
In SQL Server, you can delete the duplicates through a common table expression (CTE):
WITH Cte AS (
    SELECT *,
           Rn = ROW_NUMBER() OVER (PARTITION BY PersonAliasId, StartDateTime, GroupId
                                   ORDER BY ModifiedDateTime DESC)
    FROM Attendance
)
DELETE FROM Cte WHERE Rn > 1;
This keeps the most recently modified row for each (PersonAliasId, StartDateTime, GroupId) combination and deletes the rest.
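A runnable sketch of the same idea with made-up sample data. SQLite (3.25 or later, for window functions) does not allow a DELETE through a CTE, so this variant assumes the table has a unique Id column (hypothetical here) and deletes by key; the ROW_NUMBER() logic is unchanged.

```sql
-- Hypothetical sample table; Id is an assumed unique key.
CREATE TABLE Attendance (
    Id               INTEGER PRIMARY KEY,
    PersonAliasId    INTEGER,
    StartDateTime    TEXT,
    GroupId          INTEGER,
    ModifiedDateTime TEXT
);

INSERT INTO Attendance (Id, PersonAliasId, StartDateTime, GroupId, ModifiedDateTime) VALUES
    (1, 10, '2024-01-01 09:00', 5, '2024-01-01 10:00'),
    (2, 10, '2024-01-01 09:00', 5, '2024-01-02 10:00'),  -- newest copy, kept
    (3, 10, '2024-01-01 09:00', 5, '2024-01-01 12:00'),
    (4, 11, '2024-01-01 09:00', 5, '2024-01-01 10:00');  -- unique, kept

-- Number each copy newest first, then delete every copy after the first.
DELETE FROM Attendance
WHERE Id IN (
    SELECT Id
    FROM (
        SELECT Id,
               ROW_NUMBER() OVER (PARTITION BY PersonAliasId, StartDateTime, GroupId
                                  ORDER BY ModifiedDateTime DESC) AS Rn
        FROM Attendance
    ) AS numbered
    WHERE Rn > 1
);

SELECT Id FROM Attendance ORDER BY Id;  -- remaining: 2, 4
```

Storing the timestamps as ISO-8601 text keeps string ordering identical to chronological ordering, which is what ORDER BY ModifiedDateTime DESC relies on here.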
How can I remove duplicate rows?
Assuming no NULLs, GROUP BY the columns that define a duplicate and SELECT the MIN (or MAX) RowId in each group as the row to keep. Then delete every row whose RowId was not selected:
-- T-SQL: name the target table before FROM when deleting through a join
DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
    SELECT MIN(RowId) AS RowId, Col1, Col2, Col3
    FROM MyTable
    GROUP BY Col1, Col2, Col3
) AS KeepRows ON
    MyTable.RowId = KeepRows.RowId
WHERE
    KeepRows.RowId IS NULL
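Not every engine accepts a join in DELETE (SQLite does not, for one). A sketch of the same keep-the-MIN idea written with NOT IN, using a hypothetical table and data. Because RowId is a non-null key, the subquery never returns NULL, which would otherwise make NOT IN match nothing.

```sql
-- Hypothetical table: RowId is the unique key, Col1..Col3 define a duplicate.
CREATE TABLE MyTable (
    RowId INTEGER PRIMARY KEY,
    Col1  TEXT,
    Col2  TEXT,
    Col3  TEXT
);

INSERT INTO MyTable (RowId, Col1, Col2, Col3) VALUES
    (1, 'a', 'x', 'p'),
    (2, 'a', 'x', 'p'),  -- duplicate of 1
    (3, 'b', 'y', 'q'),
    (4, 'a', 'x', 'p');  -- duplicate of 1

-- Keep the lowest RowId in each (Col1, Col2, Col3) group; delete the rest.
DELETE FROM MyTable
WHERE RowId NOT IN (
    SELECT MIN(RowId)
    FROM MyTable
    GROUP BY Col1, Col2, Col3
);
```

After the DELETE, rows 1 and 3 remain.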
If the key is a GUID instead of an integer, replace MIN(RowId) with:
CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))
Which way is faster to delete duplicate rows in SQL?
Not having a primary key on your table is generally a bad idea. Here is one way to delete duplicates; which record is retained for each combination of the 23 columns is arbitrary:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, ..., col22, col23
ORDER BY (SELECT NULL)) rn
FROM yourTable
)
DELETE
FROM cte
WHERE rn > 1;
delete duplicate records with in
To delete the older duplicates (keeping the row with the highest id in each (a, b) group), you can use:
delete from foo
where foo.id < (select max(foo2.id)
from foo foo2
where foo2.a = foo.a and foo2.b = foo.b
);
An index on (a, b, id) helps performance, because each correlated MAX(id) lookup can then be answered from the index.
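A runnable sketch of the correlated-subquery version, including the suggested index, with hypothetical data (SQLite syntax). One consequence of the = comparisons: rows where a or b is NULL never match foo2, so they are never deleted.

```sql
-- Hypothetical table: id is unique, (a, b) defines a duplicate.
CREATE TABLE foo (
    id INTEGER PRIMARY KEY,
    a  INTEGER,
    b  INTEGER
);

-- Supports the per-row MAX(id) lookup for a given (a, b).
CREATE INDEX foo_a_b_id ON foo (a, b, id);

INSERT INTO foo (id, a, b) VALUES
    (1, 1, 1),
    (2, 1, 1),
    (3, 1, 1),  -- highest id for (1, 1), kept
    (4, 2, 1);  -- unique, kept

-- Delete every row whose id is below the highest id of its (a, b) group.
DELETE FROM foo
WHERE foo.id < (SELECT MAX(foo2.id)
                FROM foo foo2
                WHERE foo2.a = foo.a AND foo2.b = foo.b);
```

The row holding the group maximum is never deleted, so the result does not depend on the order in which rows are processed.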
You can also phrase this as a join (DELETE ... USING is PostgreSQL syntax):
delete from foo
using (select a, b, max(id) as max_id
       from foo
       group by a, b
      ) ab
where foo.a = ab.a and foo.b = ab.b and foo.id < ab.max_id;
How to delete duplicate rows that are exactly the same in SQL Server
You could use an updatable CTE for this.
If you want to delete rows that are exact duplicates on the three columns (as shown in your sample data and explained in the question):
with cte as (
select row_number() over(partition by name, age, gender order by (select null)) rn
from people
)
delete from cte where rn > 1
If you want to delete duplicates on name only (as shown in your existing query):
with cte as (
select row_number() over(partition by name order by (select null)) rn
from people
)
delete from cte where rn > 1
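SQLite cannot delete through a CTE, but every ordinary SQLite table has a hidden rowid that can stand in for the missing key. A sketch with hypothetical sample data that removes exact duplicates on all three columns:

```sql
-- Hypothetical keyless table containing exact duplicate rows.
CREATE TABLE people (
    name   TEXT,
    age    INTEGER,
    gender TEXT
);

INSERT INTO people (name, age, gender) VALUES
    ('Ann', 30, 'F'),
    ('Ann', 30, 'F'),   -- exact duplicate
    ('Bob', 25, 'M'),
    ('Ann', 31, 'F');   -- same name, different age: not a duplicate

-- Number the physical copies of each (name, age, gender) by rowid
-- and delete all but the first.
DELETE FROM people
WHERE rowid IN (
    SELECT rid
    FROM (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY name, age, gender) AS rn
        FROM people
    ) AS numbered
    WHERE rn > 1
);
```

Because the rows are identical, it does not matter which copy survives, so the window has no ORDER BY.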
How to delete duplicate records in SQL?
You can delete duplicates using, for example, ROW_NUMBER():
with duplicates as
(
select
*
,ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, age ORDER BY FirstName) AS number
from yourTable
)
delete
from duplicates
where number > 1
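Every row where number is greater than 1 is a duplicate and is deleted. Because FirstName is also a PARTITION BY column, ORDER BY FirstName does not decide which copy survives; order by another column, such as a key or a timestamp, if that matters.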
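Before running the DELETE, it can help to run the same CTE as a SELECT and inspect which rows would go. A SQLite sketch with hypothetical data; the Id column is an assumption, used here so the ORDER BY picks a deterministic survivor.

```sql
-- Hypothetical table; Id is an assumed unique key used as a tie-breaker.
CREATE TABLE yourTable (
    Id        INTEGER PRIMARY KEY,
    FirstName TEXT,
    LastName  TEXT,
    age       INTEGER
);

INSERT INTO yourTable (Id, FirstName, LastName, age) VALUES
    (1, 'Jane', 'Doe', 40),
    (2, 'Jane', 'Doe', 40),
    (3, 'John', 'Doe', 40);

-- Dry run: list the rows the DELETE would remove, without deleting anything.
WITH duplicates AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, age
                              ORDER BY Id) AS number
    FROM yourTable
)
SELECT Id, FirstName, LastName, age
FROM duplicates
WHERE number > 1;   -- returns the row with Id 2
```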