How to Delete Duplicate Rows with SQL

Delete Duplicate Rows in SQL

You can use a Common Table Expression to delete the duplicates:

WITH Cte AS(
    SELECT *,
        Rn = ROW_NUMBER() OVER(PARTITION BY PersonAliasId, StartDateTime, GroupId 
                                ORDER BY ModifiedDateTime DESC)
    FROM Attendance
)
DELETE FROM Cte WHERE Rn > 1;

This will keep the most recent record for each PersonAliasId - StartDateTime - GroupId combination.
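To try the keep-the-newest logic without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is a stand-in, not the statement above: SQLite cannot delete through a CTE, so the sketch filters on SQLite's implicit rowid instead. The sample rows are invented, and window functions are assumed to be available (SQLite 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Attendance ("
    "PersonAliasId INTEGER, StartDateTime TEXT, GroupId INTEGER, ModifiedDateTime TEXT)"
)
# Invented sample data: person 1 has two rows for the same start time and group.
conn.executemany(
    "INSERT INTO Attendance VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-07", 10, "2024-01-07 09:00"),
        (1, "2024-01-07", 10, "2024-01-08 12:00"),  # newest of the duplicate pair
        (2, "2024-01-07", 10, "2024-01-07 09:30"),
    ],
)

# Number the rows in each group, newest first, then delete everything ranked after 1.
conn.execute(
    """
    DELETE FROM Attendance
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY PersonAliasId, StartDateTime, GroupId
                       ORDER BY ModifiedDateTime DESC
                   ) AS Rn
            FROM Attendance
        )
        WHERE Rn > 1
    )
    """
)

remaining = conn.execute(
    "SELECT PersonAliasId, ModifiedDateTime FROM Attendance ORDER BY PersonAliasId"
).fetchall()
print(remaining)  # [(1, '2024-01-08 12:00'), (2, '2024-01-07 09:30')]
```

Only the older of the two rows for person 1 is removed; the single row for person 2 is untouched.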

How can I remove duplicate rows?

Assuming no nulls, GROUP BY the columns that should be unique and SELECT the MIN (or MAX) RowId of each group as the row to keep. Then delete every row whose RowId does not match one of the kept rows:

DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
   SELECT MIN(RowId) as RowId, Col1, Col2, Col3 
   FROM MyTable 
   GROUP BY Col1, Col2, Col3
) as KeepRows ON
   MyTable.RowId = KeepRows.RowId
WHERE
   KeepRows.RowId IS NULL

In case you have a GUID instead of an integer, you can replace

MIN(RowId)

with

CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))
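The keep-the-minimum-RowId idea can be checked on a small table. This is a hedged sketch with Python's built-in sqlite3 module and invented rows; SQLite has no DELETE with a join, so the outer join with an IS NULL test is swapped for an equivalent NOT IN over the kept RowIds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (RowId INTEGER PRIMARY KEY, Col1 TEXT, Col2 TEXT, Col3 TEXT)"
)
# Invented sample data: rows 1, 2 and 4 share the same Col1-Col3 values.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (1, "a", "b", "c"),
        (2, "a", "b", "c"),
        (3, "x", "y", "z"),
        (4, "a", "b", "c"),
    ],
)

# Keep the lowest RowId of each (Col1, Col2, Col3) group and delete the rest.
cursor = conn.execute(
    """
    DELETE FROM MyTable
    WHERE RowId NOT IN (
        SELECT MIN(RowId) FROM MyTable GROUP BY Col1, Col2, Col3
    )
    """
)
deleted = cursor.rowcount

kept_ids = [row[0] for row in conn.execute("SELECT RowId FROM MyTable ORDER BY RowId")]
print(deleted, kept_ids)  # 2 [1, 3]
```

Swapping MIN for MAX would keep row 4 instead of row 1 for the duplicated group.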

Which way is faster to delete duplicate rows in sql?

Not having a primary key on your table is generally a bad idea. Here is one way to delete the duplicates; which record is retained within each group of rows matching on all 23 columns is arbitrary:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, ..., col22, col23
                                 ORDER BY (SELECT NULL)) rn
    FROM yourTable
)

DELETE
FROM cte
WHERE rn > 1;

delete duplicate records with in

If you want to delete older duplicate values, you can use:

delete from foo
    where foo.id < (select max(foo2.id)
                    from foo foo2
                    where foo2.a = foo.a and foo2.b = foo.b
                   );

Note that an index on (a, b, id) would help performance.
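The correlated-subquery delete and the suggested index can be exercised as follows. This is a minimal sketch using Python's built-in sqlite3 module; the index name and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
# Hypothetical index name; the column order (a, b, id) follows the note above.
conn.execute("CREATE INDEX foo_a_b_id ON foo (a, b, id)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(1, "x", "y"), (2, "x", "y"), (3, "x", "z"), (4, "x", "y")],
)

# Delete every row that has a newer (higher id) row with the same (a, b).
conn.execute(
    """
    DELETE FROM foo
    WHERE foo.id < (SELECT MAX(foo2.id)
                    FROM foo foo2
                    WHERE foo2.a = foo.a AND foo2.b = foo.b)
    """
)

rows = conn.execute("SELECT id, a, b FROM foo ORDER BY id").fetchall()
print(rows)  # [(3, 'x', 'z'), (4, 'x', 'y')]
```

Rows with a NULL in a or b never satisfy the equality comparison, so this form leaves them in place.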

You can also phrase this as a join (shown here with PostgreSQL's DELETE ... USING syntax):

delete from foo
    using (select a, b, max(id) as max_id
           from foo
           group by a, b
          ) ab
    where foo.a = ab.a and foo.b = ab.b and foo.id < ab.max_id;

How to delete duplicate rows that are exactly the same in SQL Server

You could use an updatable CTE for this.

If you want to delete rows that are exact duplicates on the three columns (as shown in your sample data and explained in the question):

with cte as (
    select row_number() over(partition by name, age, gender order by (select null)) rn
    from people
)
delete from cte where rn > 1

If you want to delete duplicates on name only (as shown in your existing query):

with cte as (
    select row_number() over(partition by name order by (select null)) rn
    from people
)
delete from cte where rn > 1

How to delete duplicate records in SQL?

You can delete duplicates using, for example, ROW_NUMBER():

with duplicates as
(
    select
        *
        ,ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, age ORDER BY FirstName) AS number
    from yourTable
)
delete 
from duplicates
where number > 1

Each row where number is bigger than 1 is a duplicate.
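Before running a delete like this, it can help to list the rows it would remove. The sketch below does that with Python's built-in sqlite3 module on invented sample data, assuming an SQLite build with window functions (3.25 or later); it only selects, so nothing is deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (FirstName TEXT, LastName TEXT, age INTEGER)")
# Invented sample data: three identical rows for Ann Lee, one row for Bob Ray.
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [("Ann", "Lee", 30), ("Ann", "Lee", 30), ("Bob", "Ray", 41), ("Ann", "Lee", 30)],
)

# Preview: rows numbered 2 and up within a group are the ones a delete would remove.
would_delete = conn.execute(
    """
    SELECT FirstName, LastName, age, number FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY FirstName, LastName, age ORDER BY FirstName
               ) AS number
        FROM yourTable
    )
    WHERE number > 1
    ORDER BY number
    """
).fetchall()
print(would_delete)  # [('Ann', 'Lee', 30, 2), ('Ann', 'Lee', 30, 3)]
```

Three copies of a row produce two rows with number above 1, so exactly one copy survives the delete.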