How to Delete Duplicate Rows in SQL Server

How can I remove duplicate rows?

Assuming no nulls, you GROUP BY the unique columns and SELECT the MIN (or MAX) RowId as the row to keep. Then delete every row whose RowId is not one of the rows to keep:

DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, Col1, Col2, Col3
FROM MyTable
GROUP BY Col1, Col2, Col3
) as KeepRows ON
MyTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
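
Before running the delete, it can help to preview which rows would be removed. The sketch below assumes a throwaway temp table named #MyTable with made-up values (both are illustrative, not from the question) and reuses the same join as a SELECT:

```sql
-- Hypothetical demo data; the table name and values are illustrative only.
CREATE TABLE #MyTable (
    RowId int IDENTITY(1,1) PRIMARY KEY,
    Col1 int,
    Col2 int,
    Col3 int
);

INSERT INTO #MyTable (Col1, Col2, Col3)
VALUES (1, 1, 1), (1, 1, 1), (2, 2, 2), (1, 1, 1);

-- Same join as the DELETE, written as a SELECT:
-- it lists the rows that are NOT the MIN(RowId) of their group.
SELECT t.*
FROM #MyTable AS t
LEFT OUTER JOIN (
    SELECT MIN(RowId) AS RowId, Col1, Col2, Col3
    FROM #MyTable
    GROUP BY Col1, Col2, Col3
) AS KeepRows ON t.RowId = KeepRows.RowId
WHERE KeepRows.RowId IS NULL;

DROP TABLE #MyTable;
```

With this sample data the SELECT should list the two extra (1, 1, 1) rows and leave the (2, 2, 2) row alone.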

In case you have a GUID instead of an integer, you can replace

MIN(RowId)

with

CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))

Remove duplicate rows SQL Server?

A deletable CTE is on the right track. Here is one way:

WITH cte AS (
SELECT *, COUNT(*) OVER (PARTITION BY year_id, week_id, good_id, store_id, ship_id) cnt
FROM dbo.sales
)

DELETE
FROM cte
WHERE cnt = 2 AND quantity = 0;

This deletes every record that is a duplicate with regard to the five columns you mentioned and has a zero quantity. If you also want to cater for groups of more than two duplicates, just change the restriction on cnt.
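
As a rough sketch of that change, the variant below relaxes the restriction to cnt > 1 so that groups of three or more are covered too. It reuses the table and column names from the query above and is not tested against the original data:

```sql
WITH cte AS (
    SELECT quantity,
           COUNT(*) OVER (PARTITION BY year_id, week_id, good_id, store_id, ship_id) AS cnt
    FROM dbo.sales
)
DELETE
FROM cte
WHERE cnt > 1          -- any group with at least one duplicate
  AND quantity = 0;    -- only the zero-quantity rows are removed
```

One thing to watch: if every row in a duplicate group has a zero quantity, this removes the whole group rather than leaving one row behind.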

How to delete duplicate rows that are exactly the same in SQL Server

You could use an updatable CTE for this.

If you want to delete rows that are exact duplicates on the three columns (as shown in your sample data and explained in the question):

with cte as (
select row_number() over(partition by name, age, gender order by (select null)) rn
from people
)
delete from cte where rn > 1

If you want to delete duplicates on name only (as shown in your existing query):

with cte as (
select row_number() over(partition by name order by (select null)) rn
from people
)
delete from cte where rn > 1
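
Since ORDER BY (SELECT NULL) leaves the choice of surviving row arbitrary, a dry run can be reassuring. This is a minimal sketch, assuming you are free to open a transaction on the table; it runs the first delete, reports how many rows it touched, and then undoes it:

```sql
BEGIN TRANSACTION;

WITH cte AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY name, age, gender ORDER BY (SELECT NULL)) AS rn
    FROM people
)
DELETE FROM cte WHERE rn > 1;

-- @@ROWCOUNT holds the number of rows affected by the previous statement.
SELECT @@ROWCOUNT AS rows_deleted;

-- Undo the delete; replace with COMMIT TRANSACTION once the count looks right.
ROLLBACK TRANSACTION;
```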

SQL Server - delete duplicate rows of a table that has many-to-many relationship

Maybe delete the duplicate rows first, like this:

DELETE A
FROM TABLEA A
INNER JOIN
(
SELECT *,
RANK() OVER(PARTITION BY name, surname
ORDER BY ID_A) rnk
FROM TABLEA
) T ON A.ID_A = T.ID_A
WHERE T.rnk > 1;

And then delete rows in your matrix table that no longer exist in Table A.

DELETE FROM TABLEB WHERE ID_A NOT IN(SELECT ID_A FROM TABLEA)

(Note the delete statement may be off syntax-wise as I am typing from phone!)
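
The cleanup of the matrix table can also be written with NOT EXISTS. This is only a sketch using the same table and column names; the difference matters when ID_A in TABLEA can be NULL, because NOT IN matches no rows at all once the subquery returns a NULL:

```sql
DELETE B
FROM TABLEB AS B
WHERE NOT EXISTS (
    SELECT 1
    FROM TABLEA AS A
    WHERE A.ID_A = B.ID_A   -- keep B rows that still have a parent in TABLEA
);
```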

How to delete duplicate records in SQL?

You can delete duplicates using, for example, ROW_NUMBER():

with duplicates as
(
select
*
,ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, age ORDER BY FirstName) AS number
from yourTable
)
delete
from duplicates
where number > 1

Each row where number is greater than 1 is a duplicate.
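
Because FirstName is constant inside each partition, ordering by it does not decide which copy survives. If the table has a unique column, ordering by that column makes the choice deterministic. In the sketch below, Id is an assumed column name that does not appear in the question:

```sql
WITH duplicates AS (
    SELECT *,
           -- Id is a hypothetical unique column; the lowest Id in each group gets number = 1.
           ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, age ORDER BY Id) AS number
    FROM yourTable
)
DELETE
FROM duplicates
WHERE number > 1;
```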

Trying to delete duplicate rows in SQL Server where the difference is the date or batch number

You could use a deletable CTE here:

WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID_NUMBER, INCEPTION_DATE, OCCURRENCE
ORDER BY FILE_LOAD_DATE, BATCH_NUM) rn
FROM mastertable
WHERE SOURCE_SYSTEM = 'LEGACY'
)

DELETE
FROM cte
WHERE rn > 1;

The logic is to assign a row number to each group of records having the same values for ID_NUMBER, INCEPTION_DATE, and OCCURRENCE. The first row number value of 1 will be assigned to the record having the earliest FILE_LOAD_DATE. In cases of two or more records tied for the earliest FILE_LOAD_DATE, the tie breaker will be determined by the earliest BATCH_NUM.

The delete statement removes all records except for this earliest record.
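
To see the numbering in action without touching mastertable, here is a small sketch over made-up rows (the values are hypothetical, and the dates are plain ISO strings for brevity):

```sql
WITH demo_rows AS (
    SELECT *
    FROM (VALUES
        (100, '2020-01-01', 1, '2021-03-01', 7),
        (100, '2020-01-01', 1, '2021-03-01', 5),
        (100, '2020-01-01', 1, '2021-04-01', 2),
        (200, '2020-06-15', 1, '2021-03-01', 5)
    ) AS v (ID_NUMBER, INCEPTION_DATE, OCCURRENCE, FILE_LOAD_DATE, BATCH_NUM)
)
SELECT *,
       ROW_NUMBER() OVER (PARTITION BY ID_NUMBER, INCEPTION_DATE, OCCURRENCE
                          ORDER BY FILE_LOAD_DATE, BATCH_NUM) AS rn
FROM demo_rows
ORDER BY ID_NUMBER, rn;
-- ID_NUMBER 100: ('2021-03-01', 5) gets rn = 1, ('2021-03-01', 7) gets rn = 2,
--                ('2021-04-01', 2) gets rn = 3
-- ID_NUMBER 200: its single row gets rn = 1
```

Only the rn = 1 rows would survive the delete, so the group for ID_NUMBER 100 keeps the row with the earliest load date and, within that date, the lowest batch number.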

Delete Duplicate Rows in SQL

You can use a Common Table Expression to delete the duplicates:

WITH Cte AS(
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY PersonAliasId, StartDateTime, GroupId
ORDER BY ModifiedDateTIme DESC)
FROM Attendance
)
DELETE FROM Cte WHERE Rn > 1;

This will keep the most recent record for each PersonAliasId - StartDateTime - GroupId combination.
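
A quick way to confirm the cleanup worked is to look for groups that still have more than one row. This check is a sketch using the same table and columns; it should return nothing once the duplicates are gone:

```sql
SELECT PersonAliasId, StartDateTime, GroupId, COUNT(*) AS copies
FROM Attendance
GROUP BY PersonAliasId, StartDateTime, GroupId
HAVING COUNT(*) > 1;   -- any row returned here is a group that still has duplicates
```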

Which way is faster to delete duplicate rows in sql?

Not having a primary key on your table is generally a bad idea. Here is one way to delete the duplicates; which record is retained within each group of identical values across the 23 columns is arbitrary:

WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, ..., col22, col23
ORDER BY (SELECT NULL)) rn
FROM yourTable
)

DELETE
FROM cte
WHERE rn > 1;
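
Once the duplicates are gone, one possible follow-up is to give the table a surrogate primary key. This is only a sketch: the column name Id and the constraint name PK_yourTable are made up for illustration, and adding a column to a large table can take a while:

```sql
-- Adds an auto-numbered column and makes it the primary key in one statement.
ALTER TABLE yourTable
ADD Id int IDENTITY(1,1) NOT NULL
    CONSTRAINT PK_yourTable PRIMARY KEY;
```

With a key like this in place, later cleanups can order by Id instead of (SELECT NULL), so the retained row is no longer arbitrary.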

