Delete Duplicate Records in SQL Server

Large SQL table delete all duplicates

The most efficient way is probably to keep one row per table in a new table. Then truncate the old table and re-insert the values:

select t.*  -- or select all columns except seqnum
into temp_table
from (select t.*,
row_number() over (partition by email_address, job_id order by job_id) as seqnum
from t
) t
where seqnum = 1;

-- optional if you remove seqnum above
-- alter table temp_table drop column seqnum;


-- back this up first!
truncate table t;

insert to t
select * from temp_table;

You could drop the original table and rename temp_table to the table name. If you do so, remember to create constraints, indexes, triggers, partitions and whatever else is on the original table.

This will not be super fast (you can see how long it takes to create the temporary table). However, it does not require the locking and logging from deleting the original table. As a bonus, the new table should be less fragmented than if you did deletes.

delete duplicate records with in

If you want to delete older duplicate values, you can use:

delete from foo
where foo.id < (select max(foo2.id)
from foo foo2
where foo2.a = foo.a and foo2.b = foo.b
);

Note that an index on (a, b, id) would help performance.

You can also phrase this as a join:

delete from foo
using (select a, b, max(id) as max_id
from foo
group by a, b
) ab
where foo.a = a.a and foo.b = ab.b and foo.id < ab.max_id;

How to simply delete duplicate records in SQL Server?

Use row_number. This assumes there is another column or columns that help identify which to keep.

;with r as (
select SourceGradeId,
row_number() over (partition by SourceGradeId order by ) rn
from A
)
delete r where rn>1

SQL Server - delete duplicate rows of a table that has many-to-many relationship

Maybe delete the duplicate rows first, like this:

DELETE 
A
FROM TABLEA A
INNER JOIN
(
SELECT *,
RANK() OVER(PARTITION BY name, surname
ORDER BY ID_A) rank
FROM TABLEA
) T ON A.ID_A = t.ID_A
WHERE rank > 1;

And then delete rows in your matrix table that no longer exist in Table A.

DELETE FROM TABLEB WHERE ID_A NOT IN(SELECT ID_A FROM TABLEA)

(Note the delete statement may be off syntax-wise as I am typing from phone!)

Delete duplicate records without using ROW_NUMBER() function

As the motivation for this question seems to be academic interest rather than practical use...

The table has no primary key but the undocumented pseudo column %%physloc%% can provide a substitute.

DELETE T1
FROM YourTable T1 WITH(TABLOCKX)
WHERE CAST(T1.%%physloc%% AS BIGINT)
NOT IN (SELECT MAX(CAST(%%physloc%% AS BIGINT))
FROM YourTable
GROUP BY Name, Salary)

In reality you should never use the above and just use row_number as it is more efficient and documented.

(Data Explorer Demo)

How to delete duplicate rows that are exactly the same in SQL Server

You could use an updatable CTE for this.

If you want to delete rows that are exact duplicates on the three columns (as shown in your sample data and explained in the question):

with cte as (
select row_number() over(partition by name, age, gender order by (select null)) rn
from people
)
delete from cte where rn > 1

If you want to delete duplicates on name only (as shown in your existing query):

with cte as (
select row_number() over(partition by name order by (select null)) rn
from people
)
delete from cte where rn > 1

How to delete duplicate records in SQL?

You can delete duplicates using i.e. ROW_NUMBER():

with duplicates as
(
select
*
,ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, age ORDER BY FirstName) AS number
from yourTable
)
delete
from duplicates
where number > 1

Each row where number is bigger than 1 is a duplicate.



Related Topics



Leave a reply



Submit