SQL: Remove Duplicates

How to remove duplicates based on a certain column in SQL Server?

We can use a deletable CTE along with ROW_NUMBER here:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY fid ORDER BY date DESC, name) AS rn
    FROM yourTable
)

DELETE
FROM cte
WHERE rn > 1;

The above logic assigns rn = 1 to the record with the most recent date within each group of fid records, and that record is spared. If two records with the same fid also share the same latest date, the one whose name sorts first is kept.
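To check the tie-breaking behaviour on a small data set, here is a minimal sketch using Python's built-in sqlite3 module. It is a stand-in rather than the SQL Server statement itself: SQLite cannot delete through a CTE, so the sketch deletes by rowid instead, and it assumes a SQLite build with window functions (3.25 or later). The sample rows and the helper name dedupe_keep_latest are made up for illustration.

```python
import sqlite3


def dedupe_keep_latest(rows):
    """Keep one row per fid: most recent date first, then the name that sorts first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (fid INTEGER, date TEXT, name TEXT)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)
    # Same ROW_NUMBER() ranking as the CTE, but the delete goes through rowid
    # because SQLite has no deletable CTEs (assumption: SQLite >= 3.25).
    conn.execute("""
        DELETE FROM yourTable
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (
                           PARTITION BY fid ORDER BY date DESC, name
                       ) AS rn
                FROM yourTable
            )
            WHERE rn > 1
        )
    """)
    kept_rows = conn.execute(
        "SELECT fid, date, name FROM yourTable ORDER BY fid"
    ).fetchall()
    conn.close()
    return kept_rows


# Hypothetical sample: fid 1 has two rows tied on the latest date.
sample = [
    (1, "2024-01-01", "bob"),
    (1, "2024-03-01", "zed"),
    (1, "2024-03-01", "amy"),
    (2, "2023-12-31", "cat"),
]
kept = dedupe_keep_latest(sample)
print(kept)
```

For fid 1 the two rows dated 2024-03-01 tie, and the name ordering decides which one survives.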

How can I remove duplicate rows?

Assuming no nulls, GROUP BY the columns that should be unique and SELECT the MIN (or MAX) RowId of each group as the row to keep. Then delete every row whose RowId is not among the ones kept:

DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
    SELECT MIN(RowId) AS RowId, Col1, Col2, Col3
    FROM MyTable
    GROUP BY Col1, Col2, Col3
) AS KeepRows ON
    MyTable.RowId = KeepRows.RowId
WHERE
    KeepRows.RowId IS NULL

In case you have a GUID instead of an integer, you can replace

MIN(RowId)

with

CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))
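As a quick way to try the keep-the-MIN-RowId idea, here is a rough sketch in Python with the built-in sqlite3 module. It is not the SQL Server statement: it expresses the same rule with NOT IN rather than an outer join, and the integer RowId key, the sample rows and the helper name delete_duplicates_keep_min are assumptions for illustration.

```python
import sqlite3


def delete_duplicates_keep_min(rows):
    """Delete every row except the lowest RowId in each (Col1, Col2, Col3) group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable ("
        "RowId INTEGER PRIMARY KEY, Col1 TEXT, Col2 TEXT, Col3 TEXT)"
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)
    # The subquery lists the RowIds to keep; everything else is removed.
    cur = conn.execute("""
        DELETE FROM MyTable
        WHERE RowId NOT IN (
            SELECT MIN(RowId)
            FROM MyTable
            GROUP BY Col1, Col2, Col3
        )
    """)
    deleted = cur.rowcount
    remaining = [r[0] for r in conn.execute("SELECT RowId FROM MyTable ORDER BY RowId")]
    conn.close()
    return deleted, remaining


# Hypothetical sample: rows 1, 2, 4 are duplicates, and so are rows 3 and 5.
sample_rows = [
    (1, "a", "b", "c"),
    (2, "a", "b", "c"),
    (3, "x", "y", "z"),
    (4, "a", "b", "c"),
    (5, "x", "y", "z"),
]
deleted_count, remaining_ids = delete_duplicates_keep_min(sample_rows)
print(deleted_count, remaining_ids)
```

Swapping MIN for MAX keeps the newest row of each group instead, assuming RowId grows with insertion order.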

SQL query to remove duplicates from single column based on latest date

I have tried a few partitioning SQL queries and also a CTE, but I was not able to get the desired result.

On engines that support QUALIFY (for example Snowflake or Teradata), this can be achieved without a CTE:

SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(PARTITION BY COLUMN1 ORDER BY COLUMN2 DESC) = 1
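Where QUALIFY is not available, the same filter can usually be written by ranking in a derived table and filtering on the rank outside it. A minimal sketch with Python's built-in sqlite3 module, assuming SQLite 3.25 or later for window functions; the sample rows and the helper name latest_per_key are hypothetical.

```python
import sqlite3


def latest_per_key(rows):
    """Return, per COLUMN1 value, the row with the greatest COLUMN2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab (COLUMN1 TEXT, COLUMN2 TEXT)")
    conn.executemany("INSERT INTO tab VALUES (?, ?)", rows)
    # The derived table computes the rank; the outer WHERE plays the role of QUALIFY.
    result = conn.execute("""
        SELECT COLUMN1, COLUMN2
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY COLUMN1 ORDER BY COLUMN2 DESC
                   ) AS rn
            FROM tab
        )
        WHERE rn = 1
        ORDER BY COLUMN1
    """).fetchall()
    conn.close()
    return result


# Hypothetical sample with ISO-formatted dates stored as text.
latest = latest_per_key([
    ("a", "2024-01-01"),
    ("a", "2024-02-01"),
    ("b", "2023-05-05"),
])
print(latest)
```

This only selects the latest rows; the underlying table is left unchanged.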

Remove duplicate data from SQL

You can do the following.

Creating a new table and keeping an arbitrary row:

  1. First copy the unique rows of table disk into a temp table disk2.

  2. Drop table disk.

  3. Rename temp table disk2 to disk.

    create table disk2 select * from disk group by d;

    drop table disk;

    rename table disk2 to disk;

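NOTE: GROUP BY is used with SELECT * here because the OP does not care which row is kept. This relies on MySQL running without ONLY_FULL_GROUP_BY; with that mode enabled (the default since MySQL 5.7.5) the statement is rejected.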


Creating a new table and keeping the row with the min or max id:

/*copy data from disk to temp table disk2*/
create table disk2 select * from disk
where id in (select min(id) from disk group by d);
/*drop table disk*/
drop table disk;
/*rename temp table to disk*/
rename table disk2 to disk;
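The copy, drop and rename sequence can be tried out safely on an in-memory database first. Below is a rough sketch with Python's built-in sqlite3 module; it uses SQLite's spelling of the statements (CREATE TABLE ... AS SELECT and ALTER TABLE ... RENAME TO) rather than the MySQL ones, and the sample rows and the helper name rebuild_without_duplicates are made up.

```python
import sqlite3


def rebuild_without_duplicates(rows):
    """Rebuild table disk so that only the lowest id per value of d survives."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE disk (id INTEGER PRIMARY KEY, d TEXT)")
    conn.executemany("INSERT INTO disk VALUES (?, ?)", rows)
    conn.executescript("""
        -- copy the rows to keep into disk2
        CREATE TABLE disk2 AS
            SELECT * FROM disk
            WHERE id IN (SELECT MIN(id) FROM disk GROUP BY d);
        -- drop the original table
        DROP TABLE disk;
        -- give the copy the original name
        ALTER TABLE disk2 RENAME TO disk;
    """)
    result = conn.execute("SELECT id, d FROM disk ORDER BY id").fetchall()
    conn.close()
    return result


# Hypothetical sample: value "x" appears three times.
rebuilt = rebuild_without_duplicates([(1, "x"), (2, "x"), (3, "y"), (4, "x")])
print(rebuilt)
```

One thing to keep in mind with this approach: a table created from a SELECT does not carry over the primary key or indexes of the original, so those need to be recreated afterwards.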


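UPDATE: Another way to do this is to delete the duplicates from the existing table. The intermediate dups table is there because MySQL rejects a DELETE whose subquery reads directly from the table being deleted from (error 1093).

/*first create a dups table for duplicates*/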
create table dups select * from disk
where id not in (select min(id) from disk group by d);
/*now delete all rows which are present in dups table*/
delete from disk where id in (select id from dups);
/*now delete the dups table*/
drop table dups;

Best way to combine two tables, remove duplicates, but keep all other non-duplicate values in SQL

If I understand your question correctly, you want to combine two large tables with thousands of columns that are (hopefully) the same in both, matching rows on the email column. Where a record appears in both tables, the record from Table 2 should replace the one from Table 1.

I had to do something similar a few days ago so maybe you can modify my query for your purposes:

WITH only_in_table_1 AS(
SELECT *
FROM table_1 A
WHERE NOT EXISTS
(SELECT * FROM table_2 B WHERE B.email_field = A.email_field))
SELECT * FROM table_2
UNION ALL
SELECT * FROM only_in_table_1

If the columns/fields aren't the same between the tables, you can use a full outer join on only_in_table_1 and table_2 instead.
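To see which rows the NOT EXISTS plus UNION ALL query returns, here is a small sketch using Python's built-in sqlite3 module. The two-column schema (email_field plus a hypothetical name column), the sample rows and the helper name combine_prefer_table_2 are assumptions for illustration; the real tables would have many more columns.

```python
import sqlite3


def combine_prefer_table_2(table_1_rows, table_2_rows):
    """Union both tables, taking the table_2 row whenever an email is in both."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_1 (email_field TEXT, name TEXT)")
    conn.execute("CREATE TABLE table_2 (email_field TEXT, name TEXT)")
    conn.executemany("INSERT INTO table_1 VALUES (?, ?)", table_1_rows)
    conn.executemany("INSERT INTO table_2 VALUES (?, ?)", table_2_rows)
    rows = conn.execute("""
        WITH only_in_table_1 AS (
            SELECT *
            FROM table_1 A
            WHERE NOT EXISTS (
                SELECT 1 FROM table_2 B WHERE B.email_field = A.email_field
            )
        )
        SELECT * FROM table_2
        UNION ALL
        SELECT * FROM only_in_table_1
    """).fetchall()
    conn.close()
    # Sort in Python so the printed order is stable.
    return sorted(rows)


# Hypothetical sample: a@x.com exists in both tables.
combined = combine_prefer_table_2(
    [("a@x.com", "old A"), ("b@x.com", "only in 1")],
    [("a@x.com", "new A"), ("c@x.com", "only in 2")],
)
print(combined)
```

UNION ALL is safe here because the CTE has already removed every table_1 row that also exists in table_2, so no deduplicating UNION is needed.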

SQL: How to remove duplicates within a SELECT query?

You mention that there are date duplicates, but it appears they're quite unique down to the precision of seconds.

Can you clarify what precision of date you start considering dates duplicate - day, hour, minute?

In any case, you'll probably want to floor your datetime field to the chosen precision and group on the floored value. You didn't indicate which row is preferred when removing duplicates, so this query keeps the owner name that sorts last alphabetically.

SELECT MAX(owner_name),
       -- floored to the second
       DATEADD(second, DATEDIFF(second, '2000-01-01', start_date), '2000-01-01') AS StartDate
FROM MyTable
GROUP BY DATEADD(second, DATEDIFF(second, '2000-01-01', start_date), '2000-01-01')
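If a coarser precision is what counts as a duplicate, the same floor-then-group pattern applies with a different unit. Here is a minimal sketch that floors to the minute, using Python's built-in sqlite3 module; strftime is SQLite's way of truncating a timestamp and stands in for the DATEADD/DATEDIFF expression, and the sample rows and the helper name owners_by_minute are hypothetical.

```python
import sqlite3


def owners_by_minute(rows):
    """Group rows by start_date floored to the minute, keeping MAX(owner_name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (owner_name TEXT, start_date TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
    # strftime drops the seconds, so timestamps within the same minute collapse.
    result = conn.execute("""
        SELECT MAX(owner_name),
               strftime('%Y-%m-%d %H:%M:00', start_date) AS StartDate
        FROM MyTable
        GROUP BY StartDate
        ORDER BY StartDate
    """).fetchall()
    conn.close()
    return result


# Hypothetical sample: the first two rows fall in the same minute.
per_minute = owners_by_minute([
    ("alice", "2024-01-01 10:00:05"),
    ("bob", "2024-01-01 10:00:40"),
    ("carol", "2024-01-01 10:01:00"),
])
print(per_minute)
```

Here alice and bob share the 10:00 minute, and MAX(owner_name) keeps bob because it sorts last.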

