Removing Duplicate Rows (Based on Values from Multiple Columns) from SQL Table

Removing duplicate rows (based on values from multiple columns) from SQL table

Sample SQL FIDDLE

1) Use CTE to get max ship code value record based on ARDivisionNo, CustomerNo
for each Customers

WITH cte AS (
SELECT*,
row_number() OVER(PARTITION BY ARDivisionNo, CustomerNo ORDER BY ShipToCode desc) AS [rn]
FROM t
)
Select * from cte WHERE [rn] = 1

2) To Delete the record use Delete query instead of Select and change Where Clause to rn > 1. Sample SQL FIDDLE

WITH cte AS (
SELECT*,
row_number() OVER(PARTITION BY ARDivisionNo, CustomerNo ORDER BY ShipToCode desc) AS [rn]
FROM t
)
Delete from cte WHERE [rn] > 1;

select * from t;

Deleting duplicates based on multiple columns

Use a cte and assign row numbers so that all but one for duplicate pairs can be deleted.

with rownums as 
(select m.*,
row_number() over(partition by ToUserId, FromUserId order by ToUserId, FromUserId) as rnum
from Message m)
delete r
from rownums r
where rnum > 1

Remove duplicates based on multiple columns and datetime

If the only problem is rownum in final output you can "recount" it with row_number() over (order by datetime asc) as rownum in final select:

with cte (
visitor_id
,datetime
,channel
,visit_page
) as (
values
(2643331144,'10/3/2021 4:05:29 PM','email','landing page'),
(2643331144,'10/3/2021 4:05:39 PM','organic search','landing page'),
(1092581226,'10/7/2021 1:08:12 PM','email','price reduced'),
(1092581226,'10/7/2021 1:08:44 PM','organic search','landing page'),
(1092581226,'10/7/2021 1:09:04 PM','paid search','unknow'),
(1092581226,'10/7/2021 1:09:05 PM','email','price reduced')
)

select row_number() over (order by datetime asc) as rownum,
visitor_id,
datetime,
channel,
visit_page
from (
-- inlined your WITH clause into subquery
select *,
row_number() over (
partition by visitor_id
order by datetime asc
) as rank
from cte
)
where rank = 1

Output:




























rownumvisitor_iddatetimechannelvisit_page
1264333114410/3/2021 4:05:29 PMemaillanding page
2109258122610/7/2021 1:08:12 PMemailprice reduced

In SQL, how can I delete duplicate rows based on multiple columns?

If you have another unique id column you can do

delete from PosOrg
where id not in
(
SELECT min(id)
FROM PosOrg
GROUP BY PosId, OrgId
)

Removing duplicate rows (based on values from multiple columns) from SQL Server

In SQL Server, I like to use row_number() and an updatable CTE for this:

with todelete as (
select t.*, row_number() over (partition by sourceid, dataid order by dateadded desc) as seqnum
from t
)
delete from todelete
where seqnum > 1;

This will work even when sourceid/dataid is NULL (which is probably not the case for you). It also does not assume that dateadded and id are mutually increasing (although that might be a reasonable assumption).

SQL Delete duplicate records based on two columns

You can use distinct on:

select distinct on (car, shop) t.*
from t
order by car, shop, day;

If you want to actually delete the records:

delete from t
where t.day = (select min(t2.day)
from t2
where t2.car = t.car and t2.shop = t.shop
);


Related Topics



Leave a reply



Submit