Removing Duplicate Rows (Based on Values from Multiple Columns) from SQL Table

Removing duplicate rows (based on values from multiple columns) from SQL table

Sample SQL FIDDLE

1) Use CTE to get max ship code value record based on ARDivisionNo, CustomerNo
for each Customers

WITH cte AS (
  SELECT*, 
     row_number() OVER(PARTITION BY ARDivisionNo, CustomerNo ORDER BY ShipToCode desc) AS [rn]
  FROM t
)
Select * from cte WHERE [rn] = 1

2) To Delete the record use Delete query instead of Select and change Where Clause to rn > 1. Sample SQL FIDDLE

WITH cte AS (
  SELECT*, 
     row_number() OVER(PARTITION BY ARDivisionNo, CustomerNo ORDER BY ShipToCode desc) AS [rn]
  FROM t
)
Delete from cte WHERE [rn] > 1;

select * from t;

Deleting duplicates based on multiple columns

Use a cte and assign row numbers so that all but one for duplicate pairs can be deleted.

with rownums as 
(select m.*, 
 row_number() over(partition by ToUserId, FromUserId order by ToUserId, FromUserId) as rnum
 from Message m)
delete r
from rownums r
where rnum > 1

Remove duplicates based on multiple columns and datetime

If the only problem is rownum in final output you can "recount" it with row_number() over (order by datetime asc) as rownum in final select:

with cte (
   visitor_id 
  ,datetime   
  ,channel    
  ,visit_page 
) as (
    values 
     (2643331144,'10/3/2021 4:05:29 PM','email','landing page'),
 (2643331144,'10/3/2021 4:05:39 PM','organic search','landing page'),
 (1092581226,'10/7/2021 1:08:12 PM','email','price reduced'),
 (1092581226,'10/7/2021 1:08:44 PM','organic search','landing page'),
 (1092581226,'10/7/2021 1:09:04 PM','paid search','unknow'),
 (1092581226,'10/7/2021 1:09:05 PM','email','price reduced')
)

select row_number() over (order by datetime asc) as rownum,
    visitor_id,
    datetime,
    channel,
    visit_page
from (
        -- inlined your WITH clause into subquery
        select *,
            row_number() over (
                partition by visitor_id
                order by datetime asc
            ) as rank
        from cte
    )
where rank = 1

Output:

rownum	visitor_id	datetime	channel	visit_page
1	2643331144	10/3/2021 4:05:29 PM	email	landing page
2	1092581226	10/7/2021 1:08:12 PM	email	price reduced

In SQL, how can I delete duplicate rows based on multiple columns?

If you have another unique id column you can do

delete from PosOrg
where id not in
(
  SELECT      min(id)
  FROM        PosOrg
  GROUP BY    PosId, OrgId
)

Removing duplicate rows (based on values from multiple columns) from SQL Server

In SQL Server, I like to use row_number() and an updatable CTE for this:

with todelete as (
      select t.*, row_number() over (partition by sourceid, dataid order by dateadded desc) as seqnum
      from t
     )
delete from todelete
    where seqnum > 1;

This will work even when sourceid/dataid is NULL (which is probably not the case for you). It also does not assume that dateadded and id are mutually increasing (although that might be a reasonable assumption).

SQL Delete duplicate records based on two columns

You can use distinct on:

select distinct on (car, shop) t.*
from t
order by car, shop, day;

If you want to actually delete the records:

delete from t
   where t.day = (select min(t2.day)
                  from t2
                  where t2.car = t.car and t2.shop = t.shop
                 );

Removing Duplicate Rows (Based on Values from Multiple Columns) from SQL Table