Removing duplicate rows (based on values from multiple columns) from SQL table
Sample SQL FIDDLE
1) Use CTE to get max ship code value record based on ARDivisionNo, CustomerNo
for each Customers
WITH cte AS (
SELECT*,
row_number() OVER(PARTITION BY ARDivisionNo, CustomerNo ORDER BY ShipToCode desc) AS [rn]
FROM t
)
Select * from cte WHERE [rn] = 1
2) To Delete the record use Delete query instead of Select and change Where Clause to rn > 1. Sample SQL FIDDLE
WITH cte AS (
SELECT*,
row_number() OVER(PARTITION BY ARDivisionNo, CustomerNo ORDER BY ShipToCode desc) AS [rn]
FROM t
)
Delete from cte WHERE [rn] > 1;
select * from t;
Deleting duplicates based on multiple columns
Use a cte
and assign row numbers so that all but one for duplicate pairs can be deleted.
with rownums as
(select m.*,
row_number() over(partition by ToUserId, FromUserId order by ToUserId, FromUserId) as rnum
from Message m)
delete r
from rownums r
where rnum > 1
Remove duplicates based on multiple columns and datetime
If the only problem is rownum in final output you can "recount" it with row_number() over (order by datetime asc) as rownum
in final select:
with cte (
visitor_id
,datetime
,channel
,visit_page
) as (
values
(2643331144,'10/3/2021 4:05:29 PM','email','landing page'),
(2643331144,'10/3/2021 4:05:39 PM','organic search','landing page'),
(1092581226,'10/7/2021 1:08:12 PM','email','price reduced'),
(1092581226,'10/7/2021 1:08:44 PM','organic search','landing page'),
(1092581226,'10/7/2021 1:09:04 PM','paid search','unknow'),
(1092581226,'10/7/2021 1:09:05 PM','email','price reduced')
)
select row_number() over (order by datetime asc) as rownum,
visitor_id,
datetime,
channel,
visit_page
from (
-- inlined your WITH clause into subquery
select *,
row_number() over (
partition by visitor_id
order by datetime asc
) as rank
from cte
)
where rank = 1
Output:
rownum | visitor_id | datetime | channel | visit_page |
---|---|---|---|---|
1 | 2643331144 | 10/3/2021 4:05:29 PM | landing page | |
2 | 1092581226 | 10/7/2021 1:08:12 PM | price reduced |
In SQL, how can I delete duplicate rows based on multiple columns?
If you have another unique id
column you can do
delete from PosOrg
where id not in
(
SELECT min(id)
FROM PosOrg
GROUP BY PosId, OrgId
)
Removing duplicate rows (based on values from multiple columns) from SQL Server
In SQL Server, I like to use row_number()
and an updatable CTE for this:
with todelete as (
select t.*, row_number() over (partition by sourceid, dataid order by dateadded desc) as seqnum
from t
)
delete from todelete
where seqnum > 1;
This will work even when sourceid
/dataid
is NULL
(which is probably not the case for you). It also does not assume that dateadded
and id
are mutually increasing (although that might be a reasonable assumption).
SQL Delete duplicate records based on two columns
You can use distinct on
:
select distinct on (car, shop) t.*
from t
order by car, shop, day;
If you want to actually delete the records:
delete from t
where t.day = (select min(t2.day)
from t2
where t2.car = t.car and t2.shop = t.shop
);
Related Topics
Get the First and Last Date of Next Month in MySQL
Partition by with and Without Keep in Oracle
How to Implement SQL Intersect and Minus Operations in Ms Access
How to Get Better Performance Using a Join or Using Exists
Using Ssis to Extract a Xml Representation of Table Data to a File
SQL Server Select into @Variable
To Ignore Duplicate Keys During 'Copy From' in Postgresql
The Ole Db Provider "Microsoft.Ace.Oledb.12.0" for Linked Server "(Null)"
How to Automatically Generate Unique Id in SQL Like Uid12345678
SQL Filter Criteria in Join Criteria or Where Clause Which Is More Efficient
Oracle in VS Exists Difference
How to Use a Dynamic Parameter in a in Clause of a JPA Named Query
Syntax Error at End of Input in Postgresql
How to List All the Columns in a Table
What Is the Most Appropriate Data Type for Storing an Ip Address in SQL Server