Find Duplicate Records Based on Two Columns

How do I find duplicates across multiple columns?

To list the id of every row whose (name, city) pair occurs more than once, join the table to its grouped counts:

select s.id, t.* 
from [stuff] s
join (
select name, city, count(*) as qty
from [stuff]
group by name, city
having count(*) > 1
) t on s.name = t.name and s.city = t.city
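
As a rough illustration of the grouped-count join above, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows are made up for the example; only the table and column names come from the query.

```python
import sqlite3

# Hypothetical sample data; table and column names mirror the query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO stuff (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ann", "Oslo"), (2, "Ann", "Oslo"), (3, "Ann", "Bergen"), (4, "Bob", "Oslo")],
)

# Group by the pair, keep groups with more than one row, then join back
# to recover the id of every row that belongs to such a group.
rows = conn.execute(
    """
    SELECT s.id, t.name, t.city, t.qty
    FROM stuff AS s
    JOIN (
        SELECT name, city, COUNT(*) AS qty
        FROM stuff
        GROUP BY name, city
        HAVING COUNT(*) > 1
    ) AS t ON s.name = t.name AND s.city = t.city
    ORDER BY s.id
    """
).fetchall()

print(rows)
```

Only ids 1 and 2 share both name and city, so the row for Ann in Bergen is not reported even though the name repeats.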

Find duplicate records based on two columns

Instead of a grouped COUNT, you can use COUNT as a windowed aggregate, which keeps the other columns of each row available:

SELECT fullname,
address,
city
FROM (SELECT *,
COUNT(*) OVER (PARTITION BY fullname, city) AS cnt
FROM employee) e
WHERE cnt > 1
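
The windowed variant can be tried out the same way. This is a sketch only: the sample rows are invented, and it assumes the SQLite library bundled with Python is version 3.25 or newer, since older versions lack window functions.

```python
import sqlite3

# Hypothetical sample data for the employee table used in the query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fullname TEXT, address TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Ann Lee", "1 Main St", "Oslo"),
        ("Ann Lee", "9 Side Rd", "Oslo"),
        ("Ann Lee", "1 Main St", "Bergen"),
        ("Bob Ray", "4 Hill Ave", "Oslo"),
    ],
)

# COUNT(*) OVER (...) attaches the group size to every row, so columns that
# are not part of the duplicate key (address here) stay available.
dupes = conn.execute(
    """
    SELECT fullname, address, city
    FROM (
        SELECT *, COUNT(*) OVER (PARTITION BY fullname, city) AS cnt
        FROM employee
    ) AS e
    WHERE cnt > 1
    ORDER BY address
    """
).fetchall()

print(dupes)
```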

Query for finding duplicates based on two columns

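You can use window functions. Note that this query counts each column separately and keeps only the rows whose cola and colb values are both unique; to return the duplicates instead, invert the filter (for example, cola_cnt > 1 or colb_cnt > 1):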

select t.*
from (select t.*,
count(*) over (partition by cola) as cola_cnt,
count(*) over (partition by colb) as colb_cnt
from table t
) t
where cola_cnt = 1 and colb_cnt = 1;
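
To make the difference between counting each column on its own and counting the pair concrete, here is a small sketch on invented data. The table name pairs and the second query are assumptions added for comparison; window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical data: cola repeats (1), colb repeats ('y'), but no pair repeats.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (cola INTEGER, colb TEXT)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [(1, "x"), (1, "y"), (2, "y"), (3, "z")],
)

# Per-column counts: a row survives only if its cola AND its colb are unique.
unique_in_both = conn.execute(
    """
    SELECT cola, colb
    FROM (
        SELECT p.*,
               COUNT(*) OVER (PARTITION BY cola) AS cola_cnt,
               COUNT(*) OVER (PARTITION BY colb) AS colb_cnt
        FROM pairs AS p
    ) AS q
    WHERE cola_cnt = 1 AND colb_cnt = 1
    """
).fetchall()

# Pair count: a row is a duplicate only if the (cola, colb) combination repeats.
pair_dupes = conn.execute(
    """
    SELECT cola, colb
    FROM (
        SELECT p.*, COUNT(*) OVER (PARTITION BY cola, colb) AS pair_cnt
        FROM pairs AS p
    ) AS q
    WHERE pair_cnt > 1
    """
).fetchall()

print(unique_in_both, pair_dupes)
```

On this data only (3, 'z') is unique in both columns, while no (cola, colb) pair repeats, so the two approaches answer different questions.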

Oracle SQL - Select duplicates based on two columns

First, group by ADM_ID and IDENTIFIER_VALUE and find the groups that have more than one row.
Then select all rows that match these pairs:

SELECT S.NAME
,ADMINISTRATIVE_SITE_ID AS ADM_ID
,S.EXTERNAL_CODE
,SI.IDENTIFIER_VALUE
FROM SUPPLIERS S INNER JOIN SUPPLIERS_IDENTIFIER SI ON S.ID = SI.SUPPLIER_ID
WHERE (ADMINISTRATIVE_SITE_ID, SI.IDENTIFIER_VALUE) IN (SELECT ADMINISTRATIVE_SITE_ID AS ADM_ID, SI.IDENTIFIER_VALUE
FROM SUPPLIERS S INNER JOIN SUPPLIERS_IDENTIFIER SI ON S.ID = SI.SUPPLIER_ID
GROUP BY ADMINISTRATIVE_SITE_ID, SI.IDENTIFIER_VALUE
HAVING COUNT(*) > 1)

SQL: How to find duplicates based on two fields?

SELECT  *
FROM (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
FROM mytable t
) q
WHERE rn > 1
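
One point worth seeing in action: filtering on rn > 1 returns only the extra copies in each group, not the first row. A minimal sketch with invented rows (column names taken from the query above, SQLite 3.25 or newer assumed):

```python
import sqlite3

# Hypothetical sample data: station 10 has three observations for 2020.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 2020), (2, 10, 2020), (3, 10, 2020), (4, 10, 2021), (5, 20, 2020)],
)

# ROW_NUMBER() numbers the rows inside each (station_id, obs_year) group;
# rn = 1 is the row to keep, rn > 1 are the surplus duplicates.
extras = conn.execute(
    """
    SELECT entity_id, station_id, obs_year, rn
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY station_id, obs_year ORDER BY entity_id
               ) AS rn
        FROM mytable AS t
    ) AS q
    WHERE rn > 1
    ORDER BY entity_id
    """
).fetchall()

print(extras)
```

Entity 1 is not in the result because it is the first row of its group. If every member of a duplicated group is wanted, a windowed COUNT(*) is the better fit.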

Find out unique top record based on two columns duplicate value

You can achieve this using the RANK() function:

with CTE_DATA AS (
SELECT
CURRENT_POSITION_LATITUDE,
CURRENT_POSITION_LONGITUDE,
SYS_EVT_TARGET_ID,
SYS_MOVEMENT_EVT_ID,
RANK() OVER(PARTITION BY CURRENT_POSITION_LATITUDE,
CURRENT_POSITION_LONGITUDE
ORDER BY SYS_MOVEMENT_EVT_ID DESC) LAT_RANK
FROM TMS_MOVEMENT_EVT
)
SELECT
CURRENT_POSITION_LATITUDE,
CURRENT_POSITION_LONGITUDE,
SYS_EVT_TARGET_ID,
SYS_MOVEMENT_EVT_ID
FROM CTE_DATA
WHERE LAT_RANK = 1
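
The same "top row per pair" pattern can be sketched on a cut-down table. The table name evt, its shortened column names, and the rows are all hypothetical stand-ins for TMS_MOVEMENT_EVT; SQLite 3.25 or newer is assumed.

```python
import sqlite3

# Hypothetical, simplified stand-in for TMS_MOVEMENT_EVT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE evt (lat REAL, lon REAL, target_id TEXT, evt_id INTEGER)")
conn.executemany(
    "INSERT INTO evt VALUES (?, ?, ?, ?)",
    [(1.0, 2.0, "T1", 100), (1.0, 2.0, "T2", 101), (3.0, 4.0, "T3", 102)],
)

# Rank rows inside each (lat, lon) pair by evt_id, newest first,
# then keep only the top-ranked row of every pair.
top_rows = conn.execute(
    """
    WITH ranked AS (
        SELECT lat, lon, target_id, evt_id,
               RANK() OVER (PARTITION BY lat, lon ORDER BY evt_id DESC) AS lat_rank
        FROM evt
    )
    SELECT lat, lon, target_id, evt_id
    FROM ranked
    WHERE lat_rank = 1
    ORDER BY evt_id
    """
).fetchall()

print(top_rows)
```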

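Note that RANK() assigns the same rank to ties, so this query returns more than one row for a pair if SYS_MOVEMENT_EVT_ID is not unique within it; ROW_NUMBER() always returns exactly one row per pair.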

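Identify duplicates based on multiple columns (which may contain multiple values) and return a Boolean in Python

Split the comma-separated produce values, explode them into one row per value, flag the duplicated (store, station, produce) combinations, and collapse the result back to the original index: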

df['New'] = df.assign(produce=df['produce'].str.split(', ')).\
explode('produce').\
duplicated(subset=['store', 'station', 'produce'], keep=False).groupby(level=0).any()

Out[160]:
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 False
9 True
10 True
11 False
dtype: bool

SQL Delete duplicate records based on two columns

In PostgreSQL, you can use DISTINCT ON to select one row per (car, shop) pair, here the one with the earliest day:

select distinct on (car, shop) t.*
from t
order by car, shop, day;

If you want to actually delete the duplicates, keeping the row with the earliest day for each pair:

delete from t
where t.day > (select min(t2.day)
from t t2
where t2.car = t.car and t2.shop = t.shop
);
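
A minimal sketch of deleting duplicates while keeping the earliest day per (car, shop) pair, run against in-memory SQLite. The rows are invented, and days are stored as ISO date strings so that text comparison orders them correctly.

```python
import sqlite3

# Hypothetical sample data; days are ISO strings, which sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (car TEXT, shop TEXT, day TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("audi", "A", "2024-01-01"),
        ("audi", "A", "2024-01-03"),
        ("audi", "A", "2024-01-02"),
        ("bmw", "A", "2024-01-05"),
        ("audi", "B", "2024-01-04"),
    ],
)

# Delete every row that is later than the earliest day of its (car, shop) pair.
# Pairs with a single row are untouched because their day equals the minimum.
conn.execute(
    """
    DELETE FROM t
    WHERE day > (
        SELECT MIN(t2.day)
        FROM t AS t2
        WHERE t2.car = t.car AND t2.shop = t.shop
    )
    """
)

remaining = conn.execute("SELECT car, shop, day FROM t ORDER BY car, shop, day").fetchall()
print(remaining)
```

If two rows of a pair share the same earliest day, both survive this delete; a unique key or ROW_NUMBER() would be needed to break such ties.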

