Find Duplicate Records Based on Two Columns

How do I find duplicates across multiple columns?

To list the id of every row whose (name, city) pair is duplicated, join the table to the grouped counts:

select s.id, t.* 
from [stuff] s
join (
    select name, city, count(*) as qty
    from [stuff]
    group by name, city
    having count(*) > 1
) t on s.name = t.name and s.city = t.city
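A runnable sketch of the same join-to-grouped-counts pattern, using Python's built-in sqlite3 module as a stand-in for SQL Server. The table contents are hypothetical, and the square-bracket quoting is dropped because the name needs no quoting.

```python
import sqlite3

# Hypothetical in-memory version of the "stuff" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO stuff (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ann", "Oslo"), (2, "Ann", "Oslo"), (3, "Bob", "Rome"), (4, "Ann", "Rome")],
)

# Group to find (name, city) pairs that occur more than once,
# then join back to the table to recover the id of each such row.
query = """
SELECT s.id, t.name, t.city, t.qty
FROM stuff s
JOIN (
    SELECT name, city, COUNT(*) AS qty
    FROM stuff
    GROUP BY name, city
    HAVING COUNT(*) > 1
) t ON s.name = t.name AND s.city = t.city
ORDER BY s.id
"""
duplicate_rows = conn.execute(query).fetchall()
print(duplicate_rows)  # [(1, 'Ann', 'Oslo', 2), (2, 'Ann', 'Oslo', 2)]
```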

Find duplicate records based on two columns

Instead of a grouped COUNT, you can use COUNT as a windowed aggregate. Because no GROUP BY collapses the rows, the other columns of each duplicated row remain available:

SELECT fullname,
       address,
       city
FROM   (SELECT *,
               COUNT(*) OVER (PARTITION BY fullname, city) AS cnt
        FROM   employee) e
WHERE  cnt > 1
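A minimal sketch of the windowed COUNT running in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The employee rows are made up for illustration.

```python
import sqlite3

# Hypothetical employee rows; SQLite 3.25+ is required for OVER (...).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fullname TEXT, address TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Ann Lee", "1 Main St", "Oslo"),
        ("Ann Lee", "9 Side Rd", "Oslo"),
        ("Bob Kim", "5 High St", "Rome"),
    ],
)

# COUNT(*) OVER (PARTITION BY ...) attaches the group size to every row,
# so a non-grouped column such as address can still be selected.
query = """
SELECT fullname, address, city
FROM (SELECT *,
             COUNT(*) OVER (PARTITION BY fullname, city) AS cnt
      FROM employee) e
WHERE cnt > 1
ORDER BY address
"""
dupes = conn.execute(query).fetchall()
print(dupes)  # [('Ann Lee', '1 Main St', 'Oslo'), ('Ann Lee', '9 Side Rd', 'Oslo')]
```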

Query for finding duplicates based on two columns

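You can use window functions, counting each column separately. As written, the query keeps the rows whose cola value and whose colb value each occur exactly once; to return the duplicated rows instead, change the filter to cola_cnt > 1 or colb_cnt > 1: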

select t.*
from (select t.*, 
             count(*) over (partition by cola) as cola_cnt,
             count(*) over (partition by colb) as colb_cnt
      from mytable t
     ) t
where cola_cnt = 1 and colb_cnt = 1;
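A small sketch, run in SQLite via Python's sqlite3, showing both filters on the per-column counts. The table name mytable and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, cola TEXT, colb TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "y"), (3, "b", "y"), (4, "c", "z")],
)

# Each window counts occurrences of one column independently of the other.
# The {condition} placeholder is filled only with the fixed strings below.
base = """
SELECT id
FROM (SELECT t.*,
             COUNT(*) OVER (PARTITION BY cola) AS cola_cnt,
             COUNT(*) OVER (PARTITION BY colb) AS colb_cnt
      FROM mytable t) counted
WHERE {condition}
ORDER BY id
"""
unique_ids = [r[0] for r in conn.execute(base.format(condition="cola_cnt = 1 AND colb_cnt = 1"))]
dup_ids = [r[0] for r in conn.execute(base.format(condition="cola_cnt > 1 OR colb_cnt > 1"))]
print(unique_ids, dup_ids)  # [4] [1, 2, 3]
```

Row 3 counts as a duplicate even though its cola value is unique, because its colb value y also appears in row 2.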

Oracle SQL - Select duplicates based on two columns

First group by ADM_ID and IDENTIFIER_VALUE to find the groups that have more than one row.
Then select all rows that have one of these (ADM_ID, IDENTIFIER_VALUE) pairs:

SELECT S.NAME
      ,ADMINISTRATIVE_SITE_ID AS ADM_ID
      ,S.EXTERNAL_CODE
      ,SI.IDENTIFIER_VALUE
  FROM SUPPLIERS S INNER JOIN SUPPLIERS_IDENTIFIER SI ON S.ID = SI.SUPPLIER_ID
 WHERE (ADMINISTRATIVE_SITE_ID, SI.IDENTIFIER_VALUE) IN (
         SELECT ADMINISTRATIVE_SITE_ID, SI.IDENTIFIER_VALUE
           FROM SUPPLIERS S INNER JOIN SUPPLIERS_IDENTIFIER SI ON S.ID = SI.SUPPLIER_ID
          -- repeat the column names: older Oracle versions reject a SELECT alias such as ADM_ID in GROUP BY
          GROUP BY ADMINISTRATIVE_SITE_ID, SI.IDENTIFIER_VALUE
         HAVING COUNT(*) > 1)
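A runnable sketch of the row-value IN filter, using SQLite (row values need SQLite 3.15 or later) in place of Oracle. The two tables are cut down to the columns the query needs, and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (id INTEGER, name TEXT, administrative_site_id INTEGER);
CREATE TABLE suppliers_identifier (supplier_id INTEGER, identifier_value TEXT);
INSERT INTO suppliers VALUES (1, 'Acme', 10), (2, 'Acme East', 10), (3, 'Bolt', 20);
INSERT INTO suppliers_identifier VALUES (1, 'X1'), (2, 'X1'), (3, 'X1');
""")

# Step 1 (subquery): (site, identifier) pairs that occur more than once.
# Step 2 (outer query): every joined row carrying one of those pairs.
query = """
SELECT s.name, s.administrative_site_id, si.identifier_value
FROM suppliers s JOIN suppliers_identifier si ON s.id = si.supplier_id
WHERE (s.administrative_site_id, si.identifier_value) IN (
    SELECT s2.administrative_site_id, si2.identifier_value
    FROM suppliers s2 JOIN suppliers_identifier si2 ON s2.id = si2.supplier_id
    GROUP BY s2.administrative_site_id, si2.identifier_value
    HAVING COUNT(*) > 1)
ORDER BY s.name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Acme', 10, 'X1'), ('Acme East', 10, 'X1')]
```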

SQL: How to find duplicates based on two fields?

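SELECT  *
FROM    (
        SELECT  t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
        FROM    mytable t
        ) x     -- many engines require an alias on the derived table
WHERE   rn > 1  -- every row after the first in each (station_id, obs_year) group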
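A minimal SQLite sketch showing that rn > 1 picks out the extra copies, which can then be deleted so that only the first row of each (station_id, obs_year) group remains. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 7, 2020), (2, 7, 2020), (3, 7, 2020), (4, 8, 2020)],
)

# Number the rows inside each (station_id, obs_year) group; rn = 1 is the keeper.
query = """
SELECT entity_id
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
      FROM mytable t) numbered
WHERE rn > 1
ORDER BY entity_id
"""
extra_copies = [r[0] for r in conn.execute(query)]

# Remove the extra copies and check what is left.
conn.executemany("DELETE FROM mytable WHERE entity_id = ?", [(i,) for i in extra_copies])
remaining = [r[0] for r in conn.execute("SELECT entity_id FROM mytable ORDER BY entity_id")]
print(extra_copies, remaining)  # [2, 3] [1, 4]
```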

Find the unique top record among rows with duplicate values in two columns

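You can achieve this using the RANK() function. It numbers the rows within each (latitude, longitude) group so that the row with the highest SYS_MOVEMENT_EVT_ID gets rank 1; RANK() gives tied rows the same rank, so use ROW_NUMBER() instead if exactly one row per group is required: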

with CTE_DATA AS (
                  SELECT
                         CURRENT_POSITION_LATITUDE,
                         CURRENT_POSITION_LONGITUDE,
                         SYS_EVT_TARGET_ID,
                         SYS_MOVEMENT_EVT_ID,
                         RANK() OVER(PARTITION BY CURRENT_POSITION_LATITUDE, 
                                                 CURRENT_POSITION_LONGITUDE
                                      ORDER BY SYS_MOVEMENT_EVT_ID DESC) LAT_RANK
                  FROM TMS_MOVEMENT_EVT
                 )
SELECT
      CURRENT_POSITION_LATITUDE,
      CURRENT_POSITION_LONGITUDE,
      SYS_EVT_TARGET_ID,
      SYS_MOVEMENT_EVT_ID
FROM CTE_DATA
WHERE LAT_RANK = 1
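A runnable sketch of the RANK() approach in SQLite through Python's sqlite3, with column names lower-cased and only a few hypothetical movement events. Both events at (59.9, 10.7) share a position, so only the one with the higher SYS_MOVEMENT_EVT_ID survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tms_movement_evt (
    sys_movement_evt_id INTEGER, sys_evt_target_id INTEGER,
    current_position_latitude REAL, current_position_longitude REAL)""")
conn.executemany(
    "INSERT INTO tms_movement_evt VALUES (?, ?, ?, ?)",
    [(101, 1, 59.9, 10.7), (102, 2, 59.9, 10.7), (103, 3, 41.9, 12.5)],
)

# Rank events per position, newest id first, and keep rank 1.
query = """
WITH cte_data AS (
    SELECT current_position_latitude, current_position_longitude,
           sys_evt_target_id, sys_movement_evt_id,
           RANK() OVER (PARTITION BY current_position_latitude, current_position_longitude
                        ORDER BY sys_movement_evt_id DESC) AS lat_rank
    FROM tms_movement_evt)
SELECT sys_movement_evt_id, sys_evt_target_id
FROM cte_data
WHERE lat_rank = 1
ORDER BY sys_movement_evt_id
"""
top_rows = conn.execute(query).fetchall()
print(top_rows)  # [(102, 2), (103, 3)]
```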


Identify duplicates based on multiple columns (which may contain multiple values) and return a Boolean flag in Python

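Split the comma-separated produce values, explode them into one row per item, flag repeated (store, station, produce) combinations, and then collapse the flags back to one Boolean per original row: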

df['New'] = (df.assign(produce=df['produce'].str.split(', '))
               .explode('produce')
               .duplicated(subset=['store', 'station', 'produce'], keep=False)
               # one flag per original row (Series.any(level=0) was removed in pandas 2.0)
               .groupby(level=0).any())

Output for the question's sample data:
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
11    False
dtype: bool
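For readers without pandas, here is a pure-Python sketch of the same logic, swapping explode/duplicated for collections.Counter. The sample rows are hypothetical; a row is flagged when any of its exploded (store, station, item) keys occurs more than once in the whole table, which mirrors keep=False.

```python
from collections import Counter

# Hypothetical rows; "produce" holds comma-separated values as in the question.
rows = [
    {"store": "A", "station": 1, "produce": "apple, pear"},
    {"store": "A", "station": 1, "produce": "pear"},
    {"store": "B", "station": 2, "produce": "plum"},
]


def flag_duplicates(rows):
    """Return one bool per row: True if any exploded (store, station, item) key repeats."""
    # "Explode": turn each row into its list of (store, station, item) keys.
    exploded = [
        [(r["store"], r["station"], item) for item in r["produce"].split(", ")]
        for r in rows
    ]
    # Count every key across the whole table; every occurrence of a repeated
    # key counts as a duplicate, like duplicated(keep=False).
    counts = Counter(key for keys in exploded for key in keys)
    # Collapse back to one flag per original row, like groupby(level=0).any().
    return [any(counts[key] > 1 for key in keys) for keys in exploded]


flags = flag_duplicates(rows)
print(flags)  # [True, True, False]
```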

SQL Delete duplicate records based on two columns

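In PostgreSQL, you can use DISTINCT ON; ordering by day makes it keep the row with the earliest day for each (car, shop) pair: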

select distinct on (car, shop) t.*
from t
order by car, shop, day;
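DISTINCT ON is PostgreSQL-specific. As a hedged, portable alternative (a swapped-in technique, not the answer's own), ROW_NUMBER() can keep the earliest day per (car, shop) pair; the sketch below runs it in SQLite via Python's sqlite3 with hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (car TEXT, shop TEXT, day INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("vw", "north", 3), ("vw", "north", 1), ("bmw", "south", 2)],
)

# Stand-in for DISTINCT ON (car, shop) ... ORDER BY car, shop, day:
# number rows inside each (car, shop) pair by day and keep the first one.
query = """
SELECT car, shop, day
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY car, shop ORDER BY day) AS rn
      FROM t) ranked
WHERE rn = 1
ORDER BY car, shop
"""
first_rows = conn.execute(query).fetchall()
print(first_rows)  # [('bmw', 'south', 2), ('vw', 'north', 1)]
```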

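If you want to actually delete the duplicates, keeping the row with the earliest day for each (car, shop) pair (rows that tie on the earliest day are all kept):

delete from t
   where t.day > (select min(t2.day)
                  from t t2
                  where t2.car = t.car and t2.shop = t.shop
                 );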
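A runnable sketch of a correlated-subquery delete that keeps the earliest day per (car, shop) pair, executed in SQLite through Python's sqlite3. The rows are hypothetical; a pair with a single row is left untouched because its day equals its own minimum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (car TEXT, shop TEXT, day INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("vw", "north", 3), ("vw", "north", 1), ("bmw", "south", 2)],
)

# Delete every row that has an earlier day for the same (car, shop) pair.
conn.execute("""
DELETE FROM t
WHERE t.day > (SELECT MIN(t2.day)
               FROM t t2
               WHERE t2.car = t.car AND t2.shop = t.shop)
""")
remaining = conn.execute("SELECT car, shop, day FROM t ORDER BY car, shop").fetchall()
print(remaining)  # [('bmw', 'south', 2), ('vw', 'north', 1)]
```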