Select Statement to Find Duplicates on Certain Fields

Select statement to find duplicates on certain fields

To get the list of fields for which there are multiple records, you can use..

select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1

Check this link for more information on how to delete the rows.

http://support.microsoft.com/kb/139444

There should be a criterion for deciding how you define "first rows" before you use the approach in the link above. Based on that you'll need to use an order by clause and a sub query if needed. If you can post some sample data, it would really help.

Finding duplicate values in a SQL table

SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1

Simply group on both of the columns.

Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent:

  • Recent PostgreSQL supports it.
  • SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
  • MySQL is unpredictable and you need sql_mode=only_full_group_by:

    • GROUP BY lname ORDER BY showing wrong results;
    • Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
  • Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).

How do I find duplicates across multiple columns?

Duplicated id for pairs name and city:

select s.id, t.* 
from [stuff] s
join (
select name, city, count(*) as qty
from [stuff]
group by name, city
having count(*) > 1
) t on s.name = t.name and s.city = t.city

SQL: find duplicates, with a different field

I finally found a solution.
The correct solution is in this answer:

SELECT DISTINCT HAVING Count unique conditions

Adapted with this version, since I'm using Access 2010:

Count Distinct in a Group By aggregate function in Access 2007 SQL

Therefore, in my example table above, I can use this query to find duplicate records:

SELECT CountryB, Customer, Count(cd.Country)
FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
GROUP BY CountryB, Customer
HAVING COUNT(*) > 1

or this query to find all the IDs of the duplicated records:

SELECT ID FROM myTable a INNER JOIN
(
SELECT CountryB, Customer, Count(cd.Country)
FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
GROUP BY CountryB, Customer
HAVING COUNT(*) > 1
) dt
ON a.CountryB=dt.CountryB AND a.Customer=dt.Customer

Finding duplicate values in MySQL

Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:

SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;

This will return a result with the name value in the first column, and a count of how many times that value appears in the second.

Find duplicate entries in a column

Using:

  SELECT t.ctn_no
FROM YOUR_TABLE t
GROUP BY t.ctn_no
HAVING COUNT(t.ctn_no) > 1

...will show you the ctn_no value(s) that have duplicates in your table. Adding criteria to the WHERE will allow you to further tune what duplicates there are:

  SELECT t.ctn_no
FROM YOUR_TABLE t
WHERE t.s_ind = 'Y'
GROUP BY t.ctn_no
HAVING COUNT(t.ctn_no) > 1

If you want to see the other column values associated with the duplicate, you'll want to use a self join:

SELECT x.*
FROM YOUR_TABLE x
JOIN (SELECT t.ctn_no
FROM YOUR_TABLE t
GROUP BY t.ctn_no
HAVING COUNT(t.ctn_no) > 1) y ON y.ctn_no = x.ctn_no

How to find duplicate records in PostgreSQL

The basic idea will be using a nested query with count aggregation:

select * from yourTable ou
where (select count(*) from yourTable inr
where inr.sid = ou.sid) > 1

You can adjust the where clause in the inner query to narrow the search.


There is another good solution for that mentioned in the comments, (but not everyone reads them):

select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1

Or shorter:

SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1

Get duplicate rows only by certain fields from inner join query

You can use this method to count records in place and exclude those with only one record

SELECT * FROM
(

SELECT T1.*,
COUNT(*) OVER (PARTITION BY
T1.CustomerNumber,
T1.Title,
T1.FirstName,
T1.[LastName],
T1.[AgentID],
T1.[AgentName],
T1.[PersonID],
T1.[EMailAddress]
) As DuplicateRowCount

FROM
(
SELECT p.CustomerNumber
,pn.[Title]
,pn.[FirstName]
,pn.[LastName]
,a.[AgentID]
,a.[AgentName]
,a.[PersonID]
,pe.[EMailAddress]
,(SELECT TOP 1 pp.[PhoneCode] FROM [Rez].[PersonPhone] pp
WHERE pp.PersonID = p.PersonID ORDER BY pp.ModifiedUTC DESC) AS Phone
FROM [Rez].[Person] p
INNER JOIN [Rez].[PersonName] pn ON p.PersonID = pn.PersonID
INNER JOIN [Rez].[Agent] a ON a.PersonID = p.PersonID
INNER JOIN [Rez].[PersonEMail] pe ON pe.PersonID = p.PersonID
INNER JOIN [Rez].[AgentRole] ar ON ar.[AgentID] = a.[AgentID]
WHERE a.CreatedUTC > '2018-01-01'
) T1
) T2
WHERE T2.DuplicateRowCount > 1
ORDER BY T2.[EMailAddress], T2.[FirstName], T2.[LastName], T2.[DOB]


Related Topics



Leave a reply



Submit