How to Find Duplicate Entries in a Database Table

Finding duplicate values in a SQL table

SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1

Simply group on both of the columns.

Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent:

  • Recent PostgreSQL supports it.
  • SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
  • MySQL is unpredictable and you need sql_mode=only_full_group_by:

    • GROUP BY lname ORDER BY showing wrong results;
    • Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
  • Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).

How do I find duplicate entries in a database table?

A nested query can do the job.

SELECT author_last_name, dewey_number, NumOccurrences
FROM author INNER JOIN
( SELECT author_id, dewey_number, COUNT(dewey_number) AS NumOccurrences
FROM book
GROUP BY author_id, dewey_number
HAVING ( COUNT(dewey_number) > 1 ) ) AS duplicates
ON author.id = duplicates.author_id

(I don't know if this is the fastest way to achieve what you want.)

Update: Here is my data

SELECT * FROM author;
id | author_last_name
----+------------------
1 | Fowler
2 | Knuth
3 | Lang

SELECT * FROM book;
id | author_id | dewey_number | title
----+-----------+--------------+------------------------
1 | 1 | 600 | Refactoring
2 | 1 | 600 | Refactoring
3 | 1 | 600 | Analysis Patterns
4 | 2 | 600 | TAOCP vol. 1
5 | 2 | 600 | TAOCP vol. 1
6 | 2 | 600 | TAOCP vol. 2
7 | 3 | 500 | Algebra
8 | 3 | 500 | Undergraduate Analysis
9 | 1 | 600 | Refactoring
10 | 2 | 500 | Concrete Mathematics
11 | 2 | 500 | Concrete Mathematics
12 | 2 | 500 | Concrete Mathematics

And here is the result of the above query:

 author_last_name | dewey_number | numoccurrences 
------------------+--------------+----------------
Fowler | 600 | 4
Knuth | 600 | 3
Knuth | 500 | 3
Lang | 500 | 2

How to find duplicate records in PostgreSQL

The basic idea will be using a nested query with count aggregation:

select * from yourTable ou
where (select count(*) from yourTable inr
where inr.sid = ou.sid) > 1

You can adjust the where clause in the inner query to narrow the search.


There is another good solution for that mentioned in the comments, (but not everyone reads them):

select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1

Or shorter:

SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1

Finding duplicate values in MySQL

Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:

SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;

This will return a result with the name value in the first column, and a count of how many times that value appears in the second.

Find duplicate records in MySQL

The key is to rewrite this query so that it can be used as a subquery.

SELECT firstname, 
lastname,
list.address
FROM list
INNER JOIN (SELECT address
FROM list
GROUP BY address
HAVING COUNT(id) > 1) dup
ON list.address = dup.address;

How do I find duplicate values in a table in Oracle?

Aggregate the column by COUNT, then use a HAVING clause to find values that appear more than once.

SELECT column_name, COUNT(column_name)
FROM table_name
GROUP BY column_name
HAVING COUNT(column_name) > 1;


Related Topics



Leave a reply



Submit