Fastest "Get Duplicates" SQL Script

Fastest Get Duplicates SQL script

This is the more direct way:

select afield1,count(afield1) from atable 
group by afield1 having count(afield1) > 1

Finding duplicate values in a SQL table

SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1

Simply group on both of the columns.

Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent:

  • Recent PostgreSQL supports it.
  • SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
  • MySQL is unpredictable and you need sql_mode=only_full_group_by:

    • GROUP BY lname ORDER BY showing wrong results;
    • Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
  • Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).

Fast way to check for duplicates in large sql table

You should be using INSERT IGNORE, and using a UNIQUE constraint on the table based on the columns that should be unique.

When using INSERT IGNORE, MySQL will automatically detect if the row is unique, and ignore the entry into the database. See this question for more information.

Additionally, searching a multi-million row database should be fast as long as you have the correct indexes on the table. You should not need to search a sub-set of data (Without keys, the database will be forced to do a row-scan, which could cause the delays you're talking about).

  • See this post for some additional ideas.
  • See also Avoiding Full Table Scans.

Finding duplicate values in MySQL

Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:

SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;

This will return a result with the name value in the first column, and a count of how many times that value appears in the second.

Find duplicate records in MySQL

The key is to rewrite this query so that it can be used as a subquery.

SELECT firstname, 
lastname,
list.address
FROM list
INNER JOIN (SELECT address
FROM list
GROUP BY address
HAVING COUNT(id) > 1) dup
ON list.address = dup.address;

SQL to find duplicate entries (within a group)

You can get the answer with a join instead of a subquery

select
a.*
from
event as a
inner join
(select groupid
from event
group by groupid
having count(*) <> 5) as b
on a.groupid = b.groupid

This is a fairly common way of obtaining the all the information out of the rows in a group.

Like your suggested answer and the other responses, this will run a lot faster with an index on groupid. It's up to the DBA to balance the benefit of making your query run a lot faster against the cost of maintaining yet another index.

If the DBA decides against the index, make sure the appropriate people understand that its the index strategy and not the way you wrote the query that is slowing things down.

How to find duplicate records in PostgreSQL

The basic idea will be using a nested query with count aggregation:

select * from yourTable ou
where (select count(*) from yourTable inr
where inr.sid = ou.sid) > 1

You can adjust the where clause in the inner query to narrow the search.


There is another good solution for that mentioned in the comments, (but not everyone reads them):

select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1

Or shorter:

SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1


Related Topics



Leave a reply



Submit