Fastest Get Duplicates SQL script
This is the more direct way:
select afield1,count(afield1) from atable
group by afield1 having count(afield1) > 1
Finding duplicate values in a SQL table
SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1
Simply group on both of the columns.
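As a minimal sketch of the grouping approach, the snippet below runs the same query against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the id column are made up for illustration; only the users table and its name and email columns come from the query above.

```python
import sqlite3

# In-memory database holding a hypothetical sample of the users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [
        ("Ann", "ann@example.com"),
        ("Ann", "ann@example.com"),    # exact duplicate of the first row
        ("Ann", "ann@other.example"),  # same name, different email: not a duplicate
        ("Bob", "bob@example.com"),
    ],
)

# Group on both columns and keep only the groups with more than one row.
duplicates = conn.execute(
    """
    SELECT name, email, COUNT(*)
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicates)
conn.close()
```

Only the (name, email) pair that occurs twice is reported; the row that shares a name but not an email is left out.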
Note: the older ANSI standard requires every non-aggregated column to appear in the GROUP BY, but this has changed with the idea of "functional dependency":
In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.
Support is not consistent:
- Recent PostgreSQL supports it.
- SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
- MySQL is unpredictable and you need sql_mode=only_full_group_by:
  - GROUP BY lname ORDER BY showing wrong results;
  - Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
- Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).
Fast way to check for duplicates in large sql table
You should be using INSERT IGNORE, together with a UNIQUE constraint on the table covering the columns that should be unique.
When you use INSERT IGNORE, MySQL detects any row that would violate the constraint and skips it instead of raising an error. See this question for more information.
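The idea can be sketched with Python's sqlite3 module. Note the assumption: SQLite spells the statement INSERT OR IGNORE, which stands in here for MySQL's INSERT IGNORE, and the users table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The UNIQUE constraint defines what counts as a duplicate.
conn.execute("CREATE TABLE users (name TEXT, email TEXT, UNIQUE (name, email))")

rows = [
    ("Ann", "ann@example.com"),
    ("Bob", "bob@example.com"),
    ("Ann", "ann@example.com"),  # would violate the UNIQUE constraint
]

# SQLite syntax; in MySQL the statement would start with INSERT IGNORE.
conn.executemany("INSERT OR IGNORE INTO users (name, email) VALUES (?, ?)", rows)

stored = conn.execute("SELECT name, email FROM users ORDER BY name").fetchall()
print(stored)
conn.close()
```

The third row is silently skipped, so the table never contains the duplicate in the first place.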
Additionally, searching a multi-million row database should be fast as long as you have the correct indexes on the table. You should not need to search a subset of the data. Without keys, the database is forced to scan every row, which could cause the delays you're talking about.
- See this post for some additional ideas.
- See also Avoiding Full Table Scans.
Finding duplicate values in MySQL
Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:
SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;
This will return a result with the name value in the first column, and a count of how many times that value appears in the second.
Find duplicate records in MySQL
The key is to rewrite this query so that it can be used as a subquery.
SELECT firstname,
lastname,
list.address
FROM list
INNER JOIN (SELECT address
FROM list
GROUP BY address
HAVING COUNT(id) > 1) dup
ON list.address = dup.address;
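A small runnable sketch of this join, using Python's sqlite3 module with an in-memory database. The sample rows are invented; the list table and its columns follow the query above, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE list (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO list (firstname, lastname, address) VALUES (?, ?, ?)",
    [
        ("Ann", "Lee", "1 Main St"),
        ("Bob", "Ray", "2 Oak Ave"),
        ("Cy", "Lee", "1 Main St"),  # shares an address with Ann
    ],
)

# The inner query finds the duplicated addresses; the join pulls back
# every full row that has one of those addresses.
rows = conn.execute(
    """
    SELECT firstname, lastname, list.address
    FROM list
    INNER JOIN (SELECT address
                FROM list
                GROUP BY address
                HAVING COUNT(id) > 1) dup
            ON list.address = dup.address
    ORDER BY list.id
    """
).fetchall()

print(rows)
conn.close()
```

Unlike a plain GROUP BY, this returns each duplicated record individually rather than one summary row per address.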
SQL to find duplicate entries (within a group)
You can get the answer with a join instead of a subquery:
select
a.*
from
event as a
inner join
(select groupid
from event
group by groupid
having count(*) <> 5) as b
on a.groupid = b.groupid
This is a fairly common way of obtaining all the information out of the rows in a group.
Like your suggested answer and the other responses, this will run a lot faster with an index on groupid. It's up to the DBA to balance the benefit of making your query run a lot faster against the cost of maintaining yet another index.
If the DBA decides against the index, make sure the appropriate people understand that it's the index strategy, and not the way you wrote the query, that is slowing things down.
How to find duplicate records in PostgreSQL
The basic idea is to use a nested query with a count aggregation:
select * from yourTable ou
where (select count(*) from yourTable inr
where inr.sid = ou.sid) > 1
You can adjust the where clause in the inner query to narrow the search.
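The nested-count query can be tried out as follows. This is only a sketch: it uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, and the id and note columns and the sample rows are hypothetical. Only yourTable and sid come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER PRIMARY KEY, sid INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO yourTable (sid, note) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (1, "c"), (3, "d")],
)

# For each outer row, count how many rows share its sid;
# keep the row only when that count is greater than one.
rows = conn.execute(
    """
    SELECT ou.id, ou.sid, ou.note
    FROM yourTable ou
    WHERE (SELECT COUNT(*) FROM yourTable inr
           WHERE inr.sid = ou.sid) > 1
    ORDER BY ou.id
    """
).fetchall()

print(rows)
conn.close()
```

Both rows with sid 1 come back in full, which is what distinguishes this form from a GROUP BY that returns one row per duplicated value.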
There is another good solution mentioned in the comments (but not everyone reads them):
select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1
Or shorter:
SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1