Finding Duplicate Rows in SQL Server

Finding duplicate values in a SQL table


    -- List each (name, email) pair that occurs more than once
    SELECT
        name, email, COUNT(*) AS dupeCount
    FROM
        users
    GROUP BY
        name, email
    HAVING
        COUNT(*) > 1;

Simply group on both columns and keep only the groups that contain more than one row.
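To see the grouping in action without touching a real table, here is a minimal sketch. The table variable @users, its id column, and the sample rows are made up for illustration and are not part of the original question.

```sql
-- Hypothetical sample data: the table variable and its rows are invented for this sketch.
DECLARE @users TABLE (id INT IDENTITY(1,1), name NVARCHAR(50), email NVARCHAR(100));

INSERT INTO @users (name, email)
VALUES ('Sam',  'sam@example.com'),
       ('Sam',  'sam@example.com'),
       ('Sam',  'sam.other@example.com'),
       ('Alex', 'alex@example.com');

-- Group on both columns; HAVING keeps only the groups with more than one row.
SELECT name, email, COUNT(*) AS dupeCount
FROM @users
GROUP BY name, email
HAVING COUNT(*) > 1;
```

Only the pair ('Sam', 'sam@example.com') appears twice, so it is the only group that survives the HAVING filter. The third 'Sam' row has a different email and therefore falls into its own group.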

Note: the older ANSI standard (SQL-92) requires every non-aggregated column in the SELECT list to appear in the GROUP BY. Later versions of the standard relax this through the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation: the values of one set determine the values of the other. For example, if id is the primary key of users, then name and email are functionally dependent on id, so a query that groups by id alone can select them without ambiguity.

Support is not consistent:

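  • PostgreSQL 9.1 and later supports it when the GROUP BY contains the table's primary key.
  • SQL Server (as of SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
  • MySQL is unpredictable unless sql_mode includes ONLY_FULL_GROUP_BY; without it, a column that is not grouped returns a value from an arbitrary row of the group. See these related questions:

    • GROUP BY lname ORDER BY showing wrong results;
    • Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).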
  • Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).
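On SQL Server, the practical consequence is that any extra column you want next to the duplicate count has to be wrapped in an aggregate. A minimal sketch, assuming the users table also has an id column (the original query does not show one):

```sql
-- Assumption: users has an id column; it is not shown in the original query.
-- Selecting a bare id here would be rejected by SQL Server, because id is
-- neither aggregated nor listed in the GROUP BY.
SELECT
    name,
    email,
    MIN(id)  AS firstId,   -- lowest id in the duplicate group
    MAX(id)  AS lastId,    -- highest id in the duplicate group
    COUNT(*) AS dupeCount
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1;
```

MIN and MAX are only one possible choice. Any aggregate works, as long as each selected column is either grouped or aggregated.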

Finding duplicate rows in SQL Server


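    -- Return every row whose orgName occurs more than once, with the size of its group
    SELECT o.orgName, oc.dupeCount, o.id
    FROM organizations o
    INNER JOIN (
        -- One row per duplicated orgName
        SELECT orgName, COUNT(*) AS dupeCount
        FROM organizations
        GROUP BY orgName
        HAVING COUNT(*) > 1
    ) oc ON o.orgName = oc.orgName;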
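As an alternative to joining back to the grouped subquery, a windowed COUNT(*) can attach the group size to every row in a single pass. This is a sketch of a different technique, not the query from the original answer. It reuses the organizations table and its orgName and id columns, and the alias counted is made up.

```sql
-- Windowed count: each row carries the number of rows that share its orgName.
SELECT orgName, dupeCount, id
FROM (
    SELECT o.orgName,
           o.id,
           COUNT(*) OVER (PARTITION BY o.orgName) AS dupeCount
    FROM organizations o
) counted               -- hypothetical alias for the derived table
WHERE dupeCount > 1     -- keep only rows that belong to a duplicate group
ORDER BY orgName, id;
```

The filter has to live in the outer query because window functions cannot be referenced in a WHERE clause at the same query level.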

Finding Duplicate Rows in SQL Server Based on Character Matching

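You can group by LEFT(firstName, 3) to treat rows as duplicates when the last name and the first three characters of the first name match, for example: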

    -- Sample data
    declare @t table (firstName nvarchar(20), lastname nvarchar(20))

    insert into @t
    values ('Robert', 'Williams'), ('Robbie', 'Williams'), ('NotRob', 'Williams'),
           ('Steve', 'Other'), ('Steven', 'Other'), ('Someone', 'Else'), ('Roberto', 'Williams')

    -- Keep each row that shares lastname and the first three letters of firstName with another row
    select t1.* from @t t1
    cross apply (
        select
            LEFT(firstName, 3) as firstNameShort, lastname
        from
            @t t2
        where LEFT(t2.firstName, 3) = LEFT(t1.firstName, 3)
            and t2.lastname = t1.lastname
        group by
            lastname, LEFT(firstName, 3)
        having
            COUNT(*) > 1) t3
    order by t1.lastname, t1.firstName
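The same prefix match can also be sketched with a windowed count instead of CROSS APPLY. This is an alternative formulation, not the original answer's method. The sample rows are repeated so the snippet runs on its own, and the names matchCount and prefixed are made up for illustration.

```sql
-- Same sample data as above, repeated so this batch is self-contained.
DECLARE @t TABLE (firstName NVARCHAR(20), lastname NVARCHAR(20));

INSERT INTO @t
VALUES ('Robert', 'Williams'), ('Robbie', 'Williams'), ('NotRob', 'Williams'),
       ('Steve', 'Other'), ('Steven', 'Other'), ('Someone', 'Else'), ('Roberto', 'Williams');

-- Count the rows sharing lastname and the first three characters of firstName.
SELECT firstName, lastname
FROM (
    SELECT firstName,
           lastname,
           COUNT(*) OVER (PARTITION BY lastname, LEFT(firstName, 3)) AS matchCount
    FROM @t
) prefixed              -- hypothetical alias for the derived table
WHERE matchCount > 1
ORDER BY lastname, firstName;
```

With this data the query keeps Steve and Steven Other, then Robbie, Robert and Roberto Williams. NotRob Williams and Someone Else have no prefix partner and are dropped.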

