Find Duplicates in SQL

Finding duplicate values in a SQL table

SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1

Simply group on both columns: HAVING COUNT(*) > 1 then keeps only the (name, email) combinations that occur more than once.
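A minimal sketch for trying the query above locally. It assumes Python's built-in sqlite3 module and an in-memory SQLite database, and the users rows are hypothetical sample data:

```python
import sqlite3

# In-memory SQLite database; the users table and its rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [
        ("Ann", "ann@example.com"),
        ("Ann", "ann@example.com"),   # same name and email: a duplicate pair
        ("Ann", "ann@work.example"),  # same name, different email: not a duplicate
        ("Bob", "bob@example.com"),
        ("Bob", "bob@example.com"),
        ("Bob", "bob@example.com"),
    ],
)

# Group on both columns; only (name, email) pairs seen more than once survive HAVING.
duplicates = conn.execute(
    """
    SELECT name, email, COUNT(*)
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
    ORDER BY name
    """
).fetchall()

for row in duplicates:
    print(row)
```

The third Ann row shares a name but not an email, so it is not reported. Grouping on name alone would have counted it.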

Note: older ANSI SQL required every non-aggregated column in the SELECT list to appear in the GROUP BY. Later versions of the standard relaxed this with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent:

  • PostgreSQL (9.1 and later) supports it when the GROUP BY includes the table's primary key.
  • SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
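  • MySQL is unpredictable unless ONLY_FULL_GROUP_BY is set in sql_mode. It has been the default since MySQL 5.7.5, which also detects functional dependencies. Related questions:

    • "GROUP BY lname ORDER BY showing wrong results"
    • "Which is the least expensive aggregate function in the absence of ANY()" (see the comments on the accepted answer).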
  • Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).

Finding Duplicates: GROUP BY and DISTINCT giving different answers

Your counting logic is off, and mine was too, until I came up with a simple example to better understand your question. Imagine a simple table with only one column, text:

text
----
A
B
B
C
C
C

Running SELECT COUNT(*) returns 6, as expected. SELECT DISTINCT text returns 3 records, for A, B and C. Finally, SELECT text ... GROUP BY text HAVING COUNT(*) > 1 returns only two records, for the B and C groups.

These numbers do not line up with one another, and they are not meant to. A DISTINCT select returns one row for every value, including values that are not duplicated at all (A). The HAVING query returns one row per duplicated value, no matter how many copies it has (C occurs three times but appears once). Your current comparison is somewhat apples to oranges.
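To see the three numbers side by side, here is a small sketch. It uses Python's sqlite3 module with an in-memory SQLite table holding the six rows above, and the table name t is an assumption. It also computes total minus distinct, which is the number of extra copies a de-duplication would remove:

```python
import sqlite3

# One-column table with the six rows from the example: A, B, B, C, C, C.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (text TEXT)")
conn.executemany("INSERT INTO t (text) VALUES (?)", [(v,) for v in "ABBCCC"])

total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
distinct = conn.execute("SELECT COUNT(DISTINCT text) FROM t").fetchone()[0]
dup_groups = conn.execute(
    "SELECT COUNT(*) FROM (SELECT text FROM t GROUP BY text HAVING COUNT(*) > 1)"
).fetchone()[0]

# Rows beyond the first copy of each value: B contributes 1, C contributes 2.
extra_copies = total - distinct

print(total, distinct, dup_groups, extra_copies)  # 6 3 2 3
```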

Edit:

If you want to remove all duplicates from your six-column table and keep one copy of each distinct row, try a deletable CTE. This is SQL Server syntax, and ORDER BY (SELECT NULL) means any one of the copies may be the one kept:

WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY ScanNumber, DB_ID, PluginID,
                            PluginID_Version, Result, ActualValue
               ORDER BY (SELECT NULL)
           ) AS rn
    FROM DBAScanResults
)
DELETE
FROM cte
WHERE rn > 1;
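SQLite cannot DELETE through a CTE, so this sketch swaps in a related technique for trying the idea locally. It numbers the rows with the same ROW_NUMBER() partition and then deletes by SQLite's built-in rowid. It assumes Python's sqlite3 module, an SQLite version with window functions (3.25 or later), and hypothetical sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cols = "ScanNumber, DB_ID, PluginID, PluginID_Version, Result, ActualValue"
conn.execute(f"CREATE TABLE DBAScanResults ({cols})")  # untyped columns are legal in SQLite
conn.executemany(
    f"INSERT INTO DBAScanResults ({cols}) VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 10, 100, 1, "Pass", "x"),
        (1, 10, 100, 1, "Pass", "x"),  # exact duplicate
        (1, 10, 100, 1, "Pass", "x"),  # another copy
        (2, 10, 100, 1, "Fail", "y"),
    ],
)

# Number the copies within each group of identical rows, then delete every copy after the first.
conn.execute(
    f"""
    DELETE FROM DBAScanResults
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY {cols} ORDER BY rowid) AS rn
            FROM DBAScanResults
        )
        WHERE rn > 1
    )
    """
)

remaining = conn.execute(f"SELECT {cols} FROM DBAScanResults ORDER BY ScanNumber").fetchall()
print(remaining)
```

Ordering by rowid keeps the copy with the lowest rowid. This is a deterministic stand-in for ORDER BY (SELECT NULL).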

SQL query to find duplicates

select *
from File
where hash in (
    select hash
    from File
    group by hash
    having count(*) > 1
)
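Here is a sketch of the same query against a hypothetical File table in an in-memory SQLite database, run through Python's sqlite3 module. Alongside it is a window-function variant that attaches the per-hash row count to every row. The window version assumes SQLite 3.25 or later:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE File (id INTEGER PRIMARY KEY, path TEXT, hash TEXT)")
conn.executemany(
    "INSERT INTO File (path, hash) VALUES (?, ?)",
    [("a.txt", "h1"), ("b.txt", "h2"), ("c.txt", "h1"), ("d.txt", "h3"), ("e.txt", "h1")],
)

# Query from the answer: every row whose hash occurs more than once.
in_ids = [r[0] for r in conn.execute(
    """
    SELECT id FROM File
    WHERE hash IN (SELECT hash FROM File GROUP BY hash HAVING COUNT(*) > 1)
    ORDER BY id
    """
)]

# Window variant: attach the per-hash row count to each row, then filter on it.
window_ids = [r[0] for r in conn.execute(
    """
    SELECT id FROM (
        SELECT id, COUNT(*) OVER (PARTITION BY hash) AS n FROM File
    )
    WHERE n > 1
    ORDER BY id
    """
)]

print(in_ids, window_ids)  # [1, 3, 5] [1, 3, 5]
```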

Find duplicate values based on specific criteria

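You can get the customer_numbers you want by grouping by customer_number and putting the condition in the HAVING clause. In MySQL a comparison evaluates to 1 or 0, so SUM(end_record_date = '00:00:00') counts the matching rows per customer: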

SELECT customer_number
FROM tablename
GROUP BY customer_number
HAVING SUM(end_record_date = '00:00:00') >= 2;

To get all the rows of the table that meet your condition, use the IN operator:

SELECT *
FROM tablename
WHERE customer_number IN (
SELECT customer_number
FROM tablename
GROUP BY customer_number
HAVING SUM(end_record_date = '00:00:00') >= 2
);

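A minimal sketch that runs both queries in an in-memory SQLite database through Python's sqlite3 module. SQLite, like MySQL, evaluates a comparison to 1 or 0, so the SUM(...) condition carries over unchanged. The table contents are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER PRIMARY KEY, customer_number INTEGER, end_record_date TEXT)"
)
conn.executemany(
    "INSERT INTO tablename (customer_number, end_record_date) VALUES (?, ?)",
    [
        (1, "00:00:00"), (1, "00:00:00"), (1, "2021-05-01"),  # two matching rows
        (2, "00:00:00"), (2, "2021-06-01"),                   # only one matching row
        (3, "2021-07-01"),                                    # none
    ],
)

having = "HAVING SUM(end_record_date = '00:00:00') >= 2"

# Customers with at least two matching rows.
customers = [r[0] for r in conn.execute(
    f"SELECT customer_number FROM tablename GROUP BY customer_number {having}"
)]

# All rows belonging to those customers.
rows = conn.execute(
    f"""
    SELECT * FROM tablename
    WHERE customer_number IN (
        SELECT customer_number FROM tablename GROUP BY customer_number {having}
    )
    ORDER BY id
    """
).fetchall()

print(customers)  # [1]
print(rows)
```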

Finding duplicate values in MySQL

Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:

SELECT name, COUNT(*) c FROM tablename GROUP BY name HAVING c > 1;

This returns the name value in the first column and the number of times that value appears in the second. Referring to the alias c in HAVING is a MySQL extension. In standard SQL, write HAVING COUNT(*) > 1 instead.

SQL: Is there any way to find duplicates and flag them in a new column with CASE?

Based on your description, I can phrase your condition as follows: when b takes more than one value for the same a (its minimum and maximum differ), label the row 'duplicate'.

For this, use window functions:

select t.*,
(case when min(b) over (partition by a) <> max(b) over (partition by a)
then 'duplicate'
end) as flag_output
from t;

Based on the data, you seem to want:

select t.*,
(case when count(*) over (partition by a, b) = 1 and
count(*) over (partition by a) > 1
then 'duplicate'
end) as flag_output
from t;

That is, flag a row whose (a, b) combination occurs only once, but only when a itself appears in more than one row.
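To compare the two interpretations, this sketch runs both queries on hypothetical rows in an in-memory SQLite database. It uses Python's sqlite3 module and needs SQLite 3.25 or later for window functions. The group a = 4 shows where the two readings differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO t (a, b) VALUES (?, ?)",
    [(1, "x"), (1, "y"),             # a = 1: two different b values
     (2, "z"), (2, "z"),             # a = 2: same b twice
     (3, "w"),                       # a = 3: single row
     (4, "p"), (4, "p"), (4, "q")],  # a = 4: repeated p plus a singleton q
)

# First reading: flag every row whose a has more than one distinct b.
by_min_max = [r[0] for r in conn.execute(
    """
    SELECT CASE WHEN MIN(b) OVER (PARTITION BY a) <> MAX(b) OVER (PARTITION BY a)
                THEN 'duplicate' END
    FROM t ORDER BY id
    """
)]

# Second reading: flag singleton (a, b) pairs only when a has more than one row.
by_counts = [r[0] for r in conn.execute(
    """
    SELECT CASE WHEN COUNT(*) OVER (PARTITION BY a, b) = 1
                 AND COUNT(*) OVER (PARTITION BY a) > 1
                THEN 'duplicate' END
    FROM t ORDER BY id
    """
)]

print(by_min_max)
print(by_counts)
```

The first reading flags all three a = 4 rows. The second flags only the q row, because p occurs twice.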

SQL: find duplicates, with a different field

I finally found a solution, based on this answer:

"SELECT DISTINCT HAVING Count unique conditions"

I adapted it using this version, since I'm using Access 2010, which does not support COUNT(DISTINCT ...):

"Count Distinct in a Group By aggregate function in Access 2007 SQL"

Therefore, in my example table above, I can use this query to find duplicate records:

SELECT CountryB, Customer, Count(cd.Country)
FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
GROUP BY CountryB, Customer
HAVING COUNT(*) > 1

or this query to find all the IDs of the duplicated records:

SELECT ID FROM myTable a INNER JOIN
(
SELECT CountryB, Customer, Count(cd.Country) AS CountryCount
FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
GROUP BY CountryB, Customer
HAVING COUNT(*) > 1
) dt
ON a.CountryB=dt.CountryB AND a.Customer=dt.Customer
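This sketch reproduces both queries in an in-memory SQLite database via Python's sqlite3 module, using hypothetical rows for myTable. SQLite, unlike Access, supports COUNT(DISTINCT ...), so the last query shows the shorter form that the subquery stands in for:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ID INTEGER PRIMARY KEY, Country TEXT, CountryB TEXT, Customer TEXT)")
conn.executemany(
    "INSERT INTO myTable (Country, CountryB, Customer) VALUES (?, ?, ?)",
    [
        ("FR", "DE", "Acme"),   # ID 1
        ("IT", "DE", "Acme"),   # ID 2: second distinct Country for (DE, Acme)
        ("FR", "DE", "Acme"),   # ID 3: repeats ID 1, adds no new Country
        ("ES", "UK", "Beta"),   # ID 4
        ("ES", "UK", "Beta"),   # ID 5: same Country again, so not a duplicate
        ("PT", "US", "Gamma"),  # ID 6
    ],
)

# Groups (CountryB, Customer) that occur with more than one distinct Country.
grouped_sql = """
    SELECT CountryB, Customer, COUNT(cd.Country) AS CountryCount
    FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
    GROUP BY CountryB, Customer
    HAVING COUNT(*) > 1
"""
groups = conn.execute(grouped_sql).fetchall()

# Join back to myTable to list the IDs of every row in those groups.
ids = [
    row[0]
    for row in conn.execute(
        f"""
        SELECT a.ID FROM myTable a
        INNER JOIN ({grouped_sql}) dt
            ON a.CountryB = dt.CountryB AND a.Customer = dt.Customer
        ORDER BY a.ID
        """
    )
]

# SQLite also accepts COUNT(DISTINCT ...), which Access SQL does not.
groups_distinct = conn.execute(
    """
    SELECT CountryB, Customer, COUNT(DISTINCT Country)
    FROM myTable
    GROUP BY CountryB, Customer
    HAVING COUNT(DISTINCT Country) > 1
    """
).fetchall()

print(groups, ids, groups_distinct)
```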

