Finding duplicate values in a SQL table
SELECT
    name, email, COUNT(*)
FROM
    users
GROUP BY
    name, email
HAVING
    COUNT(*) > 1
Simply group on both of the columns.
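As a rough illustration, the query can be tried against an in-memory SQLite database from Python. The sample rows and the helper name find_duplicate_users are made up for this sketch, and the users table layout is an assumption; SQLite stands in for whatever engine you actually use.

```python
import sqlite3

def find_duplicate_users(rows):
    """Return (name, email, count) for each (name, email) pair that occurs more than once."""
    conn = sqlite3.connect(":memory:")
    # Assumed table layout; only name and email matter for the query.
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany("INSERT INTO users (name, email) VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT name, email, COUNT(*)
        FROM users
        GROUP BY name, email
        HAVING COUNT(*) > 1
        ORDER BY name, email
        """
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample data: only the first two rows share both name and email.
sample = [
    ("Ann", "ann@example.com"),
    ("Ann", "ann@example.com"),
    ("Ann", "ann@other.example"),
    ("Bob", "bob@example.com"),
]
duplicates = find_duplicate_users(sample)
print(duplicates)
```

Grouping on both columns means a row only counts as a duplicate when name and email match together, which is why the third Ann row is not reported.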
Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":
In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.
Support is not consistent:
- Recent PostgreSQL supports it.
- SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
- MySQL is unpredictable and you need sql_mode=only_full_group_by:
  - GROUP BY lname ORDER BY showing wrong results;
  - Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
- Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).
Finding Duplicates: GROUP BY and DISTINCT giving different answers
Your counting logic is off, and mine was too, until I came up with a simple example to better understand your question. Imagine a simple table with only one column, text:
text
----
A
B
B
C
C
C
Running SELECT COUNT(*) yields 6 records, as expected. SELECT DISTINCT text returns 3 records, for A, B, and C. Finally, SELECT text with GROUP BY text and HAVING COUNT(*) > 1 returns only two records, for the B and C groups.
None of these numbers add up at all. The issue here is that a distinct select also returns records which are not duplicate, in addition to records which are duplicate. Also, a given duplicate record could occur more than two times. Your current comparison is somewhat apples to oranges.
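The three counts can be reproduced with a small sketch. It uses Python's sqlite3 module and an in-memory table named t (the table name is an assumption, since the example above only names the column):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (text TEXT)")
# Same six values as the example: A, B, B, C, C, C
conn.executemany("INSERT INTO t (text) VALUES (?)", [(v,) for v in "ABBCCC"])

# Every row, duplicates included.
total_rows = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# One row per value, whether or not it is duplicated.
distinct_values = conn.execute("SELECT DISTINCT text FROM t ORDER BY text").fetchall()

# One row per value that occurs more than once.
duplicated_groups = conn.execute(
    "SELECT text, COUNT(*) FROM t GROUP BY text HAVING COUNT(*) > 1 ORDER BY text"
).fetchall()

# Surplus copies: rows that would go away if each value were kept once.
surplus_rows = total_rows - len(distinct_values)

print(total_rows, len(distinct_values), duplicated_groups, surplus_rows)
```

The number that does reconcile is the surplus: total rows minus distinct values equals the sum of (count - 1) over the duplicated groups.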
Edit:
If you want to remove all duplicates in your six-column table, leaving only one distinct record from all columns, then try using a deletable CTE:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ScanNumber, DB_ID, PluginID,
                                              PluginID_Version, Result, ActualValue
                                 ORDER BY (SELECT NULL)) rn
    FROM DBAScanResults
)
DELETE
FROM cte
WHERE rn > 1;
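Deleting through a CTE like this is SQL Server behaviour. For experimenting without a SQL Server instance, here is a sketch of the same ROW_NUMBER idea in SQLite, driven from Python. It is a swapped-in variant, not the statement above: SQLite cannot delete from a CTE, so the sketch deletes by rowid instead, and it assumes an SQLite build with window functions (3.25 or later). The column types and sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE DBAScanResults (
        ScanNumber INTEGER, DB_ID INTEGER, PluginID INTEGER,
        PluginID_Version INTEGER, Result TEXT, ActualValue TEXT
    )
    """
)
rows = [
    (1, 10, 100, 1, "PASS", "x"),
    (1, 10, 100, 1, "PASS", "x"),  # exact duplicate of the first row
    (1, 10, 100, 1, "PASS", "x"),  # exact duplicate of the first row
    (1, 10, 100, 1, "FAIL", "y"),  # differs in Result and ActualValue, so it is kept
    (2, 10, 100, 1, "PASS", "x"),  # differs in ScanNumber, so it is kept
]
conn.executemany("INSERT INTO DBAScanResults VALUES (?, ?, ?, ?, ?, ?)", rows)

# Number the rows inside each group of identical values, then delete
# everything except the first row of each group.
conn.execute(
    """
    DELETE FROM DBAScanResults
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY ScanNumber, DB_ID, PluginID,
                                    PluginID_Version, Result, ActualValue
                       ORDER BY rowid
                   ) AS rn
            FROM DBAScanResults
        )
        WHERE rn > 1
    )
    """
)
remaining = conn.execute("SELECT COUNT(*) FROM DBAScanResults").fetchone()[0]
print(remaining)
```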
SQL query to find duplicates
select *
from File
where hash in (select hash
               from File
               group by hash
               having count(*) > 1)
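A quick way to see what this returns is to run it on a toy table. In the sketch below, the path column and the sample hashes are hypothetical; only the File table and its hash column come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "path" is an invented column so the full rows are distinguishable.
conn.execute("CREATE TABLE File (path TEXT, hash TEXT)")
conn.executemany(
    "INSERT INTO File VALUES (?, ?)",
    [
        ("a.txt", "h1"), ("b.txt", "h1"),
        ("c.txt", "h2"),
        ("d.txt", "h3"), ("e.txt", "h3"), ("f.txt", "h3"),
    ],
)

# The subquery finds hashes that occur more than once; the outer query
# returns every full row carrying one of those hashes.
duplicate_files = conn.execute(
    """
    select *
    from File
    where hash in (select hash
                   from File
                   group by hash
                   having count(*) > 1)
    order by hash, path
    """
).fetchall()
print(duplicate_files)
```

Unlike a plain GROUP BY, this form returns every copy of a duplicated row rather than one summary row per hash.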
Find duplicate values based on specific criteria
You can get the customer_numbers that you want if you group by customer_number and set the condition in the HAVING clause:
SELECT customer_number
FROM tablename
GROUP BY customer_number
HAVING SUM(end_record_date = '00:00:00') >= 2;
To get all the rows of the table that meet your condition, use the operator IN:
SELECT *
FROM tablename
WHERE customer_number IN (
    SELECT customer_number
    FROM tablename
    GROUP BY customer_number
    HAVING SUM(end_record_date = '00:00:00') >= 2
);
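The SUM(condition) trick depends on the engine treating a comparison as 0 or 1, which MySQL and SQLite do. Here is a sketch using SQLite from Python. It assumes end_record_date is stored as text, uses invented rows, and adds a CASE-based form for engines that do not coerce booleans to integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: end_record_date is held as text, so '00:00:00' compares as a string.
conn.execute("CREATE TABLE tablename (customer_number INTEGER, end_record_date TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [
        (1, "00:00:00"), (1, "00:00:00"), (1, "2020-01-01"),
        (2, "00:00:00"), (2, "2021-05-05"),
        (3, "00:00:00"), (3, "00:00:00"),
    ],
)

# The comparison evaluates to 1 or 0, so SUM counts the matching rows per customer.
matching_customers = [row[0] for row in conn.execute(
    """
    SELECT customer_number
    FROM tablename
    GROUP BY customer_number
    HAVING SUM(end_record_date = '00:00:00') >= 2
    ORDER BY customer_number
    """
)]

# Same condition spelled with CASE, for engines without boolean-to-integer coercion.
matching_customers_case = [row[0] for row in conn.execute(
    """
    SELECT customer_number
    FROM tablename
    GROUP BY customer_number
    HAVING SUM(CASE WHEN end_record_date = '00:00:00' THEN 1 ELSE 0 END) >= 2
    ORDER BY customer_number
    """
)]

# All rows belonging to the matching customers.
matching_rows = conn.execute(
    """
    SELECT *
    FROM tablename
    WHERE customer_number IN (
        SELECT customer_number
        FROM tablename
        GROUP BY customer_number
        HAVING SUM(end_record_date = '00:00:00') >= 2
    )
    """
).fetchall()
print(matching_customers, len(matching_rows))
```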
Finding duplicate values in MySQL
Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:
SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;
This will return a result with the name value in the first column, and a count of how many times that value appears in the second.
SQL : Is there any way to find Duplicates and flag them as new column with case
Based on your description, I can phrase your conditions as: when the minimum and maximum values of b are different for a, then label as 'duplicate'.
For this, use window functions:
select t.*,
       (case when min(b) over (partition by a) <> max(b) over (partition by a)
             then 'duplicate'
        end) as flag_output
from t;
Based on the data, you seem to want:
select t.*,
       (case when count(*) over (partition by a, b) = 1 and
                  count(*) over (partition by a) > 1
             then 'duplicate'
        end) as flag_output
from t;
That is, to flag singleton values only when there is more than one value for a.
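To see how the two queries differ, the sketch below runs both on the same invented rows. It uses SQLite from Python and assumes a build with window functions (3.25 or later); the sample data and the explicit column list in place of t.* are choices made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "x"), (1, "x"), (1, "y"), (2, "z"), (2, "z"), (3, "w")],
)

# First version: flag every row of a group whose b values are not all equal.
min_max_flags = conn.execute(
    """
    select a, b,
           (case when min(b) over (partition by a) <> max(b) over (partition by a)
                 then 'duplicate'
            end) as flag_output
    from t
    order by a, b
    """
).fetchall()

# Second version: flag only a (a, b) pair that occurs once inside a group
# holding more than one row.
singleton_flags = conn.execute(
    """
    select a, b,
           (case when count(*) over (partition by a, b) = 1 and
                      count(*) over (partition by a) > 1
                 then 'duplicate'
            end) as flag_output
    from t
    order by a, b
    """
).fetchall()

for row in min_max_flags:
    print("min/max:", row)
for row in singleton_flags:
    print("singleton:", row)
```

With this data, the first query flags all three rows where a = 1, while the second flags only the (1, 'y') row. Unflagged rows come back with NULL (None in Python).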
SQL: find duplicates, with a different field
I finally found a solution.
The correct solution is in this answer: "SELECT DISTINCT HAVING Count unique conditions".
Adapted with this version, since I'm using Access 2010: "Count Distinct in a Group By aggregate function in Access 2007 SQL".
Therefore, in my example table above, I can use this query to find duplicate records:
SELECT CountryB, Customer, Count(cd.Country)
FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
GROUP BY CountryB, Customer
HAVING COUNT(*) > 1
or this query to find all the IDs of the duplicated records:
SELECT ID FROM myTable a INNER JOIN
(
    SELECT CountryB, Customer, Count(cd.Country)
    FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
    GROUP BY CountryB, Customer
    HAVING COUNT(*) > 1
) dt
ON a.CountryB=dt.CountryB AND a.Customer=dt.Customer
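The derived-table approach is not specific to Access. As a sketch, the same two queries can be run in SQLite from Python. The rows below are invented, the distinct_countries alias is added for readability, and ORDER BY is added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (ID INTEGER PRIMARY KEY, Country TEXT, CountryB TEXT, Customer TEXT)"
)
conn.executemany(
    "INSERT INTO myTable (ID, Country, CountryB, Customer) VALUES (?, ?, ?, ?)",
    [
        (1, "FR", "DE", "Acme"),
        (2, "IT", "DE", "Acme"),  # same CountryB and Customer, different Country
        (3, "FR", "DE", "Acme"),  # repeats row 1's Country, removed by DISTINCT
        (4, "ES", "UK", "Beta"),
        (5, "ES", "UK", "Beta"),  # same Country both times, so not reported
    ],
)

# The inner DISTINCT collapses repeated (Country, CountryB, Customer) rows, so
# the outer count is the number of different Country values per group.
groups = conn.execute(
    """
    SELECT CountryB, Customer, Count(cd.Country) AS distinct_countries
    FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
    GROUP BY CountryB, Customer
    HAVING COUNT(*) > 1
    """
).fetchall()

# Join back to the base table to list the IDs of every row in those groups.
ids = [row[0] for row in conn.execute(
    """
    SELECT ID FROM myTable a INNER JOIN
    (
        SELECT CountryB, Customer, Count(cd.Country) AS distinct_countries
        FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
        GROUP BY CountryB, Customer
        HAVING COUNT(*) > 1
    ) dt
    ON a.CountryB=dt.CountryB AND a.Customer=dt.Customer
    ORDER BY ID
    """
)]
print(groups, ids)
```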