Finding duplicate values in a SQL table
SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1
Simply group on both of the columns.
Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":
In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.
Support is not consistent:
- Recent PostgreSQL supports it.
- SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
- MySQL is unpredictable and you need
sql_mode=only_full_group_by
:- GROUP BY lname ORDER BY showing wrong results;
- Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
- Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).
Find duplicate values based on specific criteria
You can get the customer_number
s that you want if you group by customer_number
and set the condition in the HAVING
clause:
SELECT customer_number
FROM tablename
GROUP BY customer_number
HAVING SUM(end_record_date = '00:00:00') >= 2;
To get all the rows of the table that meet your condition, use the operator IN
:
SELECT *
FROM tablename
WHERE customer_number IN (
SELECT customer_number
FROM tablename
GROUP BY customer_number
HAVING SUM(end_record_date = '00:00:00') >= 2
);
See the demo.
How to find duplicate entries from a SQL Server table based on some column value?
Please try the following solution.
- 1st CTE is cleansing two columns CustomerName and FatherName.
- 2nd CTE is creating buckets based on the combination of 3 columns:
CustomerName, DOB, and FatherName. - 3rd CTE is creating a counter of rows in a bucket, i.e. finds duplicates.
- Final
SELECT
is joining back to the original table and filtering out
not duplicate rows.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (CustomerCode CHAR(4), CustomerName VARCHAR(20), DOB DATE, FatherName VARCHAR(20));
INSERT INTO @tbl (CustomerCode, CustomerName, DOB, FatherName) VALUES
('0001', 'Md. Alam', '1991-10-20', 'Sr. Alam'),
('0002', 'Alam', '1991-10-20', 'Sr. alam'),
('0004', 'Hasan', '1990-01-01', 'Sr. Hasan'),
('0005', 'Karim', '1988-01-01', 'Sr. Karim'),
('0006', 'Karim', '1988-01-01', 'S Karim'),
('0007', 'Kalam', '1985-01-01', 'Sr. Kalam');
-- DDL and sample data population, end
WITH rs AS
(
SELECT CustomerCode, DOB
, RIGHT(CustomerName, LEN(CustomerName) - c.pos) AS CustomerName
, RIGHT(FatherName, LEN(FatherName) - f.pos) AS FatherName
FROM @tbl
CROSS APPLY (SELECT CHARINDEX(SPACE(1), CustomerName)) AS c(pos)
CROSS APPLY (SELECT CHARINDEX(SPACE(1), FatherName)) AS f(pos)
), cte AS
(
SELECT *
, ROW_NUMBER() OVER (ORDER BY CustomerCode) -
ROW_NUMBER() OVER (PARTITION BY rs.CustomerName, rs.DOB, rs.FatherName ORDER BY rs.CustomerCode) AS bucket
FROM rs
), cte2 AS
(
SELECT CustomerCode, bucket, COUNT(bucket) OVER (PARTITION BY cte.bucket) AS [counter]
FROM cte
GROUP BY CustomerCode, bucket
)
SELECT t.* FROM @tbl AS t
INNER JOIN cte2 ON cte2.CustomerCode = t.CustomerCode
WHERE cte2.counter > 1;
Output
+--------------+--------------+------------+------------+
| CustomerCode | CustomerName | DOB | FatherName |
+--------------+--------------+------------+------------+
| 0001 | Md. Alam | 1991-10-20 | Sr. Alam |
| 0002 | Alam | 1991-10-20 | Sr. alam |
| 0005 | Karim | 1988-01-01 | Sr. Karim |
| 0006 | Karim | 1988-01-01 | S Karim |
+--------------+--------------+------------+------------+
SQL Server : find duplicates in a table based on values in a single column
select
employeename,
IDs = STUFF((SELECT ','+ CAST(e2.[ID] AS VARCHAR(10))
FROM emp e2
WHERE e2.employeename = e1.employeename
For XML PATH('')
),1,1,'')
FROM emp e1
GROUP BY employeename having COUNT(*) > 1
SQL Fiddler
Related Topics
Sql:Remove Last Comma in String
SQL Server Agent Job Account Issue
Is Querying on Views Slower Than Doing One Query
Logging SQL Statements of Entity Framework 5 for Database-First Aproach
Problem with Execute Procedure in Pl/SQL Developer
Computed Column Should Result to String
How to Better Duplicate a Set of Data in SQL Server
Get Sundays for a Given Month Date in a Function SQL
SQL Query - Delete Duplicates If More Than 3 Dups
Dynamic SQL Server Pivot Table
How to Pivot Data in Bigquery Standard SQL Without Manual Hardcoding
What Does a Caret (^) Do in a SQL Query