SQL Server:Find Duplicates in a Table Based on Values in a Single Column

Finding duplicate values in a SQL table

SELECT
    name, email, COUNT(*)
FROM
    users
GROUP BY
    name, email
HAVING 
    COUNT(*) > 1

Simply group on both of the columns.

Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent:

Recent PostgreSQL supports it.
SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
MySQL is unpredictable and you need sql_mode=only_full_group_by:
- GROUP BY lname ORDER BY showing wrong results;
- Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).

Find duplicate values based on specific criteria

You can get the customer_numbers that you want if you group by customer_number and set the condition in the HAVING clause:

SELECT customer_number
FROM tablename
GROUP BY customer_number
HAVING SUM(end_record_date = '00:00:00') >= 2;

To get all the rows of the table that meet your condition, use the operator IN:

SELECT *
FROM tablename
WHERE customer_number IN (
  SELECT customer_number
  FROM tablename
  GROUP BY customer_number
  HAVING SUM(end_record_date = '00:00:00') >= 2
);

See the demo.

How to find duplicate entries from a SQL Server table based on some column value?

Please try the following solution.

1st CTE is cleansing two columns CustomerName and FatherName.
2nd CTE is creating buckets based on the combination of 3 columns:
CustomerName, DOB, and FatherName.
3rd CTE is creating a counter of rows in a bucket, i.e. finds duplicates.
Final SELECT is joining back to the original table and filtering out
not duplicate rows.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (CustomerCode CHAR(4), CustomerName VARCHAR(20), DOB DATE, FatherName VARCHAR(20));
INSERT INTO @tbl (CustomerCode, CustomerName, DOB, FatherName) VALUES
('0001', 'Md. Alam', '1991-10-20', 'Sr. Alam'),
('0002', 'Alam', '1991-10-20', 'Sr. alam'),
('0004', 'Hasan', '1990-01-01', 'Sr. Hasan'),
('0005', 'Karim', '1988-01-01', 'Sr. Karim'),
('0006', 'Karim', '1988-01-01', 'S Karim'),
('0007', 'Kalam', '1985-01-01', 'Sr. Kalam');
-- DDL and sample data population, end

WITH rs AS
(
    SELECT CustomerCode, DOB
        , RIGHT(CustomerName, LEN(CustomerName) - c.pos) AS CustomerName
        , RIGHT(FatherName, LEN(FatherName) - f.pos) AS FatherName
    FROM @tbl
        CROSS APPLY (SELECT CHARINDEX(SPACE(1), CustomerName)) AS c(pos)
        CROSS APPLY (SELECT CHARINDEX(SPACE(1), FatherName)) AS f(pos)
), cte AS
(
    SELECT * 
    , ROW_NUMBER() OVER (ORDER BY CustomerCode) -
     ROW_NUMBER() OVER (PARTITION BY rs.CustomerName, rs.DOB, rs.FatherName ORDER BY rs.CustomerCode) AS bucket
    FROM rs
), cte2 AS
(
    SELECT CustomerCode, bucket, COUNT(bucket) OVER (PARTITION BY cte.bucket) AS [counter]
    FROM cte
    GROUP BY  CustomerCode, bucket
)
SELECT t.* FROM @tbl AS t
INNER JOIN cte2 ON cte2.CustomerCode = t.CustomerCode
WHERE cte2.counter > 1;

Output

+--------------+--------------+------------+------------+
| CustomerCode | CustomerName |    DOB     | FatherName |
+--------------+--------------+------------+------------+
|         0001 | Md. Alam     | 1991-10-20 | Sr. Alam   |
|         0002 | Alam         | 1991-10-20 | Sr. alam   |
|         0005 | Karim        | 1988-01-01 | Sr. Karim  |
|         0006 | Karim        | 1988-01-01 | S Karim    |
+--------------+--------------+------------+------------+

SQL Server : find duplicates in a table based on values in a single column

select
 employeename,
 IDs = STUFF((SELECT ','+ CAST(e2.[ID] AS VARCHAR(10)) 
  FROM emp e2
  WHERE e2.employeename = e1.employeename
  For XML PATH('')
 ),1,1,'')
FROM emp e1
GROUP BY employeename having COUNT(*) > 1

SQL Fiddler

SQL Server:Find Duplicates in a Table Based on Values in a Single Column

Finding duplicate values in a SQL table

Find duplicate values based on specific criteria

How to find duplicate entries from a SQL Server table based on some column value?

SQL Server : find duplicates in a table based on values in a single column

Related Topics

Leave a reply