Finding Duplicate Values in a SQL Table

Finding duplicate values in a SQL table

SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1

Simply group on both of the columns.
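To try the grouping approach without a database server, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. The table layout, the sample rows and the helper name find_duplicate_users are assumptions made for illustration only.

```python
import sqlite3


def find_duplicate_users(rows):
    """Return (name, email, occurrences) for each name/email pair seen more than once."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: only name and email matter for the duplicate check.
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany("INSERT INTO users (name, email) VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT name, email, COUNT(*) AS occurrences
        FROM users
        GROUP BY name, email
        HAVING COUNT(*) > 1
        ORDER BY name, email
        """
    ).fetchall()
    conn.close()
    return result


# Made-up sample data: only the first two rows share both name and email.
sample_users = [
    ("Ann", "ann@example.com"),
    ("Ann", "ann@example.com"),
    ("Ann", "ann@other.example"),
    ("Bob", "bob@example.com"),
]
duplicate_users = find_duplicate_users(sample_users)
print(duplicate_users)
```

Grouping on both columns means the third row is not reported: it shares the name but not the email.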

Note: older versions of the ANSI SQL standard require every non-aggregated column in the SELECT list to appear in the GROUP BY, but newer versions relax this through the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent:

  • Recent PostgreSQL supports it.
  • SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
  • MySQL is unpredictable and you need sql_mode=only_full_group_by:

    • GROUP BY lname ORDER BY showing wrong results;
    • Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
  • Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).

Select Query to find Duplicated Values from the table

You can use a window function if you want to find duplicates based on CustomerName, DocumentValue and Date while keeping the Docnum of each row.

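SELECT Docnum, CustomerName, DocumentValue, Date, c
FROM (
SELECT Docnum, CustomerName, DocumentValue, Date, COUNT(1) OVER (PARTITION BY CustomerName, DocumentValue, Date) AS c
FROM tablename -- tablename is a placeholder for your table
) t
WHERE c > 1; -- keep only groups that contain more than one row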
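As a rough check of the window-function idea, the sketch below counts rows per (CustomerName, DocumentValue, Date) partition and keeps only the rows whose partition holds more than one row, so each Docnum survives in the output. It assumes an SQLite build with window-function support (3.25 or later); the table name documents, the helper name and the sample rows are invented for illustration.

```python
import sqlite3


def rows_with_duplicates(rows):
    """Return every row whose (CustomerName, DocumentValue, Date) occurs more than once."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; the column names follow the query in the answer.
    conn.execute(
        "CREATE TABLE documents "
        "(Docnum INTEGER, CustomerName TEXT, DocumentValue INTEGER, Date TEXT)"
    )
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT Docnum, CustomerName, DocumentValue, Date, c
        FROM (
            SELECT Docnum, CustomerName, DocumentValue, Date,
                   COUNT(*) OVER (PARTITION BY CustomerName, DocumentValue, Date) AS c
            FROM documents
        ) AS t
        WHERE c > 1
        ORDER BY Docnum
        """
    ).fetchall()
    conn.close()
    return result


# Made-up sample data: documents 1 and 2 agree on all three compared columns.
sample_documents = [
    (1, "Acme", 100, "2020-01-01"),
    (2, "Acme", 100, "2020-01-01"),
    (3, "Acme", 200, "2020-01-01"),
    (4, "Beta", 100, "2020-01-02"),
]
duplicate_documents = rows_with_duplicates(sample_documents)
print(duplicate_documents)
```

Unlike GROUP BY, the window count does not collapse rows, which is why the individual Docnum values remain available.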

Find duplicate values based on specific criteria

You can get the customer_numbers that you want if you group by customer_number and set the condition in the HAVING clause:

SELECT customer_number
FROM tablename
GROUP BY customer_number
HAVING SUM(end_record_date = '00:00:00') >= 2;

To get all the rows of the table that meet your condition, use the operator IN:

SELECT *
FROM tablename
WHERE customer_number IN (
SELECT customer_number
FROM tablename
GROUP BY customer_number
HAVING SUM(end_record_date = '00:00:00') >= 2
);
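The following sketch exercises the IN query on a small in-memory SQLite table. SUM(end_record_date = '00:00:00') works here because SQLite, like MySQL, evaluates a comparison to 0 or 1; in dialects where a boolean cannot be summed directly, SUM(CASE WHEN ... THEN 1 ELSE 0 END) is the usual substitute. The column types, the helper name and the sample rows are assumptions.

```python
import sqlite3


def rows_for_repeated_customers(rows):
    """Return all rows of customers having at least two '00:00:00' end_record_date values."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical column types; the source does not show the table definition.
    conn.execute("CREATE TABLE tablename (customer_number INTEGER, end_record_date TEXT)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT *
        FROM tablename
        WHERE customer_number IN (
            SELECT customer_number
            FROM tablename
            GROUP BY customer_number
            HAVING SUM(end_record_date = '00:00:00') >= 2
        )
        ORDER BY customer_number, end_record_date
        """
    ).fetchall()
    conn.close()
    return result


# Made-up sample data: customers 1 and 3 have two '00:00:00' rows, customer 2 only one.
sample_records = [
    (1, "00:00:00"),
    (1, "00:00:00"),
    (1, "12:30:00"),
    (2, "00:00:00"),
    (2, "09:00:00"),
    (3, "00:00:00"),
    (3, "00:00:00"),
]
matching_records = rows_for_repeated_customers(sample_records)
print(matching_records)
```

Note that customer 1's '12:30:00' row is returned as well, because the outer query selects every row of a qualifying customer.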


Finding duplicate values in multiple columns in a SQL table and count for chars

First normalize the table with UNION ALL in a CTE to get each of the 3 names in a separate row.

Then, with the ROW_NUMBER() window function, rank the 3 names of each id alphabetically so that you can group by them:

WITH cte(id, name) AS (
SELECT id, name1 FROM tablename
UNION ALL
SELECT id, name2 FROM tablename
UNION ALL
SELECT id, name3 FROM tablename
)
SELECT COUNT(*) count, name1, name2, name3
FROM (
SELECT id,
MAX(CASE WHEN rn = 1 THEN name END) name1,
MAX(CASE WHEN rn = 2 THEN name END) name2,
MAX(CASE WHEN rn = 3 THEN name END) name3
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) rn
FROM cte
)
GROUP BY id
)
GROUP BY name1, name2, name3
HAVING COUNT(*) > 1;
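To see the normalize-then-rank approach in action, the sketch below runs the CTE query against a few invented rows using Python's sqlite3 module. It assumes SQLite 3.25 or later for ROW_NUMBER(); the helper name, the cnt alias and the sample names are illustrative choices.

```python
import sqlite3

QUERY = """
WITH cte(id, name) AS (
    SELECT id, name1 FROM tablename
    UNION ALL
    SELECT id, name2 FROM tablename
    UNION ALL
    SELECT id, name3 FROM tablename
)
SELECT COUNT(*) AS cnt, name1, name2, name3
FROM (
    SELECT id,
           MAX(CASE WHEN rn = 1 THEN name END) AS name1,
           MAX(CASE WHEN rn = 2 THEN name END) AS name2,
           MAX(CASE WHEN rn = 3 THEN name END) AS name3
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) AS rn
        FROM cte
    )
    GROUP BY id
)
GROUP BY name1, name2, name3
HAVING COUNT(*) > 1
"""


def find_duplicate_name_sets(rows):
    """Return (cnt, name1, name2, name3) for name triples that repeat in any column order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (id INTEGER, name1 TEXT, name2 TEXT, name3 TEXT)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Made-up sample data: ids 1, 2 and 4 hold the same three names in different orders.
sample_names = [
    (1, "Ann", "Bob", "Cid"),
    (2, "Cid", "Ann", "Bob"),
    (3, "Ann", "Bob", "Dan"),
    (4, "Bob", "Cid", "Ann"),
]
duplicate_name_sets = find_duplicate_name_sets(sample_names)
print(duplicate_name_sets)
```

Sorting the names inside each id is what makes the comparison independent of which column a name was stored in.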

Another way to do it, using logic similar to your previous question with numeric values, replaces the window functions with the string function REPLACE(). It relies on multi-argument MIN() and MAX(), which act as scalar functions in SQLite, and it works only if the 3 names in each row are different:

SELECT COUNT(*) count,
MIN(name1, name2, name3) name_1,
REPLACE(
REPLACE(
REPLACE(name1 || ',' || name2 || ',' || name3, MIN(name1, name2, name3), ''),
MAX(name1, name2, name3), ''), ',', ''
) name_2,
MAX(name1, name2, name3) name_3
FROM tablename
GROUP BY name_1, name_2, name_3
HAVING COUNT(*) > 1;
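The restriction to distinct names can be seen by evaluating just the nested REPLACE() expression for a single row. The sketch below does that through sqlite3 with named parameters; the helper name middle_name and the sample names are hypothetical.

```python
import sqlite3

# The nested REPLACE() expression from the query, applied to one row of parameters.
MIDDLE_SQL = """
SELECT REPLACE(
         REPLACE(
           REPLACE(:a || ',' || :b || ',' || :c, MIN(:a, :b, :c), ''),
           MAX(:a, :b, :c), ''),
         ',', '')
"""


def middle_name(a, b, c):
    """Return what is left after removing the smallest name, the largest name and the commas."""
    conn = sqlite3.connect(":memory:")
    value = conn.execute(MIDDLE_SQL, {"a": a, "b": b, "c": c}).fetchone()[0]
    conn.close()
    return value


# Three different names: the alphabetically middle one remains.
distinct_case = middle_name("Cid", "Ann", "Bob")
# A repeated name: REPLACE() removes both copies of 'Ann', so nothing remains.
repeated_case = middle_name("Ann", "Ann", "Bob")
print(repr(distinct_case), repr(repeated_case))
```

With a repeated name, the expression yields an empty string instead of the real middle name, so such rows would be grouped incorrectly.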


Find duplicate Values in a SQL Table and add unique value in a column

It looks like a window count can do what you want:

select t.*,
case when count(*) over(partition by location) > 1 then id end duplicate
from mytable t

This requires MySQL 8.0. In earlier versions, an alternative is a correlated subquery:

select t.*,
case when (select count(*) from mytable t1 where t1.location = t.location) > 1 then id end duplicate
from mytable t
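A quick way to check the expected shape of the result is to run the correlated-subquery variant on a toy table, since it needs no window-function support. This sketch uses Python's sqlite3 module; the two-column table, the helper name and the sample rows are assumptions.

```python
import sqlite3


def flag_duplicate_locations(rows):
    """Return (id, location, duplicate) where duplicate is the id for repeated locations, else None."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical two-column table; the source only shows the id and location columns in use.
    conn.execute("CREATE TABLE mytable (id INTEGER, location TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT t.*,
               CASE
                   WHEN (SELECT COUNT(*) FROM mytable t1
                         WHERE t1.location = t.location) > 1
                   THEN t.id
               END AS duplicate
        FROM mytable t
        ORDER BY t.id
        """
    ).fetchall()
    conn.close()
    return result


# Made-up sample data: 'Paris' appears twice, the other locations once.
sample_locations = [(1, "Paris"), (2, "Rome"), (3, "Paris"), (4, "Oslo")]
flagged_locations = flag_duplicate_locations(sample_locations)
print(flagged_locations)
```

Rows whose location is unique get NULL in the duplicate column, which Python reports as None.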

