Finding All the Users That Have Duplicate Names

You could go a long way toward narrowing down your search by finding out what the duplicated data is in the first place. For example, say you want to find each combination of first name and email that is used more than once.

User.find(:all, :group => [:first, :email], :having => "count(*) > 1" )

That will return an array containing one record from each set of duplicates. If one of the returned users has "Fred" and "fred@example.com", for example, you can then search for only the users with those values to find all of the affected records.

The return from that find will be something like the following. Note that the array only contains a single record from each set of duplicated users.

[#<User id: 3, first: "foo", last: "barney", email: "foo@example.com", created_at: "2010-12-30 17:14:43", updated_at: "2010-12-30 17:14:43">, 
#<User id: 5, first: "foo1", last: "baasdasdr", email: "abc@example.com", created_at: "2010-12-30 17:20:49", updated_at: "2010-12-30 17:20:49">]

For example, the first element in that array shows one user with "foo" and "foo@example.com". The rest of them can be pulled out of the database as needed with a find.

> User.find(:all, :conditions => {:email => "foo@example.com", :first => "foo"})
=> [#<User id: 1, first: "foo", last: "bar", email: "foo@example.com", created_at: "2010-12-30 17:14:28", updated_at: "2010-12-30 17:14:28">,
#<User id: 3, first: "foo", last: "barney", email: "foo@example.com", created_at: "2010-12-30 17:14:43", updated_at: "2010-12-30 17:14:43">]
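The same two-step approach can be sketched outside Rails. This is only a minimal illustration using Python's built-in sqlite3 module and an in-memory table; the sample rows and the helper names make_db, find_duplicate_keys and find_users_for_key are assumptions made for the example, not part of the original code.

```python
import sqlite3

def make_db():
    """Build an in-memory users table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, first TEXT, last TEXT, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (id, first, last, email) VALUES (?, ?, ?, ?)",
        [
            (1, "foo", "bar", "foo@example.com"),
            (2, "solo", "user", "solo@example.com"),
            (3, "foo", "barney", "foo@example.com"),
        ],
    )
    return conn

def find_duplicate_keys(conn):
    """Step 1: return each (first, email) pair that is used more than once."""
    return conn.execute(
        "SELECT first, email FROM users "
        "GROUP BY first, email HAVING COUNT(*) > 1 "
        "ORDER BY first, email"
    ).fetchall()

def find_users_for_key(conn, first, email):
    """Step 2: return the ids of every user sharing one duplicated pair."""
    rows = conn.execute(
        "SELECT id FROM users WHERE first = ? AND email = ? ORDER BY id",
        (first, email),
    )
    return [row[0] for row in rows]

conn = make_db()
dup_keys = find_duplicate_keys(conn)
dup_ids = {key: find_users_for_key(conn, *key) for key in dup_keys}
print(dup_ids)
```

Selecting only the grouped columns in step 1 avoids relying on which row the database happens to pick for each group.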

You will probably also want to add better validation to your code to prevent duplicates in the future.

Edit:

If you need to use the big hammer of find_by_sql, because Rails 2.2 and earlier didn't support :having with find, the following should work and give you the same array that I described above.

User.find_by_sql("select * from users group by first,email having count(*) > 1")

Finding and dealing with duplicate users

Try to add the is_not_duplicate boolean field and modify your code as follows:

SELECT 
GROUP_CONCAT(id) AS "ids",
CONCAT(UPPER(first_name), UPPER(last_name)) AS "name",
COUNT(*) AS "duplicate_count",
SUM(is_not_duplicate) AS "real_count"
FROM
users
GROUP BY
name
HAVING
duplicate_count > 1
AND
duplicate_count - real_count > 0

Newly added duplicates will have is_not_duplicate=0, so the real_count for that name will be less than duplicate_count and the row will be shown.
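To see the flag in action, here is a rough sketch against SQLite rather than MySQL. The sample rows are made up, CONCAT is replaced by the || operator, and the aggregate expressions are repeated in GROUP BY and HAVING instead of using the aliases, so treat it as an adaptation under those assumptions rather than the exact query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,"
    " is_not_duplicate INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", 1),  # reviewed pair: both rows flagged as legitimate
        (2, "ann", "lee", 1),
        (3, "Bob", "Ray", 1),  # reviewed earlier
        (4, "BOB", "ray", 0),  # newly added, not yet reviewed
        (5, "Cy", "Dow", 0),   # unique name, never reported
    ],
)

QUERY = """
SELECT GROUP_CONCAT(id)                      AS ids,
       UPPER(first_name) || UPPER(last_name) AS name,
       COUNT(*)                              AS duplicate_count,
       SUM(is_not_duplicate)                 AS real_count
FROM users
GROUP BY UPPER(first_name) || UPPER(last_name)
HAVING COUNT(*) > 1
   AND COUNT(*) - SUM(is_not_duplicate) > 0
"""

# GROUP_CONCAT does not guarantee an order, so the ids are sorted in Python.
unreviewed = [
    {
        "ids": sorted(int(i) for i in ids.split(",")),
        "name": name,
        "duplicate_count": dup,
        "real_count": real,
    }
    for ids, name, dup, real in conn.execute(QUERY)
]
print(unreviewed)
```

Only the Bob/Ray group is reported, because it is the only repeated name that still contains an unreviewed row.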

List all duplicate names with different ids in MySQL

I found the solution as follows.

First I created a view in MySQL:

CREATE
VIEW duplicates
AS
(SELECT u.id, u.name, u.did
FROM test u
INNER JOIN (
SELECT name
FROM test
GROUP BY name
HAVING COUNT(*) > 1) temp
ON (temp.name = u.name)
ORDER BY u.name);

Then run this query:

SELECT p.id, p.name,p.did
FROM test AS p
INNER JOIN duplicates AS d ON (d.name=p.name AND d.did!=p.did);
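For anyone who wants to try the view idea without a MySQL server, the following is a small sketch using Python's sqlite3 module. The rows in test are invented, the derived-table alias is renamed to dup_names, and DISTINCT plus ORDER BY are my additions so that each row is listed once in a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, did INTEGER);
INSERT INTO test (id, name, did) VALUES
    (1, 'alice', 10), (2, 'alice', 20),  -- same name, different did
    (3, 'bob', 30),   (4, 'bob', 30),    -- same name, same did
    (5, 'carol', 40);                    -- unique name

-- Every row whose name occurs more than once.
CREATE VIEW duplicates AS
SELECT u.id, u.name, u.did
FROM test AS u
INNER JOIN (
    SELECT name FROM test GROUP BY name HAVING COUNT(*) > 1
) AS dup_names ON dup_names.name = u.name;
""")

# Keep only rows that share a name with a row carrying a different did.
rows = conn.execute("""
SELECT DISTINCT p.id, p.name, p.did
FROM test AS p
INNER JOIN duplicates AS d ON d.name = p.name AND d.did != p.did
ORDER BY p.id
""").fetchall()
print(rows)
```

The two bob rows drop out because their did values match, which is the point of the d.did != p.did condition.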

Finding duplicate values in a SQL table

SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1

Simply group on both of the columns.
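A quick way to convince yourself that grouping on both columns matters is to run the query on a few made-up rows. This sketch uses Python's sqlite3 module; the sample data is an assumption for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [
        ("Sam", "sam@example.com"),
        ("Sam", "sam@example.com"),    # exact duplicate of the row above
        ("Sam", "other@example.com"),  # same name, different email
        ("Kim", "kim@example.com"),
    ],
)

# A pair is reported only when both name and email repeat together.
duplicate_pairs = conn.execute(
    "SELECT name, email, COUNT(*) FROM users "
    "GROUP BY name, email HAVING COUNT(*) > 1"
).fetchall()
print(duplicate_pairs)
```

The third Sam row is not counted, since its email differs from the other two.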

Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent:

  • Recent PostgreSQL supports it.
  • SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
  • MySQL is unpredictable and you need sql_mode=only_full_group_by:

    • GROUP BY lname ORDER BY showing wrong results;
    • Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
  • Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).

Finding duplicate values in MySQL

Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:

SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;

This will return a result with the name value in the first column, and a count of how many times that value appears in the second.

How to find duplicate names in a table

To find the duplicates you can do:

SELECT P_Name,
P_Address,
P_city
FROM Data_Excel
GROUP BY P_Name,
P_Address,
P_city
HAVING COUNT(*) > 1;

To remove duplicates you could do:

DELETE
FROM Data_Excel
WHERE rowid NOT IN (
SELECT MIN(rowid)
FROM Data_Excel
GROUP BY P_Name,
P_Address,
P_city
);
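The DELETE above can be tried safely on throwaway data first. Below is a minimal sketch using Python's sqlite3 module, which also exposes an implicit rowid; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data_Excel (P_Name TEXT, P_Address TEXT, P_city TEXT)")
conn.executemany(
    "INSERT INTO Data_Excel VALUES (?, ?, ?)",
    [
        ("Ann", "1 Main St", "Oslo"),
        ("Ann", "1 Main St", "Oslo"),  # duplicate of the first row
        ("Ann", "9 Side Rd", "Oslo"),  # different address, so it is kept
        ("Bob", "2 High St", "Rome"),
        ("Bob", "2 High St", "Rome"),  # duplicate of the fourth row
    ],
)

# Keep the lowest rowid in each (P_Name, P_Address, P_city) group.
cursor = conn.execute("""
DELETE FROM Data_Excel
WHERE rowid NOT IN (
    SELECT MIN(rowid)
    FROM Data_Excel
    GROUP BY P_Name, P_Address, P_city
)
""")
deleted = cursor.rowcount
remaining = conn.execute(
    "SELECT P_Name, P_Address, P_city FROM Data_Excel ORDER BY rowid"
).fetchall()
print(deleted, remaining)
```

Running the grouped SELECT before the DELETE is a cheap sanity check that the rows about to be removed are the ones you expect.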

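To insert into the Person table you could do:

-- Offset each new row by ROW_NUMBER() so that ids stay unique;
-- MAX(id)+1 alone would give every inserted row the same id.
INSERT INTO Person(id, name)
SELECT (SELECT COALESCE(MAX(id), 0) FROM Person) + ROW_NUMBER() OVER (ORDER BY P_Name), P_Name
FROM (SELECT DISTINCT P_Name FROM Data_Excel) d
WHERE P_Name NOT IN (SELECT name FROM Person)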

Check if a string exists in an array case insensitively

You can use

word.lowercaseString

to convert the string to all lowercase characters.
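The answer does not say which language it targets, so here is the same idea as a small Python sketch: lowercase both sides, then compare. The helper name contains_ignoring_case and the sample list are made up for illustration.

```python
def contains_ignoring_case(words, word):
    """Return True if word matches any entry in words, ignoring case."""
    target = word.lower()
    return any(candidate.lower() == target for candidate in words)

names = ["Alice", "BOB", "carol"]
found_bob = contains_ignoring_case(names, "bob")
found_dave = contains_ignoring_case(names, "Dave")
print(found_bob, found_dave)
```

For text outside plain ASCII, Python's str.casefold() is usually a better fit than lower() for this kind of comparison.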

Finding duplicate names where first name can be an initial or full name

I don't quite get what you want. You provided a query, your current table, and the expected result.

I've just created your table, run your query and got the expected result. What is wrong with this?

SELECT * FROM table1 AS a
INNER JOIN (
SELECT surname FROM table1
GROUP BY surname
HAVING COUNT(*) > 1
) AS b ON a.surname = b.surname

This effectively produces your expected result:

joe | bloggs
j | bloggs

Or am I missing something?

After re-reading... are you expecting to get only this?

j | bloggs

If that is the case, use this:

SELECT * FROM table1 AS a
INNER JOIN (
SELECT surname FROM table1
GROUP BY surname
HAVING COUNT(*) > 1
) AS b ON a.surname = b.surname
WHERE CHAR_LENGTH(firstname) = 1

Edit:

After the expected result was properly explained, I conclude the query should be:

SELECT a.firstname, a.surname FROM t1 AS a
INNER JOIN (
SELECT LEFT(firstname, 1) AS firstChar, surname FROM t1
GROUP BY surname, firstChar
HAVING COUNT(surname) > 1
) AS b ON a.surname = b.surname AND b.firstChar = LEFT(a.firstname, 1)
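Here is a rough way to check that last query on sample data, using Python's sqlite3 module. SQLite has no LEFT() function, so substr(firstname, 1, 1) stands in for LEFT(firstname, 1); the rows in t1 and the final ORDER BY are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (firstname TEXT, surname TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [
        ("joe", "bloggs"),
        ("j", "bloggs"),     # initial that matches "joe"
        ("mary", "bloggs"),  # same surname, different initial
        ("ann", "smith"),
    ],
)

# Rows count as duplicates when surname and first character both repeat.
matches = conn.execute("""
SELECT a.firstname, a.surname
FROM t1 AS a
INNER JOIN (
    SELECT substr(firstname, 1, 1) AS firstChar, surname
    FROM t1
    GROUP BY surname, substr(firstname, 1, 1)
    HAVING COUNT(*) > 1
) AS b ON a.surname = b.surname
      AND b.firstChar = substr(a.firstname, 1, 1)
ORDER BY a.rowid
""").fetchall()
print(matches)
```

Note that this treats any two first names starting with the same letter as a match, so "joe" and "jane" with the same surname would also be paired.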

Get all users with duplicate email addresses

Try the query with a JOIN instead of IN:

SELECT user.personal_email, user.userid
FROM user
INNER JOIN
( SELECT personal_email
FROM user
GROUP BY personal_email
HAVING COUNT(*) > 1
) dupe
ON dupe.personal_email = user.personal_email;

MySQL often optimises INNER JOINs much better.
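The original IN query is not shown here, so the IN form below is my guess at what it looked like. This sketch, using Python's sqlite3 module and invented rows, only checks that the two forms return the same rows; it says nothing about how MySQL optimises either one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (userid INTEGER PRIMARY KEY, personal_email TEXT)")
conn.executemany(
    "INSERT INTO user VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com")],
)

# Assumed shape of the original query: filter with IN on a grouped subquery.
IN_QUERY = """
SELECT personal_email, userid
FROM user
WHERE personal_email IN (
    SELECT personal_email FROM user
    GROUP BY personal_email HAVING COUNT(*) > 1
)
ORDER BY userid
"""

# The JOIN form from the answer above, with ORDER BY added for comparison.
JOIN_QUERY = """
SELECT user.personal_email, user.userid
FROM user
INNER JOIN (
    SELECT personal_email FROM user
    GROUP BY personal_email HAVING COUNT(*) > 1
) AS dupe ON dupe.personal_email = user.personal_email
ORDER BY user.userid
"""

in_rows = conn.execute(IN_QUERY).fetchall()
join_rows = conn.execute(JOIN_QUERY).fetchall()
print(in_rows == join_rows, join_rows)
```

To compare performance on your own data, running EXPLAIN on both forms in MySQL is a more reliable guide than a general rule.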


