Finding all the users that have duplicate names
You could go a long way toward narrowing down your search by finding out what the duplicated data is in the first place. For example, say you want to find each combination of first name and email that is used more than once.
User.find(:all, :group => [:first, :email], :having => "count(*) > 1" )
That will return an array containing one of each of the duplicated records. From that, if one of the returned users had "Fred" and "fred@example.com", you could search for only the Users having those values to find all of the affected users.
The return value of that find will be something like the following. Note that the array contains only a single record from each set of duplicated users.
[#<User id: 3, first: "foo", last: "barney", email: "foo@example.com", created_at: "2010-12-30 17:14:43", updated_at: "2010-12-30 17:14:43">,
#<User id: 5, first: "foo1", last: "baasdasdr", email: "abc@example.com", created_at: "2010-12-30 17:20:49", updated_at: "2010-12-30 17:20:49">]
For example, the first element in that array shows one user with "foo" and "foo@example.com". The rest of them can be pulled out of the database as needed with a find.
> User.find(:all, :conditions => {:email => "foo@example.com", :first => "foo"})
=> [#<User id: 1, first: "foo", last: "bar", email: "foo@example.com", created_at: "2010-12-30 17:14:28", updated_at: "2010-12-30 17:14:28">,
#<User id: 3, first: "foo", last: "barney", email: "foo@example.com", created_at: "2010-12-30 17:14:43", updated_at: "2010-12-30 17:14:43">]
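To see what the grouped query and the follow-up find do together, here is a minimal plain-Ruby sketch that works on an in-memory array instead of the database. The User struct and its sample rows are hypothetical stand-ins for the users table; group_by plays the role of GROUP BY first, email, and the size check plays the role of HAVING count(*) > 1.

```ruby
# Hypothetical in-memory rows standing in for the users table.
User = Struct.new(:id, :first, :last, :email)

users = [
  User.new(1, "foo",  "bar",    "foo@example.com"),
  User.new(2, "bob",  "smith",  "bob@example.com"),
  User.new(3, "foo",  "barney", "foo@example.com"),
  User.new(4, "foo1", "baz",    "abc@example.com"),
  User.new(5, "foo1", "qux",    "abc@example.com"),
]

# Step 1: like GROUP BY first, email HAVING count(*) > 1,
# keep only the (first, email) combinations used more than once.
groups = users.group_by { |u| [u.first, u.email] }
duplicate_keys = groups.select { |_key, rows| rows.size > 1 }.keys

# Step 2: like the follow-up find with :conditions, pull every
# user whose (first, email) matches one of the duplicated keys.
affected = users.select { |u| duplicate_keys.include?([u.first, u.email]) }

p duplicate_keys
p affected.map(&:id)
```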
It also seems like you'll want to add better validation to your code to prevent duplicates in the future.
Edit:
If you need to use the big hammer of find_by_sql, because Rails 2.2 and earlier didn't support :having with find, the following should work and give you the same array that I described above.
User.find_by_sql("select * from users group by first,email having count(*) > 1")
Finding and dealing with duplicate users
Try adding an is_not_duplicate boolean field and modifying your query as follows:
SELECT
GROUP_CONCAT(id) AS "ids",
CONCAT(UPPER(first_name), UPPER(last_name)) AS "name",
COUNT(*) AS "duplicate_count",
SUM(is_not_duplicate) AS "real_count"
FROM
users
GROUP BY
name
HAVING
duplicate_count > 1
AND
duplicate_count - real_count > 0
Newly added duplicates will have is_not_duplicate=0, so the real_count for that name will be less than duplicate_count and the row will be shown.
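The grouping logic of that query can be sketched in plain Ruby, assuming the rows have already been loaded into memory. The Row struct and sample data are hypothetical; the grouping key mirrors CONCAT(UPPER(first_name), UPPER(last_name)), and is_not_duplicate is stored as 0 or 1, as MySQL does for BOOLEAN columns.

```ruby
# Hypothetical rows: id, first_name, last_name, is_not_duplicate (0 or 1).
Row = Struct.new(:id, :first_name, :last_name, :is_not_duplicate)

rows = [
  Row.new(1, "John", "Smith", 1), # reviewed, a real person
  Row.new(2, "john", "SMITH", 1), # reviewed, a different real person
  Row.new(3, "Jane", "Doe",   1), # reviewed
  Row.new(4, "JANE", "doe",   0), # newly added, not yet reviewed
]

# GROUP BY CONCAT(UPPER(first_name), UPPER(last_name))
groups = rows.group_by { |r| r.first_name.upcase + r.last_name.upcase }

# HAVING duplicate_count > 1 AND duplicate_count - real_count > 0
report = groups.select do |_name, members|
  duplicate_count = members.size
  real_count = members.sum(&:is_not_duplicate)
  duplicate_count > 1 && duplicate_count - real_count > 0
end

report.each do |name, members|
  # Mirrors the GROUP_CONCAT(id) column of the SQL result.
  puts "#{members.map(&:id).join(',')} #{name}"
end
```

The "JOHNSMITH" group is hidden because both of its rows have been marked as real, while "JANEDOE" is reported because one of its rows is still unreviewed.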
List all duplicate names with different ids in MySQL
I found the following solution.
First, I created a view in MySQL:
CREATE
VIEW duplicates
AS
(SELECT u.id,u.name, u.did
FROM test u
INNER JOIN (
SELECT NAME, COUNT(*)
FROM test
GROUP BY NAME
HAVING COUNT(*) > 1) temp
ON (temp.name = u.name)
ORDER BY name);
Then run this query:
SELECT p.id, p.name,p.did
FROM test AS p
INNER JOIN duplicates AS d ON (d.name=p.name AND d.did!=p.did);
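A rough in-memory sketch of what the view and the final query compute, using a hypothetical Row struct in place of the test table. One difference: Ruby's select returns each row once (like EXISTS), whereas the SQL INNER JOIN can return a row several times if it matches more than one entry in the view.

```ruby
# Hypothetical rows standing in for the test table.
Row = Struct.new(:id, :name, :did)

rows = [
  Row.new(1, "alice", 10),
  Row.new(2, "alice", 20),
  Row.new(3, "bob",   30),
  Row.new(4, "bob",   30),
  Row.new(5, "carol", 40),
]

# The "duplicates" view: every row whose name occurs more than once.
# Array#tally needs Ruby 2.7 or later.
name_counts = rows.map(&:name).tally
duplicates = rows.select { |r| name_counts[r.name] > 1 }

# The final query: rows that share a name with a duplicate
# carrying a different did.
result = rows.select do |r|
  duplicates.any? { |d| d.name == r.name && d.did != r.did }
end

p result.map(&:id)
```

Both "bob" rows have the same did, so only the two "alice" rows are returned.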
Finding duplicate values in a SQL table
SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1
Simply group on both of the columns.
Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY, but this has changed with the idea of "functional dependency":
In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.
Support is not consistent:
- Recent PostgreSQL supports it.
- SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
- MySQL is unpredictable and you need sql_mode=only_full_group_by; see the questions "GROUP BY lname ORDER BY showing wrong results" and "Which is the least expensive aggregate function in the absence of ANY()" (see the comments on the accepted answer).
- Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).
Finding duplicate values in MySQL
Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:
SELECT name, COUNT(*) c FROM `table` GROUP BY name HAVING c > 1;
This will return a result with the name value in the first column, and a count of how many times that value appears in the second.
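The same count-and-filter idea in plain Ruby, for a hypothetical array of names already loaded into memory. Array#tally (Ruby 2.7+) does the COUNT(*) per value, and select applies the HAVING c > 1 filter.

```ruby
# Hypothetical values of the name column.
names = ["alice", "bob", "alice", "carol", "bob", "alice"]

# SELECT name, COUNT(*) c ... GROUP BY name HAVING c > 1
counts = names.tally.select { |_name, c| c > 1 }

counts.each { |name, c| puts "#{name}\t#{c}" }
```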
How to find duplicate names in a table
To find the duplicates you can do:
SELECT P_name,
P_Address,
P_city
FROM Data_Excel
GROUP BY P_Name,
P_Address,
P_city
HAVING COUNT(*) > 1;
To remove duplicates, you could do:
DELETE
FROM Data_Excel
WHERE rowid NOT IN (
SELECT MIN(rowid)
FROM Data_Excel
GROUP BY P_Name,
P_Address,
P_city
);
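The DELETE keeps the row with the smallest rowid in each (P_Name, P_Address, P_city) group and removes the rest. A plain-Ruby sketch of that rule, where the Row struct and its rowid values are hypothetical:

```ruby
# Hypothetical rows standing in for Data_Excel, with an explicit rowid.
Row = Struct.new(:rowid, :p_name, :p_address, :p_city)

rows = [
  Row.new(1, "Ann", "1 Main St", "Leeds"),
  Row.new(2, "Bob", "2 High St", "York"),
  Row.new(3, "Ann", "1 Main St", "Leeds"),
  Row.new(4, "Bob", "2 High St", "Hull"),
  Row.new(5, "Ann", "1 Main St", "Leeds"),
]

# SELECT MIN(rowid) ... GROUP BY P_Name, P_Address, P_city
keep = rows.group_by { |r| [r.p_name, r.p_address, r.p_city] }
           .map { |_key, group| group.min_by(&:rowid).rowid }

# DELETE ... WHERE rowid NOT IN (keep): what survives is the kept rows.
deduped = rows.select { |r| keep.include?(r.rowid) }

p deduped.map(&:rowid)
```

The two "Bob" rows differ in P_city, so both are kept; only the later "Ann" copies are dropped.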
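To insert the names into the Person table, you could do:
INSERT INTO Person (id, name)
-- ROW_NUMBER() gives each new name its own id; a plain (SELECT MAX(id)+1 FROM Person) would repeat one id
SELECT COALESCE((SELECT MAX(id) FROM Person), 0) + ROW_NUMBER() OVER (ORDER BY P_Name), P_Name
FROM (SELECT DISTINCT P_Name FROM Data_Excel
      WHERE P_Name NOT IN (SELECT name FROM Person)) src;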
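An uncorrelated scalar subquery such as (SELECT MAX(id)+1 FROM Person) is computed against the table as it was before the insert, so it returns the same value for every selected row. A small Ruby sketch of the difference, where the existing hash and the incoming names are hypothetical:

```ruby
# Hypothetical Person table (id => name) and incoming P_Name values.
existing = { 1 => "Ann", 2 => "Dee" }
incoming = ["Ann", "Bob", "Cid", "Bob"]

# NOT IN (SELECT name FROM Person), plus DISTINCT.
new_names = (incoming - existing.values).uniq
base = existing.keys.max || 0

# Like MAX(id)+1 evaluated once: every new row gets the same id.
naive = new_names.map { |name| [base + 1, name] }

# Like MAX(id) + ROW_NUMBER(): each new row gets its own id.
rows = new_names.each_with_index.map { |name, i| [base + i + 1, name] }

p naive
p rows
```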
Check if a string exists in an array case insensitively
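You can use word.lowercaseString to convert the string to all lowercase characters (in Swift 3 and later the equivalent is word.lowercased()), then compare it against the lowercased elements of the array.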
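The same lowercase-then-compare approach, sketched in Ruby rather than Swift; the helper name include_ignoring_case? is hypothetical.

```ruby
# Returns true if any element of array equals word, ignoring case.
def include_ignoring_case?(array, word)
  target = word.downcase
  array.any? { |s| s.downcase == target }
end

fruits = ["Apple", "Banana", "cherry"]
p include_ignoring_case?(fruits, "APPLE")
p include_ignoring_case?(fruits, "grape")
```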
Finding duplicate names where first name can be an initial or full name
I don't quite get what you want. You provided a query, your current table and the expected result.
I've just created your table, run your query and got the expected result. What is wrong with this?
SELECT * FROM table1 AS a
INNER JOIN (
SELECT surname FROM table1
GROUP BY surname
HAVING COUNT(*) > 1
) AS b ON a.surname = b.surname
This effectively results in your expected result:
joe | bloggs
j | bloggs
Or am I missing something?
After re-reading... are you expecting to get only this?
j | bloggs
If that is the case, use this:
SELECT * FROM table1 AS a
INNER JOIN (
SELECT surname FROM table1
GROUP BY surname
HAVING COUNT(*) > 1
) AS b ON a.surname = b.surname
WHERE CHAR_LENGTH(firstname) = 1
Edit:
After the expected result was properly explained I conclude the query should be:
SELECT a.firstname, a.surname FROM t1 AS a
INNER JOIN (
SELECT LEFT(firstname, 1) AS firstChar, surname FROM t1
GROUP BY surname, firstChar
HAVING COUNT(surname) > 1
) AS b ON a.surname = b.surname AND b.firstChar = LEFT(a.firstname, 1)
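A plain-Ruby sketch of the final query's logic, with a hypothetical Person struct and sample rows: group by first initial and surname, keep groups with more than one member, and return their rows. Unlike MySQL's default case-insensitive collations, this comparison is case-sensitive.

```ruby
# Hypothetical rows standing in for t1.
Person = Struct.new(:firstname, :surname)

people = [
  Person.new("joe",  "bloggs"),
  Person.new("j",    "bloggs"),
  Person.new("jane", "doe"),
  Person.new("mary", "doe"),
]

# GROUP BY surname, LEFT(firstname, 1) HAVING COUNT(surname) > 1
groups = people.group_by { |person| [person.firstname[0], person.surname] }

# Join back: every row belonging to a group with more than one member.
matches = groups.values.select { |g| g.size > 1 }.flatten(1)

matches.each { |m| puts "#{m.firstname} | #{m.surname}" }
```

"jane doe" and "mary doe" share a surname but not an initial, so they are not reported.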
Get all users with duplicate email addresses
Try the query with a JOIN instead of IN:
SELECT user.personal_email, user.userid
FROM user
INNER JOIN
( SELECT personal_email
FROM user
GROUP BY personal_email
HAVING COUNT(*) > 1
) dupe
ON dupe.personal_email = user.personal_email;
MySQL often optimises this INNER JOIN much better than the equivalent IN (subquery).