Find duplicate entries in a column
Using:
SELECT t.ctn_no
FROM YOUR_TABLE t
GROUP BY t.ctn_no
HAVING COUNT(t.ctn_no) > 1
...will show you the ctn_no
value(s) that have duplicates in your table. Adding criteria to the WHERE will allow you to further tune what duplicates there are:
SELECT t.ctn_no
FROM YOUR_TABLE t
WHERE t.s_ind = 'Y'
GROUP BY t.ctn_no
HAVING COUNT(t.ctn_no) > 1
If you want to see the other column values associated with the duplicate, you'll want to use a self join:
SELECT x.*
FROM YOUR_TABLE x
JOIN (SELECT t.ctn_no
FROM YOUR_TABLE t
GROUP BY t.ctn_no
HAVING COUNT(t.ctn_no) > 1) y ON y.ctn_no = x.ctn_no
Finding duplicate values in a SQL table
SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1
Simply group on both of the columns.
Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":
In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.
Support is not consistent:
- Recent PostgreSQL supports it.
- SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
- MySQL is unpredictable and you need
sql_mode=only_full_group_by
:- GROUP BY lname ORDER BY showing wrong results;
- Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
- Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).
Finding duplicate values in MySQL
Do a SELECT
with a GROUP BY
clause. Let's say name is the column you want to find duplicates in:
SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;
This will return a result with the name value in the first column, and a count of how many times that value appears in the second.
How to find duplicate records in PostgreSQL
The basic idea will be using a nested query with count aggregation:
select * from yourTable ou
where (select count(*) from yourTable inr
where inr.sid = ou.sid) > 1
You can adjust the where clause in the inner query to narrow the search.
There is another good solution for that mentioned in the comments, (but not everyone reads them):
select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1
Or shorter:
SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1
Check for duplicate values in Pandas dataframe column
Main question
Is there a duplicate value in a column, True/False?
╔═════════╦═══════════════╗
║ Student ║ Date ║
╠═════════╬═══════════════╣
║ Joe ║ December 2017 ║
╠═════════╬═══════════════╣
║ Bob ║ April 2018 ║
╠═════════╬═══════════════╣
║ Joe ║ December 2018 ║
╚═════════╩═══════════════╝
Assuming above dataframe (df), we could do a quick check if duplicated in the Student
col by:
boolean = not df["Student"].is_unique # True (credit to @Carsten)
boolean = df['Student'].duplicated().any() # True
Further reading and references
Above we are using one of the Pandas Series methods. The pandas DataFrame has several useful methods, two of which are:
- drop_duplicates(self[, subset, keep, inplace]) - Return DataFrame with duplicate rows removed, optionally only considering certain columns.
- duplicated(self[, subset, keep]) - Return boolean Series denoting duplicate rows, optionally only considering certain columns.
These methods can be applied on the DataFrame as a whole, and not just a Serie (column) as above. The equivalent would be:
boolean = df.duplicated(subset=['Student']).any() # True
# We were expecting True, as Joe can be seen twice.
However, if we are interested in the whole frame we could go ahead and do:
boolean = df.duplicated().any() # False
boolean = df.duplicated(subset=['Student','Date']).any() # False
# We were expecting False here - no duplicates row-wise
# ie. Joe Dec 2017, Joe Dec 2018
And a final useful tip. By using the keep
paramater we can normally skip a few rows directly accessing what we need:
keep : {‘first’, ‘last’, False}, default ‘first’
- first : Drop duplicates except for the first occurrence.
- last : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
Example to play around with
import pandas as pd
import io
data = '''\
Student,Date
Joe,December 2017
Bob,April 2018
Joe,December 2018'''
df = pd.read_csv(io.StringIO(data), sep=',')
# Approach 1: Simple True/False
boolean = df.duplicated(subset=['Student']).any()
print(boolean, end='\n\n') # True
# Approach 2: First store boolean array, check then remove
duplicate_in_student = df.duplicated(subset=['Student'])
if duplicate_in_student.any():
print(df.loc[~duplicate_in_student], end='\n\n')
# Approach 3: Use drop_duplicates method
df.drop_duplicates(subset=['Student'], inplace=True)
print(df)
Returns
True
Student Date
0 Joe December 2017
1 Bob April 2018
Student Date
0 Joe December 2017
1 Bob April 2018
Find duplicate records in MySQL
The key is to rewrite this query so that it can be used as a subquery.
SELECT firstname,
lastname,
list.address
FROM list
INNER JOIN (SELECT address
FROM list
GROUP BY address
HAVING COUNT(id) > 1) dup
ON list.address = dup.address;
Find duplicate records in a column and update the duplicates to max + 1 (oracle)
I found the solution.
DECLARE
v_cntr NUMBER := 0;
BEGIN
FOR rec IN (
SELECT X.rownumber, X.rowid, X.col1
FROM (Select ROWID,col1, ROW_NUMBER() OVER(PARTITION BY col1 ORDER BY col1) as rownumber
FROM tab1) X
WHERE X.rownumber > 1
)
loop
v_cntr := v_cntr + 1;
UPDATE tab1
SET col1 = (Select max(col1)+1 from tab1)
WHERE rowid = rec.rowid;
DBMS_OUTPUT.PUT_LINE ('done');
END loop;
--COMMIT;
END;
Related Topics
How to Find the Size of an Array in Postgresql
Sql- Ignore Case While Searching for a String
MySQL Myisam Table Performance Problem Revisited
Pass String Variable Without Quotes in Query Vba
Calculating Age from Birthday - Tsql, Oracle, and Any Others
Update Table with a Subquery Which Is Returning More Than One Row
Select Something That Has More/Less Than X Character
Ukrainian Character Change to Question Mark When Insert to Table
Postgresql: Check If Schema Exists
Update Xml Tag in a Clob Column in Oracle
Add Missing Data from Previous Month or Year Cumulatively
Is It a Bad Idea to Use Guids as Primary Keys in Ms SQL
Querying Where Condition to Character Length