Finding duplicate values in a SQL table
SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1
Simply group on both of the columns.
Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":
In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.
Support is not consistent:
- Recent PostgreSQL supports it.
- SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
- MySQL is unpredictable and you need
sql_mode=only_full_group_by
:- GROUP BY lname ORDER BY showing wrong results;
- Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
- Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).
Finding duplicate rows while Inserting data into Oracle table
I would add a join to the priority table and add an analytic function to select the preferred row in case of duplicates. The query would look like this:
Select device_id, cg_id, firmware_id, best_Status, dmc_id
From (Select a.device_id, a.cg_id, a.firmware_id, a.best_Status, a.dmc_id,
rank() Over (Partition By a.device_id, a.firmware_id
Order By b.priority) As rnk
From dm_djr_bulkjob a
Join priority_table b on b.best_status = a.best_status
Where a.oltp_updated Between '27-JUL-17' And '28-JUL-17')
Where rnk = 1;
The rank() function assigns a rank number to each row such that one row per unique (device_id, firmware_id) combination would have rnk = 1: the row with the lowest priority.
How can I check for duplicates before inserting into a table when inserting by select
INSERT INTO table1
SELECT t2.col1,
t2.col2
FROM table2 t2
LEFT JOIN table1 t1
ON t2.col1 = t1.col1
AND t2.col2 = t1.col2
WHERE t1.col1 IS NULL
Alternative using except
INSERT INTO @table2
SELECT col1,
col2
FROM table1
EXCEPT
SELECT t1.col1,
t1.col2
FROM table1 t1
INNER JOIN table2 t2
ON t1.col1 = t2.col1
AND t1.col2 = t2.col2
Alternative using Not Exists
INSERT INTO table2
SELECT col1,col2
FROM table1 t1
WHERE
NOT EXISTS( SELECT 1
FROM table2 t2
WHERE t1.col1 = t2.col1
AND t1.col2 = t2.col2)
What happens with duplicates when inserting multiple rows?
The INSERT
will just insert all rows and nothing special will happen, unless you have some kind of constraint disallowing duplicate / overlapping values (PRIMARY KEY
, UNIQUE
, CHECK
or EXCLUDE
constraint) - which you did not mention in your question. But that's what you are probably worried about.
Assuming a UNIQUE
or PK constraint on (col1,col2)
, you are dealing with a textbook UPSERT
situation. Many related questions and answers to find here.
Generally, if any constraint is violated, an exception is raised which (unless trapped in subtransaction like it's possible in a procedural server-side language like plpgsql) will roll back not only the statement, but the whole transaction.
Without concurrent writes
I.e.: No other transactions will try to write to the same table at the same time.
Exclude rows that are already in the table with
WHERE NOT EXISTS ...
or any other applicable technique:Select rows which are not present in other table
And don't forget to remove duplicates within the inserted set as well, which would not be excluded by the semi-anti-join
WHERE NOT EXISTS ...
One technique to deal with both at once would be EXCEPT
:
INSERT INTO tbl (col1, col2)
VALUES
(text 'v1', text 'v2') -- explicit type cast may be needed in 1st row
, ('v3', 'v4')
, ('v3', 'v4') -- beware of dupes in source
EXCEPT SELECT col1, col2 FROM tbl;
EXCEPT
without the key word ALL
folds duplicate rows in the source. If you know there are no dupes, or you don't want to fold duplicates silently, use EXCEPT ALL
(or one of the other techniques). See:
- Using EXCEPT clause in PostgreSQL
Generally, if the target table is big, WHERE NOT EXISTS
in combination with DISTINCT
on the source will probably be faster:
INSERT INTO tbl (col1, col2)
SELECT *
FROM (
SELECT DISTINCT *
FROM (
VALUES
(text 'v1', text'v2')
, ('v3', 'v4')
, ('v3', 'v4') -- dupes in source
) t(c1, c2)
) t
WHERE NOT EXISTS (
SELECT FROM tbl
WHERE col1 = t.c1 AND col2 = t.c2
);
If there can be many dupes, it pays to fold them in the source first. Else use one subquery less.
Related:
- Select rows which are not present in other table
With concurrent writes
Use the Postgres UPSERT
implementation INSERT ... ON CONFLICT ...
in Postgres 9.5 or later:
INSERT INTO tbl (col1,col2)
SELECT DISTINCT * -- still can't insert the same row more than once
FROM (
VALUES
(text 'v1', text 'v2')
, ('v3','v4')
, ('v3','v4') -- you still need to fold dupes in source!
) t(c1, c2)
ON CONFLICT DO NOTHING; -- ignores rows with *any* conflict!
Further reading:
- How to use RETURNING with ON CONFLICT in PostgreSQL?
- How do I insert a row which contains a foreign key?
Documentation:
- The manual
- The commit page
- The Postgres Wiki page
Craig's reference answer for UPSERT
problems:
- How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
How to avoid duplicate record while inserting data using user defined table type in SQL Server
Please find the changes done to your script to avoid inserting duplicate record. So i considered two columns data should be unique to avoid duplication for user understanding purpose
CREATE PROCEDURE [dbo].[proc_SaveAccessRequest]
(
@TmpAR TmpAccessRequest READONLY,
@IsUAMSRequest bit,
@RequestID int OUTPUT
)
AS
BEGIN
Insert into tblRequests
(
RequesterID
,RequestType
,NextApprover
,RequestStatus
,Delegation
,CreatedOn
,CreatedBy
,[Description]
,IsSepecialRequest
,DelegationDetailID
,IsActive
,IsDeleted
,ModifiedOn
)
SELECT
RequesterID
,RequestType
,NextApprover
,RequestStatus
,Delegation
,CreatedOn
,CreatedBy
,Description
,IsSepecialRequest
,DelegationDetailID
,IsActive
,IsDeleted
,ModifiedOn
FROM @TmpAR
WHERE NOT EXISTS ( SELECT 1
FROM tblRequests i
INNER JOIN @TmpAR o
ON i.RequesterID = o.RequesterID
AND i.RequestType = o.RequestType
AND i.NextApprover = o.NextApprover)
SELECT @RequestID = SCOPE_IDENTITY()
SELECT @RequestID
END
Finding duplicate values in MySQL
Do a SELECT
with a GROUP BY
clause. Let's say name is the column you want to find duplicates in:
SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;
This will return a result with the name value in the first column, and a count of how many times that value appears in the second.
Related Topics
Delete Rows with Foreign Key in Postgresql
How to Specify "Close Existing Connections" in SQL Script
Generate Insert Script for Selected Records
Update Statement Using with Clause
Finding the Data Types of a SQL Temporary Table
How to Copy Data from One Table to Another (Where Both Have Other Fields Too That Are Not in Common)
How to Correctly Handle Dates in Queries Constraints
How to Use a SQL for Loop to Insert Rows into Database
How to Determine Which Columns Are Shared Between Two Tables
Identify a 3-Column Pk Duplicate in Vba Access
How to Return a Table from a Stored Procedure
Difference Between Stored Procedures and User Defined Functions