Findout Duplicate Rows in a Table While Inserting

Finding duplicate values in a SQL table

SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1

Simply group on both of the columns.

Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent:

  • Recent PostgreSQL supports it.
  • SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
  • MySQL is unpredictable and you need sql_mode=only_full_group_by:

    • GROUP BY lname ORDER BY showing wrong results;
    • Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
  • Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).

Finding duplicate rows while Inserting data into Oracle table

I would add a join to the priority table and add an analytic function to select the preferred row in case of duplicates. The query would look like this:

Select device_id, cg_id, firmware_id, best_Status, dmc_id 
From (Select a.device_id, a.cg_id, a.firmware_id, a.best_Status, a.dmc_id,
rank() Over (Partition By a.device_id, a.firmware_id
Order By b.priority) As rnk
From dm_djr_bulkjob a
Join priority_table b on b.best_status = a.best_status
Where a.oltp_updated Between '27-JUL-17' And '28-JUL-17')
Where rnk = 1;

The rank() function assigns a rank number to each row such that one row per unique (device_id, firmware_id) combination would have rnk = 1: the row with the lowest priority.

How can I check for duplicates before inserting into a table when inserting by select

INSERT INTO table1 
SELECT t2.col1,
t2.col2
FROM table2 t2
LEFT JOIN table1 t1
ON t2.col1 = t1.col1
AND t2.col2 = t1.col2
WHERE t1.col1 IS NULL

Alternative using except

INSERT INTO @table2 
SELECT col1,
col2
FROM table1
EXCEPT
SELECT t1.col1,
t1.col2
FROM table1 t1
INNER JOIN table2 t2
ON t1.col1 = t2.col1
AND t1.col2 = t2.col2

Alternative using Not Exists

INSERT INTO table2 
SELECT col1,col2
FROM table1 t1
WHERE
NOT EXISTS( SELECT 1
FROM table2 t2
WHERE t1.col1 = t2.col1
AND t1.col2 = t2.col2)

What happens with duplicates when inserting multiple rows?

The INSERT will just insert all rows and nothing special will happen, unless you have some kind of constraint disallowing duplicate / overlapping values (PRIMARY KEY, UNIQUE, CHECK or EXCLUDE constraint) - which you did not mention in your question. But that's what you are probably worried about.

Assuming a UNIQUE or PK constraint on (col1,col2), you are dealing with a textbook UPSERT situation. Many related questions and answers to find here.

Generally, if any constraint is violated, an exception is raised which (unless trapped in subtransaction like it's possible in a procedural server-side language like plpgsql) will roll back not only the statement, but the whole transaction.

Without concurrent writes

I.e.: No other transactions will try to write to the same table at the same time.

  • Exclude rows that are already in the table with WHERE NOT EXISTS ... or any other applicable technique:

  • Select rows which are not present in other table

  • And don't forget to remove duplicates within the inserted set as well, which would not be excluded by the semi-anti-join WHERE NOT EXISTS ...

One technique to deal with both at once would be EXCEPT:

INSERT INTO tbl (col1, col2)
VALUES
(text 'v1', text 'v2') -- explicit type cast may be needed in 1st row
, ('v3', 'v4')
, ('v3', 'v4') -- beware of dupes in source
EXCEPT SELECT col1, col2 FROM tbl;

EXCEPT without the key word ALL folds duplicate rows in the source. If you know there are no dupes, or you don't want to fold duplicates silently, use EXCEPT ALL (or one of the other techniques). See:

  • Using EXCEPT clause in PostgreSQL

Generally, if the target table is big, WHERE NOT EXISTS in combination with DISTINCT on the source will probably be faster:

INSERT INTO tbl (col1, col2)
SELECT *
FROM (
SELECT DISTINCT *
FROM (
VALUES
(text 'v1', text'v2')
, ('v3', 'v4')
, ('v3', 'v4') -- dupes in source
) t(c1, c2)
) t
WHERE NOT EXISTS (
SELECT FROM tbl
WHERE col1 = t.c1 AND col2 = t.c2
);

If there can be many dupes, it pays to fold them in the source first. Else use one subquery less.

Related:

  • Select rows which are not present in other table

With concurrent writes

Use the Postgres UPSERT implementation INSERT ... ON CONFLICT ... in Postgres 9.5 or later:

INSERT INTO tbl (col1,col2)
SELECT DISTINCT * -- still can't insert the same row more than once
FROM (
VALUES
(text 'v1', text 'v2')
, ('v3','v4')
, ('v3','v4') -- you still need to fold dupes in source!
) t(c1, c2)
ON CONFLICT DO NOTHING; -- ignores rows with *any* conflict!

Further reading:

  • How to use RETURNING with ON CONFLICT in PostgreSQL?
  • How do I insert a row which contains a foreign key?

Documentation:

  • The manual
  • The commit page
  • The Postgres Wiki page

Craig's reference answer for UPSERT problems:

  • How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?

How to avoid duplicate record while inserting data using user defined table type in SQL Server

Please find the changes done to your script to avoid inserting duplicate record. So i considered two columns data should be unique to avoid duplication for user understanding purpose

 CREATE PROCEDURE [dbo].[proc_SaveAccessRequest] 
(
@TmpAR TmpAccessRequest READONLY,
@IsUAMSRequest bit,
@RequestID int OUTPUT
)
AS
BEGIN

Insert into tblRequests
(
RequesterID
,RequestType
,NextApprover
,RequestStatus
,Delegation
,CreatedOn
,CreatedBy
,[Description]
,IsSepecialRequest
,DelegationDetailID
,IsActive
,IsDeleted
,ModifiedOn
)
SELECT
RequesterID
,RequestType
,NextApprover
,RequestStatus
,Delegation
,CreatedOn
,CreatedBy
,Description
,IsSepecialRequest
,DelegationDetailID
,IsActive
,IsDeleted
,ModifiedOn
FROM @TmpAR
WHERE NOT EXISTS ( SELECT 1
FROM tblRequests i
INNER JOIN @TmpAR o
ON i.RequesterID = o.RequesterID
AND i.RequestType = o.RequestType
AND i.NextApprover = o.NextApprover)

SELECT @RequestID = SCOPE_IDENTITY()

SELECT @RequestID
END

Finding duplicate values in MySQL

Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:

SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;

This will return a result with the name value in the first column, and a count of how many times that value appears in the second.



Related Topics



Leave a reply



Submit