SQL Query - Delete duplicates if more than 3 dups?
with cte as (
select row_number() over (partition by dupcol1, dupcol2 order by ID) as rn
from table)
delete from cte
where rn > 2; -- or >3 etc
The query manufactures a 'row number' for each record, partitioned by (dupcol1, dupcol2) and ordered by ID. In effect, this row number counts the 'duplicates' that share the same dupcol1 and dupcol2 and assigns them the numbers 1, 2, 3, ..., N, ordered by ID. If you want to keep just 2 'duplicates', you need to delete those that were assigned the numbers 3, 4, ..., N, and that is the part taken care of by the DELETE ... WHERE rn > 2.
Using this method you can change the ORDER BY to suit your preferred order (e.g. ORDER BY ID DESC), so that the latest row has rn=1, the next to latest has rn=2, and so on. The rest stays the same: the DELETE will then remove only the oldest rows, as they have the highest row numbers.
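A minimal sketch of the keep-the-latest-two variant, run from Python against an in-memory SQLite database so it can be tried without a server. SQLite cannot delete through a CTE the way SQL Server does, so this is an adaptation rather than the statement above: the row numbers are computed in a subquery and the matching IDs are deleted. The table name `t`, the helper `keep_latest`, and the sample rows are made up for illustration, and window functions assume SQLite 3.25 or newer.

```python
import sqlite3


def keep_latest(conn, keep):
    """Delete all but the `keep` newest rows (highest ID) in each (dupcol1, dupcol2) group."""
    conn.execute(
        """
        DELETE FROM t
        WHERE ID IN (
            SELECT ID FROM (
                SELECT ID,
                       row_number() OVER (
                           PARTITION BY dupcol1, dupcol2
                           ORDER BY ID DESC      -- newest row gets rn = 1
                       ) AS rn
                FROM t
            )
            WHERE rn > ?
        )
        """,
        (keep,),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER PRIMARY KEY, dupcol1 TEXT, dupcol2 TEXT)")
conn.executemany(
    "INSERT INTO t (ID, dupcol1, dupcol2) VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "a", "x"), (4, "a", "x"), (5, "b", "y")],
)

keep_latest(conn, 2)
remaining_ids = [row[0] for row in conn.execute("SELECT ID FROM t ORDER BY ID")]
print(remaining_ids)  # [3, 4, 5]: the two oldest rows of the ("a", "x") group are gone
```

Switching `ORDER BY ID DESC` to `ORDER BY ID` would keep the two oldest rows instead.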
Unlike this closely related question, as the condition becomes more complex, using CTEs and row_number() becomes simpler. Performance may still be a problem if there is no index that supports the partitioning and ordering columns.
Delete duplicate records from a table only if the count is greater than 3
If I understand your question correctly, why not restrict the deletion to the rows with the desired values using a subquery? Something like this:
DELETE FROM test5 x
WHERE x.ROWID > ANY (
        SELECT y.ROWID
        FROM test5 y
        WHERE x.AA = y.AA
          AND x.BB = y.BB
          AND x.CC = y.CC)
  AND (x.AA, x.BB, x.CC) IN (
        SELECT AA, BB, CC
        FROM test5
        GROUP BY AA, BB, CC
        HAVING COUNT(AA) > 3);
This should delete duplicates of the first two tuples only.
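To see the "only when the count exceeds 3" condition in action, here is a small sketch in Python with an in-memory SQLite database. It is an adaptation under stated assumptions: SQLite has no `> ANY`, so the first condition is written as "rowid greater than the group's minimum rowid", which selects the same rows, and the sample data is invented. Row-value `IN` assumes SQLite 3.15 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test5 (AA INTEGER, BB INTEGER, CC INTEGER)")
# Four copies of (1,1,1), two copies of (2,2,2), one copy of (3,3,3).
rows = [(1, 1, 1)] * 4 + [(2, 2, 2)] * 2 + [(3, 3, 3)]
conn.executemany("INSERT INTO test5 VALUES (?, ?, ?)", rows)

conn.execute(
    """
    DELETE FROM test5
    WHERE rowid > (                       -- keep the first row of each group
            SELECT MIN(y.rowid)
            FROM test5 y
            WHERE y.AA = test5.AA AND y.BB = test5.BB AND y.CC = test5.CC)
      AND (AA, BB, CC) IN (               -- but only touch groups with more than 3 copies
            SELECT AA, BB, CC
            FROM test5
            GROUP BY AA, BB, CC
            HAVING COUNT(*) > 3)
    """
)

counts = conn.execute(
    "SELECT AA, COUNT(*) FROM test5 GROUP BY AA ORDER BY AA"
).fetchall()
print(counts)  # [(1, 1), (2, 2), (3, 1)]
```

The group with only two copies is left alone, because it never passes the `HAVING` filter.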
Removing duplicate rows (based on values from multiple columns) from SQL table
1) Use a CTE to get, for each customer (identified by ARDivisionNo and CustomerNo), the record with the highest ShipToCode:
WITH cte AS (
  SELECT *,
    row_number() OVER(PARTITION BY ARDivisionNo, CustomerNo ORDER BY ShipToCode desc) AS [rn]
  FROM t
)
SELECT * FROM cte WHERE [rn] = 1
2) To delete the duplicate records, use a DELETE instead of the SELECT and change the WHERE clause to rn > 1:
WITH cte AS (
  SELECT *,
    row_number() OVER(PARTITION BY ARDivisionNo, CustomerNo ORDER BY ShipToCode desc) AS [rn]
  FROM t
)
DELETE FROM cte WHERE [rn] > 1;
SELECT * FROM t;
How can I remove duplicate rows?
Assuming no nulls, you GROUP BY the unique columns, and SELECT the MIN (or MAX) RowId as the row to keep. Then, just delete everything that didn't have a row id:
DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
   SELECT MIN(RowId) as RowId, Col1, Col2, Col3
   FROM MyTable
   GROUP BY Col1, Col2, Col3
) as KeepRows ON
   MyTable.RowId = KeepRows.RowId
WHERE
   KeepRows.RowId IS NULL
In case you have a GUID instead of an integer, you can replace MIN(RowId) with
CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))
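The anti-join at the heart of this answer can be previewed before anything is deleted. The sketch below, in Python with an in-memory SQLite database, runs the same LEFT OUTER JOIN as a SELECT to list the rows that have no partner in KeepRows, then deletes exactly those. This is an adaptation, since SQLite does not accept a JOIN inside DELETE; the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (RowId INTEGER PRIMARY KEY, Col1 TEXT, Col2 TEXT, Col3 TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (1, "a", "b", "c"),
        (2, "a", "b", "c"),
        (3, "x", "y", "z"),
        (4, "a", "b", "c"),
        (5, "x", "y", "z"),
    ],
)

# Rows that find no match in KeepRows are the duplicates that would be removed.
doomed = [
    row[0]
    for row in conn.execute(
        """
        SELECT MyTable.RowId
        FROM MyTable
        LEFT OUTER JOIN (
            SELECT MIN(RowId) AS RowId, Col1, Col2, Col3
            FROM MyTable
            GROUP BY Col1, Col2, Col3
        ) AS KeepRows ON MyTable.RowId = KeepRows.RowId
        WHERE KeepRows.RowId IS NULL
        ORDER BY MyTable.RowId
        """
    )
]
print(doomed)  # [2, 4, 5]

# Delete exactly the previewed rows.
conn.executemany("DELETE FROM MyTable WHERE RowId = ?", [(i,) for i in doomed])
remaining = [row[0] for row in conn.execute("SELECT RowId FROM MyTable ORDER BY RowId")]
print(remaining)  # [1, 3]
```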
How to delete duplicate records in SQL?
You can delete duplicates using, for example, ROW_NUMBER():
with duplicates as
(
select
*
,ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, age ORDER BY FirstName) AS number
from yourTable
)
delete
from duplicates
where number > 1
Each row where number is greater than 1 is a duplicate.
MySQL delete duplicate records but keep latest
Imagine your table test contains the following data:
select id, email
from test;
ID EMAIL
---------------------- --------------------
1 aaa
2 bbb
3 ccc
4 bbb
5 ddd
6 eee
7 aaa
8 aaa
9 eee
So, we need to find all repeated emails and delete all of them except the one with the latest id.
In this case, aaa, bbb and eee are repeated, so we want to delete IDs 1, 7, 2 and 6.
To accomplish this, first we need to find all the repeated emails:
select email
from test
group by email
having count(*) > 1;
EMAIL
--------------------
aaa
bbb
eee
Then, from this dataset, we need to find the latest id for each one of these repeated emails:
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email;
LASTID EMAIL
---------------------- --------------------
8 aaa
4 bbb
9 eee
Finally we can now delete all of these emails with an Id smaller than LASTID. So the solution is:
delete test
from test
inner join (
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email
) duplic on duplic.email = test.email
where test.id < duplic.lastId;
I don't have MySQL installed on this machine right now, but this should work.
Update
The above delete works, but I found a more optimized version:
delete test
from test
inner join (
select max(id) as lastId, email
from test
group by email
having count(*) > 1) duplic on duplic.email = test.email
where test.id < duplic.lastId;
You can see that it deletes the oldest duplicates, i.e. 1, 7, 2, 6:
select * from test;
+----+-------+
| id | email |
+----+-------+
| 3 | ccc |
| 4 | bbb |
| 5 | ddd |
| 8 | aaa |
| 9 | eee |
+----+-------+
Another version is the delete provided by Rene Limon:
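delete from test
where id not in (
  -- the extra derived table avoids MySQL error 1093
  -- (the target table cannot be read directly in a subquery of its own DELETE)
  select lastId from (
    select max(id) as lastId
    from test
    group by email
  ) as keep
);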
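The walkthrough can be replayed end to end on the sample data from this answer. The sketch below uses Python with an in-memory SQLite database rather than MySQL, so the multi-table `delete ... inner join` is replaced by a correlated subquery that expresses the same rule: delete every row whose id is smaller than the largest id for its email. That substitution is mine, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, email TEXT)")

# Same nine rows as the sample table: ids 1..9 in this order.
emails = ["aaa", "bbb", "ccc", "bbb", "ddd", "eee", "aaa", "aaa", "eee"]
conn.executemany(
    "INSERT INTO test (id, email) VALUES (?, ?)",
    list(enumerate(emails, start=1)),
)

# Delete every row that is older than the latest row with the same email.
cursor = conn.execute(
    """
    DELETE FROM test
    WHERE id < (SELECT MAX(d.id) FROM test d WHERE d.email = test.email)
    """
)
deleted_count = cursor.rowcount

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM test ORDER BY id")]
print(deleted_count)   # 4
print(remaining_ids)   # [3, 4, 5, 8, 9]
```

Rows with a unique email (ccc, ddd) are untouched because their own id is already the maximum for that email.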
How to delete duplicate rows without unique identifier
I like @erwin-brandstetter's solution, but wanted to show a solution with the USING keyword:
DELETE FROM table_with_dups T1
USING table_with_dups T2
WHERE T1.ctid < T2.ctid -- delete the "older" ones
AND T1.name = T2.name -- list columns that define duplicates
AND T1.address = T2.address
AND T1.zipcode = T2.zipcode;
If you want to review the records before deleting them, then simply replace DELETE with SELECT * and USING with a comma (,), i.e.
SELECT * FROM table_with_dups T1
, table_with_dups T2
WHERE T1.ctid < T2.ctid -- select the "older" ones
AND T1.name = T2.name -- list columns that define duplicates
AND T1.address = T2.address
AND T1.zipcode = T2.zipcode;
Update: I tested some of the different solutions here for speed. If you don't expect many duplicates, then this solution performs much better than the ones that have a NOT IN (...) clause, as those generate a lot of rows in the subquery.
If you rewrite the query to use IN (...), then it performs similarly to the solution presented here, but the SQL code becomes much less concise.
Update 2: If you have NULL values in one of the key columns (which you really shouldn't IMO), then you can use COALESCE() in the condition for that column, e.g.
AND COALESCE(T1.col_with_nulls, '[NULL]') = COALESCE(T2.col_with_nulls, '[NULL]')
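The reason COALESCE() is needed is that `NULL = NULL` is not true in SQL, so a plain equality test never pairs up two rows whose key column is NULL. The sketch below shows the difference using Python and an in-memory SQLite database. It is an illustration under assumptions: SQLite's `rowid` stands in for PostgreSQL's `ctid`, the self-join is moved into an `IN` subquery because SQLite has no `DELETE ... USING`, and the helper `dedupe` and its data are hypothetical.

```python
import sqlite3


def dedupe(null_safe):
    """Run the self-join delete and return how many rows survive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_with_dups (name TEXT, zipcode TEXT)")
    conn.executemany(
        "INSERT INTO table_with_dups VALUES (?, ?)",
        [("ann", "1000"), ("ann", "1000"), ("bob", None), ("bob", None)],
    )
    if null_safe:
        zip_match = "COALESCE(T1.zipcode, '[NULL]') = COALESCE(T2.zipcode, '[NULL]')"
    else:
        zip_match = "T1.zipcode = T2.zipcode"
    conn.execute(
        f"""
        DELETE FROM table_with_dups
        WHERE rowid IN (
            SELECT T1.rowid
            FROM table_with_dups T1, table_with_dups T2
            WHERE T1.rowid < T2.rowid        -- the "older" rows
              AND T1.name = T2.name
              AND {zip_match}
        )
        """
    )
    return conn.execute("SELECT COUNT(*) FROM table_with_dups").fetchone()[0]


plain_count = dedupe(null_safe=False)
null_safe_count = dedupe(null_safe=True)
print(plain_count)      # 3: the two "bob" rows with NULL zipcode are not seen as duplicates
print(null_safe_count)  # 2: COALESCE makes the NULL zipcodes compare equal
```

The placeholder `'[NULL]'` must be a value that cannot occur as real data in that column, otherwise genuine values would be treated as equal to NULL.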
T-SQL: Deleting all duplicate rows but keeping one
You didn't say what version you were using, but in SQL Server 2005 and above, you can use a common table expression with the OVER clause. It goes a little something like this:
WITH cte AS (
  SELECT [foo], [bar],
     row_number() OVER(PARTITION BY foo, bar ORDER BY baz) AS [rn]
  FROM TABLE
)
DELETE cte WHERE [rn] > 1
Play around with it and see what you get.
(Edit: In an attempt to be helpful, someone edited the ORDER BY clause within the CTE. To be clear, you can order by anything you want here; it needn't be one of the columns returned by the CTE. In fact, a common use case here is that "foo, bar" are the group identifier and "baz" is some sort of timestamp. In order to keep the latest, you'd do ORDER BY baz desc.)
Eliminating duplicate values based on only one column of the table
This is where the window function row_number() comes in handy:
SELECT s.siteName, s.siteIP, h.date
FROM sites s INNER JOIN
(select h.*, row_number() over (partition by siteName order by date desc) as seqnum
from history h
) h
ON s.siteName = h.siteName and seqnum = 1
ORDER BY s.siteName, h.date
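To check what the query returns, here is a small sketch that runs it from Python against an in-memory SQLite database (window functions assume SQLite 3.25 or newer). The table layouts and sample rows are assumptions made for illustration; only the column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sites (siteName TEXT, siteIP TEXT);
    CREATE TABLE history (siteName TEXT, date TEXT);
    INSERT INTO sites VALUES ('alpha', '10.0.0.1'), ('beta', '10.0.0.2');
    INSERT INTO history VALUES
        ('alpha', '2020-01-01'), ('alpha', '2020-03-01'),
        ('beta',  '2020-02-01'), ('beta',  '2020-01-15');
    """
)

# seqnum = 1 marks the most recent history row of each site.
latest = conn.execute(
    """
    SELECT s.siteName, s.siteIP, h.date
    FROM sites s
    INNER JOIN (
        SELECT h.*,
               row_number() OVER (PARTITION BY siteName ORDER BY date DESC) AS seqnum
        FROM history h
    ) h ON s.siteName = h.siteName AND h.seqnum = 1
    ORDER BY s.siteName, h.date
    """
).fetchall()
print(latest)
# [('alpha', '10.0.0.1', '2020-03-01'), ('beta', '10.0.0.2', '2020-02-01')]
```

Each site appears exactly once, paired with its newest history date, even though the history table holds several rows per site.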