How to Delete Duplicate Rows Without Unique Identifier

I like @erwin-brandstetter's solution, but wanted to show a solution with the USING keyword:

DELETE FROM table_with_dups T1
USING table_with_dups T2
WHERE T1.ctid < T2.ctid -- delete the "older" ones
AND T1.name = T2.name -- list columns that define duplicates
AND T1.address = T2.address
AND T1.zipcode = T2.zipcode;

If you want to review the records before deleting them, simply replace DELETE with SELECT * and USING with a comma (,), i.e.:

SELECT * FROM table_with_dups T1
, table_with_dups T2
WHERE T1.ctid < T2.ctid -- select the "older" ones
AND T1.name = T2.name -- list columns that define duplicates
AND T1.address = T2.address
AND T1.zipcode = T2.zipcode;

Update: I tested some of the different solutions here for speed. If you don't expect many duplicates, then this solution performs much better than the ones that have a NOT IN (...) clause as those generate a lot of rows in the subquery.

If you rewrite the query to use IN (...) then it performs similarly to the solution presented here, but the SQL code becomes much less concise.

Update 2: If you have NULL values in one of the key columns (which you really shouldn't IMO), then you can use COALESCE() in the condition for that column, e.g.

  AND COALESCE(T1.col_with_nulls, '[NULL]') = COALESCE(T2.col_with_nulls, '[NULL]')
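As an alternative to a COALESCE() sentinel, PostgreSQL also offers the NULL-safe comparison IS NOT DISTINCT FROM, which treats two NULLs as equal and cannot collide with a real value such as '[NULL]'. The following is only a sketch of the same DELETE with that operator; col_with_nulls is the placeholder column from the example above, and the list of key columns is shortened for illustration.

```sql
-- Sketch (PostgreSQL): NULL-safe variant of the DELETE ... USING query.
-- col_with_nulls is a placeholder column name, as in the COALESCE example.
DELETE FROM table_with_dups T1
USING table_with_dups T2
WHERE T1.ctid < T2.ctid                 -- delete the "older" ones
  AND T1.name = T2.name                 -- columns that never contain NULL
  AND T1.col_with_nulls IS NOT DISTINCT FROM T2.col_with_nulls;  -- NULL matches NULL
```

Whether this performs as well as the plain equality version probably depends on the PostgreSQL version and the plan chosen, so it is worth checking with EXPLAIN on a large table.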

deleting duplicate row with no unique identifier

Here is a query that will remove duplicates and leave exactly one copy of each unique row. It will work with SQL Server 2005 or higher:

WITH Dups AS
(
SELECT tickId, timestamp, price,
ROW_NUMBER() OVER(PARTITION BY tickid, timestamp ORDER BY (SELECT 0)) AS rn
FROM stockData
)
DELETE FROM Dups WHERE rn > 1

SQL Server : delete duplicate rows without Unique ID

Use a CTE with ROW_NUMBER() to delete the duplicates:

;with cte as
(
select *,row_number() over(order by pkID) RN
FROM yourtable
where pkID = 44
)
delete from cte where RN>1

Note: in the ORDER BY clause you can specify the order in which the duplicates are numbered, which determines which row is kept (the one with RN = 1).
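To illustrate how the ORDER BY choice decides which copy survives, here is a hedged sketch for SQL Server that cleans the whole table at once rather than a single pkID. The columns col1, col2 and created_at are hypothetical and stand for "the columns that define a duplicate" and "a column that ranks the copies"; they are not taken from the original question.

```sql
-- Sketch (SQL Server): keep the most recent copy in each duplicate group.
-- col1, col2 and created_at are hypothetical column names.
WITH cte AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY col1, col2      -- what counts as a duplicate
                              ORDER BY created_at DESC)    -- newest row gets RN = 1
           AS RN
    FROM yourtable
)
DELETE FROM cte WHERE RN > 1;   -- everything except the newest copy is removed
```

Switching to ORDER BY created_at ASC would keep the oldest copy instead.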

Delete duplicate records from a Postgresql table without a primary key?

Copy the distinct data to a work table, fk_payment1_copy. The simplest way to do that is to use SELECT ... INTO:

SELECT max(id),settlement_ref_no ... 
INTO fk_payment1_copy
from fk_payment1
GROUP BY settlement_ref_no ...

Then delete all rows from fk_payment1:

delete from fk_payment1

and copy the data from fk_payment1_copy back into fk_payment1:

insert into fk_payment1
select id,settlement_ref_no ...
from fk_payment1_copy
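Because the table is empty between the delete and the re-insert, it may be safer to run the three steps inside one transaction. The sketch below assumes, purely for illustration, that fk_payment1 has only the two columns id and settlement_ref_no; the real column list is elided in the answer above and has to be substituted. It also uses CREATE TEMP TABLE ... AS, which is an equivalent of SELECT ... INTO in PostgreSQL.

```sql
-- Sketch (PostgreSQL): the copy / delete / re-insert steps as one transaction.
-- Assumption: the table has only the columns id and settlement_ref_no.
BEGIN;

-- Block concurrent writes so nothing is lost between the steps.
LOCK TABLE fk_payment1 IN ACCESS EXCLUSIVE MODE;

-- Step 1: one row per settlement_ref_no, keeping the highest id.
CREATE TEMP TABLE fk_payment1_copy AS
SELECT max(id) AS id, settlement_ref_no
FROM fk_payment1
GROUP BY settlement_ref_no;

-- Step 2: empty the original table.
DELETE FROM fk_payment1;

-- Step 3: put the de-duplicated rows back.
INSERT INTO fk_payment1 (id, settlement_ref_no)
SELECT id, settlement_ref_no
FROM fk_payment1_copy;

COMMIT;
```

If anything fails before COMMIT, the transaction rolls back and the original rows remain in place.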

Deleting duplicates on column without primary keys or unique constraints

You can use the ctid system column to differentiate the rows:

DELETE FROM your_table t1
USING your_table t2
WHERE t1 = t2
AND t1.ctid > t2.ctid;

How to remove duplicates in postgres (no unique id)

Each table in Postgres has a few hidden system columns. One of them (ctid) is unique by definition and can be used in cases when a primary key is missing.

DELETE FROM tablename a
USING tablename b
WHERE a.ctid < b.ctid
AND a.user_id = b.user_id
AND a.time_id = b.time_id;

The problem is due to the lack of a primary key. Using hidden columns should not be a systematic method. Once you delete the duplicates, you should create a primary key on (user_id, time_id) or add a new unique column for this purpose.
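A minimal sketch of that follow-up step in PostgreSQL, using the table and column names from the query above: first confirm that no duplicates are left, then add the primary key so they cannot reappear.

```sql
-- Sketch (PostgreSQL): verify the cleanup, then enforce uniqueness.

-- Should return no rows once the duplicates have been deleted.
SELECT user_id, time_id, count(*) AS copies
FROM tablename
GROUP BY user_id, time_id
HAVING count(*) > 1;

-- Fails if duplicates (or NULLs in the key columns) still exist.
ALTER TABLE tablename ADD PRIMARY KEY (user_id, time_id);
```

Adding the key takes a lock and builds an index, so on a large table it may be worth scheduling this for a quiet period.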

Delete duplicate rows from table with no unique key

If you can afford to rewrite the whole table, this is probably the simplest approach:

WITH Deleted AS (
DELETE FROM discogs.releases_labels
RETURNING *
)
INSERT INTO discogs.releases_labels
SELECT DISTINCT * FROM Deleted

If you need to specifically target the duplicated records, you can make use of the internal ctid field, which uniquely identifies a row:

DELETE FROM discogs.releases_labels
WHERE ctid NOT IN (
SELECT MIN(ctid)
FROM discogs.releases_labels
GROUP BY label, release_id, catno
)

Be very careful with ctid; it changes over time (for example, when a row is updated or the table is rewritten by VACUUM FULL). But you can rely on it staying the same within the scope of a single statement.
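The NOT IN form above can also be turned around into an IN over only the rows to be removed, numbered with ROW_NUMBER(). This is a sketch, not a tested replacement; it may be useful when duplicates are rare, or on older PostgreSQL releases where, as far as I know, MIN() is not defined for the tid type while ordering by ctid still works.

```sql
-- Sketch (PostgreSQL): delete every copy except the first one per group.
DELETE FROM discogs.releases_labels
WHERE ctid IN (
    SELECT ctid
    FROM (
        SELECT ctid,
               ROW_NUMBER() OVER (PARTITION BY label, release_id, catno
                                  ORDER BY ctid) AS rn   -- lowest ctid gets rn = 1
        FROM discogs.releases_labels
    ) numbered
    WHERE rn > 1                                         -- only the surplus copies
);
```

Everything happens in a single statement, so the ctid values are stable for its duration.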


