How to Keep Only One Row of a Table, Removing Duplicate Rows

How to keep only one row of a table, removing duplicate rows?

See the following question: Deleting duplicate rows from a table.

The adapted accepted answer from there (which is my answer, so no "theft" here...):

You can do it simply, assuming the table has a unique ID field: delete every record whose ID is not the minimum ID for its name. Of each group of records that share a name, only the one with the lowest ID survives.

Example query:

DELETE FROM members
WHERE ID NOT IN
(
SELECT MIN(ID)
FROM members
GROUP BY name
)

If the table has no unique ID column, my recommendation is simply to add an auto-increment one. That is mainly good design, but it also lets you run the query above.
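To try the MIN(ID) query above without touching real data, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database, the sample names, and the helper names `dedupe_keep_min_id` and `demo_min_id` are assumptions made for illustration; only the `members` table and the query shape come from the answer.

```python
import sqlite3


def dedupe_keep_min_id(conn):
    """Delete every row whose id is not the smallest id for its name."""
    conn.execute(
        """
        DELETE FROM members
        WHERE id NOT IN (SELECT MIN(id) FROM members GROUP BY name)
        """
    )
    conn.commit()


def demo_min_id():
    # Hypothetical sample data in a throwaway in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE members (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO members (name) VALUES (?)",
        [("alice",), ("bob",), ("alice",), ("carol",), ("bob",)],
    )
    dedupe_keep_min_id(conn)
    rows = conn.execute("SELECT id, name FROM members ORDER BY id").fetchall()
    conn.close()
    return rows
```

Calling `demo_min_id()` should leave one row per name, each with the lowest ID that name had. SQLite accepts this query as written; other engines may restrict referencing the target table in the subquery.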

How to delete duplicates from one table, but keeping only one record?

I finally found the exact cause of the issue I faced, thanks to the comment by @Malakiyasanjay.
You can find it here: How to keep only one row of a table, removing duplicate rows?

I tried the following. It worked for me, but the query took a long time to run on 30,000 rows:

delete from myTable
where id not in
(select min(id) as min from (select * from myTable) as x group by title)

The problem was that MySQL would not let me use 'myTable', the target table of the DELETE, directly in the subquery. Wrapping it as (select * from myTable) as x solved that.

I am sorry I can't explain this in more detail, because I am not familiar with MySQL queries. But you should note that:

MySQL does not allow the direct use of the target table inside a subquery like the one you use with NOT IN, but you can overcome this limitation by enclosing the subquery inside another one.
(Please see @forpas's answer.)

Be aware, though, that this can take a very long time and may cause a timeout error. I ran this query on a table with about 600,000 rows and it did not respond for several days. So I conclude that this approach only suits small tables.

I hope this is helpful for everyone! :)

How do I delete duplicate rows and keep the first row?

Backup your data, then...

MySQL supports JOINs in DELETE statements. If you want to keep the first of the duplicates:

DELETE a
FROM MYVIEWS a
JOIN (SELECT MIN(t.a1) AS min_a1, t.k1, t.k2, t.k3
FROM MYVIEWS t
GROUP BY t.k1, t.k2, t.k3
HAVING COUNT(*) > 1) b ON b.k1 = a.k1
AND b.k2 = a.k2
AND b.k3 = a.k3
AND b.min_a1 != a.a1

If you want to keep the last of the duplicates:

DELETE a
FROM MYVIEWS a
JOIN (SELECT MAX(t.a1) AS max_a1, t.k1, t.k2, t.k3
FROM MYVIEWS t
GROUP BY t.k1, t.k2, t.k3
HAVING COUNT(*) > 1) b ON b.k1 = a.k1
AND b.k2 = a.k2
AND b.k3 = a.k3
AND b.max_a1 != a.a1
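The keep-first and keep-last variants differ only in MIN versus MAX, which the following sketch makes explicit. It runs on SQLite through Python's sqlite3 module, and SQLite has no multi-table DELETE, so the JOIN is swapped for an equivalent NOT IN subquery; this is not the MySQL syntax shown above. The sample rows and the helper names `delete_duplicates` and `demo_keep` are hypothetical, and `a1` is assumed to be unique.

```python
import sqlite3


def delete_duplicates(conn, keep="first"):
    """Keep one row per (k1, k2, k3): the lowest a1 ("first") or the highest ("last")."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # agg comes from the fixed choice below, so formatting it into the SQL is safe.
    agg = "MIN" if keep == "first" else "MAX"
    conn.execute(
        f"""
        DELETE FROM myviews
        WHERE a1 NOT IN (
            SELECT {agg}(a1) FROM myviews GROUP BY k1, k2, k3
        )
        """
    )
    conn.commit()


def demo_keep(keep):
    # Hypothetical data: three duplicates of one key plus one unique row.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myviews (a1 INTEGER PRIMARY KEY, k1 TEXT, k2 TEXT, k3 TEXT)"
    )
    conn.executemany(
        "INSERT INTO myviews (a1, k1, k2, k3) VALUES (?, ?, ?, ?)",
        [(1, "x", "y", "z"), (2, "x", "y", "z"), (3, "x", "y", "z"), (4, "p", "q", "r")],
    )
    delete_duplicates(conn, keep)
    remaining = [row[0] for row in conn.execute("SELECT a1 FROM myviews ORDER BY a1")]
    conn.close()
    return remaining
```

With this data, `demo_keep("first")` should keep a1 values 1 and 4, and `demo_keep("last")` should keep 3 and 4. The unique row is untouched either way.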

Spark : remove duplicated rows with different values but keep only one row for distinctive row

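You can drop duplicates before grouping, which leaves a single record, as below:

import org.apache.spark.sql.functions.count
// `window` is assumed to be a WindowSpec defined earlier (not shown here);
// `$` requires spark.implicits._ to be in scope.
df.dropDuplicates()
  .withColumn("count", count("value").over(window))
  .filter($"count" < 2)
  .drop("count")
  .show(false)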

You can also specify the fields to be checked for duplicates:

df.dropDuplicates("id1", "id2", "id3", "value")
  .withColumn("count", count("value").over(window))
  .filter($"count" < 2)
  .drop("count")
  .show(false)

Output:

+---+---+---+-----+
|id1|id2|id3|value|
+---+---+---+-----+
|1 |3 |2 |tom |
|2 |1 |2 |mary |
+---+---+---+-----+
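The logic of that pipeline can be traced in plain Python, without Spark. This is only a sketch of the three steps (drop exact duplicates, count rows per key, keep keys seen once). It assumes the window partitions by id1, id2, id3, which the answer does not show, and the input rows are invented so that the result matches the output table above.

```python
from collections import Counter


def drop_duplicates_then_filter(rows, key_len=3):
    """Mimic dropDuplicates() followed by a windowed count and a count < 2 filter."""
    # Step 1: drop exact duplicate rows, keeping the first occurrence.
    unique_rows = list(dict.fromkeys(rows))
    # Step 2: count rows per key, the analogue of count("value").over(window)
    # under the assumption that the window partitions by the first key_len columns.
    counts = Counter(row[:key_len] for row in unique_rows)
    # Step 3: keep only keys that appear exactly once after de-duplication.
    return [row for row in unique_rows if counts[row[:key_len]] < 2]


# Hypothetical input rows in (id1, id2, id3, value) order.
SAMPLE = [
    (1, 3, 2, "tom"),
    (1, 3, 2, "tom"),    # exact duplicate: collapsed to one row
    (2, 1, 2, "mary"),
    (3, 2, 2, "bob"),
    (3, 2, 2, "jerry"),  # same key, different value: both rows are filtered out
]
```

`drop_duplicates_then_filter(SAMPLE)` should return the tom and mary rows only. Unlike this list-based version, Spark does not guarantee row order.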

How can I delete one of two perfectly identical rows?

One option to solve your problem is to create a new table with the same schema, and then do:

INSERT INTO new_table (SELECT DISTINCT * FROM old_table)

and then just rename the tables.

You will of course need spare disk space roughly equal to the size of the table to do this!

It's not efficient, but it's incredibly simple.
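The copy-and-rename approach can be sketched end to end with Python's sqlite3 module. The two-column schema, the sample rows, and the helper names are assumptions for illustration; `ALTER TABLE ... RENAME TO` is SQLite syntax, and other engines spell the rename differently.

```python
import sqlite3


def rebuild_without_duplicates(conn):
    """Copy distinct rows into a new table, then swap it in place of the old one."""
    conn.executescript(
        """
        CREATE TABLE new_table (name TEXT, city TEXT);
        INSERT INTO new_table SELECT DISTINCT * FROM old_table;
        DROP TABLE old_table;
        ALTER TABLE new_table RENAME TO old_table;
        """
    )


def demo_rebuild():
    # Hypothetical table with no key column and fully identical rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE old_table (name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO old_table VALUES (?, ?)",
        [("ann", "oslo"), ("ann", "oslo"), ("ben", "rome"), ("ann", "oslo")],
    )
    conn.commit()
    rebuild_without_duplicates(conn)
    rows = conn.execute("SELECT name, city FROM old_table ORDER BY name").fetchall()
    conn.close()
    return rows
```

`demo_rebuild()` should return one row per distinct (name, city) pair. Because the old table is dropped, taking a backup first is worth the effort on real data.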

Removing duplicate rows from table in Oracle

Use the rowid pseudocolumn.

DELETE FROM your_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM your_table
GROUP BY column1, column2, column3);

Here column1, column2, and column3 make up the identifying key for each record. To remove only fully identical rows, list all of your columns.
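SQLite also exposes a `rowid` for ordinary tables, so the same query shape can be tried locally with Python's sqlite3 module. Treat this as an analogy only: SQLite's integer `rowid` is not the same thing as Oracle's ROWID pseudocolumn. The sample rows and helper names are hypothetical.

```python
import sqlite3


def delete_by_rowid(conn):
    """Keep the lowest rowid in each (column1, column2, column3) group."""
    conn.execute(
        """
        DELETE FROM your_table
        WHERE rowid NOT IN (
            SELECT MIN(rowid)
            FROM your_table
            GROUP BY column1, column2, column3
        )
        """
    )
    conn.commit()


def demo_rowid():
    # Hypothetical table with no declared key; SQLite assigns rowids 1..4.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (column1 TEXT, column2 TEXT, column3 TEXT)")
    conn.executemany(
        "INSERT INTO your_table VALUES (?, ?, ?)",
        [("a", "b", "c"), ("a", "b", "c"), ("a", "b", "d"), ("a", "b", "c")],
    )
    delete_by_rowid(conn)
    rows = conn.execute(
        "SELECT rowid, column1, column2, column3 FROM your_table ORDER BY rowid"
    ).fetchall()
    conn.close()
    return rows
```

`demo_rowid()` should leave the first occurrence of each distinct row, here rowids 1 and 3.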

Delete Duplicate rows from Mysql table and Kept only one row

You can probably do it using a JOIN in a DELETE, joining against a subselect.

More details are required to give much help, but for a rough idea:-

DELETE result
FROM result
INNER JOIN (SELECT SomeField, COUNT(*) AS RecCount, MAX(DateAdded) AS MaxDateAdded FROM result GROUP BY SomeField) b
ON result.SomeField = b.SomeField AND result.DateAdded != b.MaxDateAdded

This finds every occurrence of SomeField with its corresponding max date added, and deletes any row that does not match that max date added.

I assume that you want to keep the latest record.

Note that mass deletes like this are a bit worrying, given that if you get it wrong you potentially delete all your records.

EDIT - version to go with the table you have now given. This will delete the duplicates, leaving you with just the first row of each identical set (i.e. for Google you are left with only the row with id of 1)

DELETE foo 
FROM foo
INNER JOIN (SELECT link, title, description, MIN(id) AS MinId FROM foo GROUP BY link, title, description ) b
ON foo.link = b.link
AND foo.title = b.title
AND foo.description = b.description
AND foo.id != b.MinId

Eliminating duplicate values based on only one column of the table

This is where the window function row_number() comes in handy:

SELECT s.siteName, s.siteIP, h.date
FROM sites s INNER JOIN
(select h.*, row_number() over (partition by siteName order by date desc) as seqnum
from history h
) h
ON s.siteName = h.siteName and seqnum = 1
ORDER BY s.siteName, h.date
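To see what `seqnum = 1` selects, the query can be run against a small in-memory database with Python's sqlite3 module. This sketch assumes an SQLite build with window-function support (3.25 or newer). The column types, the sample rows, and the helper name `latest_per_site` are hypothetical; the table and column names come from the query above.

```python
import sqlite3

# Same query shape as above: number each site's history rows from newest
# to oldest, then join only the newest one (seqnum = 1) to its site.
QUERY = """
SELECT s.siteName, s.siteIP, h.date
FROM sites s
INNER JOIN (
    SELECT h.*,
           ROW_NUMBER() OVER (PARTITION BY siteName ORDER BY date DESC) AS seqnum
    FROM history h
) h ON s.siteName = h.siteName AND h.seqnum = 1
ORDER BY s.siteName, h.date
"""


def latest_per_site():
    # Hypothetical sample data; ISO date strings sort correctly as text.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sites (siteName TEXT, siteIP TEXT);
        CREATE TABLE history (siteName TEXT, date TEXT);
        INSERT INTO sites VALUES ('alpha', '10.0.0.1'), ('beta', '10.0.0.2');
        INSERT INTO history VALUES
            ('alpha', '2020-01-01'), ('alpha', '2020-03-01'),
            ('beta', '2020-02-01'), ('beta', '2020-01-15');
        """
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

`latest_per_site()` should return one row per site, carrying that site's most recent history date.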

