How to keep only one row of a table, removing duplicate rows?
See the following question: Deleting duplicate rows from a table.
The adapted accepted answer from there (which is my answer, so no "theft" here...):
You can do it in a simple way assuming you have a unique ID field: you can delete all records that are the same except for the ID, but don't have "the minimum ID" for their name.
Example query:
DELETE FROM members
WHERE ID NOT IN
(
SELECT MIN(ID)
FROM members
GROUP BY name
)
In case you don't have a unique ID column, my recommendation is to simply add an auto-incrementing unique ID. Mainly because it's good design, but also because it will allow you to run the query above.
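As a minimal sketch of the query above, the following runs it against an in-memory SQLite database from Python. The sample rows are hypothetical, and SQLite stands in for whatever database you actually use; only the table and column names (`members`, `ID`, `name`) come from the answer.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (ID INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.executemany(
    "INSERT INTO members (name) VALUES (?)",
    [("alice",), ("bob",), ("alice",), ("carol",), ("bob",), ("alice",)],
)

# Keep the lowest ID for each name and delete every other row.
conn.execute(
    """
    DELETE FROM members
    WHERE ID NOT IN (SELECT MIN(ID) FROM members GROUP BY name)
    """
)
conn.commit()

remaining = conn.execute("SELECT ID, name FROM members ORDER BY ID").fetchall()
print(remaining)
```

Each name survives exactly once, with the smallest ID it had before the delete.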
How to delete duplicates from one table, but keeping only one record?
I finally found the exact cause of the issue I was facing.
I followed the comment by @Malakiyasanjay,
which you can find here: How to keep only one row of a table, removing duplicate rows?
I tried it like this (it worked for me as well, but the query took a long time to run on 30,000 rows):
delete from myTable
where id not in
(select min(id) as min from (select * from myTable) as x group by title)
The problem was that I couldn't reference 'myTable', the target table of the DELETE, directly in the subquery, so I wrapped it as (select * from myTable) as x, and that solved it.
I'm sorry I can't explain this in more detail, because I'm not very familiar with MySQL queries. But you should note that:
MySQL does not allow the direct use of the target table inside a subquery like the one you use with NOT IN, but you can overcome this limitation by enclosing the subquery inside another one.
(Please reference @forpas 's answer.)
But note that this takes a very long time and might cause a timeout error. I ran this query on a table with about 600,000 rows and it did not respond for several days. So I conclude that this approach is only suitable for small tables.
I hope this is helpful for everyone! :)
How do I delete duplicate rows and keep the first row?
Back up your data, then...
MySQL supports JOINs in DELETE statements. If you want to keep the first of the duplicates:
DELETE a
FROM MYVIEWS a
JOIN (SELECT MIN(t.a1) AS min_a1, t.k1, t.k2, t.k3
FROM MYVIEWS t
GROUP BY t.k1, t.k2, t.k3
HAVING COUNT(*) > 1) b ON b.k1 = a.k1
AND b.k2 = a.k2
AND b.k3 = a.k3
AND b.min_a1 != a.a1
If you want to keep the last of the duplicates:
DELETE a
FROM MYVIEWS a
JOIN (SELECT MAX(t.a1) AS max_a1, t.k1, t.k2, t.k3
FROM MYVIEWS t
GROUP BY t.k1, t.k2, t.k3
HAVING COUNT(*) > 1) b ON b.k1 = a.k1
AND b.k2 = a.k2
AND b.k3 = a.k3
AND b.max_a1 != a.a1
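The two statements above differ only in MIN versus MAX. The sketch below tries both on hypothetical data using Python's sqlite3 module. SQLite does not support JOINs in DELETE, so this is an adaptation, not the MySQL syntax: the same join is moved into a `rowid IN (...)` subquery. The helper name `dedupe` and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical rows: three duplicates on (k1, k2, k3) plus one unique row.
ROWS = [
    (1, "x", "y", "z"),
    (2, "x", "y", "z"),
    (3, "x", "y", "z"),
    (4, "p", "q", "r"),
]

def dedupe(keep):
    """Return the rows left after keeping the 'first' or 'last' duplicate."""
    agg = {"first": "MIN", "last": "MAX"}[keep]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MYVIEWS (a1 INTEGER, k1 TEXT, k2 TEXT, k3 TEXT)")
    conn.executemany("INSERT INTO MYVIEWS VALUES (?, ?, ?, ?)", ROWS)
    # Same join as above, wrapped in a subquery because SQLite has no DELETE ... JOIN.
    conn.execute(f"""
        DELETE FROM MYVIEWS
        WHERE rowid IN (
            SELECT a.rowid
            FROM MYVIEWS a
            JOIN (SELECT {agg}(t.a1) AS keep_a1, t.k1, t.k2, t.k3
                  FROM MYVIEWS t
                  GROUP BY t.k1, t.k2, t.k3
                  HAVING COUNT(*) > 1) b
              ON b.k1 = a.k1 AND b.k2 = a.k2 AND b.k3 = a.k3
             AND b.keep_a1 != a.a1
        )
    """)
    rows = conn.execute("SELECT a1, k1, k2, k3 FROM MYVIEWS ORDER BY a1").fetchall()
    conn.close()
    return rows

kept_first = dedupe("first")
kept_last = dedupe("last")
print(kept_first)
print(kept_last)
```

The `HAVING COUNT(*) > 1` filter means groups without duplicates never enter the join, so the unique row is left alone in both cases.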
Spark : remove duplicated rows with different values but keep only one row for distinctive row
You can drop duplicates before grouping, which leaves a single record for each set of identical rows, as below:
df.dropDuplicates()
.withColumn("count", count("value").over(window))
.filter($"count" < 2)
.drop("count")
.show(false)
You can also specify the fields to check for duplicates:
df.dropDuplicates("id1", "id2", "id3", "value")
.withColumn("count", count("value").over(window))
.filter($"count" < 2)
.drop("count")
.show(false)
Output:
+---+---+---+-----+
|id1|id2|id3|value|
+---+---+---+-----+
|1 |3 |2 |tom |
|2 |1 |2 |mary |
+---+---+---+-----+
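The snippets above use a `window` that is not defined in the answer. As a rough plain-Python sketch of the logic only (not Spark code), the following assumes the window partitions by `id1`, `id2`, `id3`; that assumption and the input rows are hypothetical and were chosen to reproduce the output shown.

```python
from collections import Counter

# Hypothetical input rows: (id1, id2, id3, value).
rows = [
    (1, 3, 2, "tom"),
    (1, 3, 2, "tom"),   # exact duplicate of the row above
    (2, 1, 2, "mary"),
    (3, 3, 3, "ann"),
    (3, 3, 3, "bob"),   # same ids, different value
]

# Step 1: drop exact duplicates, like dropDuplicates(); order is preserved.
unique_rows = list(dict.fromkeys(rows))

# Step 2: count rows per key, like count("value").over(window),
# ASSUMING the window partitions by (id1, id2, id3).
counts = Counter(r[:3] for r in unique_rows)

# Step 3: keep only keys with a single remaining row, like filter($"count" < 2).
result = [r for r in unique_rows if counts[r[:3]] < 2]
print(result)
```

Under that assumption, exact duplicates collapse to one row, while keys that still have conflicting values after deduplication are dropped entirely.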
How can I delete one of two perfectly identical rows?
One option to solve your problem is to create a new table with the same schema, and then do:
INSERT INTO new_table (SELECT DISTINCT * FROM old_table)
and then just rename the tables.
You will, of course, need free disk space roughly equal to the size of the table to do this!
It's not efficient, but it's incredibly simple.
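The copy-and-rename approach can be sketched end to end like this, using an in-memory SQLite database from Python. The columns and rows are hypothetical, and the old table is renamed to a backup name (`old_table_backup`, an assumption of this sketch) instead of being dropped, so nothing is lost if the result looks wrong.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with perfectly identical rows and no key column.
conn.execute("CREATE TABLE old_table (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO old_table VALUES (?, ?)",
    [("tom", "Oslo"), ("tom", "Oslo"), ("mary", "Rome"), ("tom", "Oslo")],
)

# New table with the same schema, filled with the distinct rows only.
conn.execute("CREATE TABLE new_table (name TEXT, city TEXT)")
conn.execute("INSERT INTO new_table SELECT DISTINCT * FROM old_table")

# Swap the tables; the original is kept under a backup name.
conn.execute("ALTER TABLE old_table RENAME TO old_table_backup")
conn.execute("ALTER TABLE new_table RENAME TO old_table")
conn.commit()

deduped = conn.execute("SELECT name, city FROM old_table ORDER BY name").fetchall()
backup_count = conn.execute("SELECT COUNT(*) FROM old_table_backup").fetchone()[0]
print(deduped, backup_count)
```

Once the new table has been checked, the backup can be dropped to reclaim the space.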
Removing duplicate rows from table in Oracle
Use the rowid pseudocolumn.
DELETE FROM your_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM your_table
GROUP BY column1, column2, column3);
Where column1, column2, and column3 make up the identifying key for each record. You might list all your columns.
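SQLite also exposes a `rowid` for ordinary tables, so the same pattern can be tried without an Oracle instance. This is only an analogy: SQLite's integer `rowid` is not the same thing as Oracle's ROWID pseudocolumn, and the sample rows below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with no declared key; SQLite assigns an implicit rowid.
conn.execute("CREATE TABLE your_table (column1 TEXT, column2 TEXT, column3 INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [("a", "b", 1), ("a", "b", 1), ("c", "d", 2), ("a", "b", 1), ("c", "d", 3)],
)

# Keep the row with the smallest rowid in each (column1, column2, column3) group.
conn.execute(
    """
    DELETE FROM your_table
    WHERE rowid NOT IN (SELECT MIN(rowid)
                        FROM your_table
                        GROUP BY column1, column2, column3)
    """
)
conn.commit()

left = conn.execute(
    "SELECT column1, column2, column3 FROM your_table ORDER BY rowid"
).fetchall()
print(left)
```

Because the row identifier is supplied by the database, this works even when the duplicate rows are identical in every column.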
Delete Duplicate rows from Mysql table and Kept only one row
You can probably do it using a JOIN in a DELETE, joining against a subselect.
More details are required to give much help, but for a rough idea:-
DELETE result
FROM result
INNER JOIN (SELECT SomeField, COUNT(*) AS RecCount, MAX(DateAdded) AS MaxDateAdded FROM result GROUP BY SomeField) b
ON result.SomeField = b.SomeField AND result.DateAdded != b.MaxDateAdded
This finds the max date added for each value of SomeField and deletes every row whose date added does not match that max.
I assume that you want to keep the latest record.
Note that mass deletes like this are a bit worrying, given that if you get it wrong you potentially delete all your records.
EDIT - version to go with the table you have now given. This will delete the duplicates, leaving only the first of each set of identical rows (i.e. for Google you are left with just the row with id 1)
DELETE foo
FROM foo
INNER JOIN (SELECT link, title, description, MIN(id) AS MinId FROM foo GROUP BY link, title, description ) b
ON foo.link = b.link
AND foo.title = b.title
AND foo.description = b.description
AND foo.id != b.MinId
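Given the warning above about mass deletes, one cautious pattern is to run the delete inside a transaction, compare the row counts with what you expect, and roll back if they disagree. The sketch below does this with Python's sqlite3 module on hypothetical data. SQLite has no DELETE with JOIN, so it uses the equivalent `NOT IN (SELECT MIN(id) ...)` form, and the specific checks are one possible choice, not a standard recipe.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / COMMIT / ROLLBACK below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE foo (id INTEGER PRIMARY KEY, link TEXT, title TEXT, description TEXT)"
)
# Hypothetical rows; ids 1, 2 and 4 are duplicates of each other.
conn.executemany(
    "INSERT INTO foo (link, title, description) VALUES (?, ?, ?)",
    [
        ("google.com", "Google", "Search"),
        ("google.com", "Google", "Search"),
        ("example.com", "Example", "Demo"),
        ("google.com", "Google", "Search"),
    ],
)

before = conn.execute("SELECT COUNT(*) FROM foo").fetchone()[0]
distinct = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT link, title, description FROM foo)"
).fetchone()[0]

conn.execute("BEGIN")
deleted = conn.execute(
    """
    DELETE FROM foo
    WHERE id NOT IN (SELECT MIN(id) FROM foo GROUP BY link, title, description)
    """
).rowcount
after = conn.execute("SELECT COUNT(*) FROM foo").fetchone()[0]

# Commit only if exactly the surplus rows were removed; otherwise undo everything.
if after == distinct and deleted == before - distinct:
    conn.execute("COMMIT")
    committed = True
else:
    conn.execute("ROLLBACK")
    committed = False

remaining_ids = [r[0] for r in conn.execute("SELECT id FROM foo ORDER BY id")]
print(committed, deleted, remaining_ids)
```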
Eliminating duplicate values based on only one column of the table
This is where the window function row_number() comes in handy:
SELECT s.siteName, s.siteIP, h.date
FROM sites s INNER JOIN
(select h.*, row_number() over (partition by siteName order by date desc) as seqnum
from history h
) h
ON s.siteName = h.siteName and seqnum = 1
ORDER BY s.siteName, h.date
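To see the row_number() query in action, here is a small sketch using Python's sqlite3 module. It assumes a SQLite build with window-function support (3.25 or later); the table contents are hypothetical, and only the table and column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sites (siteName TEXT, siteIP TEXT);
    CREATE TABLE history (siteName TEXT, date TEXT);
    INSERT INTO sites VALUES ('alpha', '10.0.0.1'), ('beta', '10.0.0.2');
    INSERT INTO history VALUES
        ('alpha', '2023-01-01'),
        ('alpha', '2023-03-01'),
        ('beta',  '2023-02-01'),
        ('beta',  '2023-01-15');
    """
)

# Number each site's history rows from newest to oldest, then keep row 1 only.
latest = conn.execute(
    """
    SELECT s.siteName, s.siteIP, h.date
    FROM sites s
    INNER JOIN (
        SELECT h.*,
               ROW_NUMBER() OVER (PARTITION BY siteName ORDER BY date DESC) AS seqnum
        FROM history h
    ) h ON s.siteName = h.siteName AND h.seqnum = 1
    ORDER BY s.siteName, h.date
    """
).fetchall()
print(latest)
```

Each site appears once, paired with its most recent history date.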