How to Remove Duplicates from Table Using SQL Query

SQL query to remove duplicates from a table with 139 columns and load all columns to another table

If the "other table" does not exist yet, you can create it like this:

CREATE TABLE othertable LIKE originaltable

Then insert the deduplicated rows with this statement:

INSERT INTO othertable 
SELECT col1,...,coln
FROM (SELECT
t.*,
ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col1) AS num
FROM originaltable t) t
WHERE num = 1

Many tools can generate queries and column lists, so if you do not want to write the column list by hand you can generate it with one of them, or use another SQL statement to select it from the Db2 catalog view (syscat.columns).
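As a rough sketch of reading the column list from the catalog instead of typing 139 names, the snippet below uses SQLite through Python's sqlite3 module as a stand-in for Db2. The helper name copy_without_duplicates, the three-column sample table, and the use of PRAGMA table_info in place of syscat.columns are all assumptions made for illustration. It also assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3


def copy_without_duplicates(conn, source, target, key_cols):
    """Copy one row per key from `source` into a new table `target`.

    Hypothetical helper: identifiers are interpolated directly, so they
    must come from trusted code, never from user input.
    """
    # Read the column names from the catalog (SQLite's PRAGMA table_info
    # stands in for Db2's syscat.columns here).
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({source})")]
    col_list = ", ".join(cols)
    keys = ", ".join(key_cols)
    # SQLite has no CREATE TABLE ... LIKE; an empty CREATE TABLE AS SELECT
    # gives a table with the same column names.
    conn.execute(f"CREATE TABLE {target} AS SELECT * FROM {source} WHERE 0")
    conn.execute(
        f"""
        INSERT INTO {target} ({col_list})
        SELECT {col_list}
        FROM (SELECT s.*,
                     ROW_NUMBER() OVER (PARTITION BY {keys} ORDER BY {keys}) AS num
              FROM {source} s) AS ranked
        WHERE num = 1
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE originaltable (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO originaltable VALUES (?, ?, ?)",
    [(1, "a", "x"), (1, "a", "y"), (2, "b", "z")],
)
copy_without_duplicates(conn, "originaltable", "othertable", ["col1", "col2"])
kept = conn.execute("SELECT col1, col2 FROM othertable ORDER BY col1").fetchall()
print(kept)
```

Because the ORDER BY inside the window only repeats the partition columns, which of the duplicate rows survives is arbitrary. Add a tie-breaking column there if it matters.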

SQL query to remove duplicates from single column based on latest date

I have tried a few partitioning SQL queries and also a CTE, but was not able to get the desired result.

With QUALIFY (available in Snowflake, Teradata, BigQuery, and a few other engines), this can be done without a CTE:

SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(PARTITION BY COLUMN1 ORDER BY COLUMN2 DESC) = 1
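On an engine without QUALIFY, the same filter can be written as a derived table with an outer WHERE. The sketch below assumes SQLite (via Python's sqlite3, version 3.25 or later for window functions) and a made-up two-column table in which COLUMN2 holds ISO-formatted dates. The sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [("a", "2023-01-01"), ("a", "2023-06-01"), ("b", "2022-12-31")],
)

# Rank rows per COLUMN1 by COLUMN2 descending, then keep rank 1 in an
# outer WHERE. This is what QUALIFY ... = 1 does in a single step.
LATEST_PER_KEY = """
SELECT column1, column2
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY column1
                                ORDER BY column2 DESC) AS rn
      FROM tab t) AS ranked
WHERE rn = 1
ORDER BY column1
"""
latest = conn.execute(LATEST_PER_KEY).fetchall()
print(latest)
```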

How can I remove duplicate rows?

Assuming no nulls, you GROUP BY the unique columns and SELECT the MIN (or MAX) RowId as the row to keep. Then delete every row whose RowId is not among the kept ones:

DELETE MyTable FROM MyTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, Col1, Col2, Col3
FROM MyTable
GROUP BY Col1, Col2, Col3
) as KeepRows ON
MyTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL

In case you have a GUID instead of an integer, you can replace

MIN(RowId)

with

CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))
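Before running a delete like this, it can help to preview which rows the outer join would remove. The sketch below is only an illustration on SQLite (via Python's sqlite3), with invented sample data. SQLite does not accept a JOIN inside DELETE, so the join is run as a SELECT first and the resulting ids are then deleted one by one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (RowId INTEGER PRIMARY KEY, Col1 TEXT, Col2 TEXT, Col3 TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(1, "a", "b", "c"), (2, "a", "b", "c"), (3, "x", "y", "z"), (4, "a", "b", "c")],
)

# Same anti-join as the DELETE: rows with no partner in KeepRows are the
# duplicates that would be removed.
DOOMED = """
SELECT MyTable.RowId
FROM MyTable
LEFT OUTER JOIN (
    SELECT MIN(RowId) AS RowId, Col1, Col2, Col3
    FROM MyTable
    GROUP BY Col1, Col2, Col3
) AS KeepRows ON MyTable.RowId = KeepRows.RowId
WHERE KeepRows.RowId IS NULL
ORDER BY MyTable.RowId
"""
doomed = [row[0] for row in conn.execute(DOOMED)]

# Delete exactly the previewed rows.
conn.executemany("DELETE FROM MyTable WHERE RowId = ?", [(i,) for i in doomed])
remaining = [row[0] for row in conn.execute("SELECT RowId FROM MyTable ORDER BY RowId")]
print(doomed, remaining)
```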

How to remove duplicates from table using SQL query

It looks like all four column values are duplicated, so you can do this:

select distinct emp_name, emp_address, sex, marital_status
from YourTable

However, if marital status can differ and you have another column to choose by (e.g. you want the latest record based on a create_date column), you can do this:

select emp_name, emp_address, sex, marital_status
from YourTable a
where not exists (select 1
from YourTable b
where b.emp_name = a.emp_name and
b.emp_address = a.emp_address and
b.sex = a.sex and
b.create_date > a.create_date)
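One detail worth checking in this pattern is that the create_date comparison has to be strict. With a non-strict comparison every row matches itself in the subquery, so NOT EXISTS rejects all of them. The sketch below illustrates this on SQLite through Python's sqlite3. The sample rows and the helper name latest_rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (emp_name TEXT, emp_address TEXT, sex TEXT, "
    "marital_status TEXT, create_date TEXT)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)",
    [
        ("Ann", "1 Main St", "F", "single", "2020-01-01"),
        ("Ann", "1 Main St", "F", "married", "2022-05-01"),
        ("Bob", "2 Oak Ave", "M", "single", "2021-03-15"),
    ],
)


def latest_rows(op):
    """Run the NOT EXISTS query with the given comparison operator.

    Hypothetical helper: `op` is interpolated into the SQL text, so only
    the literals ">" and ">=" are passed in below.
    """
    sql = f"""
        SELECT emp_name, marital_status
        FROM YourTable a
        WHERE NOT EXISTS (SELECT 1
                          FROM YourTable b
                          WHERE b.emp_name = a.emp_name
                            AND b.emp_address = a.emp_address
                            AND b.sex = a.sex
                            AND b.create_date {op} a.create_date)
        ORDER BY emp_name
    """
    return conn.execute(sql).fetchall()


strict = latest_rows(">")       # keeps the newest row per employee
non_strict = latest_rows(">=")  # every row matches itself, nothing survives
print(strict, non_strict)
```

If two rows for the same employee share the same latest create_date, the strict version returns both, so a further tie-breaker is needed in that case.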

Removing duplicate rows from table in Oracle

Use the rowid pseudocolumn.

DELETE FROM your_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM your_table
GROUP BY column1, column2, column3);

Here column1, column2, and column3 make up the identifying key for each record. If no smaller set of columns identifies a record, list all of them.
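The same statement can be tried out on SQLite, which also exposes an implicit rowid on ordinary tables (an integer there, unlike Oracle's physical row address). This is a small sketch using Python's sqlite3 with made-up rows, not Oracle itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No explicit key column: the table relies on the implicit rowid.
conn.execute("CREATE TABLE your_table (column1 TEXT, column2 TEXT, column3 TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [("a", "b", "c"), ("a", "b", "c"), ("d", "e", "f"), ("a", "b", "c")],
)

# Keep the lowest rowid in each group of identical keys, delete the rest.
cursor = conn.execute(
    """
    DELETE FROM your_table
    WHERE rowid NOT IN (SELECT MIN(rowid)
                        FROM your_table
                        GROUP BY column1, column2, column3)
    """
)
deleted = cursor.rowcount
remaining = conn.execute(
    "SELECT column1, column2, column3 FROM your_table ORDER BY column1"
).fetchall()
print(deleted, remaining)
```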

How to remove duplicates based on a certain column in SQL Server?

We can use a deletable CTE along with ROW_NUMBER here:

WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY fid ORDER BY date DESC, name) rn
FROM yourTable
)

DELETE
FROM cte
WHERE rn > 1;

The above logic assigns rn = 1 to (i.e. spares) the record with the most recent date within each group of fid records. Should two records with the same fid also share the same latest date, it spares the one whose name sorts first.
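Deleting through a CTE is a SQL Server feature. As a sketch of the same ranking logic on an engine without it, the snippet below uses SQLite (via Python's sqlite3, 3.25 or later) and deletes by rowid instead. The sample rows are invented to exercise the tie-break on name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (fid INTEGER, date TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", "old"),
        (1, "2024-03-01", "zed"),
        (1, "2024-03-01", "amy"),  # same latest date as "zed"
        (2, "2023-07-04", "solo"),
    ],
)

# The CTE ranks rows exactly as in the SQL Server version. Since SQLite
# cannot delete from the CTE itself, the ranked rowids are fed back into
# a DELETE on the base table.
conn.execute(
    """
    WITH cte AS (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY fid
                                  ORDER BY date DESC, name) AS rn
        FROM yourTable
    )
    DELETE FROM yourTable
    WHERE rowid IN (SELECT rid FROM cte WHERE rn > 1)
    """
)
survivors = conn.execute(
    "SELECT fid, date, name FROM yourTable ORDER BY fid"
).fetchall()
print(survivors)
```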

Eliminating duplicate values based on only one column of the table

This is where the window function row_number() comes in handy:

SELECT s.siteName, s.siteIP, h.date
FROM sites s INNER JOIN
(select h.*, row_number() over (partition by siteName order by date desc) as seqnum
from history h
) h
ON s.siteName = h.siteName and seqnum = 1
ORDER BY s.siteName, h.date

SQL How to remove duplicates within select query?

You mention that there are date duplicates, but it appears they're quite unique down to the precision of seconds.

Can you clarify what precision of date you start considering dates duplicate - day, hour, minute?

In any case, you'll probably want to floor your datetime field. You didn't indicate which field is preferred when removing duplicates, so this query will prefer the last name in alphabetical order.

SELECT MAX(owner_name),
-- floored to the second
dateadd(second,datediff(second,'2000-01-01',start_date),'2000-01-01') AS StartDate
FROM MyTable
GROUP BY dateadd(second,datediff(second,'2000-01-01',start_date),'2000-01-01')
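To see what the dateadd/datediff flooring does at a coarser precision, here is a small sketch in plain Python. The function floor_to, the sample rows, and the choice of minute precision are assumptions for illustration. It mirrors the SQL arithmetic only for timestamps on or after the base date.

```python
from datetime import datetime, timedelta

BASE = datetime(2000, 1, 1)  # same anchor date as in the SQL query


def floor_to(ts, unit_seconds):
    """Floor `ts` to a multiple of `unit_seconds` counted from BASE.

    Counts whole units since BASE, then adds them back, which is the
    idea behind dateadd(unit, datediff(unit, base, ts), base).
    """
    whole_units = int((ts - BASE).total_seconds() // unit_seconds)
    return BASE + timedelta(seconds=whole_units * unit_seconds)


rows = [
    ("alice", datetime(2024, 5, 1, 10, 15, 7)),
    ("bob", datetime(2024, 5, 1, 10, 15, 42)),
    ("carol", datetime(2024, 5, 1, 10, 16, 3)),
]

# GROUP BY the floored timestamp and keep MAX(owner_name) per group.
deduped = {}
for owner_name, start_date in rows:
    bucket = floor_to(start_date, 60)  # 60 seconds: minute precision
    deduped[bucket] = max(deduped.get(bucket, owner_name), owner_name)

for bucket in sorted(deduped):
    print(bucket, deduped[bucket])
```

With minute precision, the two 10:15 rows collapse into one and the alphabetically last name is kept, as in the query.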

