Delete Duplicate Records Using Rownum in Sql

Deleting duplicate records with ROWNUM?

That just won't work. From documentation:

For each row returned by a query, the ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows. The first row selected has a ROWNUM of 1, the second has 2, and so on.

Its value is set at the moment you run the query and can be changed, depending on how you fetch data (different ORDER BY will produce different ROWNUM value for the same row). As it is sequential, you can't set "groups" of ROWNUM values (for example, so that it goes from 1, 2, 3 for one set, then 1, 2, 3, 4, 5 for another - you'll always get 1, 2, 3, 4, 5, 6, 7, 8).

If you planned to do something like this:

SQL> create table test as
2 select e.empno, e.deptno, d.dname, e.ename
3 from emp e join dept d on e.deptno = d.deptno;

Table created.

SQL> select * from test order by deptno;

EMPNO DEPTNO DNAME ENAME
---------- ---------- -------------- ----------
7782 10 ACCOUNTING CLARK
7839 10 ACCOUNTING KING
7934 10 ACCOUNTING MILLER
7369 20 RESEARCH SMITH
7902 20 RESEARCH FORD
7566 20 RESEARCH JONES
7900 30 SALES JAMES
7844 30 SALES TURNER
7654 30 SALES MARTIN
7521 30 SALES WARD
7499 30 SALES ALLEN
7698 30 SALES BLAKE

12 rows selected.

SQL> delete from test t
2 where t.empno in (select a.empno
3 from (select t1.empno, t1.deptno, t1.dname, rownum rn
4 from test t1
5 ) a
6 where a.rn > 1
7 );

11 rows deleted.

As you can see, all rows (but one) are duplicates. Here's why:

SQL> rollback;

Rollback complete.

SQL> select a.deptno, a.empno, a.rn, a.rn1
2 from (select t1.empno, t1.deptno, t1.dname, rownum rn,
3 row_number() over (partition by t1.deptno order by null) rn1
4 from test t1
5 ) a;

DEPTNO EMPNO RN RN1
---------- ---------- ---------- ----------
10 7782 2 1
10 7839 1 2
10 7934 3 3
20 7369 5 1
20 7902 4 2
20 7566 6 3
30 7900 7 1
30 7844 8 2
30 7654 9 3
30 7521 10 4
30 7499 11 5
30 7698 12 6

12 rows selected.

See? RN (ROWNUM) has all values from 1, 2, ..., 12. RN1 (ROW_NUMBER, which allows us to set partitions) does the job correctly. So, if you used RN1 instead of RN, it would work:

SQL> delete from test t
2 where t.empno in (select a.empno
3 from (select t1.empno, t1.deptno, t1.dname, rownum rn,
4 row_number() over (partition by t1.deptno order by null) rn1
5 from test t1
6 ) a
7 where a.rn1 > 1
8 );

9 rows deleted.

SQL> select * From test;

EMPNO DEPTNO DNAME ENAME
---------- ---------- -------------- ----------
7782 10 ACCOUNTING CLARK
7369 20 RESEARCH SMITH
7900 30 SALES JAMES

SQL>

[EDIT: deleting duplicates #2]

Here's another example which shows how/what to do if you want to delete duplicates. It is based on the "ROWID technique" (although there are another ones too).

Back to the table we've been working with. Suppose that we want to keep only one distinct job per department:

SQL> select deptno, job, dname, empno, ename
2 from test
3 order by deptno, job;

DEPTNO JOB DNAME EMPNO ENAME
---------- --------- -------------- ---------- ----------
10 CLERK ACCOUNTING 7934 MILLER
10 MANAGER ACCOUNTING 7782 CLARK
10 PRESIDENT ACCOUNTING 7839 KING

20 ANALYST RESEARCH 7902 FORD
20 CLERK RESEARCH 7369 SMITH
20 MANAGER RESEARCH 7566 JONES

30 CLERK SALES 7900 JAMES
30 MANAGER SALES 7698 BLAKE
30 SALESMAN SALES 7844 TURNER -- leave
30 SALESMAN SALES 7654 MARTIN -- only
30 SALESMAN SALES 7521 WARD -- one
30 SALESMAN SALES 7499 ALLEN -- salesman

12 rows selected.
  • in department 10, there are no duplicates - 3 employees, each of them doing their own job
  • the same goes for department 20
  • however, in department 30, there are 4 SALESMEN and we want to keep only one - another ones are duplicates

It means that you have to take both columns - DEPTNO and JOB - into account when deleting rows. Let's do that:

SQL> delete from test a
2 where rowid > (select min(rowid)
3 from test b
4 where a.deptno = b.deptno --> take both DEPTNO ...
5 and a.job = b.job --> and JOB into account
6 );

3 rows deleted.

The result: departments 10 and 20 didn't change, but in department 30 now we have only one salesman, just as we wanted:

SQL> select deptno, job, dname, empno, ename
2 from test
3 order by deptno, job;

DEPTNO JOB DNAME EMPNO ENAME
---------- --------- -------------- ---------- ----------
10 CLERK ACCOUNTING 7934 MILLER
10 MANAGER ACCOUNTING 7782 CLARK
10 PRESIDENT ACCOUNTING 7839 KING
20 ANALYST RESEARCH 7902 FORD
20 CLERK RESEARCH 7369 SMITH
20 MANAGER RESEARCH 7566 JONES
30 CLERK SALES 7900 JAMES
30 MANAGER SALES 7698 BLAKE
30 SALESMAN SALES 7844 TURNER

9 rows selected.

SQL>

Delete duplicate records using rownum in sql

Use the rowid

DELETE FROM table_name a
WHERE EXISTS( SELECT 1
FROM table_name b
WHERE a.id = b.id
AND a.name = b.name
AND a.rowid > b.rowid )

Of course, you could do a.rowid < b.rowid as well. The rowid is just the physical address of the row so it doesn't matter whether you delete the row that has the larger or the smaller address.

Your expected results, though, don't make sense.

Expected Result :

ROWNUM ID NAME
---------- ---------- ----------
4 1 leo_1
5 2 leo_2
6 3 leo_3

The rownum of a result set is always assigned at query time. That means that a particular row may appear with different rownum values in different queries (or when the same query is run multiple times). rownum is always sequential so you can never have a rownum of 4 in a result set without also having rownum values of 1, 2, and 3 in the same result set. Whichever duplicate row you delete, your result will be

Expected Result :

    ROWNUM         ID NAME
---------- ---------- ----------
1 1 leo_1
2 2 leo_2
3 3 leo_3

But the rownum values are arbitrary. It would be just as valid for Oracle to return

Expected Result :

    ROWNUM         ID NAME
---------- ---------- ----------
1 2 leo_2
2 3 leo_3
3 1 leo_1

Need to delete duplicate records from the table using row_number()

WITH cte AS
(
SELECT ROW_NUMBER() OVER(PARTITION by ID ORDER BY name) AS Row
FROM test
)

DELETE FROM cte
WHERE Row > 1

Deleting duplicate record in SQL Server

You need to reference the CTE in the delete statement...

WITH a as
(
SELECT Firstname,ROW_NUMBER() OVER(PARTITION by Firstname, empID ORDER BY Firstname)
AS duplicateRecCount
FROM dbo.tblEmployee
)
--Now Delete Duplicate Records
DELETE FROM a
WHERE duplicateRecCount > 1

Delete duplicate records without using ROW_NUMBER() function

As the motivation for this question seems to be academic interest rather than practical use...

The table has no primary key but the undocumented pseudo column %%physloc%% can provide a substitute.

DELETE T1
FROM YourTable T1 WITH(TABLOCKX)
WHERE CAST(T1.%%physloc%% AS BIGINT)
NOT IN (SELECT MAX(CAST(%%physloc%% AS BIGINT))
FROM YourTable
GROUP BY Name, Salary)

In reality you should never use the above and just use row_number as it is more efficient and documented.

(Data Explorer Demo)

How to delete the duplicates from the table using row_number() in psql?

How about this:

PostgreSQL DELETE statement with USING clause

Ref: PostgreSQL Docs



Related Topics



Leave a reply



Submit