Delete Duplicate Records Using Rownum in Sql

Deleting duplicate records with ROWNUM?

That just won't work. From documentation:

For each row returned by a query, the ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows. The first row selected has a ROWNUM of 1, the second has 2, and so on.

Its value is set at the moment you run the query and can be changed, depending on how you fetch data (different ORDER BY will produce different ROWNUM value for the same row). As it is sequential, you can't set "groups" of ROWNUM values (for example, so that it goes from 1, 2, 3 for one set, then 1, 2, 3, 4, 5 for another - you'll always get 1, 2, 3, 4, 5, 6, 7, 8).

If you planned to do something like this:

SQL> create table test as
  2    select e.empno, e.deptno, d.dname, e.ename
  3    from emp e join dept d on e.deptno = d.deptno;

Table created.

SQL> select * from test order by deptno;

     EMPNO     DEPTNO DNAME          ENAME
---------- ---------- -------------- ----------
      7782         10 ACCOUNTING     CLARK
      7839         10 ACCOUNTING     KING
      7934         10 ACCOUNTING     MILLER
      7369         20 RESEARCH       SMITH
      7902         20 RESEARCH       FORD
      7566         20 RESEARCH       JONES
      7900         30 SALES          JAMES
      7844         30 SALES          TURNER
      7654         30 SALES          MARTIN
      7521         30 SALES          WARD
      7499         30 SALES          ALLEN
      7698         30 SALES          BLAKE

12 rows selected.

SQL> delete from test t
  2  where t.empno in (select a.empno
  3                    from (select t1.empno, t1.deptno, t1.dname, rownum rn
  4                          from test t1
  5                         ) a
  6                    where a.rn > 1
  7                   );

11 rows deleted.

As you can see, all rows (but one) are duplicates. Here's why:

SQL> rollback;

Rollback complete.

SQL> select a.deptno, a.empno, a.rn, a.rn1
  2  from (select t1.empno, t1.deptno, t1.dname, rownum rn,
  3               row_number() over (partition by t1.deptno order by null) rn1
  4        from test t1
  5       ) a;

    DEPTNO      EMPNO         RN        RN1
---------- ---------- ---------- ----------
        10       7782          2          1
        10       7839          1          2
        10       7934          3          3
        20       7369          5          1
        20       7902          4          2
        20       7566          6          3
        30       7900          7          1
        30       7844          8          2
        30       7654          9          3
        30       7521         10          4
        30       7499         11          5
        30       7698         12          6

12 rows selected.

See? RN (ROWNUM) has all values from 1, 2, ..., 12. RN1 (ROW_NUMBER, which allows us to set partitions) does the job correctly. So, if you used RN1 instead of RN, it would work:

SQL> delete from test t
  2  where t.empno in (select a.empno
  3                    from (select t1.empno, t1.deptno, t1.dname, rownum rn,
  4                          row_number() over (partition by t1.deptno order by null) rn1
  5                          from test t1
  6                         ) a
  7                    where a.rn1 > 1
  8                   );

9 rows deleted.

SQL> select * From test;

     EMPNO     DEPTNO DNAME          ENAME
---------- ---------- -------------- ----------
      7782         10 ACCOUNTING     CLARK
      7369         20 RESEARCH       SMITH
      7900         30 SALES          JAMES

SQL>

[EDIT: deleting duplicates #2]

Here's another example which shows how/what to do if you want to delete duplicates. It is based on the "ROWID technique" (although there are another ones too).

Back to the table we've been working with. Suppose that we want to keep only one distinct job per department:

SQL> select deptno, job, dname, empno, ename
  2  from test
  3  order by deptno, job;

    DEPTNO JOB       DNAME               EMPNO ENAME
---------- --------- -------------- ---------- ----------
        10 CLERK     ACCOUNTING           7934 MILLER
        10 MANAGER   ACCOUNTING           7782 CLARK
        10 PRESIDENT ACCOUNTING           7839 KING

        20 ANALYST   RESEARCH             7902 FORD
        20 CLERK     RESEARCH             7369 SMITH
        20 MANAGER   RESEARCH             7566 JONES

        30 CLERK     SALES                7900 JAMES
        30 MANAGER   SALES                7698 BLAKE
        30 SALESMAN  SALES                7844 TURNER    -- leave
        30 SALESMAN  SALES                7654 MARTIN    -- only
        30 SALESMAN  SALES                7521 WARD      -- one
        30 SALESMAN  SALES                7499 ALLEN     -- salesman

12 rows selected.

in department 10, there are no duplicates - 3 employees, each of them doing their own job
the same goes for department 20
however, in department 30, there are 4 SALESMEN and we want to keep only one - another ones are duplicates

It means that you have to take both columns - DEPTNO and JOB - into account when deleting rows. Let's do that:

SQL> delete from test a
  2  where rowid > (select min(rowid)
  3                 from test b
  4                 where a.deptno = b.deptno  --> take both DEPTNO ...
  5                   and a.job = b.job        --> and JOB into account
  6                );

3 rows deleted.

The result: departments 10 and 20 didn't change, but in department 30 now we have only one salesman, just as we wanted:

SQL> select deptno, job, dname, empno, ename
  2  from test
  3  order by deptno, job;

    DEPTNO JOB       DNAME               EMPNO ENAME
---------- --------- -------------- ---------- ----------
        10 CLERK     ACCOUNTING           7934 MILLER
        10 MANAGER   ACCOUNTING           7782 CLARK
        10 PRESIDENT ACCOUNTING           7839 KING
        20 ANALYST   RESEARCH             7902 FORD
        20 CLERK     RESEARCH             7369 SMITH
        20 MANAGER   RESEARCH             7566 JONES
        30 CLERK     SALES                7900 JAMES
        30 MANAGER   SALES                7698 BLAKE
        30 SALESMAN  SALES                7844 TURNER

9 rows selected.

SQL>

Delete duplicate records using rownum in sql

Use the rowid

DELETE FROM table_name a
 WHERE EXISTS( SELECT 1
                 FROM table_name b
                WHERE a.id = b.id
                  AND a.name = b.name
                  AND a.rowid > b.rowid )

Of course, you could do a.rowid < b.rowid as well. The rowid is just the physical address of the row so it doesn't matter whether you delete the row that has the larger or the smaller address.

Your expected results, though, don't make sense.

Expected Result :

        ROWNUM         ID NAME
    ---------- ---------- ----------
             4          1 leo_1
             5          2 leo_2
             6          3 leo_3

The rownum of a result set is always assigned at query time. That means that a particular row may appear with different rownum values in different queries (or when the same query is run multiple times). rownum is always sequential so you can never have a rownum of 4 in a result set without also having rownum values of 1, 2, and 3 in the same result set. Whichever duplicate row you delete, your result will be

Expected Result :

    ROWNUM         ID NAME
---------- ---------- ----------
         1          1 leo_1
         2          2 leo_2
         3          3 leo_3

But the rownum values are arbitrary. It would be just as valid for Oracle to return

Expected Result :

    ROWNUM         ID NAME
---------- ---------- ----------
         1          2 leo_2
         2          3 leo_3
         3          1 leo_1

Need to delete duplicate records from the table using row_number()

WITH cte AS
(
    SELECT ROW_NUMBER() OVER(PARTITION by ID ORDER BY name) AS Row
    FROM test
)

DELETE FROM cte
WHERE Row > 1

Deleting duplicate record in SQL Server

You need to reference the CTE in the delete statement...

WITH a as
(
SELECT Firstname,ROW_NUMBER() OVER(PARTITION by Firstname, empID ORDER BY Firstname) 
AS duplicateRecCount
FROM dbo.tblEmployee
)
--Now Delete Duplicate Records
DELETE FROM a
WHERE duplicateRecCount > 1

Delete duplicate records without using ROW_NUMBER() function

As the motivation for this question seems to be academic interest rather than practical use...

The table has no primary key but the undocumented pseudo column %%physloc%% can provide a substitute.

DELETE T1
FROM YourTable T1 WITH(TABLOCKX)
WHERE CAST(T1.%%physloc%% AS BIGINT)
NOT IN (SELECT MAX(CAST(%%physloc%% AS BIGINT))
        FROM YourTable 
        GROUP BY Name, Salary)

In reality you should never use the above and just use row_number as it is more efficient and documented.

(Data Explorer Demo)

How to delete the duplicates from the table using row_number() in psql?

How about this:

PostgreSQL DELETE statement with USING clause

Ref: PostgreSQL Docs