Deleting duplicate records with ROWNUM?
That just won't work. From documentation:
For each row returned by a query, the ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows. The first row selected has a ROWNUM of 1, the second has 2, and so on.
Its value is set at the moment you run the query and can be changed, depending on how you fetch data (different ORDER BY
will produce different ROWNUM
value for the same row). As it is sequential, you can't set "groups" of ROWNUM
values (for example, so that it goes from 1, 2, 3 for one set, then 1, 2, 3, 4, 5 for another - you'll always get 1, 2, 3, 4, 5, 6, 7, 8).
If you planned to do something like this:
SQL> create table test as
2 select e.empno, e.deptno, d.dname, e.ename
3 from emp e join dept d on e.deptno = d.deptno;
Table created.
SQL> select * from test order by deptno;
EMPNO DEPTNO DNAME ENAME
---------- ---------- -------------- ----------
7782 10 ACCOUNTING CLARK
7839 10 ACCOUNTING KING
7934 10 ACCOUNTING MILLER
7369 20 RESEARCH SMITH
7902 20 RESEARCH FORD
7566 20 RESEARCH JONES
7900 30 SALES JAMES
7844 30 SALES TURNER
7654 30 SALES MARTIN
7521 30 SALES WARD
7499 30 SALES ALLEN
7698 30 SALES BLAKE
12 rows selected.
SQL> delete from test t
2 where t.empno in (select a.empno
3 from (select t1.empno, t1.deptno, t1.dname, rownum rn
4 from test t1
5 ) a
6 where a.rn > 1
7 );
11 rows deleted.
As you can see, all rows (but one) are duplicates. Here's why:
SQL> rollback;
Rollback complete.
SQL> select a.deptno, a.empno, a.rn, a.rn1
2 from (select t1.empno, t1.deptno, t1.dname, rownum rn,
3 row_number() over (partition by t1.deptno order by null) rn1
4 from test t1
5 ) a;
DEPTNO EMPNO RN RN1
---------- ---------- ---------- ----------
10 7782 2 1
10 7839 1 2
10 7934 3 3
20 7369 5 1
20 7902 4 2
20 7566 6 3
30 7900 7 1
30 7844 8 2
30 7654 9 3
30 7521 10 4
30 7499 11 5
30 7698 12 6
12 rows selected.
See? RN (ROWNUM
) has all values from 1, 2, ..., 12. RN1 (ROW_NUMBER
, which allows us to set partitions) does the job correctly. So, if you used RN1 instead of RN, it would work:
SQL> delete from test t
2 where t.empno in (select a.empno
3 from (select t1.empno, t1.deptno, t1.dname, rownum rn,
4 row_number() over (partition by t1.deptno order by null) rn1
5 from test t1
6 ) a
7 where a.rn1 > 1
8 );
9 rows deleted.
SQL> select * From test;
EMPNO DEPTNO DNAME ENAME
---------- ---------- -------------- ----------
7782 10 ACCOUNTING CLARK
7369 20 RESEARCH SMITH
7900 30 SALES JAMES
SQL>
[EDIT: deleting duplicates #2]
Here's another example which shows how/what to do if you want to delete duplicates. It is based on the "ROWID technique" (although there are another ones too).
Back to the table we've been working with. Suppose that we want to keep only one distinct job per department:
SQL> select deptno, job, dname, empno, ename
2 from test
3 order by deptno, job;
DEPTNO JOB DNAME EMPNO ENAME
---------- --------- -------------- ---------- ----------
10 CLERK ACCOUNTING 7934 MILLER
10 MANAGER ACCOUNTING 7782 CLARK
10 PRESIDENT ACCOUNTING 7839 KING
20 ANALYST RESEARCH 7902 FORD
20 CLERK RESEARCH 7369 SMITH
20 MANAGER RESEARCH 7566 JONES
30 CLERK SALES 7900 JAMES
30 MANAGER SALES 7698 BLAKE
30 SALESMAN SALES 7844 TURNER -- leave
30 SALESMAN SALES 7654 MARTIN -- only
30 SALESMAN SALES 7521 WARD -- one
30 SALESMAN SALES 7499 ALLEN -- salesman
12 rows selected.
- in department 10, there are no duplicates - 3 employees, each of them doing their own job
- the same goes for department 20
- however, in department 30, there are 4 SALESMEN and we want to keep only one - another ones are duplicates
It means that you have to take both columns - DEPTNO and JOB - into account when deleting rows. Let's do that:
SQL> delete from test a
2 where rowid > (select min(rowid)
3 from test b
4 where a.deptno = b.deptno --> take both DEPTNO ...
5 and a.job = b.job --> and JOB into account
6 );
3 rows deleted.
The result: departments 10 and 20 didn't change, but in department 30 now we have only one salesman, just as we wanted:
SQL> select deptno, job, dname, empno, ename
2 from test
3 order by deptno, job;
DEPTNO JOB DNAME EMPNO ENAME
---------- --------- -------------- ---------- ----------
10 CLERK ACCOUNTING 7934 MILLER
10 MANAGER ACCOUNTING 7782 CLARK
10 PRESIDENT ACCOUNTING 7839 KING
20 ANALYST RESEARCH 7902 FORD
20 CLERK RESEARCH 7369 SMITH
20 MANAGER RESEARCH 7566 JONES
30 CLERK SALES 7900 JAMES
30 MANAGER SALES 7698 BLAKE
30 SALESMAN SALES 7844 TURNER
9 rows selected.
SQL>
Delete duplicate records using rownum in sql
Use the rowid
DELETE FROM table_name a
WHERE EXISTS( SELECT 1
FROM table_name b
WHERE a.id = b.id
AND a.name = b.name
AND a.rowid > b.rowid )
Of course, you could do a.rowid < b.rowid
as well. The rowid
is just the physical address of the row so it doesn't matter whether you delete the row that has the larger or the smaller address.
Your expected results, though, don't make sense.
Expected Result :
ROWNUM ID NAME
---------- ---------- ----------
4 1 leo_1
5 2 leo_2
6 3 leo_3
The rownum
of a result set is always assigned at query time. That means that a particular row may appear with different rownum
values in different queries (or when the same query is run multiple times). rownum
is always sequential so you can never have a rownum
of 4 in a result set without also having rownum
values of 1, 2, and 3 in the same result set. Whichever duplicate row you delete, your result will be
Expected Result :
ROWNUM ID NAME
---------- ---------- ----------
1 1 leo_1
2 2 leo_2
3 3 leo_3
But the rownum
values are arbitrary. It would be just as valid for Oracle to return
Expected Result :
ROWNUM ID NAME
---------- ---------- ----------
1 2 leo_2
2 3 leo_3
3 1 leo_1
Need to delete duplicate records from the table using row_number()
WITH cte AS
(
SELECT ROW_NUMBER() OVER(PARTITION by ID ORDER BY name) AS Row
FROM test
)
DELETE FROM cte
WHERE Row > 1
Deleting duplicate record in SQL Server
You need to reference the CTE in the delete statement...
WITH a as
(
SELECT Firstname,ROW_NUMBER() OVER(PARTITION by Firstname, empID ORDER BY Firstname)
AS duplicateRecCount
FROM dbo.tblEmployee
)
--Now Delete Duplicate Records
DELETE FROM a
WHERE duplicateRecCount > 1
Delete duplicate records without using ROW_NUMBER() function
As the motivation for this question seems to be academic interest rather than practical use...
The table has no primary key but the undocumented pseudo column %%physloc%%
can provide a substitute.
DELETE T1
FROM YourTable T1 WITH(TABLOCKX)
WHERE CAST(T1.%%physloc%% AS BIGINT)
NOT IN (SELECT MAX(CAST(%%physloc%% AS BIGINT))
FROM YourTable
GROUP BY Name, Salary)
In reality you should never use the above and just use row_number
as it is more efficient and documented.
(Data Explorer Demo)
How to delete the duplicates from the table using row_number() in psql?
How about this:
PostgreSQL DELETE statement with USING clause
Ref: PostgreSQL Docs
Related Topics
How to Access Latest Row for Each Individual Identifier
Have Pl/Sql Outputs in Real Time
Extract Email Address from String Using Tsql
Sql 2008 Vs 2012 Error: Incorrect Syntax Near The Keyword 'Compute'
Creating Trigger That Runs on Two Tables
Oracle Locking with Select...For Update Of
Order by Column1 If Column1 Is Not Null, Otherwise Order by Column2
Sql - Select Max() and Accompanying Field
Update X Set Y = Null Takes a Long Time
Sql Server Freetext Match - How to Sort by Relevance
How to Delete Duplicate Records in Sql
Writing a Recursive SQL Query on a Self-Referencing Table
How to Open Multiple .Sql Files in Only One Ssms Instance
Oracle Analytic Functions - Resetting a Windowing Clause
How to Do a Count(Distinct) Using Window Functions with a Frame in SQL Server