SQL Rank() Versus Row_Number()

ROW_NUMBER: Returns a unique number for each row, starting with 1. For rows that have duplicate values, numbers are assigned arbitrarily.

RANK: Assigns a number to each row, starting with 1. Rows that have duplicate values receive the same rank, and a gap appears in the sequence after each group of duplicates.
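To make the two definitions concrete, here is a minimal sketch that runs both functions over a small table containing a tie. It uses Python's built-in sqlite3 module only as a convenient way to execute the SQL. That is an assumption of this example: the discussion on this page is about SQL Server, and window functions need SQLite 3.25 or later. The `scores` table and its data are hypothetical.

```python
import sqlite3

# In-memory database; the table and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("ann", 90), ("bob", 85), ("cat", 85), ("dan", 70)],  # bob and cat tie
)

rows = conn.execute(
    """
    SELECT name,
           points,
           ROW_NUMBER() OVER (ORDER BY points DESC) AS row_num,
           RANK()       OVER (ORDER BY points DESC) AS rnk
    FROM scores
    ORDER BY points DESC, name
    """
).fetchall()

# ROW_NUMBER hands out 1..4 with no repeats; which of bob/cat gets 2 and
# which gets 3 is not defined by the ORDER BY, so only the set is checked.
row_numbers = sorted(r[2] for r in rows)

# RANK gives both tied rows rank 2 and then skips 3.
ranks = [r[3] for r in rows]

for r in rows:
    print(r)
```

With this data the ranks come out as 1, 2, 2, 4, while the row numbers are always 1 through 4.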

TOP vs RANK/ROW_NUMBER functions: which performs better?

In your example, TOP is somewhat more efficient.

[Image: execution plan for the TOP query]

The TOP N sort with N=1 just needs to keep track of the row with the lowest birthDate that it sees.

For the ROW_NUMBER query, the optimizer recognises that the row number is always ascending and adds a TOP 1 to the plan itself, but it does not combine the separate TOP and SORT operators into a TOP N Sort, so it does a full sort of all 5 rows.

[Image: execution plan for the ROW_NUMBER query]

In the case that an index supplies rows in the desired order without the need for a sort there won't be much in it. The row_number query will have an extra couple of operators that are fairly inexpensive anyway.

Why use ranking functions in SQL Server when it has TOP

Ranking functions in general are more powerful than TOP.

For the cases where both would work consider that TOP is a fairly ancient proprietary syntax and not standard SQL. It was in the product a long time before window functions were added. If portable SQL is a concern you should not use TOP.

Though you might not use ranking functions either, as another (standard SQL) alternative is:

SELECT d.ID, d.fName, d.lName, d.birthDate
FROM fData d
ORDER BY d.birthDate
OFFSET 0 ROWS
FETCH NEXT 1 ROW ONLY

which gives the same plan as TOP 1.

SQL Server ROW_NUMBER() and RANK() functions in detail: how they work

  • OVER Clause (Transact-SQL)

  • Ranking Functions (Transact-SQL)

  • ROW_NUMBER (Transact-SQL)

  • RANK (Transact-SQL)

Window functions: difference between DENSE_RANK and ROW_NUMBER

In your query, the difference between using dense_rank() and row_number() is that the former allows top ties, while the latter does not.

So if two (or more) records have the same, earliest transaction_settled_at for a given signup_id, then the condition dense_rank() ... = 1 will keep all of them, while row_number() will select an arbitrary one of them.

If there is no risk of ties, both functions will produce the same result set in your context.

To reduce the possibility of ties, you can also add further sorting criteria to the order by clause of the window function:

dense_rank() over (
partition by signup_id
order by transaction_settled_at, some_other_column desc, some_more_column
)
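The tie behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the original query: it runs on SQLite (3.25 or later) through Python's sqlite3 module, the table name `transactions`, the `amount` column, and all data are hypothetical, and only `signup_id` and `transaction_settled_at` come from the discussion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "signup_id INTEGER, transaction_settled_at TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 10),  # signup 1 has two rows tied for earliest
        (1, "2024-01-01", 20),
        (1, "2024-02-01", 30),
        (2, "2024-01-05", 5),
        (2, "2024-03-01", 15),
    ],
)


def first_rows(ranking_fn, order_by="transaction_settled_at"):
    """Return (signup_id, amount) for rows ranked first within each signup_id.

    ranking_fn and order_by are fixed strings chosen in this script, so
    interpolating them into the SQL text is safe here.
    """
    sql = f"""
        SELECT signup_id, amount
        FROM (
            SELECT signup_id, amount,
                   {ranking_fn}() OVER (
                       PARTITION BY signup_id
                       ORDER BY {order_by}
                   ) AS rn
            FROM transactions
        ) AS t
        WHERE rn = 1
        ORDER BY signup_id, amount
    """
    return conn.execute(sql).fetchall()


# DENSE_RANK keeps both tied rows for signup 1.
with_ties = first_rows("DENSE_RANK")

# ROW_NUMBER keeps exactly one row per signup; which tied row is arbitrary.
one_each = first_rows("ROW_NUMBER")

# An extra sort key (hypothetical: amount DESC) breaks the tie.
tie_broken = first_rows("DENSE_RANK", "transaction_settled_at, amount DESC")

print(with_ties, one_each, tie_broken)
```

With the tie-breaker in place, both functions return a single, deterministic row per signup_id for this data.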

Row_number skip values

I suggest using the DENSE_RANK() with the columns you have hidden (--*,):

SELECT
row_num AS id,
include_appt,
CASE WHEN include_appt is not null
THEN ROW_NUMBER() OVER(ORDER BY (SELECT 0))
+ 1
- DENSE_RANK() OVER(
PARTITION BY /*some hidden columns*/
ORDER BY/*some hidden columns*/)
ELSE NULL
END AS row_num2
FROM C
ORDER BY row_num

Then the result will be:
[Image: query result]

SQL Server RANK: get first row from a set of rows

Typically, the fastest method (with the right index) is a subquery:

select 1 as rownum, salary, name, endmonth
from employee e
where e.endmonth = (select max(e2.endmonth)
from employee e2
);

The index you want is on employee(endmonth).

If you know there will be one row, then order by with fetch first (or your database's equivalent) is the best approach:

select 1 as rownum, salary, name, endmonth
from employee e
order by e.endmonth desc
fetch first 1 row only;

If your database supports WITH TIES, then you can use that. For instance, in SQL Server:

select top (1) with ties 1 as rownum, salary, name, endmonth
from employee e
order by e.endmonth desc;
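As a rough comparison of how these approaches treat ties, the sketch below runs their closest SQLite equivalents through Python's sqlite3 module. Several things here are assumptions rather than part of the answer above: SQLite has neither FETCH FIRST nor TOP ... WITH TIES, so LIMIT 1 stands in for the former and a RANK() = 1 filter for the latter (window functions need SQLite 3.25 or later), and the sample rows and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER, endmonth TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("ann", 100, "2024-03"),  # ann and bob share the latest endmonth
        ("bob", 120, "2024-03"),
        ("cat", 90, "2024-01"),
    ],
)
# The index suggested for the subquery approach.
conn.execute("CREATE INDEX idx_employee_endmonth ON employee (endmonth)")

# Subquery on the maximum: returns every row with the latest endmonth.
via_subquery = conn.execute(
    """
    SELECT name
    FROM employee e
    WHERE e.endmonth = (SELECT MAX(e2.endmonth) FROM employee e2)
    ORDER BY name
    """
).fetchall()

# ORDER BY with LIMIT 1 (stand-in for FETCH FIRST 1 ROW ONLY):
# exactly one row, and which tied row comes back is arbitrary.
via_limit = conn.execute(
    "SELECT name FROM employee ORDER BY endmonth DESC LIMIT 1"
).fetchall()

# RANK() = 1 (stand-in for TOP (1) WITH TIES): all tied rows.
via_rank = conn.execute(
    """
    SELECT name
    FROM (
        SELECT name, RANK() OVER (ORDER BY endmonth DESC) AS rnk
        FROM employee
    ) AS t
    WHERE rnk = 1
    ORDER BY name
    """
).fetchall()

print(via_subquery, via_limit, via_rank)
```

The subquery and the rank filter agree on this data, while the LIMIT 1 form is only appropriate when a single row is expected.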

