Row_Number() Over Not Fast Enough with Large Result Set, Any Good Solution


Years back, while working with SQL Server 2000, which did not have this function, we had the same issue.

We found the following method. At first glance it looks as if it would perform badly, but it blew us out of the water.

Try this out:

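    DECLARE @Table TABLE (
        ID INT PRIMARY KEY
    )

    -- Insert some test values, as many as required.
    DECLARE @I INT
    SET @I = 0
    WHILE @I < 100000
    BEGIN
        INSERT INTO @Table SELECT @I
        SET @I = @I + 1
    END

    -- @Start = number of rows to skip, @Count = page size.
    DECLARE @Start INT,
            @Count INT

    SELECT @Start = 10001,
           @Count = 50

    -- Note: TOP (@variable) requires SQL Server 2005 or later.
    SELECT *
    FROM (
        -- Keep the last @Count rows of the first @Start + @Count rows...
        SELECT TOP (@Count) *
        FROM (
            -- ...taken in ascending key order.
            SELECT TOP (@Start + @Count) *
            FROM @Table
            ORDER BY ID ASC
        ) TopAsc
        ORDER BY ID DESC
    ) TopDesc
    -- Restore ascending order for the final page.
    ORDER BY ID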
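The double-TOP idea can be tried outside SQL Server too. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module, SQLite's LIMIT in place of TOP (SQLite has no TOP), and a hypothetical table t. It checks the result against a plain LIMIT/OFFSET query. It also shows one caveat of the trick. When fewer than @Count rows remain, the outer TOP still returns @Count rows, so the last page overlaps the previous one.

```python
import sqlite3


def double_top_page(conn, start, count):
    """Emulate the TOP/TOP trick: rows start+1 .. start+count in ID order.

    SQLite has no TOP, so LIMIT stands in for it here.
    """
    sql = """
        SELECT ID FROM (
            SELECT ID FROM (
                SELECT ID FROM t ORDER BY ID ASC LIMIT ?
            ) AS TopAsc
            ORDER BY ID DESC LIMIT ?
        ) AS TopDesc
        ORDER BY ID
    """
    # First LIMIT plays TOP (@Start + @Count), second plays TOP (@Count).
    return [row[0] for row in conn.execute(sql, (start + count, count))]


def offset_page(conn, start, count):
    """Reference result: skip `start` rows, then take `count`."""
    sql = "SELECT ID FROM t ORDER BY ID LIMIT ? OFFSET ?"
    return [row[0] for row in conn.execute(sql, (count, start))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t (ID) VALUES (?)", ((i,) for i in range(100000)))

page = double_top_page(conn, 10001, 50)
print(page[0], page[-1], len(page))       # 10001 10050 50

# Caveat: a short last page returns the last `count` rows of the whole set.
tail = double_top_page(conn, 99990, 50)
print(tail[0], len(tail))                 # 99950 50
print(len(offset_page(conn, 99990, 50)))  # 10
```

If the last page matters, one option is to cap @Count at the number of remaining rows before running the query.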

How can I speed up row_number in Oracle?

ROW_NUMBER is quite inefficient in Oracle.

See the article in my blog for performance details:

  • Oracle: ROW_NUMBER vs ROWNUM

For your specific query, I'd recommend replacing it with ROWNUM and making sure that the index is used:

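    SELECT  *
    FROM (
        -- table, column and index_on_column are placeholders for your own names.
        -- The hints force an ascending scan of the index on column, so rows arrive
        -- already in column order and ROWNUM (assigned before ORDER BY) matches it.
        SELECT /*+ INDEX_ASC(t index_on_column) NOPARALLEL_INDEX(t index_on_column) */
               t.*, ROWNUM AS rn
        FROM table t
        ORDER BY
               column
    )
    WHERE rn >= :start
      AND rownum <= :end - :start + 1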

This query will use COUNT STOPKEY, so Oracle can stop reading the index as soon as enough rows have been returned.

Also, either make sure your column is not nullable, or add a WHERE column IS NOT NULL condition.

Otherwise the index cannot be used to retrieve all values, because Oracle B-tree indexes do not store entries whose key columns are all NULL.

Note that you cannot use ROWNUM BETWEEN :start and :end without a subquery.

ROWNUM is always assigned last and checked last; that's why ROWNUMs always come in order without gaps.

If you use ROWNUM BETWEEN 10 AND 20, the first row that satisfies all other conditions becomes a candidate for returning, is temporarily assigned ROWNUM = 1, and fails the ROWNUM BETWEEN 10 AND 20 test.

Then the next row becomes a candidate, is also assigned ROWNUM = 1 and fails, and so on; in the end, no rows are returned at all.

The workaround is to compute ROWNUM in a subquery and filter on it in the outer query.
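The ROWNUM rule described above can be modelled in a few lines of plain Python. This is a toy simulation, not Oracle's implementation: the function name apply_rownum_filter is hypothetical. It only captures the rule that each candidate row tentatively receives the next ROWNUM, and the counter advances only when the row passes the filter.

```python
def apply_rownum_filter(rows, predicate):
    """Toy model of Oracle's ROWNUM assignment during filtering.

    Each candidate gets ROWNUM = (rows returned so far) + 1; the counter
    only advances when the candidate passes the predicate.
    """
    result = []
    for row in rows:
        rownum = len(result) + 1  # tentative ROWNUM for this candidate
        if predicate(rownum):
            result.append((rownum, row))
    return result


rows = [f"row{i}" for i in range(1, 31)]

# WHERE ROWNUM BETWEEN 10 AND 20: every candidate is ROWNUM = 1 and fails.
direct = apply_rownum_filter(rows, lambda rn: 10 <= rn <= 20)
print(direct)  # []

# Workaround: materialise ROWNUM in a subquery (rn), then filter on rn outside.
inner = apply_rownum_filter(rows, lambda rn: True)   # SELECT t.*, ROWNUM AS rn ...
outer = [row for rn, row in inner if 10 <= rn <= 20]  # WHERE rn BETWEEN 10 AND 20
print(outer[0], outer[-1], len(outer))  # row10 row20 11
```

A filter such as ROWNUM <= 3 works directly, because the first candidates do satisfy it; only lower bounds above 1 need the subquery.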

Efficient way of getting @@rowcount from a query using row_number

Check out the COUNT(*) aggregate when used with OVER(PARTITION BY ...), like so:

    SELECT
        ROW_NUMBER() OVER (ORDER BY object_id, column_id) AS RowNum
      , COUNT(*) OVER (PARTITION BY 1) AS TotalRows  -- one partition: same as COUNT(*) OVER ()
      , *
    FROM master.sys.columns

This is IMHO the best way to do it without having to do two queries.
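The same single-query idea can be combined with paging. This is a sketch using Python's sqlite3 as a stand-in for SQL Server. It assumes the bundled SQLite is 3.25 or newer (window function support), and the items table and fetch_page helper are hypothetical. Both window functions are computed inside the CTE, so the total counts every row, not just the page that survives the outer WHERE. COUNT(*) OVER () is used here, which is equivalent to the PARTITION BY 1 form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 96)])  # 95 rows


def fetch_page(conn, page_no, page_size):
    """Return (rows, total_rows) for a 1-based page, using one query."""
    sql = """
        WITH numbered AS (
            SELECT id, name,
                   ROW_NUMBER() OVER (ORDER BY id) AS row_num,
                   COUNT(*)     OVER ()            AS total_rows
            FROM items
        )
        SELECT row_num, id, name, total_rows
        FROM numbered
        WHERE row_num BETWEEN ? AND ?
        ORDER BY row_num
    """
    first = (page_no - 1) * page_size + 1
    rows = conn.execute(sql, (first, first + page_size - 1)).fetchall()
    # The total is repeated on every row; an empty page cannot report it.
    total = rows[0][3] if rows else 0
    return rows, total


rows, total = fetch_page(conn, 10, 10)
print(len(rows), total)  # 5 95
```

One design caveat: a page past the end returns no rows, and therefore no total; callers that need the count in that case still need a separate COUNT(*).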

SQL performance: WHERE vs WHERE(ROW_NUMBER)

The 2nd answer is your best choice. It takes into account the fact that you could have holes (gaps) in your ID column. I'd rewrite it as a CTE instead of a subquery, though:

    ;WITH MyCTE AS
    (
        SELECT *,
               ROW_NUMBER() OVER (ORDER BY ID) AS RowNum
        FROM [Table]  -- placeholder table name
    )
    SELECT *
    FROM MyCTE
    WHERE RowNum >= @start
      AND RowNum <= @end
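Why gaps matter can be checked on a small table. The sketch uses Python's sqlite3 (assuming SQLite 3.25+ for ROW_NUMBER) with a hypothetical table t whose IDs skip every multiple of 3. Filtering on the ID itself returns a short page, while filtering on the row number always returns end - start + 1 rows when that many exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER PRIMARY KEY)")
# IDs with holes: every multiple of 3 is missing.
conn.executemany("INSERT INTO t (ID) VALUES (?)",
                 [(i,) for i in range(1, 61) if i % 3 != 0])

start, end = 11, 20

# Filtering on the key itself: the holes shrink the page.
by_id = [r[0] for r in conn.execute(
    "SELECT ID FROM t WHERE ID BETWEEN ? AND ? ORDER BY ID", (start, end))]

# Filtering on ROW_NUMBER(): a full page regardless of holes.
by_row = [r[0] for r in conn.execute("""
    WITH MyCTE AS (
        SELECT ID, ROW_NUMBER() OVER (ORDER BY ID) AS RowNum
        FROM t
    )
    SELECT ID FROM MyCTE
    WHERE RowNum >= ? AND RowNum <= ?
    ORDER BY RowNum
""", (start, end))]

print(len(by_id), len(by_row))  # 7 10
```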

What are the differences between the older row_number() and the newer OFFSET + FETCH based pagination in SQL Server?

Using ROW_NUMBER() works fine - it's just more work than necessary; you need to write a "skeleton" CTE around your actual query, add the ROW_NUMBER() column to your output set, and then filter on that.

Using the new OFFSET / FETCH is simpler - and yes, it's also better for performance, as these two links can show you:

  • New T-SQL features in SQL Server 2012
  • Comparing performance for different SQL Server paging

So overall: if you're using SQL Server 2012 or later, you should definitely use OFFSET/FETCH rather than ROW_NUMBER() for paging.
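In SQL Server 2012 the native form is ORDER BY ... OFFSET @skip ROWS FETCH NEXT @take ROWS ONLY. SQLite has no FETCH clause, so the sketch below uses its LIMIT ... OFFSET ... as a stand-in, through Python's sqlite3 (assuming SQLite 3.25+ for ROW_NUMBER). It only shows that the two styles return the same page for the same deterministic ordering. It says nothing about their relative performance; the orders table and both helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, category TEXT)")
# category repeats, so id is added to ORDER BY to make the order deterministic.
conn.executemany("INSERT INTO orders (id, category) VALUES (?, ?)",
                 [(i, f"cat{i % 7}") for i in range(1, 201)])


def page_row_number(conn, skip, take):
    """ROW_NUMBER() style: number every row in a CTE, then filter on the number."""
    sql = """
        WITH numbered AS (
            SELECT id, category,
                   ROW_NUMBER() OVER (ORDER BY category, id) AS rn
            FROM orders
        )
        SELECT id, category FROM numbered
        WHERE rn > ? AND rn <= ?
        ORDER BY rn
    """
    return conn.execute(sql, (skip, skip + take)).fetchall()


def page_offset(conn, skip, take):
    """LIMIT/OFFSET style, SQLite's counterpart of OFFSET ... FETCH NEXT ... ROWS ONLY."""
    sql = "SELECT id, category FROM orders ORDER BY category, id LIMIT ? OFFSET ?"
    return conn.execute(sql, (take, skip)).fetchall()


for skip in (0, 40, 190):
    assert page_row_number(conn, skip, 20) == page_offset(conn, skip, 20)
print(len(page_offset(conn, 190, 20)))  # 10
```

The second function is visibly shorter: no wrapper CTE and no extra row-number column in the output, which is the convenience argument made in the answer.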

How do I translate a query that uses ROW_NUMBER() into linq?

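You can write the query as below:

    // Assumes `data` is an IEnumerable<T> or IQueryable<T> whose elements
    // have Sno, Name and Age properties (requires using System.Linq;).
    var pageIndex = 1;   // 1-based page number
    var pageSize = 10;
    var skip = pageSize * (pageIndex - 1);

    var page = data
        .OrderBy(x => x.Name)          // sort first, like ROW_NUMBER() OVER (ORDER BY Name)
        .Skip(skip)
        .Take(pageSize)
        .AsEnumerable()                // number the page in memory, after sorting and paging
        .Select((x, i) => new
        {
            RowIndex = skip + i + 1,   // row number within the whole sorted set
            Sno = x.Sno,
            Name = x.Name,
            Age = x.Age
        })
        .ToList();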
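The ordering issue is not specific to LINQ: row numbers must be assigned after sorting, just as ROW_NUMBER() OVER (ORDER BY Name) does. This Python sketch uses a hypothetical Person record with the Sno/Name/Age fields from the snippet. It contrasts numbering after sorting and paging with numbering in input order first, which is what you get when a counter is incremented in a projection that runs before the sort.

```python
from dataclasses import dataclass


@dataclass
class Person:
    sno: int
    name: str
    age: int


def page_with_row_index(people, page_index, page_size):
    """Sort, page, then number: the index follows the sorted order."""
    skip = page_size * (page_index - 1)
    ordered = sorted(people, key=lambda p: p.name)
    page = ordered[skip:skip + page_size]
    return [(skip + i + 1, p.name) for i, p in enumerate(page)]


def number_then_sort(people):
    """Number first, then sort: the index follows the input order."""
    numbered = [(i + 1, p.name) for i, p in enumerate(people)]
    return sorted(numbered, key=lambda pair: pair[1])


people = [Person(1, "Carol", 30), Person(2, "Alice", 25), Person(3, "Bob", 41)]
print(page_with_row_index(people, 1, 10))  # [(1, 'Alice'), (2, 'Bob'), (3, 'Carol')]
print(number_then_sort(people))            # [(2, 'Alice'), (3, 'Bob'), (1, 'Carol')]
```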

Optimizing Select SQL request with millions of entries

I haven't tried it, because I don't have a MySQL database at hand, but this query seems much simpler:

    select *
    from billing
    where billing_id in (select min(billing_id)
                         from billing
                         group by subscription_id)
      and billing_value not like 'not_ok%';  -- in LIKE, "_" matches any single character

The inner select gets the minimum billing_id for each subscription. The outer select then fetches the rest of each of those billing records.

If performance is an issue, I'd add the billing_id field to the third index, so you get an index on (subscription_id, billing_id). This will help the inner query.
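The query's semantics can be checked on a few rows. This sketch runs it through Python's sqlite3 instead of MySQL, with a made-up billing layout and sample data, and creates the (subscription_id, billing_id) index from the answer. It shows one behaviour worth confirming against the original requirement. The query takes each subscription's first billing row and then drops it if it is 'not_ok', so a subscription whose first row is 'not_ok' disappears entirely rather than falling back to its first 'ok' row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE billing (
        billing_id      INTEGER PRIMARY KEY,
        subscription_id INTEGER NOT NULL,
        billing_value   TEXT NOT NULL
    );
    CREATE INDEX idx_billing_sub_id ON billing (subscription_id, billing_id);
""")
conn.executemany(
    "INSERT INTO billing (billing_id, subscription_id, billing_value) VALUES (?, ?, ?)",
    [
        (1, 100, "ok"),
        (2, 100, "not_ok: retry"),
        (3, 200, "not_ok: declined"),  # first row of subscription 200
        (4, 200, "ok"),
        (5, 300, "ok"),
    ],
)

query = """
    SELECT billing_id, subscription_id, billing_value
    FROM billing
    WHERE billing_id IN (SELECT MIN(billing_id)
                         FROM billing
                         GROUP BY subscription_id)
      AND billing_value NOT LIKE 'not_ok%'
    ORDER BY billing_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 100, 'ok'), (5, 300, 'ok')]
```

Prefixing the query with EXPLAIN QUERY PLAN in SQLite (or EXPLAIN in MySQL) is a quick way to see whether the composite index is actually used for the inner query.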


