Calculate Skip Value for Given Record for Sorted Paging

Calculate skip value for given record for sorted paging

This is called "forward paging" which is a concept you can use to "efficiently page" through results in a "forward" direction when using "sorted" results.

JavaScript logic included (because it works in the shell), but not hard to translate.

The concept in general:

{ "_id": 1, "a": 3 },
{ "_id": 2, "a": 3 },
{ "_id": 3, "a": 3 },
{ "_id": 4, "a": 2 },
{ "_id": 5, "a": 1 },
{ "_id": 6, "a": 0 }

Consider those "already sorted" documents ( for convienience ) as an example of results we want to "page" by "two" items per page.

In the first instance you do something like this:

var lastVal = null,
lastSeen = [];

db.collection.find().sort({ "a": -1 }).limit(2).forEach(function(doc) {
if ( lastVal != doc.a ) {
lastSeen = [];
}
lastVal = doc.a;
lastSeen.push( doc._id );
// do something useful with each document matched
});

Now those lastVal and lastSeen are something you store in something like a "session variable" than can be accessed on the next request in terms of web applications, or otherwise something similar where not.

What they should contain though are the very last value you were sorting on and the list of "unique" _id values that were seen since that value did not change. Hence:

lastVal = 3,
lastSeen = [1,2];

The point is that when the request for the "next page" comes around then you want to use those variables for something like this:

var lastVal = 3,
lastSeen = [1,2];

db.collection.find({
"_id": { "$nin": lastSeen },
"a": { "$lte": lastVal }
}).sort({ "a": -1 }).limit(2).forEach(function(doc) {
if ( lastVal != doc.a ) {
lastSeen = [];
}
lastVal = doc.a;
lastSeen.push( doc._id );
// do something useful with each document matched
});

What that does is "exclude" all values of _id that are recorded in lastSeen from the list of results, as well as make sure that all results need to be "less than or equal to" ( descending order ) the lastVal recorded for the sort field "a".

This yields the next two results in the collection:

{ "_id": 3, "a": 3 },
{ "_id": 4, "a": 2 },

But after processing our values now look like this:

lastVal = 2,
lastSeen = [4];

So now the logic follows that you don't need to exclude the other _id values seen before since you are only really looking for values of "a" than are "less than or equal to" the lastVal and since there was only "one" _id value seen at that value then only exclude that one.

This of course yields the next page on using the same code as just above:

{ "_id": 5, "a": 1 },
{ "_id": 6, "a": 0 }

That is the most effiecient way to "forward page" through results in general and is particularly useful for efficient paging of "sorted" results.

If however you want to "jump" to page 20 or similar action at any stage then this is not for you. You are stuck with the traditional .skip() and .limit() approach to be able to do this by "page number" since there is no other rational way to "calculate" this.

So it all depends on how your application is implementing "paging" and what you can live with. The .skip() and .limit() approach suffers the performance of "skipping" and can be avoided by using the approach here.

On the other hand, if you want "jump to page" then "skipping" is your only real option unless you want to build a "cache" of results. But that's another issue entirely.

When using skip and take to page data, how can I get the total record count without a separate query?

Assuming you are building a query in the selecting event, the best you could do is construct the full query, get and save the count, then take or skip it into the e.result.

And by best I mean, the easiest read code from a single query, rather than two. You'll still be running two separate evaluations on the database though. Use query analyser to see if the statements are a 'Select Count' then a 'Select take' or a dirty big select pared down by LINQ after the retrieve. I think LINQ does the former.

Calculating item offset for pagination

Use offset = (page - 1) * itemsPerPage + 1.

Linq Paging - How to incorporate total record count

I know i could run another query, but figured there may be another way of achieving this count without having to run the query twice.

No, you have to run the query.

You can do something like:

var source = from p in matches
orderby p.PROMOTION_NM descending
select p;
var count = source.Count();
var promotionInfo = source.Skip(startRow).Take(pageSize).ToList();

Be advised, however, that Skip(0) isn't free.

Dealing with paging when [partially] duplicate records exist

It can be done witn using a nested select with Row_Number and pick up the appropriate rows with Partition By, something like this:

select
<some fields>,
count(*) over.. as _$total_records
From
(
select
<some fields>,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY reference) rw
From
<some inline view>
) tbl
WHERE tbl.rw=1
ORDER BY ...
OFFSET ...
FETCH ...

Get the total number of records when doing pagination

To get the total number of records before skip/take you have to run a separate query. Getting the actual number returned would use Count(), but wouldn't result in another query if the original query was materialized.

var q = from x in base.EntityDataContext.Corporates 
select x;

var total = q.Count();
var cs = q.Skip(10).Take(10);
var numberOnSelectedPage = cs.Count();

Is there any better option to apply pagination without applying OFFSET in SQL Server?

You can use Keyset Pagination for this. It's far more efficient than using Rowset Pagination (paging by row number).

In Rowset Pagination, all previous rows must be read, before being able to read the next page. Whereas in Keyset Pagination, the server can jump immediately to the correct place in the index, so no extra rows are read that do not need to be.

For this to perform well, you need to have a unique index on that key, which includes any other columns you need to query.

In this type of pagination, you cannot jump to a specific page number. You jump to a specific key and read from there. So you need to save the unique ID of page you are on and skip to the next. Alternatively, you could calculate or estimate a starting point for each page up-front.

One big benefit, apart from the obvious efficiency gain, is avoiding the "missing row" problem when paginating, caused by rows being removed from previously read pages. This does not happen when paginating by key, because the key does not change.


Here is an example:

Let us assume you have a table called TableName with an index on Id, and you want to start at the latest Id value and work backwards.

You begin with:

SELECT TOP (@numRows)
*
FROM TableName
ORDER BY Id DESC;

Note the use of ORDER BY to ensure the order is correct

In some RDBMSs you need LIMIT instead of TOP

The client will hold the last received Id value (the lowest in this case). On the next request, you jump to that key and carry on:

SELECT TOP (@numRows)
*
FROM TableName
WHERE Id < @lastId
ORDER BY Id DESC;

Note the use of < not <=

In case you were wondering, in a typical B-Tree+ index, the row with the indicated ID is not read, it's the row after it that's read.


The key chosen must be unique, so if you are paging by a non-unique column then you must add a second column to both ORDER BY and WHERE. You would need an index on OtherColumn, Id for example, to support this type of query. Don't forget INCLUDE columns on the index.

SQL Server does not support row/tuple comparators, so you cannot do (OtherColumn, Id) < (@lastOther, @lastId) (this is however supported in PostgreSQL, MySQL, MariaDB and SQLite).

Instead you need the following:

SELECT TOP (@numRows)
*
FROM TableName
WHERE (
(OtherColumn = @lastOther AND Id < @lastId)
OR OtherColumn < @lastOther
)
ORDER BY
OtherColumn DESC,
Id DESC;

This is more efficient than it looks, as SQL Server can convert this into a proper < over both values.

The presence of NULLs complicates things further. You may want to query those rows separately.



Related Topics



Leave a reply



Submit