Efficient Way to Implement Paging

efficient way to implement paging

Trying to give you a brief answer to your doubt, if you execute the skip(n).take(m) methods on linq (with SQL 2005 / 2008 as database server) your query will be using the Select ROW_NUMBER() Over ... statement, with is somehow direct paging in the SQL engine.

Giving you an example, I have a db table called mtcity and I wrote the following query (work as well with linq to entities):

using (DataClasses1DataContext c = new DataClasses1DataContext())
{
var query = (from MtCity2 c1 in c.MtCity2s
select c1).Skip(3).Take(3);
//Doing something with the query.
}

The resulting query will be:

SELECT [t1].[CodCity], 
[t1].[CodCountry],
[t1].[CodRegion],
[t1].[Name],
[t1].[Code]
FROM (
SELECT ROW_NUMBER() OVER (
ORDER BY [t0].[CodCity],
[t0].[CodCountry],
[t0].[CodRegion],
[t0].[Name],
[t0].[Code]) AS [ROW_NUMBER],
[t0].[CodCity],
[t0].[CodCountry],
[t0].[CodRegion],
[t0].[Name],
[t0].[Code]
FROM [dbo].[MtCity] AS [t0]
) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t1].[ROW_NUMBER]

Which is a windowed data access (pretty cool, btw cuz will be returning data since the very begining and will access the table as long as the conditions are met). This will be very similar to:

With CityEntities As 
(
Select ROW_NUMBER() Over (Order By CodCity) As Row,
CodCity //here is only accessed by the Index as CodCity is the primary
From dbo.mtcity
)
Select [t0].[CodCity],
[t0].[CodCountry],
[t0].[CodRegion],
[t0].[Name],
[t0].[Code]
From CityEntities c
Inner Join dbo.MtCity t0 on c.CodCity = t0.CodCity
Where c.Row Between @p0 + 1 AND @p0 + @p1
Order By c.Row Asc

With the exception that, this second query will be executed faster than the linq result because it will be using exclusively the index to create the data access window; this means, if you need some filtering, the filtering should be (or must be) in the Entity listing (where the row is created) and some indexes should be created as well to keep up the good performance.

Now, whats better?

If you have pretty much solid workflow in your logic, implementing the proper SQL way will be complicated. In that case LINQ will be the solution.

If you can lower that part of the logic directly to SQL (in a stored procedure), it will be even better because you can implement the second query I showed you (using indexes) and allow SQL to generate and store the Execution Plan of the query (improving performance).

MySQL Data - Best way to implement paging?

From the MySQL documentation:

The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants (except when using prepared statements).

With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):

SELECT * FROM tbl LIMIT 5,10;  # Retrieve rows 6-15

To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:

SELECT * FROM tbl LIMIT 95,18446744073709551615;

With one argument, the value specifies the number of rows to return from the beginning of the result set:

SELECT * FROM tbl LIMIT 5;     # Retrieve first 5 rows

In other words, LIMIT row_count is equivalent to LIMIT 0, row_count.

best way to get large data from database with pagination in php?

The best practice and best ways in pagination is using the limit and offset and also optimizing the output.

Also Pear::Pager can help with the output, and to limit the database traffic to only what you need, you can use a wrapper provided in the examples that come with it.

you may take a look at this article too. it's providing data about Optimized Pagination using MySQL, making the difference between counting the total amount of rows, and pagination.

Efficient paging in Web Application

I'm not convinced either of your links demonstrate efficent paging

link 1 - the LINQ example
In order for this to be effificent I would expect to see something in the form

var myDataSource = data.Select(x => x.Parameter = "input").Skip(1).Take(10);

The principle is that the skip and take methods just retrieve the page of data that you want

link 2 - the SQL Example

Again not convinced - I would expect to see something in the SQL that uses ROW_OVER() or some other evidence of the SQL only bringing back the page of data that you want. This link gives an example of using ROW_OVER with SQL Server 2005 (2008 might have improved - i don't know TBH - someone else might and i would be interested).

Generally

I you have ASP.Net 3.5 or higher I would use the LINQ example - it's a lot more straight forward than trying to do it in SQL

This link gives a better example using LINQ with the Take and Skip operators to do efficient paging with a ListView. It also addresses the issue of having to get a total count of your records in order to display the number of pages which is a common requirement.

Also - this SO question gives some very good examples of efficient paging so I do recommend you read that.

LINQ Caution

As the commenter below pointed out - you need to ensure that skip and take are executing on the DB not the entire dataset being realised, brought back and the Skip and Take executing on the client. Profiler is your friend here. Also is worth knowing which operators realise the query and fire it off to the database.

var myDataSource = data.Select(x => x.Parameter = "input").Skip(1).Take(10);

above probably OK

var myDataSource = data.Select(x => x.Parameter = "input")
.ToList().Skip(1).Take(10);

Ooops no good. The ToList() method will cause the SQL to be fired off to the database before Skip and Take has done it's work.

Efficiently paging large data sets with LINQ

The first statement does not execute the actual SQL query, it only builds part of the query you intend to run.

It is when you call query.Count() that the first will be executed

SELECT COUNT(*) FROM Table WHERE Something = something

On query.Skip().Take() won't execute the query either, it is only when you try to enumerate the results(doing a foreach over paged or calling .ToList() on it) that it will execute the appropriate SQL statement retrieving only the rows for the page (using ROW_NUMBER).

If watch this in the SQL Profiler you will see that exactly two queries are executed and at no point it will try to retrieve the full table.


Be careful when you are using the debugger, because if you step after the first statement and try to look at the contents of query that will execute the SQL query. Maybe that is the source of your misunderstanding.

Is there any better option to apply pagination without applying OFFSET in SQL Server?

You can use Keyset Pagination for this. It's far more efficient than using Rowset Pagination (paging by row number).

In Rowset Pagination, all previous rows must be read, before being able to read the next page. Whereas in Keyset Pagination, the server can jump immediately to the correct place in the index, so no extra rows are read that do not need to be.

For this to perform well, you need to have a unique index on that key, which includes any other columns you need to query.

In this type of pagination, you cannot jump to a specific page number. You jump to a specific key and read from there. So you need to save the unique ID of page you are on and skip to the next. Alternatively, you could calculate or estimate a starting point for each page up-front.

One big benefit, apart from the obvious efficiency gain, is avoiding the "missing row" problem when paginating, caused by rows being removed from previously read pages. This does not happen when paginating by key, because the key does not change.


Here is an example:

Let us assume you have a table called TableName with an index on Id, and you want to start at the latest Id value and work backwards.

You begin with:

SELECT TOP (@numRows)
*
FROM TableName
ORDER BY Id DESC;

Note the use of ORDER BY to ensure the order is correct

In some RDBMSs you need LIMIT instead of TOP

The client will hold the last received Id value (the lowest in this case). On the next request, you jump to that key and carry on:

SELECT TOP (@numRows)
*
FROM TableName
WHERE Id < @lastId
ORDER BY Id DESC;

Note the use of < not <=

In case you were wondering, in a typical B-Tree+ index, the row with the indicated ID is not read, it's the row after it that's read.


The key chosen must be unique, so if you are paging by a non-unique column then you must add a second column to both ORDER BY and WHERE. You would need an index on OtherColumn, Id for example, to support this type of query. Don't forget INCLUDE columns on the index.

SQL Server does not support row/tuple comparators, so you cannot do (OtherColumn, Id) < (@lastOther, @lastId) (this is however supported in PostgreSQL, MySQL, MariaDB and SQLite).

Instead you need the following:

SELECT TOP (@numRows)
*
FROM TableName
WHERE (
(OtherColumn = @lastOther AND Id < @lastId)
OR OtherColumn < @lastOther
)
ORDER BY
OtherColumn DESC,
Id DESC;

This is more efficient than it looks, as SQL Server can convert this into a proper < over both values.

The presence of NULLs complicates things further. You may want to query those rows separately.

Paging with PagedList, is it efficient?

Naturally paging will require knowledge of the total result count in order for the logic to determine how many pages there are etc. However instead of bringing down all the results just build your query to the Database to return the paged amount (e.g 30) and as well as the count of all the results.

For example, if you were using Entity Framework, or LINQ2SQL you could do something like this

IQueryable allResults = MyRepository.RetrieveAll();

var resultGroup = allResults.OrderByDescending(r => r.DatePosted)
.Skip(60)
.Take(30)
.GroupBy(p => new {Total = allResults.Count()})
.First();

var results = new ResultObject
{
ResultCount = resultGroup.Key.Total,
Results = resultGrouping.Select(r => r)
};

Because we haven't done a .ToList() on our result set until we have finalised what we want, we haven't brought the results into memory. This is done when we call the .First() on our result set.

Finally our Object that we end up with (ResultObject) can be used to then do the paging later on. As we have the count, we already know what page we are on (3 as we skipped 60, with 30 per page) and we have the results to display.

Further Reading and Information

How To: Page through Query Results

Server Side Paging with Entity Frame



Related Topics



Leave a reply



Submit