Pagination with MongoDB

How to do pagination using range queries in MongoDB?

Since the collection I was paging had duplicate values, I had to create a compound index on ProductName and _id.

Create Compound Index

// ensureIndex() is deprecated; createIndex() is the current name
db.ProductGuideItem.createIndex({ ProductName: 1, _id: 1 });

This solved my problem.

Reference: https://groups.google.com/d/msg/mongodb-user/3EZZIRJzW_A/oYH79npKZHkJ

Assuming you have these values:

{a:1, b:1}
{a:2, b:1}
{a:2, b:2}
{a:2, b:3}
{a:3, b:1}

So you do this for the range based pagination (page size of 2):

1st Page

find().sort({a:1, b:1}).limit(2)
{a:1, b:1}
{a:2, b:1}

2nd Page

find().min({a:2, b:1}).sort({a:1, b:1}).skip(1).limit(2)

{a:2, b:2}
{a:2, b:3}

3rd Page

find().min({a:2, b:3}).sort({a:1, b:1}).skip(1).limit(2)
{a:3, b:1}

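Here are the docs for $min/$max:
http://www.mongodb.org/display/DOCS/min+and+max+Query+Specifiers

Note that min() and max() need an index on the fields used, and MongoDB 4.2 and later also require you to name that index with hint().

If you don't have duplicate values in your collection, you don't need to use min and max or create a compound index. You can just use $lt and $gt on the sort field.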
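
As a rough sketch of the same idea without min() and skip(), the next page after a compound key can also be selected with a range filter on both fields. The helper name nextPageFilter and the in-memory page function below are illustrative assumptions, not part of the original answer; the array only stands in for the collection so the logic can be checked without a server.

```javascript
// Sample documents from the example above (stand-in for the collection).
const docs = [
  { a: 1, b: 1 },
  { a: 2, b: 1 },
  { a: 2, b: 2 },
  { a: 2, b: 3 },
  { a: 3, b: 1 },
];

// Hypothetical helper: filter for documents that sort strictly after `last`
// in { a: 1, b: 1 } order. It could be passed to find() together with
// .sort({ a: 1, b: 1 }).limit(pageSize).
function nextPageFilter(last) {
  return {
    $or: [
      { a: { $gt: last.a } },
      { a: last.a, b: { $gt: last.b } },
    ],
  };
}

// Compare two documents by (a, b), matching sort({ a: 1, b: 1 }).
function compareKeys(x, y) {
  return x.a - y.a || x.b - y.b;
}

// In-memory equivalent of the query, used only to check the paging logic.
function page(last, pageSize) {
  return docs
    .filter((d) => last === null || compareKeys(d, last) > 0)
    .sort(compareKeys)
    .slice(0, pageSize);
}

const page1 = page(null, 2);
const page2 = page(page1[page1.length - 1], 2);
const page3 = page(page2[page2.length - 1], 2);
```

Because the filter excludes the last seen document itself, no skip(1) is needed between pages.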

How to use MongoDB aggregation for pagination?

To calculate totals and return a subset, you need to apply grouping and skip/limit to the same dataset. For that you can use the $facet stage.

For example, to show the 3rd page with 10 documents per page:

db.Order.aggregate([
    { '$match' : { "company_id" : ObjectId("54c0...") } },
    { '$sort' : { 'order_number' : -1 } },
    { '$facet' : {
        metadata: [ { $count: "total" }, { $addFields: { page: NumberInt(3) } } ],
        data: [ { $skip: 20 }, { $limit: 10 } ] // add a projection here if you wish to re-shape the docs
    } }
] )

It will return a single document with 2 fields:

{
    "metadata" : [
        {
            "total" : 300,
            "page" : 3
        }
    ],
    "data" : [
        {
            ... original document ...
        },
        {
            ... another document ...
        },
        {
            ... etc up to 10 docs ...
        }
    ]
}
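
The skip value in the pipeline above is just (page - 1) * pageSize, so the pipeline can be generated for any page. The following is a minimal sketch under that assumption; buildPagePipeline and unwrapPage are hypothetical helper names, and the plain page number is used where the original wraps it in NumberInt.

```javascript
// Hypothetical helper: builds the $match / $sort / $facet pipeline for a page.
function buildPagePipeline(match, sort, page, pageSize) {
  const skip = (page - 1) * pageSize; // page 3 with 10 per page skips 20
  return [
    { $match: match },
    { $sort: sort },
    {
      $facet: {
        metadata: [{ $count: "total" }, { $addFields: { page: page } }],
        data: [{ $skip: skip }, { $limit: pageSize }],
      },
    },
  ];
}

// Hypothetical helper: flattens the single result document.
// When nothing matches, $count emits no document, so metadata is empty.
function unwrapPage(result, pageSize) {
  const meta = result.metadata[0] || { total: 0 };
  return {
    total: meta.total,
    totalPages: Math.ceil(meta.total / pageSize),
    docs: result.data,
  };
}

// Example: the pipeline for page 3 of orders, newest order number first.
// The empty match object is a placeholder for a real filter.
const pipeline = buildPagePipeline({}, { order_number: -1 }, 3, 10);
```

The result of buildPagePipeline would be passed to aggregate(), and the single document it returns to unwrapPage.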

Implementing pagination in MongoDB

The concept you are talking about can be called "forward paging". The reason for the name is that, unlike the .skip() and .limit() modifiers, it cannot be used to "go back" to a previous page or to "skip" to a specific page, at least not without a great deal of effort to store "seen" or "discovered" pages. So if that type of "links to page" paging is what you want, you are best off sticking with the .skip() and .limit() approach, despite the performance drawbacks.

If it is a viable option for you to only "move forward", then here is the basic concept:

db.junk.find().limit(3)

{ "_id" : ObjectId("54c03f0c2f63310180151877"), "a" : 1, "b" : 1 }
{ "_id" : ObjectId("54c03f0c2f63310180151878"), "a" : 4, "b" : 4 }
{ "_id" : ObjectId("54c03f0c2f63310180151879"), "a" : 10, "b" : 10 }

Of course that's your first page with a limit of 3 items. Consider that now with code iterating the cursor:

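var lastSeen = null;
// Sort explicitly so that paging on _id is well defined
var cursor = db.junk.find().sort({ "_id": 1 }).limit(3);

while (cursor.hasNext()) {
    var doc = cursor.next();
    printjson(doc);
    if (!cursor.hasNext())
        lastSeen = doc._id;   // remember the _id of the last document on the page
}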

So that iterates the cursor and does something with each document, and when the last item in the cursor is reached you store its _id in lastSeen:

ObjectId("54c03f0c2f63310180151879")

In your subsequent iterations you just feed that _id value, which you keep (in the session or wherever), to the query:

// Same sort as the first page, starting after the last _id seen
var cursor = db.junk.find({ "_id": { "$gt": lastSeen } }).sort({ "_id": 1 }).limit(3);

while (cursor.hasNext()) {
    var doc = cursor.next();
    printjson(doc);
    if (!cursor.hasNext())
        lastSeen = doc._id;
}

{ "_id" : ObjectId("54c03f0c2f6331018015187a"), "a" : 1, "b" : 1 }
{ "_id" : ObjectId("54c03f0c2f6331018015187b"), "a" : 6, "b" : 6 }
{ "_id" : ObjectId("54c03f0c2f6331018015187c"), "a" : 7, "b" : 7 }

And the process repeats over and over until no more results can be obtained.
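
The repeat-until-empty loop can be sketched as follows. This is only an illustration: the junk array and fetchAfter function are in-memory stand-ins (assumptions, with numeric _id values instead of ObjectIds) for the $gt query on _id with an ascending sort and a limit.

```javascript
// In-memory stand-in for the collection; numeric _id values for brevity.
const junk = [1, 2, 3, 4, 5, 6, 7].map((n) => ({ _id: n }));

// Stand-in for: find({ _id: { $gt: lastSeen } }).sort({ _id: 1 }).limit(pageSize)
function fetchAfter(lastSeen, pageSize) {
  return junk
    .filter((d) => lastSeen === null || d._id > lastSeen)
    .sort((x, y) => x._id - y._id)
    .slice(0, pageSize);
}

// Keep requesting pages until a request comes back empty.
function collectPages(pageSize) {
  const pages = [];
  let lastSeen = null;
  for (;;) {
    const batch = fetchAfter(lastSeen, pageSize);
    if (batch.length === 0) break;
    pages.push(batch);
    lastSeen = batch[batch.length - 1]._id; // carry the last _id forward
  }
  return pages;
}

const allPages = collectPages(3);
```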

That's the basic process for a natural order such as _id. For something else it gets a bit more complex. Consider the following:

{ "_id": 4, "rank": 3 }
{ "_id": 8, "rank": 3 }
{ "_id": 1, "rank": 3 }
{ "_id": 3, "rank": 2 }

To split that into two pages sorted by rank, what you essentially need to know is what you have "already seen", so that you can exclude those results. So looking at a first page:

var lastSeen = null;
var seenIds = [];
var cursor = db.junk.find().sort({ "rank": -1 }).limit(2);

while (cursor.hasNext()) {
    var doc = cursor.next();
    printjson(doc);
    // A new rank value means the ids collected for the old one are no longer needed
    if ( doc.rank != lastSeen )
        seenIds = [];
    seenIds.push(doc._id);
    // Track the rank on every document so the check above stays correct for any page size
    lastSeen = doc.rank;
}

{ "_id": 4, "rank": 3 }
{ "_id": 8, "rank": 3 }

On the next iteration you want to be less than or equal to the lastSeen "rank" score, but also exclude those already seen documents. You do this with the $nin operator:

var cursor = db.junk.find(
    { "_id": { "$nin": seenIds }, "rank": { "$lte": lastSeen } }
).sort({ "rank": -1 }).limit(2);

while (cursor.hasNext()) {
    var doc = cursor.next();
    printjson(doc);
    // A new rank value means the ids collected for the old one are no longer needed
    if ( doc.rank != lastSeen )
        seenIds = [];
    seenIds.push(doc._id);
    // Track the rank on every document so the check above stays correct for any page size
    lastSeen = doc.rank;
}

{ "_id": 1, "rank": 3 }
{ "_id": 3, "rank": 2 }

How many "seenIds" you actually hold on to depends on how "granular" your results are, that is, how often that value is likely to change. In this case you can check if the current "rank" score is not equal to the lastSeen value and discard the present seenIds content so it does not grow too much.

Those are the basic concepts of "forward paging" for you to practice and learn.
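
To see the rank-based bookkeeping run end to end, here is a small in-memory sketch using the four sample documents. The fetchPage function and the state object are hypothetical stand-ins for the $nin / $lte query and the variables kept between requests; nothing here talks to a server.

```javascript
// The four sample documents, in the order the example returns them.
const ranked = [
  { _id: 4, rank: 3 },
  { _id: 8, rank: 3 },
  { _id: 1, rank: 3 },
  { _id: 3, rank: 2 },
];

// Stand-in for:
//   find({ _id: { $nin: seenIds }, rank: { $lte: lastSeen } })
//     .sort({ rank: -1 }).limit(pageSize)
// It also updates the paging state the same way the cursor loop does.
function fetchPage(state, pageSize) {
  const batch = ranked
    .filter(
      (d) =>
        state.lastSeen === null ||
        (d.rank <= state.lastSeen && !state.seenIds.includes(d._id))
    )
    .sort((x, y) => y.rank - x.rank) // stable sort keeps ties in input order
    .slice(0, pageSize);

  for (const doc of batch) {
    if (doc.rank !== state.lastSeen) state.seenIds = []; // rank changed
    state.seenIds.push(doc._id);
    state.lastSeen = doc.rank;
  }
  return batch;
}

const state = { lastSeen: null, seenIds: [] };
const first = fetchPage(state, 2);
const second = fetchPage(state, 2);
const third = fetchPage(state, 2);
```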

What is the best way for pagination on mongodb using java

When talking about pagination in MongoDB, it is easy to write this code:

collection.find().skip(pageSize*(pageNum-1)).limit(pageSize);

Above is the native solution supported by MongoDB, but it is not efficient when the collection holds a huge number of documents. Suppose you have 100M documents and you want the data starting at the middle offset (the 50 millionth document). MongoDB has to walk from the beginning of the result set to the specified offset, which performs poorly. As your offset increases, performance keeps degrading.

The root cause is the skip() command, which is inefficient and cannot benefit much from an index.

Below is another solution that improves performance when paginating large data sets:

The typical usage scenario for pagination is a table or list that shows the data of a specified page, along with 'Previous Page' and 'Next Page' buttons that load the previous or next page.

If you have the '_id' of the last document in the current page, you can use a range condition in find() instead of skip(). Use _id > currentPage_LastDocument._id as one of the criteria to find the next page's data. Here is pseudocode:

// Page 1 (sort by _id so the range condition below is well defined)
docs = collection.find().sort({'_id': 1}).limit(pageSize);
// Get the _id of the last document in this page
last_id = ...

// Page 2
docs = collection.find({'_id': {$gt: last_id}}).sort({'_id': 1}).limit(pageSize);
// Update last_id with the _id of the last document in this page
last_id = ...

This avoids making MongoDB walk through a large number of documents, as it must with skip().
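
The pseudocode above only covers the 'Next Page' button. One possible way to handle 'Previous Page' with the same idea, offered as an assumption rather than something the answer states, is to query for _id values below the first document of the current page in descending order and then reverse the result. The sketch below checks that logic against an in-memory array; the function names are illustrative.

```javascript
// In-memory stand-in for a collection with ascending _id values.
const collection = [1, 2, 3, 4, 5, 6, 7].map((n) => ({ _id: n }));

// Next page, like:
//   find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(pageSize)
function nextPage(lastId, pageSize) {
  return collection
    .filter((d) => lastId === null || d._id > lastId)
    .sort((x, y) => x._id - y._id)
    .slice(0, pageSize);
}

// Previous page, like:
//   find({ _id: { $lt: firstId } }).sort({ _id: -1 }).limit(pageSize)
// then reversed so the page is shown in ascending order again.
function previousPage(firstId, pageSize) {
  return collection
    .filter((d) => d._id < firstId)
    .sort((x, y) => y._id - x._id)
    .slice(0, pageSize)
    .reverse();
}

const p1 = nextPage(null, 3);
const p2 = nextPage(p1[p1.length - 1]._id, 3);
const backToP1 = previousPage(p2[0]._id, 3);
```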


