Multiple Limit Condition in Mongodb

What you are describing is a relatively common question in the MongoDB community, which we could call the "top n results problem": given some input that is likely sorted in some way, how do you get the top n results per grouping without relying on arbitrary index values in the data?

MongoDB has the $first operator, available in the aggregation framework, which deals with the "top 1" part of the problem, since it takes the "first" item found on a grouping boundary such as your "type". Getting more than one result per group is, of course, a little more involved. There are some JIRA issues about modifying other operators to deal with n results or to "restrict" or "slice" the output, notably SERVER-6074. But the problem can be handled in a few ways.
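
For illustration, here is a minimal shell sketch of that "top 1" behaviour, run against a collection like the "item" collection used below (without a preceding $sort, $first simply takes the first document encountered per group, i.e. insertion order here):

db.item.aggregate([
    { "$group": {
        "_id": "$type",
        "one": { "$first": { "pos": "$pos", "type": "$type" } }
    }}
])
// => { "_id": "A", "one": { "pos": "First", "type": "A" } }
//    { "_id": "B", "one": { "pos": "First", "type": "B" } }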

Popular implementations of the Rails Active Record pattern for MongoDB storage are Mongoid and MongoMapper; both allow access to the "native" MongoDB collection functions via a .collection accessor. This is what you basically need in order to use native methods such as .aggregate(), which supports more functionality than general Active Record aggregation.

Here is an aggregation approach with Mongoid, though the general code does not change once you have access to the native collection object:

require "mongoid"
require "pp";

Mongoid.configure.connect_to("test");

class Item
include Mongoid::Document
store_in collection: "item"

field :type, type: String
field :pos, type: String
end

Item.collection.drop

Item.collection.insert( :type => "A", :pos => "First" )
Item.collection.insert( :type => "A", :pos => "Second" )
Item.collection.insert( :type => "A", :pos => "Third" )
Item.collection.insert( :type => "A", :pos => "Forth" )
Item.collection.insert( :type => "B", :pos => "First" )
Item.collection.insert( :type => "B", :pos => "Second" )
Item.collection.insert( :type => "B", :pos => "Third" )
Item.collection.insert( :type => "B", :pos => "Forth" )

res = Item.collection.aggregate([
  # Stack all documents per "type" and also keep the first one seen
  { "$group" => {
    "_id"  => "$type",
    "docs" => {
      "$push" => { "pos" => "$pos", "type" => "$type" }
    },
    "one" => {
      "$first" => { "pos" => "$pos", "type" => "$type" }
    }
  }},
  # Unwind the stack and flag the entry already taken as "seen"
  { "$unwind" => "$docs" },
  { "$project" => {
    "docs" => {
      "pos"  => "$docs.pos",
      "type" => "$docs.type",
      "seen" => { "$eq" => [ "$one", "$docs" ] }
    },
    "one" => 1
  }},
  # Filter out the document already taken, then take the next one off the stack
  { "$match" => { "docs.seen" => false } },
  { "$group" => {
    "_id" => "$_id",
    "one" => { "$first" => "$one" },
    "two" => {
      "$first" => { "pos" => "$docs.pos", "type" => "$docs.type" }
    },
    "splitter" => {
      "$first" => { "$literal" => ["one", "two"] }
    }
  }},
  # Split the "one"/"two" pair back out into individual documents
  { "$unwind" => "$splitter" },
  { "$project" => {
    "_id" => 0,
    "type" => {
      "$cond" => [
        { "$eq" => [ "$splitter", "one" ] },
        "$one.type",
        "$two.type"
      ]
    },
    "pos" => {
      "$cond" => [
        { "$eq" => [ "$splitter", "one" ] },
        "$one.pos",
        "$two.pos"
      ]
    }
  }}
])

pp res

The naming in the documents is not actually used by the code; the "First", "Second" etc. values in the data are just there to illustrate that you are indeed getting the "top 2" documents from the listing as a result.

So the approach here is essentially to create a "stack" of the documents "grouped" by your key, such as "type". The very first thing is to take the "first" document from that stack using the $first operator.

The subsequent steps match the "seen" elements from the stack and filter them out, then take the "next" document off the stack, again using the $first operator. The final steps are really just there to return the documents to the original form as found in the input, which is generally what is expected from such a query.

So the result is, of course, just the top 2 documents for each type:

{ "type"=>"A", "pos"=>"First" }
{ "type"=>"A", "pos"=>"Second" }
{ "type"=>"B", "pos"=>"First" }
{ "type"=>"B", "pos"=>"Second" }

There is a longer discussion and version of this, as well as other solutions, in this recent answer:

Mongodb aggregation $group, restrict length of array

It is essentially the same problem despite the title; that case was looking to match the top 10 entries or more. There is some pipeline generation code there for dealing with larger matches, as well as some alternate approaches that may be considered depending on your data.
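
As one hedged example of such an alternate approach (assuming MongoDB 3.2+ for the $slice aggregation expression, and using the same "item" collection as above), the whole group can be pushed into an array and then sliced to the desired length, which is practical when each group comfortably fits in a single document:

var n = 2;  // number of documents to keep per "type"
db.item.aggregate([
    { "$group": {
        "_id": "$type",
        "docs": { "$push": { "pos": "$pos", "type": "$type" } }
    }},
    { "$project": { "docs": { "$slice": [ "$docs", n ] } } },
    { "$unwind": "$docs" },
    { "$project": { "_id": 0, "type": "$docs.type", "pos": "$docs.pos" } }
])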

Does MongoDB's $in clause have any max limit on the number of arguments

There is no limit to the number of arguments in the $in clause itself; however, the total query size is limited to 16MB, since a query is just a BSON document. Depending on the type used for the ids (see the BSON specification), you may start running into problems when the number of ids is on the order of a few million.
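
As a rough sketch of where that boundary sits (collection name hypothetical): each ObjectId in the array costs 12 bytes plus a few bytes of per-element BSON overhead, so a list of a few million ids approaches or exceeds the 16MB query document limit, while a list in the tens or hundreds of thousands is generally fine.

var ids = [];
for (var i = 0; i < 100000; i++) { ids.push(ObjectId()); }  // ~100k ids: well under 16MB
db.items.find({ "_id": { "$in": ids } }).count();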

MongoDB limit group by results

You can try:

  • $match documents by the pname condition
  • $sort by pname in ascending order (optional)
  • $group by vname and $push the root document into an items array
  • $project to show the required fields and take the first 4 objects using $slice

db.collection.aggregate([
    { $match: { pname: "xy" } },
    { $sort: { pname: 1 } },
    { $group: {
        _id: "$vname",
        items: { $push: "$$ROOT" }
    }},
    { $project: {
        _id: 0,
        vname: "$_id",
        items: { $slice: ["$items", 4] }
    }}
])

If you want all the objects at the root, you can add the following stages after the above pipeline:

  • $unwind to deconstruct the items array into individual documents
  • $replaceRoot to replace the root with each items object

{ $unwind: "$items" },
{ $replaceRoot: { newRoot: "$items" } }

How does the limit() option work in mongodb?

The first 50 documents of the result set will be returned.

If you do not sort the documents (or if the order is not well-defined, such as sorting by a field with values that occur multiple times in the result set), the order may change from one execution to the next.
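
A minimal sketch of making the order well-defined (field names hypothetical): sort on a unique field, or add _id as a tie-breaker, before applying the limit.

db.posts.find()
    .sort({ "createdAt": -1, "_id": 1 })  // _id breaks ties between equal timestamps
    .limit(50)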

Will it auto-sort them (maybe by their creation date) or not?

No.

Will the query return the same documents every time it is called?

The query may produce the same results for a while and then start producing different results if, for example, another document is inserted into the collection.

Meaning, will MongoDB query all documents then limit the results to 50 documents, or will it query the 50 documents only?

It depends on the query. If an index is used, only the needed documents will be read from the storage engine. If an in-memory sort stage is used in the query execution, all matching documents will be read from storage and sorted, then the required number will be returned and the rest discarded.
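
A hedged sketch of checking which case applies (index and field names hypothetical): with an index matching the sort, the plan walks the index in order and stops after 50 documents; without it, a blocking SORT stage has to read and sort every matching document first.

db.posts.createIndex({ "createdAt": -1 })
db.posts.find().sort({ "createdAt": -1 }).limit(50).explain("executionStats")
// With the index, totalDocsExamined stays close to 50; with an in-memory sort it
// reflects every document scanned for the whole query.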

Mongodb aggregation $group followed by $limit for pagination

I have solved the problem without needing to maintain another collection, and even without $group traversing the whole collection, hence I am posting my own answer.

As others have pointed out:

  1. $group doesn't retain order, so sorting early is not of much help.
  2. $group doesn't do any optimization even when it is followed by $limit, i.e., it runs $group over the entire collection.

My use case has the following unique features, which helped me solve it:

  1. There will be a maximum of 10 records per student (and a minimum of 1).
  2. I am not very particular about the page size; the front end is capable of handling varying page sizes.

The following is the aggregation command I have used:

db.classtest.aggregate([
    { $sort: { name: 1 } },
    { $limit: 5 * 10 },
    { $group: {
        _id: "$name",
        total: { $sum: "$marks" }
    }},
    { $sort: { _id: 1 } }
])

Explaining the above:

  1. If $sort immediately precedes $limit, the optimizer can coalesce the two, so only the limited amount of data is passed to the next stage (see the aggregation pipeline optimization documentation).
  2. To get a minimum of 5 records (page size), I need to pass at least 5 (page size) * 10 (max records per student) = 50 records to the $group stage. With this, the size of the final result may be anywhere between 0 and 50.
  3. If the result has fewer than 5 groups, no further pagination is required.
  4. If the result size is greater than 5, there is a chance that the last student's records were not completely processed (i.e., not all of that student's records were grouped), so I discard the last record from the result.
  5. The name in the last retained record is then used as the $match criterion in the subsequent page request, as shown below.

db.classtest.aggregate([
    { $match: { name: { $gt: lastRecordName } } },
    { $sort: { name: 1 } },
    { $limit: 5 * 10 },
    { $group: {
        _id: "$name",
        total: { $sum: "$marks" }
    }},
    { $sort: { _id: 1 } }
])

In the above, the framework will still optimize $match, $sort and $limit together as a single operation, which I have confirmed through the explain plan.
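
For completeness, here is a hedged sketch of the surrounding paging logic described in points 3 to 5 above (the page-size and per-student constants, and the fetchPage helper name, are assumptions for illustration):

var PAGE = 5;             // minimum page size
var MAX_PER_STUDENT = 10; // maximum records per student

function fetchPage(lastRecordName) {
    // Only later pages carry the $match stage; the first page starts from the beginning.
    var match = lastRecordName ? [ { $match: { name: { $gt: lastRecordName } } } ] : [];
    var results = db.classtest.aggregate(match.concat([
        { $sort: { name: 1 } },
        { $limit: PAGE * MAX_PER_STUDENT },
        { $group: { _id: "$name", total: { $sum: "$marks" } } },
        { $sort: { _id: 1 } }
    ])).toArray();

    // If more groups came back than the page size, the last student may have been
    // cut off by $limit, so discard that group; its _id is the cursor for the next page.
    if (results.length > PAGE) { results.pop(); }
    return results;
}

var page1 = fetchPage(null);
var page2 = fetchPage(page1[page1.length - 1]._id);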


