How I Know My Document's Size Inside MongoDB with the Ruby Driver

How do I know my document's size inside MongoDB with the Ruby driver

You can use BSON.serialize and find the length of the resulting byte buffer. See http://www.mongodb.org/display/DOCS/BSON#BSON-Ruby for an example of using BSON.serialize.

Get MongoDB object size through Ruby connector

BSON.serialize was removed in the 2.0 release of Mongo's Ruby driver. If you have a BSON::Document, you can convert it to a BSON::ByteBuffer by calling to_bson on it, and then get its size in bytes by calling length on the result.

Example:

BSON::Document.new({a: 1}).to_bson.length
=> 12
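
To see where the 12 comes from, the BSON bytes for {a: 1} can be assembled by hand. This is a minimal sketch in plain Ruby that does not use the bson gem. The helper name bson_int32_document is hypothetical, and it only handles a single int32 field, not documents in general.

```ruby
# Hand-assemble the BSON bytes for a one-field int32 document such as { a: 1 }.
# Illustrative only: this is not the bson gem and supports just one int32 field.
def bson_int32_document(name, value)
  type_byte = [0x10].pack('C')        # 1 byte: element type, 0x10 means int32
  key = name.b + "\x00".b             # field name as a NUL-terminated C string
  int32 = [value].pack('l<')          # 4 bytes: little-endian signed int32
  element = type_byte + key + int32
  total = 4 + element.bytesize + 1    # length prefix + element + terminator
  [total].pack('l<') + element + "\x00".b
end

puts bson_int32_document('a', 1).bytesize
```

The total is 4 bytes of length prefix, 1 type byte, 2 bytes for the key, 4 bytes for the value, and 1 terminating byte.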

Finding the size of a Mongo::Collection::View

collection.find({ foo: 'bar' }).count()

should solve your problem. Mongo::Collection::View has no size method, but it does have count.

Query Mongo Embedded Documents with a size

The problem with the current approach is that the standard MongoDB query forms do not actually "filter" the contents of a nested array in any way, and that is essentially what you need in order to find the duplicates within your documents.

The aggregation framework is probably the best way to do this. There is no direct "mongoid" style approach to such queries, as those helpers are geared towards the existing "rails" style of dealing with relational documents.

You can, however, reach the underlying "moped" form through the .collection accessor on your model class:

Record.collection.aggregate([

    # Keep only arrays with two or more elements as candidates
    { "$match" => {
        "$and" => [
            { "fragments" => { "$not" => { "$size" => 0 } } },
            { "fragments" => { "$not" => { "$size" => 1 } } }
        ]
    }},

    # Unwind the arrays to "de-normalize" as documents
    { "$unwind" => "$fragments" },

    # Group back and get counts of the "key" values
    { "$group" => {
        "_id" => { "_id" => "$_id", "source_id" => "$fragments.source_id" },
        "fragments" => { "$push" => "$fragments.id" },
        "count" => { "$sum" => 1 }
    }},

    # Match the keys found more than once
    { "$match" => { "count" => { "$gte" => 2 } } }
])

That would return you results like this:

{
    "_id" : { "_id": "76561198045636214", "source_id": "source2" },
    "fragments": ["76561198045636216","76561198045636217"],
    "count": 2
}

That at least gives you something to work with when deciding how to deal with the "duplicates".
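
To make the pipeline stages concrete, the same unwind, group and match logic can be imitated in plain Ruby over a single in-memory record. This is only a sketch for understanding: the helper name and the sample record are hypothetical (the record's shape is inferred from the field names used in the pipeline), and the real work is better left to the aggregation on the server.

```ruby
# In-memory imitation of the $unwind / $group / $match stages above.
# The record shape is an assumption based on the pipeline's field names.
def duplicate_fragments(record)
  # Like $unwind + $group: bucket the embedded fragments by source_id
  grouped = record.fetch("fragments", []).group_by { |fragment| fragment["source_id"] }

  # Like the final $match: keep only keys seen two or more times
  duplicates = grouped.select { |_source_id, group| group.size >= 2 }

  duplicates.map do |source_id, group|
    {
      "_id" => { "_id" => record["_id"], "source_id" => source_id },
      "fragments" => group.map { |fragment| fragment["id"] },
      "count" => group.size
    }
  end
end

# Hypothetical sample record
record = {
  "_id" => "76561198045636214",
  "fragments" => [
    { "id" => "76561198045636215", "source_id" => "source1" },
    { "id" => "76561198045636216", "source_id" => "source2" },
    { "id" => "76561198045636217", "source_id" => "source2" }
  ]
}

p duplicate_fragments(record)
```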

Mongoid: Query based on size of embedded document array

A nicer way would be to use the native syntax of MongoDB rather than resort to rails like methods or JavaScript evaluation as pointed to in the accepted answer of the question you link to, especially as evaluating a JavaScript condition will be much slower.

The logical extension of $exists for an array with some length greater than zero is to use "dot notation" and test for the presence of the "zero index" or first element of the array:

Customer.collection.find({ "orders.0" => { "$exists" => true } })

The same works for any minimum length n: test for the existence of index n-1, the last position an array of exactly n elements would have.

Note that to exclude "zero length" arrays specifically, the $size operator is also a valid alternative when used with $not to negate the match:

Customer.collection.find({ "orders" => { "$not" => { "$size" => 0 } } })

But this does not apply well to larger "size" tests, as you would need to specify all sizes to be excluded:

Customer.collection.find({ 
    "$and" => [ 
        { "orders" => { "$not" => { "$size" => 4 } } }, 
        { "orders" => { "$not" => { "$size" => 3 } } },
        { "orders" => { "$not" => { "$size" => 2 } } },
        { "orders" => { "$not" => { "$size" => 1 } } },
        { "orders" => { "$not" => { "$size" => 0 } } }
    ]
})

So the other syntax is clearer:

Customer.collection.find({ "orders.4" => { "$exists" => true } })

This matches arrays with 5 or more members in a concise way.
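
The "index n-1 exists" rule can be wrapped in a small helper that builds the filter hash. This is a sketch only: min_array_length_filter is a hypothetical name, not part of Mongoid or the driver, and it just produces the same hash as the hand-written queries above.

```ruby
# Build a filter matching documents whose array field has at least
# `min_length` elements, by testing that index (min_length - 1) exists.
# Hypothetical helper, not a driver or Mongoid API.
def min_array_length_filter(field, min_length)
  raise ArgumentError, "min_length must be at least 1" if min_length < 1
  { "#{field}.#{min_length - 1}" => { "$exists" => true } }
end

p min_array_length_filter("orders", 5)
```

The returned hash can be passed straight to a call such as Customer.collection.find.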

Please also note that none of these conditions alone can use an index, so if you have another filtering condition that can, it is best to include that condition first.

Mongo / Ruby driver output specific number of documents at a time?

Mongo::Collection#find returns a Mongo::Cursor that is Enumerable. For batch processing Enumerable#each_slice is your friend and well worth adding to your toolkit.

Hope that you like this.

find_each_slice_test.rb

require 'mongo'
require 'test/unit'

class FindEachSliceTest < Test::Unit::TestCase
  def setup
    @samplecoll = Mongo::MongoClient.new('localhost', 27017)['sampledb']['samplecoll']
    @samplecoll.remove
  end

  def test_find_each_slice
    12345.times{|i| @samplecoll.insert( { i: i } ) }
    slice__max_size = 5000
    @samplecoll.find.each_slice(slice__max_size) do |slice|
      puts "slice.size: #{slice.size}"
      assert(slice__max_size >= slice.size)
    end
  end
end

ruby find_each_slice_test.rb

Run options: 

# Running tests:

slice.size: 5000
slice.size: 5000
slice.size: 2345
.

Finished tests in 6.979301s, 0.1433 tests/s, 0.4298 assertions/s.

1 tests, 3 assertions, 0 failures, 0 errors, 0 skips
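
The slicing behaviour does not depend on MongoDB at all, so it can be checked without a server. In this sketch a plain Range stands in for the cursor (an assumption made purely for illustration), and slice_sizes is a hypothetical helper name.

```ruby
# Any Enumerable works with each_slice; here a Range stands in for the cursor.
def slice_sizes(enumerable, max_size)
  enumerable.each_slice(max_size).map(&:size)
end

p slice_sizes(0...12_345, 5000)  # => [5000, 5000, 2345]
```

Only the last slice can be smaller than the requested size, which matches the test output above.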

Ruby MongoDB and large documents

The paragraph about document growth finally solved my question. (Found by following Konrad's link.)

http://docs.mongodb.org/manual/core/data-model-operations/#data-model-document-growth

What I am now basically doing is this:

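require 'mongo'
include Mongo

cli = MongoClient.new("localhost", MongoClient::DEFAULT_PORT)
db = cli.db("testdb")
coll = db.collection("test")
grid = Grid.new(db)

# Store the large payload in GridFS and keep only its id in the document
id = grid.put("A" * 17_000_000)
data = { :name => "Customer1", :data1 => "some value", :log_file => id }
coll.save(data)

# Read the document back, then fetch the payload from GridFS by its id
cust = coll.find({ :name => "Customer1" })
id = cust.first["log_file"]
data = grid.get(id)  # returns a GridIO; call data.read for the contents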
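
The reason for routing the payload through GridFS is size: MongoDB caps a single BSON document at 16 MB (16,777,216 bytes), and the 17,000,000-byte string above is over that. A rough pre-check in plain Ruby might look like the sketch below. The helper name is hypothetical, and comparing only the payload's byte size is a simplification, since the rest of the document adds overhead as well.

```ruby
# Rough check for whether a payload alone is already over the 16 MB
# BSON document limit. Hypothetical helper; ignores the other fields' size.
def exceeds_document_limit?(payload, limit = 16 * 1024 * 1024)
  payload.bytesize > limit
end

puts exceeds_document_limit?("A" * 17_000_000)
puts exceeds_document_limit?("some value")
```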

Count operation with parameters with mongodb ruby driver

Is it possible to use MongoDB's count() feature with filter parameters in some other way?

From the shell (command-line), you can do the following:

db.collection.find({ data : value}).count()

The Ruby equivalent is the same chain: call find with the filter on the collection, then count on the result, as in the collection.find({ foo: 'bar' }).count() example above.
