How to find a document's size inside MongoDB with the Ruby driver
You can use BSON.serialize and find the length of the resulting byte buffer. See http://www.mongodb.org/display/DOCS/BSON#BSON-Ruby for an example of using BSON.serialize.
Get MongoDB object size through Ruby connector
As of Mongo's Ruby Driver 2.0 release, BSON.serialize is removed. If you have a BSON::Document, you can transform it to a BSON::ByteBuffer by calling to_bson, and then get its size by calling length on that.
Example:
BSON::Document.new({a: 1}).to_bson.length
=> 12
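To see where the 12 comes from, the byte layout can be tallied by hand. This is a minimal sketch in plain Ruby, assuming only 32-bit integer and string values; bson_size is a hypothetical helper, not part of the bson gem. It follows the BSON layout of a 4-byte length header, typed elements, and a trailing null byte.

```ruby
# Hypothetical helper: computes the BSON-encoded size of a flat hash by hand.
# Only int32-range Integers and Strings are supported in this sketch.
def bson_size(doc)
  size = 4 # int32 header holding the total document length
  doc.each do |key, value|
    size += 1                     # element type byte
    size += key.to_s.bytesize + 1 # key stored as a null-terminated cstring
    size += case value
            when Integer
              unless value.between?(-2**31, 2**31 - 1)
                raise ArgumentError, "only int32 values are supported"
              end
              4                      # int32 payload
            when String
              4 + value.bytesize + 1 # length prefix, bytes, trailing null
            else
              raise ArgumentError, "unsupported type: #{value.class}"
            end
  end
  size + 1 # null byte that terminates the document
end

puts bson_size({ a: 1 }) # 4 + 1 + 2 + 4 + 1 = 12
puts bson_size({ "hello" => "world" })
```

For real documents, to_bson.length remains the thing to call, since it covers every BSON type.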
Finding the size of a Mongo::Collection::View
collection.find({ foo: 'bar' }).count()
should solve your problem. There is no size method available in mongo but there is count.
Query Mongo Embedded Documents with a size
The problem with the current approach is that the standard MongoDB query forms do not actually "filter" the nested array documents in any way, and that filtering is essentially what you need in order to "find the duplicates" within your documents.
MongoDB's aggregation framework is probably the best approach for this. There is no direct "mongoid" style approach to such queries, as those are geared towards the existing "rails" style of dealing with relational documents.
You can, however, access the "moped" form through the .collection accessor on your model class:
Record.collection.aggregate([
  # Match arrays with two or more elements as candidates
  { "$match" => {
    "$and" => [
      { "fragments" => { "$not" => { "$size" => 0 } } },
      { "fragments" => { "$not" => { "$size" => 1 } } }
    ]
  }},
  # Unwind the arrays to "de-normalize" them into separate documents
  { "$unwind" => "$fragments" },
  # Group back and get counts of the "key" values
  { "$group" => {
    "_id" => { "_id" => "$_id", "source_id" => "$fragments.source_id" },
    "fragments" => { "$push" => "$fragments.id" },
    "count" => { "$sum" => 1 }
  }},
  # Match the keys found more than once
  { "$match" => { "count" => { "$gte" => 2 } } }
])
That would return you results like this:
{
"_id" : { "_id": "76561198045636214", "source_id": "source2" },
"fragments": ["76561198045636216","76561198045636217"],
"count": 2
}
That at least gives you something to work with when deciding how to deal with the "duplicates" here.
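To make the stages of the pipeline easier to follow, here is a plain Ruby emulation over an in-memory array. It is only a sketch: the sample records are invented for illustration, every document is assumed to have a fragments array, and nothing here touches the driver or the server.

```ruby
# Invented sample documents shaped like the ones the pipeline expects.
records = [
  { "_id" => "r1", "fragments" => [
    { "id" => "f1", "source_id" => "source1" },
    { "id" => "f2", "source_id" => "source2" },
    { "id" => "f3", "source_id" => "source2" }
  ] },
  { "_id" => "r2", "fragments" => [
    { "id" => "f4", "source_id" => "source1" }
  ] }
]

# First $match: keep documents whose array has two or more elements.
candidates = records.select { |doc| doc["fragments"].size >= 2 }

# $unwind: one [parent id, fragment] pair per array element.
unwound = candidates.flat_map do |doc|
  doc["fragments"].map { |fragment| [doc["_id"], fragment] }
end

# $group: key on parent id plus source_id, collect ids, count members.
grouped = unwound.group_by do |id, fragment|
  { "_id" => id, "source_id" => fragment["source_id"] }
end

groups = grouped.map do |key, pairs|
  { "_id" => key,
    "fragments" => pairs.map { |_, fragment| fragment["id"] },
    "count" => pairs.size }
end

# Final $match: keep only keys seen more than once.
duplicates = groups.select { |group| group["count"] >= 2 }

p duplicates
```

Only the r1/source2 group survives, because it is the only key that occurs twice within a single document.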
Mongoid: Query based on size of embedded document array
A nicer way would be to use the native syntax of MongoDB rather than resort to rails-like methods or JavaScript evaluation, as pointed to in the accepted answer of the question you link to, especially as evaluating a JavaScript condition will be much slower.
The logical extension of $exists for an array with some length greater than zero is to use "dot notation" and test for the presence of the "zero index", or first element, of the array:
Customer.collection.find({ "orders.0" => { "$exists" => true } })
The same can seemingly be done for any minimum length: to require at least n elements, test for the existence of index n-1.
Worth noting that for a "zero length" array exclusion the $size operator is also a valid alternative, when used with $not to negate the match:
Customer.collection.find({ "orders" => { "$not" => { "$size" => 0 } } })
But this does not apply well to larger "size" tests, as you would need to specify all sizes to be excluded:
Customer.collection.find({
  "$and" => [
    { "orders" => { "$not" => { "$size" => 4 } } },
    { "orders" => { "$not" => { "$size" => 3 } } },
    { "orders" => { "$not" => { "$size" => 2 } } },
    { "orders" => { "$not" => { "$size" => 1 } } },
    { "orders" => { "$not" => { "$size" => 0 } } }
  ]
})
So the other syntax is clearer:
Customer.collection.find({ "orders.4" => { "$exists" => true } })
Which means 5 or more members in a concise way.
Please also note that none of these conditions alone can use an index, so if you have another filtering condition that can, it is best to include that condition first.
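The "index n-1" rule is easy to get off by one, so it can help to wrap it. This is a small sketch in which min_length_filter is a hypothetical helper name, not something Mongoid or the driver provides; it only builds the filter hash.

```ruby
# Hypothetical helper: builds a filter that matches documents whose array
# field holds at least `min` elements, by testing for index min - 1.
def min_length_filter(field, min)
  unless min.is_a?(Integer) && min >= 1
    raise ArgumentError, "min must be an integer of at least 1"
  end
  { "#{field}.#{min - 1}" => { "$exists" => true } }
end

p min_length_filter("orders", 1)
p min_length_filter("orders", 5)
```

The result could then be passed along as Customer.collection.find(min_length_filter("orders", 5)), which is the same "orders.4" query shown above.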
Mongo / Ruby driver output specific number of documents at a time?
Mongo::Collection#find returns a Mongo::Cursor that is Enumerable. For batch processing Enumerable#each_slice is your friend and well worth adding to your toolkit.
Hope that you like this.
find_each_slice_test.rb
require 'mongo'
require 'test/unit'

class FindEachSliceTest < Test::Unit::TestCase
  def setup
    @samplecoll = Mongo::MongoClient.new('localhost', 27017)['sampledb']['samplecoll']
    @samplecoll.remove
  end

  def test_find_each_slice
    12345.times { |i| @samplecoll.insert({ i: i }) }
    slice__max_size = 5000
    @samplecoll.find.each_slice(slice__max_size) do |slice|
      puts "slice.size: #{slice.size}"
      assert(slice__max_size >= slice.size)
    end
  end
end
ruby find_each_slice_test.rb
Run options:
# Running tests:
slice.size: 5000
slice.size: 5000
slice.size: 2345
.
Finished tests in 6.979301s, 0.1433 tests/s, 0.4298 assertions/s.
1 tests, 3 assertions, 0 failures, 0 errors, 0 skips
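The slicing behaviour itself comes from Enumerable, so it can be tried without a database. In this sketch a plain Enumerator stands in for the cursor; the documents variable and its contents are made up for illustration.

```ruby
# A lazy enumerator standing in for a cursor over 12345 documents.
documents = Enumerator.new do |yielder|
  12_345.times { |i| yielder << { i: i } }
end

# Collect the size of each batch handed to the block.
slice_sizes = []
documents.each_slice(5000) do |slice|
  slice_sizes << slice.size
end

p slice_sizes # [5000, 5000, 2345]
```

The final slice is simply whatever is left over, which is why the test above asserts "at most" the maximum size rather than an exact size.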
Ruby mongoDB and large documents
The paragraph about document growth finally solved my question. (Found by following Konrad's link.)
http://docs.mongodb.org/manual/core/data-model-operations/#data-model-document-growth
What I am now basically doing is this:
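require 'mongo'
include Mongo # legacy 1.x driver: makes MongoClient and Grid available unqualified

cli = MongoClient.new("localhost", MongoClient::DEFAULT_PORT)
db = cli.db("testdb")
coll = db.collection("test")
grid = Grid.new db

# store data: the large payload goes into GridFS, the document keeps only its id
id = grid.put "A"*17_000_000
data = {:name => "Customer1", :data1 => "some value", :log_file => id}
coll.save data

# access data: look up the document, then fetch the payload by the stored id
cust = coll.find({:name => "Customer1"})
id = cust.first["log_file"]
data = grid.get id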
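The reason the 17,000,000-byte string goes through the grid is that MongoDB caps a single BSON document at 16 megabytes (16 * 1024 * 1024 bytes). A rough check for when a payload needs this treatment might look like the sketch below; needs_gridfs? is a hypothetical helper, and it ignores the extra bytes taken by keys and other fields, so a payload slightly under the limit could still fail to fit.

```ruby
# Hypothetical helper: true when a payload alone already reaches the
# 16 MiB BSON document limit and so cannot be stored inline.
def needs_gridfs?(payload, limit: 16 * 1024 * 1024)
  payload.bytesize >= limit
end

puts needs_gridfs?("A" * 17_000_000) # true: 17,000,000 >= 16,777,216
puts needs_gridfs?("some value")     # false
```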
Count operation with parameters with mongodb ruby driver
Is it possible to use the count() MongoDB feature with filter parameters in some other way?
From the shell (command-line), you can do the following:
db.collection.find({ data : value}).count()
Obviously, you'll have to do something similar with Ruby, but it should be pretty straightforward.