Mongodb: Benefit of Using Objectid VS a String Containing an Id

MongoDb: Benefit of using ObjectID vs a string containing an Id?

The biggest reason is that ObjectIDs are 12 bytes, whereas the equivalent hex string is 24 bytes. Over a large enough collection, those 12 bytes saved per ID really add up! Smaller IDs also mean fewer bytes transferred over the wire when reading or writing the document.
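To make the size difference concrete, here is a minimal sketch in plain Node.js. It does not use MongoDB at all and assumes only Node's global Buffer; the sample ID is an illustrative value.

```javascript
// Sample 24-character hex ID (illustrative value only).
const hexId = "56b0d36c23da2af0363abe37";

// Decoding the hex yields the 12 raw bytes that an ObjectId stores.
const rawBytes = Buffer.from(hexId, "hex").length;

// Stored as a string, each hex character takes one byte in UTF-8.
const stringBytes = Buffer.byteLength(hexId, "utf8");

console.log(rawBytes, stringBytes); // 12 24
```

This counts only the ID payload. A BSON string also carries a length prefix and a terminating byte, so the real gap per stored ID is slightly larger.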

Additionally, some ODMs expect ObjectIDs for external document references, and may be confused by string versions of the ID. I am not familiar enough with PHP ODMs to say if this might affect you specifically, though.

Regarding the API stuff, though, you should probably be normalizing the data before sending it to the client anyhow. Since Mongo doesn't enforce a schema, you can have literally any sort of data in a given field. You might have some documents with string IDs and others with BSON IDs, and your API would happily send both through to the client, where one or the other might cause breakage. In this particular case, you should use BSON ObjectIDs in your documents and cast them to strings in your API output.
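The normalization step could look roughly like the sketch below. It is written in plain JavaScript so it runs without a database: FakeObjectId, normalizeId and toApiShape are hypothetical names, and FakeObjectId only mimics the toHexString() method that the Node.js driver's ObjectId exposes. Other drivers and ODMs will have their own way to cast an ID to a string.

```javascript
// Stand-in for a driver ObjectId so the sketch runs without MongoDB.
// (Hypothetical class; real drivers ship their own ObjectId type.)
class FakeObjectId {
  constructor(hex) {
    this.hex = hex.toLowerCase();
  }
  toHexString() {
    return this.hex;
  }
}

// Accept an _id that is either a string or an ObjectId-like value
// and always return a string.
function normalizeId(id) {
  if (typeof id === "string") return id;
  if (id && typeof id.toHexString === "function") return id.toHexString();
  throw new TypeError("Unsupported _id type: " + typeof id);
}

// Hypothetical API serializer: the client always sees `id` as a string.
function toApiShape(doc) {
  const { _id, ...rest } = doc;
  return { id: normalizeId(_id), ...rest };
}

const fromObjectId = toApiShape({ _id: new FakeObjectId("56b0d36c23da2af0363abe37"), value: "foo" });
const fromString = toApiShape({ _id: "56b0d36c23da2af0363abe37", value: "foo" });
console.log(fromObjectId.id === fromString.id); // true
```

Throwing on unknown types is a design choice here: it surfaces inconsistent documents early instead of passing them through to the client.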

mgo - bson.ObjectId vs string id

As was already mentioned in the comments, storing the ObjectId as a hex string would double the space needed for it and in case you want to extract one of its values, you'd first need to construct an ObjectId from that string.

But you have a misconception. There is absolutely no need to use an ObjectId for the mandatory _id field. Quite often, I advise against that. Here is why.

Take the simple example of a book, relations and some other considerations set aside for simplicity:

{
  _id: ObjectId("56b0d36c23da2af0363abe37"),
  isbn: "978-3453056657",
  title: "Neuromancer",
  author: "William Gibson",
  language: "German"
}

Now, what use would the ObjectId have here? Actually none. It would be an index with hardly any use, since you would never search your book database by an artificial key like that. It holds no semantic value. It would be a unique ID for an object which already has a globally unique ID: the ISBN.

So we simplify our book document like this:

{
  _id: "978-3453056657",
  title: "Neuromancer",
  author: "William Gibson",
  language: "German"
}

We have reduced the size of the document, made use of a preexisting globally unique ID, and no longer have a basically unused index.

Back to your basic question of whether you lose something by not using ObjectIds: quite often, not using the ObjectId is the better choice. But if you use it, use the binary form.

Is it better to save id of a document in another document as ObjectId or String

Regardless of performance, you should store the "referential key" in the same format as the _id field that you are referring to. That means that if your referred document is:

{ _id: ObjectID("68746287..."), value: 'foo' }

then you'd refer to it as:

{ _id: ObjectID(…parent document id…), subDoc: ObjectID("68746287...") }

If the document that you're pointing to has a string as an ID, then it'd look like:

{ _id: "derick-address-1", value: 'foo' }

then you'd refer to it as:

{ _id: ObjectID(…parent document id…), subDoc: "derick-address-1" }

Besides that, because you're talking about persons and addresses, it might make more sense to not have them in two documents altogether, but instead embed the document:

{ _id: ObjectID(…parent document id…),
  'name' : 'Derick',
  'addresses' : [
     { 'type' : 'Home', 'street' : 'Victoria Road' },
     { 'type' : 'Work', 'street' : 'King William Street' },
  ]
}

What's the difference between _id: ObjectID and String?

ObjectIDs are a 12-byte BSON object ID (it requires 24 to display as hex, since you need two hex characters to encode a byte value). The string is a full 24 bytes. See the ObjectID specification for specifics.

That said, you can use anything you want for your _id field, but it's recommended that you stick to a consistent scheme - either use all strings (as well as strings in cases that you reference other documents from foreign keys) or use all ObjectIDs. It is conventional to use ObjectIDs, but as long as you have a fully consistent scheme, you shouldn't have significant problems with it.

Difference between storing an ObjectId and its string form, in MongoDB

I convert to string in code to compare, and I ensure that anything that looks like an ObjectId is actually used as an ObjectId.

It is good to note that between the ObjectId (http://docs.mongodb.org/manual/reference/object-id/) and its hex representation there is in fact 12 bytes of difference, the ObjectId being 12 bytes and its hex representation being 24.

It is not only about storage efficiency but also about indexes. They are smaller, and the ObjectId can be used in a special manner to ensure that only the parts of the index that are actually used are loaded. This becomes most noticeable when inserting, where only the latest part of the index needs to be loaded to ensure uniqueness. You cannot guarantee such behaviour with its hex representation.

I would strongly recommend you do not use the ObjectId's hex representation. If you want to "make your life easier" you would be better off creating a different _id which is smaller but somehow just as unique and index friendly.

Can MongoDB's _id fields be compared?

You can compare ObjectIDs with the .equals() method. See the documentation.

An ObjectId is a 12-byte value, usually displayed as a 24-character hexadecimal string. It consists of:

a 4-byte timestamp value, representing the ObjectId's creation, measured in seconds since the Unix epoch
a 5-byte random value
a 3-byte incrementing counter, initialized to a random value

Since the timestamp is the most significant part of an ObjectId, yes, you can compare them. The most significant four bytes hold the timestamp, so ObjectIds sort roughly by creation time.

Also see ObjectId.getTimestamp() documentation.
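As a rough illustration of what getTimestamp() reads, the sketch below decodes the leading four bytes from an ObjectId's hex string in plain JavaScript. The function names objectIdTimestamp and compareObjectIdHex are made up for this example and are not part of any driver.

```javascript
// Read the creation time encoded in the first 4 bytes (8 hex characters)
// of a 24-character ObjectId hex string.
function objectIdTimestamp(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("Expected a 24-character hex string");
  }
  const seconds = parseInt(hexId.slice(0, 8), 16); // seconds since the Unix epoch
  return new Date(seconds * 1000);
}

// Because the timestamp comes first, comparing the hex strings orders IDs
// by creation time, but only with one-second resolution.
function compareObjectIdHex(a, b) {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  return x < y ? -1 : x > y ? 1 : 0;
}

console.log(objectIdTimestamp("4cdfb11e1f3c000000007822").toISOString());
```

In real code, prefer the driver's own getTimestamp() and comparison methods; this only shows why they work.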

Is it ok to use Mongo's Object ID as its unique identifier? If so, how can I convert it to a string and look it up by string?

You can construct a new ObjectId using the string. This example uses the MongoDB console:

db.users.find({ _id: ObjectId("4cdfb11e1f3c000000007822") })

I can't tell from your question which language driver you are using (if any at all), but most drivers also support this functionality.

You should NOT convert the ObjectId in the database to a string and then compare it to another string. If you do this, MongoDB cannot use the _id index and will have to scan all the documents, resulting in poor query performance.
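When the string comes from outside (a URL parameter, for instance), it can help to check that it looks like an ObjectId before constructing one from it. The helper below is a small sketch in plain JavaScript; isObjectIdHex is a hypothetical name, and some drivers ship a similar validity check of their own.

```javascript
// An ObjectId in string form is exactly 24 hexadecimal characters.
const OBJECT_ID_HEX = /^[0-9a-fA-F]{24}$/;

// Hypothetical helper: true only for strings that could be turned into an ObjectId.
function isObjectIdHex(value) {
  return typeof value === "string" && OBJECT_ID_HEX.test(value);
}

console.log(isObjectIdHex("4cdfb11e1f3c000000007822")); // true
console.log(isObjectIdHex("derick-address-1")); // false
```

If the check passes, the string can be wrapped in ObjectId(...) and used in the query, so the comparison still happens against the indexed binary value.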

Is it necessary to store document references as type ObjectId in Mongo?

The only requirement for the values of _id is that they be unique as the _id is always indexed automatically by MongoDB and that index is unique.

The purpose of ObjectIDs is to allow the client to generate an ID that is guaranteed to be unique across a broad range of clients that are writing to the same collection. If you have a better unique ID, you are encouraged to use it, as it saves you an index. You do not need to cast that value into an ObjectID. It can be used as is, as can other types (e.g. integers, decimals, etc.).
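To show how a client can build such an ID without asking the server, here is a rough sketch that follows the documented layout of a 4-byte timestamp, a 5-byte random value and a 3-byte counter. It is not the drivers' actual implementation: generateIdHex is a made-up name, and Math.random stands in for the stronger random source a real generator would use.

```javascript
// Chosen once per process. Math.random is a placeholder; a real generator
// would use a cryptographically secure source.
const randomPart = Array.from({ length: 5 }, () => Math.floor(Math.random() * 256));
let counter = Math.floor(Math.random() * 0x1000000); // 3-byte counter, random start

// Format a number as zero-padded hex of the given width.
const toHex = (n, width) => n.toString(16).padStart(width, "0");

// Build a 24-character hex ID: timestamp, then random value, then counter.
function generateIdHex(now = Date.now()) {
  const seconds = Math.floor(now / 1000); // 4-byte timestamp in seconds
  counter = (counter + 1) % 0x1000000; // wrap around at 3 bytes
  return (
    toHex(seconds, 8) +
    randomPart.map((b) => toHex(b, 2)).join("") +
    toHex(counter, 6)
  );
}

console.log(generateIdHex());
```

The random part keeps different processes apart, and the counter keeps IDs from the same process apart even when they are created within the same second.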

Is searching by _id in mongoDB more efficient?

Analyzing your query performance

I advise you to use the .explain() method provided by MongoDB to analyze your query performance.

Let's say we are trying to execute this query

db.inventory.find( { quantity: { $gte: 100, $lte: 200 } } )

This would be the result of the query execution

{ "_id" : 2, "item" : "f2", "type" : "food", "quantity" : 100 }
{ "_id" : 3, "item" : "p1", "type" : "paper", "quantity" : 200 }
{ "_id" : 4, "item" : "p2", "type" : "paper", "quantity" : 150 }

If we call .explain() this way

db.inventory.find(
   { quantity: { $gte: 100, $lte: 200 } }
).explain("executionStats")

It will return the following result:

{
   "queryPlanner" : {
         "plannerVersion" : 1,
         ...
         "winningPlan" : {
            "stage" : "COLLSCAN",
            ...
         }
   },
   "executionStats" : {
      "executionSuccess" : true,
      "nReturned" : 3,
      "executionTimeMillis" : 0,
      "totalKeysExamined" : 0,
      "totalDocsExamined" : 10,
      "executionStages" : {
         "stage" : "COLLSCAN",
         ...
      },
      ...
   },
   ...
}

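The COLLSCAN stage shows that MongoDB scanned the whole collection: it examined 10 documents (totalDocsExamined) to return 3 (nReturned), and no index keys were used (totalKeysExamined is 0). More details about this can be found in the MongoDB manual's section on analyzing query performance.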
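One way to read the explain output shown above is to compare how many documents were examined with how many were returned. The sketch below does that in plain JavaScript on a trimmed copy of that output; summarizeExplain is a hypothetical helper, not a MongoDB function.

```javascript
// Trimmed-down copy of the explain("executionStats") output shown above.
const explainOutput = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: { nReturned: 3, totalKeysExamined: 0, totalDocsExamined: 10 },
};

// Hypothetical helper that condenses the fields discussed above.
function summarizeExplain(explain) {
  const { nReturned, totalKeysExamined, totalDocsExamined } = explain.executionStats;
  return {
    // Only checks the top-level stage, which is enough for this simple plan.
    usesCollectionScan: explain.queryPlanner.winningPlan.stage === "COLLSCAN",
    // Documents read per document returned; closer to 1 is better.
    docsExaminedPerReturned: nReturned > 0 ? totalDocsExamined / nReturned : null,
    usedIndexKeys: totalKeysExamined > 0,
  };
}

const summary = summarizeExplain(explainOutput);
console.log(summary);
```

A ratio well above 1 together with a collection scan usually suggests that an index on the queried field would help.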

How efficient is search by _id and indexes

To answer your question: a query that can use an index is generally more efficient than one that cannot. Indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form. Since MongoDB creates an index on _id by default, searching by _id is efficient.

Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement.

So, YES, using indexes like _id is better!
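The difference between a collection scan and an index lookup can be illustrated with a toy in-memory example. This is only an analogy in plain JavaScript: MongoDB's indexes are B-tree based rather than a Map, and the documents and function names below are made up.

```javascript
// Small in-memory "collection"; the documents are made up for illustration.
const docs = Array.from({ length: 10 }, (_, i) => ({ _id: i + 1, item: "item" + (i + 1) }));

// Collection scan: look at documents one by one until a match is found.
function findByScan(collection, id) {
  let examined = 0;
  for (const doc of collection) {
    examined += 1;
    if (doc._id === id) return { doc, examined };
  }
  return { doc: null, examined };
}

// "Index": a lookup structure built once, keyed by _id.
const idIndex = new Map(docs.map((doc) => [doc._id, doc]));

// Index lookup: jump straight to the matching document.
function findByIndex(index, id) {
  const doc = index.get(id) ?? null;
  return { doc, examined: doc ? 1 : 0 };
}

console.log(findByScan(docs, 10).examined); // 10
console.log(findByIndex(idIndex, 10).examined); // 1
```

The scan's cost grows with the size of the collection, while the indexed lookup touches only the document it needs.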

You can also create your own indexes by using createIndex():

db.collection.createIndex( <key and index type specification>, <options> )

Optimize your MongoDB query

In case you want to optimize your query, there are multiple ways to do that.

Create Custom Indexes to Support Your Queries
Limit the Number of Query Results to Reduce Network Demand

db.posts.find().sort( { timestamp : -1 } ).limit(10)

Use Projections to Return Only Necessary Data

db.posts.find( {}, { timestamp : 1 , title : 1 , author : 1 , abstract : 1} ).sort( { timestamp : -1 } )

Use hint() to Select a Particular Index

db.users.find().hint( { age: 1 } )
