Any Detailed and Specific Reasons for Why MongoDB Is Much Faster Than SQL DBs

Any detailed and specific reasons for why MongoDB is much faster than SQL DBs?

First, let's compare apples with apples: Reads and writes with MongoDB are like single reads and writes by primary key on a table with no non-clustered indexes in an RDBMS.

So let's benchmark exactly that: http://mysqlha.blogspot.de/2010/09/mysql-versus-mongodb-yet-another-silly.html

And it turns out, the speed difference in a fair comparison of exactly the same primitive operation is not big. In fact, MySQL is slightly faster. I'd say, they are equivalent.

Why? Because actually, both systems are doing similar things in this particular benchmark. Returning a single row, searched by primary key, is actually not that much work. It is a very fast operation. I suspect that cross-process communication overheads are a big part of it.

My guess is that the more finely tuned code in MySQL outweighs the slightly lower systematic overheads of MongoDB (no logical locks and probably some other small things).

This leads to an interesting conclusion: You can use MySQL like a document database and get excellent performance out of it.


If the interviewer said: "We don't care about documents or styles, we just need a much faster database, do you think we should use MySQL or MongoDB?", what would I answer?

I'd recommend disregarding performance for a moment and looking at the relative strengths of the two systems. Things like scaling (way up) and replication come to mind for MongoDB. MySQL offers many more features, such as rich queries, concurrency models, better tooling and maturity.

Basically, you can trade features for performance. Are you willing to do that? That is a choice that cannot be made generally. If you opt for performance at any cost, consider tuning MySQL first before adding another technology.


Here is what happens when a client retrieves a single row/document by primary key. I'll annotate the differences between both systems:

  1. Client builds a binary command (same)
  2. Client sends it over TCP (same)
  3. Server parses the command (same)
  4. Server accesses query plan from cache (SQL only, not MongoDB, not HandlerSocket)
  5. Server asks B-Tree component to access the row (same)
  6. Server takes a physical readonly-lock on the B-Tree path leading to the row (same)
  7. Server takes a logical lock on the row (SQL only, not MongoDB, not HandlerSocket)
  8. Server serializes the row and sends it over TCP (same)
  9. Client deserializes it (same)

There are only two additional steps for a typical SQL-based RDBMS. That's why there isn't really a difference.
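The list above can be tallied in a few lines of Python. This is only a sketch that counts the steps as the answer describes them; the names `STEPS` and `count_steps` are made up for illustration, and the count says nothing about how long each step takes.

```python
# Steps of a single-row read by primary key, as listed above.
# The boolean marks steps that apply only to a SQL RDBMS
# (not to MongoDB or HandlerSocket).
STEPS = [
    ("client builds a binary command", False),
    ("client sends it over TCP", False),
    ("server parses the command", False),
    ("server accesses query plan from cache", True),
    ("server asks B-Tree component to access the row", False),
    ("server takes a physical read-only lock on the B-Tree path", False),
    ("server takes a logical lock on the row", True),
    ("server serializes the row and sends it over TCP", False),
    ("client deserializes it", False),
]


def count_steps(sql_engine: bool) -> int:
    """Count the steps a request goes through on the given kind of system."""
    return sum(1 for _, sql_only in STEPS if sql_engine or not sql_only)


if __name__ == "__main__":
    print("SQL RDBMS steps:", count_steps(sql_engine=True))
    print("MongoDB steps:  ", count_steps(sql_engine=False))
```

Seven of the nine steps are shared, which is the point of the comparison: for this primitive operation, most of the work is identical.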

Why Is MongoDB So Fast

MongoDB isn't a traditional relational database. It is a NoSQL, document-based store that provides weaker consistency guarantees than a typical SQL database, so it does not have to do the work of enforcing them.

MySQL vs MongoDB 1000 reads

MongoDB is not magically faster. If you store the same data, organised in basically the same fashion, and access it exactly the same way, then you really shouldn't expect your results to be wildly different. After all, the source code of both MySQL and MongoDB is publicly available, so if Mongo had some magically better IO code in it, the MySQL team could study it and do the same in their codebase.

People see better real-world performance with MongoDB largely because it lets you query in a manner that better fits your workload.

For example, consider a design that persisted a lot of information about a complicated entity in a normalised fashion. This could easily use dozens of tables in MySQL (or any relational db) to store the data in normal form, with many indexes needed to ensure relational integrity between tables.

Now consider the same design with a document store. If all of those related tables are subordinate to the main table (and they often are), then you might be able to model the data such that the entire entity is stored as a single document in a single MongoDB collection. This is where MongoDB starts enabling superior performance.

In MongoDB, to retrieve the whole entity, you have to perform:

  • One index lookup on the collection (assuming the entity is fetched by id)
  • Retrieve the contents of one database page (the actual binary json document)

So that is one B-tree lookup plus one binary page read: log(n) + 1 IOs. If the indexes can reside entirely in memory, then 1 IO.

In MySQL with 20 tables, you have to perform:

  • One index lookup on the root table (again, assuming the entity is fetched by id)
  • With a clustered index, we can assume that the values for the root row are in the index
  • 20+ range lookups (hopefully on an index) for the entity's pk value
  • These probably aren't clustered indexes, so the same 20+ data lookups once we figure out what the appropriate child rows are.

So the total for MySQL, even assuming that all indexes are in memory (which is harder since there are 20 times more of them), is about 20 range lookups.

These range lookups are likely comprised of random IO — different tables will definitely reside in different spots on disk, and it's possible that different rows in the same range in the same table for an entity might not be contiguous (depending on how the entity has been updated, etc).

So for this example, the final tally is about 20 times more IO with MySQL per logical access, compared to MongoDB.
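The tally above can be written down as a small back-of-the-envelope model. This is a sketch under the same assumptions as the answer (one data read per child table, root row served from the clustered index); the function names and the `fanout` parameter are illustrative and not taken from either database.

```python
def btree_depth(n_keys: int, fanout: int) -> int:
    """Levels a B-tree needs to index n_keys entries (roughly log_fanout(n))."""
    depth, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        depth += 1
    return depth


def document_ios(n_docs: int, index_in_memory: bool, fanout: int = 100) -> int:
    """One index lookup plus one page read for the whole entity."""
    index_ios = 0 if index_in_memory else btree_depth(n_docs, fanout)
    return index_ios + 1


def relational_ios(child_tables: int) -> int:
    """Indexes assumed in memory: about one random data read per child table.

    The root row is assumed to come straight out of the clustered index.
    """
    return child_tables


if __name__ == "__main__":
    doc = document_ios(1_000_000, index_in_memory=True)
    rel = relational_ios(20)
    print(f"document store: {doc} IO, 20-table schema: {rel} IOs")
```

With everything indexed in memory the model gives 1 IO against 20, which is the factor of about 20 quoted above. Real numbers depend on caching, row sizes and how contiguous the child rows are on disk.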

This is how MongoDB can boost performance in some use cases.

Why are key value pair noSQL db's faster than traditional relational DBs

The key advantage of a relational database is the ability to relate and index information. Most 'NoSQL' systems don't provide a relational algebra or a great query language.

What you need to ask yourself is, does switching make sense for my intended use case?

You have kind of missed the point. The point is, you sometimes don't have an index (in the way you do with a general relational DB anyways). Even when you do have an index, relating data together is difficult, and that is what relational databases excel at. NoSQL solutions have a number of novel structures which make many use cases trivially easy, e.g. Redis is a data-structure oriented DB well-suited to rapidly building anything with queues or its pub-sub architecture. MongoDB is a freeform document database which stores documents as JSON (BSON) and excels at rapid development. BigTable solutions are a little less structured than that, but expand the idea of a row to have families of columns: key value pairs contained in each row, arranged efficiently on disk. You can build an inverted index on top of this with a technology like ElasticSearch.

Not everything needs the consistency guarantees or disk layout of a traditional RDBMS. Another major use case of NoSQL is massive scalability: many solutions (e.g. BigTable-style stores such as HBase and Cassandra) are designed to shard and scale horizontally easily (not so easy with SQL!). Cassandra in particular is designed to have no single point of failure (SPOF). Further, column-oriented datastores are meant to optimize disk speeds via sequential reads (and reduce write amplification). That being said, unless you really need it, a traditional SQL server is generally good enough.

There are advantages and disadvantages. Personally, I use a mix of both. Use the right tool for the right job, which may end up being PostgreSQL or MySQL more often than not.

You can liken a basic key-value system to making an SQL table with two columns, a unique key and a value. This is quite fast. You have no need to do any relations or correlations or collation of data. Just find the value and return it. This is an oversimplification; NoSQL databases do have a lot of interesting functionality and application beyond simple K,V stores.
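The two-column analogy can be tried directly. The sketch below uses Python's built-in `sqlite3` module as a stand-in for an SQL server; the table name `kv` and the helpers `put` and `get` are hypothetical names chosen for this example.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory table with a unique key column and a value column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v BLOB)")
    return conn


def put(conn: sqlite3.Connection, key: str, value: bytes) -> None:
    # INSERT OR REPLACE overwrites any existing row with the same key.
    conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))


def get(conn: sqlite3.Connection, key: str):
    # A single primary-key lookup: no joins, no sorting, no aggregation.
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    store = open_store()
    put(store, "user:42", b'{"name": "Ada"}')
    print(get(store, "user:42"))
    print(get(store, "user:43"))  # key not present
```

Every read is one index lookup on the primary key, which is the access pattern a dedicated key-value store is built around.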

I don't know if your scientific data is well suited to most NoSQL implementations; that depends on the data. If you look at HBase or Cassandra, it may well suit a scientist's needs (with proper rowkey design -- timestamp must not be first, check out OpenTSDB). I know of many companies that store sensor readings in Cassandra by using a random-order partitioner and the UUID of the sensor to roll up readings into daily fat rows. Every day new databases are created around specific use cases, so that answer may change. For specific use cases, you can reap huge rewards from using specific datastores, at the cost of flexibility and tooling.
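To make the "daily fat row" idea concrete, here is a plain-Python sketch of the key design only. It does not talk to Cassandra or HBase; the key format and the names `daily_row_key` and `roll_up` are assumptions made for illustration. The sensor UUID comes first and the day bucket second, so the timestamp is not the leading part of the key.

```python
from collections import defaultdict
from datetime import datetime, timezone
from uuid import UUID


def daily_row_key(sensor_id: UUID, ts: datetime) -> str:
    """Build a row key of the form '<sensor uuid>:<YYYYMMDD>' (UTC day)."""
    day = ts.astimezone(timezone.utc).strftime("%Y%m%d")
    return f"{sensor_id}:{day}"


def roll_up(readings):
    """Group (sensor_id, timestamp, value) tuples into one wide row per sensor per day."""
    rows = defaultdict(list)
    for sensor_id, ts, value in readings:
        rows[daily_row_key(sensor_id, ts)].append((ts, value))
    return dict(rows)


if __name__ == "__main__":
    sensor = UUID("12345678-1234-5678-1234-567812345678")
    readings = [
        (sensor, datetime(2024, 1, 1, 0, 5, tzinfo=timezone.utc), 20.1),
        (sensor, datetime(2024, 1, 1, 23, 55, tzinfo=timezone.utc), 19.4),
        (sensor, datetime(2024, 1, 2, 0, 5, tzinfo=timezone.utc), 19.0),
    ]
    for key, values in roll_up(readings).items():
        print(key, len(values))
```

All readings for one sensor and one day land under the same key, so a day of data can be fetched with a single key lookup, while different sensors spread across the cluster.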

SQL versus noSQL (speed)

The definition of noSQL systems is a very broad one -- a database that doesn't use SQL / is not a RDBMS.
Therefore, the answer to your question is, in short: "it depends".

Some noSQL systems are basically just persistent key/value storages (like Project Voldemort). If your queries are of the type "look up the value for a given key", such a system will be (or at least should be) faster than an RDBMS, because it only needs a much smaller feature set.

Another popular type of noSQL system is the document database (like CouchDB).
These databases have no predefined data structure.
Their speed advantage relies heavily on denormalization and creating a data layout that is tailored to the queries that you will run on it. For example, for a blog, you could save a blog post in a document together with its comments. This reduces the need for joins and lookups, making your queries faster, but it also could reduce your flexibility regarding queries.
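The blog example can be sketched with Python's built-in `sqlite3` and `json` modules standing in for a relational database and a document store. The schema, table names and helper functions below are invented for this illustration and are not CouchDB's API.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER REFERENCES posts(id),
        body TEXT
    );
    -- Denormalized variant: the whole post, comments included, in one value.
    CREATE TABLE post_docs (id INTEGER PRIMARY KEY, doc TEXT);
""")
conn.execute("INSERT INTO posts VALUES (1, 'Hello')")
conn.executemany(
    "INSERT INTO comments (post_id, body) VALUES (?, ?)",
    [(1, "First"), (1, "Second")],
)
conn.execute(
    "INSERT INTO post_docs VALUES (1, ?)",
    (json.dumps({"id": 1, "title": "Hello", "comments": ["First", "Second"]}),),
)


def load_normalized(post_id: int) -> dict:
    """Two lookups: the post row, then its comment rows."""
    title = conn.execute(
        "SELECT title FROM posts WHERE id = ?", (post_id,)
    ).fetchone()[0]
    comments = [
        row[0]
        for row in conn.execute(
            "SELECT body FROM comments WHERE post_id = ? ORDER BY id", (post_id,)
        )
    ]
    return {"id": post_id, "title": title, "comments": comments}


def load_document(post_id: int) -> dict:
    """One lookup by id returns the post together with its comments."""
    doc = conn.execute(
        "SELECT doc FROM post_docs WHERE id = ?", (post_id,)
    ).fetchone()[0]
    return json.loads(doc)


if __name__ == "__main__":
    print(load_normalized(1) == load_document(1))
```

Both functions return the same data, but the document form needs a single lookup. The cost is flexibility: a query such as "all comments containing a given word, across all posts" is easy against the `comments` table and awkward against the stored documents.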

When should I use a NoSQL database instead of a relational database? Is it okay to use both on the same site?

Relational databases enforce ACID, so you get schema-based, transaction-oriented data stores. They are proven and suitable for 99% of real-world applications. You can do practically anything with relational databases.

But there are limitations on speed and scaling when it comes to massive, high-availability data stores. For example, Google and Amazon have terabytes of data stored in big data centers. Querying and inserting is not performant in these scenarios because of the blocking/schema/transaction nature of RDBMSs. That's the reason they have implemented their own databases (actually, key-value stores) for massive performance gain and scalability.

NoSQL databases have been around for a long time - just the term is new. Some examples are graph, object, column, XML and document databases.

For your 2nd question: Is it okay to use both on the same site?

Why not? Both serve different purposes, right?


