Good Reasons Not to Use a Relational Database

Good reasons NOT to use a relational database?

Plain text files in a filesystem

  • Very simple to create and edit
  • Easy for users to manipulate with simple tools (i.e. text editors, grep etc)
  • Efficient storage of binary documents

XML or JSON files on disk

  • As above, but with a bit more ability to validate the structure.

Spreadsheet / CSV file

  • Very easy model for business users to understand

Subversion (or similar disk based version control system)

  • Very good support for versioning of data

Berkeley DB (Basically, a disk based hashtable)

  • Very simple conceptually (just un-typed key/value)
  • Quite fast
  • No administration overhead
  • Supports transactions I believe

Amazon's Simple DB

  • Much like Berkeley DB I believe, but hosted

Google's App Engine Datastore

  • Hosted and highly scalable
  • Per document key-value storage (i.e. flexible data model)

CouchDB

  • Document focus
  • Simple storage of semi-structured / document based data

Native language collections (stored in memory or serialised on disk)

  • Very tight language integration

Custom (hand-written) storage engine

  • Potentially very high performance in required uses cases

I can't claim to know anything much about them, but you might also like to look into object database systems.

When shouldn't you use a relational database?

In my experience, you shouldn't use a relational database when any one of these criteria are true:

  • your data is structured as a hierarchy or a graph (network) of arbitrary depth,
  • the typical access pattern emphasizes reading over writing, or
  • there’s no requirement for ad-hoc queries.

Deep hierarchies and graphs do not translate well to relational tables. Even with the assistance of proprietary extensions like Oracle's CONNECT BY, chasing down trees is a mighty pain using SQL.

Relational databases add a lot of overhead for simple read access. Transactional and referential integrity are powerful, but overkill for some applications. So for read-mostly applications, a file metaphor is good enough.

Finally, you simply don’t need a relational database with its full-blown query language if there are no unexpected queries anticipated. If there are no suits asking questions like "how many 5%-discounted blue widgets did we sell in on the east coast grouped by salesperson?", and there never will be, then you, sir, can live free of DB.

Are there good reasons not to use an ORM?

There's been an explosion of growth with ORMs in recent years and your more experienced coworkers may still be thinking in the "every database call should be through a stored procedure" mentality.

Why would an ORM make things harder to debug? You'll get the same result whether it comes from a stored proc or from the ORM.

I guess the only real detriment that I can think of with an ORM is that the security model is a little less flexible.

EDIT: I just re-read your question and it looks they are copy and pasting the queries into inline sql. This makes the security model the same as an ORM, so there would be absolutely no advantage over this approach over an ORM. If they are using unparametrized queries then it would actually be a security risk.

When should I use a NoSQL database instead of a relational database? Is it okay to use both on the same site?

Relational databases enforces ACID. So, you will have schema based transaction oriented data stores. It's proven and suitable for 99% of the real world applications. You can practically do anything with relational databases.

But, there are limitations on speed and scaling when it comes to massive high availability data stores. For example, Google and Amazon have terabytes of data stored in big data centers. Querying and inserting is not performant in these scenarios because of the blocking/schema/transaction nature of the RDBMs. That's the reason they have implemented their own databases (actually, key-value stores) for massive performance gain and scalability.

NoSQL databases have been around for a long time - just the term is new. Some examples are graph, object, column, XML and document databases.

For your 2nd question: Is it okay to use both on the same site?

Why not? Both serves different purposes right?

When not to use MySQL or other relational DBs?

1. When to avoid MySQL in particular?

When concurrent database sessions are both modifying and querying the database.

MySQL is fine for read-only or read-mostly scenarios (it is no accident that MySQL is frequently used for Web), but more advanced multi-version concurrency control capabilities of Oracle, MS SQL Server, PostgreSQL or even Firebird/Interbase can often handle read-write workloads not just with better performance but with better correctness as well (i.e. they are better at avoiding various concurrency artifacts that may endanger data consistency).

Even traditional "locking" databases such as DB2 or Sybase are likely to handle read-write workloads better than MySQL.

2. When to not use relational databases in general?

In short: when your data is not relational (i.e. it does not fit well in the paradigm of entities, attributes and relationships).

That being said, many modern DBMSes have capabilities outside traditional relational model, such as ability to "understand" hierarchical structure of XML. So even unstructured data that would not normally be stored in the relational DB (or at best would be stored in a BLOB) is no longer necessarily off-limits.

Would relational databases scale as well (or better) than their NoSQL counterparts if we drop the relationships?

I don't find that inter-table relationships are the main limiter for scalability. I use queries with joins regularly and get good scalability if indexes are defined well.

The greater limiter for scalability is the cost of synchronous I/O. The requirements of consistency and durability -- that the DBMS actually and reliably saves data when it tells you it saved data -- is expensive.

Several NoSQL products that are currently in vogue achieve great performance by weakening their consistency and durability guarantees in their default configuration. There are many reports of CouchDB or MongoDB losing data.

There are ways you can configure those NoSQL products to be more strict about durability, but then you sacrifice their impressive performance numbers.

Likewise, you can make an SQL database achieve high performance like the NoSQL products, by disabling the default features that ensure data safety. See RunningWithScissorsDB.

PS: If you think document-oriented databases are "cutting edge", I invite you to read about MUMPS. Everything old is new again. :-)

Are relational databases a poor fit for Node.js?

No, there isn't a technical reason. It's mostly just opinion and using NoSQL with Node.js is currently a popular choice.

Granted, Node's ecosystem is largely community-driven. Everything beyond Node's core API requires community involvement. And, certainly, people will be more likely to support what aligns with their personal preferences.

But, many still use and support relational databases with Node.js. Some notable projects include:

  • mysql
  • pg
  • sequelize

Why are relational databases having scalability issues?

Relational databases provide solid, mature services according to the ACID properties. We get transaction-handling, efficient logging to enable recovery etc. These are core services of the relational databases, and the ones that they are good at. They are hard to customize, and might be considered as a bottleneck, especially if you don't need them in a given application (eg. serving website content with low importance; in this case for example, the widely used MySQL does not provide transaction handling with the default storage engine, and therefore does not satisfy ACID). Lots of "big data" problems don't require these strict constrains, for example web analytics, web search or processing moving object trajectories, as they already include uncertainty by nature.

When reaching the limits of a given computer (memory, CPU, disk: the data is too big, or data processing is too complex and costly), distributing the service is a good idea. Lots of relational and NoSQL databases offer distributed storage. In this case however, ACID turns out to be difficult to satisfy: the CAP theorem states somewhat similar, that availability, consistency and partition tolerance can not be achieved at the same time. If we give up ACID (satisfying BASE for example), scalability might be increased.
See this post eg. for categorization of storage methods according to CAP.

An other bottleneck might be the flexible and clever typed relational model itself with SQL operations: in lots of cases a simpler model with simpler operations would be sufficient and more efficient (like untyped key-value stores). The common row-wise physical storage model might also be limiting: for example it isn't optimal for data compression.

There are however fast and scalable ACID compliant relational databases, including new ones like VoltDB, as the technology of relational databases is mature, well-researched and widespread. We just have to select an appropriate solution for the given problem.



Related Topics



Leave a reply



Submit