Single or Multiple Databases

Multiple Databases Vs Single Database with logically partitioned data

You'll wish you had used separate databases:

  • If you ever want to grant permissions to the databases themselves to clients or superusers.
  • If you ever want to restore just one client's database without affecting the data of the others.
  • If there are regulatory concerns governing your data and data breaches, and you belatedly discover that these regulations can only be met by having separate databases. (Update: a little over 4 years after the writing of this answer, GDPR went into effect)
  • If you ever want to easily move your customer data to multiple database servers or otherwise scale out, or move larger/more important customers to different hardware. In a different part of the world.
  • If you ever want to easily archive and decommission old customer data.
  • If your customers care about their data being siloed, and they find out that you did otherwise.
  • If your data is subpoenaed and it's hard to extract just one customer's data, or the subpoena is overly broad and you have to produce the entire database instead of just the data for the one client.
  • When you forget to maintain vigilance and just one query slips through that didn't include AND CustomerID = @CustomerID. Hint: use a scripted permissions tool, or schemas, or wrap all tables with views that include WHERE CustomerID = SomeUserReturningFunction(), or some combination of these.
  • When you get permissions wrong at the application level and customer data is exposed to the wrong customer.
  • When you want to have different levels of backup and recovery protection for different clients.
  • Once you realize that building an infrastructure to create, provision, configure, deploy, and otherwise spin up/down new databases is worth the investment because it forces you to get good at it.
  • When you didn't allow for the possibility of some class of people needing access to multiple customers' data, and you need a layer of abstraction on top of Customer because WHERE CustomerID = @CustomerID won't cut it now.
  • When hackers target your sites or systems, and you made it easy for them to get all the data of all your customers in one fell swoop after getting admin credentials in just one database.
  • When your database backup takes 5 hours to run and then fails.
  • When you have to get the Enterprise edition of your DBMS so you can make compressed backups so that copying the backup file over the network takes less than 5 hours more.
  • When you have to restore the entire database every day to a test server which takes 5 hours, and run validation scripts that take 2 hours to complete.
  • When only a few of your customers need replication and you have to apply it to all of your customers instead of just those few.
  • When you want to take on a government customer and find out that they require you to use a separate server and database, but your ecosystem was built around a single server and database and it's just too hard or will take too long to change.

You'll be glad you used separate databases:

  • When a pilot rollout to one customer completely explodes and the other 999 customers are completely unaffected. And you can restore from backup to fix the problem.
  • When one of your database backups fails and you can fix just that one in 25 minutes instead of starting the entire 10-hour process over again.

You'll wish you had used a single database:

  • When you discover a bug that affects all 1000 clients and deploying the fix to 1000 databases is hard.
  • When you get permissions wrong at the database level and customer data is exposed to the wrong customer.
  • When you didn't allow for the possibility of some class of people needing access to a subset of all the databases (perhaps two customers merge).
  • When you didn't think how hard it would be to merge two different databases of data.
  • When you've merged two different databases of data and realize one was the wrong one, and you didn't plan for recovering from this scenario.
  • When you try to grow past 32,767 customers/databases on a single server and find out that this is the maximum in SQL Server 2012.
  • When you realize that managing 1,000+ databases is a bigger nightmare than you ever imagined.
  • When you realize that you can't onboard a new customer just by adding some data in a table, and you have to run a bunch of scary and complicated scripts to create, populate, and set permissions on a new database.
  • When you have to run 1000 database backups every day, make sure they all succeed, copy them over the network, restore them all to a test database, and run validation scripts on each single one, reporting any failures in a way that will guaranteed to be seen, and which are easily and quickly actionable. And then 150 of these fail in various places and have to be fixed one at a time.
  • When you find out you have to set up replication for 1000 databases.

Just because I listed more reasons for one doesn't mean it is better.

Some readers may get value from MSDN: Multi-Tenant Data Architecture. Or perhaps SaaS Tenancy App Design Patterns. Or even Developing Multi-tenant Applications for the Cloud, 3rd Edition

One Database or multiple Databases

Rather than providing you generic pros/cons (because everything depends on the use case...), I would say that we tend to prematurely optimize systems while optimization shouldn't be a problem in the future, if the whole system is architected with refactoring in mind.

IMHO, I had the same discussion some time ago and my conclusion is starting with a single database. It simplifies a lot of details:

  • Single database to backup, less maintainance.
  • You don't need to manage multiple connections.
  • Multiple databases can break the chance to perform atomic transactions, a feature I would never throw away.
  • You avoid synchronizing two or more databases to avoid integrity problems.

Also, since we're in the cloud computing era, infrastructures should scale horizontally. That is, if you need more power, add a replication node and distribute your load across multiple servers instead of scaling in the application level. This ensures your software will be still easy to maintain and develop, and good solutions should scale out easily if your code has quality and, of course, you've budget to support an increased load!

Pros/Cons Using multiple databases vs using single database

Do not store data from multiple customers in the same database -- I have known companies that had to spend a lot of time/effort/money fixing this mistake. I have even known clients to balk at sharing a database computer even though the databases are separate - on the plus side, these clients are generally willing to pay for the extra hardware.

  1. The problems with security alone should prevent you from ever doing this. You will lose large customers because of this.

  2. If you have some customers that are unwilling to upgrade their software, it can be very difficult if you share a single database. Separate databases allow customers to continue using the old database structure until they are ready to upgrade.

  3. You are artificially limiting a natural data partition that could provide significant scalability to your solution. Multiple small customers can still share a database server, they just see their own databases/catalogs, or they can run on separate database servers / instances.

  4. You are complicating your database design because you will have to distinguish customer data that would otherwise be naturally separated, i.e., having to supply CustomerID on each where clause.

  5. You are making your database slower by having more rows in all tables. You will use up database memory more rapidly because CustomerID is now part of every index, and fewer records can be stored in each index node. Your database is also slower due to the loss of the inherent advantage of locality of reference.

  6. Data rollback for 1 customer can be very difficult, maybe even essentially impossible as the database grows - you will need custom procedures to do this that are much slower and resource intensive than a simple and standard restore from backup.

  7. Large databases can be very difficult to backup / restore in a timely manner, possibly requiring additional spending on hardware to make it fast enough.

  8. Your application(s) that use the database will be harder to maintain and test.

  9. Any mistakes can be much more destructive as you can mess up all of your clients by a single mistake.

  10. You prevent the possible performance enhancement of low-latency by forcing your database to a single location. E.g., overseas customer will be using slow, high-latency networks all the time.

  11. You will be known as the stupid DBA, or the unemployed DBA, or maybe both.

There are some advantages to a shared database design though.

  1. Common table schemas, code tables, stored procs, etc. need only be maintained and stored in 1 location.

  2. Licensing costs may be reduced in some cases.

  3. Some maintenance is easier, although almost certainly worse overall using a combined approach.

  4. If all/most of your clients are very small, you can have a low resource utilization by not combining servers (i.e., a relatively high cost). You can mitigate the high cost by combining clients with their permission and explicit understanding, but still use separate databases for larger clients. You definitely need to be explicit and up-front with your clients in this situation.

Except for the server cost sharing, this is a very bad idea still - but cost can be a very important aspect too. This is really the only justification for this approach - avoid this if at all reasonable though. Maybe you would be better off to change a little more for you product, or just not be able to support tiny customers for a cheap price.


Reading an analysis of the recent Atlassian outage reveals that this mistake is precisely why they are having such trouble recovering.

There is a problem, though:

Atlassian can, indeed, restore all data to a checkpoint in a matter of
hours.

However, if they did this, while the impacted ~400 companies would get
back all their data, everyone else would lose all data committed since
that point

So now each customer’s data needs to be selectively restored.
Atlassian has no tools to do this in bulk.

The article also makes it clear that some customers are already migrating away from Atlassian for their OpsGenie product, and will certainly lose future business too. At a minimum, this will be a large problem for their business.

They also messed up big-time by ignoring the customer during this outage.

Is it better to use multiple databases with one schema each, or one database with multiple schemas?

A PostgreSQL "schema" is roughly the same as a MySQL "database". Having many databases on a PostgreSQL installation can get problematic; having many schemas will work with no trouble. So you definitely want to go with one database and multiple schemas within that database.

Multiple databases vs a single database

Purely speculation on what was in the minds of the creators:

Maybe a difference in the volitilaty of the data, so that there could be a different backup/replication strategy for the different physical dbs?

Maybe an idea that say, "general types" could be shared across multiple applications but the "site configuration", for example, would be specific to just one application?

Maybe an idea that the different databases could be put on different pieces of hardware that have different performance characteristics, like their RAID configuration. Data that is read a lot but not updated versus data that is updated a lot.

Again, pure speculation...

@Darryl - my answer is more archeology than technology. I'm not saying I buy any of it. I'm just trying to step into the mindset of the ancestors...

Should I use one or Multiple Databases across the same microservice?

Are these 2 instances of Posts sharing the same DB

Yes they are sharing the same Database.

Regardless of the fact that you micro-service is deployed to multiple instances or one instance the deployment instances of your micro-service are accessing and using your one database which belongs to that micro-service.
All your instances of your micro-service should access the same database. This is the standard and most common use case. There are exceptions to this.
Here is an similar question where I explained it in detailed way.

Single or multiple databases

This question and thus my answer may be close to the gray line of subjective, but at the least I think it would be common practice to separate out the 'admin' tables into their own db for what it sounds like you're doing. If you can tie a client to a specific server and db instance then by having separate db instances, it opens up some easy paths for adding servers to add clients. A single db would require you to monkey with various clustering approaches if you got too big.

[edit]Building in the idea early that each client gets it's own DB also just sets the tone for how you develop when it is easy to make structural and organizational changes. Discovering 2 yrs from now you need to do it will become a lot more painful. I've worked with split dbs plenty of times in the past and it really isn't hard to deal with as long as you can establish some idea of what the context is. Here it sounds like you already have the idea that the client is the context.

Just my two cents, like I said, you could be close to subjective on this one.

Should I use a single or multiple database setup for a multi-client application?

I usually add ClientID to all tables and go with one database.
But since the database is usually hard to scale I will also make it possible to run on different database instances for some or all clients.

That way you can have a bunch of small clients in one database and the big ones on separate servers.

A key factor for maintainability though, is that you keep the schema identical in all databases. There will be headache enough to manage the versioning without introducing client specific schemas.



Related Topics



Leave a reply



Submit