Storing Files in Database VS File System

Database vs File system storage

A database is generally used for storing related, structured data, with well defined data formats, in an efficient manner for insert, update and/or retrieval (depending on application).

On the other hand, a file system is a more unstructured data store for storing arbitrary, probably unrelated data. The file system is more general, and databases are built on top of the general data storage services provided by file systems. [Quora]

The file system is useful if you are looking for a particular file, as operating systems maintain a sort of index of file names. However, the contents of a file, such as a .txt file, are not indexed. Indexing the stored data itself is one of the main advantages of a database.

For complex operations, such as searching across the contents of many files, the file system is likely to be very slow.

Main RDBMS advantages:

  • Tables are related to each other

  • SQL query/data processing language

  • Transaction processing in addition to SQL (not to be confused with Transact-SQL, which is Microsoft's SQL dialect)

  • Server-client implementation with server-side objects like stored procedures, functions, triggers, views, etc.

Advantages of a file system over a database management system:

  • When handling small data sets of arbitrary, probably unrelated data, files are more efficient than a database.

  • For simple read and write operations, file operations are faster and simpler.

You can find many more differences discussed on the internet.

Storing a file in a database as opposed to the file system?

Have a look at this answer:

Storing Images in DB - Yea or Nay?

Essentially, the space and performance hit can be quite big, depending on the number of users. Also, keep in mind that web servers are cheap and you can easily add more to balance the load, whereas the database is usually the most expensive and hardest-to-scale part of a web architecture.

There are some counterexamples (e.g., Microsoft SharePoint), but usually, storing files in the database is not a good idea.

A possible exception is when you write desktop apps or know roughly how many users you will ever have. On something as unpredictable as a public web site, though, you may pay a high price for storing files in the database.

Storing files in database Vs file system

You might want to store the files directly in the filesystem.

When using the filesystem, be careful with:

  • Confidentiality: Put documents outside of your Apache document root, and have a PHP controller of yours output the documents.
  • Sharded paths: Do not store thousands of documents in the same directory; create several directories instead. You can shard on a hash of the filename, for example /documents/A/F/B/AFB43677267ABCEF5786692/myfile.pdf.
  • Inode count: You can run out of inodes if you store a lot of small files (this might not apply if you store mostly PDF and office documents).
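The sharded-path idea can be sketched in a few lines of Python. This is only an illustration: the source says "a Hash on the Filename" without naming one, so SHA-256 is an assumption here, and the function name `sharded_path` and the choice of three directory levels (mirroring the example path) are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root: str, filename: str, levels: int = 3) -> PurePosixPath:
    """Build a path such as root/A/F/B/<hash>/filename from a hash of the file name."""
    # SHA-256 is an assumed choice; any stable hash spreads files similarly.
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest().upper()
    # Each of the first `levels` hex characters becomes one directory level,
    # so a single directory never has more than 16 shard subdirectories.
    shards = list(digest[:levels])
    return PurePosixPath(root, *shards, digest, filename)


print(sharded_path("/documents", "myfile.pdf"))
```

Because the path is derived from the name alone, the same file name always maps to the same directory, so no lookup table is needed to find a file again.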

If you need to search for these documents (by date, title, etc.), you may want to store their metadata in a database for better performance.
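A minimal sketch of that split, keeping only metadata and the file path in the database while the file itself stays on disk. SQLite is used purely as a stand-in so the example is self-contained; the table name `documents`, its columns, the sample rows, and the helper `find_paths` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only searchable metadata and the on-disk location are stored, not the file bytes.
conn.execute(
    """
    CREATE TABLE documents (
        id          INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        uploaded_on TEXT NOT NULL,
        path        TEXT NOT NULL UNIQUE
    )
    """
)
# An index on the date column keeps date-range searches from scanning every row.
conn.execute("CREATE INDEX idx_documents_uploaded_on ON documents (uploaded_on)")
conn.executemany(
    "INSERT INTO documents (title, uploaded_on, path) VALUES (?, ?, ?)",
    [
        ("Annual report", "2023-01-15", "/documents/3/A/1/report.pdf"),
        ("Invoice 42", "2023-07-02", "/documents/9/C/0/invoice42.pdf"),
        ("Contract", "2024-02-20", "/documents/B/7/E/contract.pdf"),
    ],
)


def find_paths(conn: sqlite3.Connection, since: str) -> list:
    """Return the paths of documents uploaded on or after `since` (ISO date string)."""
    rows = conn.execute(
        "SELECT path FROM documents WHERE uploaded_on >= ? ORDER BY uploaded_on",
        (since,),
    )
    return [row[0] for row in rows]


print(find_paths(conn, "2023-06-01"))
```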

FYI, MS SQL Server has a FILESTREAM storage option (a kind of hybrid), but at the moment MySQL doesn't have an equivalent.

Best way to store files

Microsoft published research on this topic: https://www.microsoft.com/en-us/research/publication/to-blob-or-not-to-blob-large-object-storage-in-a-database-or-a-filesystem/

Very small files perform best when stored in the database; larger files perform best when stored on your hard drive. I researched this for the company I work for. File system performance will be better than the database when the file size is 512 kB or larger, and database performance will drop rapidly after this point.

Storing files in the database gives you the advantage that you can keep everything in sync. You can configure the database so that a file BLOB is removed when its file record is removed. However, storing large files will give you very bad performance, and creating backups could take a very long time.
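One way to get that "BLOB is removed with its record" behaviour is a foreign key with ON DELETE CASCADE. The sketch below uses SQLite as a stand-in so it runs anywhere; the table names `file_records` and `file_blobs` are hypothetical, and other database systems configure this slightly differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE file_records (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE file_blobs (
        record_id INTEGER PRIMARY KEY
                  REFERENCES file_records(id) ON DELETE CASCADE,
        content   BLOB NOT NULL
    )
    """
)

conn.execute("INSERT INTO file_records (id, name) VALUES (1, 'photo.jpg')")
conn.execute(
    "INSERT INTO file_blobs (record_id, content) VALUES (1, ?)",
    (b"placeholder image bytes",),
)

# Deleting the record also deletes its BLOB row, so the two cannot drift apart.
conn.execute("DELETE FROM file_records WHERE id = 1")
remaining_blobs = conn.execute("SELECT COUNT(*) FROM file_blobs").fetchone()[0]
print(remaining_blobs)
```

With files on the file system there is no such guarantee: the application has to delete the file itself and handle the case where that step fails.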

File save on File System VS In Database

I have never used a BLOB; I just store user-uploaded photos in ordinary directories. I don't see much reason to use a BLOB for storing files. You say it could be easier to back up. On the contrary, that could become very problematic, at least in our case: we have many GB of photos, but the database must be kept rather small so that we can back it up often, and with phpMyAdmin.

User uploaded files in PHP: Storing in database VS storing in file system

It's incorrect to think a database is intrinsically more secure than files on disk. A database, after all, is files on disk. It's also typically a lot easier to break into your MySQL server than it is to access the machine via shell: MySQL uses passwords, whereas the shell, if properly configured, uses only SSH keys.

The other concern is that as you load more and more binary data into your database, it becomes considerably more expensive to back up properly. MySQL doesn't do differential backups very well, while files on disk are trivial to replicate quickly and efficiently with a tool like rsync.

File-systems, not surprisingly, are very good at storing large amounts of arbitrary binary data. Relational databases are not. Additionally a lot of work has been done at the operating system level to make serving files off of disk as efficient as possible.

Here's what the computer has to do to fetch a file from disk and send it to the network:

  1. Open the file.
  2. Make a system call like sendfile.
  3. The kernel handles reading from disk, writing to the network device.
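The three steps above can be sketched with Python's `socket.sendfile`, which uses the `sendfile` system call where the platform supports it and falls back to ordinary sends elsewhere. The helper name `send_file`, the temporary file, and the local socket pair standing in for a real network connection are assumptions made only to keep the example self-contained.

```python
import os
import socket
import tempfile


def send_file(sock: socket.socket, path: str) -> int:
    """Send the whole file at `path` over `sock` and return the bytes sent."""
    with open(path, "rb") as f:      # step 1: open the file
        # steps 2 and 3: hand the file to the kernel, which copies it
        # to the socket without routing the data through this program
        return sock.sendfile(f)


# Demo: a temporary file and a connected socket pair instead of a real client.
payload = b"hello from disk"
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as tmp:
    tmp.write(payload)

sender, receiver = socket.socketpair()
sent = send_file(sender, path)
sender.close()
received = receiver.recv(1024)
receiver.close()
os.remove(path)

print(sent, received)
```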

Here's what you have to do to send it from a database like MySQL:

  1. Open a MySQL connection and authenticate.
  2. Compose a command like SELECT file FROM tablename WHERE id=?
  3. Encode that with the MySQL binary protocol and send it over the network connection to the MySQL server. This could be local or remote, and in the remote case even more overhead is involved.
  4. The server receives the command and decodes it, first unpacking the command.
  5. The server has to parse the command and interpret it.
  6. The server has to open the table in question as well as the index file, looking for the location of the data there.
  7. Once found, the data has to be decoded from the MySQL row format, then re-encoded for the MySQL result format.
  8. That data is transmitted back over the wire to the client.
  9. The client must receive and decode the result set.
  10. The client must extract the relevant binary information.
  11. The client must copy that data to another buffer to send it back out the network connection.
  12. The kernel needs to transfer that data from user-space to kernel space and feed it to the network driver.

That's considerably more work and involves a multitude of mandatory copies due to crossing the user-space/kernel-space boundary many times.

If you want a document store, look at something like Riak instead of an RDBMS like MySQL.

Database vs. File System Storage with Somewhat Big Data

The short answer is "maybe".

The longer answer is that it will depend on a few factors:

1. Properly structuring your data. This means splitting unrelated data into separate documents, properly creating associations between related data, etc.

2. Proper indexing of your data. For example, if you have documents representing individual "chunks" of a stream, with a "stream ID" to identify which stream the chunks belong to, then having an index for the "stream ID" field will ensure that you can efficiently grab all chunks for that stream.
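A small sketch of the "stream ID" index described in point 2. SQLite stands in for whatever database is actually chosen, and the table `chunks`, its columns, and the helper `chunks_for_stream` are hypothetical names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks (stream_id TEXT NOT NULL, seq INTEGER NOT NULL, payload BLOB NOT NULL)"
)
# The index lets the database jump straight to one stream's chunks
# instead of scanning every chunk of every stream.
conn.execute("CREATE INDEX idx_chunks_stream_id ON chunks (stream_id)")
conn.executemany(
    "INSERT INTO chunks (stream_id, seq, payload) VALUES (?, ?, ?)",
    [("s1", 1, b"b"), ("s2", 0, b"x"), ("s1", 0, b"a"), ("s1", 2, b"c")],
)


def chunks_for_stream(conn: sqlite3.Connection, stream_id: str) -> list:
    """Return the payloads of one stream in sequence order."""
    rows = conn.execute(
        "SELECT payload FROM chunks WHERE stream_id = ? ORDER BY seq",
        (stream_id,),
    )
    return [row[0] for row in rows]


# Ask SQLite how it would run the lookup; the last column describes each step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT payload FROM chunks WHERE stream_id = ?", ("s1",)
).fetchall()
print(chunks_for_stream(conn, "s1"))
print([row[-1] for row in plan])
```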

3. The resources you have available to you. You may need to look into horizontal scaling of a database, i.e. sharding, which will require you to really know what you're doing. You will likely want a dedicated DBA just to handle the setup and maintenance of the data, especially in getting replication in place to avoid the loss of one node completely killing your data set. This is going to cost money.

4. Your ability to correctly and accurately migrate all of that data into the database. One little slip-up could mean that you're missing an important chunk, or data that should be associated isn't, or data is entered incorrectly or as the wrong type, or any number of problems.

It's definitely recommended that you use a database. The indexing and data separation alone will have a tremendous impact on the efficiency of data retrieval, even with such a large amount of data. If nothing else, the reduced file I/O and getting rid of direct parsing of file contents should make things much faster. But if you're going to use a database, you need to be incredibly careful. There is a ton of work involved that you shouldn't be taking on if you have terabytes of existing data that you need to preserve. You're going to want someone experienced to handle the migration, setup, and long-term maintenance. This is no light undertaking.

RDBMS vs file system for file storage

Given that the files are not expected to change, there is limited value in keeping the files in the DBMS. The primary advantage of keeping files in the DBMS is that the DBMS knows how to manage transactions, but if the files won't change, then that advantage becomes minuscule.

Another advantage of storing files in the DBMS is that the database backup will contain the files; with the files stored separately, you have to back up the separate stash of files as well as the DBMS itself to keep all the data secure.

Another advantage of storing files in the DBMS is that the database can enforce more subtle controls on access to the files.

The primary advantage of storing the files in the file system is that it is easy (easier) to see what you've got.

A secondary advantage is that you can back up or manipulate the files outside the DBMS - though that is also a disadvantage from some points of view.

If the files are stored in blobs in the DBMS, then the normal SQL client software can retrieve the contents over a normal SQL connection. If the files are in the file system instead, and the SQL client software is not on the same machine as the DBMS and the files, then you have to worry about how clients get hold of the file data.

Another advantage of separating the files from the DBMS is that the files could be stored off the DBMS machine. On the other hand, that then complicates getting the files loaded 'into the DBMS'.


On the whole, given the issues outlined above, there seem to be some advantages to going with the 'files in DBMS' approach. On the other hand, many people do go with the 'files in file system' approach, and they survive. It may be that their SQL clients are on the same machine as the DBMS, so the file transfer issues are not insurmountable, but that's the bit that has me most worried.

File Storage for Web Applications: Filesystem vs DB vs NoSQL engines

Not a direct answer, but here are some pointers to very interesting and somewhat similar questions (yes, they are about blobs and images, but this is IMO comparable).

What are the downsides of storing files as BLOBs in MySQL?

  • Storing Images in DB - Yea or Nay?
  • Images in database vs file system
  • https://stackoverflow.com/search?q=images+database+filesystem

Do the same problems exist with NoSQL systems like Cassandra?

  • NoSQL for filesystem storage organization and replication?
  • Storing images in NoSQL stores

PS: I don't want to be the killjoy but I don't think that any NoSQL solution is going to solve your problem (NoSQL is just irrelevant for most businesses).

Storing images in a database versus a filesystem

If the images are user data, rather than part of your application's code or theme, then storing the images in the database is a good idea, because…

  • Backups are easier to manage if all you have to back up is the database. On the other hand, if you store some application data in the database and some in the filesystem, then you'll have to coordinate the backup schedules of your database and your filesystem to ensure that the two are consistent.

    If you have a database administrator at your disposal, then great! Your backups should already be taken care of. If not, then database backups may be slightly tricky to set up, but once you do have a backup system, it can be better than filesystem backups. For example, many database systems have support for streaming replication.

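  • If your application is load-balanced and served by a pool of multiple webservers, then with images on the filesystem you'll either have to replicate the files to all of the machines or share them among your servers using a network filesystem. With the images in the database, every webserver reads them from the same place.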

Of course, having the images on a filesystem also has its advantages, namely in performance and simplicity, since most webservers are built to serve static files. A hybrid approach could give you the best of both worlds:

  • The images stored in the database would be the authoritative data.
  • Your application can have a feature to extract them as files in their local filesystem as a kind of cache. That cache can be rebuilt at any time, since it is not authoritative.
  • The webserver can then serve the files directly from the filesystem.
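The hybrid approach above might look roughly like this. It is a sketch under several assumptions: SQLite stands in for the real database, and the `images` table, the `.bin` cache file naming, and the function `cached_image_path` are hypothetical. It also assumes a single process fills the cache at a time.

```python
import sqlite3
import tempfile
from pathlib import Path


def cached_image_path(conn: sqlite3.Connection, cache_dir: Path, image_id: int) -> Path:
    """Return a file path for the image, extracting it from the database on a cache miss."""
    target = cache_dir / f"{image_id}.bin"
    if not target.exists():
        row = conn.execute(
            "SELECT content FROM images WHERE id = ?", (image_id,)
        ).fetchone()
        if row is None:
            raise KeyError(image_id)
        # Write to a temporary name first, then rename, so the webserver
        # never serves a half-written file.
        tmp = target.with_suffix(".tmp")
        tmp.write_bytes(row[0])
        tmp.replace(target)
    return target


# Demo: the database holds the authoritative copy; the directory is only a cache.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, content BLOB NOT NULL)")
image_bytes = b"placeholder image bytes"
conn.execute("INSERT INTO images (id, content) VALUES (1, ?)", (image_bytes,))

cache_dir = Path(tempfile.mkdtemp())
first = cached_image_path(conn, cache_dir, 1)   # cache miss: extracted from the database
first.unlink()                                  # simulate the cache being wiped
rebuilt = cached_image_path(conn, cache_dir, 1) # rebuilt from the authoritative copy
print(rebuilt.read_bytes() == image_bytes)
```

Because the cache can always be rebuilt, only the database needs to be backed up, while the webserver still serves plain files.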

