MySQL Binary Storage Using BLOB vs. OS File System: Large Files, Large Quantities, Large Problems

What is the difference between storing data in a BLOB and storing a pointer to a file?

I read that the BLOB data type can be used to store files.

According to the MySQL manual page on BLOB, a BLOB is a binary large object that can hold a variable amount of data.

Since it is a data type designed for binary data, it is commonly used to store files, and storing image files is a very common use in web applications.

For web applications this would mean that you would first need to convert your file into binary format and then store it, and every time you retrieve the file you would need to reverse the process and convert it back to its original format.

Besides that, storing a large amount of data in your DB MAY slow it down, especially on systems that are not dedicated solely to hosting a database.

I also read that an alternative is to store the file on disk and include a pointer to its location in the database.

Bearing all of the above in mind, a common practice for web applications is to store your files somewhere other than MySQL and simply store their paths in your database. This approach MAY speed up your database when dealing with large amounts of data.
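To make the two approaches concrete, here is a minimal sketch of both. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL so that it runs without a server; the table names, column names, and helper functions are all made up for this example and are not taken from any answer here.

```python
import os
import sqlite3


def open_db():
    # In-memory SQLite database standing in for a MySQL server (assumption).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files_blob (id INTEGER PRIMARY KEY, name TEXT, data BLOB)")
    db.execute("CREATE TABLE files_path (id INTEGER PRIMARY KEY, name TEXT, path TEXT)")
    return db


def store_as_blob(db, name, data):
    # Approach 1: the bytes themselves live inside the database row.
    cur = db.execute("INSERT INTO files_blob (name, data) VALUES (?, ?)", (name, data))
    return cur.lastrowid


def load_blob(db, file_id):
    row = db.execute("SELECT data FROM files_blob WHERE id = ?", (file_id,)).fetchone()
    return bytes(row[0])


def store_as_path(db, name, data, directory):
    # Approach 2: write the bytes to disk and keep only the path in the database.
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
    cur = db.execute("INSERT INTO files_path (name, path) VALUES (?, ?)", (name, path))
    return cur.lastrowid


def load_from_path(db, file_id):
    # Two steps: look up the path in the database, then read the file from disk.
    row = db.execute("SELECT path FROM files_path WHERE id = ?", (file_id,)).fetchone()
    with open(row[0], "rb") as f:
        return f.read()
```

Note that with the path approach the database and the directory have to be backed up and kept in sync separately, which is the trade-off discussed further down.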

But I'm a little confused because I've read that BLOB fields are not stored in-row and require a separate look-up to retrieve their contents.

In fact, that depends on which storage engine you are using, since every engine treats and stores data differently. For the InnoDB engine, which is well suited to relational workloads, you may want to read this article from the MySQL Performance Blog on how BLOBs are stored in MySQL.

In short, from MySQL 5 onward the BLOB is stored as follows:

InnoDB stores either the whole BLOB on the row page or only a 20-byte BLOB pointer, giving preference to smaller columns to be stored on the page, which is reasonable as you can store more of them.

So you are probably thinking now that the right way to go is to store them as separate files. But there are some advantages to using BLOBs, and the first one (in my opinion) is backup. I manage a small server, and I had to write another subroutine just to copy the files stored as paths to another storage disk (we couldn't afford a decent tape backup system). If I had designed my application to use BLOBs, a simple mysqldump would have been all I needed to back up my whole database.

The advantages of storing BLOBs for backups are discussed in more detail in this post, where the person who answered had a problem similar to mine.

Another advantage is security and the ease of managing permissions and access. All the data inside your MySQL server is password protected, and you can easily manage which users can access what.

In an application that relies on the MySQL privilege system for authentication and access, this is certainly a plus, since it would be a little harder for, say, an intruder to retrieve an image (or a binary file such as a zip archive) from your disk, or for a user without access privileges to get at it.

So I'd say that:

If you are going to manage your MySQL server and all the data in it, must do regular backups, or intend (or even might consider) a future change of OS, and you have decent hardware with MySQL tuned for it, go for BLOB.

If you will not manage your MySQL server (as on a web host, for example) and don't intend to change OS or make backups, stick with VARCHAR columns pointing to your files.

I hope it helped. Cheers

MySQL Blob vs. Disk for video frames

At a certain point, querying for many blobs becomes unbearably slow. I suspect this will be the case even if your binary files are identically dimensioned. Moreover, you will still need some code to access and process the blobs. And this approach doesn't take advantage of the file caching that might speed up image queries straight from the file system.

But! The link you provided did not mention object-based databases, which can store the data you described in a way that lets you access it extremely quickly, and possibly return it in its native format. For a discussion, see the link or just search Google; there are many discussions:

Storing images in NoSQL stores

I would also look into HBase.

I figured that since you were not sure what to use in the first place (and there were no answers), an alternative solution might be appropriate.

When is using MySQL BLOB recommended?

Read:

  • MySQL Binary Storage using BLOB VS OS File System: large files, large quantities, large problems
  • To Do or Not to Do: Store Images in a Database

which concludes

If you on occasion need to retrieve an image and it has to be available on several different web servers. But I think that's pretty much it.

  • If it doesn't have to be available on several servers, it's always better to put them in the file system.
  • If it has to be available on several servers and there's actually some kind of load in the system, you'll need some kind of distributed storage.
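
Read as a decision rule, the quoted conclusion comes down to two questions. The sketch below is only my restatement of it; the function name, parameter names, and returned labels are invented for illustration.

```python
def recommend_storage(multiple_servers: bool, significant_load: bool) -> str:
    """Restate the quoted rule of thumb as a function (illustrative only)."""
    if not multiple_servers:
        # Files needed on a single server: keep them in the file system.
        return "file system"
    if significant_load:
        # Several servers and real load: some kind of distributed storage.
        return "distributed storage"
    # Several servers, but only occasional retrieval: a BLOB is acceptable.
    return "database BLOB"
```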

Upload large files to BLOB

In MySQL, to save or read BLOB fields larger than 1 MB, you have to raise the server-side parameter max_allowed_packet above its default (1 MB in the MySQL versions this answer was written for; later releases ship with a larger default). In practice, you can't go much further than 16-32 MB for this parameter. The price of this increase is that every new database client can consume up to that much memory, and in general server performance will suffer greatly.

In other words, MySQL does not really support BLOB fields larger than about 1 MB (if you can't or don't want to change the server configuration) or around 16 MB (even if you do).
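
One workaround that is sometimes used, and which this answer does not itself propose, is to split a large file into pieces that each fit comfortably under the packet limit and store one piece per row. A minimal sketch of just the splitting and reassembly, with hypothetical function names and a chunk size chosen by the caller:

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Split data into (sequence_number, chunk) pairs of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (seq, data[offset:offset + chunk_size])
        for seq, offset in enumerate(range(0, len(data), chunk_size))
    ]


def join_chunks(chunks) -> bytes:
    """Reassemble the original bytes from (sequence_number, chunk) pairs."""
    # Sort by sequence number in case the rows come back in arbitrary order.
    return b"".join(chunk for _, chunk in sorted(chunks, key=lambda pair: pair[0]))
```

Each pair would map to a row such as (file_id, seq, chunk) in a table of your own design. The chunk size has to leave room for the rest of the statement, so it should sit well below max_allowed_packet rather than exactly at it.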

This can be a philosophical question: is it a good idea or not to keep big blobs in the database? I think for many tasks (but not all) it is a great idea, and because MySQL is so bad at this (and for a host of other reasons), I simply avoid using it as my SQL server.

Instead, I use PostgreSQL, which supports BLOBs (actually, BYTEA) up to its documented limit of 1 GB per value without any tweaks on the client or server. In addition, it transparently compresses them with an LZ-family algorithm - slightly worse than gzip, but still much better than no compression at all.

PHP storage of many large files

A database is meant for data, not files! I have come across many situations where people prefer to store files in a database, be it MongoDB or something else. Logically, that's not worth it.

Your question, "is it worth the programming effort?": seriously, no, if you only do it once in a while. It takes a lot of effort to put things in a database. However, if you are a developer who works with it frequently, then once you get used to it you will do it even when it's not worth doing :)

I vote for file system storage for files, not the database. As for the difficulties with the number of files, you will find a way to resolve them for sure.

How do you manage a large video database?

I'm not sure this is what you are looking for. But...

Basically, there are two ways to deal with "binary large objects":

  • The first is to store them in a so-called BLOB column in your DB.
  • The second is, as you do now, to store in the DB only "pointers" to the actual data on an external storage solution, usually as a "path" referencing files on your file system.

Both solutions have pros and cons regarding performance, data migration, load balancing, and so on. This has already been discussed elsewhere on SO:

  • MySQL Binary Storage using BLOB VS OS File System: large files, large quantities, large problems
  • When is using MySQL BLOB recommended?

CMS vs Filesystem storage id scalability

Your question is very similar to this one. Is your load primarily reading your images or writing them? If it's read scalability you need, the post describes memcached, which is probably all you need. Jackrabbit has loads more features, but is geared more toward hierarchical text storage, and I'm not sure it will perform any better on your images. Also, if you do choose Jackrabbit, make sure your content hierarchy is deep enough for it to stay efficient. Any parent with 10,000 or more children is going to have sub-par performance.
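
One common way to keep any single parent from collecting too many children is to derive a few levels of nesting from a hash of the item's key. The sketch below is a generic illustration and not a Jackrabbit API; the function name and the default of two levels with two hex characters each are my own assumptions.

```python
import hashlib
import posixpath


def sharded_path(key: str, levels: int = 2, width: int = 2) -> str:
    """Build a nested path such as 'ab/cd/abcd...' from a hash of the key."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    # Each level uses `width` hex characters, so it has at most 16**width children.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return posixpath.join(*parts, digest)
```

With the defaults, each intermediate level has at most 256 children, and items are spread over 65,536 leaf parents, which keeps the count per parent small until the collection becomes very large.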

BLOB Storage - 100+ GB, MySQL, SQLite, or PostgreSQL + Python

I'm still researching this option for one of my own projects, but CouchDB may be worth a look.


