Fsync, Sync: Does It Really Do What It's Supposed To?

http://linux.die.net/man/8/sync

sync does not guarantee that files have reached the storage medium itself. It only guarantees that cached/buffered data is flushed to the disk device, which may still hold it in its own cache. This is true whether the device is an SD card or anything else.

fflush, fsync and sync vs memory layers

1. As you correctly concluded from your research, fflush synchronizes the user-space buffered data to the kernel-level cache (since it works with FILE objects that reside at user level and are invisible to the kernel), whereas fsync or sync (working directly with file descriptors) synchronize kernel-cached data with the device. However, the latter comes without a guarantee that the data has actually been written to the storage medium, as storage devices usually come with their own caches as well. I would expect the same to hold for msync called with the MS_SYNC flag.
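The user-space layer can be observed directly. Here is a minimal sketch in Python, whose buffered file objects play roughly the role of C's FILE here. The helper name sizes_seen_by_kernel and the 1 MiB buffer size are assumptions chosen for illustration:

```python
import os
import tempfile

def sizes_seen_by_kernel(payload: bytes) -> tuple[int, int]:
    """Return the file size the kernel reports before and after flush()."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # A large buffer keeps a small payload in user space until flush().
        with open(path, "wb", buffering=1024 * 1024) as f:
            f.write(payload)
            before = os.stat(path).st_size  # kernel has not seen the data yet
            f.flush()                       # user-space buffer -> kernel cache
            after = os.stat(path).st_size
        return before, after
    finally:
        os.unlink(path)

before, after = sizes_seen_by_kernel(b"hello")
```

Until flush() runs, the kernel still reports an empty file. Nothing in this sketch says anything about the device yet; that is what fsync is for.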

Relatedly, I find the distinction between synchronized and synchronous operations very useful when talking about the topic. Here's how Robert Love puts it succinctly:

A synchronous write operation does not return until the written data is—at least—stored in the kernel’s buffer cache. [...] A synchronized operation is more restrictive and safer than a merely synchronous operation. A synchronized write operation flushes the data to disk, ensuring that the on-disk data is always synchronized vis-à-vis the corresponding kernel buffers.

With that in mind you can call open with O_SYNC flag (together with some other flag that opens the file with a write permission) to enforce synchronized write operations. Again, as you correctly assumed this will work only with WRITE THROUGH disk caching policy, which effectively amounts to disabling disk caching.

You can read this answer about how to disable disk caching on Linux. Be sure to also check this website which also covers SCSI-based in addition to ATA-based devices (to read about different types of disks see this page on Microsoft SQL Server 2005, last updated: Apr 19, 2018).

Speaking of which, it is very informative to read about how the issue is dealt with on Windows machines:

To open a file for unbuffered I/O, call the CreateFile function with the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags. This prevents the file contents from being cached and flushes the metadata to disk with each write. For more information, see CreateFile.

Apparently, this is how Microsoft SQL Server 2005 family ensures data integrity:

All versions of SQL Server open the log and data files using the Win32 CreateFile function. The dwFlagsAndAttributes member includes the FILE_FLAG_WRITE_THROUGH option when opened by SQL Server. [...]
This option instructs the system to write through any intermediate cache and go directly to disk. The system can still cache write operations, but cannot lazily flush them.

I'm saying this is informative in particular because of this blog post from 2012 showing that some SATA disks ignore the FILE_FLAG_WRITE_THROUGH! I don't know what the current state of affairs is, but it seems that in order to ensure that writing to a disk is truly synchronized, you need to:

  1. Disable disk caching using your device drivers.
  2. Make sure that the specific device you're using supports write-through/no-caching policy.

However, if you're looking for a guarantee of data integrity, you could just buy a disk with its own battery-based power supply that goes beyond capacitors (which are usually only enough to complete the ongoing write operations). As put in the conclusion of the blog article mentioned above:

Bottom-line, use Enterprise-Class disks for your data and transaction log files. [...] Actually, the situation is not as dramatic as it seems. Many RAID controllers have battery-backed cache and do not need to honor the write-through requirement.

2. To (partially) answer the second question, this is from the man pages SYNC(2):

According to the standard specification (e.g., POSIX.1-2001), sync() schedules the writes, but may return before the actual writing is done. However, since version 1.3.20 Linux does actually wait. (This still does not guarantee data integrity: modern disks have large caches.)

This would imply that fsync and sync work differently. Note, however, that both are declared in unistd.h, which suggests some consistency between them. In any case, I would follow Robert Love, who does not recommend using the sync syscall when writing your own code:

The only real use for sync() is in the implementation of the sync utility. Applications should use fsync() and fdatasync() to commit to disk the data of only the requisite file descriptors. Note that sync() may take several minutes or longer to complete on a busy system.
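Following that advice, here is a minimal Python sketch that commits a single file descriptor rather than calling sync(). The function name append_record and the file name journal.log are made up for illustration. os.fdatasync is not available on every platform, so the sketch falls back to os.fsync:

```python
import os
import tempfile

def append_record(path: str, record: bytes) -> None:
    """Append one record and commit only this file, not the whole system."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(record)
        while view:                       # os.write may accept fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        # fdatasync skips metadata that is not needed to read the data back;
        # fall back to fsync where it is missing (e.g. macOS, Windows).
        commit = getattr(os, "fdatasync", os.fsync)
        commit(fd)
    finally:
        os.close(fd)

log_path = os.path.join(tempfile.mkdtemp(), "journal.log")
append_record(log_path, b"first\n")
append_record(log_path, b"second\n")
```

As with everything else in this answer, the commit only goes as far as the drive's own cache allows.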

Does closing a file imply flushing in Perl under Unix/MS-Windows/MacOs?

Perl's close, like C's fclose, will flush the language's IO library's buffers to the OS. The application can safely be terminated at this point.

However, this does not mean the file has been committed to disk. The machine cannot be safely turned off at this point.[1] As the passage you quote says, fsync will get you closer to that. Remember that you have to sync not only the file, but also each directory in its path.

fsync is available as IO::Handle::sync. File handles inherit from IO::Handle, so you can simply use $fh->sync. (This requires use IO::Handle; on older versions of Perl.)


[1] Unplugging the drive would not be safe either, though ejecting the media would be safe. The OS will complete its writes before allowing the ejection to happen.
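The "sync each directory in its path" step is easy to forget, so here is a rough sketch of it. It is written in Python rather than Perl, and it assumes a POSIX system where a directory can be opened read-only and passed to fsync. The names fsync_dir and durable_create, and the top parameter that says where to stop walking upwards, are hypothetical:

```python
import os
import tempfile

def fsync_dir(dirpath: str) -> None:
    """fsync a directory so that entries created in it are committed."""
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def durable_create(path: str, data: bytes, top: str) -> list[str]:
    """Write path, fsync it, then fsync each directory from its parent up to top."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()               # library buffer -> OS
        os.fsync(f.fileno())    # OS -> device (subject to the drive's own cache)
    synced = []
    current = os.path.dirname(os.path.abspath(path))
    top = os.path.abspath(top)
    while True:
        fsync_dir(current)
        synced.append(current)
        parent = os.path.dirname(current)
        if current == top or parent == current:  # reached top or the root
            break
        current = parent
    return synced

top_dir = tempfile.mkdtemp()
nested = os.path.join(top_dir, "a", "b")
os.makedirs(nested)
target = os.path.join(nested, "f.txt")
synced_dirs = durable_create(target, b"data", top_dir)
```

Stopping at top is a design choice for the sketch: directories that already existed and were not modified do not need to be synced again.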

What is the difference between `O_DIRECT | O_SYNC` + write() and `O_DIRECT` + write() + fsync()

On the one hand there shouldn't be any difference, because a similar amount of work has to happen in both cases (a write, then a disk flush, assuming no chicanery). On the other hand, the second case has to make twice as many syscalls (write() and then fsync()), so in theory it has more overhead, especially if the time it takes to make a syscall is a significant part of the total time of the operation. In all probability, whether there is a difference between the two, and which is faster, depends on the disk, kernel, CPU, size of I/O and so on. Maybe in the first case the kernel can send the write down with the FUA bit set, which would mean the difference could depend on just what file/device you were opening (because that may control whether such an optimisation can be done)...

Using O_SYNC also makes errors appear on the return of the write() call but as noted in other comments you're not checking the return codes...
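To make the two cases and the error checking concrete, here is a sketch in Python under a few assumptions: O_DIRECT is left out on purpose (it needs aligned buffers and is not supported on every filesystem), os.O_SYNC is Unix-only, and the helper names are invented. In Python a failed call raises OSError instead of returning -1, but short writes still have to be handled by checking the return value:

```python
import os
import tempfile

def write_all(fd: int, data: bytes) -> None:
    """Loop until every byte is accepted; os.write may return a short count."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)   # raises OSError on failure
        view = view[written:]

def write_o_sync(path: str, data: bytes) -> None:
    """Case 1: O_SYNC + write(). I/O errors are reported by the write itself."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        write_all(fd, data)
    finally:
        os.close(fd)

def write_then_fsync(path: str, data: bytes) -> None:
    """Case 2: write() + fsync(). Deferred I/O errors may only show up at fsync."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        write_all(fd, data)
        os.fsync(fd)                   # a second syscall per commit
    finally:
        os.close(fd)

workdir = tempfile.mkdtemp()
path_a = os.path.join(workdir, "a.bin")
path_b = os.path.join(workdir, "b.bin")
write_o_sync(path_a, b"payload")
write_then_fsync(path_b, b"payload")
```

Both functions leave the same bytes in the file. What differs is where a failure would surface and how many syscalls each commit costs.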

Why doesn't MongoDB use fsync()?

The reason is performance. Without having to write to disk on each change, MongoDB can handle updates faster.

MongoDB tells you when updates have been delivered to the server, not when the updates have been written, as you can read in the documentation on Verifying Propagation of Writes with getLastError:

Note: the current implementation returns when the data has been delivered to [the] servers. Future versions will provide more options for delivery vs. say, physical fsync at the server.

This is going against ACID, more specifically against the D, which stands for durability:

Durability [guarantees] that once the user has been notified of a transaction's success the transaction will not be lost, the transaction's data changes will survive system failure, and that all integrity constraints have been satisfied, so the DBMS won't need to reverse the transaction.

ACID properties mostly apply to traditional RDBMS systems. NoSQL systems, which include MongoDB, give up one or more of the ACID properties in order to achieve better scalability. In MongoDB's case, durability has been sacrificed for better performance when handling large numbers of updates.


MongoDB and ACID

Most ACID properties are guarantees at transaction level. A transaction is usually a group of queries that should be treated as a single unit. MongoDB has no concept of transactions, again for performance reasons. Therefore most ACID properties don't apply to MongoDB.

A — Atomicity states that a transaction should either succeed or fail. It is not allowed to partially succeed; if part of the transaction fails, the entire transaction should be rolled back. MongoDB supports atomic operations on a document level, but not on a 'transaction' level.

C — Consistency partially refers to atomicity, but also includes referential integrity. A relational database is responsible for making sure that all foreign key references are valid. MongoDB has no concept of foreign keys, so this ACID property doesn't apply.

I — Isolation states that two concurrent transactions are not allowed to interfere with each other; if two transactions try to modify the same data, the second transaction has to wait for the first one to complete. To achieve this, the database will lock the data. MongoDB has no concept of locking, so it doesn't support isolation for multiple operations [1]. Single operations are isolated.

D — Durability is described above. MongoDB doesn't support true durability (yet), in terms of ACID-ic durability.

Now, you may think that MongoDB is useless compared to RDBMS systems because it lacks transactions and most ACID guarantees. However, part of the reason that transactions exist is that relational databases need to treat certain data as a single entity, but this data has been normalized into multiple tables.

MongoDB allows you to store your data as a single entity. This removes the need for foreign keys and referential integrity in most cases. You also don't need multi-query transactions, because you don't need multiple tables to update a single entity. Most of the time you only have to update a single document, and these operations are atomic in MongoDB.

[1] According to the first comment on this page, db.eval() provides isolation for multiple operations. However, according to the documentation you usually want to avoid the use of db.eval().

Is it possible to force a sync of a Windows Network Share?

fsync only knows about local file systems. It can't possibly ensure that any connected client can access the file. I suggest you rewrite your application to return the file directly instead, thus avoiding the sync altogether and actually simplifying both client and server.

What exactly is file.flush() doing?

There are typically two levels of buffering involved:

  1. Internal buffers
  2. Operating system buffers

The internal buffers are created by the runtime/library/language that you're programming against and are meant to speed things up by avoiding a system call for every write. Instead, when you write to a file object, you write into its buffer, and whenever the buffer fills up, the data is written to the actual file using system calls.

However, due to the operating system buffers, this might not mean that the data is written to disk. It may just mean that the data is copied from the buffers maintained by your runtime into the buffers maintained by the operating system.

If you write something, and it ends up in the buffer (only), and the power is cut to your machine, that data is not on disk when the machine turns off.

So, in order to help with that you have the flush and fsync methods, on their respective objects.

The first, flush, will simply write out any data that lingers in a program buffer to the actual file. Typically this means that the data will be copied from the program buffer to the operating system buffer.

Specifically what this means is that if another process has that same file open for reading, it will be able to access the data you just flushed to the file. However, it does not necessarily mean it has been "permanently" stored on disk.

To do that, you need to call the os.fsync method which ensures all operating system buffers are synchronized with the storage devices they're for, in other words, that method will copy data from the operating system buffers to the disk.

Typically you don't need to bother with either method, but if you're in a scenario where paranoia about what actually ends up on disk is a good thing, you should make both calls as instructed.
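A minimal sketch of making both calls in the order described above. The function name write_and_commit and the file name note.txt are just illustrative:

```python
import os
import tempfile

def write_and_commit(path: str, text: str) -> None:
    """Write text, then push it through both buffering levels."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
        f.flush()              # program buffer -> operating system buffer
        os.fsync(f.fileno())   # operating system buffer -> storage device

note_path = os.path.join(tempfile.mkdtemp(), "note.txt")
write_and_commit(note_path, "important\n")
```

The order matters: calling os.fsync before flush would sync a file that does not yet contain the data still sitting in the program buffer.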


Addendum in 2018.

Note that disks with cache mechanisms are now much more common than back in 2013, so there are now even more levels of caching and buffering involved. I assume these buffers will be handled by the sync/flush calls as well, but I don't really know.


