What Exactly Is File.Flush() Doing

What exactly is file.flush() doing?

There's typically two levels of buffering involved:

  1. Internal buffers
  2. Operating system buffers

The internal buffers are buffers created by the runtime/library/language that you're programming against and is meant to speed things up by avoiding system calls for every write. Instead, when you write to a file object, you write into its buffer, and whenever the buffer fills up, the data is written to the actual file using system calls.

However, due to the operating system buffers, this might not mean that the data is written to disk. It may just mean that the data is copied from the buffers maintained by your runtime into the buffers maintained by the operating system.

If you write something, and it ends up in the buffer (only), and the power is cut to your machine, that data is not on disk when the machine turns off.

So, in order to help with that you have the flush and fsync methods, on their respective objects.

The first, flush, will simply write out any data that lingers in a program buffer to the actual file. Typically this means that the data will be copied from the program buffer to the operating system buffer.

Specifically what this means is that if another process has that same file open for reading, it will be able to access the data you just flushed to the file. However, it does not necessarily mean it has been "permanently" stored on disk.

To do that, you need to call the os.fsync method which ensures all operating system buffers are synchronized with the storage devices they're for, in other words, that method will copy data from the operating system buffers to the disk.

Typically you don't need to bother with either method, but if you're in a scenario where paranoia about what actually ends up on disk is a good thing, you should make both calls as instructed.


Addendum in 2018.

Note that disks with cache mechanisms is now much more common than back in 2013, so now there are even more levels of caching and buffers involved. I assume these buffers will be handled by the sync/flush calls as well, but I don't really know.

What does it mean to flush file contents in Python?

Python buffers writes to files. That is, file.write returns before the data is actually written to your hard drive. The main motivation of this is that a few large writes are much faster than many tiny writes, so by saving up the output of file.write until a bit has accumulated, Python can maintain good writing speeds.

file.flush forces the data to be written out at that moment. This is hand when you know that it might be a while before you have more data to write out, but you want other processes to be able to view the data you've already written. Imagine a log file that grows slowly. You don't want to have to wait ages before enough entries have built up to cause the data to be written out in one big chunk.

In either case, file.close causes the remaining data to be flushed, so "quux" in your code will be written out as soon as file (which is a really bad name as it shadows the builtin file constructor) falls out of scope of the with context manager and gets closed.

Note: your OS does some buffering of its own, but I believe every OS where Python is implemented will honor file.flush's request to write data out to the drive. Someone please correct me if I'm wrong.

By the way, "no-op" means "no operation", as in it won't actually do anything. For example, StringIO objects manipulate strings in memory, not files on your hard drive. StringIO.flush probably just immediately returns because there's not really anything for it to do.

What is the purpose of flush() in Java streams?

From the docs of the flush method:

Flushes the output stream and forces any buffered output bytes to be written out. The general contract of flush is that calling it is an indication that, if any bytes previously written have been buffered by the implementation of the output stream, such bytes should immediately be written to their intended destination.

The buffering is mainly done to improve the I/O performance. More on this can be read from this article: Tuning Java I/O Performance.

flush() java file handling

The advantage of buffering is efficiency. It is generally faster to write a block of 4096 bytes one time to a file than to write, say, one byte 4096 times.

The disadvantage of buffering is that you miss out on the feedback. Output to a handle can remain in memory until enough bytes are written to make it worthwhile to write to the file handle. One part of your program may write some data to a file, but a different part of the program or a different program can't access that data until the first part of your program copies the data from memory to disk. Depending on how quickly data is being written to that file, this can take an arbitrarily long time.

When you call flush(), you are asking the OS to immediately write out whatever data is in the buffer to the file handle, even if the buffer is not full.

How can I tell file flush actually works before file close?

outfile.flush() will flush the stream buffer to the operating system. If you want to make sure the system buffers are flushed to disk, you must issue an OS specific system call to do that. On POSIX compliant systems, you can call sync() defined in <unistd.h>.

How often does python flush to a file?

For file operations, Python uses the operating system's default buffering unless you configure it do otherwise. You can specify a buffer size, unbuffered, or line buffered.

For example, the open function takes a buffer size argument.

http://docs.python.org/library/functions.html#open

"The optional buffering argument specifies the file’s desired buffer size:"

  • 0 means unbuffered,
  • 1 means line buffered,
  • any other positive value means use a buffer of (approximately) that size.
  • A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files.
  • If omitted, the system default is used.

code:

bufsize = 0
f = open('file.txt', 'w', buffering=bufsize)

What does print()'s `flush` do?

Normally output to a file or the console is buffered, with text output at least until you print a newline. The flush makes sure that any output that is buffered goes to the destination.

I do use it e.g. when I make a user prompt like Do you want to continue (Y/n):, before getting the input.

This can be simulated (on Ubuntu 12.4 using Python 2.7):

from __future__ import print_function

import sys
from time import sleep

fp = sys.stdout
print('Do you want to continue (Y/n): ', end='')
# fp.flush()
sleep(5)

If you run this, you will see that the prompt string does not show up until the sleep ends and the program exits. If you uncomment the line with flush, you will see the prompt and then have to wait 5 seconds for the program to finish

What's the difference between `flush` and `sync_all`?

TL;DR: If you want to "ensure" that the data has been written to the device, then use File::sync_all if you use a File. Note that this isn't necessary though.


The Write::flush implementation for File uses the operating system dependent flush operation, for example std::sys::unix::File::flush, or std::sys::windows::File::flush. Those flush operations do... nothing. Both implementations just return Ok(()).

Why? Because the write() already uses the underlying system call for write() in both cases; the handle-based write on Windows, and the file descriptor-based write on Unix-like systems. At that point, it's out of reach of the Rust environment, save for a system call that's specific to files.

So what is Write::flush useful for? It's useful if you have any kind of buffer before the actual file, for example a BufWriter. If you have a File wrapped by a BufWriter, then you need to use flush to ensure that the bytes get written to the file. While it's useful to keep in mind that BufWriter's Drop implementation also tries(!) to write those bytes, it may or may not work, so you're supposed to call Write::flush there (see BufWriter's documentation).

That being said, sync_all isn't necessary and instead will block your program. The operating system will handle the file system synchronisation. While you can certainly wait for that synchronisation to happen via sync_data or sync_all, you're usually better of with not doing either.



Related Topics



Leave a reply



Submit