Read and Write File Atomically

You want to use File#flock in exclusive mode. Here's a little demo. Run this in two different terminal windows.

filename = 'test.txt'

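File.open(filename, File::RDWR) do |file|
  # Blocks here until no other process holds the lock on this file
  file.flock(File::LOCK_EX)

  puts "content: #{file.read}"
  puts 'doing some heavy-lifting now'
  sleep(10)
end # closing the file releases the lock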
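For comparison, here is a minimal Python sketch of the same idea. It assumes a POSIX system (the `fcntl` module is not available on Windows), and the helper name `read_with_exclusive_lock` is made up for this illustration. Like the Ruby version, the lock is advisory: it only protects against other processes that also call `flock`.

```python
import fcntl
import os
import tempfile

def read_with_exclusive_lock(path):
    """Open path read/write and hold an exclusive advisory lock while reading."""
    with open(path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)      # blocks until no other process holds the lock
        try:
            return f.read()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)  # explicit unlock; closing the file also releases it

if __name__ == "__main__":
    # Create a throwaway file so the demo is self-contained.
    fd, demo_path = tempfile.mkstemp()
    with os.fdopen(fd, "w") as f:
        f.write("hello")
    print("content:", read_with_exclusive_lock(demo_path))
    os.remove(demo_path)
```

Running two copies against the same file, with a `sleep` added inside the locked region, should show the second one waiting for the first.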

Difference between writing to file atomically and not

Atomic in general means the operation cannot be interrupted partway: it will either complete or have no effect. When writing files, that is accomplished by writing to a temporary file and then replacing the original with the temporary file once the write completes.

A crash while writing an atomic file means the original is not modified and there is a garbage file that can be deleted. A crash while writing normally would mean an expected good file is corrupt.

Performance-wise, the cost is minimal. During the write you will have two copies of the file on disk. Replacing the file is a very simple operation at the file system level.
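The difference can be sketched in Python as follows. The helper `write_atomically` and its `fail_midway` switch are hypothetical names used only to simulate a crash halfway through the write; the sketch skips `fsync`, so it illustrates the crash behaviour described above, not durability across power loss.

```python
import os
import tempfile

def write_atomically(path, text, fail_midway=False):
    """Write text to a temp file next to path, then swap it in with os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    half = len(text) // 2
    with os.fdopen(fd, "w") as tmp:
        tmp.write(text[:half])
        if fail_midway:
            # Simulated crash: the half-written temp file stays behind as garbage.
            raise RuntimeError("simulated crash during write")
        tmp.write(text[half:])
    os.replace(tmp_path, path)  # the original is only touched here, in one step

if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "data.txt")
    write_atomically(target, "version 1")
    try:
        write_atomically(target, "version 2", fail_midway=True)
    except RuntimeError:
        pass
    with open(target) as f:
        print(f.read())  # still "version 1"
```

After the simulated crash the directory holds the intact original plus one leftover `.tmp` file, which can simply be deleted.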

Atomic writing to file on linux

I recommend writing to a temporary file and then doing a rename(2) on it.

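#include <cstdio>   // std::rename
#include <fstream>  // std::ofstream

int main() {
    std::ofstream o("file.tmp"); // Write to a temporary file
    o << "my data";
    o.close();

    // Perform an atomic move operation... needed so readers can't open a partially written file
    return std::rename("file.tmp", "file.real") == 0 ? 0 : 1;
}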

Atomic file write operations (cross platform)

AFAIK no.

The reason is that for such an atomic operation to be possible, there has to be OS support in the form of a transactional file system, and none of the mainstream operating systems offer one.

EDIT - I'm wrong for POSIX-compliant systems at least. The POSIX rename syscall performs an atomic replace if a file with the target name already exists ... as pointed out by @janneb. That should be sufficient to do the OP's operation atomically.

However, the fact remains that the Java File.renameTo() method is explicitly not guaranteed to be atomic, so it does not provide a cross-platform solution to the OP's problem.

EDIT 2 - With Java 7 you can use java.nio.file.Files.move(Path source, Path target, CopyOption... options) with the ATOMIC_MOVE copy option. If this is not supported (by the OS / file system) you should get an exception.

Atomically write byte[] to file

Atomic writes to files are not possible. Operating systems don't support it, and since they don't, programming language libraries can't do it either.

The best you are going to get with files in a conventional file system is atomic file renaming; i.e.

  1. write new file into same file system as the old one

  2. use FileDescriptor.sync() to ensure that new file is written

  3. rename the new file over the old one; e.g. using

    java.nio.file.Files.move(Path source, Path target,
    CopyOption... options)

    with the StandardCopyOption.ATOMIC_MOVE option. According to the javadocs, this may not be supported, but if it isn't you should get an exception (AtomicMoveNotSupportedException).
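The three steps above can be sketched in Python, where `os.fsync` plays roughly the role of `FileDescriptor.sync()` and `os.replace` the role of the atomic move. The function name `write_bytes_atomically` is an assumption for this illustration, not part of any library.

```python
import os
import tempfile

def write_bytes_atomically(path, data):
    """Temp file on the same filesystem, fsync, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    # Step 1: create the new file in the same directory, hence the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())      # Step 2: ask the OS to push the data to disk
        os.replace(tmp_path, path)      # Step 3: rename the new file over the old one
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "blob.bin")
    write_bytes_atomically(target, bytes([1, 2, 3]))
    with open(target, "rb") as f:
        print(f.read())
```

Creating the temporary file next to the target is a deliberate choice: a temp file in the system temp directory may live on a different filesystem, in which case the rename can fail.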

But note that the atomicity is implemented in the OS, and if the OS cannot give strong enough guarantees, you are out of luck.

(One issue is what might happen in the event of a hard disk error. If the disk dies completely, then atomicity is moot. But if the OS is still able to read data from the disk after the failure, then the outcome may depend on the OS's ability to repair a possibly inconsistent file system.)

How to make file creation an atomic operation?

Write the data to a temporary file and, once it has been written successfully, rename that file to the correct destination file, e.g.

import os

with open(tmpFile, 'w') as f:
    f.write(text)
    # make sure that all data is on disk
    # see http://stackoverflow.com/questions/7433057/is-rename-without-fsync-safe
    f.flush()
    os.fsync(f.fileno())
# rename only after the file is closed
# (use os.rename before Python 3.3; on Windows os.rename fails if the destination exists)
os.replace(tmpFile, myFile)

According to the documentation (http://docs.python.org/library/os.html#os.replace):

Rename the file or directory src to dst. If dst is a non-empty directory, OSError will be raised. If dst exists and is a file, it will be replaced silently if the user has permission. The operation may fail if src and dst are on different filesystems. If successful, the renaming will be an atomic operation (this is a POSIX requirement).

Note:

  • The rename may fail if src and dst are not on the same filesystem, and a copy-based fallback such as shutil.move is not atomic

  • os.fsync step may be skipped if performance/responsiveness is more important than the data integrity in cases like power failure, system crash etc

atomic write/read of a file in nodejs

There is, to my knowledge, nothing built in. There are modules such as redis-lock, though, that implement a lock mechanism.
If you run on a single non-clustered server, you could probably get by with implementing a simple local lock.
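The answer is about Node.js, but the idea of a simple local lock is language-neutral. One possible shape, sketched here in Python, relies on exclusive file creation: whoever manages to create the lock file holds the lock. The class name `LockFile` and its parameters are hypothetical, and the sketch does not handle stale lock files left behind by a crashed process.

```python
import os
import tempfile
import time

class LockFile:
    """Hypothetical helper: the lock is held by whoever created the lock file."""

    def __init__(self, path, timeout=5.0, poll=0.05):
        self.path = path
        self.timeout = timeout
        self.poll = poll

    def __enter__(self):
        deadline = time.monotonic() + self.timeout
        while True:
            try:
                # O_CREAT | O_EXCL fails if the file already exists, so only one caller wins.
                fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                os.close(fd)
                return self
            except FileExistsError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"could not acquire {self.path}")
                time.sleep(self.poll)

    def __exit__(self, exc_type, exc, tb):
        os.remove(self.path)  # release the lock

if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "resource.lock")
    with LockFile(lock_path):
        print("lock held, safe to read/modify/write the shared file")
```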

Is Python's file.write atomic?

It looks like the underlying OS write() call might not even be atomic:

Atomicity of `write(2)` to a local filesystem

Is there a way to write file atomically in Java on Linux?

I don't believe Java has an API for this, and it seems to depend on both the OS and filesystem having support, so JNI might be the only way, and even then only on Linux.

I did a quick search for what Cygwin does. It seems to be a bit of a hack just to make software work: it creates a file with a random name and then excludes it only from its own directory listing.

I believe the closest you can get in plain Java is to create a file in some other location (kinda like a /proc/self/fd/... equivalent), and then when you are done writing it, either move it or symbolic link it from the final location. To move the file, you want it on the same filesystem partition so the file contents don't actually need to be copied. Programs watching for the file in say /tmp/ wouldn't see it until the move or sym link creation.

You could possibly play around with user accounts and filesystem permissions to ensure that no other (non SYSTEM/root) program can see the file initially even if they tried to look wherever you hid it.

write(2)/read(2) atomicity between processes in linux

POSIX doesn't give any minimum guarantee of atomic operations for read and write except for writes on a pipe (where a write of up to PIPE_BUF (≥ 512) bytes is guaranteed to be atomic, but reads have no atomicity guarantee). The operation of read and write is described in terms of byte values; apart from pipes, a write operation offers no extra guarantees compared to a loop around single-byte write operations.
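As a small illustration of the pipe guarantee, the Python sketch below sends each record with a single `write` call and refuses records larger than `PIPE_BUF`. It assumes a Unix system, where `select.PIPE_BUF` is available, and a blocking pipe; the helper name `send_record` is made up for this example.

```python
import os
import select

def send_record(write_fd, record):
    """Write one record with a single write() call, refusing records above PIPE_BUF."""
    if len(record) > select.PIPE_BUF:
        raise ValueError("record larger than PIPE_BUF; the write could be interleaved")
    written = os.write(write_fd, record)
    if written != len(record):
        raise OSError("short write on pipe")
    return written

if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    send_record(write_fd, b"hello\n")
    os.close(write_fd)
    print(os.read(read_fd, select.PIPE_BUF))
    os.close(read_fd)
```

With several writers on the same pipe, records sent this way should not be interleaved with one another, although a single read may still return more or less than one record.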

I'm not aware of any extra guarantee that Linux would give, neither 16 nor 512. In practice I'd expect it to depend on the kernel version, on the filesystem, and possibly on other factors such as the underlying block device, the number of CPUs, the CPU architecture, etc.

The O_SYNC, O_RSYNC and O_DSYNC guarantees (synchronized I/O data integrity completion, given for read and write in the optional SIO feature of POSIX) are not what you need. They guarantee that writes are committed to persistent storage before the read or write system call returns, but do not make any claim regarding a write that is started while the read operation is in progress.

In your scenario, reading and writing files doesn't look like the right toolset.

  • If you need to transfer only small amounts of data, use pipes. Don't worry too much about copying: copying data in memory is very fast on the scale of most processing, or of a context switch. Plus Linux is pretty good at optimizing copies.
  • If you need to transfer large amounts of data, you should probably be using some form of memory mapping: either a shared memory segment if disk backing isn't required, or mmap if it is. This doesn't magically solve the atomicity problem, but is likely to improve the performance of a proper synchronization mechanism. To perform synchronization, there are two basic approaches:

    • The producer writes data to shared memory, then sends a notification to the consumer indicating exactly what data is available. The consumer only processes data upon request. The notification may use the same channel (e.g. mmap + msync) or a different channel (e.g. pipe).
    • The producer writes data to shared memory, then flushes the write (e.g. msync). Then the producer writes a well-known value to one machine word (a sig_atomic_t will typically work, even though its atomicity is formally guaranteed only for signals — or, in practice, a uintptr_t). The consumer reads that one machine word and only processes the corresponding data if this word has an acceptable value.
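The second approach can be sketched in Python with a file-backed `mmap`. The layout (a 4-byte flag word, a 4-byte length, then the payload), the `READY` value, and the function names are all assumptions made for this illustration. Python gives no formal atomicity guarantee for the 4-byte store, so treat this as a picture of the ordering (payload first, flag last) and not as a production synchronization primitive.

```python
import mmap
import os
import struct
import tempfile

FLAG = struct.Struct("<I")     # one 32-bit "ready" word at offset 0 (assumed layout)
LENGTH = struct.Struct("<I")   # payload length, stored right after the flag
DATA_OFFSET = FLAG.size + LENGTH.size
READY = 0xC0FFEE               # arbitrary well-known value

def publish(mm, payload):
    """Producer: write length and payload, flush, and only then set the flag word."""
    mm[FLAG.size:DATA_OFFSET] = LENGTH.pack(len(payload))
    mm[DATA_OFFSET:DATA_OFFSET + len(payload)] = payload
    mm.flush()                               # msync on POSIX
    mm[0:FLAG.size] = FLAG.pack(READY)
    mm.flush()

def consume(mm):
    """Consumer: return the payload only if the flag word has the expected value."""
    (flag,) = FLAG.unpack(mm[0:FLAG.size])
    if flag != READY:
        return None
    (length,) = LENGTH.unpack(mm[FLAG.size:DATA_OFFSET])
    return bytes(mm[DATA_OFFSET:DATA_OFFSET + length])

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.ftruncate(fd, 4096)                   # zero-filled, so the flag starts out "not ready"
    mm = mmap.mmap(fd, 4096)
    print(consume(mm))                       # nothing published yet
    publish(mm, b"some data")
    print(consume(mm))
    mm.close()
    os.close(fd)
    os.remove(path)
```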

