What Does "Atomic" Mean in a System Call

What does atomic mean in programming?

Here's an example: suppose foo is a variable of type long; then the following assignment is not an atomic operation (in Java):

foo = 65465498L;

Indeed, the variable may be written using two separate operations: one that writes the first 32 bits, and a second one that writes the last 32 bits. That means another thread might read the value of foo and see the intermediate state.

Making the operation atomic means using synchronization mechanisms to ensure that the operation is seen, from any other thread, as a single, atomic (i.e. indivisible) operation. Once the operation is atomic, any other thread will see either the value of foo before the assignment or the value after it, but never an intermediate value.

A simple way of doing this is to make the variable volatile:

private volatile long foo;

Or to synchronize every access to the variable:

public synchronized void setFoo(long value) {
    this.foo = value;
}

public synchronized long getFoo() {
    return this.foo;
}
// no other use of foo outside of these two methods, unless also synchronized

Or to replace it with an AtomicLong:

private final AtomicLong foo = new AtomicLong();

What are atomic operations for newbies?

Pretty much, yes. "Atom" comes from the Greek "atomos" = "uncuttable", and has been used in the sense of "indivisible smallest unit" for a very long time (until physicists found that there are, in fact, smaller things than atoms). In concurrent programming, it means that the operation appears indivisible: no other thread can observe it half-done or interfere with its outcome while it is in progress.

An example: a web poll with open-ended questions, where we want to count how many people give the same answer. You have a database table where you store each answer together with a count of how many times it was given. The code is straightforward:

get the row for the given answer
if the row didn't exist:
    create the row with answer and count 1
else:
    increment count
    update the row with the new count

Or is it? See what happens when multiple people do it at the same time:

user A answers "ham and eggs"       user B answers "ham and eggs"
get the row: count is 1             get the row: count is 1
okay, we're updating!               okay, we're updating!
count is now 2                      count is now 2
store 2 for "ham and eggs"          store 2 for "ham and eggs"

"Ham and eggs" only jumped by 1 even though 2 people voted for it! This is clearly not what we wanted. If only there was an atomic operation "increment if it exists or make a new record"... for brevity, let's call it "upsert" (for "update or insert")

user A answers "ham and eggs"       user B answers "ham and eggs"
upsert by incrementing count        upsert by incrementing count

Here, each upsert is atomic: the first one left count at 2, the second one left it at 3. Everything works.

Note that "atomic" is contextual: in this case, the upsert operation only needs to be atomic with respect to operations on the answers table in the database; the computer can be free to do other things as long as they don't affect (or are affected by) the result of what upsert is trying to do.

Atomic syscalls and input/output operations

Igor is right: just have one thread do all the log writes. Keep in mind that the kernel has to do locking to synchronize access to the open file descriptor (which keeps track of the file position), so by doing writes from multiple cores you're causing contention inside the kernel. Even worse, you're making system calls from multiple cores, which means the kernel's code / data accesses will dirty your caches on multiple cores.

See the FlexSC paper for more about the impact of making system calls on the performance of user-space code after the syscall completes (and about data / instruction cache misses inside the kernel for infrequent syscalls). It definitely makes sense to have one thread do all the system calls, or at least all the write system calls, to keep that part of your process's footprint (and the locking contention inside the kernel) isolated to one core.

That FlexSC paper is about an idea for batching system calls to reduce user->kernel->user transitions, but they also measure overhead for the normal synchronous system-call method. More important is the discussion of cache-pollution from making system calls.
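
One way to structure that is a plain producer/consumer queue with a single logger thread owning the file descriptor. Here is a rough sketch, assuming a mutex-and-condition-variable queue rather than the lock-free queue under discussion, with made-up names like log_enqueue and logger_main:

#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct log_item {
    struct log_item *next;
    size_t len;
    char text[256];
};

static struct log_item *head, *tail;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;

/* Called from any worker thread: touches only the in-process queue,
   never the file descriptor, so there is no syscall on the hot path. */
void log_enqueue(const char *msg) {
    struct log_item *it = calloc(1, sizeof *it);   /* error handling omitted */
    it->len = strnlen(msg, sizeof it->text);
    memcpy(it->text, msg, it->len);
    pthread_mutex_lock(&q_lock);
    if (tail) tail->next = it; else head = it;
    tail = it;
    pthread_cond_signal(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
}

/* The single logger thread: the only place write() is ever called. */
void *logger_main(void *arg) {
    int fd = *(int *)arg;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (!head)
            pthread_cond_wait(&q_nonempty, &q_lock);
        struct log_item *it = head;
        head = it->next;
        if (!head) tail = NULL;
        pthread_mutex_unlock(&q_lock);
        write(fd, it->text, it->len);              /* error handling omitted */
        free(it);
    }
    return NULL;
}

int main(void) {
    int fd = STDOUT_FILENO;                        /* stand-in for the log file */
    pthread_t logger;
    pthread_create(&logger, NULL, logger_main, &fd);
    log_enqueue("hello from a worker thread\n");
    sleep(1);                                      /* let the logger drain the queue */
    return 0;
}

Worker threads never make a system call on the logging path; only the logger thread calls write(), so the kernel-side locking and cache pollution stay on one core.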


Alternatively, if you can let multiple threads write to your log file, you could just do that and not use the queue at all.

It's not guaranteed that a large write will finish uninterrupted, but a small-to-medium-sized write should (almost?) always copy its whole buffer on most OSes, especially if you're writing to a file rather than a pipe. I don't know exactly how Linux write() behaves when it's preempted, but I expect it usually resumes and finishes the write instead of returning without having written all the requested bytes. Partial writes might be more likely when interrupted by a signal.

It is guaranteed that bytes from two write() system calls won't be mixed together; all the bytes from one will be before or after the bytes from the other. You're correct that partial writes are a potential problem, though. I forget if the glibc syscall wrapper will resume the call for you on EINTR. Although in that case, it means no bytes actually got written, or it would have returned success with a byte count.
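
If you do let several threads call write() directly, a common defensive pattern is a wrapper that retries on EINTR and resumes after a short write; a minimal sketch (the name write_all is just illustrative):

#include <errno.h>
#include <unistd.h>

ssize_t write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted before any byte was written: retry */
            return -1;                   /* real error */
        }
        p += n;                          /* short write: resume where it stopped */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}

Note that this only guarantees completeness, not atomicity: if a short write does happen, another thread's bytes can land in the gap before the loop resumes.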

You should test this, for partial writes and for performance. kernel-space locking might be cheaper than the overhead of your lock-free queue, but making system calls from every thread that generates log messages might be worse for performance. (And when you test this, make sure you do it with some real work happening in your user-space process, not just a loop that only calls write.)

How does one programmatically determine if write system call is atomic on a particular file?

The write call as defined in POSIX has no atomicity guarantee at all. So you don't need to confirm anything; it's not atomic.

Even if write() completes successfully, that doesn't guarantee that the data has reached the hard drive (if there is a drive at all). Successfully reading the data back doesn't give you any guarantees either.

You'll need to use the sync family of functions to get some durability guarantees.
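
A minimal sketch of what that looks like in practice (the file name is just an example): write the data, then explicitly request durability with fsync().

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = open("journal.dat", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char msg[] = "record\n";
    if (write(fd, msg, strlen(msg)) < 0) { perror("write"); return 1; }

    /* write() returning success only means the kernel has accepted the data;
       fsync() is what asks for it to reach stable storage. */
    if (fsync(fd) < 0) { perror("fsync"); return 1; }

    close(fd);
    return 0;
}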

Are POSIX' read() and write() system calls atomic?

I don't believe the text you cite implies anything of the sort. It doesn't even mention read() or write() or POSIX. In fact, read() and write() cannot be relied on to be atomic. The only thing POSIX guarantees is that a write() of PIPE_BUF bytes or fewer must be atomic, and even that only applies to pipes.

I didn't read the surrounding context in the paper, but it sounds like the passage you cited is stating constraints that must be placed on an implementation in order for the algorithm to work correctly. In other words, an implementation of this algorithm requires locking.

How you do that locking is up to you (the implementor). If we are dealing with a regular file and multiple independent processes, you might try fcntl(F_SETLKW)-style locking. If your data structure is in memory and you are dealing with multiple threads in the same process, something else might be appropriate.
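
For the regular-file, multiple-process case, an fcntl(F_SETLKW) critical section might look roughly like this (the file name and the whole-file, zero-length lock are example choices):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("counter.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct flock lk = {0};
    lk.l_type = F_WRLCK;          /* exclusive (write) lock */
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;                 /* length 0 means "to end of file", i.e. the whole file */

    if (fcntl(fd, F_SETLKW, &lk) < 0) {            /* blocks until the lock is granted */
        perror("fcntl(F_SETLKW)");
        return 1;
    }

    /* ... read-modify-write the shared data here ... */

    lk.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &lk);      /* release the lock */
    close(fd);
    return 0;
}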

Is stat() atomic with respect to the file system

Yes, a stat call can be thought of as atomic, in that all the information it returns is guaranteed to be consistent. If you call stat at the same instant some other process is writing to the file, there should be no possibility that, say, the other process's write is reflected in st_mtime but not st_size.

And in any case, there's certainly no possibility that calling stat at the same instant some other process is writing to the file could cause that other process to fail. (That would be a serious and quite unacceptable bug in the operating system -- one of an OS's main jobs is to ensure that unrelated processes can't accidentally interact with each other in such ways. This lack-of-interference property isn't usually what we mean by "atomic", though.)

With that said, though, the usual way to monitor a process is via its process ID. And there are probably plenty of prewritten packages out there to help you manage one or more processes that are supposed to run continuously, giving you clean start/stop and monitoring capabilities. (See s6 as an example. I know nothing about this package and am not recommending it; it's just the first one I came across in a web search.)

Another possibility, if you have any kind of IPC mechanism set up between your processes, is to set up a periodic heartbeat that each one publishes, so that a watchdog timer somewhere can detect a process dying.

If you want to keep monitoring your processes by the timeliness of the files they write, though, that sounds like a perfectly fine technique also.
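
A minimal sketch of that file-timeliness check, assuming the monitored process periodically touches a heartbeat file (the path and the 30-second threshold are invented for the example):

#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

int main(void) {
    struct stat st;
    if (stat("/var/run/worker.heartbeat", &st) != 0) {
        perror("stat");
        return 1;                                  /* file missing: treat as unhealthy */
    }

    time_t age = time(NULL) - st.st_mtime;
    if (age > 30)
        printf("stale: last write was %ld seconds ago\n", (long)age);
    else
        printf("healthy: last write was %ld seconds ago\n", (long)age);
    return 0;
}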

Is a write operation in unix atomic?

To call the Posix semantics "atomic" is perhaps an oversimplification. Posix requires that reads and writes occur in some order:

Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. A similar requirement applies to multiple write operations to the same file position. This is needed to guarantee the propagation of data from write() calls to subsequent read() calls. (from the Rationale section of the Posix specification for pwrite and write)

The atomicity guarantee mentioned in APUE refers to the use of the O_APPEND flag, which forces writes to be performed at the end of the file:

If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.

With respect to pread and pwrite, APUE says (correctly, of course) that these interfaces allow the application to seek and perform I/O atomically; in other words, that the I/O operation will occur at the specified file position regardless of what any other process does. (Because the position is specified in the call itself, and does not affect the persistent file position.)
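
A small sketch contrasting the two mechanisms (the file names and the offset are example values): O_APPEND for atomic appends to a shared log, and pwrite() for a positioned write that never touches the shared file offset.

#define _XOPEN_SOURCE 700   /* for pwrite() */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* O_APPEND: the seek-to-end and the write happen as one step, so
       concurrent appenders cannot overwrite each other's records. */
    int logfd = open("app.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (logfd < 0) { perror("open app.log"); return 1; }
    const char line[] = "appended record\n";
    write(logfd, line, strlen(line));
    close(logfd);

    /* pwrite(): the offset is carried in the call itself, so no shared
       file position is read or updated between the "seek" and the write. */
    int datafd = open("data.bin", O_WRONLY | O_CREAT, 0644);
    if (datafd < 0) { perror("open data.bin"); return 1; }
    const char rec[] = "fixed-slot record";
    pwrite(datafd, rec, strlen(rec), 4096);        /* write at byte offset 4096 */
    close(datafd);
    return 0;
}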

The Posix sequencing guarantee is as follows (from the Description of the write() and pwrite() functions):

After a write() to a regular file has successfully returned:

  • Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.

  • Any subsequent successful write() to the same byte position in the file shall overwrite that file data.

As mentioned in the Rationale, this wording does guarantee that two simultaneous write calls (even in different, unrelated processes) will not interleave data, because if data were interleaved during a write which will eventually succeed, the second guarantee would be impossible to provide. How this is accomplished is up to the implementation.

It must be noted that not all filesystems conform to Posix. Moreover, modular OS design, which allows multiple filesystems to coexist in a single installation, makes it impossible for the kernel itself to provide guarantees about write which apply to all available filesystems. Network filesystems are particularly prone to data races (and local mutexes won't help much either), as Posix itself notes (at the end of the paragraph quoted from the Rationale):

This requirement is particularly significant for networked file systems, where some caching schemes violate these semantics.

The first guarantee (about subsequent reads) requires some bookkeeping in the filesystem, because data which has been successfully "written" to a kernel buffer but not yet synched to disk must be made transparently available to processes reading from that file. This also requires some internal locking of kernel metadata.

Since writing to regular files is typically done via kernel buffers, and actually syncing the data to the physical storage device is definitely not atomic, the locks necessary to provide these guarantees don't have to be very long-lived. But they must be taken inside the filesystem, because nothing in the Posix wording limits the guarantees to simultaneous writes within a single threaded process.

Within a multithreaded process, Posix does require read(), write(), pread() and pwrite() to be atomic when they operate on regular files (or symbolic links). See Thread Interactions with Regular File Operations for a complete list of interfaces which must obey this requirement.

What does it mean to perform mutex locks functions atomically?

An atomic operation means that either it completes fully or it does not happen at all; it cannot be stopped, killed, or observed halfway through.

Atomic operations are used mostly in multithreaded programming. They keep a critical section or shared variable consistent when many threads race to execute the same code.
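
As a toy illustration of why the lock-acquisition step itself must be atomic, here is a sketch of a spinlock built on C11 atomic_flag (this is not how pthread mutexes are actually implemented, just the core idea):

#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;

void spin_lock(void) {
    /* atomic_flag_test_and_set() reads the old value and writes "taken"
       as one indivisible step; only the thread that saw "free" proceeds. */
    while (atomic_flag_test_and_set(&lock))
        ;                                /* busy-wait until the holder releases it */
}

void spin_unlock(void) {
    atomic_flag_clear(&lock);
}

If the test and the set were two separate steps, two threads could both observe the lock as free and both enter the critical section at once.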

A good place to read about atomic operations and concurrency in C++ is "C++ Concurrency in Action" by Anthony Williams.


