"Thread Safety" of Appending to Single File from Multiple Processes


The answer depends on what kind of write is going on. If you are using standard I/O with buffering, which is typically most programs' default, then the buffer is flushed only after several lines have been written, and what gets flushed will not necessarily be an integral number of lines. If you are using write(2), or have changed the default stdio buffering to line buffered or unbuffered, then the output will probably be interleaved correctly as long as the lines are of reasonable size (certainly if they are shorter than 512 bytes).
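The buffering modes described above can be sketched in Python, where the `buffering` argument to `open()` maps onto the same stdio behaviors (the file path here is a hypothetical temp file for illustration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")  # hypothetical log file

# Default: block buffered. Writes sit in a user-space buffer and are
# flushed in large chunks, possibly splitting lines mid-way.
buffered = open(path, "a")
buffered.write("partial chunk")  # stays in the buffer until close/flush

# Line buffered: each '\n' triggers a flush, so every underlying
# write(2) carries whole lines; interleaving happens only at line
# boundaries.
line_buffered = open(path, "a", buffering=1)
line_buffered.write("one whole line\n")  # flushed immediately

# Unbuffered append via the raw descriptor: each os.write goes
# straight to the kernel.
fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT)
os.write(fd, b"another whole line\n")

buffered.close()       # only now is "partial chunk" written out
line_buffered.close()
os.close(fd)
```

After this runs, the block-buffered "partial chunk" appears *last* in the file even though it was written first, which is exactly why buffered appends from multiple processes can interleave unexpectedly.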

Is write() safe to be called from multiple threads simultaneously?

Solaris 10 claims to be POSIX compliant. The write() function is not among the handful of system interfaces that POSIX permits to be non-thread-safe, so we can conclude that on Solaris 10, it is safe in a general sense to call write() simultaneously from two or more threads.

POSIX also designates write() among those functions whose effects are atomic relative to each other when they operate on regular files or symbolic links. Specifically, it says that

If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them.

If your writes were directed to a regular file then that would be sufficient to conclude that your proposed multi-thread actions are safe, in the sense that they would not interfere with one another, and the data written in one call would not be commingled with that written by a different call in any thread. Unfortunately, /dev/poll is not a regular file, so that does not apply directly to you.

You should also be aware that write() is not in general required to transfer the full number of bytes specified in a single call. For general purposes, one must therefore be prepared to transfer the desired bytes over multiple calls, by using a loop. Solaris may provide applicable guarantees beyond those expressed by POSIX, perhaps specific to the destination device, but absent such guarantees it is conceivable that one of your threads performs a partial write, and the next write is performed by a different thread. That very likely would not produce the results you want or expect.
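A minimal sketch of the loop described above, using Python's `os.write` as a stand-in for write(2) (the pipe is just a convenient self-contained writable descriptor):

```python
import os

def write_all(fd, data: bytes) -> None:
    """Keep calling os.write until every byte has been accepted."""
    view = memoryview(data)
    while view:
        n = os.write(fd, view)  # may transfer fewer bytes than requested
        view = view[n:]

# Usage: any writable descriptor works; a pipe keeps this self-contained.
r, w = os.pipe()
write_all(w, b"hello, world\n")
os.close(w)
result = os.read(r, 100)
os.close(r)
```

Note that looping like this is exactly what reintroduces the interleaving hazard across threads: the individual calls inside the loop are atomic, but the loop as a whole is not.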

Safe to have multiple processes writing to the same file at the same time? [CentOS 6, ext4]

What you're doing seems perfectly OK, provided you're using the POSIX "raw" IO syscalls such as read(), write(), lseek() and so forth.

If you use C stdio (fread(), fwrite() and friends) or some other language runtime library which has its own userspace buffering, then the answer by "Tilo" is relevant, in that due to the buffering, which is to some extent outside your control, the different processes might overwrite each other's data.

Wrt OS locking: while POSIX states that writes and reads of size less than PIPE_BUF are atomic for certain special files (pipes and FIFOs), there is no such guarantee for regular files. In practice, I think it's likely that I/Os within a page are atomic, but there is no such guarantee. The OS only does locking internally to the extent necessary to protect its own internal data structures. One can use file locks, or some other interprocess communication mechanism, to serialize access to files. But all of this is relevant only if you have several processes doing I/O to the same region of a file. In your case, since your processes are doing I/O to disjoint sections of the file, none of this matters, and you should be fine.
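One way to serialize access as suggested above is an advisory file lock. A sketch using `fcntl.flock` (Unix-only; the file path is a hypothetical shared log):

```python
import fcntl
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.log")  # hypothetical shared file

def append_record(line: str) -> None:
    # Open in append mode, then hold an exclusive advisory lock for the
    # duration of the write so concurrent processes cannot interleave.
    with open(path, "a") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            f.write(line + "\n")
            f.flush()  # push the data out while the lock is still held
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)

append_record("first")
append_record("second")
```

The lock is advisory: it only serializes writers that also take the lock, so every cooperating process must use the same protocol.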

Is it safe for two threads to write identical content to the same file?

is it safe for multiple threads (or processes) to attempt this simultaneously, for the same calculation, and with the same output file?

It is a race condition when multiple threads write to the same file, so you may end up with a corrupted file. There is no guarantee that ofstream::write is atomic, and whatever atomicity you do get depends on the particular filesystem.

The robust solution for your problem (works both with multiple threads and/or processes):

  1. Write into a temporary file with a unique name in the destination directory (so that the temporary and the final files are in the same filesystem for rename to not move data).
  2. rename the temporary file to its final name. It replaces the existing file if one is there. Non-portable renameat2 is more flexible.
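The two steps above can be sketched in Python, where `os.replace` maps onto rename(2) (the target path here is a hypothetical output file):

```python
import os
import tempfile

def write_atomically(final_path: str, data: str) -> None:
    directory = os.path.dirname(final_path) or "."
    # 1. Write to a uniquely named temp file in the *same* directory,
    #    so the final rename never has to move data across filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach disk first
        # 2. Atomically replace the final file; readers see either the
        #    old complete file or the new complete file, never a mix.
        os.replace(tmp_path, final_path)
    except BaseException:
        os.unlink(tmp_path)
        raise

target = os.path.join(tempfile.mkdtemp(), "result.txt")  # hypothetical target
write_atomically(target, "computed output\n")
write_atomically(target, "recomputed output\n")  # safely replaces the first
```

If several threads or processes race through this, each produces a complete file and the last rename wins, which is exactly the property wanted when all writers produce identical content.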

Python - appending to same file from multiple threads

The solution is to write to the file in one thread only.

import codecs
import os
import queue  # Queue in Python 2
import threading

class PrintThread(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def printfiles(self, p):
        for path, dirs, files in os.walk(p):
            for f in files:
                print(f, file=output)

    def run(self):
        while True:
            result = self.queue.get()
            self.printfiles(result)
            self.queue.task_done()

class ProcessThread(threading.Thread):
    def __init__(self, in_queue, out_queue):
        threading.Thread.__init__(self)
        self.in_queue = in_queue
        self.out_queue = out_queue

    def run(self):
        while True:
            path = self.in_queue.get()
            result = self.process(path)
            self.out_queue.put(result)
            self.in_queue.task_done()

    def process(self, path):
        # Do the processing job here
        return path

pathqueue = queue.Queue()
resultqueue = queue.Queue()
paths = getThisFromSomeWhere()

output = codecs.open('file', 'a')

# spawn threads to process
for i in range(5):
    t = ProcessThread(pathqueue, resultqueue)
    t.daemon = True
    t.start()

# spawn a single thread to print, so only one thread touches the file
t = PrintThread(resultqueue)
t.daemon = True
t.start()

# add paths to queue
for path in paths:
    pathqueue.put(path)

# wait for the queues to drain
pathqueue.join()
resultqueue.join()

Process and Thread Safe way to write to multiple files

In my project I had about 10 processes writing to some common logs. I kept intermittently getting UnauthorizedAccessException: Access to the path 'Global\mymutexid' is denied. So I ended up changing the constructor as below (idea from here):

public static bool WriteToFileProcessAndThreadSafe(string filePath, string fileName, string logText)
{
    //Mutex LogWriteLocker = new Mutex(false, "Global\\" + fileName); // causes UnauthorizedAccessException
    Mutex LogWriteLocker = null;
    bool owned = false;

    try
    {
        MutexSecurity mutexSecurity = new MutexSecurity();
        mutexSecurity.AddAccessRule(new MutexAccessRule(
            new SecurityIdentifier(WellKnownSidType.WorldSid, null),
            MutexRights.Synchronize | MutexRights.Modify,
            AccessControlType.Allow));

        // Attempt to create the mutex with the desired DACL.
        // The backslash (\) is a reserved character in a mutex name,
        // and the name can be no more than 260 characters long.
        LogWriteLocker = new Mutex(false, "Global\\" + fileName, out _, mutexSecurity);

        try
        {
            LogWriteLocker.WaitOne();
        }
        catch (AbandonedMutexException)
        {
            // The previous owner died while holding the mutex;
            // ownership has still been granted to us.
        }
        owned = true;

        // We are now thread and process safe. Do the work.
        File.AppendAllText(Path.Combine(filePath, fileName), logText);

        return true;
    }
    catch (WaitHandleCannotBeOpenedException)
    {
        // The mutex cannot be opened, probably because a Win32 object
        // of a different type with the same name already exists.
        return false;
    }
    catch (UnauthorizedAccessException)
    {
        // The mutex exists, but the current process or thread token does not
        // have permission to open it with SYNCHRONIZE | MUTEX_MODIFY rights.
        return false;
    }
    catch (Exception)
    {
        // Handle other exceptions.
        return false;
    }
    finally
    {
        if (LogWriteLocker != null)
        {
            if (owned)
            {
                LogWriteLocker.ReleaseMutex();
            }
            LogWriteLocker.Dispose();
        }
    }
}

I have finally understood that my mistake was thinking of this in terms of "multi-threading", whereas the issue was due to "multi-process". All the variations I listed in the OP end up with lock() at the core, which works only in a single-process, multi-threaded situation.

Thanks to @Eldar, and just as an addition to @Charlieface's answer, here is what I am using for my case. I use the fileName as the mutex name so that each file has a separate lock (as opposed to using one lock for all files).

fopen multiple times in append mode

macOS, FreeBSD and Linux are all POSIX systems. As such, each FILE* has its own user-space buffer (or none, if you disable it), and once that buffer is flushed, its contents are written to the underlying file descriptor. POSIX guarantees that writes to a file descriptor opened in append mode are atomic, so no data will be lost. As long as each piece of your data isn't split across multiple flushes, pieces won't interleave with each other either.
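The append-mode guarantee can be sketched with two independently opened descriptors, as two processes would have (Python's `os.open`/`os.write` are thin wrappers over open(2)/write(2); the file path is a hypothetical temp file):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "append.log")  # hypothetical file

# Two independently opened append-mode descriptors, each with its own
# file offset, just as two separate processes would have.
fd1 = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT)
fd2 = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT)

# With O_APPEND the kernel atomically seeks to the current end of file
# before each write, so neither descriptor overwrites the other's data
# even though both started at offset 0.
os.write(fd1, b"line from writer 1\n")
os.write(fd2, b"line from writer 2\n")
os.write(fd1, b"line from writer 1 again\n")

os.close(fd1)
os.close(fd2)

with open(path, "rb") as f:
    contents = f.read()
```

Without O_APPEND, the second descriptor's write would land at offset 0 and clobber the first line.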

Write thread-safe to file in python

We used the logging module:

import logging

logpath = "/tmp/log.log"
logger = logging.getLogger('log')
logger.setLevel(logging.INFO)
ch = logging.FileHandler(logpath)
ch.setFormatter(logging.Formatter('%(message)s'))
logger.addHandler(ch)

def application(env, start_response):
    logger.info("%s %s", "hello", "world!")
    start_response('200 OK', [('Content-Type', 'text/html')])
    return ["Hello!"]

Understanding concurrent file writes from multiple processes

Atomicity of writes less than PIPE_BUF applies only to pipes and FIFOs. For file writes, POSIX says:

This volume of POSIX.1-2008 does not specify behavior of concurrent
writes to a file from multiple processes. Applications should use some
form of concurrency control.

...which means that you're on your own - different UNIX-likes will give different guarantees.


