Creating a Thread-Safe Temporary File Name


Dir::Tmpname.create

You could use Dir::Tmpname.create. It figures out what temporary directory to use (unless you pass it a directory). It's a little ugly to use given that it expects a block:

require 'tmpdir'
# => true
Dir::Tmpname.create(['prefix-', '.ext']) {}
# => "/tmp/prefix-20190827-1-87n9iu.ext"
Dir::Tmpname.create(['prefix-', '.ext'], '/my/custom/directory') {}
# => "/my/custom/directory/prefix-20190827-1-11x2u0h.ext"

The block is where your code checks whether the file already exists and raises Errno::EEXIST if it does, so that a new name can be generated with an incrementing value appended to the end.
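To make that retry contract concrete, here is a minimal Python sketch of the same idea (not Ruby's actual implementation). A hypothetical create_tmpname helper builds a candidate name, hands it to a callback, and tries again with a counter appended whenever the callback raises FileExistsError, Python's counterpart of Errno::EEXIST. The name layout, the helper names and the max_try parameter are assumptions made for illustration.

```python
import os
import random
import tempfile
import time


def create_tmpname(prefix, suffix, claim, directory=None, max_try=100):
    """Return a path that `claim` accepted (hypothetical helper)."""
    directory = directory or tempfile.gettempdir()
    stamp = time.strftime("%Y%m%d")
    for attempt in range(max_try):
        rand = "%x" % random.getrandbits(32)
        # Append a counter from the second attempt onwards.
        counter = "-%d" % attempt if attempt else ""
        name = "%s%s-%d-%s%s%s" % (prefix, stamp, os.getpid(), rand, counter, suffix)
        path = os.path.join(directory, name)
        try:
            claim(path)  # expected to raise FileExistsError if the name is taken
        except FileExistsError:
            continue
        return path
    raise FileExistsError("no free name after %d attempts" % max_try)


def claim_exclusively(path):
    # Mode "x" creates the file and fails if it already exists, so the
    # existence check and the creation happen in a single step.
    with open(path, "x"):
        pass
```

Calling create_tmpname("prefix-", ".ext", claim_exclusively) returns the path of a file that this caller, and nobody else, created.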

The Rails Solution

The solution implemented by Ruby on Rails is short and similar to the solution originally implemented in Ruby:

require 'tmpdir'
# => true
File.join(Dir.tmpdir, "YOUR_PREFIX-#{Time.now.strftime("%Y%m%d")}-#{$$}-#{rand(0x100000000).to_s(36)}-YOUR_SUFFIX")
# => "/tmp/YOUR_PREFIX-20190827-1-wyouwg-YOUR_SUFFIX"
File.join(Dir.tmpdir, "YOUR_PREFIX-#{Time.now.strftime("%Y%m%d")}-#{$$}-#{rand(0x100000000).to_s(36)}-YOUR_SUFFIX")
# => "/tmp/YOUR_PREFIX-20190827-1-140far-YOUR_SUFFIX"

Dir::Tmpname.make_tmpname (before Ruby 2.5.0)

Dir::Tmpname.make_tmpname was removed in Ruby 2.5.0. Prior to Ruby 2.4.4 it could accept a directory path as a prefix, but as of Ruby 2.4.4, directory separators are removed.

Digging in tempfile.rb you'll notice that Tempfile includes Dir::Tmpname. Inside you'll find make_tmpname which does what you ask for.

require 'tmpdir'
# => true
File.join(Dir.tmpdir, Dir::Tmpname.make_tmpname("prefix-", nil))
# => "/tmp/prefix-20190827-1-dfhvld"
File.join(Dir.tmpdir, Dir::Tmpname.make_tmpname(["prefix-", ".ext"], nil))
# => "/tmp/prefix-20190827-1-19zjck1.ext"
File.join(Dir.tmpdir, Dir::Tmpname.make_tmpname(["prefix-", ".ext"], "suffix"))
# => "/tmp/prefix-20190827-1-f5ipo7-suffix.ext"

Threadsafe and fault-tolerant file writes

You can use Python's tempfile module to give you a temporary file name. It can create a temporary file in a thread-safe manner, rather than making a name up using time.time(), which may return the same name if used in multiple threads at the same time.

As suggested in a comment to your question, this can be coupled with the use of a context manager. You can get some ideas of how to implement what you want to do by looking at Python tempfile.py sources.

The following code snippet may do what you want. It wraps the object returned by tempfile.NamedTemporaryFile.

  • Creation of temporary files is thread safe.
  • Renaming of files upon successful completion is atomic, at least on Linux. There isn't a separate check between os.path.exists() and the os.rename() which could introduce a race condition. For an atomic rename on Linux the source and destinations must be on the same file system which is why this code places the temporary file in the same directory as the destination file.
  • The RenamedTemporaryFile class should behave like a NamedTemporaryFile for most purposes except when it is closed using the context manager, the file is renamed.

Sample:

import os
import tempfile


class RenamedTemporaryFile(object):
    """
    A temporary file object which will be renamed to the specified
    path on exit.
    """
    def __init__(self, final_path, **kwargs):
        tmpfile_dir = kwargs.pop('dir', None)

        # Put temporary file in the same directory as the location for the
        # final file so that an atomic move into place can occur.
        if tmpfile_dir is None:
            tmpfile_dir = os.path.dirname(final_path)

        # Never let tempfile delete the file on close: this class either
        # renames it (success) or removes it (failure) itself. Setting the
        # wrapper's delete attribute after creation has no effect on Python 3.
        kwargs['delete'] = False
        self.tmpfile = tempfile.NamedTemporaryFile(dir=tmpfile_dir, **kwargs)
        self.final_path = final_path

    def __getattr__(self, attr):
        """
        Delegate attribute access to the underlying temporary file object.
        """
        return getattr(self.tmpfile, attr)

    def __enter__(self):
        self.tmpfile.__enter__()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Close (and flush) the temporary file before touching it on disk.
        result = self.tmpfile.__exit__(exc_type, exc_val, exc_tb)
        if exc_type is None:
            os.rename(self.tmpfile.name, self.final_path)
        else:
            os.remove(self.tmpfile.name)

        return result

You can then use it like this:

with RenamedTemporaryFile('whatever', mode='w') as f:
    f.write('stuff')

During writing, the contents go to a temporary file; on exit, the file is renamed. This code will probably need some tweaks, but the general idea should help you get started.
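One possible tweak, sketched here as a plain function rather than a class: create the temporary file with tempfile.mkstemp and move it into place with os.replace, which also overwrites an existing destination on Windows (os.rename does not). The atomic_write name and the ".tmp-" prefix are hypothetical, and the sketch assumes the data is available as bytes.

```python
import os
import tempfile


def atomic_write(final_path, data):
    """Write `data` (bytes) to `final_path` via a temp file and a rename."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # mkstemp creates and opens the file exclusively, so concurrent callers
    # never share a temporary file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, final_path)
    except BaseException:
        # Leave no stray temporary file behind if anything failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If several threads call atomic_write on the same destination, readers should only ever see one complete payload, never a mix of two.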

Creating thread-safe non-deleting unique filenames in ruby/rails

I actually found the answer after some digging. Of course the obvious approach is to see what Tempfile itself does. I just assumed it was native code, but it is not. The source for 1.8.7 can be found here for instance.

As you can see, Tempfile uses an apparently undocumented file mode of File::EXCL. So my code can be simplified substantially:

# make a unique filename
time = Time.now
filename = "#{time.to_i}-#{sprintf('%06d', time.usec)}"

data_file = nil
count = 1
loop do
  begin
    data_file = File.open(File.join(UPLOAD_BASE, "#{filename}-#{count}.data"), File::RDWR|File::CREAT|File::EXCL)
    break
  rescue Errno::EEXIST
    count += 1
  end
end

# ... write to data_file and close it ...

UPDATE And now I see that this is covered in a prior thread:

How do open a file for writing only if it doesn't already exist in ruby

So maybe this whole question should be marked a duplicate.

Is createTempFile thread-safe?

The best way to get your answer is to look at the source code. There isn't any synchronization in createTempFile itself, but it generates the temp file name using SecureRandom, which is thread-safe.
Unless you are really unlucky, each file will therefore get a different name.

On top of that, the createTempFile implementation loops, generating a new file name each time, until the file has been created. The file creation itself is delegated to the native file system operation, which we may assume is thread-safe.
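Python's tempfile.mkstemp follows the same design (a random name plus a create-or-retry loop), so the claim is easy to check empirically there. The sketch below is an analogy in Python, not Java code; make_temp and make_many are hypothetical helper names.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def make_temp(directory):
    # mkstemp creates the file exclusively and picks another random name
    # if the first one is already taken.
    fd, path = tempfile.mkstemp(prefix="job-", suffix=".tmp", dir=directory)
    os.close(fd)
    return path


def make_many(directory, count=200, workers=16):
    # Create `count` temporary files from a pool of worker threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(make_temp, [directory] * count))
```

Every path returned by make_many should be distinct and should exist on disk, even though no explicit lock is used.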

C/C++ Thread-safety of tmpnam?

tmpnam only guarantees that the file did not exist at the time of the call, but another process may create it before you do. To use it safely, you will ALWAYS need to then attempt to create the file with open(filename, O_RDWR | O_CREAT | O_EXCL | O_NOFOLLOW, 0600). If this fails with EEXIST or ELOOP, go back and try a new name.

This is particularly important to protect against symlink attacks, where another program creates a symlink from your temp file name to /etc/passwd or some other important file.

Also, make sure you do not pass NULL to tmpnam, as the buffer used then is the same for all threads.

Another approach which combines these is to use mkstemp() or mkostemp(), which will create the file safely for you.

Finally, if you don't need the filename, you can use tmpfile(), which will create a temporary file that will be deleted on close.
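The effect of those flags can be demonstrated from Python, whose os.open passes them straight through to the operating system. This is a sketch for POSIX systems; open_new_file is a hypothetical helper, and O_NOFOLLOW is looked up with getattr because it is not defined on every platform.

```python
import os

# O_NOFOLLOW is not available everywhere; fall back to 0 where it is missing.
FLAGS = os.O_RDWR | os.O_CREAT | os.O_EXCL | getattr(os, "O_NOFOLLOW", 0)


def open_new_file(path):
    """Return a file descriptor for a newly created `path`, or None if the
    name is already taken (by a file, a directory, or a symlink)."""
    try:
        return os.open(path, FLAGS, 0o600)
    except FileExistsError:
        return None
```

If an attacker has already planted a symlink at the chosen name, open_new_file returns None and the symlink's target is left untouched, so the caller can simply pick another name.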

Java Temporary File Multithreaded Application

The answer linked below answers my question. The method I posted is safe in a multithreaded, single-JVM environment. To make it safe in a multithreaded, multi-JVM environment (e.g. a clustered web app), you can use Chris Cooper's idea, which involves passing a unique value in the prefix argument of File.createTempFile within each JVM process.

Is createTempFile thread-safe?

How to make writing method thread safe?

I am not well versed in Java so I am going to provide a language-agnostic answer.

What you want to do is to transform matrices into results, then format them as string and finally write them all into the stream.

Currently you are writing into the stream as soon as you process each result, so when you add multiple threads to your logic you end up with race conditions on your stream.

You already figured out that only the calls to ResultGenerator.getResult() should be done in parallel, whilst the stream still needs to be accessed sequentially.

Now you only need to put this in practice. Do it in order:

  • Build a list where each item is what you need to generate a result
  • Process this list in parallel thus generating all results (this is a map operation). Your list of items will become a list of results.
  • Now you already have your results so you can iterate over them sequentially to format and write them into the stream.

I suspect Java 8 provides some tools to do all of this in a functional way, but as I said, I am not a Java guy, so I cannot provide code samples. I hope this explanation will suffice.

@edit

This sample code in F# explains what I meant.

open System

// This is a pretty long and nasty operation!
let getResult doc =
    Threading.Thread.Sleep(1000)
    doc * 10

// This is writing into stdout, but it could be a stream...
let formatAndPrint =
    printfn "Got result: %O"

[<EntryPoint>]
let main argv =
    printfn "Starting..."

    [| 1 .. 10 |] // A list with some docs to be processed
    |> Array.Parallel.map getResult // Now that's doing the trick
    |> Array.iter formatAndPrint

    0
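The same three steps can be sketched in Python as well (again, not Java). get_result is a stand-in for the expensive computation and process_all is a hypothetical name; the point is only that the parallel part never touches the stream.

```python
from concurrent.futures import ThreadPoolExecutor


def get_result(doc):
    # Stand-in for the expensive computation (hypothetical).
    return doc * 10


def process_all(docs, stream, workers=4):
    # Step 2: compute the results in parallel; map() yields them in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(get_result, docs))
    # Step 3: only this thread writes to the stream, so no locking is needed.
    for result in results:
        stream.write("Got result: %s\n" % result)
```

Because the writes happen after the pool has finished, the output order matches the input order regardless of which worker finished first.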

Create file in a thread-safe manner

A possible, slightly ugly solution would be to lock on a lock file and then test whether the file exists:

$lock = fopen("/tmp/".$filename."LOCK", "w"); // A

if (!flock($lock, LOCK_EX)) { // B
    continue;
}
if (!file_exists($filename)) { // C
    // File doesn't exist so we know that this thread will create it
    // Do stuff to $filename
    flock($lock, LOCK_UN); // D
    fclose($lock);
} else {
    // File exists. This thread didn't create it (at least in this iteration).
    flock($lock, LOCK_UN);
    fclose($lock);
}

This should allow exclusive access to the file and also lets you decide whether the call to fopen($VMidFile, 'c') will create the file.
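For comparison, the same lock-then-check pattern can be sketched in Python with fcntl.flock. This assumes a Unix-like system (the fcntl module does not exist on Windows); create_if_absent and its parameters are hypothetical names.

```python
import fcntl
import os


def create_if_absent(path, lock_path, data):
    """Return True if this caller created `path`, False if it already existed."""
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until the lock is held
        try:
            if os.path.exists(path):
                return False
            # Holding the lock means nobody else can be between the
            # existence check and the creation at the same time.
            with open(path, "w") as f:
                f.write(data)
            return True
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

When several threads or processes race on the same path, exactly one call should return True.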

How can I create a temp file with a specific extension with .NET?

Guaranteed to be (statistically) unique:

string fileName = System.IO.Path.GetTempPath() + Guid.NewGuid().ToString() + ".csv"; 

(To quote from the wiki article on the probability of a collision:

"...one's annual risk of being hit by a meteorite is estimated to be one chance in 17 billion, that means the probability is about 0.00000000006 (6 × 10^-11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.")

EDIT: Please also see JaredPar's comments.
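The same approach, sketched in Python for comparison: uuid.uuid4() plays the role of Guid.NewGuid(). The unique_temp_path name is hypothetical. Note that this only produces a name; nothing is created on disk until you open the file yourself.

```python
import os
import tempfile
import uuid


def unique_temp_path(extension=".csv"):
    # uuid4() is built from 122 random bits, so a collision is extremely
    # unlikely, though not strictly impossible.
    return os.path.join(tempfile.gettempdir(), str(uuid.uuid4()) + extension)
```

To also rule out the unlikely collision, the returned path can be opened in exclusive-creation mode (open(path, "x")), which fails if the file already exists.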


