Threads and File Writing

Write data into text file with multiple threads (simultaneously, in different lines of file)

When you write to a file, you first need to move the file pointer to the correct position. Call the seek method on RandomAccessFile and pass it the byte offset to move to. For example, the first thread seeks to 0, the second seeks to 21, and so on, because each line is 21 bytes long (20 digits plus a newline).

The way your program is right now, every thread will overwrite every other thread.

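There may also be a problem with parallelization: all threads share one RandomAccessFile and therefore one file pointer, so another thread can move the pointer between one thread's seek and its write. The seek and the write need to happen as a single synchronized step, and the file must not be closed until every thread has finished.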

I didn't want to give a ready-made solution, but I got curious, so here is something you could learn from:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class Blah {
    // Builds the line to write: the digit repeated 20 times plus a newline (21 bytes)
    public static String createString(int integer) {
        StringBuilder result = new StringBuilder();

        for (int i = 0; i < 20; i++) {
            result.append(integer);
        }

        result.append("\n");
        return result.toString();
    }

    public static void main(final String[] args) throws IOException, InterruptedException {
        RandomAccessFile file = new RandomAccessFile("part5.txt", "rw");
        Blah blah = new Blah();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i <= 9; i++) {
            Thread thread = new Thread(blah.new NumberWriter(i, file));
            threads.add(thread);
            thread.start();
        }

        // Wait for every writer to finish before closing the shared file
        for (Thread thread : threads) {
            thread.join();
        }
        file.close();
    }

    private class NumberWriter implements Runnable {
        int number;
        RandomAccessFile file;

        public NumberWriter(int number, RandomAccessFile file) {
            this.number = number;
            this.file = file;
        }

        public void run() {
            try {
                // Each line is 21 bytes long, so line n starts at byte n * 21
                int seek = this.number * 21;
                System.out.println("number is : " + number + " : seeking to : " + seek);

                // All threads share one file pointer, so seek and write must not interleave
                synchronized (file) {
                    file.seek(seek);
                    file.write(createString(this.number).getBytes());
                }
            } catch (IOException ioex) {
                ioex.printStackTrace();
            }
        }
    }
}
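For comparison, the same fixed-width offset idea can be sketched in Python. This is only an illustration under the same assumptions as the Java example (single-digit numbers, 21-byte records); the names make_record and write_records are made up for this sketch. A lock keeps each seek and write together, and every thread is joined before the file is closed.

```python
import os
import tempfile
import threading

RECORD_LEN = 21  # 20 digit characters plus one newline, as in the Java example


def make_record(number: int) -> bytes:
    """Build the fixed-width line for a single-digit number (0-9)."""
    return (str(number) * 20 + "\n").encode("ascii")


def write_records(path: str, count: int = 10) -> None:
    """Write `count` records (at most 10) from `count` threads, each at its own offset."""
    lock = threading.Lock()  # guards the file position shared by all threads
    with open(path, "wb") as f:
        def worker(number: int) -> None:
            data = make_record(number)
            with lock:  # seek and write must happen as one step
                f.seek(number * RECORD_LEN)
                f.write(data)

        threads = [threading.Thread(target=worker, args=(n,)) for n in range(count)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # finish every write before the file is closed


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "part5.txt")
    write_records(target)
    with open(target, "rb") as f:
        print(f.read().decode("ascii"), end="")
```

Because every record has the same length, the threads can run in any order and the lines still end up in numeric order in the file.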

Writing to a file from multiple threads

Move the join inside the File.open block:

threads = []
File.open("test.txt", "a") do |fp|
  500.times do |time|
    threads << Thread.new do
      fp.puts("#{time}: 1")
      sleep(rand(100) / 100.0)
      fp.puts("#{time}: 2")
    end
  end

  threads.each { |thread| thread.join }
end

Why? Thread.new launches the thread, but the thread runs concurrently with the main thread, and in your version it is not guaranteed to finish before the file is closed. File.open closes the file as soon as the attached block exits. Joining the threads inside the block keeps the file open until all of them are done, so everything works as expected.
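The same lifetime rule can be sketched in Python, where a with block plays the role of the File.open block. This is an illustrative sketch, not a translation of the asker's program; the function name write_from_threads is made up. The file is opened in binary mode because Python documents its buffered binary file objects as safe to call from several threads, which is not the case for text-mode files.

```python
import os
import random
import tempfile
import threading
import time


def write_from_threads(path: str, workers: int = 20) -> None:
    """Append two lines per worker thread, keeping the file open until all are done."""
    with open(path, "ab") as fp:
        def worker(n: int) -> None:
            fp.write(f"{n}: 1\n".encode("ascii"))
            time.sleep(random.random() / 100)
            fp.write(f"{n}: 2\n".encode("ascii"))

        threads = [threading.Thread(target=worker, args=(n,)) for n in range(workers)]
        for t in threads:
            t.start()
        # Join inside the with block so the file is still open for every write
        for t in threads:
            t.join()
    # The file is closed only here, after all threads have finished


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "test.txt")
    write_from_threads(target)
    with open(target, "rb") as f:
        print(f.read().decode("ascii"), end="")
```

If the joins were moved below the with block, a thread that is still sleeping would try to write to a file that has already been closed.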

However, please note that the Ruby code above IS NOT thread-safe on JRuby (or any other implementation without a GIL), where output from different threads may be intermixed:

6: 1
5: 17: 1
8: 1

3: 10: 110: 1
4: 11: 1
2: 19: 1

11: 1

12: 1
13: 1
14: 1
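One common way to avoid intermixed lines like these is to guard each write with a mutex so that only one thread touches the file at a time. The sketch below shows the idea in Python with threading.Lock; it is an assumption-laden illustration (the function name write_lines_locked is made up), not something taken from the original answer.

```python
import os
import tempfile
import threading


def write_lines_locked(path: str, workers: int = 50) -> None:
    """Append two lines per worker; a lock ensures each line is written whole."""
    lock = threading.Lock()
    with open(path, "a", encoding="utf-8") as fp:
        def worker(n: int) -> None:
            for step in (1, 2):
                with lock:  # only one thread writes at a time
                    fp.write(f"{n}: {step}\n")

        threads = [threading.Thread(target=worker, args=(n,)) for n in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # keep the file open until every thread is done


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "test.txt")
    write_lines_locked(target)
    with open(target, encoding="utf-8") as f:
        print(f.read(), end="")
```

The lock serializes the writes, so the order of lines across threads is still arbitrary, but no line is split by another thread's output.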

Note: this question appears to be from Ruby MRI 1.8.7 - File writing thread safety

How to create multiple threads that write to the same file in C#

Now that there's a code snippet, some optimization can be applied.

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        var sw = new Stopwatch();
        const int pow = 5;
        sw.Start();
        GenerateNumbers("test.txt", pow);
        sw.Stop();
        Console.WriteLine($"Wrote 10^{pow} lines of 10^{pow} numbers in {sw.Elapsed}");
    }

    public static void GenerateNumbers(string path, int pow)
    {
        var rnd = new Random(); // created once and reused for every number
        using var sw = new StreamWriter(path, false);
        var max = Math.Pow(10, pow);
        var sb = new StringBuilder();
        for (long i = 0; i < max; i++)
        {
            // Build one complete line in memory
            for (long j = 0; j < max; j++)
            {
                sb.Append(rnd.Next(1, 101));
                sb.Append(' ');
            }
            // Write the line to disk in a single call, then reuse the builder
            sw.WriteLine(sb.ToString());
            sb.Clear();
            // Progress report every 100 lines
            if (i % 100 == 0)
                Console.WriteLine((i / max).ToString("P"));
        }
    }
}

The above code does IO writes at a fairly decent pace (remember the limit is the IO speed, not CPU / number generation). Also note that I'm running the code from inside a VM, so I'm likely not getting the best IO results.

[Screenshot: Resource Monitor]

  • As mentioned by Neil Moss in the comments, you don't need to instantiate the Random class on each run.
  • I'm generating a single line to write in-memory using a StringBuilder, then I write this to the disk.
  • Since this does take a bit of time, I've added a progress tracker (this adds a minuscule amount of overhead).
  • A file of 10^4 lines of 10^4 numbers is already 285 MB in size and was generated in 4.6767592 seconds.
  • The 10^5 case shown above yields a 25.5 GB file and takes 5:54.2580683 to generate.
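The approach in the points above (one random generator, one line built in memory, one write per line) can be sketched in Python as well. This is a rough illustration rather than a port of the C# code: generate_numbers and its parameters are made up for the sketch, and it joins the numbers with single spaces instead of appending a trailing space.

```python
import os
import random
import tempfile


def generate_numbers(path, lines, per_line, seed=None):
    """Write `lines` rows of `per_line` random integers in the range 1..100."""
    rnd = random.Random(seed)  # created once and reused for every number
    with open(path, "w", encoding="ascii", newline="\n") as out:
        for _ in range(lines):
            # Build the whole line in memory, then hand it to the writer in one call
            row = " ".join(str(rnd.randint(1, 100)) for _ in range(per_line))
            out.write(row + "\n")


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "test.txt")
    generate_numbers(target, lines=1000, per_line=1000)
    print(f"{os.path.getsize(target)} bytes written to {target}")
```

The file object returned by open() is buffered, so each out.write call usually lands in memory first and reaches the disk in larger chunks.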

I haven't tried this, but I wonder whether you could save time by writing the data to a ZIP file, assuming you care more about getting the data onto the disk than about the format itself. A compressed TXT file of numbers should be a fair bit smaller and, as such, should be much faster to write.
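A quick way to test the compression idea on a small scale is to write the same rows once as plain text and once through a compressing writer, then compare the sizes. The sketch below uses Python's gzip module rather than a ZIP archive, which is a substitution made for brevity; write_plain_and_gzip and the file names are invented for the example. It says nothing about speed, since compression trades disk writes for CPU time and the balance depends on the machine.

```python
import gzip
import os
import random
import tempfile


def write_plain_and_gzip(directory, lines=200, per_line=200, seed=0):
    """Write identical rows as .txt and .txt.gz; return (plain_size, gzip_size)."""
    rnd = random.Random(seed)
    rows = [
        " ".join(str(rnd.randint(1, 100)) for _ in range(per_line)) + "\n"
        for _ in range(lines)
    ]
    plain = os.path.join(directory, "numbers.txt")
    packed = os.path.join(directory, "numbers.txt.gz")
    with open(plain, "w", encoding="ascii", newline="\n") as out:
        out.writelines(rows)
    # gzip.open in text mode compresses the data as it is written
    with gzip.open(packed, "wt", encoding="ascii", newline="\n") as out:
        out.writelines(rows)
    return os.path.getsize(plain), os.path.getsize(packed)


if __name__ == "__main__":
    plain_size, packed_size = write_plain_and_gzip(tempfile.mkdtemp())
    print(f"plain: {plain_size} bytes, gzip: {packed_size} bytes")
```

Timing both writes on the target disk would be the next step before committing to either format.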

Writing to a file from multiple threads in the correct order

To write the lines in the correct order:

  1. Use for future in future_to_url to iterate over the futures in submission order.
  2. Use a list comprehension [executor.submit(...) for line in f] instead of a generator expression (executor.submit(...) for line in f). The list comprehension submits all lines to the executor at once. With a generator expression, tasks are submitted on demand, one by one, as the loop iterates, so nothing runs in parallel.
  3. Have findMatch() return the result rather than write to the output directly.

A call to future.result() returns the result immediately if it is available; otherwise it blocks until the result is ready.

import concurrent.futures
import os
import re

# pattern1, pattern2, choice and translate() are not defined in this snippet

def main():
    # Open File
    for filename in os.listdir("files"):
        with open('translate/' + filename, 'w', encoding='UTF-8') as outFile:
            with open('files/' + filename, 'r', encoding='UTF-8') as f:

                # Replace Each Line
                with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:

                    # Submit all lines at once, passing the 1-based line number
                    future_to_url = [executor.submit(findMatch, line, count)
                                     for count, line in enumerate(f, start=1)]

                    # as_completed() would yield futures in completion order;
                    # a plain for-loop iterates them in submission order
                    for future in future_to_url:
                        print(future.result())
                        # Uncomment to actually write to the output
                        # outFile.write(future.result())

def findMatch(line, count):
    # count is the line number, used only for debugging output
    # Check if match in line
    if re.search(pattern1, line) is not None:

        # Translate each match in line. Depends on choice
        for match in re.findall(pattern1, line):

            # Filter out matches with no Japanese
            if re.search(pattern2, match) is not None and '$' not in match:
                if choice == '1':
                    match = match.rstrip()
                    print('Translating: ' + str(count) + ': ' + match)
                    translatedMatch = translate(match)
                    line = re.sub(match, translatedMatch, line, 1)

                elif choice == '2':
                    match = match.rstrip()
                    print('Translating Line: ' + str(count))
                    line = translate(line)
                    break  # Don't want dupes

                else:
                    print('Bad Coder. Check your if statements')

        return line
    # Skip Line
    else:
        print('Skipping: ' + str(count))

        return line
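The ordering behaviour can be seen in isolation with a small self-contained sketch. Here slow_upper is a made-up stand-in for findMatch, and the delays are chosen so that earlier lines finish last; iterating the futures in submission order still yields the results in the original line order, while as_completed yields them in whatever order they finish.

```python
import concurrent.futures
import time


def slow_upper(line, delay):
    """Hypothetical stand-in for findMatch: wait, then return the transformed line."""
    time.sleep(delay)
    return line.upper()


def process_in_order(lines):
    """Submit every line up front, then collect results in submission order."""
    # Earlier lines sleep longer, so they tend to finish last
    delays = [0.05 * (len(lines) - i) for i in range(len(lines))]
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(slow_upper, line, d)
                   for line, d in zip(lines, delays)]
        # result() blocks until that particular future is done
        return [future.result() for future in futures]


def process_as_completed(lines):
    """Same work, but results are collected in completion order."""
    delays = [0.05 * (len(lines) - i) for i in range(len(lines))]
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(slow_upper, line, d)
                   for line, d in zip(lines, delays)]
        return [f.result() for f in concurrent.futures.as_completed(futures)]


if __name__ == "__main__":
    sample = ["a", "b", "c", "d"]
    print("submission order:", process_in_order(sample))
    print("completion order:", process_as_completed(sample))
```

executor.map(func, iterable) is another option when each task takes its arguments from an iterable: it also returns results in input order.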


