Write data into text file with multiple threads (simultaneously, in different lines of file)
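When you write to the file, you first need to move the file pointer to the correct position. Call the seek method on RandomAccessFile with the byte offset of the line you want to write. Each line here is 20 digits plus a newline, i.e. 21 bytes, so the first thread seeks to 0, the second to 21, and so on.
The way your program is right now, every thread overwrites the output of every other thread.
There may also be a problem with parallelization: all threads share one RandomAccessFile and therefore one file pointer, so a seek in one thread can land between the seek and the write of another, and the file has to stay open until every thread has finished.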
I didn't want to give a ready-made solution, but I got curious. So here's something you could learn from:
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Blah {
    // Builds one line: the digit repeated 20 times plus a newline (21 bytes)
    public static String createString(int integer) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            result.append(integer);
        }
        result.append("\n");
        return result.toString();
    }

    public static void main(final String[] args) throws IOException, InterruptedException {
        RandomAccessFile file = new RandomAccessFile("part5.txt", "rw");
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i <= 9; i++) {
            Thread thread = new Thread(new NumberWriter(i, file));
            threads.add(thread);
            thread.start();
        }
        // Wait for every writer to finish before closing the file
        for (Thread thread : threads) {
            thread.join();
        }
        file.close();
    }

    private static class NumberWriter implements Runnable {
        private final int number;
        private final RandomAccessFile file;

        public NumberWriter(int number, RandomAccessFile file) {
            this.number = number;
            this.file = file;
        }

        @Override
        public void run() {
            try {
                // Each line is 21 bytes, so line n starts at byte n * 21
                int seek = number * 21;
                System.out.println("number is : " + number + " : seeking to : " + seek);
                // The file pointer is shared, so seek and write must not interleave
                synchronized (file) {
                    file.seek(seek);
                    file.write(createString(number).getBytes(StandardCharsets.US_ASCII));
                }
            } catch (IOException ioex) {
                ioex.printStackTrace();
            }
        }
    }
}
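For comparison, the same seek-to-offset idea can be sketched in Python rather than Java. This is a minimal illustration, not the answer's own code: the names LINE_WIDTH, make_line and write_lines are made up for the example, and a lock is assumed as the way to keep seek and write together on the shared handle.

```python
import os
import tempfile
import threading

LINE_WIDTH = 21  # 20 digits plus one newline byte, as in the Java example


def make_line(n: int) -> bytes:
    """Return the digit n repeated 20 times, followed by a newline."""
    return (str(n) * 20 + "\n").encode("ascii")


def write_lines(path: str, count: int = 10) -> None:
    """Let `count` threads each write one fixed-width line at its own offset."""
    lock = threading.Lock()  # guards the shared file position
    with open(path, "wb") as f:

        def worker(n: int) -> None:
            # seek and write must happen as one step on a shared handle
            with lock:
                f.seek(n * LINE_WIDTH)
                f.write(make_line(n))

        threads = [threading.Thread(target=worker, args=(n,)) for n in range(count)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # wait for every writer before the file is closed


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "part5.txt")
    write_lines(demo_path)
    with open(demo_path, "rb") as f:
        print(f.read().decode("ascii"), end="")
```

Because every line has the same width, the offset of line n is simply n * LINE_WIDTH, and the order in which the threads happen to run does not affect the final file.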
Writing to a file from multiple threads
Move the join inside the File.open block:
threads = []
File.open("test.txt", "a") do |fp|
  500.times do |time|
    threads << Thread.new do
      fp.puts("#{time}: 1")
      sleep(rand(100) / 100.0)
      fp.puts("#{time}: 2")
    end
  end
  threads.each{ |thread| thread.join }
end
Why? Thread.new launches the thread, but it runs in parallel, and the life of the thread in your version isn't guaranteed to be shorter than the life of the file. File.open closes the file after you exit the attached block. By waiting to close the file until after all the threads are done, everything will work as expected.
However, please note that this IS NOT thread safe on JRuby (or any other implementation without a GIL) and may have output intermixed:
6: 1
5: 17: 1
8: 1
3: 10: 110: 1
4: 11: 1
2: 19: 1
11: 1
12: 1
13: 1
14: 1
Note: this question appears to be from Ruby MRI 1.8.7 - File writing thread safety
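One common way to keep lines from being intermixed is to serialize the writes with a mutex, so that only one thread writes a line at a time. The sketch below shows that idea in Python rather than Ruby; write_with_lock and its parameters are hypothetical names for illustration, and the answer above does not itself use a lock.

```python
import os
import tempfile
import threading


def write_with_lock(path: str, workers: int = 50) -> None:
    """Append two lines per worker thread, one whole line at a time."""
    lock = threading.Lock()
    with open(path, "a", encoding="utf-8") as fp:

        def worker(i: int) -> None:
            for step in (1, 2):
                with lock:  # only one thread writes a line at any moment
                    fp.write(f"{i}: {step}\n")

        threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # join inside the with block, before the file closes


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "test.txt")
    write_with_lock(demo_path, workers=5)
    with open(demo_path, encoding="utf-8") as f:
        print(f.read(), end="")
```

The order of the lines still depends on thread scheduling; the lock only guarantees that each line is written in one piece.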
How to create multiple threads that write to the same file in C#
Now that there's a code snippet, some optimization can be applied.
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        var sw = new Stopwatch();
        const int pow = 5;
        sw.Start();
        GenerateNumbers("test.txt", pow);
        sw.Stop();
        Console.WriteLine($"Wrote 10^{pow} lines of 10^{pow} numbers in {sw.Elapsed}");
    }

    public static void GenerateNumbers(string path, int pow)
    {
        // One Random instance is reused for every number
        var rnd = new Random();
        using var sw = new StreamWriter(path, false);
        var max = Math.Pow(10, pow);
        // Build each line in memory, then write it to disk in one call
        var sb = new StringBuilder();
        for (long i = 0; i < max; i++)
        {
            for (long j = 0; j < max; j++)
            {
                sb.Append(rnd.Next(1, 101));
                sb.Append(' ');
            }
            sw.WriteLine(sb.ToString());
            sb.Clear();
            // Report progress every 100 lines
            if (i % 100 == 0)
                Console.WriteLine((i / max).ToString("P"));
        }
    }
}
The above code does IO writes at a fairly decent pace (remember the limit is the IO speed, not CPU / number generation). Also note that I'm running the code from inside a VM, so I'm likely not getting the best IO results.
- As mentioned by Neil Moss in the comments, you don't need to instantiate the Random class on each run.
- I'm generating a single line to write in-memory using a StringBuilder, then I write this to the disk.
- Since this does take a bit of time I've added a progress tracker (this adds a minuscule amount of overhead).
- A 10^4 lines of 10^4 numbers file already is 285MB in size and was generated in 4.6767592 seconds.
- A 10^5 case like the above yields a 25.5 GB file and takes 5:54.2580683 to generate.
I haven't tried this, but I'm wondering if you couldn't save time by writing the data to a ZIP file, assuming you're more interested in just getting the data onto the disk than in the format itself. A compressed TXT file of numbers should be a fair bit smaller, and as such could be faster to write.
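The compression idea can be tried out on a small scale before committing to it. The sketch below is in Python rather than C#, and it uses a gzip stream instead of a ZIP archive because gzip compresses a single stream without archive bookkeeping; generate_numbers_gz and its parameters are hypothetical names. Whether this is actually faster depends on how CPU time for compression compares with disk throughput, which is not measured here.

```python
import gzip
import os
import random
import tempfile


def generate_numbers_gz(path: str, rows: int, cols: int, seed: int = 0) -> None:
    """Write `rows` lines of `cols` random numbers (1..100) through gzip."""
    rnd = random.Random(seed)  # one generator reused for every number
    with gzip.open(path, "wt", encoding="ascii") as out:
        for _ in range(rows):
            # Build one line in memory, then hand it to the compressed stream
            out.write(" ".join(str(rnd.randint(1, 100)) for _ in range(cols)))
            out.write("\n")


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "test.txt.gz")
    generate_numbers_gz(demo_path, rows=1000, cols=1000)
    print("compressed size in bytes:", os.path.getsize(demo_path))
```

Timing this against an uncompressed run with the same rows and cols on the target machine would show whether the smaller file outweighs the extra CPU work.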
Writing to a file from multiple threads in the correct order
To write the lines in the correct order:
- Use for future in future_to_url to iterate the futures in submission order.
- Use the list comprehension [executor.submit(...) for line in f] instead of the generator expression (executor.submit(...) for line in f). With the list, all lines are submitted to the executor at once. Otherwise, tasks are submitted on demand, one by one, as the loop iterates, which is not parallelized.
- findMatch() returns the result rather than writing to the output directly.
When future.result() is called, it returns the result immediately if it is available, or blocks until the result is ready.
import concurrent.futures
import os
import re

# pattern1, pattern2, choice and translate() are defined in the question's code

def main():
    # Open File
    for filename in os.listdir("files"):
        with open('translate/' + filename, 'w', encoding='UTF-8') as outFile:
            with open('files/' + filename, 'r', encoding='UTF-8') as f:
                # Replace Each Line
                with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
                    # Submit all lines at once; enumerate supplies the line number
                    future_to_url = [executor.submit(findMatch, line, count)
                                     for count, line in enumerate(f, start=1)]
                    # as_completed would yield futures in completion order;
                    # a plain for-loop iterates them in submission order
                    for future in future_to_url:
                        print(future.result())
                        # Uncomment to actually write to the output
                        # outFile.write(future.result())

def findMatch(line, count):
    # count is the line number, used only for debugging output
    # Check if match in line
    if re.search(pattern1, line) is not None:
        # Translate each match in line. Depends on choice
        for match in re.findall(pattern1, line):
            # Filter out matches with no Japanese
            if re.search(pattern2, match) is not None and '$' not in match:
                if choice == '1':
                    match = match.rstrip()
                    print('Translating: ' + str(count) + ': ' + match)
                    translatedMatch = translate(match)
                    line = re.sub(match, translatedMatch, line, 1)
                elif choice == '2':
                    match = match.rstrip()
                    print('Translating Line: ' + str(count))
                    line = translate(line)
                    break  # Don't want dupes
                else:
                    print('Bad Coder. Check your if statements')
        return line
    # Skip Line
    else:
        print('Skipping: ' + str(count))
        return line

if __name__ == '__main__':
    main()
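The ordering behaviour can be checked in isolation with a toy worker. In this minimal sketch, process and process_in_order are hypothetical stand-ins for findMatch() and the loop in main(), and the random sleep only simulates lines that take different amounts of time.

```python
import concurrent.futures
import random
import time


def process(line: str) -> str:
    """Stand-in for findMatch(): pretend to work, then return the new line."""
    time.sleep(random.uniform(0, 0.01))  # simulate uneven work per line
    return line.upper()


def process_in_order(lines):
    """Run process() on all lines in parallel, returning results in input order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # The list comprehension submits every line before any result is read
        futures = [executor.submit(process, line) for line in lines]
        # Iterating the list (not as_completed) keeps submission order
        return [future.result() for future in futures]


if __name__ == "__main__":
    for result in process_in_order(["first\n", "second\n", "third\n"]):
        print(result, end="")
```

Tasks may finish in any order, but each future.result() call blocks until that particular task is done, so the results come back in the order the lines were submitted.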