Fastest Way to Write Huge Data in Text File Java

You might try removing the BufferedWriter and just using the FileWriter directly. On a modern system there's a good chance you're just writing to the drive's cache memory anyway.

It takes me in the range of 4-5 seconds to write 175MB (4 million strings) -- this is on a dual-core 2.4GHz Dell running Windows XP with an 80GB, 7200-RPM Hitachi disk.

Can you isolate how much of the time is record retrieval and how much is file writing?

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class FileWritingPerfTest {

    private static final int ITERATIONS = 5;
    private static final double MEG = Math.pow(1024, 2);
    private static final int RECORD_COUNT = 4000000;
    private static final String RECORD = "Help I am trapped in a fortune cookie factory\n";
    private static final int RECSIZE = RECORD.getBytes().length;

    public static void main(String[] args) throws Exception {
        List<String> records = new ArrayList<String>(RECORD_COUNT);
        int size = 0;
        for (int i = 0; i < RECORD_COUNT; i++) {
            records.add(RECORD);
            size += RECSIZE;
        }
        System.out.println(records.size() + " 'records'");
        System.out.println(size / MEG + " MB");

        for (int i = 0; i < ITERATIONS; i++) {
            System.out.println("\nIteration " + i);

            writeRaw(records);
            writeBuffered(records, 8192);
            writeBuffered(records, (int) MEG);
            writeBuffered(records, 4 * (int) MEG);
        }
    }

    private static void writeRaw(List<String> records) throws IOException {
        File file = File.createTempFile("foo", ".txt");
        try {
            FileWriter writer = new FileWriter(file);
            System.out.print("Writing raw... ");
            write(records, writer);
        } finally {
            // comment this out if you want to inspect the files afterward
            file.delete();
        }
    }

    private static void writeBuffered(List<String> records, int bufSize) throws IOException {
        File file = File.createTempFile("foo", ".txt");
        try {
            FileWriter writer = new FileWriter(file);
            BufferedWriter bufferedWriter = new BufferedWriter(writer, bufSize);

            System.out.print("Writing buffered (buffer size: " + bufSize + ")... ");
            write(records, bufferedWriter);
        } finally {
            // comment this out if you want to inspect the files afterward
            file.delete();
        }
    }

    private static void write(List<String> records, Writer writer) throws IOException {
        long start = System.currentTimeMillis();
        for (String record : records) {
            writer.write(record);
        }
        // writer.flush(); // close() should take care of this
        writer.close();
        long end = System.currentTimeMillis();
        System.out.println((end - start) / 1000f + " seconds");
    }
}

Fastest Way To Read and Write Large Files Line By Line in Java

I suspect your real problem is that your hardware is the limit, and what you do in software won't make much difference. If you have plenty of memory and CPU, more advanced tricks can help, but if you are just waiting on your hard drive because the file is not cached, software changes won't make much difference.

BTW: 500 MB in 10 secs, or 50 MB/s, is a typical read speed for an HDD.

Try running the following to see at what point your system is unable to cache the file efficiently.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;

public class ReadWriteSpeedTest {

    public static void main(String... args) throws IOException {
        for (int mb : new int[]{50, 100, 250, 500, 1000, 2000})
            testFileSize(mb);
    }

    private static void testFileSize(int mb) throws IOException {
        File file = File.createTempFile("test", ".txt");
        file.deleteOnExit();
        char[] chars = new char[1024];
        Arrays.fill(chars, 'A');
        String longLine = new String(chars);
        long start1 = System.nanoTime();
        PrintWriter pw = new PrintWriter(new FileWriter(file));
        for (int i = 0; i < mb * 1024; i++)
            pw.println(longLine);
        pw.close();
        long time1 = System.nanoTime() - start1;
        System.out.printf("Took %.3f seconds to write a %d MB file, rate: %.1f MB/s%n",
                time1 / 1e9, file.length() >> 20, file.length() * 1000.0 / time1);

        long start2 = System.nanoTime();
        BufferedReader br = new BufferedReader(new FileReader(file));
        for (String line; (line = br.readLine()) != null; ) {
            // read every line, discarding it, to measure read speed only
        }
        br.close();
        long time2 = System.nanoTime() - start2;
        System.out.printf("Took %.3f seconds to read a %d MB file, rate: %.1f MB/s%n",
                time2 / 1e9, file.length() >> 20, file.length() * 1000.0 / time2);
        file.delete();
    }
}

On a Linux machine with lots of memory.

Took 0.395 seconds to write a 50 MB file, rate: 133.0 MB/s
Took 0.375 seconds to read a 50 MB file, rate: 140.0 MB/s
Took 0.669 seconds to write a 100 MB file, rate: 156.9 MB/s
Took 0.569 seconds to read a 100 MB file, rate: 184.6 MB/s
Took 1.585 seconds to write a 250 MB file, rate: 165.5 MB/s
Took 1.274 seconds to read a 250 MB file, rate: 206.0 MB/s
Took 2.513 seconds to write a 500 MB file, rate: 208.8 MB/s
Took 2.332 seconds to read a 500 MB file, rate: 225.1 MB/s
Took 5.094 seconds to write a 1000 MB file, rate: 206.0 MB/s
Took 5.041 seconds to read a 1000 MB file, rate: 208.2 MB/s
Took 11.509 seconds to write a 2001 MB file, rate: 182.4 MB/s
Took 9.681 seconds to read a 2001 MB file, rate: 216.8 MB/s

On a Windows machine with lots of memory.

Took 0.376 seconds to write a 50 MB file, rate: 139.7 MB/s
Took 0.401 seconds to read a 50 MB file, rate: 131.1 MB/s
Took 0.517 seconds to write a 100 MB file, rate: 203.1 MB/s
Took 0.520 seconds to read a 100 MB file, rate: 201.9 MB/s
Took 1.344 seconds to write a 250 MB file, rate: 195.4 MB/s
Took 1.387 seconds to read a 250 MB file, rate: 189.4 MB/s
Took 2.368 seconds to write a 500 MB file, rate: 221.8 MB/s
Took 2.454 seconds to read a 500 MB file, rate: 214.1 MB/s
Took 4.985 seconds to write a 1001 MB file, rate: 210.7 MB/s
Took 5.132 seconds to read a 1001 MB file, rate: 204.7 MB/s
Took 10.276 seconds to write a 2003 MB file, rate: 204.5 MB/s
Took 9.964 seconds to read a 2003 MB file, rate: 210.9 MB/s

Faster Way to write files in Java

Move the FileWriter open and close outside the loop:

FileWriter fw = new FileWriter(filename, true); // <-- here!
while (data.hasNextLine()) {
    String line = data.nextLine();
    String[] split = line.split("\t");
    String filename = "D:\\P&G\\March Sample Data\\" + split[0] + " "
            + split[1] + ".txt";
    // System.out.println(filename);
    // System.out.println(line);
    // FileWriter fw = new FileWriter(filename, true);

Otherwise it has to open the file and seek to the end for every line of input!
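To see how much that per-line open/seek/close costs, here is a small, hypothetical micro-benchmark (class and method names are mine, not from the question) that times appending N lines with a fresh FileWriter per line versus a single FileWriter opened once:

```java
import java.io.FileWriter;
import java.io.IOException;

public class AppendCost {

    // Returns elapsed nanoseconds for appending `lines` lines to `path`.
    static long timeWrites(String path, int lines, boolean reopenEachLine) throws IOException {
        long start = System.nanoTime();
        if (reopenEachLine) {
            for (int i = 0; i < lines; i++) {
                FileWriter fw = new FileWriter(path, true); // open + seek to end, every line
                fw.write("line " + i + "\n");
                fw.close();
            }
        } else {
            FileWriter fw = new FileWriter(path, true); // open + seek once
            for (int i = 0; i < lines; i++) {
                fw.write("line " + i + "\n");
            }
            fw.close();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        String dir = System.getProperty("java.io.tmpdir");
        long slow = timeWrites(dir + "/per-line.txt", 20000, true);
        long fast = timeWrites(dir + "/open-once.txt", 20000, false);
        System.out.printf("open-per-line: %.2f s, open-once: %.2f s%n",
                slow / 1e9, fast / 1e9);
    }
}
```

On most systems the open-per-line variant is slower by an order of magnitude or more, since every iteration pays for a system call to open the file and position at its end.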

Edit

I noticed you don't have the filename until in your loop. Let's use a Map to keep a cache.

FileWriter fw = null;
Map<String, FileWriter> map = new HashMap<>();
while (data.hasNextLine()) {
    String line = data.nextLine();
    String[] split = line.split("\t");
    String filename = "D:\\P&G\\March Sample Data\\" + split[0] + " "
            + split[1] + ".txt";
    // System.out.println(filename);
    // System.out.println(line);
    if (map.containsKey(filename)) {
        fw = map.get(filename);
    } else {
        fw = new FileWriter(filename, true);
        map.put(filename, fw);
    }
    // ...
}
for (FileWriter writer : map.values()) {
    writer.close();
}
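One caveat with a map of cached writers: if an exception escapes the loop, or if one close() fails, the remaining writers are never closed and buffered data can be lost. A minimal sketch of a safer wrapper (names are illustrative, not from the question) that closes every cached writer and still reports the first failure:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class WriterCache {
    private final Map<String, FileWriter> writers = new HashMap<>();

    // Get (or lazily open) the appending writer for a file.
    FileWriter writerFor(String filename) throws IOException {
        FileWriter fw = writers.get(filename);
        if (fw == null) {
            fw = new FileWriter(filename, true);
            writers.put(filename, fw);
        }
        return fw;
    }

    // Close every cached writer; remember the first failure and rethrow it
    // only after all writers have been given a chance to close.
    void closeAll() throws IOException {
        IOException first = null;
        for (FileWriter fw : writers.values()) {
            try {
                fw.close();
            } catch (IOException e) {
                if (first == null) first = e;
            }
        }
        writers.clear();
        if (first != null) throw first;
    }
}
```

Call closeAll() in a finally block around the read loop so no writer is leaked.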

How to append/write huge data file text in Java

You don’t need a PrintWriter here. If you have any kind of Writer (e.g. a FileWriter), you can simply invoke append(sb) on it. And you don’t need to flush; close() implies flushing.

private static void write(StringBuilder sb, Boolean append) throws Exception {
    File file = File.createTempFile("foo", ".txt");

    try (FileWriter writer = new FileWriter(file.getAbsoluteFile(), append)) {
        writer.append(sb);
    }
}

On my system I encountered a small performance improvement using a Channel rather than an OutputStream:

private static void write0a(StringBuilder sb, Boolean append) throws Exception {
    File file = File.createTempFile("foo", ".txt");

    try (Writer writer = Channels.newWriter(new FileOutputStream(
            file.getAbsoluteFile(), append).getChannel(), "UTF-8")) {
        writer.append(sb);
    }
}

However, these are only slight improvements. I don’t see many possibilities here, as all of this code ends up calling the same routines. What could really improve your performance is keeping the Writer alive across invocations and not flushing after every record.
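A minimal sketch of that idea (the class and method names are mine, for illustration): one long-lived, buffered writer that all record-writing calls share, with nothing flushed until close().

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

// One writer for the whole run: opened once, buffered, flushed only on close().
public class RecordLog implements AutoCloseable {
    private final BufferedWriter out;

    public RecordLog(String path, boolean append) throws IOException {
        this.out = new BufferedWriter(new FileWriter(path, append));
    }

    public void writeRecord(CharSequence record) throws IOException {
        out.append(record).append('\n'); // goes to the buffer; no flush per record
    }

    @Override
    public void close() throws IOException {
        out.close(); // flushes whatever remains in the buffer, once
    }
}
```

Used in a try-with-resources block, the caller writes as many records as it likes and pays for the open, the flush, and the close exactly once.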

Fastest way to write to file?

Make sure you allocate a large enough buffer:

BufferedWriter out = new BufferedWriter(new FileWriter(file), 32768);

What sort of OS are you running on? That can make a big difference too. However, taking a minute to write out a file of less-than-enormous size sounds like a system problem. On Linux or other *ix systems, you can use things like strace to see if the JVM is making lots of unnecessary system calls. (A very long time ago, Java I/O was pretty dumb and would make insane numbers of low-level write() system calls if you weren't careful, but when I say "a long time ago" I mean 1998 or so.)

edit — note that the situation of a Java program writing a simple file in a simple way, and yet being really slow, is an inherently odd one. Can you tell if the CPU is heavily loaded while the file is being written? It shouldn't be; there should be almost no CPU load from such a thing.

When writing a huge amount of data, parts of it get lost / When all data is present, the write process is very slow

Maybe these can help you:

Fastest way to write huge data in text file Java

https://www.quora.com/How-do-to-read-and-write-large-size-file-in-Java-efficiently

What is the best way to write and append a large file in Java

1) You are opening a new writer every time without closing the previous writer object.

2) Don't open the file for each write operation; instead, open it before the loop and close it after the loop.

BufferedWriter writer = new BufferedWriter(new FileWriter(path, true));
String resultData;
do {
    resultData = HTTP.GET <uri>; // pseudocode for the web-service call
    writer.write(resultData + "\n");
} while (resultData.exists()); // pseudocode loop condition
writer.close();

3) The default buffer size of BufferedWriter is 8192 characters. Since you have 4 GB of data, I would increase the buffer size to improve performance, but at the same time make sure your JVM has enough memory to hold the data.

BufferedWriter writer = new BufferedWriter(new FileWriter(path, true), 8192 * 4);
String resultData;
do {
    resultData = HTTP.GET <uri>; // pseudocode for the web-service call
    writer.write(resultData + "\n");
} while (resultData.exists()); // pseudocode loop condition
writer.close();

4) Since you are making a GET web service call, the performance also depends on the response time of the web service.

Fast way to write millions of small text files in Java?

This is almost certainly an OS filesystem issue; writing lots of small files is simply slow. I recommend writing a comparison test in shell and in C to get an idea of how much of the time the OS is contributing. Additionally, I would suggest two major tweaks:

  • Ensure the system this is running on is using an SSD. Latency from seeking for filesystem journaling will be a major source of overhead.
  • Multithread your writing process. When writes are serialized, the OS can't perform optimizations like batching writes, and the FileWriter may block on the close() operation.

(I was going to suggest looking into NIO, but the APIs don't seem to offer much benefit for your situation, since setting up an mmapped buffer would probably introduce more overhead than it would save for this size.)
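A minimal sketch of the multithreading suggestion, assuming Java 11+ for Files.writeString (the class name and method signature are illustrative): a fixed thread pool writes each small file as an independent task, so the OS can overlap and batch the I/O.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelFileWriter {
    // Write each (name, content) pair as its own small file, spreading the
    // work over `threads` worker threads.
    public static void writeAll(Path dir, String[] names, String[] contents, int threads)
            throws IOException, InterruptedException {
        Files.createDirectories(dir);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < names.length; i++) {
            Path target = dir.resolve(names[i]);
            String content = contents[i];
            pool.submit(() -> {
                try {
                    Files.writeString(target, content);
                } catch (IOException e) {
                    e.printStackTrace(); // real code should collect failures
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Whether this helps depends on the filesystem; on an SSD a handful of threads is usually enough, and adding more mostly adds contention.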


