Java NIO FileChannel Versus FileOutputStream Performance/Usefulness

Java NIO FileChannel versus FileOutputstream performance / usefulness

My experience with larger file sizes has been that java.nio is faster than java.io. Solidly faster, in the >250% range. That said, I am eliminating obvious bottlenecks, which I suggest your micro-benchmark might suffer from. Potential areas to investigate:

The buffer size. The algorithm you basically have is

  • copy from disk to buffer
  • copy from buffer to disk

My own experience has been that this buffer size is ripe for tuning. I've settled on 4KB for one part of my application, 256KB for another. I suspect your code is suffering with such a large buffer. Run some benchmarks with buffers of 1KB, 2KB, 4KB, 8KB, 16KB, 32KB and 64KB to prove it to yourself.
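One way to run that comparison is sketched below. The class name `BufferSizeProbe` and the method `copyWithBuffer` are made up for illustration, and the timings it prints are only meaningful relative to each other on the same machine, since OS caching and hardware affect them heavily.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class BufferSizeProbe {

    /** Copies src to dst through a heap buffer of the given size; returns bytes written. */
    static long copyWithBuffer(Path src, Path dst, int bufferSize) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
            long total = 0;
            while (in.read(buffer) != -1) {       // copy from disk to buffer
                buffer.flip();                    // switch from filling to draining
                while (buffer.hasRemaining()) {
                    total += out.write(buffer);   // copy from buffer to disk (may be partial)
                }
                buffer.clear();                   // make room for the next read
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Paths.get(args[0]);            // source file supplied by the caller
        Path dst = Files.createTempFile("probe", ".bin");
        try {
            for (int kb : new int[] {1, 2, 4, 8, 16, 32, 64}) {
                long start = System.nanoTime();
                long bytes = copyWithBuffer(src, dst, kb * 1024);
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.printf("%2d KB buffer: %d bytes in %d us%n", kb, bytes, micros);
            }
        } finally {
            Files.deleteIfExists(dst);
        }
    }
}
```

Run it several times and discard the first pass, because the first read of the source file is likely to be slower than the rest.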

Don't perform Java benchmarks that read and write to the same disk.

If you do, then you are really benchmarking the disk, and not Java. I would also suggest that if your CPU is not busy, then you are probably experiencing some other bottleneck.

Don't use a buffer if you don't need to.

Why copy to memory if your target is another disk or a NIC? With larger files, the latency incurred is non-trivial.

As others have said, use FileChannel.transferTo() or FileChannel.transferFrom(). The key advantage here is that the JVM can hand the transfer to the OS, which may use DMA (Direct Memory Access), if present. (This is implementation dependent, but modern Sun and IBM versions on general purpose CPUs are good to go.) What happens is that the data goes from the source to the destination without being copied into a buffer in your Java code, and on many systems without passing through user space at all.
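A minimal sketch of a file-to-file copy with transferTo() follows. The class name `TransferCopy` is hypothetical. Whether the OS actually takes a zero-copy path depends on the platform and the channel types involved, so this shows the API usage rather than a guaranteed speed-up.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TransferCopy {

    /** Copies src to dst with FileChannel.transferTo; returns bytes transferred. */
    static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                long moved = in.transferTo(position, size - position, out);
                if (moved <= 0) {
                    break; // nothing more could be transferred (e.g. the source shrank)
                }
                position += moved;
            }
            return position;
        }
    }
}
```

Note that there is no ByteBuffer anywhere in this code: the application never sees the bytes.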

The web app I spend my days and nights working on is very IO heavy. I've done micro-benchmarks and real-world benchmarks too. The results are up on my blog, have a look-see:

  • Real world performance metrics: java.io vs. java.nio
  • Real world performance metrics: java.io vs. java.nio (The Sequel)

Use production data and environments

Micro-benchmarks are prone to distortion. If you can, make the effort to gather data from exactly what you plan to do, with the load you expect, on the hardware you expect.

My benchmarks are solid and reliable because they took place on a production system, a beefy system, a system under load, with the data gathered from logs. Not on my notebook's 7200 RPM 2.5" SATA drive while I watched intently as the JVM worked my hard disk.

What are you running on? It matters.

FileInput/OutputStream versus FileChannels -- which gives better performance

Input and Output Streams assume stream-style access to the file or resource. There are a few extra items which help (array reads), but the basic idea is that of a stream where you read in one or more bytes at a time (possibly blocking until more bytes are available).
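For reference, the stream style with array reads looks roughly like the sketch below. The class name `StreamCopy` and the 8 KB chunk size are arbitrary choices for illustration.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Path;

final class StreamCopy {

    /** Copies src to dst with classic java.io streams; returns bytes copied. */
    static long copy(Path src, Path dst) throws IOException {
        byte[] chunk = new byte[8192];            // chunk size is an assumption, tune as needed
        long total = 0;
        try (InputStream in = new FileInputStream(src.toFile());
             OutputStream out = new FileOutputStream(dst.toFile())) {
            int n;
            // read(byte[]) blocks until at least one byte is available or end of stream
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);           // write only the bytes actually read
                total += n;
            }
        }
        return total;
    }
}
```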

Channels are the means to copy information into Buffers. This provides a lower level of access to input and output routines. With thoughtful buffer sizing, the speed-ups can be impressive. Structuring your code around buffers can reduce the time spent in a read loop (also increasing performance). Finally, while it is possible to pre-check input stream state in an attempt to avoid blocking, selectable channels such as SocketChannel can be switched into non-blocking mode outright. (FileChannel itself is always blocking.)

When to use FileChannel to read()/write() files?

There are only two cases where a FileChannel is faster than a FileInputStream or FileOutputStream.

The first is when you can use an off-heap ("direct") ByteBuffer to hold data, so that it isn't copied into the Java heap. For example, if you were writing a web server that delivered static files to a socket, it would be faster to use a FileChannel and a SocketChannel rather than a FileInputStream and a SocketOutputStream.

These cases are, in my opinion, very few and far between. Normally when you read (or write) a file in Java you will be doing something with the data. In which case, you can't avoid copying the data onto the heap.

The other use for a FileChannel is to create a MappedByteBuffer for random access to the contents of a file. This is significantly faster than using RandomAccessFile because it replaces explicit calls to the OS kernel with memory accesses that leverage the OS's paging mechanism.
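As a rough sketch of that second case, the snippet below maps a file read-only and fetches bytes at arbitrary offsets. `MappedRead` and `bytesAt` are hypothetical names, and the sketch assumes every offset lies inside the file and that the file fits in a single mapping (at most Integer.MAX_VALUE bytes).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedRead {

    /** Returns the bytes found at the given offsets, read through one memory mapping. */
    static byte[] bytesAt(Path file, int... offsets) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping cannot exceed Integer.MAX_VALUE bytes.
            long length = Math.min(ch.size(), Integer.MAX_VALUE);
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, length);
            byte[] result = new byte[offsets.length];
            for (int i = 0; i < offsets.length; i++) {
                // Absolute get: a plain memory access, with no seek() or read() call.
                result[i] = map.get(offsets[i]);
            }
            return result;
        }
    }
}
```

In a real application you would keep the MappedByteBuffer around and reuse it, since creating the mapping is the expensive part.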

If you're just getting started with I/O in Java, I recommend sticking with the classes in java.io unless and until you can explain why switching to java.nio will give you improved performance. It's much easier to use a stream-oriented abstraction than a block-oriented one.

Java : copy files efficiently with channel

This question was asked before:

Java NIO FileChannel versus FileOutputstream performance / usefulness

TL;DR: It matters what your JVM is running on, but java.nio is usually slightly faster.

NIO and IO performance issue (first and second read and write) in CentOS 7

The first test ran slow because the file had to be loaded from your disk storage the first time.

Loading the file on a 7200rpm drive in 80ms is not necessarily abnormal. Your drive probably has a seek time of about 8ms and we don't know if the file is fragmented or not.

After loading, the file is stored in the buffer cache, and subsequent requests (even from different processes) are served much faster. The kernel stores files in the buffer cache to speed up access to commonly used files.

When doing benchmarks, it's usually a good idea to perform the tests entirely in memory... or prefetch the file contents so they already exist in the buffer cache.
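One simple way to prefetch is to read the file once before the clock starts. The sketch below does that and then reports the fastest of several timed reads. `WarmedTiming`, `readAll` and `bestWarmNanos` are hypothetical names, and whether the file really stays cached is up to the OS and the memory available.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class WarmedTiming {

    /** Reads the whole file and returns the number of bytes seen. */
    static long readAll(Path file) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            while ((n = ch.read(buffer)) != -1) {
                total += n;
                buffer.clear(); // contents are discarded; only the I/O matters here
            }
        }
        return total;
    }

    /** Does one untimed read, then returns the fastest of `runs` timed reads in nanoseconds. */
    static long bestWarmNanos(Path file, int runs) throws IOException {
        readAll(file); // untimed pass: gives the OS a chance to cache the file
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            readAll(file);
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }
}
```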

Working with files and file systems: Before NIO, with NIO and with NIO2 in the future

For most operations, NIO2 will let you do more / better.

Some operations are just impossible using legacy APIs (some attributes, ACL, file change notifications, better error handling...).

And best of all: this is not necessarily more difficult.
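To give a feel for the attribute access and error handling, here is a small sketch using java.nio.file.Files. The class name `Nio2Peek` and the strings it returns are made up for illustration. For comparison, legacy java.io.File methods such as delete() only return false on failure, without saying why.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

final class Nio2Peek {

    /** Describes a path, or reports why its attributes could not be read. */
    static String describe(Path path) {
        try {
            // One call fetches type, size and timestamps together.
            BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
            String kind = attrs.isDirectory() ? "directory"
                    : attrs.isRegularFile() ? "file"
                    : "other";
            return kind + ", " + attrs.size() + " bytes";
        } catch (NoSuchFileException e) {
            return "missing";                       // a specific exception for a specific cause
        } catch (IOException e) {
            return "unreadable: " + e.getMessage(); // anything else, e.g. access denied
        }
    }
}
```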

To answer your question: when an operation can be done with either API, I don't see any use case where the old one would let you do it better.

There has been some discussion:

Java NIO FileChannel versus FileOutputstream performance / usefulness
http://mailinator.blogspot.com/2008/02/kill-myth-please-nio-is-not-faster-than.html

But I'd say the newest APIs are designed to be faster. If they aren't in some situation, expect a JVM update to fix that without your having to change any code, provided you've been using the newer APIs.


