Force JVM to do all IO without page cache (e.g. O_DIRECT)


  "The thing that has always disturbed me about O_DIRECT is that the
whole interface is just stupid, and was probably designed by a deranged
monkey on some serious mind-controlling substances [*]."

[*] In other words, it's an Oracleism.

-- Linus Torvalds from Transmeta, 11 May 2002

Check the NOTES section of man 2 open:

O_DIRECT

The O_DIRECT flag may impose alignment restrictions on the length and address
of userspace buffers and the file offset of I/Os. In Linux alignment
restrictions vary by file system and kernel version ....

Under Linux 2.4, transfer sizes, and the alignment of the user buffer and the
file offset must all be multiples of the logical block size of the file
system. Under Linux 2.6, alignment to 512-byte boundaries suffices.
....

In summary, O_DIRECT is a potentially powerful tool that should be used with
caution. It is recommended that applications treat use of O_DIRECT as a
performance option which is disabled by default.

I think there are some usages of FileInputStream in the JRE (the class loader) which do reads with offsets or sizes not aligned to 512 bytes. (For Advanced Format disks the minimum alignment may be bigger, even 4096 bytes, i.e. one 4K page.)

The behaviour of the kernel for unaligned offsets is a grey zone; some info is here: RFC: Clarifying Direct I/O Semantics, Theodore Ts'o, tytso@mit, LWN, 2009.

Another interesting discussion is here: Linux: Accessing Files With O_DIRECT (kerneltrap, 2007).

Hmm, it seems like there should be a fallback to buffered I/O when something with O_DIRECT fails. All I/O operations with O_DIRECT are synchronous. Maybe there are some DMA effects? Or a combination of O_DIRECT and mmap?

UPDATE:

Thanks for the strace output. Here is the error (grep for O_DIRECT, then check the operations on that file descriptor):

28290 open("...pact/perf/TestDirectIO.class", O_RDONLY|O_DIRECT) = 11
28290 fstat(11, {st_mode=S_IFREG|0644, st_size=2340, ...}) = 0
28290 fcntl(11, F_GETFD) = 0
28290 fcntl(11, F_SETFD, FD_CLOEXEC) = 0
...skip
28290 stat("...pact/perf/TestDirectIO.class", {st_mode=S_IFREG|0644, st_size=2340, ...}) = 0
...skip
28290 read(11, "\312\376\272\276\0\0\0003\0\215\n\0-\0D\t\0E\0F\7\0G\n\0\3\0D\10\0H\n"..., 1024) = 1024
28290 read(11, 0x7f1d76d23a00, 1316) = -1 EINVAL (Invalid argument)

The unaligned read size results in the EINVAL error. Your class file is 2340 bytes long; it is read as 1024+1316 bytes, and 1316 is not a multiple of 512.
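
For illustration, a caller doing its own direct I/O would have to round each transfer size up to the block boundary itself, which is exactly what the generic class-loader read path does not do. A minimal sketch (requestedLen is a hypothetical variable):

// round a requested transfer size up to the alignment O_DIRECT demands
int blockSize = 512; // logical block size; Advanced Format disks may need 4096
int alignedLen = (requestedLen + blockSize - 1) & ~(blockSize - 1);
// e.g. requestedLen = 1316  ->  alignedLen = 1536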

Java open file with option similar to the Windows C++ FILE_FLAG_WRITE_THROUGH

There are StandardOpenOption.SYNC and StandardOpenOption.DSYNC:

From the Synchronized File I/O Integrity Documentation:

The SYNC and DSYNC options are used when opening a file to require that updates to the file are written synchronously to the underlying storage device. In the case of the default provider, and the file resides on a local storage device, and the seekable channel is connected to a file that was opened with one of these options, then an invocation of the write method is only guaranteed to return when all changes made to the file by that invocation have been written to the device. These options are useful for ensuring that critical information is not lost in the event of a system crash. If the file does not reside on a local device then no such guarantee is made. Whether this guarantee is possible with other provider implementations is provider specific.

Javadoc for SYNC and DSYNC options

On Linux/macOS systems, this translates to the O_SYNC/O_DSYNC flags of the open function used to open the file.

In Windows, either of those options being set translates to using the FILE_FLAG_WRITE_THROUGH option, which can be seen in the source in WindowsChannelFactory:

if (flags.dsync || flags.sync)
    dwFlagsAndAttributes |= FILE_FLAG_WRITE_THROUGH;

To use these flags: if you are unfamiliar with the NIO file API in Java, it goes something like this:

Path file = Paths.get("myfile.dat");
SeekableByteChannel c = Files.newByteChannel(file, StandardOpenOption.SYNC);

You can use the channel directly to read/write data using byte buffers, or convert it to a familiar input stream or output stream using the Channels class:

InputStream is = Channels.newInputStream(c);    
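
Putting the pieces together, a minimal self-contained sketch (the file name is just an example):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SyncWriteDemo {
    public static void main(String[] args) throws IOException {
        Path file = Paths.get("myfile.dat");
        // CREATE + WRITE + SYNC: on the default provider, write() only
        // returns once data and metadata have reached the local device.
        try (SeekableByteChannel c = Files.newByteChannel(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.SYNC)) {
            c.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));
        }
    }
}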

Why doesn't Java FileOutputStream's write() or flush() make the NFS client really send data to the NFS server?

You are probably running into Unix client-side caching. There are lots of details in the O'Reilly NFS book.

But in short:

Using the buffer cache and allowing async threads to cluster multiple buffers introduces some problems when several machines are reading from and writing to the same file. To prevent file inconsistency with multiple readers and writers of the same file, NFS institutes a flush-on-close policy:
All partially filled NFS data buffers for a file are written to the NFS server when the file is closed.

For NFS Version 3 clients, any writes that were done with the stable flag set to off are forced onto the server's stable storage via the commit operation.

NFS cache consistency uses an approach called close-to-open cache consistency - that is, you have to close the file before your server (and other clients) get a consistent up-to-date view of the file. You are seeing the downsides of this approach, which aims to minimize server hits.
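
A sketch of what that means from the Java side (the path is hypothetical): nothing in write() or flush() pushes data to the server; it is close() that triggers the flush-on-close write-back.

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class NfsFlushDemo {
    public static void main(String[] args) throws IOException {
        try (FileOutputStream out = new FileOutputStream("/mnt/nfs/out.dat")) {
            out.write("important".getBytes(StandardCharsets.UTF_8));
            out.flush(); // a no-op for FileOutputStream: the data is already in
                         // the kernel, but the NFS client may still cache it
        } // the try-with-resources close() forces the dirty buffers to the server
    }
}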

Avoiding the cache is hard from Java. You'd need to set the O_DIRECT flag on the file's open() call if you're using Linux; see this answer for more: https://stackoverflow.com/a/16259319/5851520. Basically, it disables the client OS's cache for that file, though not the server's.

Unfortunately, the standard JDK doesn't expose O_DIRECT, as discussed here: Force JVM to do all IO without page cache (e.g. O_DIRECT) - essentially, use JNI yourself or use a nice 3rd-party lib. I've heard good things about JNA: https://github.com/java-native-access/jna ...
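
For what it's worth, here is an unofficial sketch of that JNA route, with plenty of assumptions: the class and path names are made up, O_DIRECT's numeric value (040000) is for x86/x86-64 Linux and differs on other architectures, and the 512-byte alignment matches the man-page text quoted earlier. (As an aside, JDK 10 and later expose com.sun.nio.file.ExtendedOpenOption.DIRECT, which removes the need for this.)

import com.sun.jna.Library;
import com.sun.jna.Memory;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

public class DirectIoSketch {
    public interface CLib extends Library {
        CLib INSTANCE = Native.load("c", CLib.class); // JNA 5.x
        int open(String pathname, int flags);         // man 2 open
        long read(int fd, Pointer buf, long count);   // size_t as long: 64-bit Linux assumed
        int close(int fd);
    }

    static final int O_RDONLY = 0;
    static final int O_DIRECT = 040000; // x86/x86-64 Linux; architecture-dependent

    public static void main(String[] args) {
        int fd = CLib.INSTANCE.open("/tmp/data.bin", O_RDONLY | O_DIRECT); // example path
        if (fd < 0) throw new RuntimeException("open failed");

        // Over-allocate, then shift to a 512-byte-aligned address,
        // since malloc gives no alignment guarantee that strong.
        Memory raw = new Memory(4096 + 511);
        long addr = Pointer.nativeValue(raw);
        Pointer aligned = raw.share((512 - (addr % 512)) % 512);

        long n = CLib.INSTANCE.read(fd, aligned, 4096); // length is a multiple of 512
        System.out.println("read " + n + " bytes");
        CLib.INSTANCE.close(fd);
    }
}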

Alternatively, if you have control over the client mount point, you can use the sync mount option, as per the NFS manual. It says:

If the sync option is specified on a mount point, any system call
that writes data to files on that mount point causes that data to be
flushed to the server before the system call returns control to user
space. This provides greater data cache coherence among clients, but
at a significant performance cost.
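
For example (server name, export, and mount point are placeholders):

mount -t nfs -o sync server:/export /mnt/data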

This could be what you're looking for.

Why should I use a human readable file format?


It depends

The right answer is: it depends. If you are writing audio/video data, for instance, and you crowbar it into a human-readable format, it won't be very readable! And Word documents are the classic example where people have wished they were human readable, and so more flexible; by moving to XML, Microsoft has been going that way.

Much more important than binary or text is a standard or not a standard. If you use a standard format, then chances are you and the next guy won't have to write a parser, and that's a win for everyone.

Following this are some opinionated reasons why you might want to choose one over the other, if you have to write your own format (and parser).

Why use human readable?

  1. The next guy. Consider the maintaining developer looking at your code 30 years or six months from now. Yes, he should have the source code. Yes, he should have the documents and the comments. But he quite likely won't. And having been that guy, and having had to rescue or convert old, extremely valuable data, I'll thank you for making it something I can just look at and understand.
  2. Let me read AND WRITE it with my own tools. If I'm an emacs user I can use that. Or Vim, or notepad or ... Even if you've created great tools or libraries, they might not run on my platform, or even run at all any more. Also, I can then create new data with my tools.
  3. The tax isn't that big - storage is free. Nearly always, disc space is free. And if it isn't, you'll know. Don't worry about a few angle brackets or commas; usually it won't make that much difference. Premature optimisation is the root of all evil. And if you are really worried, just use a standard compression tool, and then you have a small human-readable format - anyone can run unzip (see the sketch after this list).
  4. The tax isn't that big - computers are quick. It might be faster to parse binary. Until you need to add an extra column, or data type, or support both legacy and new files. (Though this is mitigated with Protocol Buffers.)
  5. There are a lot of good formats out there. Even if you don't like XML, try CSV. Or JSON. Or .properties. Or even XML. Lots of tools for parsing these already exist in lots of languages. And it only takes five minutes to write them again if mysteriously all the source code gets lost.
  6. Diffs become easy. When you check in to version control, it is much easier to see what has changed, and to view it on the web. Or your iPhone. With binary, you know something has changed, but you rely on the comments to tell you what.
  7. Merges become easy. You still see questions on the web asking how to append one PDF to another. This doesn't happen with text.
  8. Easier to repair if corrupted. Try and repair a corrupt text document vs. a corrupt zip archive. Enough said.
  9. Every language (and platform) can read or write it. Of course, binary is the native language for computers, so every language will support binary too. But a lot of the classic little scripting-tool languages work much better with text data. I can't think of a language that works well with binary but not with text (assembler, maybe), whereas the reverse is common. And that means your programs can interact with other programs you haven't even thought of, or that were written 30 years before yours. There are reasons Unix was successful.
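
To make point 3 concrete, here is a minimal sketch in Java (the language used elsewhere on this page) of compressing a human-readable payload with nothing but the standard library; the file name and JSON payload are made up:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

public class CompressReadable {
    public static void main(String[] args) throws IOException {
        byte[] text = "{\"id\": 1, \"name\": \"example\"}".getBytes(StandardCharsets.UTF_8);
        // write the human-readable bytes through a standard gzip stream;
        // any stock tool (gunzip, zless) can recover the original text
        try (GZIPOutputStream gz = new GZIPOutputStream(
                Files.newOutputStream(Paths.get("data.json.gz")))) {
            gz.write(text);
        }
    }
}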

Why not, and use binary instead?

  1. You might have a lot of data - terabytes, maybe. And then a factor of two could really matter. But premature optimisation is still the root of all evil. How about using a human-readable one now and converting later? It won't take much time.
  2. Storage might be free but bandwidth isn't (Jon Skeet in comments). If you are throwing files around the network then size can really make a difference. Even bandwidth to and from disc can be a limiting factor.
  3. Really performance-intensive code. Binary can be seriously optimised. There is a reason databases don't normally have their own plain-text format.
  4. A binary format might be the standard. So use PNG, MP3, or MPEG. It makes the next guy's job easier (for at least the next 10 years).
  5. There are lots of good binary formats out there. Some are global standards for that type of data. Or might be a standard for hardware devices. Some are standard serialization frameworks. A great example is Google Protocol Buffers. Another example: Bencode
  6. Easier to embed binary. Some data already is binary and you need to embed it. This works naturally in binary file formats, but looks ugly and is very inefficient in human readable ones, and usually stops them being human readable.
  7. Deliberate obscurity. Sometimes you don't want it obvious what your data is doing. Encryption is better than accidental security through obscurity, but if you are encrypting you might as well make it binary and be done with it.

Debatable

  1. Easier to parse. People have claimed that both text and binary are easier to parse. Now, clearly the easiest to parse is when your language or library supports parsing, and this is true for some binary and some human-readable formats, so it doesn't really support either. Binary formats can clearly be chosen so that they are easy to parse, but so can human-readable ones (think CSV or fixed width), so I think this point is moot. Some binary formats can just be dumped into memory and used as is, so this could be said to be the easiest to parse, especially if numbers (not just strings) are involved. However, I think most people would argue that human-readable parsing is easier to debug, as it is easier to see what is going on in the debugger (slightly).
  2. Easier to control. Yes, it is more likely someone will mangle text data in their editor, or will moan when one Unicode format works and another doesn't. With binary data, that is less likely. However, people and hardware can still mangle binary data. And you can (and should) specify a text encoding for human-readable data, either flexible or fixed.

At the end of the day, I don't think either can really claim an advantage here.

Anything else

Are you sure you really want a file? Have you considered a database? :-)

Credits

A lot of this answer merges together stuff other people wrote in other answers (you can see them there). And especially big thanks to Jon Skeet for his comments (both here and offline) suggesting ways it could be improved.


