Java I/O vs. Java New I/O (NIO) with Linux NPTL

Paul Tyma's provocative 2008 blog post, "Avoid NIO, get better throughput," claims ~5000 threads without any trouble; I've heard folks claim more:

  1. With NPTL on, Sun and Blackwidow JVM 1.4.2 scaled easily to 5000+ threads. Blocking model was consistently 25-35% faster than using NIO selectors. Lot of techniques suggested by EmberIO folks were employed - using multiple selectors, doing multiple (2) reads if the first read returned EAGAIN equivalent in Java. Yet we couldn't beat the plain thread per connection model with Linux NPTL.

I think the key here is to measure the overhead and performance, and move to non-blocking I/O only when you know you need to and can demonstrate an improvement. The additional effort to write and maintain non-blocking code should be factored into your decision. My take is: if your application can be cleanly expressed using synchronous/blocking I/O, DO THAT. If your application is amenable to non-blocking I/O and you won't just be re-inventing blocking I/O badly in application space, CONSIDER moving to NIO based on measured performance needs. I'm amazed, when I poke around the Google results for this, how few of the resources actually cite any (recent) numbers!

Also, see Paul Tyma's presentation slides: "The old way is new again." Based on his work at Google, concrete numbers suggest that synchronous threaded I/O is quite scalable on Linux; he considers "NIO is faster" a myth that was true for a while, but no longer. There is some good additional commentary on Comet Daily. He cites the following (anecdotal; still no solid link to benchmarks) result on NPTL:

In tests, NPTL succeeded in starting 100,000 threads on an IA-32 in two seconds. In comparison, this test under a kernel without NPTL would have taken around 15 minutes.

If you really are running into scalability problems, you may want to tune the thread stack size with -XX:ThreadStackSize. Since you mention Tomcat, see here.
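As a rough sketch (the class name is illustrative), the default stack size can be lowered for all threads with a flag such as `-Xss256k` (or `-XX:ThreadStackSize=256`, in KB on HotSpot), and a smaller stack can also be requested per thread via the four-argument `Thread` constructor, which the JVM treats only as a hint and may ignore on some platforms:

```java
public class StackSizeDemo {
    public static void main(String[] args) throws InterruptedException {
        // The four-argument constructor takes a requested stack size in bytes;
        // the JVM may round this value or ignore it entirely.
        Runnable task = () -> { /* per-connection work would go here */ };
        Thread t = new Thread(null, task, "small-stack", 256 * 1024);
        t.start();
        t.join();
        System.out.println("done");
    }
}
```

When you have thousands of connection threads, lowering the JVM-wide default with `-Xss` is usually the more practical knob than per-thread requests.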

Finally, if you're bound and determined to use non-blocking I/O, make every effort to build on an existing framework by people who know what they're doing. I've wasted far too much of my own time trying to get an intricate non-blocking I/O solution right (for the wrong reasons).

See also related on SO.

java.net versus java.nio

Scalability will probably drive your choice of package. java.net requires one thread per socket, but coding it is significantly easier. java.nio is much more efficient, but can be hairy to code.

Ask yourself how many connections you expect to handle. If it's relatively few (say, < 100), I'd go with java.net.
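For illustration, here is a minimal sketch of the thread-per-connection model with java.net (class name and the self-test at the end are illustrative, not production code); each accepted socket gets its own thread doing plain blocking reads and writes:

```java
import java.io.*;
import java.net.*;

public class BlockingEchoDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0); // 0 = pick any free port

        // Accept loop: one thread per connection, plain blocking I/O.
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    Socket client = server.accept();          // blocks
                    new Thread(() -> echo(client)).start();   // thread per socket
                }
            } catch (IOException e) { /* server closed */ }
        });
        acceptor.setDaemon(true);
        acceptor.start();

        // Quick self-test: connect, send one line, read the echo.
        try (Socket s = new Socket("localhost", server.getLocalPort());
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
            out.println("hello");
            System.out.println(in.readLine());
        }
        server.close();
    }

    static void echo(Socket client) {
        try (client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream()));
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) out.println(line); // echo back
        } catch (IOException e) { /* client disconnected */ }
    }
}
```

The appeal of this style is that each handler is straight-line code: read, process, write, with the OS scheduler doing the multiplexing that a Selector loop would otherwise do by hand.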

Has Java blocking IO on 64 bit Linux, in 2015, solved the C10K issue?

I don't see any limitations inherent to Java here. Can you start 10,000 Java threads? Yes, easily. Can you open 10,000 java.io sockets? Yes, you can. Can your Linux setup handle it? The only way to know is to try it and find out. Speaking from experience, I have seen JBoss servers do it on CentOS with >10k java.io connections.

How to join a thread that is hanging on blocking IO?

This old question deserves a new answer, as things have evolved and a newer facility is now available to better handle signals in threads.

Since Linux kernel 2.6.22, the system offers a function called signalfd(), which can be used to open a file descriptor for a given set of Unix signals (outside of those that outright kill a process).

#include <signal.h>
#include <sys/signalfd.h>

// define a set of signals
sigset_t set;
sigemptyset(&set);
sigaddset(&set, SIGUSR1);
// ... you can add more than one ...

// prevent the default signal behavior (very important)
sigprocmask(SIG_BLOCK, &set, nullptr);

// open a file descriptor using that set of Unix signals
int f_socket = signalfd(-1, &set, SFD_NONBLOCK | SFD_CLOEXEC);

Now you can use the poll() or select() functions to listen for the signals alongside the more usual file descriptors (sockets, files on disk, etc.) you were already listening on.

The SFD_NONBLOCK flag is important if you want a loop that can check signals and other file descriptors over and over again (i.e., non-blocking mode is equally important on your other file descriptors).

I have such an implementation that works with (1) timers, (2) sockets, (3) pipes, (4) Unix signals, and (5) regular files. Really, any file descriptor, plus timers.

https://github.com/m2osw/snapcpp/blob/master/snapwebsites/libsnapwebsites/src/snapwebsites/snap_communicator.cpp

https://github.com/m2osw/snapcpp/blob/master/snapwebsites/libsnapwebsites/src/snapwebsites/snap_communicator.h

You may also be interested in libraries such as libevent.


