Why Parallel Execution on Java Compile Take Linear Growth in Time

why parallel execution on java compile take linear growth in time

The java compiler already handles dividing its work across available processors, even when only compiling a single file. Therefore running separate compiler instances in parallel yourself won't yield the performance gains you are expecting.

To demonstrate this, I generated a large (1 million lines, 10,000 methods) java program in a single file called Main1.java. Then made additional copies as Main2.java through Main8.java. Compile times are as follows:

Single file compile:

time javac Main1.java &    --> (real) 11.6 sec

Watching this single file compile in top revealed processor usage mostly in the 200-400% range (indicating multiple CPU usage, 100% per CPU), with occasional spikes in the 700% range (the max on this machine is 800% since there are 8 processors).

Next, two files simultaneously:

time javac Main1.java &    --> (real) 14.5 sec
time javac Main2.java &    --> (real) 14.8 sec

So it only took 14.8 seconds to compile two, when it took 11.6 seconds to compile one. That's definitely non-linear. It was clear by looking at top while these were running that again each java compiler was only taking advantage of at most four CPUs at once (with occasional spikes higher). Because of this, the two compilers ran across eight CPUs mostly in parallel with each other.

Next, four files simultaneously:

time javac Main1.java &    --> (real) 24.2 sec
time javac Main2.java &    --> (real) 24.6 sec
time javac Main3.java &    --> (real) 25.0 sec
time javac Main4.java &    --> (real) 25.0 sec

Okay, here we've hit the wall. We can no longer out-parallelize the compiler. Four files took 25 seconds when two took 14.8. There's a little optimization there but it's mostly a linear time increase.

Finally, eight simultaneously:

time javac Main1.java &    --> (real) 51.9 sec
time javac Main2.java &    --> (real) 52.3 sec
time javac Main3.java &    --> (real) 52.5 sec
time javac Main4.java &    --> (real) 53.0 sec
time javac Main5.java &    --> (real) 53.4 sec
time javac Main6.java &    --> (real) 53.5 sec
time javac Main7.java &    --> (real) 53.6 sec
time javac Main8.java &    --> (real) 54.6 sec

This was actually a little worse than linear, as eight took 54.6 seconds while four only took 25.0.

So I think the takeaway from all this is to have faith that the compiler will do a decent job trying to optimize the work you give it across the available CPU resources, and that trying to add additional parallelization by hand will have limited (if any) benefit.

Edit:

For reference, there are two entries I found in Oracle's bug database regarding enhancing javac to take advantage of multiple processors:

Bug ID:
JDK-6629150 -- The original complaint, this was eventually marked as a duplicate of:
Bug ID: JDK-6713663 -- Suggests the resolution, and based on the "Resolved Date" it appears that multi-processor support in javac was added on 2008-06-12.

Is it possible to compile a large Java module in parallel?

Yes it is possible to build Java code in parallel.

The Java compiler (javac) itself doesn't do this, but both Maven and Ant (and some versions of Make) can run multiple javac instances in parallel.

Furthermore, the Eclipse Java compiler is multi-threaded and you can tell Maven to use it instead of javac; see https://stackoverflow.com/a/3727542/139985

I note that your example involves compiling a single class with a huge number of methods. Parallel compiler instances won't help with that. The Eclipse compiler might help depending on how it is implemented.

However, I put it to you that that is an unrealistic example. People don't write code like that in real life¹, and code generators can (and should) be written to not emit source code like that.

^{1 - Their co-workers will rebel ...}

Why is wall time(elapsed time ) less than cpu time for parallel programs?


wall time:  <-------------->   16
    CPU 1:  ## #############
    CPU 2:      ##########
 CPU time:                     25

How does Go compile so quickly?

Dependency analysis.

The Go FAQ used to contain the following sentence:

Go provides a model for software
construction that makes dependency
analysis easy and avoids much of the
overhead of C-style include files and
libraries.

While the phrase is not in the FAQ anymore, this topic is elaborated upon in the talk Go at Google, which compares the dependency analysis approach of C/C++ and Go.

That is the main reason for fast compilation. And this is by design.

Does dividing an array into halfs improve performance or processing time?

It is unlikely to make the code faster.

If you add up the number of increments, tests and array fetches in your version and compare them with equivalent counts in the simpler version of the code, I think that you will get the same numbers ... or close enough that it makes very little difference.
Your more complicated version of the code is likely to be harder for a JIT compiler to optimize. For example it is less likely for the JIT compiler to decide that it could unroll the loop.
It is possible that the different memory access pattern could lead to more cache misses, more pipeline stalls and therefore reduced performance.

Note that analysis of the above would need careful and detailed examination of the native code generated by the JIT compiler. Too much work.

^{I disagree with @VLAZ's comment that this is O(n/2) vs O(n). Firstly, they are the same thing. Secondly, it depends on what you count as a unit of computation. VLAZ seems to be counting loop iterations, but he is missing the fact that the loop body is doing more than twice the work as in the simple version.}

Next, this smells of premature optimization. Standard advice is to leave the optimization to the compiler. First get the code function complete and working. Then measure the performance. Only hand optimize your code if the measurement shows that you do have a performance problem. Then use a profiler to figure out the hotspots / performance bottlenecks. Then optimize the bottlenecks, and don't bother optimizing code that has minimal effect on performance. (Do the math!)

Finally, the only way to know for sure is to write a micro-benchmark to compare the performance of the two versions. If you are going to do that:

Don't put print statements in the loop. (The time taken to print N numbers is many orders of magnitude larger than the looping.)
Use an existing benchmarking framework like JMH.
Read: How do I write a correct micro-benchmark in Java? ... to avoid the embarrassing mistakes that people typically make when they doing this kind of thing for the first time.

Is Java really slow?

Modern Java is one of the fastest languages, even though it is still a memory hog. Java had a reputation for being slow because it used to take a long time for the VM to start up.

If you still think Java is slow, see the benchmarks game results. Tightly optimized code written in a ahead-of-time compiled language (C, Fortran, etc.) can beat it; however, Java can be more than 10x as fast as PHP, Ruby, Python, etc. There are specific areas where it can beat common compiled languages (if they use standard libraries).

There is no excuse for "slow" Java applications now. Developers and legacy code/libraries are to blame, far more than the language. Also, blame anything 'enterprise.'

In fairness to the "Java is slow" crowd, here are areas where it is still slow (updated for 2013):

Libraries are often written for "correctness" and readability, not performance. In my opinion, this is the main reason Java still has a bad reputation, especially server-side. This makes the String problems exponentially worse. Some simple mistakes are common: objects are often used in place of primitives, reducing performance and increasing memory use. Many Java libraries (including the standard ones) will create Strings frequently, rather than reusing mutable or simpler formats (char[] or StringBuffer). This is slow and creates tons of garbage to collect later. To fix this, I suggest developers use primitive collections and especially Javalution's libraries, where possible.
String operations are a bit slow. Java uses immutable, UTF-16-encoded string objects. This means you need more memory, more memory access, and some operations are more complex than with ASCII (C, C++). At the time, it was the right decision for portability, but it carries a small performance cost. UTF-8 looks like a better choice now.
Array access is a bit slower compared to C, due to bounds checks. The penalty used to be big, but is now small (Java 7 optimizes away a lot of redundant bounds checks).
Lack of arbitrary memory access can make some I/O and bit-level processing slow (compression/decompression for example). This is a safety feature of most high-level languages now.
Java uses a LOT more memory than C, and if your application is memory bound or memory bandwidth bound (caching, etc.) this makes it slower. The flipside is that allocation/deallocation is blazing fast (highly optimized). This is a feature of most high-level languages now, and due to objects and use of GC rather than explicit memory allocation. Plus bad library decisions.
Streams-based I/O is slow due to the (IMO, poor choice) to require synchronization on each stream access. NIO fixed this, but it is a pain to use. One can work around this by doing read/write to an array, instead of an element at a time.
Java doesn't provide the same low-level functionality C does, so you can't use dirty inline assembler tricks to make some operations faster. This provides portability and is a feature of most high-level languages now.
It is common to see Java applications tied to very old JVM versions. Especially server-side. These old JVMs can be incredibly inefficient, compared to the latest versions.

In the end, Java was designed to provide security and portability at the expense of some performance, and for some really demanding operations it shows. Most of its reputation for slowness is no longer deserved.

However, there are several places where Java is faster than most other languages:

Memory allocation and de-allocation
are fast and cheap. I've seen cases
where it is 20% FASTER (or more!) to
allocate a new, multi-kB array than
to reuse a cached one.
Object instantiation and object-oriented features are blazing fast to use (faster than C++ in some cases), because they're designed in from the beginning. This is partially from good GC rather than explicit allocation (which is more friendly to lots of small object allocations). One can code C that beats this (by rolling custom memory management and doing malloc efficiently), but it is not easy.
Method calls are basically free and in some cases faster than large-method code. The HotSpot compiler uses execution information to optimize method calls and has very efficient inlining. By using the additional execution information, it can sometimes outperform ahead-of-time compilers and even (in rare cases) manual inlining. Compare to C/C++ where method calls come with a small performance penalty if compiler decides not to inline.
Synchronization and multi-threading are easy and efficient. Java was designed to be thread-aware from the beginning, and it shows. Modern computers usually feature multiple cores, and because threading is built into the language, you can very easily take advantage. Basically an extra 100% to 300% speed boost vs. standard, single-threaded C code. Yes, carefully written C threading and libraries can beat this, but that's a lot of extra work for the programmer.
Strings include length: some operations are faster. This beats using null-delimited strings (common in C). In Java 7, Oracle took out the String.subString() optimization, because people were using it stupidly and getting memory leaks.
Array copy is highly optimized. In the lastest versions, Java uses hand-tuned assembler for System.arraycopy. The result is that in arraycopy/memcopy-heavy operations, I've seen my code beat the equivalent in C by reasonable margins.
The JIT compiler is smart about using L1/L2 cache. Ahead-of-time compiled programs can't tweak their code in real-time to the specific CPU & system they're running on. JIT provides some very efficient loop transformations this way.

A couple of other historical facts contributed to the "Java is slow" reputation:

Before JIT compilation (Java 1.2/1.3), the language was only interpreted, not compiled, and thus very slow.
JIT compilation took time to become efficient (major improvements with each version)
Classloading has become a lot more efficient over the years. It used to be quite inefficient and slow during startup.
Swing and UI code did not use native graphics hardware very well.
Swing is just awful. I blame AWT and Swing for why Java never caught on for the desktop.
Heavy use of synchronization in library classes; unsynchronized versions are now available
Applets take forever to load, because of transmitting a full JAR over the network and loading the VM to boot.
Synchronization used to carry a heavy performance penalty (this has been optimized with each Java version). Reflection is still costly, though.

Why Parallel Execution on Java Compile Take Linear Growth in Time