Why Have One Jvm Per Application

Why have one JVM per application?

(I assume you are talking about applications launched via a public static void main(String[]) method ...)

In theory you can run multiple applications in a JVM. In practice, they can interfere with each other in various ways. For example:

The JVM has one set of System.in/out/err, one default encoding, one default locale, one set of system properties, and so on. If one application changes these, it affects all applications.
Any application that calls System.exit() will effectively kill all applications.
If one application goes wild and consumes too much CPU or memory, it will affect the other applications too.
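To make the first point concrete, here is a minimal sketch of two "applications" sharing one JVM. The class name, the method names, and the property name demo.mode are made up for illustration; only System.setProperty, System.getProperty and Locale.setDefault are real APIs.

```java
import java.util.Locale;

// Two hypothetical "applications" that happen to live in the same JVM.
final class SharedJvmStateDemo {

    // "Application" A changes JVM-wide settings for its own purposes.
    static void appA() {
        System.setProperty("demo.mode", "set-by-A"); // hypothetical property name
        Locale.setDefault(Locale.FRANCE);
    }

    // "Application" B never touched these settings, yet it sees A's changes.
    static String appB() {
        return System.getProperty("demo.mode") + " / " + Locale.getDefault();
    }

    public static void main(String[] args) {
        appA();
        System.out.println("B sees: " + appB()); // prints: B sees: set-by-A / fr_FR
    }
}
```

Run in separate JVMs, B would see neither the property nor the changed locale.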

In short, there are lots of problems. People have tried hard to make this work, but they have never really succeeded. One example is the Echidna library, though that project has been quiet for ~10 years. JNode is another example, though they (actually we) "cheated" by hacking core Java classes (like java.lang.System) so that each application got what appeared to be independent versions of System.in/out/err, the System properties and so on¹.

¹ This ("proclets") was supposed to be an interim hack, pending a proper solution using true "isolates". But isolates support stalled, primarily because the JNode architecture used a single address space with no obvious way to separate "system" and "user" stuff. So while we could create APIs that matched the isolate APIs, key isolate functionality (like cleanly killing an isolate) was virtually impossible to implement. Or at least, that was/is my view.

Why multiple jvm instances will be created?

Can anyone please tell me why multiple instances will be created if we run multiple java programs in a single machine?

When you run any program more than once it creates multiple instances of that program. This is not unique to Java.

Why can't all programs share a single jvm

Your programs could share a single JVM; however, this means you start only one JVM and tell it which applications to run. Whether you want to do this or not is another question. Sharing a JVM between applications adds complexity, and in fact the trend is to turn single monoliths into multiple microservices.
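As a rough sketch of what "start one JVM and tell it which applications to run" could look like, the launcher below invokes each application's main method on its own thread via reflection. MultiAppLauncher, AppOne and AppTwo are hypothetical names, not part of any library, and a real launcher would also need to deal with class loaders, System.exit() and shared global state.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for two independent applications.
final class AppOne {
    public static void main(String[] args) {
        MultiAppLauncher.started.incrementAndGet();
        System.out.println("AppOne running");
    }
}

final class AppTwo {
    public static void main(String[] args) {
        MultiAppLauncher.started.incrementAndGet();
        System.out.println("AppTwo running");
    }
}

final class MultiAppLauncher {
    // Counts how many hypothetical applications have started in this JVM.
    static final AtomicInteger started = new AtomicInteger();

    // Looks up the conventional public static void main(String[]) entry point.
    static Method findMain(String className) {
        try {
            return Class.forName(className).getMethod("main", String[].class);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot launch " + className, e);
        }
    }

    // Starts every named application on its own thread, then waits for all of them.
    static void runAll(String... classNames) {
        List<Thread> threads = new ArrayList<>();
        for (String name : classNames) {
            Method main = findMain(name);
            Thread t = new Thread(() -> {
                try {
                    main.invoke(null, (Object) new String[0]);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }, name);
            t.start();
            threads.add(t);
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        runAll(AppOne.class.getName(), AppTwo.class.getName());
        System.out.println("Applications started in this JVM: " + started.get());
    }
}
```

Note that both applications share the same heap, statics and System streams here, which is exactly where the added complexity comes from.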

Some reasons to split rather than combine JVMs.

The pause time of standard garbage collectors increases with the size of the heap used. Splitting the applications means more, but smaller, pauses.
Tuning garbage collection for a simple workload is easier. With mixed workloads, one might want a large young space and another a small young space. The more mixed the workload, the harder the JVM is to tune.
If you have a resource leak in your application, the only way to fix it might be to restart the JVM. If each application is in its own JVM, you need not stop the other applications.
You might have code that can bring down the JVM (fill up the memory, crash the JVM, or leave it in an unusable state). Multiple JVMs limit the impact of such a failure.
It forces a clear separation between key components, which improves maintainability. This can be done in a monolith but needs discipline.

and will java libraries load separately to all jvm instances?

The JVM memory maps the JARs into memory and it is the OS, not the JVM, which then decides whether or how JARs are shared between processes.

However, each JVM that needs a class loads it and keeps its own copy of, for example, the static fields.

i mean to say how does jvm loads common libraries in memory because if two jvm loads same classes in memory then there might be conflict while accessing that class object

Each process has its own memory space. If two JVMs load the same class, the two copies do not interact with one another and there is no opportunity for conflict.

Why Only one SparkContext is allowed per JVM?

The answer is simple - it has not been designed to work with multiple contexts. Quoting Reynold Xin:

I don't think we currently support multiple SparkContext objects in the same JVM process. There are numerous assumptions in the code base that uses a shared cache or thread local variables or some global identifiers which prevent us from using multiple SparkContext's.

In a broader sense, a single application (with main) in a single JVM is the standard approach in the Java world (Is there one JVM per Java application?, Why have one JVM per application?). Application servers choose a different approach, but that is the exception, not the rule.

From a practical point of view, handling a single data-intensive application is painful enough (tuning GC, dealing with leaking resources, communication overhead). Multiple Spark applications running in a single JVM would be impossible to tune and manage in the long run.

Finally, there would not be much use in having multiple contexts, as each distributed data structure is tightly connected to its context.

Multiple JVMs vs single app server

Check out 'multi-tenant' JVMs.

IBM's JRE has it already: http://www.ibm.com/developerworks/library/j-multitenant-java/

Waratek has implemented it on top of the Oracle JRE, and they created ElastiCat, a Tomcat fork that isolates different applications in the same container: http://www.elasticat.com/faq/

Multi-tenancy is rumoured to appear in the official Oracle Java 9 JVM, too.

=======================================================

Update: Java 9 is out, but no word from Oracle about multi-tenancy.
It seems they prefer having multiple JVMs these days, even multiple containers (e.g. Docker).

How is JVM instance created per application?

When you start a program like java, the operating system creates a "process". A process is the representation of a live, running program. The process concept is what allows you to run several copies of a program at the same time. Each process has its own private memory space and system resources like open files or network connections. Each process can load a different set of dynamically linked libraries. With Java, much of the JVM is implemented in shared libraries, which the launcher program "java" loads in at run time.
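You can see the "one JVM, one OS process" relationship from inside a program. The sketch below assumes Java 9 or later (for ProcessHandle); the class name JvmProcessInfo is just for illustration.

```java
// Prints details of the OS process that hosts this JVM (requires Java 9+).
final class JvmProcessInfo {

    // The operating system's process ID for this JVM instance.
    static long pid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        ProcessHandle self = ProcessHandle.current();
        System.out.println("PID: " + pid());
        // The launcher executable (typically the "java" binary), if the OS reports it.
        System.out.println("Command: " + self.info().command().orElse("(unknown)"));
    }
}
```

Starting this program twice at the same time should give two different PIDs, one per JVM.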

The details are OS dependent and become complicated fast.

One of the things that happens when the process is started is that the executable file is mapped into memory. The CPU cannot execute instructions that are on disk or other external storage, so the program "text" has to be copied from disk into main memory first. Mapping the file into memory simplifies this and makes it more efficient: if the CPU needs to access a memory location that's not actually in RAM, the memory management unit (MMU) issues a "page fault". The page fault causes the data to be loaded into RAM. This is more efficient than simply copying the whole program text into RAM (what if not all of the text is needed all the time?) and also simplifies the overall system (the virtual memory system is already needed for other OS features).

When multiple java programs run on the same machine

1) If I have a web service written in java it will need a JVM instance to run. So can JVM be made a daemon process?

Yes it can. How it is done depends on the O/S and on the web server container itself.

2) If yes when we run any other java application it will use this instance of JVM or create a new one?

No. Each Java application uses an independent JVM.

Each JVM is a separate process, and that means there is no sharing of stacks, heaps, etcetera. (Generally, the only things that might be shared are the read only segments that hold the code of the core JVM and native libraries ... in the same way that normal processes might share code segments.)

3) Main memory available in any machine is constant. When we start n java processes simultaneously without providing any initial heap size how is the heap size distributed among processes?

The mechanism for deciding how big to make the heap if you don't specify a size depends on the JVM / platform / version you are using, and whether you are using the "client" or "server" model (for Hotspot JVMs). The heuristic doesn't take account of the number or size of other JVMs.

Reference: https://stackoverflow.com/a/4667635/139985

In practice, you would probably be better off specifying the heap size directly.
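If you want to see what a given JVM actually picked, a small sketch like this prints the heap limits using the standard Runtime API. The class name HeapDefaults is made up; the numbers will vary by JVM, platform and version.

```java
// Reports the heap sizes this particular JVM instance is running with.
final class HeapDefaults {

    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Upper bound the heap may grow to (set by -Xmx or by the JVM's own heuristic).
    static long maxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Max heap (MiB):       " + maxHeapMiB());
        System.out.println("Committed heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("Free in heap (MiB):   " + toMiB(rt.freeMemory()));
    }
}
```

Running it once without options and once with, say, java -Xms256m -Xmx512m HeapDefaults shows the difference between the heuristic and an explicit setting.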

4) Is there any process that manages n number of JVM instances or is it managed by OS itself?

Neither. The number of JVM instances is determined by the actions of various things that can start processes; e.g. daemon scripts, command scripts, users typing commands at the command line, etcetera. Ultimately, the OS may refuse to start any more processes if it runs out of resources, but JVMs are not treated any differently from other processes.

5) When stop-the-world happens during a GC are other JVM instances (different threads I assume) affected?

No. The JVMs are independent processes. They don't share any mutable state. Garbage collection operates on each JVM independently.
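One way to convince yourself of this is to look at the GC counters, which are kept per JVM. The sketch below uses the standard java.lang.management API; the class name LocalGcStats is illustrative. It only ever reports collections that happened in the JVM it runs in.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reads the garbage collection counters of this JVM instance only.
final class LocalGcStats {

    // Sums the collection counts of all collectors in this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
        System.out.println("Total collections in this JVM: " + totalCollections());
    }
}
```

Two JVMs running this side by side report their own, unrelated numbers.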

Single / multiple JVM on Single Machine with single / multiple core

You have a lot of different variables in play there and there isn't going to be a simple answer. I'll add some details below, but the tl;dr version is going to be "run some tests for your scenario and see what works the best".

Whether or not you should have one thread per cpu depends on your workload. As you've already apparently found, if your thread is cpu intensive (meaning, most of the time it is actively using the cpu), then one thread will come close to fully utilizing one cpu. In many real-world situations, though, it's likely that your thread may have to do other things (like wait for I/O), in which case it will not fully utilize a cpu (meaning you could run more than one thread per cpu).

Also, in your two scenarios, you need to account for all the JVM overhead. Each JVM will have many of its own threads (most notably, the GC threads). These will add additional overhead to your CPU usage. If your code makes heavy use of the garbage collector (creating/discarding a large amount of garbage while working), it may be beneficial to have separate JVMs per thread, but you'll need to account for the additional GC thread CPU usage. If your code does not make a lot of garbage, then using separate JVMs (and many extra GC threads) may just be wasting resources.

Given all these variables, as well as the actual workload profile of your threads, you are unlikely to find the right answer with theory alone. Testing a variety of scenarios with real world data is the only way you will find the right mix for your application (it might end up being something in the middle like 4 jvms with 5 threads each).
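As a starting point for that kind of testing, here is a rough sketch that times the same batch of CPU-bound tasks with different thread counts inside one JVM. The class name, the busyWork task and the iteration counts are placeholders; substitute your real workload, and treat the timings as indicative only (no JIT warm-up or repeated runs are done here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ThreadCountTrial {

    // Placeholder CPU-bound task; replace with your actual work.
    static long busyWork(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc = acc * 31 + i;
        }
        return acc;
    }

    // Runs `tasks` copies of busyWork on a pool of `threads` threads; returns elapsed ms.
    static long elapsedMillis(int threads, int tasks, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                jobs.add(() -> busyWork(iterations));
            }
            long start = System.nanoTime();
            for (Future<Long> result : pool.invokeAll(jobs)) {
                result.get(); // wait for completion and surface any task failure
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        for (int threads : new int[] {1, cores, 2 * cores}) {
            long ms = elapsedMillis(threads, 4 * cores, 50_000_000);
            System.out.println(threads + " thread(s): " + ms + " ms");
        }
    }
}
```

To compare against the multi-JVM setup, you would start several copies of the same program with fewer threads each and compare total throughput.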
