Can the JVM Recover from an OutOfMemoryError Without a Restart?

Can the JVM recover from an OutOfMemoryError without a restart

It may work, but it is generally a bad idea. There is no guarantee that your application will succeed in recovering, or that it will know if it has not succeeded. For example:

  • There really may not be enough memory to do the requested tasks, even after taking recovery steps such as releasing a block of reserved memory (see the sketch after this list). In that situation, your application may get stuck in a loop where it repeatedly appears to recover and then runs out of memory again.

  • The OOME may be thrown on any thread. If an application thread or library is not designed to cope with it, this might leave some long-lived data structure in an incomplete or inconsistent state.

  • If threads die as a result of the OOME, the application may need to restart them as part of the OOME recovery. At the very least, this makes the application more complicated.

  • Suppose that a thread synchronizes with other threads using notify/wait or some higher-level mechanism. If that thread dies from an OOME, other threads may be left waiting for notifications that never come. Designing for this could make the application significantly more complicated.
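
For concreteness, here is a minimal sketch of the "release a block of reserved memory" recovery step mentioned in the first bullet. The class, field and method names are my own illustration, not from any library, and all of the caveats in the list still apply: even after freeing the buffer, there may not be enough memory left to finish the work or to repair inconsistent state.

    public class ReservedMemoryEscapeHatch {

        // Reserve ~16 MB up front so that it can be released if an OOME occurs.
        private static byte[] reserve = new byte[16 * 1024 * 1024];

        public static void main(String[] args) {
            try {
                doMemoryHungryWork();
            } catch (OutOfMemoryError oome) {
                // Release the reserved block so the recovery code below has
                // some headroom to run (logging, saving state, shutting down).
                reserve = null;
                System.err.println("Attempting to recover from: " + oome);
                // ... recovery steps would go here ...
            }
        }

        private static void doMemoryHungryWork() {
            // Placeholder: work that may exhaust the heap.
            java.util.List<long[]> hog = new java.util.ArrayList<>();
            while (true) {
                hog.add(new long[1_000_000]);
            }
        }
    }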

In summary, designing, implementing and testing an application to recover from OOMEs can be difficult, especially if the application (or the framework in which it runs, or any of the libraries it uses) is multi-threaded. It is a better idea to treat OOME as a fatal error.


EDIT - in response to this followup question:

In other words, if an OOME is thrown in an application server (JBoss/WebSphere/...) do I have to restart it?

No, you don't have to restart. But it is probably wise to, especially if you don't have a good, automated way of checking that the service is running correctly.

The JVM will recover just fine. But the application server and the application itself may or may not recover, depending on how well they are designed to cope with this situation. (My experience is that some app servers are not designed to cope with this, and that designing and implementing a complicated application to recover from OOMEs is hard, and testing it properly is even harder.)

EDIT 2

In response to this comment:

"other threads may be left waiting for notifies (etc) that never come" Really? Wouldn't the killed thread unwind its stacks, releasing resources as it goes, including held locks?

Yes really! Consider this:

Thread #1 runs this:

    synchronized (lock) {
        while (!someCondition) {
            lock.wait();
        }
    }
    // ...

Thread #2 runs this:

    synchronized (lock) {
        // do something
        lock.notify();
    }

If Thread #1 is waiting on the notify, and Thread #2 gets an OOME in the // do something section, then Thread #2 won't make the notify() call, and Thread #1 may get stuck forever waiting for a notification that won't ever occur. Sure, Thread #2 is guaranteed to release the mutex on the lock object ... but that is not sufficient!

If not, the code run by the thread is not exception safe, which is a more general problem.

"Exception safe" is not a term I've heard of (though I know what you mean). Java programs are not normally designed to be resilient to unexpected exceptions. Indeed, in a scenario like the above, it is likely to be somewhere between hard and impossible to make the application exception safe.

You'd need some mechanism whereby the failure of Thread #2 (due to the OOME) gets turned into an inter-thread communication failure notification to Thread #1. Erlang does this ... but not Java. The reason they can do this in Erlang is that Erlang processes communicate using strict CSP-like primitives; i.e. there is no sharing of data structures!

(Note that you could get the above problem for just about any unexpected exception ... not just Error exceptions. There are certain kinds of Java code where attempting to recover from an unexpected exception is likely to end badly.)
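
To make the point concrete, here is one hand-rolled way to turn a worker's failure into a notification that the waiting thread can see. This is my own sketch (the class, field and method names are illustrative), not something Java provides for you, and it only works if every piece of code that can fail cooperates:

    public class FailureAwareWait {

        private static final Object lock = new Object();
        private static boolean someCondition = false;
        private static boolean workerFailed = false;

        public static void main(String[] args) throws InterruptedException {
            Thread worker = new Thread(() -> {
                try {
                    doSomething();                      // may fail with an OOME
                    synchronized (lock) {
                        someCondition = true;
                        lock.notifyAll();
                    }
                } catch (Throwable t) {
                    // Turn the failure into a notification the waiter can see.
                    synchronized (lock) {
                        workerFailed = true;
                        lock.notifyAll();
                    }
                }
            });
            worker.start();

            synchronized (lock) {
                while (!someCondition && !workerFailed) {
                    lock.wait();   // without the workerFailed flag, this could wait forever
                }
            }
            System.out.println(workerFailed ? "Worker failed; waiter was notified."
                                            : "Worker succeeded.");
        }

        // Simulates the "do something" step failing with an OOME.
        private static void doSomething() {
            throw new OutOfMemoryError("simulated");
        }
    }

Running it prints "Worker failed; waiter was notified."; remove the workerFailed flag and the notifyAll in the catch block, and the main thread waits forever, which is exactly the problem described above.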

Does the JVM terminate itself after an OutOfMemoryError?

An OutOfMemoryError DOES NOT TERMINATE the JVM.

If it is uncaught, it terminates the THREAD in which the error was thrown. Other threads keep running just fine, unless of course they cause OutOfMemoryErrors too.

The JVM terminates only when all of its threads have terminated, or when all of the remaining threads are daemon threads.

It does not terminate the JVM because it does not have to. Terminating the JVM is a very extreme operation and it is not performed lightly.

It will not try to get any resources back, because there is nothing to retrieve. The reason an OOME is thrown is exactly that: the JVM cannot acquire the memory it needs because it is all in use. It has already done everything else it can.

One must remember that an OOME is not necessarily thrown in the thread that consumes the most memory. One thread can consume almost all of the memory and then yield processing to another thread that tries to allocate "just one byte". That allocation of course fails, and the thread that tried to allocate the one byte is the one that gets the OOME. That is one reason why recovering from an OOME is nearly impossible.
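
The following small program (my own illustration, not part of the quoted answer) demonstrates the behaviour: the worker thread dies with an uncaught OutOfMemoryError, while the main thread carries on and the JVM exits normally afterwards. Running it with a small heap, e.g. -Xmx64m, makes the error appear quickly.

    public class OomeDoesNotKillJvm {
        public static void main(String[] args) throws InterruptedException {
            Thread worker = new Thread(() -> {
                // Deliberately exhaust the heap; the resulting OOME is uncaught
                // and terminates only this thread.
                java.util.List<long[]> hog = new java.util.ArrayList<>();
                while (true) {
                    hog.add(new long[1_000_000]);
                }
            });
            worker.start();
            worker.join();

            // Still here: the worker's OOME did not terminate the JVM.
            System.out.println("Main thread is still alive after the worker's OOME.");
        }
    }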

Can a Java server survive after an OutOfMemoryError?

Yes, it is possible to survive an OOME, but you have likely lost a thread in the process. Once that thread dies, any memory it was holding on to becomes eligible for garbage collection, and the rest of the application continues to run.

I'd never advise relying on this behavior though because you can't guarantee which thread is going to die.

How to avoid experiments stopping after an OutOfMemoryError?

Is this a good idea?

It is a bad idea.

In my answer to a different question, I explain some of the reasons why catching OutOfMemoryError may not allow the application to recover properly. (It depends on the nature of the application and the real reason it ran out of memory.)



I'm thinking about catching the error, and forcing the garbage collector

There is no point "forcing" a garbage collection in your scenario. If one of your experiments fails with an OOME, you can be assured that the GC has just run ... and been unable to find enough free memory to continue. Now, between the OOME being thrown and your catching it, you would hope that some of the objects that were reachable via the experiment's stack frames have become unreachable. The JVM will deal with that ... by running the GC itself.


I think that a better way to solve your problem is to make your application restartable. Have it keep a record (in a file!) of the experiments that complete and those that fail. When an OOME occurs, record this in the file. Then you add a "restart with the next experiment" feature to your application, and write a light-weight wrapper script to run the Java application repeatedly until it completes.

By restarting in a new JVM, you avoid having to deal with the damage that OOMEs can cause; e.g. when you have multiple threads. And you also have a "bandaid" for OOMEs that are caused by memory leaks. Finally, you may well find that the experiments run faster in a clean / empty heap.
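
Here is one way the record-keeping side of that approach could look. The file name, experiment names, and helper methods are illustrative only; the essential ideas are to record progress durably and to let the process exit on an OOME, so that an external loop (a shell script, a service manager, etc.) can start a fresh JVM that skips everything already attempted.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.List;
    import java.util.Set;
    import java.util.stream.Collectors;

    public class RestartableExperiments {

        // Illustrative file name: one line per experiment already attempted.
        private static final Path LOG = Paths.get("experiments.log");

        public static void main(String[] args) throws IOException {
            Set<String> attempted = Files.exists(LOG)
                    ? Files.readAllLines(LOG).stream()
                            .map(line -> line.split(" ")[0])
                            .collect(Collectors.toSet())
                    : Set.of();

            for (String experiment : List.of("exp-1", "exp-2", "exp-3")) {
                if (attempted.contains(experiment)) {
                    continue;   // completed or failed in an earlier run
                }
                try {
                    runExperiment(experiment);   // the real work goes here
                    record(experiment + " OK");
                } catch (OutOfMemoryError oome) {
                    record(experiment + " OOME");
                    System.exit(1);              // let the wrapper start a fresh JVM
                }
            }
        }

        // Appends one line to the log (requires Java 11+ for Files.writeString).
        private static void record(String line) throws IOException {
            Files.writeString(LOG, line + System.lineSeparator(),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }

        private static void runExperiment(String name) {
            System.out.println("Running " + name);   // placeholder
        }
    }

The wrapper can then be as simple as a loop that reruns the program until it exits with status 0.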

How to deal with java.lang.OutOfMemoryError: Java heap space error?

Ultimately you always have a finite maximum heap to use, no matter what platform you are running on. On 32-bit Windows this is around 2 GB (not specifically heap, but the total amount of memory per process). It just happens that Java chooses to make the default smaller (presumably so that programmers can't create programs with runaway memory allocation without running into this problem and having to examine exactly what they are doing).

Given this, there are several approaches you could take, either to determine how much memory you need or to reduce the amount of memory you are using. One common mistake with garbage-collected languages such as Java or C# is to keep references to objects that you are no longer using, or to allocate many objects when you could reuse them instead. As long as objects have a reference to them they will continue to use heap space, because the garbage collector will not delete them.
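
A contrived example of that mistake (my own, not from the quoted answer) looks like this: a long-lived collection keeps references to objects the program has logically finished with, so the garbage collector can never reclaim them.

    import java.util.ArrayList;
    import java.util.List;

    public class RequestCache {

        // Long-lived, static collection: everything added here stays reachable
        // for the lifetime of the class, so the garbage collector can never
        // reclaim it.
        private static final List<byte[]> handled = new ArrayList<>();

        public static void handleRequest(int size) {
            byte[] payload = new byte[size];
            // ... process the payload ...

            // Mistake: keeping a reference "just in case". Over many requests
            // this grows without bound and eventually causes an OOME.
            handled.add(payload);

            // Fix: don't retain the payload at all, or use a bounded cache that
            // evicts old entries so they become unreachable again.
        }
    }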

In this case you can use a Java memory profiler to determine what methods in your program are allocating large numbers of objects, and then determine whether there is a way to make sure they are no longer referenced, or not to allocate them in the first place. One option which I have used in the past is "JMP" (http://www.khelekore.org/jmp/).

If you determine that you are allocating these objects for a reason and you need to keep references around (depending on what you are doing, this might be the case), you will just need to increase the maximum heap size when you start the program. However, once you have done the memory profiling and understood how your objects are being allocated, you should have a better idea of how much memory you need.
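
For example, the standard -Xmx option raises the maximum heap size and -Xms sets the initial size when launching the JVM (the class name below is just a placeholder):

    java -Xms512m -Xmx2g com.example.MyApp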

In general, if you can't guarantee that your program will run in some finite amount of memory (perhaps depending on input size), you will always run into this problem. Only after exhausting all of the above should you look into caching objects out to disk, etc. At that point you should have a very good reason to say "I need X GB of memory" for something, and you can't work around it by improving your algorithms or memory allocation patterns. Generally this will only be the case for algorithms operating on large datasets (like a database or some scientific analysis program), and then techniques like caching and memory-mapped IO become useful.


