Yo! I am preparing a new talk, to be first presented at the upcoming Joker Conference in Saint Petersburg, Russia. With it, I intend to make everyone capable of digging deep into HotSpot to get to the bottom of their issues.
This blog entry is a teaser which examines a sample problem encountered by Nikita of Plumbr.
What does GC cause _no_gc mean? and _last_ditch_collection too. #JVM /cc @shipilev ?
Last Ditch Collection
One of the most awesome things about HotSpot is that it is almost completely built from OpenJDK. So, we have virtually no restrictions in our investigation. Let’s fetch the latest version and do a grep:
$ grep -rn "_last_ditch_collection" . ... ./src/share/vm/gc_implementation/shared/vmGCOperations.cpp:286: heap->collect_as_vm_thread(GCCause::_last_ditch_collection); ...
There we are! Reading the enclosing method gives us an understanding of what’s happening:
// Don't clear the soft refs yet. heap->collect_as_vm_thread(GCCause::_metadata_GC_threshold); // After a GC try to allocate without expanding. Could fail // and expansion will be tried below.
// If still failing, allow the Metaspace to expand. // This should work unless there really is no more space // or a MaxMetaspaceSize has been specified on the command line. _result = _loader_data->metaspace_non_null()->expand_and_allocate(_size, _mdtype);
// If expansion failed, do a last-ditch collection and try allocating // again. A last-ditch collection will clear softrefs. heap->collect_as_vm_thread(GCCause::_last_ditch_collection);
When HotSpot finds itself unable to allocate memory for something that should be stored in metaspace (e.g., a class), it first runs a GC (GCCause::_metadata_GC_threshold
). If that doesn’t help, an attempt to expand metaspace is made. If that still doesn’t help, a full GC is triggered with reason GCCause::_last_ditch_collection
.
Soft references are cleaned up at this point. If there is still insufficient room, then an OOM is thrown.
Sounds good, but let us write some actual code that gets this behavior consistently. Leaking a lot of stuff to metaspace should, by this logic, result in at least one last ditch collection:
ClassPool pool = ClassPool.getDefault(); for(long l = 0; l < Long.MAX_VALUE; l++) { pool.makeClass("com.example.Kitty" + l).toClass(); }
Running this with default params will take quite a lot of time before shit hits the fan, but we can reduce the metaspace size to speed up the process:
$ java -cp build/libs/labs-8-all.jar -XX:+PrintGCDetails -XX:MaxMetaspaceSize=10m ru.gvsmirnov.perv.labs.gc.NoGcTrigger ... [Full GC (Last ditch collection) [...] 14470K->14425K(32768K), [Metaspace: 8971K->8971K(1056768K)], 0.0481030 secs] ... Exception in thread "main" javassist.CannotCompileException: by java.lang.OutOfMemoryError: Metaspace at javassist.ClassPool.toClass(ClassPool.java:1099) ...
Note that usually, more than one last ditch collection is observed. It is expected, since… Well, otherwise there wouldn’t be much point doing it in the first place.
Also, we were using a Java 9 bundle, but the logic of last ditch collection has not changed for quite a long while, so we’re safe in that regard.
Another fun thing to note is that we did not really need to leak to metaspace all by ourselves.
JVM does a good job of loading a lot of classes on its own, so if we just run
java -XX:+PrintGCDetails -XX:MaxMetaspaceSize=2m X
, we will get some last ditch collections
and an OOM before JVM even notices that class X
does not exist.
No GC
Now, “No GC” is a bit more tricky. Obviously, the HotSpot developers would not in their
sane minds intentionally make this a valid GC cause. Actually, the only two reasonable places
this could appear is if we run
jstat
before any GC has occurred:
$ jstat -gccause -t `jps | grep NoGc | cut -d' ' -f1` 100 Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC 0.7 0.00 0.00 4.00 0.00 40.34 0 0.000 0 0.000 0.000 No GC No GC 0.8 0.00 0.00 6.00 0.00 45.82 0 0.000 0 0.000 0.000 No GC No GC 0.9 0.00 0.00 6.00 0.00 51.44 0 0.000 0 0.000 0.000 No GC No GC
Or when it samples at a time when there is no garbage collection activity:
Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC 2.8 0.00 0.00 0.00 0.78 31.00 5 0.080 5 1.051 1.131 Allocation Failure No GC
So, if Nikita saw this in a different situation, this is very likely a bug. Seeing as we did not observe this issue while investigating the last ditch collection, it is possible that the bug has been fixed already. Let’s be optimistic and look for all the relevant changesets:
$ hg grep --all '::_no_gc' ... src/share/vm/gc_implementation/shared/vmGCOperations.cpp:2097:+: assert(((_gc_cause != GCCause::_no_gc) && ...
Apparently, the bug was fixed in changeset 2097 on
February the 3rd in 2011. According to make/hotspot_version
, the latest HotSpot build
that is affected is 21.0-b01.
I had a suitable Java 6 installation at hand:
$ java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)
Which immediately produced the following result:
Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC 2.7 0.00 0.00 0.00 0.78 100.00 5 0.080 5 1.051 1.131 unknown GCCause Allocation Failure 2.8 0.00 0.00 0.00 0.78 100.00 5 0.080 5 1.051 1.131 unknown GCCause Allocation Failure
This is close, but not quite what we were looking for. However, upon a closer examination of the changeset,
it will become apparent that setting -XX:+UseG1GC
should give us the result we need:
Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC 1.6 ? ? ? ? 96.78 ? ? ? ? ? G1 Evacuation Pause No GC 1.7 ? ? ? ? 100.00 ? ? ? ? ? No GC Permanent Generation Full 1.8 ? ? ? ? 100.00 ? ? ? ? ? No GC Permanent Generation Full 1.9 ? ? ? ? 100.00 ? ? ? ? ? No GC Permanent Generation Full 2.0 ? ? ? ? 100.00 ? ? ? ? ? No GC Permanent Generation Full
Let the question marks and what gave G1 away remain an exercise for the reader ;)
Summary
We have just seen that, given just a short tweet to go on and using just the basic command line tools, we can identify a HotSpot bug hidden in the depths of time, put and upper bound on the JVM version that the issue has arisen at, and even learn what GC algorithm was in use at the time.
The bottom line is that confusion exists in our minds, not in reality.
Call For Action
This is all nice and cool, but I need more real life examples of puzzling HotSpot behavior. You are welcome to send them over to me by email, twitter or in the comments below. It is quite probable that I will investigate your issue, free of charge!
P.S. Thanks to Nikita for giving this sample and proof-reading the post! (Any mistakes are my own, though)