Benchmarks In Interpreted Mode

tech

java benchmarking jvm internals interpreter jit compiler optimizations jmh

Work in progress. Do tell me if you notice anything amiss, but do not be surprised.

A while ago, there was a discussion on Stack Overflow where some guy named Evil Menace jokingly offered to disable JIT compilation in order to get more consistent and predictable performance when running benchmarks.

Apparently, it might not be clear to everyone why exactly this is a bad idea. Seeing as the Stack Overflow guy failed to come up with an example, I have decided to produce one on my own, even if that was his plan all along.

Problem Statement

The original argument went thus:

I am not interested in how fast my code actually is. It doesn’t matter for me if it is 10 ms, 20 ms or 100 ms fast. I just want to compare two different algorithms and know which one is faster compared to the other.

My point is that such a comparison may well turn out to be meaningless, because the JIT compiler might easily turn the tables. So, we need to come up with two algorithms A and B such that when benchmarked with -Xint, A is faster than B, but when benchmarked normally, the performance evens out, or even reverses.

To achieve this, we need the JIT to either greatly optimize B but not A, or severely hurt the performance of A but not B. The latter is not something that should happen (although it might), so we will go with the former.
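For reference, running a JMH suite in interpreted mode is a matter of passing -Xint; JMH's forked measurement JVMs inherit the parent's input arguments, so the flag reaches the benchmark code (the jar path is an assumption, the flag itself is standard HotSpot):

```shell
# Disable the JIT compiler for the whole run; forked JVMs inherit -Xint.
# (target/benchmarks.jar is the conventional JMH uber-jar name, assumed here.)
java -Xint -jar target/benchmarks.jar
```

Alternatively, JMH's @Fork annotation accepts jvmArgsAppend, which lets you confine -Xint to a single benchmark class.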

Such square root. Very perform. So wow.

Let us take something very simple, say, a square root, and calculate it using Newton’s Method:

public double measureA() {
    // value is a field of the benchmark state; iterate Newton's method
    // until two successive estimates differ by less than 1e-15
    double previous;
    double current = value / 2;

    do {
        previous = current;
        current = (previous + (value / previous)) / 2;
    } while ((previous - current) > 1e-15);

    return current;
}

And using the standard library:

public double measureB() {
    for (int i = 0; i < 1_000_000; i++) ;
    return Math.sqrt(value);
}

(The full sources are available on GitHub.)
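As a quick sanity check outside the harness, the two approaches should agree on the answer. A standalone sketch (the class name and tolerance are my own choices):

```java
public class SqrtCheck {
    // Newton's method, same iteration as measureA: keep refining until
    // two successive estimates differ by less than 1e-15.
    static double newtonSqrt(double value) {
        double previous;
        double current = value / 2;
        do {
            previous = current;
            current = (previous + (value / previous)) / 2;
        } while ((previous - current) > 1e-15);
        return current;
    }

    public static void main(String[] args) {
        double value = 42.0;
        // Compare the hand-rolled result against the standard library.
        System.out.println(Math.abs(newtonSqrt(value) - Math.sqrt(value)) < 1e-9);
        // prints "true"
    }
}
```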

When we run the benchmarks in interpreted mode, we see that our hand-written solution is far better than the standard method:

Benchmark    Mode  Samples        Score       Error  Units
measureA    thrpt       15  2155756.679 ± 56959.916  ops/s
measureB    thrpt       15      135.790 ±    99.860  ops/s

Well… Of course it is, because it involves a JNI call, and everyone knows JNI calls are expensive as hell. Duh. Those stupid engineers, they cannot even get such a basic thing as a square root right… My job here is done!

But just for the sake of the argument, let me run the same thing with the JIT compiler enabled:

Benchmark    Mode  Samples          Score          Error  Units
measureA    thrpt       15    7406433.538 ±   130914.320  ops/s
measureB    thrpt       15  174676301.127 ± 18372949.053  ops/s

Oh, wait. What happened? Now the performance of B has become far superior to that of A… Weird, isn’t it?

Alright, enough fooling around. I have, of course, cheated by adding an empty loop of 1,000,000 iterations at the beginning of B, which the JIT has successfully eliminated as dead code. The interpreter is obviously unable to perform such a feat.
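The usual JMH remedy is to make every computation observable, either by returning its result or by feeding it to Blackhole.consume. A minimal sketch of the difference, in plain Java outside the harness (method names are mine):

```java
public class DeadCodeDemo {
    // The loop has no observable effect, so a compiler is free to
    // remove it entirely -- this is the cheat from measureB.
    static double withDeadLoop(double value) {
        for (int i = 0; i < 1_000_000; i++) ;
        return Math.sqrt(value);
    }

    // The loop's result flows into the return value, so it cannot be
    // discarded as dead code: the work stays in the benchmark.
    static double withLiveLoop(double value) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i; // sum of 0..999999 is 499999500000
        }
        return Math.sqrt(value) + (double) sum;
    }
}
```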

A Plausible Example

Nobody really puts dead code that dominates the execution time into their algorithms. So what we have above is clearly not a satisfying example, and we need to come up with something more realistic.

Off the top of my head, here are some optimizations that wouldn’t be considered cheating:

  • Inlining
  • Loop Unrolling
  • Bounds Checking Elimination
  • Escape Analysis

CC0 Freely use anything from this website in any way you can imagine