Work in progress. Do tell me if you notice anything amiss, but do not be surprised.
A while ago, there was a discussion on Stack Overflow where some guy named Evil Menace jokingly offered to disable JIT compilation in order to get more consistent and predictable performance when running benchmarks.
Apparently, it might not be clear to everyone exactly why this is a bad idea. Seeing as the Stack Overflow guy failed to come up with an example of that, I have decided to do so on my own, even if that has been his plan all along.
Problem Statement
The original argument went thus:
I am not interested in how fast my code actually is. It doesn’t matter for me if it is 10 ms, 20 ms or 100 ms fast. I just want to compare two different algorithms and know which one is faster compared to the other.
My point is that such a comparison may well turn out to be meaningless, because the compiler might easily turn the tables. So, we need to come up with two algorithms A and B such that when benchmarked with -Xint, A is faster than B, but when benchmarked normally, the performance evens out, or even turns the other way around.
To achieve this, we need the JIT either to greatly optimize B but not A, or to severely hurt the performance of A but not B. The latter is not something that should happen (although it might), so we will go with the former.
Such square root. Very perform. So wow.
Let us take something very simple, say, a square root, and calculate it using Newton’s Method:
```java
public double measureA() {
    double previous;
    double current = value / 2;
    do {
        previous = current;
        current = (previous + (value / previous)) / 2;
    } while ((previous - current) > 1e-15);
    return current;
}
```
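For reference, here is a self-contained sketch of the same iteration (my own, not from the original post): the benchmark's field `value` becomes a parameter, and I use `Math.abs` in the termination check, since the one-sided `previous - current` test can exit after a single step for inputs below 4, where the initial guess `value / 2` undershoots the root.

```java
// Standalone sketch of the Newton iteration from the benchmark.
// Assumes a positive input; the symmetric Math.abs check is my addition.
public class NewtonSqrt {
    static double sqrt(double value) {
        double previous;
        double current = value / 2;
        do {
            previous = current;
            current = (previous + (value / previous)) / 2;
        } while (Math.abs(previous - current) > 1e-15);
        return current;
    }

    public static void main(String[] args) {
        System.out.println(NewtonSqrt.sqrt(9.0));  // converges to Math.sqrt(9.0)
        System.out.println(Math.sqrt(9.0));
    }
}
```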
And using the standard library:
```java
public double measureB() {
    for (int i = 0; i < 1_000_000; i++) ;
    return Math.sqrt(value);
}
```
(The full sources are available on GitHub.)
When we run the benchmarks in interpreted mode, we see that our hand-written solution is far better than the standard method:
```
Benchmark   Mode  Samples        Score       Error  Units
measureA   thrpt       15  2155756.679 ± 56959.916  ops/s
measureB   thrpt       15      135.790 ±    99.860  ops/s
```
Well… Of course it is, because it involves a JNI call, and everyone knows JNI calls are expensive as hell. Duh. Those stupid engineers, they cannot even get such a basic thing as a square root right… My job here is done!
But just for the sake of the argument, let me run the same thing with the JIT compiler enabled:
```
Benchmark   Mode  Samples          Score          Error  Units
measureA   thrpt       15    7406433.538 ±   130914.320  ops/s
measureB   thrpt       15  174676301.127 ± 18372949.053  ops/s
```
Oh, wait. What happened? Now the performance of B has become far superior to that of A… Weird, isn’t it?
Alright, enough fooling around. I have, of course, cheated by adding an empty loop of 1,000,000 iterations at the beginning of B, which the JIT successfully eliminated as dead code. The interpreter is obviously unable to perform such a feat.
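To see why the elimination is legal, note that the loop has no observable effect whatsoever: it writes nothing, reads nothing that matters, and its counter dies at the end. A minimal sketch (mine, not from the original benchmark) makes the point:

```java
// Sketch: the empty loop has no observable effect, so a JIT compiler may
// legally remove it; both methods compute exactly the same value.
public class DeadCode {
    static double withDeadLoop(double value) {
        for (int i = 0; i < 1_000_000; i++) ; // no side effects: eliminable
        return Math.sqrt(value);
    }

    static double withoutLoop(double value) {
        return Math.sqrt(value);
    }

    public static void main(String[] args) {
        // Identical results; only the interpreter pays for the loop.
        System.out.println(withDeadLoop(2.0) == withoutLoop(2.0));
    }
}
```

This is exactly the situation benchmark harnesses guard against, which is why JMH asks you to return or otherwise consume every computed value.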
A Plausible Example
Nobody really puts dead code that dominates the execution time into their algorithms. So what we have above is clearly not a satisfying example, and we need to come up with something more realistic.
Off the top of my head, here are some optimizations that wouldn’t be considered cheating:
- Inlining
- Loop Unrolling
- Bounds Checking Elimination
- Escape Analysis
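To make the last item concrete, here is a hedged illustration (my own, not from the original post) of the kind of code escape analysis helps: an object allocated inside a method that is never stored, returned, or passed anywhere. HotSpot can typically scalar-replace such an allocation, turning the fields into plain locals and skipping the heap entirely, while the interpreter allocates a real object on every call.

```java
// Illustration: `p` never escapes measure(), so escape analysis can replace
// the allocation with two local doubles. The result is unchanged either way.
public class EscapeDemo {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
        double norm() { return Math.sqrt(x * x + y * y); }
    }

    static double measure(double a, double b) {
        Point p = new Point(a, b); // never stored or returned: does not escape
        return p.norm();
    }

    public static void main(String[] args) {
        System.out.println(measure(3.0, 4.0));
    }
}
```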