How do I write a correct micro-benchmark in Java?
Tips about writing micro benchmarks from the creators of Java HotSpot:
Rule 0: Read a reputable paper on JVMs and micro-benchmarking. A good one is Brian Goetz, 2005. Do not expect too much from micro-benchmarks; they measure only a limited range of JVM performance characteristics.
Rule 1: Always include a warmup phase which runs your test kernel all the way through, enough to trigger all initializations and compilations before the timing phase(s). (Fewer iterations are OK in the warmup phase. The rule of thumb is several tens of thousands of inner-loop iterations.)
Rule 2: Always run with -XX:+PrintCompilation, -verbose:gc, etc., so you can verify that the compiler and other parts of the JVM are not doing unexpected work during your timing phase.
Rule 2.1: Print messages at the beginning and end of timing and warmup phases, so you can verify that there is no output from Rule 2 during the timing phase.
Rule 3: Be aware of the difference between -client and -server, and between OSR and regular compilations. The -XX:+PrintCompilation flag reports OSR compilations with an at-sign to denote the non-initial entry point, for example: Trouble$1::run @ 2 (41 bytes). Prefer server to client, and regular to OSR, if you are after best performance.
Rule 4: Be aware of initialization effects. Do not print for the first time during your timing phase, since printing loads and initializes classes. Do not load new classes outside of the warmup phase (or final reporting phase), unless you are testing class loading specifically (and in that case load only the test classes). Rule 2 is your first line of defense against such effects.
Rule 5: Be aware of deoptimization and recompilation effects. Do not take any code path for the first time in the timing phase, because the compiler may junk and recompile the code, based on an earlier optimistic assumption that the path was not going to be used at all. Rule 2 is your first line of defense against such effects.
Rule 6: Use appropriate tools to read the compiler's mind, and expect to be surprised by the code it produces. Inspect the code yourself before forming theories about what makes something faster or slower.
Rule 7: Reduce noise in your measurements. Run your benchmark on a quiet machine, and run it several times, discarding outliers. Use -Xbatch to serialize the compiler with the application, and consider setting -XX:CICompilerCount=1 to prevent the compiler from running in parallel with itself. Do your best to reduce GC overhead: set -Xmx (large enough) equal to -Xms, and use -XX:+UseEpsilonGC if it is available.
Rule 8: Use a library for your benchmark, as it is probably more efficient and was already debugged for this sole purpose, such as JMH, Caliper, or Bill and Paul's Excellent UCSD Benchmarks for Java.
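These rules map naturally onto JMH's annotations. A minimal sketch (class name, method name, and the particular iteration counts are illustrative, not prescriptive):

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1)       // Rule 1: warmup triggers initializations and compilations
@Measurement(iterations = 5, time = 1)
@Fork(value = 1, jvmArgsAppend = {"-XX:+PrintCompilation"}) // Rule 2: watch for compiler activity
public class SumBenchmark {
    int[] data;

    @Setup
    public void setup() {               // Rule 4: initialize outside the timing phase
        data = new int[1024];
        for (int i = 0; i < data.length; i++) data[i] = i;
    }

    @Benchmark
    public long sum() {
        long s = 0;
        for (int v : data) s += v;
        return s;                       // returning the result defeats dead-code elimination
    }
}
```

Running it through JMH's Runner or the generated benchmarks.jar then gives warmup and measurement iterations separated as the rules require.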
Max and min time benchmark in Java with JMH
If you run your @Benchmark with @BenchmarkMode(Mode.AverageTime), as a bonus you will get an output like:
Result "StopAllBenchmarks.measureRight":
2.387 ±(99.9%) 0.307 ns/op [Average]
(min, avg, max) = (2.339, 2.387, 2.526), stdev = 0.080
CI (99.9%): [2.079, 2.694] (assumes normal distribution)
Notice the min, avg, max values on the second line.
JMH - How to correctly benchmark Thread Pools?
I solved this issue myself with the help of the other answerers. In the last edit (and in all the other edits) the issue was in my Gradle configuration: I was running the benchmark on all of my system threads. I use a Gradle plugin to run JMH, and in my build script I had set threads = 4 before running the benchmarks, so you saw those strange results because JMH was benchmarking on all available threads while the thread pool was also doing work on all available threads. I removed this configuration, set the @State(Scope.Thread) and @Threads(1) annotations on the benchmark class, and edited the runInThreadPool() method to:
// The ThreadFactory that raises thread priority belongs where the pool is created,
// e.g. Executors.newFixedThreadPool(n, runnable -> {
//     Thread thread = new Thread(runnable);
//     thread.setPriority(Thread.MAX_PRIORITY);
//     return thread;
// });
// Passing it as the second argument of submit() would merely use it as the Future's
// result value, so it is not done here.
public static void runInThreadPool(int amountOfTasks, Blackhole bh, ExecutorService threadPool)
        throws InterruptedException, ExecutionException {
    Future<?>[] futures = new Future[amountOfTasks];
    for (int i = 0; i < amountOfTasks; i++) {
        futures[i] = threadPool.submit(PrioritySchedulerSamples::doWork);
    }
    for (Future<?> future : futures) {
        bh.consume(future.get());
    }
    threadPool.shutdownNow();
    threadPool.awaitTermination(10, TimeUnit.SECONDS);
}
So each thread in this thread pool runs at maximal priority.
And benchmark of all these changes:
Benchmark (amountOfTasks) Mode Cnt Score Error Units
PrioritySchedulerSamples.fixedThreadPool 2048 avgt 3 8021054,516 ± 2874987,327 ns/op
PrioritySchedulerSamples.noThreading 2048 avgt 3 17583295,617 ± 5499026,016 ns/op
These results seem to be correct. (Especially for my system.)
I also made a list of common problems in microbenchmarking thread pools and basically all of the concurrent java components:
- Make sure your microbenchmark executes in one thread: use the @Threads(1) and @State(Scope.Thread) annotations to make your microbenchmark execute in one thread. (Use, for example, the htop command to find out how many and which threads are consuming the most CPU.)
- Make sure you execute the task completely in your microbenchmark, and wait for all threads to complete it. (Maybe your microbenchmark doesn't wait for the tasks to complete?)
- Don't use Thread.sleep() to imitate real work; instead, JMH provides the Blackhole.consumeCPU(long tokens) method, which you can freely use as an imitation of some work.
- Make sure you know the component that you benchmark. (Obvious, but I didn't know Java thread pools very well before this post.)
- Make sure you know the compiler optimization effects described in the JMH samples; basically, know JMH very well.
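The third point in the list above can be sketched like this (the benchmark class and method names are made up for illustration):

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.infra.Blackhole;

public class FakeWorkBenchmark {
    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    public void simulatedWork() {
        // Burns a deterministic amount of CPU instead of sleeping;
        // unlike Thread.sleep(), this is not at the mercy of the OS scheduler
        // and does not park the thread.
        Blackhole.consumeCPU(1000);
    }
}
```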
Plots in JMH [Java Micro-Benchmarking Harness]
JMH does not support plotting. You can write the performance results out to a file (e.g. with -rf csv or -rf json) and use whatever plotting tool you are familiar with. Or you can extract the performance data from the RunResult instance you get from the Java API, and parse/render it with any embedded library.
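A sketch of the Java-API route, assuming a benchmark class named MyBenchmark and an output file name of my choosing (scores.csv):

```java
import java.io.PrintWriter;
import java.util.Collection;
import java.util.Locale;
import org.openjdk.jmh.results.RunResult;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class PlotData {
    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
                .include("MyBenchmark")   // illustrative benchmark name
                .build();
        Collection<RunResult> results = new Runner(opt).run();
        // Dump one row per benchmark, ready for any plotting tool that reads CSV
        try (PrintWriter out = new PrintWriter("scores.csv")) {
            out.println("benchmark,score,unit");
            for (RunResult r : results) {
                out.printf(Locale.ROOT, "%s,%f,%s%n",
                        r.getParams().getBenchmark(),
                        r.getPrimaryResult().getScore(),
                        r.getPrimaryResult().getScoreUnit());
            }
        }
    }
}
```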
If you use settings that observe single-call execution times (or batched calls) and you want to see a plot of each duration (something like the last plot here), you can combine these settings:
@Measurement(batchSize = 1000, iterations = 500)
@BenchmarkMode({Mode.SingleShotTime})
And a bit of scripting gets the desired data into CSV. With settings like this, the CSV produced by JMH contains only summary data.
mvn package && java -jar target/benchmarks.jar -foe true -rf csv | tee output.txt
N=5 # set here number of benchmarks plus 2
grep Iteration -A 3 output.txt | grep -v Warmup | sed 's/$/,/' | xargs -l$N | sed 's/,$//'
It will output something like:
Iteration 1: 93.915 ±(99.9%) 2066.879 s/op, readerA: 28.487 s/op, readerB: 28.525 s/op, writer: 224.735 s/op, --
Iteration 2: 100.483 ±(99.9%) 1265.993 s/op, readerA: 59.927 s/op, readerB: 60.912 s/op, writer: 180.610 s/op, --
Iteration 3: 76.458 ±(99.9%) 760.395 s/op, readerA: 52.513 s/op, readerB: 52.276 s/op, writer: 124.586 s/op, --
Iteration 4: 84.046 ±(99.9%) 1189.029 s/op, readerA: 46.112 s/op, readerB: 46.724 s/op, writer: 159.303 s/op, --
java micro benchmark to find average from list
Is this what you are looking for?
@Warmup(iterations = 1, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 1, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
public class MyBenchmark {

    ClassUnderBenchmark classUnderBenchmark = new ClassUnderBenchmark();

    @State(Scope.Benchmark)
    public static class MyTestState {
        int counter = 0;
        List<String> list = Arrays.asList("aaaaa", "bbbb", "ccc");
        String currentString;

        @Setup(Level.Invocation)
        public void init() {
            this.currentString = list.get(counter++);
            if (counter == 3) {
                counter = 0;
            }
        }
    }

    @Benchmark
    @Threads(1)
    @BenchmarkMode(Mode.SampleTime)
    public void test(MyBenchmark.MyTestState myTestState) {
        classUnderBenchmark.toUpper(myTestState.currentString);
    }

    public static class ClassUnderBenchmark {
        Random r = new Random();

        public String toUpper(String name) {
            try {
                Thread.sleep(r.nextInt(100));
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            return name.toUpperCase();
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(MyBenchmark.class.getSimpleName())
                .jvmArgs("-XX:+UseG1GC", "-XX:MaxGCPauseMillis=50")
                .build();
        new Runner(opt).run();
    }
}
Please see the javadoc (org.openjdk.jmh.annotations.Mode):
/**
* <p>Sample time: samples the time for each operation.</p>
*
* <p>Runs by continuously calling {@link Benchmark} methods,
* and randomly samples the time needed for the call. This mode automatically adjusts the sampling
* frequency, but may omit some pauses which missed the sampling measurement. This mode is time-based, and it will
* run until the iteration time expires.</p>
*/
SampleTime("sample", "Sampling time"),
This test will give you the output:
Result "test":
N = 91
mean = 0,056 ±(99.9%) 0,010 s/op
Histogram, s/op:
[0,000, 0,010) = 6
[0,010, 0,020) = 9
[0,020, 0,030) = 3
[0,030, 0,040) = 11
[0,040, 0,050) = 8
[0,050, 0,060) = 11
[0,060, 0,070) = 9
[0,070, 0,080) = 9
[0,080, 0,090) = 14
Percentiles, s/op:
p(0,0000) = 0,003 s/op
p(50,0000) = 0,059 s/op
p(90,0000) = 0,092 s/op
p(95,0000) = 0,095 s/op
p(99,0000) = 0,100 s/op
p(99,9000) = 0,100 s/op
p(99,9900) = 0,100 s/op
p(99,9990) = 0,100 s/op
p(99,9999) = 0,100 s/op
p(100,0000) = 0,100 s/op
Benchmark Mode Cnt Score Error Units
MyBenchmark.test sample 91 0,056 ± 0,010 s/op
How to benchmark '&' vs '%' cost correctly, using JMH
This is covered in detail in this blog post: http://psy-lob-saw.blogspot.co.za/2014/11/the-mythical-modulo-mask.html
Your benchmark is broken (it compares seemingly irrelevant quantities) because you are comparing (non_final_field & constant) with (constant % non_final_field). Replace these with (non_final_field1 % non_final_field2) and (non_final_field1 & (non_final_field2 - 1)), where non_final_field2 is a power of 2.
In the context of HashMap the value is used to read from an array and the blog post covers the implications of that side as well.
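A corrected benchmark along those lines might look like this (class and field names are illustrative; the key point is that both operands are non-final fields and the divisor is a power of two):

```java
import java.util.concurrent.ThreadLocalRandom;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class ModMaskBenchmark {
    int value;          // non-final, so the compiler cannot constant-fold it
    int length = 1024;  // power of two, also non-final

    @Setup(Level.Iteration)
    public void setup() {
        // Non-negative so that % and & agree on the result
        value = ThreadLocalRandom.current().nextInt(Integer.MAX_VALUE);
    }

    @Benchmark
    public int modulo() {
        return value % length;          // division-based
    }

    @Benchmark
    public int mask() {
        return value & (length - 1);    // mask-based, valid because length is a power of 2
    }
}
```

Both methods compute the same result for non-negative inputs, so the score difference isolates the cost of the operation itself.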
What is microbenchmarking?
It means exactly what it says on the tin: it's measuring the performance of something "small", such as a system call to the kernel of an operating system.
The danger is that people may use whatever results they obtain from microbenchmarking to dictate optimizations. And as we all know:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." -- Donald Knuth
There can be many factors that skew the results of microbenchmarks. Compiler optimization is one of them. If the operation being measured takes so little time that whatever you use to measure it takes longer than the operation itself, your microbenchmarks will also be skewed.
For example, someone might take a microbenchmark of the overhead of for loops:
void TestForLoop()
{
time start = GetTime();
for(int i = 0; i < 1000000000; ++i)
{
}
time elapsed = GetTime() - start;
time elapsedPerIteration = elapsed / 1000000000;
printf("Time elapsed for each iteration: %d\n", elapsedPerIteration);
}
Obviously compilers can see that the loop does absolutely nothing and not generate any code for the loop at all. So the values of elapsed and elapsedPerIteration are pretty much useless.
Even if the loop does something:
void TestForLoop()
{
int sum = 0;
time start = GetTime();
for(int i = 0; i < 1000000000; ++i)
{
++sum;
}
time elapsed = GetTime() - start;
time elapsedPerIteration = elapsed / 1000000000;
printf("Time elapsed for each iteration: %d\n", elapsedPerIteration);
}
The compiler may see that the variable sum isn't going to be used for anything and optimize it away, and optimize away the for loop as well. But wait! What if we do this:
void TestForLoop()
{
int sum = 0;
time start = GetTime();
for(int i = 0; i < 1000000000; ++i)
{
++sum;
}
time elapsed = GetTime() - start;
time elapsedPerIteration = elapsed / 1000000000;
printf("Time elapsed for each iteration: %d\n", elapsedPerIteration);
printf("Sum: %d\n", sum); // Added
}
The compiler might be smart enough to realize that sum will always be a constant value, and optimize all that away as well. Many would be surprised at the optimizing capabilities of compilers these days.
But what about things that compilers can't optimize away?
void TestFileOpenPerformance()
{
FILE* file = NULL;
time start = GetTime();
for(int i = 0; i < 1000000000; ++i)
{
file = fopen("testfile.dat", "r");
fclose(file);
}
time elapsed = GetTime() - start;
time elapsedPerIteration = elapsed / 1000000000;
printf("Time elapsed for each file open: %d\n", elapsedPerIteration);
}
Even this is not a useful test! The operating system may see that the file is being opened very frequently, so it may preload it in memory to improve performance. Pretty much all operating systems do this. The same thing happens when you open applications - operating systems may figure out the top ~5 applications you open the most and preload the application code in memory when you boot up the computer!
In fact, there are countless variables that come into play: locality of reference (e.g. arrays vs. linked lists), effects of caches and memory bandwidth, compiler inlining, compiler implementation, compiler switches, number of processor cores, optimizations at the processor level, operating system schedulers, operating system background processes, etc.
So microbenchmarking isn't exactly a useful metric in a lot of cases. It definitely does not replace whole-program benchmarks with well-defined test cases (profiling). Write readable code first, then profile to see what needs to be done, if any.
I would like to emphasize that microbenchmarks are not evil per se, but one has to use them carefully (that's true for lots of other things related to computers).
Run Micro-benchmark in application servers [JMH]
@Benchmark
Javadoc says:
* <p>{@link Benchmark} demarcates the benchmark payload, and JMH treats it specifically
* as the wrapper which contains the benchmark code. In order to run the benchmark reliably,
* JMH enforces a few stringent properties for these wrapper methods, including, but not
* limited to:</p>
@Benchmark is the annotation demarcating the piece of code JMH should treat as the benchmark body. It is not a magic annotation that measures any given method elsewhere in the program. And you are not supposed to call @Benchmark methods on your own.
What you want is not a benchmark; it's a tracing/monitoring/profiling solution that can instrument and account for all the stages the request goes through. JMH is not such a solution; you should look elsewhere.