Performance test independent of the number of iterations

2018-06-27 07:37:37

While trying to answer this question: What is the difference between instanceof and Class.isAssignableFrom(...)?

I wrote a performance test:

class A{}
class B extends A{}

A b = new B();

void execute(){
  boolean test = A.class.isAssignableFrom(b.getClass());
  // boolean test = A.class.isInstance(b);
  // boolean test = b instanceof A;
}

@Test
public void testPerf() {
  // Warmup the code
  for (int i = 0; i < 100; ++i)
    execute();

  // Time it
  int count = 100000;
  final long start = System.nanoTime();
  for(int i=0; i<count; i++){
     execute();
  }
  final long elapsed = System.nanoTime() - start;
  System.out.println(count + " iterations took " + TimeUnit.NANOSECONDS.toMillis(elapsed) + "ms.");
}

Which gave me:

A.class.isAssignableFrom(b.getClass()) : 100000 iterations took 15ms

A.class.isInstance(b) : 100000 iterations took 12ms

b instanceof A : 100000 iterations took 6ms

But when I vary the number of iterations, the measured time stays constant. For Integer.MAX_VALUE:

A.class.isAssignableFrom(b.getClass()) : 2147483647 iterations took 15ms

A.class.isInstance(b) : 2147483647 iterations took 12ms

b instanceof A : 2147483647 iterations took 6ms

Thinking it was a compiler optimization (I ran this test with JUnit), I changed it to this:

@Test
public void testPerf() {
    boolean test = false;

    // Warmup the code
    for (int i = 0; i < 100; ++i)
        test |= b instanceof A;

    // Time it
    int count = Integer.MAX_VALUE;
    final long start = System.nanoTime();
    for(int i=0; i<count; i++){
        test |= b instanceof A;
    }
    final long elapsed = System.nanoTime() - start;
    // Report the average in nanoseconds as a double; integer division followed by toMillis always yields 0
    System.out.println(count + " iterations took " + TimeUnit.NANOSECONDS.toMillis(elapsed) + "ms. AVG= " + ((double) elapsed / count) + "ns");

    System.out.println(test);
}

But the measured time is still "independent" of the number of iterations. Could someone explain this behavior?

The JIT compiler can eliminate loops that don't do anything. This can be triggered after 10,000 iterations.

What I suspect you are timing is how long it takes for the JIT to detect that the loop doesn't do anything and remove it. This will be a little longer than it takes to do 10,000 iterations.

A hundred iterations is not nearly enough for warmup. The default compile threshold is 10,000 iterations (a hundred times more), so it is best to go at least a bit over that threshold.

Once the compilation has been triggered, the world is not stopped; the compilation takes place in the background. That means that its effect will start being observable only after a slight delay.

There is ample space for optimization of your test in such a way that the entire loop is collapsed into its final result. That would explain the constant numbers.
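To illustrate the point about collapsing: in the question's loop, `b instanceof A` gives the same answer on every iteration, so nothing forces the JIT to actually run the loop. A minimal sketch of one way to make that harder is to vary the tested object per iteration and return a count that depends on each check. The class name `VaryingInputLoop`, the method `countInstances`, and the input array are hypothetical names chosen for this illustration, and the JIT may still optimize more than expected.

```java
public class VaryingInputLoop {
    static class A {}
    static class B extends A {}

    // Hypothetical helper: counts how many visited elements are instances of A.
    // The tested object changes on every iteration and the count is returned,
    // so the result depends on each individual check.
    static int countInstances(Object[] inputs, int iterations) {
        int matches = 0;
        for (int i = 0; i < iterations; i++) {
            Object o = inputs[i % inputs.length];
            if (o instanceof A) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed mix of inputs: two are instances of A, two are not.
        Object[] inputs = { new B(), "not an A", new A(), Integer.valueOf(1) };
        for (int iterations : new int[] { 1_000_000, 10_000_000, 100_000_000 }) {
            long start = System.nanoTime();
            int matches = countInstances(inputs, iterations);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // Printing the count keeps the result observable.
            System.out.println(iterations + " iterations, " + matches
                    + " matches, " + elapsedMs + " ms");
        }
    }
}
```

If the reported time now grows with the iteration count, that would support the explanation that the original loop was being optimized away rather than executed.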

Anyway, I always do the benchmarks by having an outer method call the inner method something like 10 times. The inner method does a big number of iterations, say 10,000 or more, as needed to make its runtime rise into at least tens of milliseconds. I don't even bother with nanoTime since if microsecond precision is important to you, it is just a sign of measuring too short a time interval.

When you do it like this, you are making it easy for the JIT to execute a compiled version of the inner method after it was substituted for the interpreted version. Another benefit is that you get assurance that the times of the inner method are stabilizing.
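The outer/inner structure described above could look roughly like the following sketch. The names `OuterInnerBenchmark`, `inner`, `outer`, and `INPUTS` are made up for this example, and the round and iteration counts are placeholders to adjust until each round takes at least tens of milliseconds.

```java
public class OuterInnerBenchmark {
    static class A {}
    static class B extends A {}

    // Assumed inputs: a mix of objects so the check is not the same every time.
    static final Object[] INPUTS = { new B(), "not an A", new A(), Integer.valueOf(42) };

    // Inner method: does many iterations and returns a value that depends on them.
    static int inner(int iterations) {
        int matches = 0;
        for (int i = 0; i < iterations; i++) {
            if (A.class.isInstance(INPUTS[i & 3])) {
                matches++;
            }
        }
        return matches;
    }

    // Outer method: calls the inner method several times and times each call,
    // so later rounds can run a compiled version of inner().
    static long[] outer(int rounds, int iterations) {
        long[] millis = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.currentTimeMillis();
            int matches = inner(iterations);
            millis[r] = System.currentTimeMillis() - start;
            System.out.println("round " + r + ": " + millis[r] + " ms (" + matches + " matches)");
        }
        return millis;
    }

    public static void main(String[] args) {
        // Placeholder sizes: 10 rounds of 100 million iterations each.
        outer(10, 100_000_000);
    }
}
```

The first round or two typically include interpretation and compilation time; the later rounds are the ones to compare, and they should settle around a stable value.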

If you want to write a real benchmark of a simple function, you should use a micro-benchmarking tool such as Caliper. It will be much simpler than trying to build your own benchmark.
