Why is branch prediction faster than no branch at all?
Inspired by this question: Why is it faster to process a sorted array than an unsorted array?
I wrote my own branch prediction experiment:
public class BranchPrediction {
public static void main(final String[] args) {
long start;
long sum = 0;
/* No branch */
start = System.nanoTime();
sum = 0;
for (long i = 0; i < 10000000000L; ++i)
sum += i;
System.out.println(System.nanoTime() - start);
System.out.println(sum);
/* With branch */
start = System.nanoTime();
sum = 0;
for (long i = 0; i < 10000000000L; ++i)
if (i >= 0)
sum += i;
System.out.println(System.nanoTime() - start);
System.out.println(sum);
/* No branch (again) */
start = System.nanoTime();
sum = 0;
for (long i = 0; i < 10000000000L; ++i)
sum += i;
System.out.println(System.nanoTime() - start);
System.out.println(sum);
/* With branch (again) */
start = System.nanoTime();
sum = 0;
for (long i = 0; i < 10000000000L; ++i)
if (i >= 0)
sum += i;
System.out.println(System.nanoTime() - start);
System.out.println(sum);
}
}
The result confuses me: according to program output, the loop with a branch is reliably faster than no branch loops.
Example output:
7949691477
-5340232226128654848
6947699555
-5340232226128654848
7920972795
-5340232226128654848
7055459799
-5340232226128654848
Why is it so?
Edit:
Have in mind that JVM is optimizing execution internally, and there are caches inside your PC that make computing faster. Since you have so powerful processor (many independant cores) it is not strange. Also note that there is code that runs under the Java code which maps to machine code of your PC. Just type code as optimized as you can, let JVM worry about it.
EDIT: Machines and hardware like big load, they operate with more efficiency then. Especially caches.
After running the same experiment on my other machines (Intel servers and workstations), I may conclude that the phenomenon I experienced is specific to this laptop CPU (Intel i7 Q740M).
==== 6 months later edit ====
Check this out: http://eli.thegreenplace.net/2013/12/03/intel-i7-loop-performance-anomaly/
链接地址: http://www.djcxy.com/p/282.html上一篇: 为什么(a * b!= 0)在Java中比(a!= 0 && b!= 0)更快?
下一篇: 为什么分支预测比没有分支更快?