Static branch prediction / GCC optimization
Consider the following C program:
void bar();
void baz();
void foo( int a ) {
if ( a ) {
bar();
}
else {
baz();
}
}
On my x86-64-based computer, the instructions generated by GCC with the -O1 optimization level gives:
0: sub $0x8,%rsp
4: test %edi,%edi
6: je 14 <foo+0x14>
8: mov $0x0,%eax
d: callq 12 <foo+0x12> # relocation to bar
12: jmp 1e <foo+0x1e>
14: mov $0x0,%eax
19: callq 1e <foo+0x1e> # relocation to baz
1e: add $0x8,%rsp
22: retq
whereas adding the -freorder-blocks optimization parameter (included in -O2) turns the code into:
0: sub $0x8,%rsp
4: test %edi,%edi
6: jne 17 <foo+0x17>
8: mov $0x0,%eax
d: callq 12 <foo+0x12> # relocation to baz
12: add $0x8,%rsp
16: retq
17: mov $0x0,%eax
1c: callq 21 <foo+0x21> # relocation to bar
21: add $0x8,%rsp
25: retq
what is mainly a change from jump equals to jump not equals. I know that up to Pentium 4, static branch prediction on a conditional forward branch was considered not taken by the processor (it seems that static prediction became random on further Intel processors), thus I imagine this optimization is dealing with this.
Assuming that and refering to the jne optimized version, it would mean that the else block is in fact considered to be more likely executed than the if block in the program flow.
But what does it mean exactly? Since there is no assumption on the a value in the foo function by the compiler, such probability relies on the programmer's writings only (who could in fact have used if ( !a )
instead of if ( a )
and inverted function calls).
Does that mean that it should be considered as a good practice to treat if conditional blocks as exceptional cases (and not the normal execution flow)?
That is:
if ( !cond ) {
// exceptional code
}
else {
// normal continuation
}
instead of:
if ( cond ) {
// normal continuation
}
else {
// exceptional code
}
(of course, one could prefer using return statement inside relevant block to limit indentation size).
I once had significant amount of performance optimization actions on ARM(7,9). It was plain C, dumb enough compiler (SDT AFAIR). One of the way to save some CPU resources was to analyse if
branches and rewrite if
condition so normal flow doesn't break linear instructions sequence. This had positive effect both because of CPU prediction block more efficient usage and more efficient code segment memory cache usage.
I think here we see optimization which is very close. In first code fragment both branches lead to normal sequence being broken (line with lavel 6
for one branch and 12
for another). In second fragment one branch instructions are ordered up to retq
and other branch sequence has single jump (not worse than it was in first fragment). Please pay attention to 2 retq
instructions.
So as I can see this is not the question of je
or jne
but rather question of blocks reordering so branches are linear instructions sequence with one of them entered without any jump
and full prediction block power saved.
Regarding to "why GCC prefers one branch over another"... I see in documentation this can be result of static branch prediction (based on calls inside translation unit?). Anyway I'd recommend to play with __builtin_expect
to have more detailed answer.
上一篇: Bash提示行换行问题
下一篇: 静态分支预测/ GCC优化