Static branch prediction / GCC optimization

2018-06-26 10:37:27

Consider the following C program:

void bar();
void baz();

void foo( int a ) {
    if ( a ) {
        bar();
    }
    else {
        baz();
    }
}

On my x86-64-based computer, the instructions generated by GCC with the -O1 optimization level gives:

 0: sub    $0x8,%rsp
 4: test   %edi,%edi
 6: je     14 <foo+0x14>
 8: mov    $0x0,%eax
 d: callq  12 <foo+0x12> # relocation to bar
12: jmp    1e <foo+0x1e>
14: mov    $0x0,%eax
19: callq  1e <foo+0x1e> # relocation to baz
1e: add    $0x8,%rsp
22: retq

whereas adding the -freorder-blocks optimization parameter (included in -O2) turns the code into:

 0: sub    $0x8,%rsp
 4: test   %edi,%edi
 6: jne    17 <foo+0x17>
 8: mov    $0x0,%eax
 d: callq  12 <foo+0x12> # relocation to baz
12: add    $0x8,%rsp
16: retq   
17: mov    $0x0,%eax
1c: callq  21 <foo+0x21> # relocation to bar
21: add    $0x8,%rsp
25: retq

what is mainly a change from jump equals to jump not equals. I know that up to Pentium 4, static branch prediction on a conditional forward branch was considered not taken by the processor (it seems that static prediction became random on further Intel processors), thus I imagine this optimization is dealing with this.

Assuming that and refering to the jne optimized version, it would mean that the else block is in fact considered to be more likely executed than the if block in the program flow.

But what does it mean exactly? Since there is no assumption on the a value in the foo function by the compiler, such probability relies on the programmer's writings only (who could in fact have used if ( !a ) instead of if ( a ) and inverted function calls).

Does that mean that it should be considered as a good practice to treat if conditional blocks as exceptional cases (and not the normal execution flow)?

That is:

if ( !cond ) {
    // exceptional code
}
else {
    // normal continuation
}

instead of:

if ( cond ) {
    // normal continuation
}
else {
    // exceptional code
}

(of course, one could prefer using return statement inside relevant block to limit indentation size).

I once had significant amount of performance optimization actions on ARM(7,9). It was plain C, dumb enough compiler (SDT AFAIR). One of the way to save some CPU resources was to analyse if branches and rewrite if condition so normal flow doesn't break linear instructions sequence. This had positive effect both because of CPU prediction block more efficient usage and more efficient code segment memory cache usage.

I think here we see optimization which is very close. In first code fragment both branches lead to normal sequence being broken (line with lavel 6 for one branch and 12 for another). In second fragment one branch instructions are ordered up to retq and other branch sequence has single jump (not worse than it was in first fragment). Please pay attention to 2 retq instructions.

So as I can see this is not the question of je or jne but rather question of blocks reordering so branches are linear instructions sequence with one of them entered without any jump and full prediction block power saved.

Regarding to "why GCC prefers one branch over another"... I see in documentation this can be result of static branch prediction (based on calls inside translation unit?). Anyway I'd recommend to play with __builtin_expect to have more detailed answer.

链接地址: http://www.djcxy.com/p/73966.html

上一篇: Bash提示行换行问题

下一篇: 静态分支预测/ GCC优化