Is it possible to tell the branch predictor how likely it is to follow the branch?

Just to make it clear, I'm not going for any sort of portability here, so any solutions that will tie me to a certain box is fine.

Basically, I have an if statement that will 99% of the time evaluate to true, and am trying to eke out every last clock of performance, can I issue some sort of compiler command (using GCC 4.1.2 and the x86 ISA, if it matters) to tell the branch predictor that it should cache for that branch?


Yes. http://kerneltrap.org/node/4705

The __builtin_expect is a method that gcc (versions >= 2.96) offer for programmers to indicate branch prediction information to the compiler. The return value of __builtin_expect is the first argument (which could only be an integer) passed to it.

if (__builtin_expect (x, 0))
                foo ();

     [This] would indicate that we do not expect to call `foo', since we
     expect `x' to be zero. 

Yes, but it will have no effect. Exceptions are older (obsolete) architectures pre Netburst, and even then it doesn't do anything measurable.

There is an "branch hint" opcode Intel introduced with the Netburst architecture, and a default static branch prediction for cold jumps (backward predicted taken, forward predicted non taken) on some older architectures. GCC implements this with the __builtin_expect (x, prediction) , where prediction is typically 0 or 1. The opcode emitted by the compiler is ignored on all newer processor architecures (>= Core 2). The small corner case where this actually does something is the case of a cold jump on the old Netburst architecture. Intel recommends now not to use the static branch hints, probably because they consider the increase of the code size more harmful than the possible marginal speed up.

Besides the useless branch hint for the predictor, __builtin_expect has its use, the compiler may reorder the code to improve cache usage or save memory.

There are multiple reasons it doesn't work as expected.

  • The processor can predict small loops (n<64) perfectly.
  • The processor can predict small repeating patterns (n~7) perfectly.
  • The processor itself can estimate the probability of a branch during runtime better than the compiler/programmer during compile time.
  • The predictability (= probability a branch will get predicted correctly) of a branch is far more important than the probability that the branch is taken. Unfortunately, this is highly architecture-dependent, and predicting the predictability of branch is notoriously hard.
  • Read more about the inner works of the branch prediction at Agner Fogs manuals. See also the gcc mailing list.


    Like Drakosha says, telling gcc which branch is the common case, so it generates better code for the case where the branch predictor is cold, and so the fast path through the function is easy for the CPU to execute, is probably very useful.

    FYI, Pentium 4 had branch-predictor hints as prefixes to the jcc instructions, but only the netburst microarchitecture ever did anything with them. See http://ref.x86asm.net/geek32.html. And Section 3.5 of Agner Fog's excellent asm opt guide, from http://www.agner.org/optimize/. He has a guide to optimizing in C++, too.

    Not much is officially published about exactly how the branch predictors and branch-target-buffers in the most recent Intel and AMD CPUs behave. The optimization manuals (easy to find on AMD's and Intel's web sites) give some advice, but don't document specific behaviour. Some people have run tests to try to divine the implementation, eg how many BTB entries Core2 has... Anyway, the idea of hinting the predictor explicitly has been abandoned (for now). What is documented is for example that Core2 has a branch history buffer that can avoid mispredicting the loop-exit if the loop always runs a constant short number of iterations, < 8 or 16 IIRC. But don't be too quick to unroll, because a loop that fits in 64bytes (or 19uops on Penryn) won't have instruction fetch bottlenecks because it replays from a buffer... go read Agner Fog's pdfs, they're excellent.

    链接地址: http://www.djcxy.com/p/54874.html

    上一篇: 在if中使用可能()/不可能()预处理器宏

    下一篇: 是否有可能告诉分支预测器跟随分支的可能性有多大?