Is There a Compiler Hint for Gcc to Force Branch Prediction to Always Go a Certain Way

Is there a compiler hint for GCC to force branch prediction to always go a certain way?

As of C++20 the likely and unlikely attributes should be standardized and are already supported in g++9. So as discussed here, you can write

if (a > b) {
/* code you expect to run often */
[[likely]] /* last statement here */
}

e.g. in the following code the else block gets inlined thanks to the [[unlikely]] in the if block

int oftendone( int a, int b );
int rarelydone( int a, int b );
int finaltrafo( int );

int divides( int number, int prime ) {
int almostreturnvalue;
if ( ( number % prime ) == 0 ) {
auto k = rarelydone( number, prime );
auto l = rarelydone( number, k );
[[unlikely]] almostreturnvalue = rarelydone( k, l );
} else {
auto a = oftendone( number, prime );
almostreturnvalue = oftendone( a, a );
}
return finaltrafo( almostreturnvalue );
}

godbolt link comparing the presence/absence of the attribute

How good is the Visual Studio compiler at branch-prediction for simple if-statements?

Branch-prediction is a run-time thing, done by the CPU not the compiler.

The relevant optimization here would be if-conversion to a very cheap branchless flag |= obj.someBool;.


Ahead-of-time C++ compilers make machine code for the CPU to run; they aren't interpreters. See also Matt Godbolt's CppCon2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” and How to remove "noise" from GCC/clang assembly output? for more about looking at optimized compiler-generated asm.

I guess what you're suggesting could be making a 2nd version of the loop that doesn't look at the bool, and converting the if() into an if() goto to set the flag once and then run the other version of the loop from this point onward. That would likely not be worth it, since a single OR instruction is so cheap if other members of the same object are already getting accessed.

But it's a plausible optimization; however I don't think compilers would typically do it for you. You can of course do it manually, although you'd have to iterate manually instead of using a range-for, because you want to use the same iterator to start part-way through the range.


Branch likelihood estimation at compile time is a thing compilers do to figure out whether branchy or branchless code is appropriate, e.g. gcc optimization flag -O3 makes code slower than -O2 uses CMOV for a case that looks unpredictable, but when run on sorted data is actually very predictable. My answer there shows the asm real-world compilers make; note that they don't multi-version the loop, although that wouldn't be possible in that case if the compiler didn't know about the data being sorted.

Also to guess which side of a branch is more likely, so they can lay out the fast path with fewer taken branches. That's what the C++20 [[likely]] / [[unlikely]] hints are for, BTW, not actually influencing run-time branch prediction. Except on some CPUs, indirectly via static prediction the first time a CPU sees a branch. Or a few ISAs, like PowerPC and MIPS, have "branch-likely" instructions with actual run-time hints for the CPU which compilers might or might not actually use even if available. See

  • How do the likely/unlikely macros in the Linux kernel work and what is their benefit? - They influence branch layout, making the "likely" path a straight line (branches usually not-taken) for I-cache locality and contiguous fetch.
  • Is there a compiler hint for GCC to force branch prediction to always go a certain way?

Can I improve branch prediction with my code?

TL:DR: Yes, in C or C++ use a likely() macro, or C++20 [[likely]], to help the compiler make better asm. That's separate from influencing actual CPU branch-prediction, though. If writing in asm, lay out your code to minimize taken branches.


For most ISAs, there's no way in asm to hint the CPU whether a branch is likely to be taken or not. (Some exceptions include Pentium 4 (but not earlier or later x86), PowerPC, and some MIPS, which allow branch hints as part of conditional-branch asm instructions.)

  • Is it possible to tell the branch predictor how likely it is to follow the branch?

But not-taken straight-line code is cheaper than taken, so hinting high-level language to lay out code with the fast-path contiguous doesn't help branch prediction accuracy, but can help (or hurt) performance. (I-cache locality, front-end bandwidth: remember code-fetch happens in contiguous 16 or 32-byte blocks, so a taken branch means a later part of that fetch block isn't useful. Also, branch prediction throughput; some CPUs like Intel Skylake for example can't handle a predicted-taken branch at more than 1 per 2 clocks, other than loop branches. That include unconditional branches like jmp or ret.)

Taken branches are hard; not-taken branches keep the CPU on its toes, but if the prediction is accurate it's just a normal instruction for an execution unit (verifying the prediction), with nothing special for the front-end. See also Modern Microprocessors
A 90-Minute Guide! which has a section on branch prediction. (And is overall excellent.)

  • What exactly happens when a skylake CPU mispredicts a branch?
  • Avoid stalling pipeline by calculating conditional early
  • How does the branch predictor know if it is not correct?

Many people misunderstand source-level branch hints as branch prediction hints. That could be one effect if compiling for a CPU that supports branch hints in asm, but for most the significant effect is in layout, and deciding whether to use branchless (cmov) or not; a [[likely]] condition also means it should predict well.

With some CPUs, especially older, layout of a branch did sometimes influence runtime prediction: if the CPU didn't remember anything about the branch in its dynamic predictors, the standard static prediction heuristic is that forward conditional branches are not-taken, backward conditional are assumed taken (because that's normally the bottom of a loop. See the BTFNT section in https://danluu.com/branch-prediction/.

A compiler can lay out an if(c) x else y; either way, either matching the source with jump over x if !c as the opening thing, or swap the if and else blocks and use the opposite branch condition. Or it can put one block out-of-line (e.g. after the ret at the end of the function) so the fast path has no taken branches conditional or otherwise, while the less likely path has to jump there and then jump back.


It's easy to do more harm than good with branch hints in high-level source, especially if surrounding code changes without paying attention to them, so profile-guided optimization is the best way for compilers to learn about branch predictability and likelihood. (e.g. gcc -O3 -fprofile-generate / run with some representative inputs that exercise code-paths in relevant ways / gcc -O3 -fprofile-use)

But there are ways to hint in some languages, like C++20 [[likely]] and [[unlikely]], which are the portable version of GNU C likely() / unlikely() macros around __builtin_expect.

  • https://en.cppreference.com/w/cpp/language/attributes/likely C++20 [[likely]]
  • How to use C++20's likely/unlikely attribute in if-else statement syntax help
  • Is there a compiler hint for GCC to force branch prediction to always go a certain way? (to the literal question, no. To what's actually wanted, branch hints to the compiler, yes.)
  • How do the likely/unlikely macros in the Linux kernel work and what is their benefit? The GNU C macros using __builtin_expect, same effect but different syntax than C++20 [[likely]]
  • What is the advantage of GCC's __builtin_expect in if else statements? example asm output. (Also see CiroSantilli's answers to some of the other questions where he made examples.)
  • Simple example where [[likely]] and [[unlikely]] affect program assembly?

I don't know of ways to annotate branches for languages other than GNU C / C++, and ISO C++20.



Absent any hints or profile data

Without that, optimizing compilers have to use heuristics to guess which side of a branch is more likely. If it's a loop branch, they normally assume that the loop will run multiple times. On an if, they have some heuristics based on the actual condition and maybe what's in the blocks being controlled; IDK I haven't looked into what gcc or clang do.

I have noticed that GCC does care about the condition, though. It's not as naive as assuming that int values are uniformly randomly distributed, although I think it normally assumes that if (x == 10) foo(); is somewhat unlikely.

JIT compilers like in a JVM have an advantage here: they can potentially instrument branches in the early stages of running, to collect branch-direction information before making final optimized asm. OTOH they need to compile fast because compile time is part of total run time, so they don't try as hard to make good asm, which is a major disadvantage in terms of code quality.

Is it possible to hint a specific if-statement branch is most likely to be executed in the Delphi compiler?

There is nothing in the language or compiler that allows you to supply hints for branch prediction. And in any case, modern architectures would ignore those hints even if the compiler emitted object code that contained hints.

Portable branch prediction hints

The canonical way to do static branch prediction is that if is predicted not-branched (i.e. every if clause is executed, not else), and loops and backward-gotos are taken. So, don't put the common case in else if you expect static prediction to be significant. Getting around an untaken loop isn't as easy; I've never tried but I suppose putting it an an else clause should work pretty portably.

Many compilers support some form of #pragma unroll, but it will still be necessary to guard it with some kind of #if to protect other compilers.

Branch prediction hints can theoretically express a complete description of how to transform a program's flow-control graph and arrange the basic blocks in executable memory… so there are a variety of things to express, and most won't be very portable.

As GNU recommends in the documentation for __builtin_expect, profile-guided optimization is superior to hints, and with less effort.



Related Topics



Leave a reply



Submit