Portable Branch Prediction Hints

Portable branch prediction hints

The canonical way to do static branch prediction is that if is predicted not-branched (i.e. every if clause is executed, not else), and loops and backward-gotos are taken. So, don't put the common case in else if you expect static prediction to be significant. Getting around an untaken loop isn't as easy; I've never tried but I suppose putting it an an else clause should work pretty portably.

Many compilers support some form of #pragma unroll, but it will still be necessary to guard it with some kind of #if to protect other compilers.

Branch prediction hints can theoretically express a complete description of how to transform a program's flow-control graph and arrange the basic blocks in executable memory… so there are a variety of things to express, and most won't be very portable.

As GNU recommends in the documentation for __builtin_expect, profile-guided optimization is superior to hints, and with less effort.

Portable branch prediction hint in c++

C++20 is, er, likely to have attributes for the purpose. (The Committee Draft is still being reviewed.)

Can I improve branch prediction with my code?

TL:DR: Yes, in C or C++ use a likely() macro, or C++20 [[likely]], to help the compiler make better asm. That's separate from influencing actual CPU branch-prediction, though. If writing in asm, lay out your code to minimize taken branches.


For most ISAs, there's no way in asm to hint the CPU whether a branch is likely to be taken or not. (Some exceptions include Pentium 4 (but not earlier or later x86), PowerPC, and some MIPS, which allow branch hints as part of conditional-branch asm instructions.)

  • Is it possible to tell the branch predictor how likely it is to follow the branch?

But not-taken straight-line code is cheaper than taken, so hinting high-level language to lay out code with the fast-path contiguous doesn't help branch prediction accuracy, but can help (or hurt) performance. (I-cache locality, front-end bandwidth: remember code-fetch happens in contiguous 16 or 32-byte blocks, so a taken branch means a later part of that fetch block isn't useful. Also, branch prediction throughput; some CPUs like Intel Skylake for example can't handle a predicted-taken branch at more than 1 per 2 clocks, other than loop branches. That include unconditional branches like jmp or ret.)

Taken branches are hard; not-taken branches keep the CPU on its toes, but if the prediction is accurate it's just a normal instruction for an execution unit (verifying the prediction), with nothing special for the front-end. See also Modern Microprocessors
A 90-Minute Guide! which has a section on branch prediction. (And is overall excellent.)

  • What exactly happens when a skylake CPU mispredicts a branch?
  • Avoid stalling pipeline by calculating conditional early
  • How does the branch predictor know if it is not correct?

Many people misunderstand source-level branch hints as branch prediction hints. That could be one effect if compiling for a CPU that supports branch hints in asm, but for most the significant effect is in layout, and deciding whether to use branchless (cmov) or not; a [[likely]] condition also means it should predict well.

With some CPUs, especially older, layout of a branch did sometimes influence runtime prediction: if the CPU didn't remember anything about the branch in its dynamic predictors, the standard static prediction heuristic is that forward conditional branches are not-taken, backward conditional are assumed taken (because that's normally the bottom of a loop. See the BTFNT section in https://danluu.com/branch-prediction/.

A compiler can lay out an if(c) x else y; either way, either matching the source with jump over x if !c as the opening thing, or swap the if and else blocks and use the opposite branch condition. Or it can put one block out-of-line (e.g. after the ret at the end of the function) so the fast path has no taken branches conditional or otherwise, while the less likely path has to jump there and then jump back.


It's easy to do more harm than good with branch hints in high-level source, especially if surrounding code changes without paying attention to them, so profile-guided optimization is the best way for compilers to learn about branch predictability and likelihood. (e.g. gcc -O3 -fprofile-generate / run with some representative inputs that exercise code-paths in relevant ways / gcc -O3 -fprofile-use)

But there are ways to hint in some languages, like C++20 [[likely]] and [[unlikely]], which are the portable version of GNU C likely() / unlikely() macros around __builtin_expect.

  • https://en.cppreference.com/w/cpp/language/attributes/likely C++20 [[likely]]
  • How to use C++20's likely/unlikely attribute in if-else statement syntax help
  • Is there a compiler hint for GCC to force branch prediction to always go a certain way? (to the literal question, no. To what's actually wanted, branch hints to the compiler, yes.)
  • How do the likely/unlikely macros in the Linux kernel work and what is their benefit? The GNU C macros using __builtin_expect, same effect but different syntax than C++20 [[likely]]
  • What is the advantage of GCC's __builtin_expect in if else statements? example asm output. (Also see CiroSantilli's answers to some of the other questions where he made examples.)
  • Simple example where [[likely]] and [[unlikely]] affect program assembly?

I don't know of ways to annotate branches for languages other than GNU C / C++, and ISO C++20.



Absent any hints or profile data

Without that, optimizing compilers have to use heuristics to guess which side of a branch is more likely. If it's a loop branch, they normally assume that the loop will run multiple times. On an if, they have some heuristics based on the actual condition and maybe what's in the blocks being controlled; IDK I haven't looked into what gcc or clang do.

I have noticed that GCC does care about the condition, though. It's not as naive as assuming that int values are uniformly randomly distributed, although I think it normally assumes that if (x == 10) foo(); is somewhat unlikely.

JIT compilers like in a JVM have an advantage here: they can potentially instrument branches in the early stages of running, to collect branch-direction information before making final optimized asm. OTOH they need to compile fast because compile time is part of total run time, so they don't try as hard to make good asm, which is a major disadvantage in terms of code quality.

Is there a compiler hint for GCC to force branch prediction to always go a certain way?

As of C++20 the likely and unlikely attributes should be standardized and are already supported in g++9. So as discussed here, you can write

if (a > b) {
/* code you expect to run often */
[[likely]] /* last statement here */
}

e.g. in the following code the else block gets inlined thanks to the [[unlikely]] in the if block

int oftendone( int a, int b );
int rarelydone( int a, int b );
int finaltrafo( int );

int divides( int number, int prime ) {
int almostreturnvalue;
if ( ( number % prime ) == 0 ) {
auto k = rarelydone( number, prime );
auto l = rarelydone( number, k );
[[unlikely]] almostreturnvalue = rarelydone( k, l );
} else {
auto a = oftendone( number, prime );
almostreturnvalue = oftendone( a, a );
}
return finaltrafo( almostreturnvalue );
}

godbolt link comparing the presence/absence of the attribute

Hint for branch prediction in assertions

The performance gain is not likely to be significant, but this is how those linux kernel macros are defined:

#define likely(x)      __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

So, you could modify your condition like this (assuming that expr is expected to be true and therefore !(expr) is expected to be false):

if (__builtin_expect(!(expr), 0)) {

Or you could define the same macros as the kernel and use them for better readability.

This is gcc builtin, so not portable of course.

This suggests that clang also supports the builtin. Othrwise, you can use the above macros and conditionally define them like #define likely(x) (x) on compilers that don't support the builtin.

In your case, the prediction is going to be good (either that or you're aborting), so there shouldn't be a risk of pessimisation, but if you do consider using the builtin more widely, here's a word of advice from gcc documentation:

In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform.

Do branch likelihood hints carry through function calls?

The story appears to be mixed for different compilers.

On GCC, I think your inline likely function works, or at least has some effect. Using Compiler Explorer to test differences on this code:

inline bool likely(bool x) { 
if(x) [[likely]] return true;
else return false;
}

//#define LIKELY(x) likely(x)
#define LIKELY(x) x

int f(int x) {
if (LIKELY(!x)) {
return -3548;
}
else {
return x + 1;
}
}

This function f adds 1 to x and returns it, unless x is 0, in which case it returns -3548. The LIKELY macro, when it's active, indicates to the compiler that the case where x is zero is more common.

This version, with no change, produces this assembly under GCC 10 -O1:

f(int):
test edi, edi
je .L3
lea eax, [rdi+1]
ret
.L3:
mov eax, -3548
ret

With the #define changed to the inline function with the [[likely]], we get:

f(int):
lea eax, [rdi+1]
test edi, edi
mov edx, -3548
cmove eax, edx
ret

That's a conditional move instead of a conditional jump. A win, I guess, albeit for a simple example.

This indicates that branch weights propagate through inline functions, which makes sense.

On clang, however, there is limited support for the likely and unlikely attributes, and where there is it does not seem to propagate through inline function calls, according to @Peter Cordes 's report.

There is, however, a hacky macro solution that I think also works:

#define EMPTY()
#define LIKELY(x) x) [[likely]] EMPTY(

Then anything like

if ( LIKELY(x) ) {

becomes like

if ( x) [[likely]] EMPTY( ) {

which then becomes

if ( x) [[likely]] {

.

Example: https://godbolt.org/z/nhfehn

Note however that this probably only works in if-statements, or in other cases that the LIKELY is enclosed in parentheses.

Branch Prediction: Branch Order vs builtin_expect

The compiler's optimiser is allowed to reorder branches. The __builtin_expect is useful if the compiler gets (or is likely) to get it wrong.



Related Topics



Leave a reply



Submit