What Do Compilers Do with Compile-Time Branching

TL;DR

There are several ways to get different run-time behavior dependent on a template parameter. Performance should not be your primary concern here, but flexibility and maintainability should. In all cases, the various thin wrappers and constant conditional expressions will be optimized away on any decent compiler for release builds. Below is a small summary of the various tradeoffs (inspired by this answer by @AndyProwl).

Run-time if

Your first solution is the simple run-time if:

template<class T>
T numeric_procedure(const T& x)
{
    if (std::is_integral<T>::value) {
        // valid code for integral types
    } else {
        // valid code for non-integral types,
        // must ALSO compile for integral types
    }
}

It is simple and effective: any decent compiler will optimize away the dead branch.

There are several disadvantages:

  • On some platforms (MSVC), a constant conditional expression yields a spurious compiler warning, which you then need to ignore or silence.
  • Worse, on all conforming platforms, both branches of the if/else statement must actually compile for all types T, even if one of the branches is known not to be taken. If T contains different member types depending on its nature, you will get a compiler error as soon as you try to access them.

Tag dispatching

Your second approach is known as tag-dispatching:

template<class T>
T numeric_procedure_impl(const T& x, std::false_type)
{
    // valid code for non-integral types,
    // CAN contain code that is invalid for integral types
}

template<class T>
T numeric_procedure_impl(const T& x, std::true_type)
{
    // valid code for integral types
}

template<class T>
T numeric_procedure(const T& x)
{
    return numeric_procedure_impl(x, std::is_integral<T>());
}

It works fine, without run-time overhead: the temporary std::is_integral<T>() and the call to the one-line helper function will both be optimized away on any decent platform.

The main (minor, IMO) disadvantage is the boilerplate: three functions instead of one.

SFINAE

Closely related to tag-dispatching is SFINAE (substitution failure is not an error):

// NOTE: the enable_if must live in a non-defaulted position (here, as
// the type of a non-type template parameter defaulted to 0). Default
// template arguments are not part of a function template's signature,
// so two overloads differing only in a defaulted type parameter would
// be redefinitions of the same template.
template<class T, typename std::enable_if<!std::is_integral<T>::value, int>::type = 0>
T numeric_procedure(const T& x)
{
    // valid code for non-integral types,
    // CAN contain code that is invalid for integral types
}

template<class T, typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
T numeric_procedure(const T& x)
{
    // valid code for integral types
}

This has the same effect as tag-dispatching but works slightly differently. Instead of using argument-deduction to select the proper helper overload, it directly manipulates the overload set for your main function.

The disadvantage is that it can be a fragile and tricky way if you don't know exactly what the entire overload set is (e.g. with template heavy code, ADL could pull in more overloads from associated namespaces you didn't think of). And compared to tag-dispatching, selection based on anything other than a binary decision is a lot more involved.

Partial specialization

Another approach is to use a class template helper with a function application operator and partially specialize it:

template<class T, bool>
struct numeric_functor;

template<class T>
struct numeric_functor<T, false>
{
    T operator()(T const& x) const
    {
        // valid code for non-integral types,
        // CAN contain code that is invalid for integral types
    }
};

template<class T>
struct numeric_functor<T, true>
{
    T operator()(T const& x) const
    {
        // valid code for integral types
    }
};

template<class T>
T numeric_procedure(T const& x)
{
    return numeric_functor<T, std::is_integral<T>::value>()(x);
}

This is probably the most flexible approach if you want to have fine-grained control and minimal code duplication (e.g. if you also want to specialize on size and/or alignment, but say only for floating point types). The pattern matching given by partial template specialization is ideally suited for such advanced problems. As with tag-dispatching, the helper functors are optimized away by any decent compiler.

The main disadvantage is the slightly larger amount of boilerplate if you only want to specialize on a single binary condition.

if constexpr (C++17)

This is a reboot of the failed earlier static if proposals (the D programming language has a feature of that name):

template<class T>
T numeric_procedure(const T& x)
{
    if constexpr (std::is_integral<T>::value) {
        // valid code for integral types
    } else {
        // valid code for non-integral types,
        // CAN contain code that is invalid for integral types
    }
}

As with your run-time if, everything is in one place, but the main advantage here is that the else branch will be dropped entirely by the compiler when it is known not to be taken. A great advantage is that you keep all code local, and do not have to use little helper functions as in tag dispatching or partial template specialization.

Concepts (Concepts-Lite TS, now C++20)

Concepts-Lite was first published as a Technical Specification and later, in revised form, became part of C++20.

template<Non_integral T>
T numeric_procedure(const T& x)
{
    // valid code for non-integral types,
    // CAN contain code that is invalid for integral types
}

template<Integral T>
T numeric_procedure(const T& x)
{
    // valid code for integral types
}

This approach replaces the class or typename keyword inside the template< > brackets with a concept name describing the family of types the code is supposed to work for (Integral and Non_integral above are assumed to be user-defined concepts, e.g. written in terms of std::is_integral). It can be seen as a generalization of the tag-dispatching and SFINAE techniques. GCC, Clang, and MSVC all support concepts in C++20 mode. The Lite adjective distinguished the Technical Specification from the more ambitious Concepts design that failed to make it into C++11.

Compile-time elimination of if/else branch in C++

All compilers behave correctly.

Your program is ill-formed, no diagnostic required: you are odr-using Struct<true>::printIfFalse through the instantiation of Struct<true>::print() required by the call withTrue.print(). A function that is odr-used outside of a discarded statement must have a definition in the program, see [basic.def.odr]/4; otherwise the program is ill-formed, no diagnostic required.

A discarded statement is what you get if you use if constexpr in a template and the statement is not in the chosen branch. So, what you can do to make the program well-formed is to use if constexpr instead of if. This is a C++17 feature.

How good is the Visual Studio compiler at branch-prediction for simple if-statements?

Branch-prediction is a run-time thing, done by the CPU not the compiler.

The relevant optimization here would be if-conversion to a very cheap branchless flag |= obj.someBool;.


Ahead-of-time C++ compilers make machine code for the CPU to run; they aren't interpreters. See also Matt Godbolt's CppCon2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” and How to remove "noise" from GCC/clang assembly output? for more about looking at optimized compiler-generated asm.

I guess what you're suggesting is making a second version of the loop that doesn't look at the bool, and converting the if() into an if() goto that sets the flag once and then runs the other version of the loop from that point onward. That would likely not be worth it, since a single OR instruction is so cheap if other members of the same object are already being accessed.

It's a plausible optimization, but I don't think compilers would typically do it for you. You can of course do it manually, although you'd have to iterate manually instead of using a range-for, because you need to reuse the same iterator to start part-way through the range.


Branch-likelihood estimation at compile time is something compilers do to figure out whether branchy or branchless code is appropriate. For example, in the question “gcc optimization flag -O3 makes code slower than -O2”, GCC uses CMOV for a case that looks unpredictable but, when run on sorted data, is actually very predictable. My answer there shows the asm that real-world compilers make; note that they don't multi-version the loop, although that wouldn't have been possible in that case anyway, since the compiler didn't know the data was sorted.

Compilers also guess which side of a branch is more likely, so they can lay out the fast path with fewer taken branches. That's what the C++20 [[likely]] / [[unlikely]] hints are for, BTW; they don't actually influence run-time branch prediction, except on some CPUs indirectly via static prediction the first time a CPU sees a branch. A few ISAs, such as PowerPC and MIPS, have branch-likely instructions with actual run-time hints for the CPU, which compilers might or might not use even when available. See:

  • How do the likely/unlikely macros in the Linux kernel work and what is their benefit? - They influence branch layout, making the "likely" path a straight line (branches usually not-taken) for I-cache locality and contiguous fetch.
  • Is there a compiler hint for GCC to force branch prediction to always go a certain way?

Standard if/else on conditions known at compile-time?

The optimizer will remove the branch and the code in the unused branches, but beware: the compiler needs to process the function before the optimizer even gets a chance to look at the code, which means that all branches must be valid (compilable) for all values of N.

For example, if the second branch contained:

} else if (N <= 42) {
    char data[50 - N];
    // other code
}

then the compiler would fail to instantiate the template for N >= 50 (where 50 - N is no longer a positive array size), even though the branch would be removed by the optimizer.

Why does the compiler fail to understand the value of the if branch at compile time?

You can see how the compilation error makes some sort of sense if you replace the constants with their actual values. For int, that code is equivalent to the following:

if (true || false) {
    size += 1;
} else {
    size += real_size(v);
}

Here the compiler would still complain that the else body is not valid: dead code must still be valid code.


