Can a Recursive Function Be Inline

Can a recursive function be inline?

First, the inline specification on a function is just a hint. The compiler can (and often does) completely ignore the presence or absence of an inline qualifier. With that said, a compiler can inline a recursive function, much as it can unroll an infinite loop. It simply has to place a limit on the level to which it will "unroll" the function.

An optimizing compiler might turn this code:

inline int factorial(int n)
{
if (n <= 1)
{
return 1;
}
else
{
return n * factorial(n - 1);
}
}

int f(int x)
{
return factorial(x);
}

into this code:

int factorial(int n)
{
if (n <= 1)
{
return 1;
}
else
{
return n * factorial(n - 1);
}
}

int f(int x)
{
if (x <= 1)
{
return 1;
}
else
{
int x2 = x - 1;
if (x2 <= 1)
{
return x * 1;
}
else
{
int x3 = x2 - 1;
if (x3 <= 1)
{
return x * x2 * 1;
}
else
{
return x * x2 * x3 * factorial(x3 - 1);
}
}
}
}

In this case, we've basically inlined the function 3 times. Some compilers do perform this optimization. I recall MSVC++ having a setting to tune the level of inlining that would be performed on recursive functions (up to 20, I believe).

What happens if we make recursive functions as inline?

"inline" isn't a guarantee, it's a request.

Your recursive inline function won't (typically) be inline.

  • As some commenters point out, there are special cases (e.g. using compiler-specific pragmas) in which inlining is possible.

Inline functions cant contain recursion, go to loops etc?

This code is perfectly valid :

class Test 
{
public:
inline void sayHello(int i);
};

void Test::sayHello(int i)
{
std::cout << "Hello "<< i << std::endl;
if (i>0)
sayHello(i-1);
}

int main(int argc, char** argv) {
Test t;
t.sayHello(10);
}

.. and recurses as expected.

(GCC 4.6.3)

In this case, the function is not inlined because inlining is just a hint for the compiler. Not a requirement.

As suggester in the above comments, you will find a more complete answer here: Can a recursive function be inline?

Inlining of a recursive function

Here’s the solution I’ve found, thanks to grek40’s comment and to StoryTeller’s answer.

(As for my previous problem with the unused function template instance left in the compiled binary, I solved it by compiling the original code — without the gnu::always_inline and gnu::flatten attributes — with the arguments -ffunction-sections -fdata-sections -Wl,--gc-sections.)

Now reflect_mask_helper_0 is inside a struct (because C++ doesn’t allow partial specialization of function templates), and the i parameter of the function became the Index parameter of the struct template.

#include <iostream>
#include <limits.h>

// End recursive template-expansion of function select below.
template <typename Type>
static inline constexpr Type select(unsigned index)
{ return Type(); }

// Select one of the items passed to it.
// e.g. select(0, a, b, c) = a; select(1, a, b, c) = b; etc.
template <typename Type, typename... Params>
[[gnu::always_inline]]
static inline constexpr Type select(unsigned index, Type value, Params... values)
{ return index == 0 ? value : select<Type>(index - 1, values...); }

template <typename Type>
[[gnu::always_inline]]
static inline constexpr Type reflect_mask_helper_1(Type mask, Type shift, Type value)
{ return ((value & mask) >> shift) | ((value << shift) & mask); }

template <typename Type, unsigned Index>
struct reflect_mask_helper_0
{
[[gnu::always_inline]]
static inline constexpr Type invoke(Type value)
{
return reflect_mask_helper_0<Type, Index - 1>::call(
reflect_mask_helper_1<Type>(
static_cast<Type>(select(Index - 1,
0xCan a Recursive Function Be InlineCan a Recursive Function Be Inline, 0xcccccccccccccccc, 0xf0f0f0f0f0f0f0f0,
0xff00ff00ff00ff00, 0xffff0000ffff0000, 0xffffffff00000000)),
1 << (Index - 1),
value));
}
};

template <typename Type>
struct reflect_mask_helper_0<Type, 0>
{
[[gnu::always_inline]]
static inline constexpr Type invoke(Type value) { return value; }
};

template <typename Type>
static inline constexpr Type reflect_mask(Type value)
{ return reflect_mask_helper_0<Type, __builtin_ctz(sizeof(Type) * CHAR_BIT)>::invoke(value); }

int main(void) {
for (int i = 0; i < 65536; i++) {
std::cout << reflect_mask<uint16_t>(i) << std::endl;
}
}

Inline recursive functions

As @HolyBlackCat mentioned - inline is only a hint for a compiler. Compiler ignores it in a lot of cases.

As for your test - your recursive code causes a crash because of stack overflow error - every function call reserves a memory block on the stack and your program runs out of memory.

Is there any way to inline a recursive function?

No, but you can use better functions. I'm not talking about V.map (+64), which would make things certainly a lot faster, but about nTimes. We have three candidates that already do what nTimes does:

{-# INLINE nTimesFoldr #-}
nTimesFoldr :: Int -> (a -> a) -> a -> a
nTimesFoldr n f x = foldr (.) id (replicate n f) $ x

{-# INLINE nTimesIterate #-}
nTimesIterate :: Int -> (a -> a) -> a -> a
nTimesIterate n f x = iterate f x !! n

{-# INLINE nTimesTail #-}
nTimesTail :: Int -> (a -> a) -> a -> a
nTimesTail n f = go n
where
{-# INLINE go #-}
go n x | n <= 0 = x
go n x = go (n - 1) (f x)

All versions take around 8 seconds, compared to the 40 seconds your versions take. Joachim's version also takes 8 seconds, by the way. Note that the iterate version takes more memory on my system. While there is an unroll plugin for GHC, it hasn't been updated in the last five years (it uses custom ANNotations).

No unroll at all?

However, before we despair, how well does GHC actually try to inline everything? Let's use nTimesTail and nTimes 1:

module Main where
import qualified Data.Vector.Unboxed as V

{-# INLINE incAll #-}
incAll :: V.Vector Int -> V.Vector Int
incAll = V.map (+ 1)

{-# INLINE nTimes #-}
nTimes :: Int -> (a -> a) -> a -> a
nTimes n f = go n
where
{-# INLINE go #-}
go n x | n <= 0 = x
go n x = go (n - 1) (f x)

main :: IO ()
main = do
let size = 100000000 :: Int
let array = V.replicate size 0 :: V.Vector Int
print $ V.sum (nTimes 1 incAll array)
$ stack ghc --package vector -- -O2 -ddump-simpl -dsuppress-all SO.hs
main2 =
case (runSTRep main3) `cast` ...
of _ { Vector ww1_s9vw ww2_s9vx ww3_s9vy ->
case $wgo 1 ww1_s9vw ww2_s9vx ww3_s9vy
of _ { (# ww5_s9w3, ww6_s9w4, ww7_s9w5 #) ->

We can stop right there. $wgo is the go defined above. Even with 1 GHC does not unroll the loop. It's disturbing since 1 is a constant.

Templates to the rescue

But alas, it's not all for naught. If C++ programmers are able to do the following for compile time constants, so should we, right?

template <int N>
struct Call{
template <class F, class T>
static T call(F f, T && t){
return f(Call<N-1>::call(f,std::forward<T>(t)));
}
};
template <>
struct Call<0>{
template <class F, class T>
static T call(F f, T && t){
return t;
}
};

And sure enough, we can, with TemplateHaskell*:

-- Times.sh
{-# LANGUAGE TemplateHaskell #-}
module Times where

import Control.Monad (when)
import Language.Haskell.TH

nTimesTH :: Int -> Q Exp
nTimesTH n = do
f <- newName "f"
x <- newName "x"

when (n <= 0) (reportWarning "nTimesTH: argument non-positive")

let go k | k <= 0 = VarE x
go k = AppE (VarE f) (go (k - 1))
return $ LamE [VarP f,VarP x] (go n)

What does nTimesTH do? It creates a new function where the first name f is going to be applied to the second name x for a total of n times. n now needs to be a compile-time constant, which suits us, since loop-unrolling is only possible with compile-time constants:

$(nTimesTH 0) = \f x -> x
$(nTimesTH 1) = \f x -> f x
$(nTimesTH 2) = \f x -> f (f x)
$(nTimesTH 3) = \f x -> f (f (f x))
...

Does it work? And is it fast? How fast compared to nTimes? Let's try another main for that:

-- SO.hs
{-# LANGUAGE TemplateHaskell #-}
module Main where
import Times
import qualified Data.Vector.Unboxed as V

{-# INLINE incAll #-}
incAll :: V.Vector Int -> V.Vector Int
incAll = V.map (+ 1)

{-# INLINE nTimes #-}
nTimes :: Int -> (a -> a) -> a -> a
nTimes n f = go n
where
{-# INLINE go #-}
go n x | n <= 0 = x
go n x = go (n - 1) (f x)

main :: IO ()
main = do
let size = 100000000 :: Int
let array = V.replicate size 0 :: V.Vector Int
let vTH = V.sum ($(nTimesTH 64) incAll array)
let vNorm = V.sum (nTimes 64 incAll array)
print $ vTH == vNorm
stack ghc --package vector -- -O2 SO.hs && SO.exe +RTS -t
True
<<ghc: 52000056768 bytes, 66 GCs, 400034700/800026736 avg/max bytes residency (2 samples), 1527M in use, 0.000 INIT (0.000 elapsed), 8.875 MUT (9.119 elapsed), 0.000 GC (0.094 elapsed) :ghc>>

It yields the correct result. How fast is it? Let's use another main again:

main :: IO ()
main = do
let size = 100000000 :: Int
let array = V.replicate size 0 :: V.Vector Int
print $ V.sum ($(nTimesTH 64) incAll array)
     800,048,112 bytes allocated in the heap                                         
4,352 bytes copied during GC
42,664 bytes maximum residency (1 sample(s))
18,776 bytes maximum slop
764 MB total memory in use (0 MB lost due to fragmentation)

Tot time (elapsed) Avg pause Max pause
Gen 0 1 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.000s 0.049s 0.0488s 0.0488s

INIT time 0.000s ( 0.000s elapsed)
MUT time 0.172s ( 0.221s elapsed)
GC time 0.000s ( 0.049s elapsed)
EXIT time 0.000s ( 0.049s elapsed)
Total time 0.188s ( 0.319s elapsed)

%GC time 0.0% (15.3% elapsed)

Alloc rate 4,654,825,378 bytes per MUT second

Productivity 100.0% of total user, 58.7% of total elapsed

Well, compare that to the 8s. So for a TL;DR: if you have compile-time constants, and you want to create and/or modify your code based on that constants, consider Template Haskell.

* Please note that this is my first Template Haskell code I've ever written. Use with care. Don't use too large n, or you might end up with a messed up function.



Related Topics



Leave a reply



Submit