Which, If Any, C++ Compilers Do Tail-Recursion Optimization

Which, if any, C++ compilers do tail-recursion optimization?

All current mainstream compilers perform tail call optimisation fairly well (and have done for more than a decade), even for mutually recursive calls such as:

int bar(int, int);

int foo(int n, int acc) {
    return (n == 0) ? acc : bar(n - 1, acc + 2);
}

int bar(int n, int acc) {
    return (n == 0) ? acc : foo(n - 1, acc + 1);
}

Letting the compiler do the optimisation is straightforward: Just switch on optimisation for speed:

For MSVC, use /O2 or /Ox.
For GCC, Clang and ICC, use -O3

An easy way to check if the compiler did the optimisation is to perform a call that would otherwise result in a stack overflow — or looking at the assembly output.

As an interesting historical note, tail call optimisation for C was added to the GCC in the course of a diploma thesis by Mark Probst. The thesis describes some interesting caveats in the implementation. It's worth reading.

C tail call optimization

Statements like "C doesn't perform tail call elimination" make no sense. As you correctly noted yourself, things like this depend entirely on the implementation. And yes, any decent implementation can easily turn tail-recursion into [an equivalent of] a cycle. Of course, C compilers do not normally give any guarantees about what optimizations will and what optimizations will not happen in each particular piece of code. You have to compile it and see for yourself.

Is Tail recursive really powerful on C language?

The C standard does not specify tail recursion as part of the language at all. The only thing the C99 standard says about recursive function calls is:

6.5.2.2 Function Calls
11 Recursive function calls shall be permitted, both directly and indirectly through any chain
of other functions.

If a compiler can detect tail recursion and translate a recursive call as tail recursion, it can but it is not required to do so by the standard.

What is tail call optimization?

Tail-call optimization is where you are able to avoid allocating a new stack frame for a function because the calling function will simply return the value that it gets from the called function. The most common use is tail-recursion, where a recursive function written to take advantage of tail-call optimization can use constant stack space.

Scheme is one of the few programming languages that guarantee in the spec that any implementation must provide this optimization, so here are two examples of the factorial function in Scheme:

(define (fact x)
  (if (= x 0) 1
      (* x (fact (- x 1)))))

(define (fact x)
  (define (fact-tail x accum)
    (if (= x 0) accum
        (fact-tail (- x 1) (* x accum))))
  (fact-tail x 1))

The first function is not tail recursive because when the recursive call is made, the function needs to keep track of the multiplication it needs to do with the result after the call returns. As such, the stack looks as follows:

(fact 3)
(* 3 (fact 2))
(* 3 (* 2 (fact 1)))
(* 3 (* 2 (* 1 (fact 0))))
(* 3 (* 2 (* 1 1)))
(* 3 (* 2 1))
(* 3 2)
6

In contrast, the stack trace for the tail recursive factorial looks as follows:

(fact 3)
(fact-tail 3 1)
(fact-tail 2 3)
(fact-tail 1 6)
(fact-tail 0 6)
6

As you can see, we only need to keep track of the same amount of data for every call to fact-tail because we are simply returning the value we get right through to the top. This means that even if I were to call (fact 1000000), I need only the same amount of space as (fact 3). This is not the case with the non-tail-recursive fact, and as such large values may cause a stack overflow.

Is tail call optimization applicable to this function?

Is tail recursion optimization applicable to this function?

Yes, the tail recursion optimization is applicable in your examples. Please look at assembler https://godbolt.org/g/cSpUZw for the second sample. The more progressive optimization applied. Recursion is replaced with a loop.

bar(int):
  cmp edi, 1
  jg .L12
  jmp .L6
.L15:
  sar edi
  cmp edi, 1
  je .L14
.L12:
  test dil, 1
  jne .L15
  lea edi, [rdi+1+rdi*2]
  jmp foo(int)
.L14:
  mov eax, 1
  ret
.L6:
  mov eax, edi
  ret

Tail recursion in C++

A simple tail recursive function:

unsigned int f( unsigned int a ) {
   if ( a == 0 ) {
      return a;
   }
   return f( a - 1 );   // tail recursion
}

Tail recursion is basically when:

there is only a single recursive call
that call is the last statement in the function

And it's not "better", except in the sense that a good compiler can remove the recursion, transforming it into a loop. This may be faster and will certainly save on stack usage. The GCC compiler can do this optimisation.

Why languages such as C, Pascal cannot implement tail recursion?

The difficulty with tail recursion in these languages is that their implementations use the stack to store procedure arguments and local variables as well as return addresses.

In machine language, there are no function calls, so function calls are translated into

Push the arguments into the call stack
push the return address into the stack
goto the body of the subroutine you want to call
when the subroutine exits, it goes back to the return address we pushed to the stack earlier
arguments get removed from the stack

Now there are two basic variations of this "calling convention" pattern, having to do with who is responsible for step 5 (removing function arguments from the stack). In C, the calling convention is that the function caller is responsible for cleaning the stack. This allows for variadic functions like printf (in these cases only the caller knows the correct number of arguments to pop after the call completes) but means that you can't do tail call optimization since "returning" is not the last thing a function does (you still need to clean the stack afterwards). On the other hand, if your calling convention is to have the function itself clean up the stack then you lose the ability to have variadic functions but can have TCO.

How 'smart' is GCC's Tail-Call-Optimisation?

Just disassemble the code and see what happened. Without optimizations, I get this:

0x0040150B  cmpl   $0x0,0x10(%rbp)
0x0040150F  jle    0x401523 <recursive+35>
0x00401511  mov    0x10(%rbp),%eax
0x00401514  sub    $0x1,%eax
0x00401517  mov    %eax,%ecx
0x00401519  callq  0x401500 <recursive>

But with -O1, -O2 or -O3 I get this:

0x00402D09  mov    $0x2ffff,%edx

This doesn't have anything to do with tail optimizations, but much more radical optimizations. The compiler simply inlined the whole function and pre-calculated the result.

This is likely why you end up with the same result in all your different cases of benchmarking.

How do I tell if a function is tail recursive C?

For a function to be tail recursive, the recursive call must be its last action.
In your code, the last thing the function does is adding two values, so list_size_count() is not tail recursive.

int list_size_count(link_t *first, int acc)
{
  if (first != NULL)
  {
    return (acc + list_size_count((first->next), 1));
  }
  return acc;
}

Here is how to make it tail recursive:

int list_size_count(link_t *first, int acc)
{
  if(first != NULL)
  {
    return list_size_count(first->next, acc + 1);
  }

  return acc;
}

Which, If Any, C++ Compilers Do Tail-Recursion Optimization