Why Doesn't .NET/C# Optimize for Tail-Call Recursion

Why doesn't .NET/C# optimize for tail-call recursion?

JIT compilation is a tricky balancing act between not spending too much time doing the compilation phase (thus slowing down short lived applications considerably) vs. not doing enough analysis to keep the application competitive in the long term with a standard ahead-of-time compilation.

Interestingly, the NGen compilation step is not targeted at being more aggressive in its optimizations. I suspect this is because they simply don't want bugs where the behaviour depends on whether the JIT or NGen was responsible for the machine code.

The CLR itself does support tail call optimization, but the language-specific compiler must know how to generate the relevant opcode and the JIT must be willing to respect it.
F#'s fsc will generate the relevant opcodes (though for a simple recursion it may just convert the whole thing into a while loop directly). C#'s csc does not.

See this blog post for some details (quite possibly now out of date given recent JIT changes). Note that, with the CLR changes in 4.0, the x86, x64 and ia64 JITs will respect it.

C# compilation with tail recursive optimization?

There are different levels at which tail call optimization can be supported. The JIT is really responsible for most of the optimizations in any .NET program; the C# compiler itself doesn't even do method inlining, that is the JIT compiler's responsibility. The C# compiler could use the tail. IL prefix to designate a call as a tail call, but I believe no version of the C# compiler does this. The JIT compiler is permitted to make tail call optimizations whenever it sees fit; in particular, I believe only the 64-bit JIT does this. This blog post outlines a number of scenarios in which JIT64 cannot use tail call optimization. The criteria may well be subject to change, since a rewrite of the JIT compiler, codenamed RyuJIT, is in the works.

If you want a short example of a program that can use TCO try this:

using System;

class Program
{
    static void Main(string[] args)
    {
        Test(1);
    }

    private static void Test(int i)
    {
        Console.WriteLine(i);
        Test(i + 1);
    }
}

Set the project to build Release/x64 (or AnyCPU without "Prefer 32-bit") and start without the debugger attached. The program will run forever. If I do not do all of those things, I get a StackOverflowException at around i = 20947.

Logic of implementing Recursion and Tail Recursion

Recursion is a function calling itself. Tail recursion is a function calling itself in such a way that the call to itself is the last operation it performs. This is significant because a compiler that supports tail recursion can either eliminate the stack frame before the recursive call, or even transform it into a loop. In either case the accumulation of stack frames due to the repeated calls is prevented, which eliminates the possibility of overflowing the stack.
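To make that concrete, here is a sketch in C# of what the transformation amounts to: a tail-recursive countdown and the loop a tail-call-eliminating compiler could effectively turn it into (the names here are hypothetical, for illustration only):

// Tail-recursive: the self-call is the last operation performed.
static int CountDown(int n) => n <= 0 ? 0 : CountDown(n - 1);

// What tail-call elimination effectively produces: the call becomes a
// jump back to the top of the method, reusing the same stack frame.
static int CountDownLoop(int n)
{
    while (n > 0)
        n = n - 1;   // the recursive call's argument becomes the new n
    return 0;
}

No stack frame accumulates in the second version, which is why the overflow disappears.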

That said, your recursive sum() function works but is far from efficient: it creates new arrays for every step. There is a far easier way to calculate the sum recursively:

static int sum(int[] array)
{
    return sum(array, array.Length - 1);
}

static int sum(int[] array, int index)
{
    return array[index] + (index == 0 ? 0 :
        sum(array, index - 1));
}

The first sum() function calls the helper function with appropriate parameters, and the helper function only needs to adjust the index provided. This is done here using a ternary expression for brevity: it keeps the function as essentially a one-liner, and it illustrates clearly that the recursive call is not the last operation before returning (the addition is).

To transform this calculation into a tail-recursive one, we need to add a parameter for the intermediate result:

static int sum(int[] array)
{
    return sum(array, array.Length - 1, 0);
}

static int sum(int[] array, int index, int res)
{
    return index < 0 ? res :
        sum(array, index - 1, res + array[index]);
}

Here the addition was moved to occur before the recursive call, and the call is clearly the last operation, making the function tail-recursive.
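For comparison, this is roughly the loop a compiler that eliminates the tail call would produce (a sketch, not actual compiler output):

static int sumLoop(int[] array)
{
    int index = array.Length - 1;
    int res = 0;
    while (index >= 0)             // mirrors the index < 0 ? res : ... test
    {
        res = res + array[index];  // the accumulator update from the recursive call
        index = index - 1;
    }
    return res;
}

The accumulator parameter of the tail-recursive version becomes an ordinary local variable, and the recursive call becomes the next loop iteration.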

GCC doesn't tail call optimize recursive function

As it has been pointed out by @BillLynch in the comments - it's clang which doesn't optimize the function for some reason, not GCC. On my PC GCC 10.2.0 optimizes the function properly, so there's no problem here.

Why does the F# compiler not create a tail call for this function?

In fact, you have already answered your own question. Quoting the comment from the source code,

// Throw away the unit result

is the pending operation after the call, preventing the compiler from using a tail call here.

There's a great blog post by Keith Battocchi, "Tail calls in F#" (scroll to the section "Limitations / Calling function values returning unit"), which explores the details.

In short:

Normally, F# functions … -> unit are compiled to .NET methods returning void.

However, functions treated as values (e.g., those passed as arguments to higher-order functions) are stored in objects of type ('a->'b) so they actually return Microsoft.FSharp.Core.Unit, not void.

The compiler needs to pop the dummy unit value off of the stack before returning.

Therefore, there is a pending operation after the recursive call, and therefore the compiler can't optimize it to a tail call.
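In IL terms, the call site comes out roughly like this (a sketch based on the explanation above, not exact compiler output):

callvirt instance !1 class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<int32, Unit>::Invoke(!0)
pop   // throw away the unit result -- this pending pop prevents a tail call
ret

Because the pop sits between the callvirt and the ret, the call is not in tail position as far as the JIT is concerned.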

Good news:

Note that this issue only arises when using first-class functions as values. Calling normal .NET methods which return void doesn’t present this problem, because there is no return value to pop off of the stack.

Is this tail recursive and could it cause a stack overflow? C# .Net

As usr explains in his answer, ReSharper tries to find known patterns in your code it can offer refactorings for.

But let's take a look at the generated code for both cases.


The first function:

private static string GetLineA()
{
    string s = Console.ReadLine();

    if (s == null)
    {
        return GetLineA();
    }
    return s;
}

Gives this (x64, Release):

00007FFB34AC43EE  add   byte ptr [rax],al
00007FFB34AC43F0  sub   rsp,28h
00007FFB34AC43F4  call  00007FFB8E56F530   // <-- Console.ReadLine
00007FFB34AC43F9  test  rax,rax
00007FFB34AC43FC  jne   00007FFB34AC440F
00007FFB34AC43FE  mov   rax,7FFB34AC0F60h
00007FFB34AC4408  add   rsp,28h
00007FFB34AC440C  jmp   rax
00007FFB34AC440F  add   rsp,28h
00007FFB34AC4413  ret

You can clearly see the tail call was optimized: the recursive call has become jmp rax, and the only call instruction left is for Console.ReadLine.


The second version:

private static string GetLineB()
{
    string s = Console.ReadLine();

    if (s == null)
    {
        s = GetLineB();
    }
    return s;
}

Gives this:

00007FFB34AC44CE  add   byte ptr [rax],al
00007FFB34AC44D0  sub   rsp,28h
00007FFB34AC44D4  call  00007FFB8E56F530   // <-- Console.ReadLine
00007FFB34AC44D9  test  rax,rax
00007FFB34AC44DC  jne   00007FFB34AC44E3
00007FFB34AC44DE  call  00007FFB34AC0F68   // <-- Not good.
00007FFB34AC44E3  nop
00007FFB34AC44E4  add   rsp,28h
00007FFB34AC44E8  ret

There's a second call there, so you don't get tail recursion, and the stack will grow, eventually leading to a stack overflow if it grows large enough.

Well, it looks like the JIT didn't optimize the code into a tail recursive call.


Anyway, beware, since you're at the mercy of the JIT.

Here's GetLineA in x86:

00F32DCA  in    al,dx
00F32DCB  call  72A209DC                   // <-- Console.ReadLine
00F32DD0  test  eax,eax
00F32DD2  jne   00F32DDC
00F32DD4  call  dword ptr ds:[12B8E94h]    // <-- Ouch
00F32DDA  pop   ebp
00F32DDB  ret
00F32DDC  pop   ebp
00F32DDD  ret

See? You can't really depend on it, and the language provides no guarantee.

Generate tail call opcode

The C# compiler does not give you any guarantees about tail-call optimization, because C# programs usually use loops and so do not rely on it. In C#, tail calling is simply a JIT optimization that may or may not happen, and you cannot rely on it.

The F# compiler is designed to handle functional code that uses recursion, and so it does give you certain guarantees about tail calls. This is done in two ways:

  • if you write a recursive function that calls itself (like your fib), the compiler turns it into a function that uses a loop in the body (this is a simple optimization, and the produced code is faster than using a tail call)

  • if you use a recursive call in a more complex position (e.g., when using continuation passing style, where a function is passed as an argument), the compiler generates a tail. instruction that tells the JIT that it must use a tail call.

As an example of the second case, compile the following simple F# function (F# does not do this in Debug mode to simplify debugging, so you may need Release mode or add --tailcalls+):

let foo a cont = cont (a + 1)

The function simply calls the function cont with the first argument incremented by one. In continuation passing style you have a long sequence of such calls, so the optimization is crucial (you simply cannot use this style without some handling of tail calls). The generated IL code looks like this:

IL_0000: ldarg.1
IL_0001: ldarg.0
IL_0002: ldc.i4.1
IL_0003: add
IL_0004: tail.    // Here is the 'tail.' prefix!
IL_0006: callvirt instance !1 class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<int32, !!a>::Invoke(!0)
IL_000b: ret
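For comparison, a C# analogue of the same function (a sketch, with hypothetical names) compiles to a plain callvirt with no tail. prefix, since csc never emits one:

using System;

class Demo
{
    // Same shape as the F# foo: calls the continuation as its last operation.
    // csc emits callvirt ...FSharpFunc-style Invoke(...) without the tail.
    // prefix, so any tail call here is purely at the JIT's discretion.
    static int Foo(int a, Func<int, int> cont) => cont(a + 1);
}

This is the practical difference between the two compilers: F# marks the call as a mandatory tail call, C# leaves it to the JIT.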

Why should iteration be used instead of tail recursion?

Because people mistakenly care more about micro-optimizations than clear and readable code.

Why does tail call optimization need an op code?

Following the links you already provided, this is the part which, it seems to me, answers your question most closely:


Source

The CLR and tail calls

When you're dealing with languages managed by the CLR, there are two kinds of compilers in play. There's the compiler that goes from your language's source code down to IL (C# developers know this as csc.exe), and then there's the compiler that goes from IL to native code (the JIT 32/64 bit compilers that are invoked at run time or NGEN time). Both the source->IL and IL->native compilers understand the tail call optimization. But the IL->native compiler--which I'll just refer to as JIT--has the final say on whether the tail call optimization will ultimately be used. The source->IL compiler can help to generate IL that is conducive to making tail calls, including the use of the "tail." IL prefix (more on that later). In this way, the source->IL compiler can structure the IL it generates to persuade the JIT into making a tail call. But the JIT always has the option to do whatever it wants.

When does the JIT make tail calls?

I asked Fei Chen and Grant Richins, neighbors down the hall from me who happen to work on the JIT, under what conditions the various JITs will employ the tail call optimization. The full answer is rather detailed. The quick summary is that the JITs try to use the tail call optimization whenever they can, but there are lots of reasons why the tail call optimization can't be used. Some reasons why tail calling is a non-option:

  • Caller doesn't return immediately after the call (duh :-))
  • Stack arguments between caller and callee are incompatible in a way that would require shifting things around in the caller's frame before the callee could execute
  • Caller and callee return different types
  • We inline the call instead (inlining is way better than tail calling, and opens the door to many more optimizations)
  • Security gets in the way
  • The debugger / profiler turned off JIT optimizations
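As a hedged illustration of the "different return types" bullet, a call like the following could not become a tail call even though it sits in tail position syntactically (a hypothetical example, not from the blog post):

static long Callee(int x) => x;

static int Caller(int x)
{
    // The call is the last thing Caller does, but Caller returns int while
    // Callee returns long. The narrowing conversion is a pending operation
    // after the call, so the JIT cannot simply jump to Callee reusing the frame.
    return (int)Callee(x);
}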

The most interesting part in the context of your question is the security example mentioned above: security in .NET in many cases depends on the stack being accurate at runtime. That is why, as highlighted above, the burden is shared by both the source-to-CIL compiler and the (runtime) CIL-to-native JIT compiler, with the final say resting with the latter.


