Does Ruby Perform Tail Call Optimization

Does Ruby perform Tail Call Optimization?

No, Ruby doesn't perform TCO. However, it also doesn't not perform TCO.

The Ruby Language Specification doesn't say anything about TCO. It doesn't say you have to do it, but it also doesn't say you can't do it. You just can't rely on it.

This is unlike Scheme, where the Language Specification requires that all Implementations must perform TCO. But it is also unlike Python, where Guido van Rossum has made it very clear on multiple occasions (the last time just a couple of days ago) that Python Implementations should not perform TCO.

Yukihiro Matsumoto is sympathetic to TCO; he just doesn't want to force all Implementations to support it. Unfortunately, this means that you cannot rely on TCO, or if you do, your code will no longer be portable to other Ruby Implementations.

So, some Ruby Implementations perform TCO, but most don't. YARV, for example, supports TCO, although (for the moment) you have to explicitly uncomment a line in the source code and recompile the VM to activate TCO – in future versions it is going to be on by default, after the implementation proves stable. The Parrot Virtual Machine supports TCO natively, therefore Cardinal could quite easily support it, too. The CLR has some support for TCO, which means that IronRuby and Ruby.NET could probably do it. Rubinius could probably do it, too.
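For what it's worth, later MRI/YARV releases expose TCO as a compile option at runtime, so no recompilation of the VM is needed. The option names below are the real ones, but the feature remains unofficial and could change; a sketch:

```ruby
# Assumes MRI/YARV (1.9+). TCO only applies to code compiled *after*
# the option is set, hence the explicit InstructionSequence step.
RubyVM::InstructionSequence.compile_option = {
  tailcall_optimization: true,
  trace_instruction: false   # TCO requires instruction tracing to be off
}

RubyVM::InstructionSequence.new(<<~RUBY).eval
  def countdown(n)
    return :done if n.zero?
    countdown(n - 1)   # tail position: the frame is reused when TCO is on
  end
RUBY

countdown(1_000_000)  # => :done; would raise SystemStackError without TCO
```

Note that methods defined normally (outside the explicitly compiled string) are not affected unless the option was set before they were parsed.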

But JRuby and XRuby don't support TCO, and they probably won't, unless the JVM itself gains support for TCO. The problem is this: if you want to have a fast implementation, and fast and seamless integration with Java, then you should be stack-compatible with Java and use the JVM's stack as much as possible. You can quite easily implement TCO with trampolines or explicit continuation-passing style, but then you are no longer using the JVM stack, which means that every time you want to call into Java or call from Java into Ruby, you have to perform some kind of conversion, which is slow. So, XRuby and JRuby chose to go with speed and Java integration over TCO and continuations (which basically have the same problem).

This applies to all implementations of Ruby that want to tightly integrate with some host platform that doesn't support TCO natively. For example, I guess MacRuby is going to have the same problem.
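The trampoline technique mentioned above can be sketched in a few lines of Ruby (the `sum_to` example is invented here, purely for illustration): instead of making the tail call directly, the function returns a lambda, and a driver loop "bounces" until a non-lambda value appears, so the native stack never grows.

```ruby
# Trampoline sketch (illustrative, not from any actual implementation):
# a tail call is replaced by returning a lambda; the driver loop below
# keeps calling until a plain value comes back, so the stack stays flat.
def trampoline(result)
  result = result.call while result.is_a?(Proc)
  result
end

def sum_to(n, acc = 0)
  return acc if n.zero?
  -> { sum_to(n - 1, acc + n) }  # "bounce" instead of recursing
end

trampoline(sum_to(100_000))  # => 5000050000, no SystemStackError
```

The downside is exactly the one described above: a trampolined function no longer uses the native stack, so every boundary crossing to code that does would need a conversion.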

Is Tail Call optimization all that explains this performance difference

Tail recursion is not the difference here. In fact, Ruby does not do anything to optimize tail calls.

The difference is that the naive algorithm recursively calls itself twice each time it's called, giving O(2^n) performance, which means the runtime goes up exponentially as n increases. The tail-call version runs in linear time.
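Assuming the algorithm in question is Fibonacci-like, the two shapes look like this in Ruby (the names are mine):

```ruby
# Naive shape: two recursive calls per invocation -> O(2^n) time.
def fib_naive(n)
  n < 2 ? n : fib_naive(n - 1) + fib_naive(n - 2)
end

# Accumulator ("tail-call") shape: one call per step -> O(n) time.
# Note Ruby still won't reuse the stack frame, so very large n can
# overflow the stack even though the running time is linear.
def fib_acc(n, a = 0, b = 1)
  n.zero? ? a : fib_acc(n - 1, b, a + b)
end

fib_acc(25)  # => 75025, effectively instant; fib_naive(25) is already slow
```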

What's the right way to implement very deep recursion in Ruby in a *clean* way?

I came up with this solution. It is still far from perfect, but it seems to be good enough in the absence of better ideas. It basically splits the function at the point of the recursive call and postpones, via blocks, any calculations that need to be done afterwards:

def rcall(*args, &block)
  cs = [nil]   # call stack; the nil sentinel marks the bottom
  rec = false
  rcaller = proc do |*pargs, &pblock|
    # Enqueue the continuation and return control to rcall
    rec = true   # we *are* doing an rcall
    cs << pblock
    pargs
  end
  result = args
  until cs.empty?
    rec = false

    result = block.call(rcaller, *result)
    while !rec && !cs.empty?
      # we got a result! Feed it back through the pending continuations
      lastblock = cs.pop
      result = lastblock.call(*result) unless lastblock.nil?
    end
  end
  result
end

Usage:

puts (rcall 100 do |rcaller, i|
  if i == 1
    1
  else
    rcaller.(i - 1) { |i2| i2 + i }
  end
end)
# ==> 5050

This is even slower than an explicit reimplementation of the call stack, but it looks much cleaner. And if no post-call calculations are required, it looks even better, similar to a simple tail call:

puts (rcall(100, []) do |rcaller, i, acc|
  if i == 1
    [1, *acc]
  else
    rcaller.(i - 1, [i, *acc])
  end
end).join(', ')
# ==> 1, 2, 3, ..., 99, 100

Does Dart feature Tail Call Optimization (TCO)?

Dart does not support tail-call optimization. There are no current plans to add it.

The primary reason is that TCO is a feature you have to be able to rely on in order to use at all; otherwise you get hugely inefficient code that may overflow the stack. And since JavaScript currently does not support tail call optimization, the feature cannot be compiled efficiently to JavaScript.

Recursion and Tail Call Optimization Example

sum([head | tail]) is head + sum(tail), so sum([12,3,4]) is 12 + sum([3,4]), sum([3,4]) is 3 + sum([4]) and sum([4]) is 4 + sum([]). sum([]) is 0, so in total we get:

sum([12,3,4]) = 12 + sum([3,4])
              = 12 + 3 + sum([4])
              = 12 + 3 + 4 + sum([])
              = 12 + 3 + 4 + 0
              = 19

The recursive call to sum isn't a tail call, so no tail-call optimization will happen here. To make it eligible for TCO, the recursive call to sum must be the last operation:

defmodule ListHelper do
  def sum([], acc), do: acc
  def sum([head | tail], acc),
    do: sum(tail, acc + head)
end

ListHelper.sum([12,3,4], 0)
#⇒ 19

Does tail-call optimization apply to calls other than recursive calls?

That's going to be implementation- and compiler-dependent, but the fact is, it could be used for any tail call, not only recursive ones.

Eliminating the tail call can be done just as easily when the callee is a different method, not the method itself.

It is especially beneficial for mutually recursive calls:

f(n):
    // some code
    g(n)

g(n):
    // some more code
    f(n-1)

The question is what and how to optimize: should we just "cancel" g, and make f recursive?

Fortunately, this problem is relatively simple and is solvable with simple graph algorithms, thanks to the fact that if each method is a node, it has at most one outgoing edge (the single tail call). This means the graph is really a set of chains that, in the "worst case", form a simple cycle (a single cycle in each connected component), which is easy to handle.
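As a concrete (invented) illustration, consider a mutually tail-recursive even/odd pair in Ruby, and the single loop a TCO pass could collapse the cycle into:

```ruby
# A mutual tail-call cycle: even_rec? -> odd_rec? -> even_rec? -> ...
# In plain Ruby, deep n overflows the stack.
def even_rec?(n)
  n.zero? ? true : odd_rec?(n - 1)
end

def odd_rec?(n)
  n.zero? ? false : even_rec?(n - 1)
end

# What "cancelling" the cycle could produce: one loop that tracks which
# function of the cycle we are currently "in" (each tail call crosses over).
def even_loop?(n)
  in_even = true
  until n.zero?
    in_even = !in_even
    n -= 1
  end
  in_even
end

even_loop?(1_000_000)  # => true, constant stack space
```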

Does Frege perform tail call optimization?

Frege does Tail Recursion Optimization by simply generating while loops.

General tail calls are handled "by the way" through laziness. If the compiler sees a tail call to a function that is known to be (indirectly) recursive, a lazy result (a thunk) is returned. Thus, the real burden of calling that function lies with the caller. This way, stacks whose depth depends on the data are avoided.

That being said, the static stack depth is by nature deeper in a functional language than in Java. Hence, some programs will need to be given a bigger stack (e.g., with -Xss1m).

There are pathological cases, where big thunks are built and, when they are evaluated, a stack overflow will happen. A notorious example is the foldl function (the same problem as in Haskell). Hence, the standard left fold in Frege is fold, which is tail recursive and strict in the accumulator and thus works in constant stack space (like Haskell's foldl').

The following program should not stack overflow, but print "false" after 2 or 3 seconds:

module Test
    -- inline (odd)
    where

even 0 = true
even 1 = false
even n = odd (pred n)

odd n = even (pred n)

main args = println (even 123_456_789)

This works as follows: println must have a value to print, so it tries to evaluate (even n). But all it gets is a thunk for (odd (pred n)). Hence it tries to evaluate this thunk, which yields another thunk for (even (pred (pred n))). even must evaluate (pred (pred n)) to see whether the argument was 0 or 1 before returning yet another thunk, (odd (pred (n-2))), where n-2 is already evaluated.
This way, all the calling (at the JVM level) is done from within println. At no time does even actually invoke odd, or vice versa.

If one uncomments the inline directive, one gets a tail recursive version of even, and the result is obtained ten times faster.

Needless to say, this clumsy algorithm is only for demonstration - normally one would check for even-ness with a bit operation.

Here is another version, that is pathological and will stack overflow:

even 0 = true
even 1 = false
even n = not . odd $ n
odd = even . pred

The problem here is that not is the tail call, and it is strict in its argument (i.e., to negate something, you must first have that something). Hence, when even n is computed, not must fully evaluate odd n, which, in turn, must fully evaluate even (pred n), and thus it will take 2*n stack frames.

Unfortunately, this is not going to change, even if the JVM should gain proper tail calls one day. The reason is the recursion in the argument of a strict function.
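The thunk build-up described above can be mimicked in plain Ruby with lambdas (an invented sketch, not Frege code): deferring every addition builds a deep chain of closures, and forcing the chain at the end nests one native call per layer, exactly like the lazy foldl case.

```ruby
# Invented Ruby analogue of the lazy-foldl problem: each step defers the
# addition instead of performing it, growing a deep chain of closures.
thunk = -> { 0 }
(1..200_000).each do |i|
  prev = thunk
  thunk = -> { prev.call + i }  # one more layer that must be unwound later
end

begin
  thunk.call  # forcing the chain nests ~200,000 calls at once
rescue SystemStackError
  puts "stack overflow while forcing the thunk chain"
end

# The strict counterpart does the additions eagerly, in constant stack space:
(1..200_000).sum  # => 20000100000
```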


