Consider a simple function that adds the first N natural numbers. (e.g. sum(5) = 0 + 1 + 2 + 3 + 4 + 5 = 15
).
Here is a simple JavaScript implementation that uses recursion:
function recsum(x) {
if (x === 0) {
return 0;
} else {
return x + recsum(x - 1);
}
}
If you called recsum(5)
, this is what the JavaScript interpreter would evaluate:
recsum(5)
5 + recsum(4)
5 + (4 + recsum(3))
5 + (4 + (3 + recsum(2)))
5 + (4 + (3 + (2 + recsum(1))))
5 + (4 + (3 + (2 + (1 + recsum(0)))))
5 + (4 + (3 + (2 + (1 + 0))))
5 + (4 + (3 + (2 + 1)))
5 + (4 + (3 + 3))
5 + (4 + 6)
5 + 10
15
Note how every recursive call has to complete before the JavaScript interpreter begins to actually do the work of calculating the sum.
Here's a tail-recursive version of the same function:
function tailrecsum(x, running_total = 0) {
if (x === 0) {
return running_total;
} else {
return tailrecsum(x - 1, running_total + x);
}
}
Here's the sequence of events that would occur if you called tailrecsum(5)
, (which would effectively be tailrecsum(5, 0)
, because of the default second argument).
tailrecsum(5, 0)
tailrecsum(4, 5)
tailrecsum(3, 9)
tailrecsum(2, 12)
tailrecsum(1, 14)
tailrecsum(0, 15)
15
In the tail-recursive case, with each evaluation of the recursive call, the running_total
is updated.
Note: The original answer used examples from Python. These have been changed to JavaScript, since Python interpreters don't support tail call optimization. However, while tail call optimization is part of the ECMAScript 2015 spec, most JavaScript interpreters don't support it.
All current mainstream compilers perform tail call optimisation fairly well (and have done for more than a decade), even for mutually recursive calls such as:
int bar(int, int);
int foo(int n, int acc) {
return (n == 0) ? acc : bar(n - 1, acc + 2);
}
int bar(int n, int acc) {
return (n == 0) ? acc : foo(n - 1, acc + 1);
}
Letting the compiler do the optimisation is straightforward: Just switch on optimisation for speed:
- For MSVC, use
/O2
or /Ox
.
- For GCC, Clang and ICC, use
-O3
An easy way to check if the compiler did the optimisation is to perform a call that would otherwise result in a stack overflow — or looking at the assembly output.
As an interesting historical note, tail call optimisation for C was added to the GCC in the course of a diploma thesis by Mark Probst. The thesis describes some interesting caveats in the implementation. It's worth reading.
Best Solution
JIT compilation is a tricky balancing act between not spending too much time doing the compilation phase (thus slowing down short lived applications considerably) vs. not doing enough analysis to keep the application competitive in the long term with a standard ahead-of-time compilation.
Interestingly the NGen compilation steps are not targeted to being more aggressive in their optimizations. I suspect this is because they simply don't want to have bugs where the behaviour is dependent on whether the JIT or NGen was responsible for the machine code.
The CLR itself does support tail call optimization, but the language-specific compiler must know how to generate the relevant opcode and the JIT must be willing to respect it. F#'s fsc will generate the relevant opcodes (though for a simple recursion it may just convert the whole thing into a
while
loop directly). C#'s csc does not.See this blog post for some details (quite possibly now out of date given recent JIT changes). Note that the CLR changes for 4.0 the x86, x64 and ia64 will respect it.