What an innocent-looking, short, but fantastically awkward test case for compilers!
I have never seen the halting problem directly affect compilation before, but GCC, in trying to honour the recursive specification of this code, suffers an exponential increase in compile time and memory usage with larger values of a fixed N (a constant known at compile time) when computing fibo(N).
The safe limits on N where GCC can fully resolve fibo(N) to a compile-time constant are N <= 35 for call by value and N <= 28 for call by reference. I have no idea why the upper limits differ, only that they do. The source code is deceptively similar in the two cases, but the generated assembler is radically different!
I also tried it on MSVC C/C++ 17 and Intel ICX 2024.1, but got quite normal behaviour and fairly similar performance from both. Intel ICX 2025 appears to cache values it has previously computed if (and only if) you get the go-faster compiler options just right. I'm unable to reproduce that transient observation at present; "-O3 -Wall" isn't enough.
I took a look at this code in detail because the recursive routine doesn't actually alter the value of number, so I wasn't expecting to see much difference between call by value and call by reference.
Using GCC, for small N <= 28 known as a constant at compile time, that is true: both versions compile to loading the magic constant fibo(N), with exactly the same generated code and a very long compile time (which grows exponentially with N).
I created a canonical, simplest-possible toy version on [Godbolt](https://godbolt.org/z/h6G1z7WPh). Please be kind to the site and don't feed in big N! Godbolt can just about survive N=36 on a good day if the wind is blowing in the right direction; expect to have your process killed if you go higher.
You can try the various combinations by commenting lines in or out. N=28,29 (call by reference) and N=35,36 (call by value) are the most interesting fixed-value cases, along with N unknown at compile time.
The huge difference in speed has almost nothing to do with call by value vs call by reference and everything to do with the extremely sophisticated code generation when the call-by-reference path is taken.
It keeps almost everything in registers, takes much bigger steps by decrementing N by 5 at each recursion level, and then terminates in a lookup table for small N.
The first case, call by value, is nothing out of the ordinary, except that for small values N <= 35 it compiles (very slowly) to loading the constant value fibo(N). The general optimisation consists of unrolling the recursion one level and writing the code out twice (approx. 300 bytes).
The second case's code generation is extraordinary, at least on GCC -O3 (I will try harder to provoke the same behaviour on Intel and MSVC). It uses tricks which suggest to me that it has been very carefully tuned to perform incredibly well on this particular type of benchmark.
There are three scenarios I was able to obtain, but I will restrict this discussion to the two main ones. For N < 29 with the value known at compile time, it takes ages to compile to loading the constant value fibo(N); larger constant values will kill the compiler (or cause the OS to do so) through exponential memory usage or timeouts (on Godbolt). To hide the value from the compiler I put it in a string, which is enough to make it generate generic code for any N (approx. 160 bytes).
I have examined the assembler generated by GCC for the two sample routines and reconstructed functionally equivalent C++ code below, to help explain more easily how the call-by-reference version obtains its apparently magical speed.
It uses a tail-recursion trick to minimise the recursion depth: it takes bites of 5 off the value of N for each recursive call and then finishes in a lookup table for the small base cases. It is a quite remarkable code transformation away from the original source code.
BTW, recursion is a terrible way to compute these values! Here is the code anyway. fibo_cbv is the relatively normal inlined code generation so typical of modern compilers, but it makes heavy use of the stack. fibo_cbr is the unusual code GCC generates for this particular benchmark function; the latter is really rather cunning. My C++ code isn't exactly equivalent to the compiled assembler, as I needed to tweak it a bit to get it working correctly.
```cpp
#include <cmath>
#include <string>
#include <stdio.h>

// N       : 1 2 3 4 5 6  7  8  9 10 11
// fibo(N) : 1 1 2 3 5 8 13 21 34 55 89

constexpr auto fibo(unsigned int number) -> unsigned long long {
//constexpr auto fibo(unsigned int const& number) -> unsigned long long {
    if (number < 2) return number;
    return fibo(number - 1) + fibo(number - 2);
}

constexpr auto fibo_cbv(unsigned int number) -> unsigned long long {
    unsigned long long result = 0; // wide enough to hold fibo(number - 1)
    if (number < 2) return number;
    if (--number < 2) return number; else result = fibo_cbv(number);
    if (--number < 2) return result + number;
    return result + fibo_cbv(number);
}

constexpr auto fibo_cbr(unsigned int const& number) -> unsigned long long {
    switch (number) {
    case 0: return 0; // guard: stops number - 6 underflowing below
    case 1: return 1;
    case 2: return 1;
    case 3: return 2;
    case 4: return 3;
    case 5: return 5;
    case 6: return 8;
    case 7: return 13;
    default:
        // return fibo_cbr(number - 1) + fibo_cbr(number - 2); // works much slower
        unsigned long long a = fibo_cbr(number - 6);
        unsigned long long b = fibo_cbr(number - 5);
        return 5 * a + 8 * b;
    }
}

int main()
{
    std::string input{ "28" };
    unsigned int n;
    for (n = 1; n < 36; n++)
    {
        printf("\nfibo_ref(%u) = %llu\n", n, fibo(n));
        printf("fibo_cbv(%u) = %llu\n", n, fibo_cbv(n));
        printf("fibo_cbr(%u) = %llu\n", n, fibo_cbr(n));
    }
    // return fibo(36); // fixed value to test
    n = std::stol(input);
    return fibo(n);
}
```
I also noticed whilst experimenting that if I read in a variable N, the call-by-reference code from GCC became even more complicated, using a powers-of-two strategy instead to take chunks of 8 or 4 off the initial value of N. I'm not sure which aspect of the test code took it down that path, and I can't reproduce it right now, but I am sure I didn't hallucinate that version of the generated code.
Worth noting also that N=93 for fibo(93) = 12200160415121876738 is as high as you can go without overflowing uint64_t (for a signed int64_t result the limit is N=92).
I haven't been able to get any other C/C++ compiler I tried, apart from GCC, to perform the extensive compile-time computation needed to reduce fibo(N) to a static constant the slow recursive way.
The short answer to why call by reference is faster here is that GCC substitutes a radically different algorithm which looks nothing like the original C++ source but is functionally equivalent. It has nothing at all to do with the call-by-reference vs call-by-value mechanism itself.
`static_assert(1'836'311'903 == fibo(46), "fibo(46)");` in another test case also triggered the RAM issue. I apologise for the confusion, but my main question remains: why is the reference so much faster than the value parameter in the benchmarking code?