3

For performance reasons, are recursions ever replaced with while loops? I know the code looks much uglier, but let's take the following example:

void print_countdown(long long n) {
    if (n!=0) {
        printf("Placeholder for a function\n");
        print_countdown(n-1);
    } else 
        printf("Done!\n");
}

If the number is 1 million, then this will have an overhead of:

  • 1 million copies of the n var (saved from rdi, put back in rdi for the recursive call, if the recursive work includes a function call, otherwise it can stay in rdi.)
  • call func
  • push rbp ... pop function prologue or something else for stack alignment, depending on optimization level and compiler choices.

In other words, while the code is readable, for anything like > 1000 loops, this seems to be better rewritten to something like:

void print_countdown(long long n) {
    while (n < MAX_LOOPS) {
        if (n!=0) {
            printf("Placeholder for a function\n");
            n = n-1;     
        } else
            printf("Done!");
    }
    
}

The assembly code (Godbolt) also looks much, much simpler in the while format -- ~20 lines vs ~40 lines.

Is it common to do this kind of loop-rewriting, and are there ever times in a recursive function call where it cannot be re-written to a loop ?

12
  • 1
    C++ compilers would (usually) do tail recursion optimisation if compiled with -O3 flag Commented Jan 30, 2021 at 6:04
  • 1
    Compile with -O3 and find out. (But basically, replacing call foo / ret with jmp foo is standard tailcall optimization. In a recursive function, this creates a loop that can be optimized further.) Commented Jan 30, 2021 at 6:09
  • 1
    Tail recursion optimisation is an optimisation where the compiler replaces recursion with iteration, if it identifies that the underlying process is in fact iterative ie, uses constant amount of resources and if its 'state' information at any time is captured by a constant number of variables. A recursive process would consume resources depending upon the number of calls made to the (recursively called) function, unlike the iterative process. A function can be called in a recursive manner but it can still be an iterative process, in which case, the compiler can make this optimisation. Commented Jan 30, 2021 at 6:10
  • 2
    @DavidRanieri: No, -O3 is not required, I only said that to chime in with tf3's comment. (I didn't notice the Godbolt link right away because links are less visible on code formatting, which the link text used for no reason.) Commented Jan 30, 2021 at 6:20
  • 2
    Sometimes fixing the question to ask the interesting thing avoids making answers spend time dealing with misconceptions in the question, letting the answer be better. Also better for future readers to not be misled in the first place by some parts of the premise / setup in the question. So yeah, turning questions into something I think it would be useful to have answered is not just for your benefit :P Commented Jan 30, 2021 at 6:31

2 Answers 2

3

Yep.

Proof: https://godbolt.org/z/EqbnfY

This code

#include <stdio.h>

void print_countdown(long long n) {
    if (n!=0) {
        // do_something
        print_countdown(n-1);
    } else 
        printf("Done!\n");
}

Generates this assembly with the x86-64 Clang compiler and -O2 optimizations (the compiler even generates the comments too!)

print_countdown(long long):                   # @print_countdown(long long)
        mov     edi, offset .Lstr
        jmp     puts                            # TAILCALL
.Lstr:
        .asciz  "Done!"

Other compilers generate similar results.

Sign up to request clarification or add additional context in comments.

5 Comments

For some reason, having a non-inline function call like printf as the // do_something defeats GCC, which instead unrolls by 4 but keeps the recursion instead of a loop. Only recursing every 4 puts calls is better than every 1, but still much worse than a loop. Of course, with recursive work that can inline, GCC does much better.
@PeterCordes - did you compile with optimizations? Look here: godbolt.org/z/PEzch7
@selbie. Peter refers to godbolt.org/z/baofGc
I'm talking about the Godbolt link in the question, godbolt.org/z/jqG3Ta - using the updated code block, compiled by GCC10.2 -O3. @DavidRanieri's link (godbolt.org/z/baofGc) is that code with clang, which does optimize away the recursion into a loop as you say. There's no reason GCC can't, it's just a missed optimization.
Note that the asm in your question has not only optimized away the recursion, it's also optimized away the resulting empty loop (of --n down to 0), leaving just one tailcall of puts. (GCC does that, too, when the recursive loop is empty)
3

Yes, tail call elimination is a common optimization. If you haven't seen it yet, check out https://en.wikipedia.org/wiki/Tail_call, which talks about this topic in detail.

The GCC, LLVM/Clang, and Intel compiler suites perform tail call optimization for C and other languages at higher optimization levels or when the -foptimize-sibling-calls option is passed. Though the given language syntax may not explicitly support it, the compiler can make this optimization whenever it can determine that the return types for the caller and callee are equivalent, and that the argument types passed to both function are either the same, or require the same amount of total storage space on the call stack.

The Wiki page also has an assembly example of how an optimizer might revise a recursive routine to a loop.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.