
To understand compilers, and assembly language in particular, I have been experimenting with a trivial piece of code that calculates the sum of the first N numbers. Depending on whether the upper bound is included, the result should be N(N+1)/2 or N(N-1)/2; for example, with N = 4 the inclusive sum is 0+1+2+3+4 = 10 = 4*5/2.

As the code below shows, there are two functions:

#include <cstdint>


// Once compiled with optimization, the generated assembly has a loop

uint64_t sum1( uint64_t n ) {  
    uint64_t sum = 0;
    for ( uint64_t j=0; j<=n; ++j ) {
        sum += j;
    }
    return sum;
}

// Once compiled with optimization, the generated assembly of the following has no loop

uint64_t sum2( uint64_t n ) {  
    uint64_t sum = 0;
    for ( uint64_t j=0; j<n; ++j ) {
        sum += j;
    }
    return sum;
}

In the first function I loop from 0 to N, i.e. j <= n, and in the second function I go from 0 to N-1, i.e. j < n.

My understanding/observation:

  • For the first function sum1 the generated assembly has a loop, while for the second function sum2 the assembly shows no loop. However, once I remove the compiler optimizations (i.e. -O3), the loop for the second function finally appears in the assembly.

  • To see the generated assembly with compiler optimization, see the Optimized link.

  • To see the generated assembly without compiler optimization, see the non-optimized link.

  • The compiler is x86-64 clang.

Question: Why does the compiler optimization remove the loop for sum2 but keep it for sum1?

  • The compiler seems to be observing that sum1 could loop forever, for a certain n. Commented Jan 9, 2022 at 0:39
  • The compiler's (ideal) goal is to create the most efficient code that has the same observable behavior as the source code (when optimizing). Add an output of the temp in the loop to force the compiler to keep the loop. Commented Jan 9, 2022 at 0:41
  • Consider sum1(std::numeric_limits<uint64_t>::max()). Commented Jan 9, 2022 at 0:48
  • @SamVarshavchik: As user17732522 notes, the compiler may assume the loop terminates; this is an axiom granted by the C or C++ standard. This is logically equivalent to assuming n != UINT64_MAX, as those are all the cases and the only cases for which the loop terminates. And so it is equivalent to saying that if the program executes with n equal to UINT64_MAX, the behavior is not defined by the C or C++ standard. It is not defined to be an infinite loop. Commented Jan 9, 2022 at 1:01
  • @IgorTandetnik: That is only for C++. In C, it does not apply to all loops; an iteration statement with a controlling expression that is a constant expression (including an absent expression, which is implicitly non-zero) may continue infinitely with no side effects. And hence you could write OP's loop and get defined infinite-loop behavior instead of undefined behavior by moving the j <= n from the controlling expression to a conditional break. Commented Jan 9, 2022 at 1:04

1 Answer


This is because your compiler is very, very smart, and it knows that the sum of all values from 0 to n can be calculated with a trivial mathematical formula, instead of a loop.

However, your C++ compiler also figured out that this mathematical formula cannot be used in the <= version: for n equal to UINT64_MAX the condition j <= n is always true, so the loop never terminates. With that infinite loop on the table, all bets are off, and the compiler compiles the code exactly as given.
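For reference, a minimal sketch of the closed form involved (my illustration; clang's actual instruction sequence differs in detail, and sum2_closed_form is a hypothetical name). The wrapping sum of 0 through n-1 modulo 2^64 equals n(n-1)/2; computing the product in 128 bits (unsigned __int128 is a clang/GCC extension) keeps it exact before halving, and truncating to 64 bits reproduces the loop's wrapping arithmetic.

#include <cstdint>

// Sketch of the formula the optimizer effectively substitutes for sum2.
// n*(n-1) is always even, so shifting right by one is exact.
uint64_t sum2_closed_form( uint64_t n ) {
    if ( n == 0 ) return 0;
    unsigned __int128 p = (unsigned __int128)n * ( n - 1 );
    return (uint64_t)( p >> 1 );
}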


20 Comments

It is absolutely true that when debugging code you do not want these kinds of optimizations taking place, which is precisely why optimizations require a configuration setting to enable (or disable).
If you make the infinite loop more obvious, clang will recognize it and remove it, even if the value calculated in the sum is used. It even produces an empty function without instructions, which would result in nonsensical programs: godbolt.org/z/cv6ndjYhT, godbolt.org/z/Pff5rf1eh
@Jellyboy: It's likely that the pattern-recognizer for sum-of-linear-sequence loops is looking for specific things in the loop, which presumably includes the loop being provably finite. The fact that this part of clang/LLVM fails to use value-range information from __builtin_assume could just as easily be a missed optimization. Although interestingly, if you do n &= 0xfff instead of assume, the optimizer is able to use Gauss's formula: godbolt.org/z/Mn16PYvqo (With a little less effort to avoid overflow, but still more than needed for the tiny n.)
@Jellyboy: It's the obvious and by far the most likely explanation. Some experiments on both versions are consistent with that theory, e.g. j += 3 makes the j<n version not optimize. And interestingly, using __builtin_assume( n<1000 ) defeats optimization that we otherwise get with n&=0xffff. godbolt.org/z/5j3nhdnvT. Fully empty loop removal always happens for clang regardless of __builtin_assume. (GCC doesn't do that, although an ASSUME using if(!x)__builtin_unreachable() does let GCC remove potentially-infinite empty loops, too.) godbolt.org/z/4TKfxq4rn
Yes, this is SO, we should expect good speculation, ideally backed up by sufficient evidence to go with that theory. Also useful practical advice, which this is. It's unreasonable to expect an answerer to dig into the LLVM source code and find the exact pattern-recognizer algorithm to rule out any other possibility. If someone knows of a more accurate explanation (or worse, if an explanation is totally wrong, which is not rare on performance questions based on how CPUs/caches work), they can comment. But lack of definitive proof isn't IMO a problem, at least here.
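A sketch of the masking experiment described in the comments above (assumption: the observed behavior is specific to the clang versions in the godbolt links; sum1_masked is a made-up name). Masking n bounds the trip count, so even the inclusive j <= n loop provably terminates, and clang can then apply Gauss's formula; per the comments, __builtin_assume( n < 1000 ) reportedly does not have the same effect.

#include <cstdint>

// Masking bounds n to at most 0xfff, so the j <= n loop cannot run forever
// and the pattern recognizer is free to replace it with the closed form.
uint64_t sum1_masked( uint64_t n ) {
    n &= 0xfff;
    uint64_t sum = 0;
    for ( uint64_t j=0; j<=n; ++j ) {
        sum += j;
    }
    return sum;
}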
