
I've got a very simple C program which copies every element of array A back into array A itself. For example,

double *A;
int i;

A = (double*)malloc(sizeof(double) * SIZE);
for (i = 0; i < SIZE; i++) {
  A[i] = A[i];
}

I was expecting this to be optimised out by the compiler and eventually turned into a no-op. However, measuring the runtime of this loop and looking at the assembly, it seems that each element really is loaded from memory into a register and then stored back to the same memory location. I have -O3 enabled. Can anyone explain why the C compiler does not optimise it away? Or am I missing something here?

Many thanks.

2 Comments

  • FWIW, can't repro on x86_64 with GCC 4.5; that loop is not generated in the assembly with -O3, only the malloc call is left. Commented Oct 6, 2011 at 21:17
  • What's the assembly you're getting? It's being optimized out for GCC 4.5.0 for me.

7 Answers

7

From a hardware perspective, loading and storing a double is not necessarily a no-op: the bitwise value can change if it is a signaling NaN, because some FPUs canonicalize NaNs on a load/store round trip.

For example, on classic x87 hardware, if you load a signaling NaN into a floating-point register and store it back, it is written out as the quiet, canonicalized NaN value, which is not the same bitwise value.


4 Comments

Thanks for the reply. However, the same thing happens for an integer array too.
As a side note, on some architectures it is a no-op. For example, the PowerPC 601.
Why would a C compiler care, assuming FENV_ACCESS is off?
gcc ignores FENV_ACCESS pragmas so it needs to behave as if it's always on. Unfortunately it only behaves partly correctly, though...
7

My gcc (version 4.6.1) optimizes it out:

$ cat 7680489.c
#include <stdlib.h>

#define SIZE 100

int main(void) {
  double *a;
  size_t i;

  a = calloc(SIZE, sizeof *a); /* initialize elements */
  for (i = 0; i < SIZE; i++) a[i] = a[i];
  free(a);

  return 0;
}
$ gcc -std=c89 -O3 -S 7680489.c
$ cat 7680489.s
        .file   "7680489.c"
        .section        .text.startup,"ax",@progbits
        .p2align 4,,15
        .globl  main
        .type   main, @function
main:
.LFB3:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movl    $8, %esi
        movl    $100, %edi
        call    calloc
        movq    %rax, %rdi
        call    free
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE3:
        .size   main, .-main
        .ident  "GCC: (Debian 4.6.1-4) 4.6.1"
        .section        .note.GNU-stack,"",@progbits

No loop that I can see. The assembly output is very similar when using malloc rather than calloc. I switched to calloc to avoid having objects with indeterminate values around (thanks R..).

Comments

2

To optimize away the loop, the compiler has to recognize several things:

  • The load/store does not modify the data (due, e.g., to floating-point NaN conversions).
  • The two array addresses are identical.
  • The two array indexing expressions are identical.
  • After taking the addresses and indexing into account, the load and store do not merely overlap but coincide exactly.
  • The store will not "step on" the results of other stores, or on a value that has not yet been loaded.
  • The load/store cannot cause a storage fault. This in turn requires the compiler to recognize that the storage came from malloc and that the loop does not index beyond the end of the allocation.
  • The loop terminates after a finite number of iterations.
  • And probably several others I'm not thinking of.

Keep in mind that optimizations are oriented towards removing "normal" redundancies, not eliminating "classroom examples".

4 Comments

The compiler does not have to recognize or predict storage faults. Indexing off the end of the array results in undefined behavior. That could mean a segfault, or it might not.
Depends on the compiler and what "contact" it adheres to.
What do you mean "contact"? The standard says the behavior is undefined, so either a segv or no segv are both valid results. Because both results are considered the same, there is no difference between optimizing and not, so the optimization is allowed.
I misspelled -- meant to say "contract".
1

The compiler does not do any actual thinking. It can only optimize out constructs that match a preconceived pattern.

That is, if the code does not match a known no-op pattern that has been pre-programmed into the compiler, it does not get eliminated.

By writing A[i] = A[i] you changed the pattern just enough that it no longer matches the empty-loop pattern.

2 Comments

According to the OP the same thing happens to an integer array, so how is this an incorrect answer?
Actually, compilers mostly optimize by doing a number of small optimizations, rather than recognizing "patterns" at a high level. E.g., the compiler would common (via common-subexpression elimination) the array indexing for the two sides, given that they have the same pointer and array index. Then it would recognize a copy of a value to itself. (But, as I indicated in my post, there are still a number of hurdles before the copy can be eliminated.)
0

The problem here is that you are using pointers.

It is hard for a compiler to optimize pointer accesses, since it must assume that a pointer can read/write any location in memory.

Change to a real array accessed with the [] operator instead and try again. You should see the optimization you expect.

Comments

0

As a general bit of info, two of the things compilers have the most trouble optimizing are loops and pointers, and your example involves both. Compilers know that values change frequently in loops, so they are very conservative when optimizing them. Also, A is a pointer, and compilers are aware that pointed-to values can change for many reasons, so they back off when transforming those accesses. That's why the compiler has trouble with your example.

2 Comments

Compilers have trouble with loops? Really? The pointers I believe, but I would love to have a citation for the loops.
I didn't write that clearly. Compilers have trouble knowing whether or not a value will change within a loop.
0

This code has undefined behavior (it uses objects with indeterminate value), so why would you have any expectation about what it does?

3 Comments

Actually, one of the most feared things in optimization is uninitialized values. Many compilers will punt if a value is uninitialized, if only because they depend so heavily on value propagation algorithms.
Why is it undefined behavior? In this case he's allocated some memory, taking uninitialized values and saving them back. Nothing bad is happening here.
Any use of an object with indeterminate value is UB.
