
I've got a very simple C program which copies every element of array A back into array A itself. For example,

double *A;
int i;

A = (double*)malloc(sizeof(double) * SIZE);
for (i = 0; i < SIZE; i++) {
  A[i] = A[i];
}

I was expecting this to be optimised out by the compiler and eventually turned into a no-op. However, measuring the runtime of this loop and looking at the assembly, it seems that each element really is loaded from memory into a register and then stored back to the same memory location. I have -O3 enabled. Can anyone explain why the C compiler does not optimise it away? Or am I missing something here?

Many thanks.

2 Comments

  • FWIW, can't repro on x86_64 with GCC 4.5; that loop is not generated in the assembly with -O3, only the malloc call is left. Commented Oct 6, 2011 at 21:17
  • What's the assembly you're getting? It's being optimized out for GCC 4.5.0 for me.

7 Answers

7

From a hardware perspective, loading and storing a double is not necessarily a no-op: the bitwise value can change if it is a signaling NaN, because some FPUs canonicalize NaNs on a load/store round trip.

For example, on classic x87 hardware, if you load a signaling NaN into a floating-point register and store it back, it is written out as the quiet, canonicalized NaN value, which is not the same bitwise value.


4 Comments

Thanks for the reply. However, the same thing happens for an integer array too.
As a side note, on some architectures it is a no-op. For example, the PowerPC 601.
Why would a C compiler care, assuming FENV_ACCESS is off?
gcc ignores FENV_ACCESS pragmas so it needs to behave as if it's always on. Unfortunately it only behaves partly correctly, though...
7

My gcc (version 4.6.1) optimizes it out:

$ cat 7680489.c
#include <stdlib.h>

#define SIZE 100

int main(void) {
  double *a;
  size_t i;

  a = calloc(SIZE, sizeof *a); /* initialize elements */
  for (i = 0; i < SIZE; i++) a[i] = a[i];
  free(a);

  return 0;
}
$ gcc -std=c89 -O3 -S 7680489.c
$ cat 7680489.s
        .file   "7680489.c"
        .section        .text.startup,"ax",@progbits
        .p2align 4,,15
        .globl  main
        .type   main, @function
main:
.LFB3:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movl    $8, %esi
        movl    $100, %edi
        call    calloc
        movq    %rax, %rdi
        call    free
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE3:
        .size   main, .-main
        .ident  "GCC: (Debian 4.6.1-4) 4.6.1"
        .section        .note.GNU-stack,"",@progbits

No loop that I can see. The assembly output is very similar when using malloc rather than calloc. I switched to calloc to avoid having objects with indeterminate values around (thanks R..).

Comments

2

To optimize away the loop, the compiler has to recognize several things:

  • The load/store does not modify the data (due, e.g., to floating-point NaN conversions).
  • The two array addresses are identical.
  • The two array indexing expressions are identical.
  • After taking the addresses and indexing into account, the load and store do not merely overlap but coincide exactly.
  • The store will not "step on" the results of other stores, or on a value that has not yet been loaded.
  • The load/store cannot cause a storage fault. This in turn requires the compiler to recognize that the storage came from malloc and that the loop does not index beyond the end of the allocation.
  • The loop terminates after a finite number of iterations.
  • And probably several others I'm not thinking of.

Keep in mind that optimizations are oriented towards removing "normal" redundancies, not eliminating "classroom examples".

4 Comments

The compiler does not have to recognize or predict storage faults. Indexing off the end of the array results in undefined behavior. That could mean a segfault, or it might not.
Depends on the compiler and what "contact" it adheres to.
What do you mean "contact"? The standard says the behavior is undefined, so either a segv or no segv are both valid results. Because both results are considered the same, there is no difference between optimizing and not, so the optimization is allowed.
I misspelled -- meant to say "contract".
1

The compiler does not do any actual thinking. It can only optimize out constructs that match a preconceived pattern.

That is, if the code does not match a known no-op pattern that has been pre-programmed into the compiler, it does not get eliminated.

By writing A[i] = A[i] you changed the pattern just enough that it no longer matches the empty-loop pattern.

2 Comments

According to the OP the same thing happens to an integer array, so how is this an incorrect answer?
Actually, compilers mostly optimize by doing a number of small optimizations, rather than recognizing "patterns" at a high level. E.g., the compiler would common (via common-subexpression elimination) the array indexing for the two sides, given that they have the same pointer and array index. Then it would recognize a copy of a value to itself. (But, as I indicated in my post, there are still a number of hurdles before the copy can be eliminated.)
0

The problem here is that you are using pointers.

It is hard for a compiler to optimize pointer accesses, since it must assume that a pointer can read/write any location in memory.

Change to a real array accessed with the [] operator instead and try again. You should see the optimization you expect.

Comments

0

As a general bit of info, two of the things compilers have the most trouble optimizing are loops and pointers, and your example involves both. Compilers know that values change frequently in loops, so they are very conservative when optimizing them. Also, A is a pointer, and compilers are aware that pointed-to values can change for many reasons, so they back off when transforming those accesses. That's why the compiler has trouble with your example.

2 Comments

Compilers have trouble with loops? Really? The pointers I believe, but I would love to have a citation for the loops.
I didn't write that clearly. Compilers have trouble knowing whether or not a value will change within a loop.
0

This code has undefined behavior (it uses objects with indeterminate value), so why would you have any expectation about what it does?

3 Comments

Actually, one of the most feared things in optimization is uninitialized values. Many compilers will punt if a value is uninitialized, if only because they depend so heavily on value propagation algorithms.
Why is it undefined behavior? In this case he's allocated some memory, taking uninitialized values and saving them back. Nothing bad is happening here.
Any use of an object with indeterminate value is UB.
