
I am getting a strange result using global variables. This question was inspired by another question. In the code below if I change

int ncols = 4096;

to

static int ncols = 4096; 

or

const int ncols = 4096;

the code runs much faster and the assembly is much simpler.

//c99 -O3 -Wall -fopenmp foo.c
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

int nrows = 4096;
int ncols = 4096;
//static int ncols = 4096;
char* buff;

void func(char* pbuff, int * _nrows, int * _ncols) {
    for (int i=0; i<*_nrows; i++) {
        for (int j=0; j<*_ncols; j++) {
            *pbuff += 1;
            pbuff++;
        }
    }
}

int main(void) {
    buff = calloc(ncols*nrows, sizeof*buff);
    double dtime = -omp_get_wtime();
    for(int k=0; k<100; k++) func(buff, &nrows, &ncols);
    dtime += omp_get_wtime();
    printf("time %.16e\n", dtime/100);
    return 0;
}

I also get the same speedup if char* buff is an automatic variable (i.e. not global or static). I mean:

//c99 -O3 -Wall -fopenmp foo.c
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

int nrows = 4096;
int ncols = 4096;

void func(char* pbuff, int * _nrows, int * _ncols) {
    for (int i=0; i<*_nrows; i++) {
        for (int j=0; j<*_ncols; j++) {
            *pbuff += 1;
            pbuff++;
        }
    }
}

int main(void) {
    char* buff = calloc(ncols*nrows, sizeof*buff);
    double dtime = -omp_get_wtime();
    for(int k=0; k<100; k++) func(buff, &nrows, &ncols);
    dtime += omp_get_wtime();
    printf("time %.16e\n", dtime/100);
    return 0;
}

If I change buff to be a short pointer, the code is fast regardless of whether ncols is static or const, or whether buff is automatic. However, when I make buff an int* pointer I observe the same effect as with char*.

I thought this may be due to pointer aliasing so I also tried

void func(int * restrict pbuff, int * restrict _nrows, int * restrict _ncols)

but it made no difference.

Here are my questions

  1. When buff is either a char* or an int* global pointer, why is the code faster when ncols has file scope or is constant?
  2. Why does buff being an automatic variable instead of global or static make the code faster?
  3. Why does it make no difference when buff is a short pointer?
  4. If this is due to pointer aliasing why does restrict have no noticeable effect?

Note that I'm using omp_get_wtime() simply because it's convenient for timing.

  • If the variable is a compile-time constant, then the compiler may make certain optimizations because it knows the variable can't change. Commented Jun 9, 2015 at 9:25
  • Why pass the sizes by reference? You could simply pass them by value. Commented Jun 9, 2015 at 9:26
  • @JoachimPileborg, that's clear, but I get the same result using static, which just changes it to have file scope instead of global scope. Also, if ncols were an automatic variable, the code would still be fast even without declaring it constant. Commented Jun 9, 2015 at 9:27
  • If the variable is file-local, the compiler can analyze the translation unit and see that the variable doesn't change, and can apply the same optimizations. If the variable is not file-local, then the compiler doesn't know if it might be modified by some other translation unit. Commented Jun 9, 2015 at 9:31
  • @JoachimPileborg, that makes sense. But why would another translation unit changing the variable matter? Isn't all that matters the value it has when execution enters the loop? I could understand it being a problem if another thread could change the global variable during the loop. Commented Jun 9, 2015 at 9:35

2 Answers

As written, some elements of the code let GCC make different assumptions when optimizing; the optimization with the most impact here is likely loop vectorization.

Why is the code faster?

The code is faster because its hot part, the loop nest in func, has been optimized with auto-vectorization. Indeed, when ncols is qualified with static or const, GCC emits:

note: loop vectorized
note: loop peeled for vectorization to enhance alignment

which is visible if you turn on -fopt-info-loop or -fopt-info-vec, or either of those with an added -optimized suffix, which has the same effect.


  2. Why does buff being an automatic variable instead of global or static make the code faster?

When buff is automatic, GCC is able to compute the number of iterations, which is intuitively necessary to apply vectorization. This again comes down to the storage of buff, which has external linkage unless specified otherwise: when buff is global, the whole vectorization is skipped immediately, whereas when buff is local it carries on and succeeds.

  3. Why does it make no difference when buff is a short pointer?

Why should it? func accepts a char* which may alias anything.

  4. If this is due to pointer aliasing why does restrict have no noticeable effect?

I don't think it is, because GCC can see that the pointers don't alias when func is invoked, so restrict isn't needed.
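To make the aliasing point concrete, here is a minimal sketch (not code from the question). It relies on the C rule that a character-type lvalue may alias any object, so in the original shape a store through pbuff could in principle change the loop bounds, and the compiler may have to re-read them. The names func_by_pointer and func_local_bounds are hypothetical, and whether a given GCC version vectorizes either variant is an assumption you would need to check yourself.

```c
/* Original shape: the bounds are read through pointers in the loop
   conditions. Since pbuff is a char*, the store *pbuff += 1 may legally
   modify *nrows_p or *ncols_p, so the trip count is not obviously fixed. */
void func_by_pointer(char *pbuff, const int *nrows_p, const int *ncols_p) {
    for (int i = 0; i < *nrows_p; i++) {
        for (int j = 0; j < *ncols_p; j++) {
            *pbuff += 1;
            pbuff++;
        }
    }
}

/* Hypothetical variant: snapshot the bounds into locals before the loops.
   Locals whose address is never taken cannot be changed by the stores,
   so the trip count is fixed on entry. */
void func_local_bounds(char *pbuff, const int *nrows_p, const int *ncols_p) {
    const int nrows = *nrows_p;
    const int ncols = *ncols_p;
    for (int i = 0; i < nrows; i++) {
        for (int j = 0; j < ncols; j++) {
            *pbuff += 1;
            pbuff++;
        }
    }
}
```

The two functions behave identically only when the buffer does not overlap the bound variables, which is the case in the question's program. Compiling each variant with -O3 -fopt-info-vec should show whether your compiler vectorizes it.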


5 Comments

You make a good point about vectorization. As for your second point, I don't think it's because GCC can compute the number of iterations when buff is local (ncols may still be global and non-constant, so it can change). It's because GCC knows that buff is not changed externally.
@Zboson Indeed: it can't compute the number of iterations because it doesn't know if it's changed somewhere else, externally. Sorry if it was unclear in the answer.
@Zboson Can you show the specific assembly where it reloads pbuff?
I looked at the assembly again and now it makes more sense. For char* and int* the loop is either vectorized or not vectorized. For short* it's vectorized whether or not it's const; the generated code is quite different in the two cases, but the performance is more or less the same. I guess the only missing piece is how to tell GCC to vectorize for int* and char* even when it's not constant, as it did for short*. Normally, I use restrict for that.
Keep in mind that this is not my code idea. I'm just trying to make sense out of it.

A const will most likely always yield faster or equally fast code as a read/write variable, since the compiler knows that the variable won't be changed, which in turn enables a whole lot of optimization options.

Declaring a file scope variable int or static int should not affect performance much, as it will still be allocated at the very same place: the .data section.

But as mentioned in comments, if the variable is global, the compiler might have to assume that some other file (translation unit) might modify it and therefore block some optimization. I suppose this is what's happening.
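A minimal sketch of the linkage difference described above. The names ncols_file_local, ncols_global, bump_row_static and bump_row_global are hypothetical, and whether a compiler actually treats the static variable as a constant depends on the compiler, its version and the optimization flags.

```c
/* Internal linkage: only this translation unit can name the variable.
   Nothing in this file writes to it or takes its address, so the compiler
   is allowed to conclude that its value never changes. */
static int ncols_file_local = 4096;

/* External linkage: any other translation unit may declare
   `extern int ncols_global;` and write to it, so the compiler cannot
   assume a fixed value from this file alone. */
int ncols_global = 4096;

/* Increment one row, using the file-local bound. */
void bump_row_static(char *row) {
    for (int j = 0; j < ncols_file_local; j++) {
        row[j] += 1;
    }
}

/* Same loop, using the externally visible bound. */
void bump_row_global(char *row) {
    for (int j = 0; j < ncols_global; j++) {
        row[j] += 1;
    }
}
```

Both functions do the same work at run time; the difference is only in what the compiler is allowed to assume about the loop bound while optimizing.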

But this shouldn't be any concern anyhow, since there is never a reason to declare a global variable in C, period. Always declare them as static to prevent the variable from getting abused for spaghetti-coding purposes.

In general I'd also question your benchmarking results. On Windows you should be using QueryPerformanceCounter and similar. https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408%28v=vs.85%29.aspx

6 Comments

The static keyword here means that the variable has file scope. Maybe the compiler makes it const since it's not modified in this code, and no other file can modify it. What's your opinion on that?
The solution is clear so I'm not concerned about how to avoid this (it was another OP who found this, I would not have used this code). I want to know WHY. As to the benchmarking why don't you try it yourself (I provided code) but you'll need to use GCC to compare apples to apples.
@Aif File scope = a variable declared outside any function body. The static keyword has nothing to do with that. Anyway, since the compiler can deduce that the static variable isn't modified, it can indeed likely optimize the code better, which was essentially what I wrote in the answer.
@Zboson The resolution of omp_get_wtime() is whole seconds... seems like a very crude way to benchmark code.
@Aif, it's clear to me why static is like const in this case. With file scope the compiler can determine that the variable is constant within the translation unit, and since no other translation unit sees the variable it is effectively constant.