
I have a few global variables that are used by various functions in a C program. I am using OpenMP threads in parallel. Each thread will call these functions, assigning different values to these global variables. Is there any alternative to threadprivate? It's not clear to me how to use the copyin clause. The sample code is like this:

int main (void) {
     int low=5,high=0;
     ----
     func1(int *k) { do something with low,high and set value for k}
     func2(int *k) { do something with low,high and set value for k}
     func3(int *k) { do something with low,high and set value for k}
     ----
     int i;
     int *arr= malloc(CONSTANT1*sizeof(int));
     #pragma omp parallel num_threads(numberOfThreads) threadprivate(low,high) private(i) shared(arr)
     {
     #pragma omp for
     for(i=0;i<CONSTANT1;i++) {
         low=low+CONSTANT2*i;
         high=low+CONSTANT2;
         func1(&arr[i]);
         func2(&arr[i]);
         func3(&arr[i]);
         ----
     }
     }
}

Or shall I use private(low,high) and pass them to each function every time? Please advise.

  • Presumably the functions that you want to use in parallel code are performance sensitive? Then you should never ever have them use global variables to begin with. Pass them as parameters; this might even speed up your sequential code, because the compiler doesn't have to worry about aliasing. As a general rule, clean up your sequential code before you even attempt to parallelize it. Commented Oct 14, 2015 at 13:51
  • Thanks Jens. I am calling these functions probably a million times. Is it still more efficient to pass the parameters every time than to let them use global variables? Commented Oct 14, 2015 at 13:53
  • Yes, parameter passing isn't expensive; a few parameters more or less won't make the difference. But the optimization of your code can be much better afterwards. Commented Oct 14, 2015 at 13:55
  • An alternative that you can do is make low and high arrays of size numberOfThreads. Outside of your parallel section, every element of the arrays should be set to what your current single values are. Inside the loop, you can make them shared and each thread will only access its single element of the low and high shared array. In general though, I'd suggest what @JensGustedt has suggested. Commented Oct 14, 2015 at 14:28
  • I really don't understand what you're trying to do. It looks sorta like you want a reduction of some sort on low and high to use outside of the parallel region, or in another parallel region later which you don't show. But then you say in the comments that you only need them in your parallel region. Why don't you show the scalar code of what you are trying to achieve? Commented Oct 15, 2015 at 7:36

1 Answer


Your code snippet is quite obscure, but it seems buggy. Let's assume you had the following in mind when you asked your question:

int low=5, high=10;
#pragma omp threadprivate(low, high)

func1(int *k) { do something with low,high and set value for k}
func2(int *k) { do something with low,high and set value for k}
func3(int *k) { do something with low,high and set value for k}

[...]

int main (void) {
    [...]
    int i;
    int *arr= malloc(CONSTANT1*sizeof(int));
    #pragma omp parallel num_threads(numberOfThreads) private(i)
    {
        #pragma omp for
        for (i=0; i<CONSTANT1; i++) {
            low = low + CONSTANT2 * i;
            high = low + CONSTANT2;
            func1(&arr[i]);
            func2(&arr[i]);
            func3(&arr[i]);
            [...]
        }
    }
}

Then, although the use of threadprivate makes the code valid, you have a problem here because of low = low + CONSTANT2 * i;. This line depends on the previous value of low and is therefore not suited for parallelisation, since the iteration order matters. However, if you change your code like this:

        int lowinit = low;
        #pragma omp for
        for (i=0; i<CONSTANT1; i++) {
            low = lowinit + CONSTANT2 * i*(i+1)/2;

Then your code becomes correct (provided your functions do not change low internally).

In terms of performance, I'm not sure the global-versus-parameter aspect of high and low will make much of a difference. However, it is clear to me that passing them as parameters rather than using global variables makes the code much cleaner and less error-prone.

Finally, if the values of high and low matter in any way after exiting the parallel loop or region, be aware that it is the master thread's values that will be kept, and these are likely to differ from what they would have been without OpenMP. In such a case, you can add these lines to your code where necessary to ensure correctness:

        low = lowinit + CONSTANT2 * (CONSTANT1-1)*CONSTANT1/2;
        high = low + CONSTANT2;