Parallelizing a code with 2 function calls using OpenMP

Question

I would like to parallelize a code section that makes 2 function calls using OpenMP. I tried the "sections" construct like this:

int func(int *V1, int *V2, int length){
  int result=0;
  int i;

  for(i=0;i<length;i++){
    result = result + V1[i] + V2[i];
  }
  return result;
}

int main(){

  omp_set_num_threads(32);
  #pragma omp parallel sections
  {
    #pragma omp section
    {
      result1 = func(array_A,array_B,1000000);
    }
    #pragma omp section
    {
      result2 = func(array_X,array_Y,2000000);
    }
  }
}

But I only get about 33% efficiency (only 1 thread executes each function). For example, I would like to use 16 threads to execute each function, but I can't find a solution (I tried using #pragma omp parallel for in each function, with no good results).

Z boson · Accepted Answer · 2014-02-18 19:42:53Z


Don't use sections. Don't set the number of threads (use the default). Do this:

#include <stdlib.h>   
int func(int *V1, int *V2, int length) {
    int result=0;
    int i;
    #pragma omp parallel for reduction(+:result)
    for(i=0;i<length;i++) {
        result += V1[i] + V2[i];
    }
    return result;
}

int main(){
    int result1, result2;
    int *array_A, *array_B, *array_X, *array_Y;
    array_A = malloc(sizeof(int)*1000000);
    array_B = malloc(sizeof(int)*1000000);
    array_X = malloc(sizeof(int)*2000000);
    array_Y = malloc(sizeof(int)*2000000);

    // NOTE: malloc leaves the arrays uninitialized; fill them with real data before summing
    result1 = func(array_A,array_B,1000000);
    result2 = func(array_X,array_Y,2000000);
    //now do something with result1 and result2
    return 0;
}

Since the OP insists on dividing the threads between the function calls, I have come up with a solution. It's not the right approach, and it won't be any better than the code above, but here it is anyway.

#include <omp.h>
void foo(int *V1, int *V2, int length1, int *V3, int *V4, int length2) {
    int result1, result2;
    result1=0; result2=0;
    #pragma omp parallel
    {
        int i, ithread, nthreads, start, finish, result_private, *a1, *a2;
        ithread = omp_get_thread_num(); nthreads = omp_get_num_threads();
        if(ithread<nthreads/2) {
            start = ithread*length1/(nthreads/2);
            finish = (ithread+1)*length1/(nthreads/2);
            a1 = V1; a2 = V2;          
        }
        else {
            start  = (ithread - nthreads/2)*length2/(nthreads - nthreads/2);
            finish = (ithread+1 - nthreads/2)*length2/(nthreads - nthreads/2);
            a1 = V3; a2 = V4;
        }
        result_private = 0;
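        // Each thread already owns its own [start, finish) range, so this is a
        // plain loop; an "omp for" here would wrongly split the range again.
        for(i=start; i<finish; i++) {
            result_private += a1[i] + a2[i];
        }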
        #pragma omp critical
        {
            if(ithread<nthreads/2) {
                result1 += result_private;
            }
            else {
                result2 += result_private;
            }
        }
    }
    // result1 and result2 now hold the two sums; return or store them as needed
}
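
For readers who still want both calls to run at the same time, each with its own team of threads, nested parallelism is one option. The sketch below is not part of the original answer: the names func_nested and both_sums and the threads_per_call parameter are made up for illustration, and it assumes the OpenMP runtime permits two active parallel levels. It is unlikely to be faster than the plain parallel for version.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical variant of func: the inner team size is passed in explicitly. */
static int func_nested(const int *V1, const int *V2, int length, int nthreads) {
    int result = 0;
    #pragma omp parallel for reduction(+:result) num_threads(nthreads)
    for (int i = 0; i < length; i++) {
        result += V1[i] + V2[i];
    }
    return result;
}

/* Two outer threads, one per call; each call opens its own inner team. */
void both_sums(const int *A, const int *B, int n1,
               const int *X, const int *Y, int n2,
               int threads_per_call, int *r1, int *r2) {
#ifdef _OPENMP
    omp_set_max_active_levels(2);  /* allow one level of nesting */
#endif
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        *r1 = func_nested(A, B, n1, threads_per_call);
        #pragma omp section
        *r2 = func_nested(X, Y, n2, threads_per_call);
    }
}
```

Calling both_sums(array_A, array_B, 1000000, array_X, array_Y, 2000000, 16, &result1, &result2) would request 16 threads per call. Because the second array is twice as long, the first team finishes early and sits idle, which is one reason the split is not expected to help.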

edited Feb 18, 2014 at 19:42

answered Feb 18, 2014 at 11:47

Z boson


10 Comments

user1702964 Over a year ago

Thank you, the results are better. But is it not possible to use 16 threads per function at the same time? I think the execution would be faster.

Z boson Over a year ago

@user1702964, I don't see why running 16 threads on one function and 16 on the other would be any faster. Ideally (ignoring the cache, hyper-threading, ...), running your function twice with 32 threads and running both functions simultaneously with 16 threads each should finish in the same time. If you're worried about performance: your function has a dependency chain, so try unrolling the loop a few times.
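
The "dependency chain" here is the single accumulator: every addition into result has to wait for the previous one. A minimal sketch of the unrolling idea follows, using four independent partial sums. The name func_unrolled and the unroll factor of 4 are illustrative assumptions, not something given in the thread, and an optimizing compiler may already do this on its own.

```c
/* Hypothetical unrolled variant of func: four independent accumulators
   let consecutive additions proceed without waiting on each other. */
int func_unrolled(const int *V1, const int *V2, int length) {
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i;

    /* Main loop: four elements per iteration, one accumulator each. */
    for (i = 0; i <= length - 4; i += 4) {
        s0 += V1[i]     + V2[i];
        s1 += V1[i + 1] + V2[i + 1];
        s2 += V1[i + 2] + V2[i + 2];
        s3 += V1[i + 3] + V2[i + 3];
    }

    /* Tail loop: the 0 to 3 elements left over. */
    for (; i < length; i++) {
        s0 += V1[i] + V2[i];
    }

    return s0 + s1 + s2 + s3;
}
```

Whether this helps at all depends on the compiler and on memory bandwidth, so it is worth timing before and after.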

user1702964 Over a year ago

I have run your program and it reaches peak performance with 8 threads, so the best solution would be to use 8 threads per function. My example has 2 calls to the function, but in my actual case I call the function 3 times.

Z boson Over a year ago

@user1702964, that's likely the wrong conclusion. What kind of system do you have? What kind of Intel or AMD processors does it have? How many physical and logical processors does it have?

user1702964 Over a year ago

I'm running it on an AMD machine with 48 CPUs (24 physical * 2 logical), 1 thread per core. If I use your code, I first execute func(array_A,array_B,1000000) with 8 processors (40 processors doing nothing at that time) and then result2 = func(array_X,array_Y,2000000) with 8 processors. I think it would be faster to run the two functions at the same time. Is it possible to do that with OpenMP?
