
I would like to parallelize a section of code that makes two function calls, using OpenMP. I tried the "sections" construct like this:

int func(int *V1, int *V2, int length){
  int result=0;
  int i;

  for(i=0;i<length;i++){
    result = result + V1[i] + V2[i];
  }
  return result;
}

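int main(){
  int result1, result2;
  /* array_A, array_B, array_X and array_Y are int arrays allocated and filled elsewhere */

  omp_set_num_threads(32);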
  #pragma omp parallel sections
  {
    #pragma omp section
    {
      result1 = func(array_A,array_B,1000000);
    }
    #pragma omp section
    {
      result2 = func(array_X,array_Y,2000000);
    }
  }
}

But I only get about 33% efficiency, because only one thread executes each function. I would like, for example, to use 16 threads for each function, but I can't find a way to do it. (I tried adding #pragma omp parallel for inside each function, without good results.)

1 Answer

Don't use sections. Don't set the number of threads (use the default). Do this:

#include <stdlib.h>   
int func(int *V1, int *V2, int length) {
    int result=0;
    int i;
    #pragma omp parallel for reduction(+:result)
    for(i=0;i<length;i++) {
        result += V1[i] + V2[i];
    }
    return result;
}

int main(){
    int result1, result2;
    int *array_A, *array_B, *array_X, *array_Y;
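    // calloc zero-initializes the arrays, so func never reads indeterminate values;
    // fill them with real data here
    array_A = calloc(1000000, sizeof(int));
    array_B = calloc(1000000, sizeof(int));
    array_X = calloc(2000000, sizeof(int));
    array_Y = calloc(2000000, sizeof(int));

    result1 = func(array_A,array_B,1000000);
    result2 = func(array_X,array_Y,2000000);
    //now do something with result1 and result2
    free(array_A); free(array_B); free(array_X); free(array_Y);
    return 0;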
}

Since the OP insists on dividing the threads between the function calls, I have come up with a solution. It's not the right approach, and it won't be any faster than the code above, but here it is anyway.

#include <omp.h>

void foo(int *V1, int *V2, int length1, int *V3, int *V4, int length2) {
    int result1, result2;
    result1=0; result2=0;
    #pragma omp parallel
    {
        int i, ithread, nthreads, start, finish, result_private, *a1, *a2;
        ithread = omp_get_thread_num(); nthreads = omp_get_num_threads();
        // the first half of the team sums V1+V2, the rest sums V3+V4
        // (this needs at least 2 threads, otherwise V1+V2 is never summed)
        if(ithread<nthreads/2) {
            start = ithread*length1/(nthreads/2);
            finish = (ithread+1)*length1/(nthreads/2);
            a1 = V1; a2 = V2;
        }
        else {
            start  = (ithread - nthreads/2)*length2/(nthreads - nthreads/2);
            finish = (ithread+1 - nthreads/2)*length2/(nthreads - nthreads/2);
            a1 = V3; a2 = V4;
        }
        result_private = 0;
        // start/finish already give each thread its own chunk, so this is a plain loop;
        // a worksharing "omp for" here would require the same bounds on every thread
        for(i=start; i<finish; i++) {
            result_private += a1[i] + a2[i];
        }
        #pragma omp critical
        {
            if(ithread<nthreads/2) {
                result1 += result_private;
            }
            else {
                result2 += result_private;
            }
        }
    }
    //now do something with result1 and result2
}
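
The same split can also be written with nested parallel regions instead of manual index arithmetic. What follows is only a minimal sketch under some assumptions: func_team and run_both are hypothetical names that are not part of the code above, the per-call thread count is passed in by the caller, and whether the inner regions actually get that many threads depends on the OpenMP implementation and its settings (for example OMP_MAX_ACTIVE_LEVELS or a thread limit). The sums are correct either way.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical variant of func: the sum is computed by an inner team
   of `threads` threads (an assumption, chosen by the caller). */
int func_team(const int *V1, const int *V2, int length, int threads) {
    int result = 0;
    int i;
    #pragma omp parallel for reduction(+:result) num_threads(threads)
    for (i = 0; i < length; i++) {
        result += V1[i] + V2[i];
    }
    return result;
}

/* Hypothetical helper: runs both sums at the same time, one per section,
   each section opening its own inner parallel region. */
void run_both(const int *A, const int *B, int lenAB,
              const int *X, const int *Y, int lenXY,
              int threads_per_call, int *result1, int *result2) {
#ifdef _OPENMP
    /* Allow a parallel region inside another one (OpenMP 3.0 and later). */
    omp_set_max_active_levels(2);
#endif
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {
            *result1 = func_team(A, B, lenAB, threads_per_call);
        }
        #pragma omp section
        {
            *result2 = func_team(X, Y, lenXY, threads_per_call);
        }
    }
}
```

A call such as run_both(array_A, array_B, 1000000, array_X, array_Y, 2000000, 16, &result1, &result2) would request two outer threads with 16 inner threads each. As with the manual split, this is not expected to beat the plain parallel for version, so it is worth timing both.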

10 Comments

Thank you, the results are better. But is it not possible to use 16 threads per function at the same time? I think the execution would be faster.
@user1702964, I don't see why running 16 threads on one function and 16 on the other would be any faster. Ideally (ignoring the cache, hyper-threading, ...), whether you run the two calls one after the other with 32 threads each or both simultaneously with 16 threads each, both methods should finish in the same time. If you're worried about performance: your function has a dependency chain, so try unrolling the loop a few times.
I have run your program and it reaches peak performance with 8 threads, so the best solution would be to use 8 threads per function. My example has 2 calls to the function, but in my real case I call the function 3 times.
@user1702964, that's likely the wrong conclusion. What kind of system do you have? What kind of Intel or AMD processors does it have? How many physical and logical processors does it have?
I'm running it on an AMD machine with 48 CPUs (24 physical * 2 logical), 1 thread per core. If I use your code, I first execute func(array_A,array_B,1000000) with 8 processors (40 processors doing nothing at that time) and then result2 = func(array_X,array_Y,2000000) with 8 processors. I think it would be faster to run the two functions at the same time. Is it possible to do that with OpenMP?
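
On the unrolling suggestion in the comments: every iteration of the loop in func adds to the same variable, so each addition has to wait for the previous one. A rough sketch of breaking that chain is to keep several independent partial sums and combine them at the end. The name func_unrolled and the unroll factor of 4 are assumptions made for illustration, and an optimizing compiler may already do something similar on its own, so any gain should be measured.

```c
/* Hypothetical unrolled variant of func: four independent accumulators
   instead of one, so consecutive additions do not depend on each other. */
int func_unrolled(const int *V1, const int *V2, int length) {
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i;

    /* Main loop: four elements per iteration, one per accumulator. */
    for (i = 0; i <= length - 4; i += 4) {
        s0 += V1[i]     + V2[i];
        s1 += V1[i + 1] + V2[i + 1];
        s2 += V1[i + 2] + V2[i + 2];
        s3 += V1[i + 3] + V2[i + 3];
    }

    /* Remainder loop: the last length % 4 elements. */
    for (; i < length; i++) {
        s0 += V1[i] + V2[i];
    }

    return s0 + s1 + s2 + s3;
}
```

It is called exactly like func, for example result1 = func_unrolled(array_A, array_B, 1000000).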