
I have the following code that I want to parallelize:

int ncip( int dim, double R)
{   
    int i;
    int r = (int)floor(R);
    if (dim == 1)
    {   
        return 1 + 2*r; 
    }
    int n = ncip(dim-1, R); // last coord 0

    #pragma omp parallel for
    for(i=1; i<=r; ++i)
    {   
        n += 2*ncip(dim-1, sqrt(R*R - i*i) ); // last coord +- i
    }

    return n;
}

The program execution time when run without OpenMP is 6.956 s; when I try to parallelize the for loop, the execution time is greater than 3 minutes (and only that short because I killed the run myself). What am I doing wrong in regards to parallelizing this code?

Second attempt:

int ncip( int dim, double R)
{   
    int i;
    int r = (int)floor(R);
    if (dim == 1)
    {   
        return 1 + 2*r; 
    }

    int n;
    #pragma omp parallel 
    {
        n = ncip(dim-1, R); // last coord 0
        #pragma omp for reduction (+:n)
        for(i=1; i<=r; ++i)
        {   
            n += 2*ncip(dim-1, sqrt(R*R - i*i) ); // last coord +- i
        }
    }

    return n;
}
  • Do you have nested parallelism enabled? Have you tried using pragma omp task? How many threads can you spawn? What processor do you use? What arguments do you pass in? Commented May 24, 2016 at 17:39
  • And you seem to update n in each iteration, which is illegal this way unless you are using a reduction. Commented May 24, 2016 at 17:41
  • Use tasks and store the intermediate results in an array; at the end, add up all the results. Commented May 24, 2016 at 19:30
  • Can you please provide a specific reproducible example for your observed runtimes (see minimal reproducible example)? Commented May 25, 2016 at 8:56

1 Answer

You are doing that wrong!

(1) There is a data race on the variable n. If you want to parallelize code in which several threads write to the same memory location, you must use a reduction (on the for), atomic, or critical to avoid data hazards.
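
For reference, here is a minimal race-free sketch of the original loop using a reduction (this alone does not fix the nested-parallelism problem described in point 2):

    int n = ncip(dim-1, R); // last coord 0
    #pragma omp parallel for reduction(+:n)
    for (int i = 1; i <= r; ++i)
        n += 2*ncip(dim-1, sqrt(R*R - (double)i*i)); // each thread sums into a private n, combined at the end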

(2) You probably have nested parallelism enabled, so a new parallel region is created every time the function ncip is called recursively. This is most likely the main problem. For recursive functions I advise you to create just one parallel region and then use pragma omp task.
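
As a quick check (a sketch; these API calls are standard OpenMP, though omp_set_nested has been deprecated since OpenMP 5.0), you can query and disable nesting explicitly:

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        printf("nested parallelism: %d\n", omp_get_nested()); // 1 = enabled
        omp_set_nested(0);               // force it off (legacy API)
        omp_set_max_active_levels(1);    // the modern equivalent
        return 0;
    }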

Do not parallelize with #pragma omp for; try #pragma omp task instead. Look at this example:

int ncip(int dim, double R){
    ...
    #pragma omp task
    ncip(XX, XX);        // each recursive call becomes a task

    #pragma omp taskwait // wait for the tasks spawned above to finish
    ...
}

int main(int argc, char *argv[]) {
    #pragma omp parallel
    {
        #pragma omp single 
        ncip(XX, XX);
    } 
    return(0); 
}
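
The single construct matters here: the parallel region creates the team of threads, but only one of them performs the top-level ncip call and spawns the tasks; the remaining threads wait at the implicit barrier and execute those tasks as they are created.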

UPDATE:

//Detailed version (without omp for and data races)
int ncip(int dim, double R){
    int n, r = (int)floor(R);

    if (dim == 1) return 1 + 2*r; 

    n = ncip(dim-1, R); // last coord 0

    for(int i=1; i<=r; ++i){   
        #pragma omp task shared(n) // n defaults to firstprivate in a task, so it must be shared explicitly
        {
            int aux = 2*ncip(dim-1, sqrt(R*R - i*i) ); // last coord +- i

            #pragma omp atomic
            n += aux;
        }
    }
    #pragma omp taskwait // all child tasks must finish before n is read
    return n;
}

PS: You will not get a speedup from this, because the overhead of creating a task is bigger than the work of a single task. The best thing you can do is rewrite this algorithm as an iterative version and then try to parallelize that.
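
One common mitigation, not part of the answer above, is to stop creating tasks once the subproblems get small, for example with the task if clause. A sketch of the loop from the detailed version with such a cutoff (the threshold dim > 2 is an arbitrary guess and would need tuning):

    for(int i=1; i<=r; ++i){   
        #pragma omp task shared(n) if(dim > 2) // below the cutoff the task is undeferred and runs immediately
        {
            int aux = 2*ncip(dim-1, sqrt(R*R - i*i) );

            #pragma omp atomic
            n += aux;
        }
    }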


14 Comments

You probably also need omp single after omp parallel so that only one thread spawns the tasks.
Yes, you are right; I forgot to write it in this example.
@HélderGonçalves Does the reduction directive go inside the for loop?
I corrected the example so that it does not wait for each task, and it works. But you will not get a speedup from this, because the overhead of creating a task is bigger than the work of a single task.
