
I have the following code that I want to parallelize:

int ncip( int dim, double R)
{   
    int i;
    int r = (int)floor(R);
    if (dim == 1)
    {   
        return 1 + 2*r; 
    }
    int n = ncip(dim-1, R); // last coord 0

    #pragma omp parallel for
    for(i=1; i<=r; ++i)
    {   
        n += 2*ncip(dim-1, sqrt(R*R - i*i) ); // last coord +- i
    }

    return n;
}

The program execution time when run without OpenMP is 6.956 s; when I try to parallelize the for loop, the execution time is greater than 3 minutes (and only that short because I killed the run myself). What am I doing wrong in regards to parallelizing this code?

Second attempt:

int ncip( int dim, double R)
{   
    int i;
    int r = (int)floor(R);
    if (dim == 1)
    {   
        return 1 + 2*r; 
    }

    int n;
    #pragma omp parallel 
    {
        n = ncip(dim-1, R); // last coord 0
        #pragma omp for reduction (+:n)
        for(i=1; i<=r; ++i)
        {   
            n += 2*ncip(dim-1, sqrt(R*R - i*i) ); // last coord +- i
        }
    }

    return n;
}
  • Do you have nested parallelism enabled? Have you tried using pragma omp task? How many threads can you spawn? What processor do you use? What arguments do you pass in? Commented May 24, 2016 at 17:39
  • And you seem to update n in each iteration, which is illegal this way unless you are using a reduction. Commented May 24, 2016 at 17:41
  • Use tasks and store the intermediate results in an array; at the end, add up all the results. Commented May 24, 2016 at 19:30
  • Can you please provide a specific reproducible example for your observed runtimes (see minimal reproducible example)? Commented May 25, 2016 at 8:56

1 Answer

You are doing that wrong!

(1) There is a data race on the variable n. If you want to parallelize code in which several threads write to the same memory location, you must use a reduction (on the for), atomic, or critical to avoid data hazards.
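
For reference, here is a minimal race-free sketch of the original loop using a reduction (this alone does not fix the nested-parallelism problem described in point 2):

    int n = ncip(dim-1, R); // last coord 0
    #pragma omp parallel for reduction(+:n)
    for (int i = 1; i <= r; ++i)
        n += 2*ncip(dim-1, sqrt(R*R - (double)i*i)); // each thread sums into a private n, combined at the end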

(2) You probably have nested parallelism enabled, so a new parallel region is created every time the function ncip is called recursively. This is most likely the main problem. For recursive functions I advise you to create just one parallel region and then use pragma omp task.
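
As a quick check (a sketch; these API calls are standard OpenMP, though omp_set_nested has been deprecated since OpenMP 5.0), you can query and disable nesting explicitly:

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        printf("nested parallelism: %d\n", omp_get_nested()); // 1 = enabled
        omp_set_nested(0);               // force it off (legacy API)
        omp_set_max_active_levels(1);    // the modern equivalent
        return 0;
    }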

Do not parallelize with #pragma omp for; try #pragma omp task instead. Look at this example:

int ncip(int dim, double R){
    ...
    #pragma omp task
    ncip(XX, XX);        // each recursive call becomes a task

    #pragma omp taskwait // wait for the tasks spawned above to finish
    ...
}

int main(int argc, char *argv[]) {
    #pragma omp parallel
    {
        #pragma omp single 
        ncip(XX, XX);
    } 
    return(0); 
}
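
The single construct matters here: the parallel region creates the team of threads, but only one of them performs the top-level ncip call and spawns the tasks; the remaining threads wait at the implicit barrier and execute those tasks as they are created.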

UPDATE:

//Detailed version (without omp for and data races)
int ncip(int dim, double R){
    int n, r = (int)floor(R);

    if (dim == 1) return 1 + 2*r; 

    n = ncip(dim-1, R); // last coord 0

    for(int i=1; i<=r; ++i){   
        #pragma omp task shared(n) // n defaults to firstprivate in a task, so it must be shared explicitly
        {
            int aux = 2*ncip(dim-1, sqrt(R*R - i*i) ); // last coord +- i

            #pragma omp atomic
            n += aux;
        }
    }
    #pragma omp taskwait // all child tasks must finish before n is read
    return n;
}

PS: You will not get a speedup from this, because the overhead of creating a task is bigger than the work of a single task. The best thing you can do is rewrite this algorithm as an iterative version and then try to parallelize that.
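
One common mitigation, not part of the answer above, is to stop creating tasks once the subproblems get small, for example with the task if clause. A sketch of the loop from the detailed version with such a cutoff (the threshold dim > 2 is an arbitrary guess and would need tuning):

    for(int i=1; i<=r; ++i){   
        #pragma omp task shared(n) if(dim > 2) // below the cutoff the task is undeferred and runs immediately
        {
            int aux = 2*ncip(dim-1, sqrt(R*R - i*i) );

            #pragma omp atomic
            n += aux;
        }
    }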


14 Comments

You probably also need omp single after omp parallel so that only one thread spawns the tasks.
Yes, you are right; I forgot to write it in this example.
@HélderGonçalves Does the reduction directive go inside the for loop?
I corrected the example so that it does not wait for each task, and it works. But you will not get a speedup from this, because the overhead of creating a task is bigger than the work of a single task.
