
I'm new to OpenMP and I'm trying to optimize a for loop. The result is not as expected: the loops do not run correctly in parallel (due to a dependency). I don't understand how to get a correctly parallelized loop with the examples below:

    #pragma omp parallel for default(shared) reduction(+...)
    for(i = rest - 1; i >= 0; i--) {
        scounts[i] += N;
    }

    #pragma omp parallel for private(i)
    for(i = 1; i < p; i++) {
        disp[i] = disp[i-1] + scounts[i-1];
    }

I tried these two pragma directives without any success. What is the best way to proceed in these cases?

  • You should be much more specific than "something goes wrong". What goes wrong? What do you expect? What happens instead? Commented Jun 11, 2015 at 15:41
  • @HristoIliev, I think it's clear what goes wrong, given the dependency in the second loop. Commented Jun 12, 2015 at 9:03

2 Answers


You have picked a hard problem to do in parallel. In general, when writing to an array you don't want elements of the array to depend on previously computed elements, which is exactly what you have in your second loop.

Most people give up when they see a dependency, but these are the interesting cases that require a bit of thinking. In your case, your second loop is equivalent to

type sum = 0; //replace type with int, float, double...
for(i = 1; i < p; i++) {
    sum += scounts[i-1];
    disp[i] = disp[0] + sum;
}

This is a cumulative sum (also known as a prefix sum). OpenMP (before the `scan` directive added in OpenMP 5.0) does not provide an easy construct for this; you have to do it in two passes. Here is how (I assume the type of disp and scounts is int, but you can replace it with float or whatever):

int *suma;  // per-thread chunk totals; needs <stdlib.h> and <omp.h>
#pragma omp parallel
{
    int ithread = omp_get_thread_num();
    int nthreads = omp_get_num_threads();
    #pragma omp single
    {
        // nthreads+1 slots: suma[0] plus one total per thread
        suma = malloc((nthreads + 1) * sizeof *suma);
        suma[0] = 0;
    }
    int sum = 0;
    #pragma omp for schedule(static)
    for (int i = 1; i < p; i++) {  // pass 1: each thread scans its own chunk
        sum += scounts[i-1];
        disp[i] = disp[0] + sum;
    }
    suma[ithread + 1] = sum;       // publish this thread's chunk total
    #pragma omp barrier            // wait until every chunk total is available
    int offset = 0;
    for (int i = 0; i < (ithread + 1); i++) {
        offset += suma[i];         // sum of all chunk totals to the left
    }
    #pragma omp for schedule(static)  // same schedule => same chunks as pass 1
    for (int i = 1; i < p; i++) {     // pass 2: fix up with the offset
        disp[i] += offset;
    }
}
free(suma);

But if you're just learning OpenMP I suggest you start with an easier case first.



Please use #pragma directly:

#pragma omp parallel ...

instead of a commented-out #pragma, which the compiler ignores entirely:

// #pragma omp parallel ...

2 Comments

There is actually a loop-carried dependency in the second loop.
Thanks for your answer. I put the pragmas in comments because they don't work correctly in parallel.
