
I'm new to OpenMP and I'm trying to optimize a for loop. The result is not as expected: the loops do not run correctly in parallel (due to a dependency). I don't understand how to get a correctly parallelized loop with the examples below:

    #pragma omp parallel for default(shared) reduction(+...)
    for(i = rest - 1; i >= 0; i--) {
        scounts[i] += N;
    }

    #pragma omp parallel for private(i)
    for(i = 1; i < p; i++) {
        disp[i] = disp[i-1] + scounts[i-1];
    }

I tried these two pragma directives without any success. What is the best way to proceed in these cases?

  • You should be much more specific than "something goes wrong". What goes wrong? What do you expect? What happens instead? Commented Jun 11, 2015 at 15:41
  • @HristoIliev, I think it's clear what goes wrong, given the dependency in the second loop. Commented Jun 12, 2015 at 9:03

2 Answers


You have picked a hard problem to do in parallel. In general, when writing to an array you don't want elements of the array to depend on previously computed elements, which is exactly what you have in your second loop.

Most people give up when they see a dependency, but these are the interesting cases that require a bit of thinking. In your case, your second loop is equivalent to

type sum = 0; //replace type with int, float, double...
for(i = 1; i < p; i++) {
    sum += scounts[i-1];
    disp[i] = disp[0] + sum;
}

This is a cumulative sum (also known as a prefix sum). OpenMP (before the `scan` directive added in OpenMP 5.0) does not provide an easy construct for this; you have to do it in two passes. Here is how (I assume the type of disp and scounts is int, but you can replace it with float or whatever):

int *suma;  // per-thread chunk totals; needs <stdlib.h> and <omp.h>
#pragma omp parallel
{
    int ithread = omp_get_thread_num();
    int nthreads = omp_get_num_threads();
    #pragma omp single
    {
        // nthreads+1 slots: suma[0] plus one total per thread
        suma = malloc((nthreads + 1) * sizeof *suma);
        suma[0] = 0;
    }
    int sum = 0;
    #pragma omp for schedule(static)
    for (int i = 1; i < p; i++) {  // pass 1: each thread scans its own chunk
        sum += scounts[i-1];
        disp[i] = disp[0] + sum;
    }
    suma[ithread + 1] = sum;       // publish this thread's chunk total
    #pragma omp barrier            // wait until every chunk total is available
    int offset = 0;
    for (int i = 0; i < (ithread + 1); i++) {
        offset += suma[i];         // sum of all chunk totals to the left
    }
    #pragma omp for schedule(static)  // same schedule => same chunks as pass 1
    for (int i = 1; i < p; i++) {     // pass 2: fix up with the offset
        disp[i] += offset;
    }
}
free(suma);

But if you're just learning OpenMP I suggest you start with an easier case first.



Please use #pragma directly:

#pragma omp parallel ...

instead of a commented-out #pragma, which the compiler ignores entirely:

// #pragma omp parallel ...

2 Comments

There is actually a loop-carried dependency in the second loop.
Thanks for your answer. I put the pragmas in comments because they don't work correctly in parallel.
