
I'm trying to minimize a function with many parameters P using Particle Swarm Optimization. What you need to know to help me is that this procedure requires the computation of a particular function (that I call foo) for different indices i (each index is linked to a set of parameters P). The time that foo spends on each i is unpredictable and can vary a lot between different i. As soon as one v[i] has been computed, I'd like to start the computation of another one. The procedure stops when some i optimizes the function (meaning that the corresponding set of parameters P has been found).

So I want to parallelize the computation with OpenMP. I do the following:

unsigned int N(5);
unsigned int last_called(0);
std::vector<double> v(N,0.0);                    // result of foo for each i
std::vector<bool> busy(N,false);                 // true while a thread works on i
std::vector<unsigned int> already_ran(N,0);      // how many times each i has run
std::vector<unsigned int> to_run_in_priority(N); // indices ordered by priority
for(unsigned int i(0);i<N;i++){
    to_run_in_priority[i]=i;
}
do{
#pragma omp parallel for nowait
    for(unsigned int i=0;i<N;i++){
        if(!busy[to_run_in_priority[i]]){
            busy[to_run_in_priority[i]]=true;
            already_ran[to_run_in_priority[i]]++;
            foo(v[to_run_in_priority[i]]);
            busy[to_run_in_priority[i]]=false;
        }
        /*update to_run_in_priority*/
    }
} while (/*condition*/);

Say, for instance, that I have 4 threads and N=5. The program will enter the for loop and launch 4 threads. When the first i has been computed, it will launch the 5th one. But then what will happen?

Will the code continue, reach the while condition and enter the for loop again? If it does, since all the threads are busy, what will it do?

If what I want to do isn't clear, let me list what I want:

  • call foo for each i on a separate thread (number of threads < N)
  • if some thread isn't running anymore, call foo again for some i (the next i to run must be different from all currently running i, and it should be an i that has run fewer times than the others)
  • loop over the two previous items until the convergence criterion has been reached

If I'm not clear enough, don't hesitate to ask for clarification.

1 Comment
The nowait clause removes implicit barriers in parallel for loops. However, the way you have used it is useless, since there is a barrier at the end of a parallel block which can't be removed. Anyway, High Performance Mark's suggestion to use schedule(dynamic) seems to be what you want. Commented Jan 2, 2014 at 10:35

3 Answers


Abstracting somewhat from your code, you seem to want to write something like

#pragma omp parallel for
    for(unsigned int i=0;i<N;i++){
        v[i] = foo(i);
    }

but you are concerned that, because the computational effort of calls to foo(i) varies greatly, this simple approach will be poorly balanced if each thread simply gets a range of values of i on which to operate.

You are probably right to be concerned, but I think, if my diagnosis is correct, that you are going the wrong way about balancing your program. The wrong way is to try to program the allocation of work to threads yourself.

Try this (pseudo-code) instead:

#pragma omp parallel for schedule(dynamic,10)
    for(unsigned int i=0;i<N;i++){
        v[i] = foo(i);
    }

Notice the introduction of the schedule clause, in this case with parameters dynamic and 10. This directs the run-time to hand out bunches of values of i, 10 elements at a time, to individual threads. Depending on the distribution of run times across the values of i, and on the magnitude of N, this may be sufficient to balance the load.

Then again, it may not, and you may want to investigate the schedule clause a bit further, in particular both dynamic and guided scheduling.
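For instance, guided scheduling hands out chunks whose size starts large and shrinks as the loop progresses, which can help mop up imbalance towards the end of the loop. A minimal sketch, reusing the same loop:

#pragma omp parallel for schedule(guided)
    for(unsigned int i=0;i<N;i++){
        v[i] = foo(i); // late chunks are small, so one slow i delays its thread less
    }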

If none of this appeals, investigate the OpenMP task construct. A rough, untested sketch of that approach, assuming as before that foo(i) returns the value for v[i], might look like this:
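#pragma omp parallel
#pragma omp single
    {
        for(unsigned int i=0;i<N;i++){
#pragma omp task firstprivate(i) shared(v)
            v[i] = foo(i); // one task per i; idle threads grab pending tasks
        }
    } // all tasks are guaranteed complete at the barrier ending the single block

Because each call to foo becomes its own task, a thread that finishes a cheap i immediately picks up the next pending task, much like schedule(dynamic,1).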

Finally, if I have misunderstood your question, which often happens, then this answer is probably worthless to you.


4 Comments

Thank you for your answer! I already know the schedule clause, but it is only useful if the number of times foo needs to be launched is big (namely N >> number_of_threads). If that's the case, then on average all the threads should stop at the same time. But in my case N ~ number_of_threads, and therefore it's not very useful.
But you gave me an idea... maybe if I change what foo does, and if the for loop stops when the convergence criterion has been reached, I could find a solution.
Half the problem here is the chunk size being set to 10. You should not set the chunk size and instead use the default of 1.
I posted my solution; if you can, have a look and tell me what you think... thanks!

You could try something like this:

#pragma omp parallel
{
    #pragma omp for schedule(dynamic) nowait
    for(unsigned int i=0;i<N;i++){
       //parallel code with foo(i)
    }
    #pragma omp single 
    {
       //serial code
    }
}

Let's say N is 5 and there are 4 threads. The four threads start running, the first thread to finish picks up i=4, and the first thread to finish after that drops through to the single block: because of the nowait clause, it does not wait for the remaining threads at the end of the for loop.



Thanks to your comments and answers, this is the solution I came up with.

unsigned int i(0);
unsigned int ip(0);
unsigned int N(10);
unsigned int maxiter_(100);          // some maximum number of iterations (value arbitrary)
std::vector<bool> free_(N,true);     // true when no thread is working on index ip
#pragma omp parallel for schedule(dynamic,1) firstprivate(ip)
    for(unsigned int iter=0; iter<maxiter_; iter++){
#pragma omp critical
        {
            i++;
            ip = (i-1) % N;            // next candidate index
            if(!free_[ip]){iter -= 1;} // index busy: don't consume an iteration
        }
        if(free_[ip]){
            free_[ip]=false;
            if(ip<2){sleep(2);}        // stand-in for a cheap foo(ip)
            else{ sleep(5);}           // stand-in for an expensive foo(ip)
            free_[ip]=true;
        }
    }

With the few tests I did, it seems to work, but does anyone have arguments against what I did?

