I'm trying to minimize a function of many parameters P using Particle Swarm Optimization. What you need to know to help me is that this procedure requires computing a particular function (which I call foo) for different indices i (each index is linked to a set of parameters P). The time foo spends on each i is unpredictable and can vary a lot between different i. As soon as one v[i] has been computed, I'd like to start computing another one. The procedure stops when one i optimizes the function (meaning that the corresponding set of parameters P has been found).
So I want to parallelize the computation with OpenMP. I do the following:
    unsigned int N(5);
    unsigned int last_called(0);
    std::vector<double> v(N,0.0);
    std::vector<bool> busy(N,false);
    std::vector<unsigned int> already_ran(N,0);
    std::vector<unsigned int> to_run_in_priority(N);
    for(unsigned int i(0);i<N;i++){
        to_run_in_priority[i]=i;
    }
    do{
        #pragma omp parallel for nowait
        for(unsigned int i=0;i<N;i++){
            if(!busy[to_run_in_priority[i]]){
                busy[to_run_in_priority[i]]=true;
                already_ran[to_run_in_priority[i]]++;
                foo(v[to_run_in_priority[i]]);
                busy[to_run_in_priority[i]]=false;
            }
            /*update to_run_in_priority*/
        }
    } while (/*condition*/);
For instance, suppose I have 4 threads and N=5. The program will enter the for loop and launch 4 threads. When the first i has been computed, it will launch the 5th one. But then what will happen?
Will the code continue, reach the while condition and enter the for loop again? If it does, as all the threads are busy, what will it do?
If what I want to do isn't clear, let me list what I want:
- call foo for each i on a separate thread (thread_numbers<N)
- if some thread isn't running anymore, call foo again for some i (the next i that should run must be different from all other running i, and it should be an i that has run fewer times than the others).
- loop over the two previous items until the convergence criterion has been reached.
If I'm not clear enough, don't hesitate to ask for clarification.
The nowait clause removes the implicit barrier at the end of a parallel for loop. However, the way you have used it is useless, since there is a barrier at the end of a parallel block that can't be removed. Anyway, High Performance Mark's suggestion to use schedule(dynamic) seems to be what you want.
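To illustrate the schedule(dynamic) suggestion, here is a minimal sketch rather than a drop-in replacement for the code in the question. It assumes foo can be written as a function of the index that returns the value to store; the body of foo below is a made-up stand-in whose cost varies with i, and evaluate_all is a hypothetical helper name.

```cpp
#include <vector>

// Hypothetical stand-in for foo: the amount of work depends on i,
// so some iterations take much longer than others.
double foo(unsigned int i) {
    double acc = 0.0;
    const unsigned long steps = 1000UL * (1UL + (i * 7919UL) % 50UL);
    for (unsigned long k = 1; k <= steps; ++k) {
        const double x = static_cast<double>(k);
        acc += 1.0 / (x * x);
    }
    return acc;
}

// Evaluate foo once for every index. With schedule(dynamic), a thread that
// finishes its iteration immediately picks up the next unassigned one,
// instead of receiving a fixed share of the indices up front.
std::vector<double> evaluate_all(unsigned int n) {
    std::vector<double> v(n, 0.0);
    // A signed loop variable keeps this valid for older OpenMP versions too.
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < static_cast<int>(n); ++i) {
        // Each iteration writes only its own element, so no locking is needed.
        v[i] = foo(static_cast<unsigned int>(i));
    }
    return v;  // reached only after the implicit barrier: all n calls are done
}
```

A call such as evaluate_all(5) could sit inside the do/while loop. Because of the barrier at the end of the parallel region, the while condition is only checked once every index has been processed, so a busy flag is not needed in this form. Compiled without OpenMP support, the pragma is ignored and the loop simply runs serially.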