I am using OpenMP to parallelize loops. In normal case, one would use:
#pragma omp for schedule(static, N_CHUNK)
for(int i = 0; i < N; i++) {
// ...
}
For nested loops, I can put pragma on the inner or outter loop
#pragma omp for schedule(static, N_CHUNK) // can be here...
for(int i = 0; i < N; i++) {
#pragma omp for schedule(static, N_CHUNK) // or here...
for(int k = 0; k < N; k++) {
// both loops have consant number of iterations
// ...
}
}
But! I have two loops, where number of iterations in 2nd loop depends on the 1st loop:
for(int i = 0; i < N; i++) {
for(int k = i; k < N; k++) {
// k starts from i, not from 0...
}
}
What is the best way to balance CPU usage for this kind of loop?