
I have this sequential code:

for (unsigned item = 0; item < totalItems; ++item) { // Outer loop
// Outer body
  for (unsigned j = 0; j < maxSize; ++j) { // Inner loop
  // Inner body
  }
}

My goal is to simply parallelize the inner loop. It could be done like this:

for (unsigned item = 0; item < totalItems; ++item) { // Outer loop
// Outer body
  #pragma omp parallel for
  for (unsigned j = 0; j < maxSize; ++j) { // Inner loop
  // Inner body
  }
}

The problem with this code is that new threads are spawned on every iteration of the outer loop. To speed this up, I want to create a team of threads in advance and reuse it multiple times. I found that there is a directive #pragma omp for for this purpose.

#pragma omp parallel
for (unsigned item = 0; item < totalItems; ++item) { // Outer loop
// Outer body
  #pragma omp for
  for (unsigned j = 0; j < maxSize; ++j) { // Inner loop
  // Inner body
  }
}

However, if I understand it correctly, the #pragma omp parallel directive means that the outer loop itself is executed by every thread, i.e. run multiple times. Is this correct?
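
To check what actually happens, I could print the thread number in both loops. This is just a small stand-alone test, not my real code; the loop bounds are dummy values:

#include <cstdio>
#include <omp.h>

int main() {
  const unsigned totalItems = 2, maxSize = 4;
  #pragma omp parallel
  for (unsigned item = 0; item < totalItems; ++item) { // Every thread runs the outer loop
    std::printf("outer item %u, thread %d\n", item, omp_get_thread_num()); // Appears once per thread
    #pragma omp for
    for (unsigned j = 0; j < maxSize; ++j) { // Iterations are shared among the threads
      std::printf("  inner j %u, thread %d\n", j, omp_get_thread_num()); // Appears once in total
    }
  }
}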

Edit: Here is a more detailed example:

// Let's say that an image is represented as an array of pixels,
// where each pixel is just one integer.
std::vector<Image> images = getImages();

for (auto & image : images) { // Loop over all images
  #pragma omp parallel for
  for (unsigned j = 0; j < image.size(); ++j) { // Loop over each pixel
    image.at(j) += addMagicConstant(j);      
  }
}

Goal: I want to spawn a team of threads and then use them repeatedly to parallelize only the inner loop (= the loop over the image pixels).
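
To make this concrete, the structure I am aiming for would look roughly like this (just a sketch; Image and addMagicConstant are the same placeholders as above):

std::vector<Image> images = getImages();

#pragma omp parallel                // The team of threads is created once
for (auto & image : images) {       // Every thread walks over all images
  #pragma omp for                   // Pixels of the current image are split among the team
  for (unsigned j = 0; j < image.size(); ++j) {
    image.at(j) += addMagicConstant(j);
  }
  // Implicit barrier: all threads finish this image before moving on to the next one
}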

Comments

  • Possible duplicate of omp parallel vs. omp parallel for. Commented Oct 27, 2018 at 8:03
  • See bisqwit.iki.fi/story/howto/openmp: "The parallel construct starts a parallel block. It creates a team of N threads." Thus the threads remain there during the whole parallel section; they are "used" in the for loop. Commented Oct 27, 2018 at 8:06
  • Possible duplicate of How does OpenMP handle nested loops? and Nested loops, inner loop parallelization, reusing threads. If I am parsing things correctly, only the outer loop needs the OpenMP directive #pragma omp parallel for collapse(2). Commented Oct 27, 2018 at 8:11
  • This is really overthinking it and underestimating OpenMP's features: by default, OpenMP keeps a thread pool, so there is no need to worry about that. Commented Oct 27, 2018 at 8:53
  • We cannot answer this without knowing anything about "inner body" and "outer body". Please prepare a minimal reproducible example or at the very least clearly describe what is done in those regions. Commented Oct 27, 2018 at 9:15

2 Answers


Your code is perfectly valid and will indeed work:

#pragma omp parallel
for (unsigned item = 0; item < totalItems; ++item) { // Outer loop
// Outer body
  #pragma omp for
  for (unsigned j = 0; j < maxSize; ++j) { // Inner loop
  // Inner body
  }
}

#pragma omp parallel will spawn the threads. Each thread will then proceed through the outer loop. At each loop iteration, the threads will hit the #pragma omp for and the inner loop will be distributed among the threads. There is an implicit barrier at the end of each omp for block, so threads will wait until the inner loop has been completed before moving to the next outer loop iteration.
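
As an illustration of why the implicit barrier matters, here is a minimal compilable sketch (not taken from the question; the array and the bounds are made up):

#include <vector>

int main() {
  const unsigned totalItems = 8, maxSize = 1000;
  std::vector<double> data(maxSize, 1.0);

  #pragma omp parallel
  for (unsigned item = 0; item < totalItems; ++item) { // Every thread runs the outer loop
    #pragma omp for
    for (unsigned j = 0; j < maxSize; ++j) { // Iterations are split among the team
      data[j] *= 2.0;                        // Within one outer iteration, each j is written by exactly one thread
    }
    // Implicit barrier here: no thread starts outer iteration item+1
    // before all updates of iteration item are finished.
  }
}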

Having an omp for worksharing loop inside another for or while loop, or inside a conditional section, is possible as long as it is guaranteed that all threads encounter it, and encounter it the same number of times.
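
For example, a worksharing loop inside a branch that every thread evaluates identically is fine (a sketch, not from the question):

bool process = (maxSize > 0); // Shared flag, same value for every thread

#pragma omp parallel
{
  if (process) {                // All threads take the same branch
    #pragma omp for
    for (unsigned j = 0; j < maxSize; ++j) { // Loop
    // Inner body
    }
  }
}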

It is forbidden, however, to use constructs such as:

#pragma omp parallel
for (unsigned ii = 0; ii < omp_get_thread_num(); ++ii) { // Number of iterations of the outer loop depends on the thread
// Outer body
  #pragma omp for
  for (unsigned j = 0; j < maxSize; ++j) { // Inner loop
  // Inner body
  }
}

or

#pragma omp parallel
if(condition_depending_on_thread_num) { 
  #pragma omp for
  for (unsigned j = 0; j < maxSize; ++j) { // Loop
  // Inner body
  }
}



Did you try:

#pragma omp parallel for
for (unsigned item = 0; item < totalItems; ++item) { // Outer loop
  for (unsigned j = 0; j < maxSize; ++j) { // Inner loop
  }
}

Comments

  • This will also parallelize the outer loop, and that is not what I want.
  • That is exactly what you want. You want each thread to work on a single item. That way you will avoid a cache-miss frenzy and get the maximum performance.
