I'm trying to filter a vector into another vector in parallel. My current setup generates so much overhead that it's even slower than the serial version. Concretely:

#pragma omp parallel for collapse(2)
for(int i = 0 ; i < a.size() ; i++){
   for(int j = 0 ; j < a.size() ; j++){
      if(meetsConditions(a[i], a[j])){
         std::vector<int> tmp = {i, j};
         #pragma omp critical
         b.push_back(tmp);
      }
   }
}

I'm saving the indices as I would like to later run a separate serial function on each couple that meets the condition:

for(auto element : b){
   doSmth(a[element[0]], a[element[1]]);
}

I also tried resizing a new empty vector to a.size()*a.size() and assigning elements through a third index that gets incremented inside an atomic clause, but that caused a data race (unless I misread it). How could I go about tackling this problem? Would using lists make it easier? Or maybe storing pointers to these elements directly? I'm really new to C++, so I'm not quite sure how I could get that to work.
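For reference, here is roughly the idea I was attempting, as a sketch. The function and predicate names are made up, and I've used an atomic capture so that claiming a slot and incrementing the counter happen as one step:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch: pre-size the output, then claim a unique slot per match with
// an atomic fetch-and-increment. filterPairs and the predicate
// parameter are illustrative names, not from the original code.
std::vector<std::pair<int, int>> filterPairs(const std::vector<int>& a,
                                             bool (*meetsConditions)(int, int)) {
    const int n = static_cast<int>(a.size());
    std::vector<std::pair<int, int>> b(static_cast<std::size_t>(n) * n);
    int count = 0;
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (meetsConditions(a[i], a[j])) {
                int idx;
                #pragma omp atomic capture
                idx = count++;       // claim a unique slot, race-free
                b[idx] = {i, j};     // distinct slots, so no data race
            }
        }
    }
    b.resize(count);  // trim the unused tail (result order is non-deterministic)
    return b;
}
```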

  • I would parallelize only the first loop (no collapse), use a local b_local to each thread, and at the end concatenate all the b_local to the shared b. But I'm not sure it's worth parallelizing this, unless meetsConditions() has many computations Commented Dec 14, 2022 at 11:44
  • What would be the syntax to create the b_local and then concatenate it all? It sounds intimidating as I can't think of how different versions of a variable from each thread can be accessed. Commented Dec 14, 2022 at 11:47
  • Nesting vectors is a bad idea in most circumstances. As the inner vector seems to always have the size 2, using std::array<int, 2>, std::pair<int, int> or std::tuple<int, int> would be much better for cache locality. Commented Dec 14, 2022 at 13:13
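The last comment's suggestion could look like this sketch (collectPairs and the predicate parameter are illustrative names):

```cpp
#include <utility>
#include <vector>

// Store each index couple as a std::pair<int, int> instead of a
// 2-element std::vector: no heap allocation per match, and better
// cache locality when iterating the result.
std::vector<std::pair<int, int>> collectPairs(const std::vector<int>& a,
                                              bool (*cond)(int, int)) {
    std::vector<std::pair<int, int>> b;
    const int n = static_cast<int>(a.size());
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (cond(a[i], a[j]))
                b.emplace_back(i, j);  // constructs the pair in place
    return b;
}

// The follow-up loop then becomes (C++17 structured bindings):
//   for (auto [i, j] : b) doSmth(a[i], a[j]);
```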

2 Answers


Assuming a.size() is large enough, I would parallelize only the first loop (no collapse), use a local b_local to each thread, and at the end concatenate all the b_local to the shared b.

#pragma omp parallel
{
    std::vector<std::vector<int>> b_local;
    #pragma omp for
    for (int i = 0 ; i < a.size() ; i++){
        for(int j = 0 ; j < a.size() ; j++){
            if(meetsConditions(a[i], a[j])){
                std::vector<int> tmp = {i, j};
                b_local.push_back(tmp);
            }
        }
    }
    #pragma omp critical
    b.insert(b.end(),b_local.begin(),b_local.end()); 
}

This should be more efficient, as the critical section is now outside of the loops and is encountered only once per thread. Since b_local is private to each thread, no critical is needed when updating it.

But actually I'm not sure it's worth parallelizing this, unless meetsConditions() has many computations.


10 Comments

Alternatively, you can consider using a user defined reduction. E.g. see this answer.
@Laci You cannot tell OMP that the reduction is non-commutative, so the local arrays may wind up in any order. Gross omission in the standard if you ask me. MPI does have a flag for this.
@VictorEijkhout Thank you for the comment. You have now pointed out an advantage of manual reduction that did not occur to me. I would like to note in parentheses that the code in PierU's answer also does not keep the original order.
@PierU Or you can use the ordered clause in a second loop, here is an example
@Laci nice... but isn't the test always true?
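A minimal sketch of the user-defined reduction mentioned above, assuming the vector-of-pairs layout suggested in the question comments (as noted, the merge order across threads is unspecified):

```cpp
#include <utility>
#include <vector>

using PairVec = std::vector<std::pair<int, int>>;

// Declare a reduction that concatenates per-thread vectors; OpenMP
// default-constructs each thread's private copy of b.
#pragma omp declare reduction(vec_cat : PairVec : \
    omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))

PairVec collect(const std::vector<int>& a, bool (*cond)(int, int)) {
    PairVec b;
    const int n = static_cast<int>(a.size());
    #pragma omp parallel for reduction(vec_cat : b)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (cond(a[i], a[j]))
                b.emplace_back(i, j);  // appends to the thread-private copy
    return b;
}
```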

I would expect this code to run slower, as you have a critical section that forces the b.push_back(tmp) to run in a single thread at a time.

Perhaps look to remove the critical section and instead write the results directly into an already appropriately sized b, something like:

vector<std::vector<int>> b;
b.resize(a.size() * a.size());

#pragma omp parallel for collapse(2)
for (int i = 0; i < a.size(); i++) {
    for (int j = 0; j < a.size(); j++) {
        if (meetsConditions(a[i], a[j])) {
            std::vector<int> tmp = { i, j };
            b.at((i*a.size())+j) = tmp;
        }
    }
}
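Note that with this approach, every slot whose condition fails stays an empty vector, so a serial compaction pass would be needed before running the follow-up loop. A sketch using the erase-remove idiom (compact is an illustrative name):

```cpp
#include <algorithm>
#include <vector>

// Drop the empty placeholder entries left behind by the pre-sized
// write, keeping only the index couples that were actually filled in.
void compact(std::vector<std::vector<int>>& b) {
    b.erase(std::remove_if(b.begin(), b.end(),
                           [](const std::vector<int>& e) { return e.empty(); }),
            b.end());
}
```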

9 Comments

This should be a comment, not an answer
Wouldn't there be a data race?
Thanks @PierU for the comment, I don't have enough rep yet to add comments to the original post, otherwise I would have done so.
I understand that it can be frustrating to not be able to comment (I was frustrated as well), but it's not a good enough reason to post answers that should be comments.
@paleonix I agree with you both, I missed the filtering point completely and focused on resolving the Parallel issue. I'm still new to SO, but I'll get better.