I'm trying to filter a vector into another vector in parallel. My current setup generates too much overhead so it's even slower than serial. Concretely:
#pragma omp parallel for collapse(2)
for (int i = 0; i < a.size(); i++) {
    for (int j = 0; j < a.size(); j++) {
        if (meetsConditions(a[i], a[j])) {
            std::vector<int> tmp = {i, j};
            #pragma omp critical
            b.push_back(tmp);
        }
    }
}
I'm saving the indices as I would like to later run a separate serial function on each couple that meets the condition:
for (auto element : b) {
    doSmth(a[element[0]], a[element[1]]);
}
I also tried a different approach: a new empty vector resized to a.size()*a.size(), with elements written at a third index that I increment inside an atomic clause, but that caused a data race (unless I misread it). How could I tackle this problem? Would using lists make it easier? Or maybe storing pointers to the elements directly? I'm really new to C++, so I'm not quite sure how I could get that to work.
Answer: You could give a b_local to each thread, and at the end concatenate all the b_local vectors into the shared b. But I'm not sure it's worth parallelizing this unless meetsConditions() has many computations. Also, std::array<int, 2>, std::pair<int, int> or std::tuple<int, int> would be much better for cache locality than std::vector<int>.

Comment (OP): How would I keep a b_local per thread and then concatenate it all? It sounds intimidating, as I can't think of how the different versions of a variable from each thread can be accessed.