I'm trying to filter a vector into another vector in parallel. My current setup generates too much overhead so it's even slower than serial. Concretely:
#pragma omp parallel for collapse(2)
for (int i = 0; i < a.size(); i++) {
    for (int j = 0; j < a.size(); j++) {
        if (meetsConditions(a[i], a[j])) {
            std::vector<int> tmp = {i, j};
            #pragma omp critical
            b.push_back(tmp);
        }
    }
}
I'm saving the indices as I would like to later run a separate serial function on each couple that meets the condition:
for (auto element : b) {
    doSmth(a[element[0]], a[element[1]]);
}
I also tried a different approach: a new empty vector resized to a.size()*a.size(), with elements written at a third index that I increment inside an atomic clause, but that caused a data race (unless I misread it). How could I tackle this problem? Would using lists make it easier? Or maybe storing pointers to the elements directly? I'm really new to C++, so I'm not quite sure how I could get that to work.
Answer: You could give a b_local to each thread, and at the end concatenate all the b_local vectors into the shared b. But I'm not sure it's worth parallelizing this unless meetsConditions() has many computations. Also, std::array<int, 2>, std::pair<int, int> or std::tuple<int, int> would be much better for cache locality than std::vector<int>.

Comment (OP): How would I keep a b_local per thread and then concatenate it all? It sounds intimidating, as I can't think of how the different versions of a variable from each thread can be accessed.