
I'd like to parallelize the following piece of code, but I'm new to OpenMP and to writing parallel code.

std::vector<DMatch> good_matches;
for (int i = 0; i < descriptors_A.rows; i++) {
   if (matches_RM[i].distance < 3 * min_dist) {
      good_matches.push_back(matches_RM[i]);
   }
}

I have tried

std::vector<DMatch> good_matches;
#pragma omp parallel for
for (int i = 0; i < descriptors_A.rows; i++) {
   if (matches_RM[i].distance < 3 * min_dist) {
      good_matches[i] = matches_RM[i];
   }
}

and

std::vector<DMatch> good_matches;
cv::DMatch temp;
#pragma omp parallel for
for (int i = 0; i < descriptors_A.rows; i++) {
   if (matches_RM[i].distance < 3 * min_dist) {
      temp = matches_RM[i];
      good_matches[i] = temp;
      // AND ALSO good_matches.push_back(temp);
   }
}
I have also tried

#pragma omp parallel for
for (int i = 0; i < descriptors_A.rows; i++) {
   if (matches_RM[i].distance < 3 * min_dist) {
      #pragma omp critical
      good_matches.push_back(matches_RM[i]);
   }
}

This version works, but it does not speed anything up. It may be that this loop simply cannot be sped up, but it would be great if it can. I would also like to speed up this loop:

std::vector<Point2f> obj, scene;
for (int i = 0; i < good_matches.size(); i++) {
   obj.push_back(keypoints_A[good_matches[i].queryIdx].pt);
   scene.push_back(keypoints_B[good_matches[i].trainIdx].pt);
}

Apologies if this question has been answered before, and thank you very much to anyone who can help.


3 Answers


I showed how to do this here: c-openmp-parallel-for-loop-alternatives-to-stdvector

Make private versions of the std::vector and fill the shared std::vector in a critical section like this:

std::vector<DMatch> good_matches;
#pragma omp parallel
{
    std::vector<DMatch> good_matches_private;
    #pragma omp for nowait
    for (int i = 0; i < descriptors_A.rows; i++) {
       if (matches_RM[i].distance < 3 * min_dist) {
          good_matches_private.push_back(matches_RM[i]);
       }
    }
    #pragma omp critical
    good_matches.insert(good_matches.end(), good_matches_private.begin(), good_matches_private.end());
}

One possibility may be to use private vectors for each thread and combine them in the end:

#include<omp.h>

#include<algorithm>
#include<iterator>
#include<iostream>
#include<vector>

using namespace std;

int main()
{
  vector<int> global_vector;  
  vector< vector<int> > buffers;

  #pragma omp parallel
  {
    auto nthreads = omp_get_num_threads();
    auto id = omp_get_thread_num();
    //
    // Correctly set the number of buffers
    //
  #pragma omp single
    {
      buffers.resize( nthreads );
    }
    //
    // Each thread works on its chunk
    // If order is important maintain schedule static
    //
  #pragma omp for schedule(static)
    for(size_t ii = 0; ii < 100; ++ii) {      
      if( ii % 2 != 0 ) { // Any other condition will do
          buffers[id].push_back(ii);
      }
    }
    //
    // Combine buffers together
    //
    #pragma omp single
    {
      for( auto & buffer : buffers) {
        move(buffer.begin(),buffer.end(),back_inserter(global_vector));
      }
    }
  }
  //
  // Print the result
  //
  for( auto & x : global_vector) {
    cout << x << endl;
  }    
  return 0;
}

The actual speed-up depends mainly on the amount of work done inside each loop iteration.

5 Comments

Although using private vectors is the way to go, you make it unnecessarily complicated. There is no need to use the runtime functions or to create the vectors inside a single construct. Not only that, but it can have cache issues.
@Zboson Usually I go for the safest solution, then for the fastest. My snippet constructs a global_vector which is exactly equal to the one constructed in the serial case (not a permutation)
Oh, I see what you mean. Your solution preserves the order but mine does not necessarily. I'm not sure the order matters for the OP but it might.
+1. Although the fact that the OP got the right answer using critical indicates that the order is not important, your answer is still interesting nevertheless.
Note that this solution has a significant performance issue: buffers[id].push_back(ii) will cause false sharing, because the private vectors (their three pointers, not the actual data) of multiple threads sit on the same cache line. See also stackoverflow.com/a/43064331/620382 for a discussion.

TBB's concurrent_vector acts much like std::vector, but allows parallel calls to push_back.

3 Comments

Could using TBB containers inside OpenMP work-sharing constructs cause compatibility problems?
@Massimiliano, maybe he's suggesting to use TBB instead of OpenMP?
TBB concurrent containers have no dependencies on TBB tasking, and thus can be used with OpenMP or with PThreads. That was a deliberate design decision from the start of TBB.
