How do I parallelise a for loop, using `omp parallel` or otherwise?

Question

Assuming I have three integer vectors:

mainVect of size 8 million elements
vect1 of size 1.5 million elements
vect2 of size 1.5 million elements

I want to run the following code:

int i, j;
for (i = 0; i < vect1.size(); i++)
{
    for (j = 0; j < mainVect.size(); j++)
    {
        if (vect1[i] == mainVect[j])
        {
            vect2[i]++;   // one more occurrence of vect1[i] found in mainVect
        }
    }
}

It has been running for a very long time and still hasn't finished. How can I speed it up? I have multiple processors. As an attempt, I added the following line before the code above (I read that it makes the loop run in parallel):

#pragma omp parallel for private(i, j) shared( mainVect, vect1, vect2)

But it is still slow.

If I divide the for loop into, for example, 4 sections, how can I make these loops run simultaneously? Something like:

for ( i = 0; i < vect1.size()/4; i++)
{

}

for ( i = vect1.size()/4; i < vect1.size()/2; i++)
{

}
// ... and so on

Or is there another method?

P.S.: This is a Windows Google Cloud machine, n1-standard-4 (4 vCPUs, 15 GB memory). CPU utilization is only 27% when I run the above code.
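The four-way split asked about above can be done with `std::thread`. The sketch below is a minimal illustration, not code from either answer: it assumes four threads (one per vCPU of the n1-standard-4), and the function names `countRange` and `countWithThreads` are hypothetical. Each thread handles a contiguous slice of `vect1` and writes only its own slice of `vect2`, so no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Brute-force count for vect1[begin, end): how many times each element
// appears in mainVect. Each thread writes only its own slice of vect2,
// so no two threads ever touch the same element and no locking is needed.
void countRange(const std::vector<int>& mainVect, const std::vector<int>& vect1,
                std::vector<int>& vect2, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        for (std::size_t j = 0; j < mainVect.size(); ++j) {
            if (vect1[i] == mainVect[j]) {
                ++vect2[i];
            }
        }
    }
}

// Hypothetical helper: split vect1 into numThreads contiguous chunks and
// run one std::thread per chunk.
std::vector<int> countWithThreads(const std::vector<int>& mainVect,
                                  const std::vector<int>& vect1,
                                  std::size_t numThreads) {
    assert(numThreads > 0);
    std::vector<int> vect2(vect1.size(), 0);
    const std::size_t chunk = (vect1.size() + numThreads - 1) / numThreads;

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < numThreads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(begin + chunk, vect1.size());
        if (begin >= end) break;  // fewer elements than threads
        workers.emplace_back(countRange, std::cref(mainVect), std::cref(vect1),
                             std::ref(vect2), begin, end);
    }
    for (std::thread& w : workers) {
        w.join();  // wait for every chunk before returning the result
    }
    return vect2;
}
```

Calling `countWithThreads(mainVect, vect1, 4)` would replace the nested loop. This still performs all 1.5 million × 8 million comparisons, so at best it divides the running time by the number of cores.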

How many cores you have is irrelevant if you're not writing code that uses multiple threads. I'm no expert on this syntax, but I doubt that what you've written tells the compiler how to parallelise your loop; I suspect it only tells it how to share those variables if you were parallelising it.
– underscore_d, Commented Dec 8, 2017 at 14:13
CPU usage of only 27% hints that your parallelization is not working...
– Fl.pf., Commented Dec 8, 2017 at 14:13
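27% of 4 vCPUs is roughly one busy core, which is consistent with the loop still running on a single thread. Two things are worth checking (an assumption, since the thread does not confirm the cause): the `#pragma omp parallel for` line must sit immediately before the `for` statement it applies to, not before the `int i,j;` declaration, and OpenMP must be enabled at compile time (`-fopenmp` for GCC and Clang, `/openmp` for MSVC); otherwise the pragma is ignored. A minimal sketch, using the hypothetical function name `countWithOpenMP`:

```cpp
#include <cstddef>
#include <vector>

// The pragma must be placed directly in front of the loop it parallelises.
// Without OpenMP enabled (-fopenmp for GCC/Clang, /openmp for MSVC) the
// pragma is ignored and the loop simply runs on one thread.
std::vector<int> countWithOpenMP(const std::vector<int>& mainVect,
                                 const std::vector<int>& vect1) {
    std::vector<int> vect2(vect1.size(), 0);
    // MSVC implements OpenMP 2.0, which requires a signed loop variable.
    const long long n = static_cast<long long>(vect1.size());
    const long long m = static_cast<long long>(mainVect.size());

#pragma omp parallel for
    for (long long i = 0; i < n; ++i) {
        int count = 0;  // declared inside the loop, so private to each thread
        for (long long j = 0; j < m; ++j) {
            if (vect1[i] == mainVect[j]) {
                ++count;
            }
        }
        vect2[i] = count;  // each iteration writes a different element: no race
    }
    return vect2;
}
```

Because `count` is declared inside the loop body it is private to each thread, and each iteration writes a different `vect2[i]`, so no `reduction` clause or locking is needed. As the next comment points out, this still leaves hours of work even with perfect scaling.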
I don't think any amount of linear speedup is going to help - you have twelve thousand billion iterations. Four cores optimally used would bring the waiting time down to the equivalent of three thousand billion. (At one nanosecond per iteration - which I believe is optimistic - that's from over three hours to just under one.) On the other hand, first counting the elements of mainVect and then doing 1.5 million table lookups could possibly cut the time down to a matter of seconds.
– molbdnilo, Commented Dec 8, 2017 at 14:52
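The estimate in this comment can be written out as follows, using the comment's own assumptions of one nanosecond per inner-loop iteration and perfect four-way scaling; the last line counts the hash-table operations needed by the count-first approach:

```latex
\begin{aligned}
N_{\text{loop}} &= |\text{vect1}| \times |\text{mainVect}| = (1.5\times10^{6})(8\times10^{6}) = 1.2\times10^{13} \\
T_{1} &\approx 1.2\times10^{13} \times 1\,\text{ns} = 1.2\times10^{4}\,\text{s} \approx 3.3\,\text{h} \\
T_{4} &\approx T_{1}/4 = 3\times10^{3}\,\text{s} \approx 50\,\text{min} \\
N_{\text{count}} &\approx 8\times10^{6} + 1.5\times10^{6} = 9.5\times10^{6}\ \text{hash-table operations}
\end{aligned}
```

That is roughly a million times fewer steps than the nested loop, which is why changing the algorithm matters far more than adding threads.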

abdullah · Accepted Answer · 2017-12-08 14:31:30Z

8 million ints do not take much space. You may use unordered_map or some other efficient associative container.

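#include <unordered_map>
#include <vector>

// Count how often each value occurs in mainVect: a single pass.
std::unordered_map<int, int> umap;
for (int v : mainVect) {
    umap[v]++;
}
// One average-O(1) lookup per element of vect1; find() avoids a second lookup.
for (std::size_t i = 0; i < vect1.size(); ++i) {
    auto it = umap.find(vect1[i]);
    if (it != umap.end()) {
        vect2[i] += it->second;
    }
}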

This takes linear time (on average) to populate vect2, which is very fast.

answered Dec 8, 2017 at 14:31

abdullah

Ronald Over a year ago

I like this one. It basically uses the same idea as mine, but an associative container is probably a lot faster than sorting. I'm not a C++ expert and don't know what tools are in the box, but I know how to sort :-) The fact is that both approaches easily outperform multi-threading.

Ronald · Accepted Answer · 2017-12-08 14:21:06Z

Using threads is one possible solution.

But the main question is: what problem are you trying to solve?

If I understand it correctly, you're counting how many times each element of vect1 occurs in mainVect. Since you don't need to know where, you can rearrange (a copy of) mainVect.

My approach would be:

1. Sort mainVect.
2. Convert mainVect to a table consisting of "key" and number of occurrences.
3. Sort vect1 by creating an indirection vector: iterating over this indirection vector gives the "key"s in ascending order.
4. Now you can "merge" the two.

The complexity of this approach is O(n log n), dominated by the sorts.
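The four steps above can be sketched in C++ as follows. This is one possible reading of the answer, not the author's code: the function name `countBySortMerge` is hypothetical, the "key"/occurrences table is assumed to be a sorted `std::vector<std::pair<int, int>>`, and `mainVect` is taken by value so that only a copy is sorted.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Sort-and-merge counting: O(m log m + n log n) for m = |mainVect|, n = |vect1|.
std::vector<int> countBySortMerge(std::vector<int> mainCopy,  // copy: caller's mainVect is untouched
                                  const std::vector<int>& vect1) {
    // Step 1: sort (a copy of) mainVect.
    std::sort(mainCopy.begin(), mainCopy.end());

    // Step 2: compress runs of equal values into a (key, occurrences) table.
    std::vector<std::pair<int, int>> table;
    for (int v : mainCopy) {
        if (table.empty() || table.back().first != v) {
            table.emplace_back(v, 1);
        } else {
            ++table.back().second;
        }
    }

    // Step 3: indirection vector; visiting vect1[order[k]] yields ascending keys.
    std::vector<std::size_t> order(vect1.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&vect1](std::size_t a, std::size_t b) { return vect1[a] < vect1[b]; });

    // Step 4: merge the two ascending sequences in one linear pass.
    std::vector<int> vect2(vect1.size(), 0);
    std::size_t t = 0;
    for (std::size_t idx : order) {
        while (t < table.size() && table[t].first < vect1[idx]) {
            ++t;  // skip table keys smaller than the current vect1 value
        }
        if (t < table.size() && table[t].first == vect1[idx]) {
            vect2[idx] = table[t].second;  // duplicates in vect1 reuse the same entry
        }
    }
    return vect2;
}
```

Only the two sorts are superlinear; the table construction and the merge are single passes.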

answered Dec 8, 2017 at 14:21

Ronald

userInThisWorld Over a year ago

I can't sort mainVect.

UKMonkey Over a year ago

@noor Then take a vector of indexes into mainVect and sort that vector instead.

Ronald Over a year ago

Why not? You don't need to sort it in place. You can also use an indirection table, as @UKMonkey suggests.
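Following the suggestion in these comments, `mainVect` can be left untouched by sorting a vector of indexes into it instead. The sketch below uses a hypothetical function name (`countByIndexSort`), and it swaps the merge step from the answer for binary searches (`std::lower_bound` / `std::upper_bound`) through the index vector, which keeps the code short at a cost of O(log m) per element of `vect1`.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// mainVect itself is never reordered; only the index vector is sorted.
std::vector<int> countByIndexSort(const std::vector<int>& mainVect,
                                  const std::vector<int>& vect1) {
    // idx holds positions in mainVect, ordered so that mainVect[idx[k]] ascends.
    std::vector<std::size_t> idx(mainVect.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::sort(idx.begin(), idx.end(), [&mainVect](std::size_t a, std::size_t b) {
        return mainVect[a] < mainVect[b];
    });

    std::vector<int> vect2(vect1.size(), 0);
    for (std::size_t i = 0; i < vect1.size(); ++i) {
        const int key = vect1[i];
        // Binary search over the index vector, comparing through mainVect.
        auto lo = std::lower_bound(idx.begin(), idx.end(), key,
            [&mainVect](std::size_t k, int value) { return mainVect[k] < value; });
        auto hi = std::upper_bound(lo, idx.end(), key,
            [&mainVect](int value, std::size_t k) { return value < mainVect[k]; });
        vect2[i] = static_cast<int>(hi - lo);  // number of positions holding key
    }
    return vect2;
}
```

For 8 million elements the index vector takes about 64 MB with an 8-byte `std::size_t`, well within the 15 GB machine mentioned in the question.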
