Assume I have three integer vectors:

  • mainVect, with 8 million elements
  • vect1, with 1.5 million elements
  • vect2, with 1.5 million elements

I want to run the following code:

int i,j;
for ( i = 0; i < vect1.size(); i++)
{
    for ( j = 0; j < mainVect.size(); j++)
    {
        if (vect1[i] == mainVect[j])
        {
            vect2[i]++;             
        }
    }
}

It has been running for a very long time and still has not finished. How can I speed it up? The machine has multiple processors. As an experiment, I added the following line before the code above (I read that it makes the loop run in parallel):

#pragma omp parallel for private(i, j) shared( mainVect, vect1, vect2)

But it is still slow.

If I divide the for loop into, say, 4 sections, how can I make these loops run simultaneously? For example:

for ( i = 0; i < vect1.size()/4; i++)
{

}

for ( i = vect1.size()/4; i < vect1.size()/2; i++)
{

}
... and so on.

Or are there any other methods?

P.S.: Windows Google Cloud machine, n1-standard-4 (4 vCPUs, 15 GB memory). CPU utilization is only 27% when running the code above.

10
  • 2
    How many cores you have is irrelevant if you're not writing code that uses multiple threads. I'm no expert on this syntax, but I doubt that what you've written tells the compiler how to parallelise your loop; I suspect it only says how those variables should be shared if the loop were parallelised. Commented Dec 8, 2017 at 14:13
  • 1
    CPU usage only 27% hints towards your parallelization not working... Commented Dec 8, 2017 at 14:13
  • Can you sort mainVect? Commented Dec 8, 2017 at 14:15
  • For vector operations you could use a GPU instead. Commented Dec 8, 2017 at 14:19
  • 2
    I don't think any amount of linear speedup is going to help - you have twelve thousand billion iterations. Four cores optimally used would bring the waiting time down to the equivalent of three thousand billion. (At one nanosecond per iteration - which I believe is optimistic - that's from over three hours to just under one.) On the other hand, first counting the elements of mainVect and then doing 1.5 million table lookups could possibly cut the time down to a matter of seconds. Commented Dec 8, 2017 at 14:52

2 Answers

4

8 million ints do not take much space (about 32 MB, assuming 4-byte ints), so you can count the values of mainVect up front with std::unordered_map or another efficient associative container.

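// Requires <unordered_map>, <vector>, and <cstddef>.
std::unordered_map<int, int> umap;
for (int v : mainVect) {
    ++umap[v];  // count how often each value occurs in mainVect
}
for (std::size_t i = 0; i < vect1.size(); ++i) {
    auto it = umap.find(vect1[i]);  // single lookup; does not insert missing keys
    if (it != umap.end()) {
        vect2[i] += it->second;
    }
}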

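This populates vect2 in linear time on average: O(n + m) for n elements in mainVect and m elements in vect1, instead of the O(n × m) of the nested loops. That is very fast.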

1 Comment

I like this one. It basically uses the same idea as mine, but the use of an associative container is probably a lot faster than sorting. I'm not a C++ expert and don't know what tools are in the box, but I know how to sort :-) Fact is that both approaches easily outperform multi-threading.
3

Using threads is one possible solution.
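For reference, here is a minimal sketch of that threaded option, assuming C++11 std::thread is available. The helper name countOccurrencesThreaded and the default of 4 threads are made up for illustration. It splits the range of i into contiguous chunks, one per thread, as the question proposes, and returns the counts instead of incrementing vect2. The total work is unchanged, so the best case is roughly a 4x speedup on 4 cores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: for each vect1[i], counts how often it occurs in
// mainVect, splitting the range of i across numThreads worker threads.
std::vector<int> countOccurrencesThreaded(const std::vector<int>& mainVect,
                                          const std::vector<int>& vect1,
                                          unsigned numThreads = 4)
{
    assert(numThreads > 0);
    std::vector<int> counts(vect1.size(), 0);
    // Chunk size rounded up so that every index is covered.
    const std::size_t chunk = (vect1.size() + numThreads - 1) / numThreads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = std::min(vect1.size(), t * chunk);
        const std::size_t end = std::min(vect1.size(), begin + chunk);
        if (begin == end) continue;  // nothing left for this thread
        // Each thread writes only counts[begin..end), so no locking is needed.
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                counts[i] = static_cast<int>(
                    std::count(mainVect.begin(), mainVect.end(), vect1[i]));
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return counts;
}
```

On GCC or Clang this may need the -pthread flag. Each thread still scans all of mainVect for every element in its chunk, so the algorithm stays quadratic.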

But the main question is: what problem are you trying to solve?

If I understand it correctly, you're counting the number of occurrences in mainVect of an element in vect1. Since you don't need to know where, you can rearrange (a copy of) mainVect.

My approach would be:

  1. Sort mainVect (or a copy of it).
  2. Convert mainVect to a table consisting of "key" and number of occurrences.
  3. Sort vect1 and create an indirection vector. Iterating over this indirection vector gives the "key"s in ascending sequence.
  4. Now you can "merge" the two.

The complexity of this approach is O(n log n), dominated by the two sorts; the merge itself is linear.
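The four steps above could be sketched roughly as follows. This is only one possible reading of them: the function name addCountsBySortAndMerge is hypothetical, mainVect is taken by value so the caller's vector stays unsorted, and the indirection vector is built by sorting indices into vect1 rather than vect1 itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: adds to vect2[i] the number of times vect1[i]
// occurs in mainVect, using sort + merge instead of nested loops.
void addCountsBySortAndMerge(std::vector<int> mainCopy,  // by value: a copy
                             const std::vector<int>& vect1,
                             std::vector<int>& vect2)
{
    assert(vect1.size() == vect2.size());

    // 1. Sort the copy of mainVect.
    std::sort(mainCopy.begin(), mainCopy.end());

    // 2. Collapse runs of equal values into (key, occurrences) pairs.
    std::vector<std::pair<int, int>> table;
    for (int v : mainCopy) {
        if (!table.empty() && table.back().first == v) {
            ++table.back().second;
        } else {
            table.emplace_back(v, 1);
        }
    }

    // 3. Indirection vector: indices into vect1, ordered by the values
    //    they refer to, so vect1 itself keeps its original order.
    std::vector<std::size_t> order(vect1.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return vect1[a] < vect1[b]; });

    // 4. Merge: walk both ascending sequences once.
    std::size_t t = 0;
    for (std::size_t idx : order) {
        while (t < table.size() && table[t].first < vect1[idx]) ++t;
        if (t < table.size() && table[t].first == vect1[idx]) {
            vect2[idx] += table[t].second;
        }
    }
}
```

A call such as addCountsBySortAndMerge(mainVect, vect1, vect2) leaves mainVect and vect1 untouched, at the cost of one temporary copy of mainVect.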

3 Comments

I can't sort mainVect
@noor Then take a vector of indexes into mainVect and sort that vector instead.
Why not? You don't need to sort it in place, and you can also use an indirection table like @UKmonkey suggests.