1

I have the following piece of C++ code. The scale of the problem is N and M. Running the code takes about two minutes on my machine. (after g++ -O3 compilation). Is there anyway to further accelerate it, on the same machine? Any kind of option, choosing a better data structure, library, GPU or parallelism, etc, is on the table.

void demo() {
    int N = 1000000;
    int M=3000;
    vector<vector<int> > res(M);
    for (int i =0; i <N;i++) {
       for (int j=1; j < M; j++){
           res[j].push_back(i);
       }
    }
}


int main() {
  demo();
  return 0;
}

An additional info: The second loop above for (int j=1; j < M; j++) is a simplified version of the real problem. In fact, j could be in a different range for each i (of the outer loop), but the number of iterations is about 3000.

14
  • 2
    res[j].reserve(3000); before the inner for loop is almost certainly going to help Commented Oct 29, 2020 at 7:19
  • 1
    possibly swapping the loops to make the memory access more sequential Commented Oct 29, 2020 at 7:20
  • 1
    Are the M vectors truly identical, or is that just an artifact of the example? Commented Oct 29, 2020 at 7:20
  • 1
    When handling 2D arrays it's often recommended to use a 1D array and perform the index calculation yourself. Better cache performance (or something, no expert here). Commented Oct 29, 2020 at 7:20
  • A single vector reserved in advance rather than a nested one will probably make a big difference Commented Oct 29, 2020 at 7:21

3 Answers 3

5

With the exact code as shown when writing this answer, you could create the inner vector once, with the specific size, and call iota to initialize it. Then just pass this vector along to the outer vector constructor to use it for each element.

Then you don't need any explicit loops at all, and instead use the (highly optimized, hopefully) standard library to do all the work for you.

Perhaps something like this:

void demo()
{
    static int const N = 1000000;
    static int const M = 3000;

    std::vector<int> data(N);
    std::iota(begin(data), end(data), 0);

    std::vector<std::vector<int>> res(M, data);
}
Sign up to request clarification or add additional context in comments.

Comments

3

Alternatively you could try to initialize just one vector with that elements, and then create the other vectors just by copying that part of the memory using std::memcpy or std::copy.

Another optimization would be to allocate the memory in advance (e.g. array.reserve(3000)).

Also if you're sure that all the members of the vector are similar vectors, you could do a hack by just creating a single vector with 3000 elements, and in the other res just put the same reference of that 3000-element vector million times.

1 Comment

Thank you!. Could you please elaborate your answer with code?
1

On my machine which has enough memory to avoid swapping your original code took 86 seconds.

Adding reserve:

    for (auto& v : res)
    {
        v.reserve(N);
    }

made basically no difference (85 seconds but I only ran each version once).

Swapping the loop order:

for (int j = 1; j < M; j++) {
        for (int i = 0; i < N; i++) {
            res[j].push_back(i);
        }
    }

reduced the time to 10 seconds, this is likely due to a combination of allowing the compiler to use SIMD optimisations and improving cache coherency by accessing memory in sequential order.

Creating one vector and copying it into the others:

    for (int i = 0; i < N; i++) {
        res[1].push_back(i);
    }
    for (int j = 2; j < M; j++) {
        res[j] = res[1];
    }

reduced the time to 4 seconds. Using a single vector:

void demo() {
    size_t N = 1000000;
    size_t M = 3000;
    vector<int> res(M*N);
    size_t offset = N;
    for (size_t i = 0; i < N; i++) {
        res[offset++] = i;
    }
    for (size_t j = 2; j < M; j++) {
        std::copy(res.begin() + N, res.begin() + N * 2, res.begin() + offset);
        offset += N;
    }
}

also took 4 seconds, there probably isn't much improvement because you have 3,000 4 MB vectors, there would likely be more difference if N was smaller or M was larger.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.