4

This is related to LeetCode 552. Student Attendance Record II. The question is fairly simple.

The first code snippet relates to use array to initialize DP and DP_new. The code runs like 50 ms.

The second code snippet relates to use vector of vector to initialize DP and DP_new. The code runs more than 2000 ms.

The only difference between those two code snippets is that I use array or std::vector to initilize data structure "DP" and "DP_new".

Why is there a huge performance gap?

class Solution {
public:
    int checkRecord(int n) {
        int mod = pow(10, 9) + 7;
        int DP[2][3];                       //using array here
        memset(DP, 0, sizeof(DP));
        DP[1][0] = 1;
        DP[0][0] = 1;
        DP[0][1] = 1;
        for (int i = 1; i < n; ++i) {
            int DP_new[2][3];
            memset(DP_new, 0, sizeof(DP_new));            //using array here
            DP_new[1][2] = DP[1][1];
            DP_new[1][1] = DP[1][0];
            DP_new[1][0] = ((((((DP[0][0] + DP[0][1]) % mod + DP[0][2]) % mod) + DP[1][0]) % mod + DP[1][1]) % mod + DP[1][2]) % mod;
            DP_new[0][2] = DP[0][1];
            DP_new[0][1] = DP[0][0];
            DP_new[0][0] = ((DP[0][0] + DP[0][1]) % mod + DP[0][2]) % mod;       
            swap(DP, DP_new);
        }
        return ((((((DP[0][0] + DP[0][1]) % mod + DP[0][2]) % mod) + DP[1][0]) % mod + DP[1][1]) % mod + DP[1][2]) % mod;
    }
};
class Solution {
public:
    int checkRecord(int n) {
        int mod = pow(10, 9) + 7;
        vector<vector<int>> DP(2, vector<int>(3, 0));       //use 2D vector here
        DP[1][0] = 1;
        DP[0][0] = 1;
        DP[0][1] = 1;
        for (int i = 1; i < n; ++i) {
            vector<vector<int>> DP_new(2, vector<int>(3, 0));     //use 2D vector here
            DP_new[1][2] = DP[1][1];
            DP_new[1][1] = DP[1][0];
            DP_new[1][0] = ((((((DP[0][0] + DP[0][1]) % mod + DP[0][2]) % mod) + DP[1][0]) % mod + DP[1][1]) % mod + DP[1][2]) % mod;
            DP_new[0][2] = DP[0][1];
            DP_new[0][1] = DP[0][0];
            DP_new[0][0] = ((DP[0][0] + DP[0][1]) % mod + DP[0][2]) % mod;       
            swap(DP, DP_new);
        }
        return ((((((DP[0][0] + DP[0][1]) % mod + DP[0][2]) % mod) + DP[1][0]) % mod + DP[1][1]) % mod + DP[1][2]) % mod;
    }
};

This is the LeetCode C++ compiler. enter image description here

3
  • 3
    Your first example has two arrays, on the stack. Allocation cost is zero. Your second example makes 3(n+1) heap allocations and frees, and runs 3(n+1) constructors. Consider using a std::aarray here for a fairer comparison. Commented Apr 28, 2022 at 20:08
  • If your matrix size is larger than the processor's data cache size, you may notice slow downs due to the processor having to reload the data cache. Commented Apr 28, 2022 at 21:59
  • Not only allocation cost, but also the cost for [] operator many times Commented Jun 14, 2022 at 4:50

1 Answer 1

2

Allocating memory on the heap is extremely slow, because it has to do a linear search for a chunk large enough to hold your data. In fact, the degenerate case of running out of memory means that you first checked every single chunk of physical and swap memory and came up empty.

Contrast that to a plain old array like in your first example, and the difference is easily explainable. It's all allocation speed.

Edit: I just saw your edit that you're using asan, that further exacerbates the problem by allocating very large chunks of memory around your every allocation.

Sign up to request clarification or add additional context in comments.

9 Comments

"because it has to do a linear search for a chunk large enough to hold your data" - this isn't universally true: red-black-tree-based heaps have logarithmic time complexity for free-space lookup.
Also, multidimensional arrays are entirely contiguous memory making it very cache friendly whereas a vector of vectors is disjointed in memory leading to increased cache misses.
I don't think it's a caching issue, he has 6 elements in his array. Unless you're running that on your fridge, it will always fit in your cache. Also while there are alternative allocation schemes, both Windows and Linux use linear allocation as far as I know, and C++ uses the OS directly by default.
I'm not convinced that allocations are the culprit. Program should generally keep reusing 6 memory areas without requisting more from OS after first loop iteration. Sure, there is overhead to mark them as used and mark them as free, but my intuition says cache as well.
Again man, those are 6 integers. You're free to place your faith in cache issues, but the math suggests that's not your problem.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.