In C++, why array is much faster than std::vector?

Question

This is related to LeetCode 552. Student Attendance Record II. The question is fairly simple.

The first code snippet relates to use array to initialize DP and DP_new. The code runs like 50 ms.

The second code snippet relates to use vector of vector to initialize DP and DP_new. The code runs more than 2000 ms.

The only difference between those two code snippets is that I use array or std::vector to initilize data structure "DP" and "DP_new".

Why is there a huge performance gap?

class Solution {
public:
    int checkRecord(int n) {
        int mod = pow(10, 9) + 7;
        int DP[2][3];                       //using array here
        memset(DP, 0, sizeof(DP));
        DP[1][0] = 1;
        DP[0][0] = 1;
        DP[0][1] = 1;
        for (int i = 1; i < n; ++i) {
            int DP_new[2][3];
            memset(DP_new, 0, sizeof(DP_new));            //using array here
            DP_new[1][2] = DP[1][1];
            DP_new[1][1] = DP[1][0];
            DP_new[1][0] = ((((((DP[0][0] + DP[0][1]) % mod + DP[0][2]) % mod) + DP[1][0]) % mod + DP[1][1]) % mod + DP[1][2]) % mod;
            DP_new[0][2] = DP[0][1];
            DP_new[0][1] = DP[0][0];
            DP_new[0][0] = ((DP[0][0] + DP[0][1]) % mod + DP[0][2]) % mod;       
            swap(DP, DP_new);
        }
        return ((((((DP[0][0] + DP[0][1]) % mod + DP[0][2]) % mod) + DP[1][0]) % mod + DP[1][1]) % mod + DP[1][2]) % mod;
    }
};

class Solution {
public:
    int checkRecord(int n) {
        int mod = pow(10, 9) + 7;
        vector<vector<int>> DP(2, vector<int>(3, 0));       //use 2D vector here
        DP[1][0] = 1;
        DP[0][0] = 1;
        DP[0][1] = 1;
        for (int i = 1; i < n; ++i) {
            vector<vector<int>> DP_new(2, vector<int>(3, 0));     //use 2D vector here
            DP_new[1][2] = DP[1][1];
            DP_new[1][1] = DP[1][0];
            DP_new[1][0] = ((((((DP[0][0] + DP[0][1]) % mod + DP[0][2]) % mod) + DP[1][0]) % mod + DP[1][1]) % mod + DP[1][2]) % mod;
            DP_new[0][2] = DP[0][1];
            DP_new[0][1] = DP[0][0];
            DP_new[0][0] = ((DP[0][0] + DP[0][1]) % mod + DP[0][2]) % mod;       
            swap(DP, DP_new);
        }
        return ((((((DP[0][0] + DP[0][1]) % mod + DP[0][2]) % mod) + DP[1][0]) % mod + DP[1][1]) % mod + DP[1][2]) % mod;
    }
};

This is the LeetCode C++ compiler.

Your first example has two arrays, on the stack. Allocation cost is zero. Your second example makes 3(n+1) heap allocations and frees, and runs 3(n+1) constructors. Consider using a std::aarray here for a fairer comparison. — Tim Roberts
– Tim Roberts, Commented Apr 28, 2022 at 20:08
If your matrix size is larger than the processor's data cache size, you may notice slow downs due to the processor having to reload the data cache. — Thomas Matthews
– Thomas Matthews, Commented Apr 28, 2022 at 21:59
Not only allocation cost, but also the cost for [] operator many times — binhgreat
– binhgreat, Commented Jun 14, 2022 at 4:50

Blindy · Accepted Answer · 2022-04-28 20:03:20Z

2

Allocating memory on the heap is extremely slow, because it has to do a linear search for a chunk large enough to hold your data. In fact, the degenerate case of running out of memory means that you first checked every single chunk of physical and swap memory and came up empty.

Contrast that to a plain old array like in your first example, and the difference is easily explainable. It's all allocation speed.

Edit: I just saw your edit that you're using asan, that further exacerbates the problem by allocating very large chunks of memory around your every allocation.

answered Apr 28, 2022 at 20:03

Blindy

68k10 gold badges96 silver badges141 bronze badges

Sign up to request clarification or add additional context in comments.

9 Comments

Dai Over a year ago

"because it has to do a linear search for a chunk large enough to hold your data" - this isn't universally true: red-black-tree-based heaps have logarithmic time complexity for free-space lookup.

Galik Over a year ago

Also, multidimensional arrays are entirely contiguous memory making it very cache friendly whereas a vector of vectors is disjointed in memory leading to increased cache misses.

Blindy Over a year ago

I don't think it's a caching issue, he has 6 elements in his array. Unless you're running that on your fridge, it will always fit in your cache. Also while there are alternative allocation schemes, both Windows and Linux use linear allocation as far as I know, and C++ uses the OS directly by default.

Yksisarvinen Over a year ago

I'm not convinced that allocations are the culprit. Program should generally keep reusing 6 memory areas without requisting more from OS after first loop iteration. Sure, there is overhead to mark them as used and mark them as free, but my intuition says cache as well.

Blindy Over a year ago

Again man, those are 6 integers. You're free to place your faith in cache issues, but the math suggests that's not your problem.

|

Collectives™ on Stack Overflow

In C++, why array is much faster than std::vector?

1 Answer 1

9 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

9 Comments

Your Answer

Sign up or log in

Post as a guest

Related