
One often reads that there is little performance difference between a dynamically allocated array and a std::vector.

Here is my test on Project Euler problem 10, in two versions:

with std::vector:

const __int64 sum_of_primes_below_vectorversion(int max) 
{
        auto primes = new_primes_vector(max);
        __int64 sum = 0;
        for (auto p : primes) {
                sum += p;
        }
        return sum;
}

const std::vector<int> new_primes_vector(__int32 max_prime)
{
        std::vector<bool> is_prime(max_prime, true);
        is_prime[0] = is_prime[1] = false;
        for (auto i = 2; i < max_prime; i++) {
                is_prime[i] = true;
        }
        for (auto i = 1; i < max_prime; i++) {
                if (is_prime[i]) {
                        auto max_j = max_prime / i;
                        for (auto j = i; j < max_j; j++) {
                                is_prime[j * i] = false;
                        }
                }
        }
        auto primes_count = 0;
        for (auto i = 0; i < max_prime; i++) {
                if (is_prime[i]) {
                        primes_count++;
                }
        }

        std::vector<int> primes(primes_count, 0);
        for (auto i = 0; i < max_prime; i++) {
                if (is_prime[i]) {
                        primes.push_back(i);
                }
        }
        return primes;
}

Note that I also tested a version that calls the default constructor of std::vector, without precomputing its final size.

Here is the array version:

const __int64 sum_of_primes_below_carrayversion(int max)
{
        auto p_length = (int*)malloc(sizeof(int));
        auto primes = new_primes_array(max, p_length);
        auto last_index = *p_length - 1;

        __int64 sum = 0;
        for (int i = 0; i < last_index; i++) {
                sum += primes[i];
        }

        free((__int32*)(primes));
        free(p_length);
        return sum;
}


const __int32* new_primes_array(__int32 max_prime, int* p_primes_count)
{
        auto is_prime = (bool*)malloc(max_prime * sizeof(bool));
        is_prime[0] = false;
        is_prime[1] = false;
        for (auto i = 2; i < max_prime; i++) {
                is_prime[i] = true;
        }
        for (auto i = 1; i < max_prime; i++) {
                if (is_prime[i]) {
                        auto max_j = max_prime / i;
                        for (auto j = i; j < max_j; j++) {
                                is_prime[j * i] = false;
                        }
                }
        }

        auto primes_count = 0;
        for (auto i = 0; i < max_prime; i++) {
                if (is_prime[i]) {
                        primes_count++;
                }
        }
        *p_primes_count = primes_count;

        int* primes = (int*)malloc(*p_primes_count * sizeof(__int32));
        int index_primes = 0;
        for (auto i = 0; i < max_prime; i++) {
                if (is_prime[i]) {
                        primes[index_primes] = i;
                        index_primes++;
                }
        }
        free(is_prime);
        return primes;
}

This is compiled with the MSVC 2013 compiler, with the /O2 optimization flag. I don't really see where a big difference should come from, given move semantics (which allow returning the big vector by value without a copy).

Here are the results, with an input of 2E6:

C array version
avg=    0.0438
std=    0.00928224
vector version
avg=    0.0625
std=    0.0005
vector version (no realloc)
avg=    0.0687
std=    0.00781089

The statistics are over 10 trials. The differences look significant to me. Is there something in my code that could be improved?

Edit: after correcting my code (and another improvement), here are my new results:

C array version
avg=    0.0344
std=    0.00631189
vector version
avg=    0.0343
std=    0.00611637
vector version (no realloc)
avg=    0.0469
std=    0.00997447

which confirms that std::vector carries no penalty compared to C arrays (and that one should avoid reallocating).

  • Change the functions so they do not return const; that may inhibit optimization. Commented Nov 23, 2014 at 22:44
  • @MattMcNabb it does indeed improve the performance, but only for the array version (to around 0.032). Commented Nov 23, 2014 at 23:03

1 Answer


There shouldn't be a performance difference between vector and a dynamic array, since a vector is a dynamic array.

The performance difference in your code comes from the fact that you are actually doing different things between the vector and array version. For instance:

    std::vector<int> primes(primes_count, 0);
    for (auto i = 0; i < max_prime; i++) {
            if (is_prime[i]) {
                    primes.push_back(i);
            }
    }
    return primes;

This creates a vector of size primes_count with every element initialized to 0, and then pushes a bunch of primes onto the back of it. But the vector still starts with primes_count zeros! That wastes the initialization work and the memory, and every later iteration walks over the zeros as well as the primes. What you want to do is:

    std::vector<int> primes;
    primes.reserve(primes_count);
    // same push_back loop
    return primes;

Along the same lines, consider this block:

    std::vector<bool> is_prime(max_prime, true);
    is_prime[0] = is_prime[1] = false;
    for (auto i = 2; i < max_prime; i++) {
            is_prime[i] = true;
    }

You construct a vector of max_prime bools initialized to true... and then assign most of them to true again. The initialization happens twice here, whereas the array implementation (where malloc leaves the memory uninitialized) does it only once. Just remove this for loop.

I bet if you fix these two issues - which would make the two algorithms comparable - you'd get the same performance.
