5

For one of my applications I need to generate vector of size 2^35 (the size of my RAM is 96 GB, so this vector can easily fit into RAM).

int main ()
{
  int i;

  /* initialize random seed: */
  srand (time(NULL));

  vector<int> vec;
  do {
     i = rand() % 10 + 1;
     vec.push_back(i);
  } while ((vec.size()*sizeof(int))<pow(2,35));

  return 0;
}

However, I notice that my do while loop executes infinitely. One of the possible reasons is range of vec.size() is long unsigned int, which is very less than the number of elements inserted i.e. pow(2,35), due to which I think it goes in an infinite loop. I may be wrong. Please correct me if I am wrong. But can someone please tell how can I insert greater than pow(2,35) numbers in vec.

gcc version:4.8.2

26
  • 7
    Better to reserve space in that case. Commented Apr 23, 2015 at 11:39
  • 2
    vector is guaranteed to be continously allocated in memory, so when it's grown beyond the current bounds it has to be reallocated and this process is then repeated lots of times, the vector growing each time. Commented Apr 23, 2015 at 11:41
  • 2
    @StegVerner: vec.reserve(n). Commented Apr 23, 2015 at 11:42
  • 3
    @AaronMcDaid Value of sizeof(vec.size()) is 8 Commented Apr 23, 2015 at 11:52
  • 13
    "the vector can easily fit" - no it can't. Assuming 4 bytes per int, the final array will be 2^37 bytes, or 128GB. Then you multiply by sizeof(int) for some mysterious reason, so you're actually trying to allocate 512GB. By not reserving enough capacity first, you're might need twice as much memory due to fragmentation. So it should easily fit into 1TB, but not 96GB. Commented Apr 23, 2015 at 12:01

3 Answers 3

3

I'll try to address some of your problems in a simple solution:

First problem you have is space. Since you need numbers from 1-10 only, a int8_t would serve you much better.

Second is speed. std::vector does a lot of allocations and reallocations behind the hood. Since you have a fixed size, In my opinion there's no need to use it. Knowing this, we'll use a simple array and threads to improve performance.

Here's the code:

#include <array>
#include <random>
#include <thread>
#include <cstdint>
#include <memory>
#include <chrono>

// Since you only need numbers from 1-10, a single byte will work nicely.
const uint64_t size = UINT64_C(0x800000000); // Exactly 2^35
typedef std::array<int8_t, size> vec_t;

// start is first element, end is one-past the last. This is a template so we can generate multiple functions.
template<unsigned s>
void fill(vec_t::iterator start, vec_t::iterator end) {
    static const int seed = std::chrono::system_clock::now().time_since_epoch().count()*(s+1);
    static std::default_random_engine generator(seed);
    static std::uniform_int_distribution<int8_t> distribution(1,10);
    for(auto it = start; it != end; ++it) {
        *it = distribution(generator);  // generates number in the range 1..10
    }
}

int main() {
    auto vec = std::unique_ptr<vec_t>(new vec_t());

    // Each will have its own generator and distribution.
    std::thread a(fill<0>, vec->begin(), vec->begin() + size/4);
    std::thread b(fill<1>, vec->begin() + size/4, vec->begin() + size/2);
    std::thread c(fill<2>, vec->begin() + size/2, vec->begin() + (size/4)*3);
    std::thread d(fill<3>, vec->begin() + (size/4)*3, vec->end());
    a.join();
    b.join();
    c.join();
    d.join();
    return 0;
}
Sign up to request clarification or add additional context in comments.

10 Comments

int8_t vec[size]; This won't work (on most systems). You usually cannot get more than 8 MB this way. Using std::vector is the right thing to do, if you reserve the space before you start filling it, the number of allocations is exactly one (which is smaller than "a lot").
using a heap allocated std::array (+scoped_ptr) would be best imho. reserve does not work that well with threads.. as push_back is not thread safe.
Prefer std::array to c-style array.
No array is needed or appropriate here, neither the arr[] nor the std::array nor the new[] style. std::vector itself is not the problem here.
I edited the code to use the heap. If you allocate a std:array on the stack, its elements will be allocated on the stack aswell. So now the code allocates the std::array on the heap instead.
|
2

Why can't you use constructor?

std::vector<int> vec ( number_of_elements );

That way you'll have memory reserved, then you can randomize elements using generate or something.

5 Comments

I believe it's faster to use vec.reserve and then push_back/emplace_back. As you're going to have to modify every existing value anyway (loop through the vector all over again), since the ctor default constructs the type (unless you pass it a value as the second arg, but this doesn't make a difference).
@miguel.martin How can I find out whether reserve() has allocated the amount of memory it have asked it to allocate i.e. 34 GB
How can I find out whether the constructor has allocated the amount of memory it have asked it to allocate i.e. 34 GB
@StegVerner I pretty sure it would throw exception, if the constructor from my answer fails. You can call 'capacity' to get the number of elements that vector can store without reallocation.
One thing to keep in mind with std::vector::reserve is that it might try to allocate more memory then you told it to. If you try to reserve e.g. 34GiB and have 40GiB memory available, it could still fail, because under the hood std::vector might try to allocate 48GiB of memory. Therefore std::vector is not the best tool if you are limited by memory.
1

Update

As Baum mit Augen has highlighted, this post doesn't really answer the question because in his platform condition 4 doesn't hold (sizeof(std::size_t) is actually 8). However, I leave this post here to highlight an issue that might occur when porting the code.

Original post

One problem that I see is the following. Let's assume (most platforms fulfill these assumptions) that

1) vec.size returns std::size_t (not guaranteed);

2) sizeof returns std::size_t (guaranteed);

3) std::size_t is an unsigned integer type (guaranteed);

4) sizeof(std::size_t) == 4 (not guaranteed);

5) CHAR_BIT == 8 (not guaranteed).

(Recall that CHAR_BIT is the number of bits in a char.)

Therefore, the type of vec.size()*sizeof(int) is std::size_t and its maximum value is 2^(sizeof(std::size_t)*CHAR_BIT) - 1 == 2^32 - 1 < 2^32 < 2^35. Therefore, vec.size()*sizeof(int) is always smaller than 2^35.

2 Comments

stackoverflow.com/questions/29822249/… The size of the integer type that vector::size returns is 8 on his system.
@BaummitAugen Fair enough, I haven't read all the comments (it should probably be made clear be in the post). In any case, I believe my answer is still relevant because it highlights issues one may encounter when trying to write portable code.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.