
Let's suppose a very basic C++ program that allocates a huge number of small std::vectors. I don't really know how the compiler and the OS will place those vectors in the process memory space, but if the number is big, I think some vectors could end up close to each other.

Now, let's suppose I delete some of those vectors and keep a few others. Imagine I want to add 10,000 items to the first vector.

What will happen if the second vector is too close in memory? Will I get a "low memory" error, or will the OS move the first vector?

  • If I understood what you are asking -- vectors aren't necessarily placed sequentially in memory. You will never run out of space for an allocation unless there is physically no space anywhere to put it. Commented Aug 7, 2017 at 19:04
  • The mechanics of such an operation work like this: if you increase the size of the first vector to the point that its capacity has to grow, a new, larger array is allocated wherever there is enough contiguous memory to accommodate it, the existing contents are copied (or moved) into it, and the old array is freed. You will never have two vectors "bumping into" each other because they got too "close". Commented Aug 7, 2017 at 20:12
  • When you say "too close in memory", do you mean in physical RAM? Or do you mean in the virtual address space of the process? There may be a few confusions underlying your question. Commented Aug 7, 2017 at 21:05

2 Answers


No, it does not matter whether vectors are close to each other. You will only get an error if a vector grows to a size for which no contiguous block of memory is left to hold its elements (with the default allocator, an std::bad_alloc exception is thrown).
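For illustration, assuming the default allocator on a typical 64-bit system, such a failed allocation surfaces as std::bad_alloc and can be caught. This is just a sketch, not part of the original answer:

#include <iostream>
#include <new>
#include <vector>

int main() {
    std::vector<char> v;
    try {
        // Request far more memory than any machine is likely to have
        // (assumes a 64-bit size_t; adjust the value on other platforms).
        v.resize(1ull << 62);
    } catch (const std::bad_alloc& e) {
        std::cout << "allocation failed: " << e.what() << '\n';
    }
}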

What happens internally is similar to what you probably mean by "moving", but in C++11 that term has a different meaning, so I will avoid it and rather say the data is reallocated. Also note that the operating system has nothing to do with it.

Let us look at the details:

It is correct that an std::vector is contiguous, but (in contrast to an std::array) its elements are not stored inside the std::vector instance itself. Instead, it keeps the underlying array on the heap and only stores a pointer to it.
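One way to see this separation (the exact object size is implementation-defined, commonly around three pointers) is to compare sizeof for an empty and a large vector; a small illustration, not from the original answer:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> empty_v;
    std::vector<int> big_v(1000000, 42);

    // Both vector objects themselves have the same small size:
    // the million elements of big_v live in a separate heap allocation.
    std::cout << sizeof(empty_v) << ' ' << sizeof(big_v) << '\n';
}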

For efficiency reasons, implementations are allowed to make the internal array bigger than the number of elements currently stored in it. For example:

std::vector<int> v;
assert(v.size() == 0);
assert(v.capacity() >= v.size()); // at least v.size(), but can be higher

When you add new elements to the vector (e.g., via v.push_back), one of the following two things will happen:

  • If there is enough space left (i.e., v.size() < v.capacity()), the new element can be added without any extra memory allocation.
  • Otherwise, the underlying array has to grow, which involves the following steps (a small sketch follows this list):

    1. A new (larger) array is allocated.
    2. All elements from the old array are copied (or moved) to the new array.
    3. The new array replaces the old one (which is deallocated), and the new element can be inserted.
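As a rough sketch (growth factors and exact capacities are implementation-defined), you can observe these steps by watching capacity() and data() while pushing elements:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    const int* old_data = v.data();
    std::size_t old_capacity = v.capacity();

    for (int i = 0; i < 100; ++i) {
        v.push_back(i);
        if (v.capacity() != old_capacity) {
            // A reallocation happened: larger capacity, usually a new address.
            std::cout << "size " << v.size()
                      << ", capacity " << v.capacity()
                      << ", data moved: " << (v.data() != old_data ? "yes" : "no")
                      << '\n';
            old_capacity = v.capacity();
            old_data = v.data();
        }
    }
}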

It is important to note that the std::vector instance itself stays at the same memory address; only its internal pointer now points to the newly created (larger) array. In that respect, the data has been moved to a different memory location. (That also has consequences: for instance, all iterators and pointers you kept to the elements are now invalidated.)
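If you know an upper bound on the element count in advance, reserve() avoids the reallocation, and therefore the invalidation, entirely. A minimal sketch, not part of the original answer:

#include <cassert>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(100);          // one allocation up front
    v.push_back(1);
    const int* first = &v[0];

    for (int i = 2; i <= 100; ++i)
        v.push_back(i);      // no reallocation: size never exceeds capacity

    assert(first == &v[0]);  // pointers and iterators remain valid
}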

The critical operation is the reallocation of the memory. Here, memory fragmentation comes into play: it can happen that, because of fragmentation, it is not possible to allocate the new array even though there would be enough total free memory without fragmentation.

In contrast to other languages, it is not possible in C++ to avoid fragmentation the way a compacting garbage collector does (e.g., some GC implementations in Java are compacting). In the same way, the operating system cannot help to avoid memory fragmentation in C++, at least in theory. In practice, on today's 64-bit systems (with virtual memory), memory fragmentation is less of a concern than it used to be.

If you do not need the property that the elements in your container are contiguous, you can use std::deque instead of std::vector. It is more robust against memory fragmentation because it does not keep one big array but several smaller blocks. On the other hand, std::vector is typically more efficient, so I would still use the vector by default, but here is an old article from Herb Sutter that touches on the topic: Using Vector and Deque
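For completeness, switching to std::deque is mostly a drop-in change as long as you do not rely on contiguous storage; a minimal sketch:

#include <deque>
#include <iostream>

int main() {
    std::deque<int> d;
    for (int i = 0; i < 10000; ++i)
        d.push_back(i);  // grows in fixed-size blocks, no single large reallocation

    // Random access still works, but the elements are not one contiguous array.
    std::cout << d[5000] << '\n';
}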



When your std::vector runs out of capacity, it will reallocate its storage (usually to about twice the required size; see amortized complexity) and move the elements already in the vector. It updates the data pointer inside the vector; it will not move the vector object itself (your vector and your vector's data live in different locations).

Your std::vector and the elements "inside" it are normally not in the same spot. The following incomplete pseudo-implementation is wrong for a number of reasons, but it might illustrate how push_back scales internally:

namespace std {

template<typename T>
class vector {
  size_t size_;
  size_t capacity_;
  T* data_;  // The elements are stored elsewhere, on the heap.

public:
  void push_back(const T& foo) {
    if (size_ == capacity_) {
      capacity_ *= 2;  // assuming capacity_ > 0, and a non-wrapping size
      // Reallocate the element storage; assumes trivially copyable types
      // and ignores allocation failures.
      data_ = static_cast<T*>(realloc(data_, capacity_ * sizeof(T)));
    }
    data_[size_++] = foo;  // store the new element, then grow the size
  }
};

}  // namespace std

realloc here will move the data inside the vector, so any old references to &vector[0] are garbage after push_back reallocates the storage. realloc takes care of finding a contiguous segment that is large enough to store the elements (it might have to mmap more memory).
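To make that consequence concrete, here is a small sketch (assuming nothing beyond standard std::vector behavior) showing why a stale pointer must not be reused after push_back:

#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    int* first = &v[0];

    v.push_back(4);   // may reallocate if size() == capacity()

    // 'first' may now dangle; dereferencing it would be undefined behavior.
    // Re-acquire the address after any insertion instead:
    first = &v[0];
    return *first;    // safe again
}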

Another example that explains the separation:

#include <vector>

int main() {
  std::vector<float> numbers;  // the vector object is on the stack and never moves

  numbers.push_back(5.0f);
  // 5.0f is stored inside the vector's data, which lives on the heap.
  // Adding more items may allocate new heap memory and move all previous items.

  return 0;
}
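A small sketch (not part of the original answer) that makes this separation visible by printing whether each address changed after many insertions:

#include <iostream>
#include <vector>

int main() {
    std::vector<float> numbers;
    const void* object_address = &numbers;       // address of the vector object itself
    numbers.push_back(5.0f);
    const void* data_address = numbers.data();   // address of its heap storage

    for (int i = 0; i < 100000; ++i)
        numbers.push_back(static_cast<float>(i));

    std::cout << "vector object moved:   "
              << (static_cast<const void*>(&numbers) != object_address ? "yes" : "no") << '\n'
              << "element storage moved: "
              << (static_cast<const void*>(numbers.data()) != data_address ? "yes" : "no") << '\n';
}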
