
With vectors, one can assume that elements are stored contiguously in memory, allowing the range [&vec[0], &vec[vec.capacity()]) to be used as a normal array. E.g.,

vector<char> buf;
buf.reserve(N);
int M = read(fd, &buf[0], N);

But now the vector doesn't know that it contains M bytes of data, added externally by read(). I know that vector::resize() sets the size, but it also value-initializes (zeroes) the new elements, overwriting the data, so it can't be used to update the size after the read() call.

Is there a trivial way to read data directly into vectors and update the size after? Yes, I know of the obvious workarounds like using a small array as a temporary read buffer, and using vector::insert() to append that to the end of the vector:

char tmp[N];
int M = read(fd, tmp, N);
buf.insert(buf.end(), tmp, tmp + M);

This works (and it's what I'm doing today), but it just bothers me that there is an extra copy operation there that would not be required if I could put the data directly into the vector.

So, is there a simple way to modify the vector size when data has been added externally?

  • Are you sure &buf[0] works in debug mode? For instance, on Visual Studio, in debug mode std::vector::operator[] performs a range check. So that expression will throw if buf is empty. Commented Oct 7, 2011 at 15:41
  • I use GCC, and I ran the program through valgrind to make sure that no memory errors occurred. All I can say is that with the GNU libstdc++ implementation, this works. &vec[0] seems to give you a direct pointer to reserved memory, no matter the size(). Commented Oct 7, 2011 at 16:11
  • @user984228: if you're happy to rely on implementation details of GCC (which is a BAD IDEA (TM)), then you'd look at the source for its implementation of vector. You can see where it stores the begin and end pointers and capacity, and if you just overwrite the end pointer, I'm pretty sure that will change the size as you want. Just copy whatever the implementation of resize() does in the case where the capacity is big enough to start with, leaving out the memset/fill/whatever. You'll have to work around some private modifiers, of course, perhaps by hard-coding in the offsets. Commented Oct 7, 2011 at 16:24
  • @SteveJessop: I just died a little. Commented Oct 7, 2011 at 16:34
  • @Matthieu: quite. If all that stuff sounds like a bad idea, then hopefully relying on the fact that GCC appears to let you write into space that's only reserved, not resized, also sounds like a bad idea :-) Commented Oct 7, 2011 at 17:49

5 Answers

vector<char> buf;
buf.reserve(N);
int M = read(fd, &buf[0], N);

This code fragment invokes undefined behavior. You can't write beyond size() elements, even if you have reserved the space.

The correct code looks like this:

vector<char> buf;
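buf.resize(N);                        // N value-initialized (zeroed) chars
ssize_t M = read(fd, &buf[0], N);     // read() returns -1 on error
buf.resize(M < 0 ? 0 : M);            // keep only the bytes actually read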


PS. Your statement "With vectors, one can assume that elements are stored contiguously in memory, allowing the range [&vec[0], &vec[vec.capacity()]) to be used as a normal array" isn't true. The allowable range is [&vec[0], &vec[vec.size()]).
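A self-contained sketch of the resize, read, shrink pattern from this answer, wrapped in a hypothetical helper `read_chunk` that appends at most `n` bytes read from a POSIX file descriptor to an existing vector. It assumes a POSIX system (`<unistd.h>`) and C++11 (`data()`), and treats a negative return from `read()` as an error, in which case only the vector's original contents are kept.

```cpp
#include <unistd.h>   // read(), ssize_t (POSIX)
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: append up to n bytes from fd to buf.
// Returns read()'s result: bytes appended, 0 at end of file, or -1 on error.
ssize_t read_chunk(int fd, std::vector<char>& buf, std::size_t n) {
    const std::size_t old_size = buf.size();
    buf.resize(old_size + n);                     // zero-fills n new chars
    ssize_t got = read(fd, buf.data() + old_size, n);
    // Shrink back to what was actually received; on error keep only old data.
    buf.resize(old_size + (got > 0 ? static_cast<std::size_t>(got) : 0));
    return got;
}
```

Calling `read_chunk` in a loop until it returns 0 drains a descriptor; the zero-fill of each chunk is the cost debated in the comments below.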


21 Comments

Is there no way to avoid the unnecessary initialization that the first resize() causes?
99% certainty the extra initialization will be dwarfed by the cost of your I/O anyway.
@user984228: The question is whether that is a problem. If you have measured and that initialization becomes a bottleneck (I would not expect that), then you might need to consider implementing your own data structure. Note: only in that case. I am not trying to get you to implement your own data type, but rather to point out that in most cases this will not be a performance bottleneck, i.e. wherever you are reading from is probably much slower than the cost of that initialization.
@MarkB: wouldn't you expect a good implementation of range insert to be specialized with a single reserve call for random-access iterators?
@user984228: "Then I'd rather just use the temporary buffer + insert. It should be at least as efficient." Incorrect. The temporary buffer avoids the zero initialization, reads into the buffer, then requires a copy from the buffer to the vector. Vector resize does a zero initialization, then reads into the vector. Zero initialization is at least as fast as a copy, probably faster. Ergo, resize is still faster than a buffer.

Another, newer question, a duplicate of this one, has an answer that looks like exactly what is asked here. Here's a copy of it (v3) for quick reference:

It is a known issue that initialization cannot be turned off, even explicitly, for std::vector.

People normally implement their own pod_vector<> that does not do any initialization of the elements.

Another way is to create a type which is layout-compatible with char, whose constructor does nothing:

#include <vector>

struct NoInitChar
{
    char value;
    NoInitChar() {
        // do nothing
        static_assert(sizeof *this == sizeof value, "invalid size");
        static_assert(alignof(NoInitChar) == alignof(char), "invalid alignment");
    }
};

int main() {
    std::vector<NoInitChar> v;
    v.resize(10); // calls NoInitChar() which does not initialize

    // Look ma, no reinterpret_cast<>!
    char* beg = &v.front().value;
    char* end = beg + v.size();
    (void)end; // silence the unused-variable warning
}
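Applied to the original question, the same type lets you size the buffer without zeroing it. A sketch, assuming POSIX `read()` and C++11; the helper name `read_no_init` is hypothetical, and the static checks are moved to namespace scope with standard `alignof` instead of the compiler-specific `__alignof`.

```cpp
#include <unistd.h>   // read(), ssize_t (POSIX)
#include <cassert>
#include <cstddef>
#include <vector>

// Layout-compatible with char; the empty user-provided constructor means
// value-initialization in resize() leaves `value` uninitialized.
struct NoInitChar {
    char value;
    NoInitChar() {}   // intentionally does nothing
};
static_assert(sizeof(NoInitChar) == sizeof(char), "invalid size");
static_assert(alignof(NoInitChar) == alignof(char), "invalid alignment");

// Hypothetical helper: replace buf's contents with up to n bytes from fd.
// Returns read()'s result; buf ends up holding exactly the bytes read.
ssize_t read_no_init(int fd, std::vector<NoInitChar>& buf, std::size_t n) {
    assert(n > 0);                        // front() needs a non-empty vector
    buf.resize(n);                        // no per-element zeroing
    char* p = &buf.front().value;
    ssize_t got = read(fd, p, n);
    buf.resize(got > 0 ? static_cast<std::size_t>(got) : 0);
    return got;
}
```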


It looks like you can do what you want in C++11 (though I haven't tried this myself). You'll have to define a custom allocator for the vector, then use emplace_back().

First, define

struct do_not_initialize_tag {};

Then define your allocator with this member function:

struct my_allocator {
    // Picked by std::allocator_traits::construct when emplacing with the tag.
    void construct(char* c, do_not_initialize_tag) const {
        // do nothing
    }

    // details omitted
    // ...
};

Now you can add elements to your vector without initializing them:

std::vector<char, my_allocator> buf;
buf.reserve(N);
for (int i = 0; i != N; ++i)
    buf.emplace_back(do_not_initialize_tag());
ssize_t M = read(fd, buf.data(), N);
buf.resize(M < 0 ? 0 : M);   // read() returns -1 on error

The efficiency of this depends on the compiler's optimizer. For instance, the loop may increment the size member variable N times.
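A fuller, compilable sketch of this allocator idea, assuming C++11 `std::allocator_traits`. The no-op `construct(T*, do_not_initialize_tag)` overload is the one from the answer; the class template `no_init_allocator`, the alias `tag_vector`, and the helper `read_with_tag` are hypothetical names. Making the allocator a class template lets `std::allocator_traits` rebind it without a nested `rebind`. Any `construct` call whose arguments do not match the tag overload falls back to `allocator_traits`' default placement new, so ordinary `push_back` still works.

```cpp
#include <unistd.h>   // read(), ssize_t (POSIX)
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

struct do_not_initialize_tag {};

// Minimal C++11 allocator; allocator_traits supplies the remaining members.
template <typename T>
struct no_init_allocator {
    using value_type = T;

    no_init_allocator() = default;
    template <typename U>
    no_init_allocator(const no_init_allocator<U>&) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }

    // Used only when emplacing with the tag: leaves the element untouched.
    void construct(T*, do_not_initialize_tag) {}
};

template <typename T, typename U>
bool operator==(const no_init_allocator<T>&, const no_init_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const no_init_allocator<T>&, const no_init_allocator<U>&) { return false; }

using tag_vector = std::vector<char, no_init_allocator<char>>;

// Hypothetical helper following the answer: emplace n uninitialized chars,
// read into them, then shrink to the byte count. Returns read()'s result.
ssize_t read_with_tag(int fd, tag_vector& buf, std::size_t n) {
    buf.clear();
    buf.reserve(n);
    for (std::size_t i = 0; i != n; ++i)
        buf.emplace_back(do_not_initialize_tag());   // calls the no-op construct
    ssize_t got = read(fd, buf.data(), n);
    buf.resize(got > 0 ? static_cast<std::size_t>(got) : 0);
    return got;
}
```

As the answer notes, whether the emplace loop compiles down to a single size update is up to the optimizer.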

2 Comments

You cannot 'emplace_back' anything other than 'char's to your 'std::vector' of chars
@Gils emplace_back() forwards its arguments to the custom allocator's construct method if it has one, so with his custom allocator it will in fact accept do_not_initialize_tag() as argument.

Your program fragment has entered the realm of undefined behavior.

When buf.empty() is true, buf[0] has undefined behavior, and therefore &buf[0] is also undefined.

This fragment probably does what you want.

vector<char> buf;
buf.resize(N); // preallocate space
ssize_t M = read(fd, &buf[0], N);
buf.resize(M < 0 ? 0 : M); // disallow access to the remainder (read() returns -1 on error)


Writing to the size()th element or beyond it is undefined behavior.

The next example copies a whole file into a vector the C++ way (no need to know the file's size and no need to preallocate the memory in the vector):

#include <algorithm>
#include <fstream>
#include <iterator>
#include <vector>

int main()
{
    typedef std::istream_iterator<char> istream_iterator;
    std::ifstream file("example.txt");
    std::vector<char> input;

    file >> std::noskipws;
    std::copy( istream_iterator(file), 
               istream_iterator(),
               std::back_inserter(input));
}

1 Comment

Of course you can also call reserve beforehand with the file size to avoid all the reallocations.
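Following that suggestion, a sketch that asks the stream for the file size, reserves it, and then copies. It swaps in `std::istreambuf_iterator`, which reads raw characters without the formatted-input layer (so `std::noskipws` is not needed). The helper name `read_file` is hypothetical, and the size query assumes a seekable regular file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read an entire file into a vector<char>.
// Returns an empty vector if the file cannot be opened.
std::vector<char> read_file(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    std::vector<char> input;
    if (!file) return input;

    file.seekg(0, std::ios::end);             // find the size...
    std::streamoff size = file.tellg();
    file.seekg(0, std::ios::beg);             // ...then rewind
    if (size > 0) input.reserve(static_cast<std::size_t>(size));

    std::copy(std::istreambuf_iterator<char>(file),
              std::istreambuf_iterator<char>(),
              std::back_inserter(input));     // no reallocations after reserve
    return input;
}
```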
