Resizing a C++ std::vector<char> without initializing data [duplicate]

Question

With vectors, one can assume that elements are stored contiguously in memory, allowing the range [&vec[0], &vec[vec.capacity()) to be used as a normal array. E.g.,

vector<char> buf;
buf.reserve(N);
int M = read(fd, &buf[0], N);

But now the vector doesn't know that it contains M bytes of data, added externally by read(). I know that vector::resize() sets the size, but it also value-initializes (zeroes) the new elements, so calling it after read() would overwrite the data that was just read.

Is there a trivial way to read data directly into vectors and update the size after? Yes, I know of the obvious workarounds like using a small array as a temporary read buffer, and using vector::insert() to append that to the end of the vector:

char tmp[N];
int M = read(fd, tmp, N);
buf.insert(buf.end(), tmp, tmp + M);

This works (and it's what I'm doing today), but it just bothers me that there is an extra copy operation there that would not be required if I could put the data directly into the vector.
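For reference, the temporary-buffer workaround described above can be written as one self-contained function. This is a sketch assuming POSIX `read()`; the helper name `append_from_fd` and the chunk size `CHUNK` are illustrative and not from the question. The extra copy is the `insert()` call.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>
#include <vector>

// Illustrative compile-time chunk size (an assumption, not from the question).
const std::size_t CHUNK = 4096;

// Hypothetical helper: read one chunk from fd into a stack array, then append
// the bytes actually read to buf. Returns read()'s result (-1 error, 0 EOF).
ssize_t append_from_fd(int fd, std::vector<char>& buf) {
    char tmp[CHUNK];                          // uninitialized, so no zeroing cost
    ssize_t m = read(fd, tmp, CHUNK);         // copy #1: kernel -> tmp
    if (m > 0)
        buf.insert(buf.end(), tmp, tmp + m);  // copy #2: tmp -> vector
    return m;
}
```

Calling it in a loop until it returns 0 appends a whole stream to `buf`.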

So, is there a simple way to modify the vector size when data has been added externally?

Are you sure &buf[0] works in debug mode? For instance, in Visual Studio's debug mode, std::vector::operator[] performs a range check, so that expression will throw if buf is empty. – Praetorian, commented Oct 7, 2011 at 15:41
I use GCC, and I ran the program through valgrind to make sure that no memory errors occurred. All I can say is that with the GNU libstdc++ implementation, this works: &vec[0] seems to give you a direct pointer to reserved memory, no matter the size(). – user984228, commented Oct 7, 2011 at 16:11
@user984228: if you're happy to rely on implementation details of GCC (which is a BAD IDEA (TM)), then you'd look at the source for its implementation of vector. You can see where it stores the begin and end pointers and capacity, and if you just overwrite the end pointer, I'm pretty sure that will change the size as you want. Just copy whatever the implementation of resize() does in the case where the capacity is big enough to start with, leaving out the memset/fill/whatever. You'll have to work around some private modifiers, of course, perhaps by hard-coding in the offsets. – Steve Jessop, commented Oct 7, 2011 at 16:24
@Matthieu: quite. If all that stuff sounds like a bad idea, then hopefully relying on the fact that GCC appears to let you write into space that's only reserved, not resized, also sounds like a bad idea :-) – Steve Jessop, commented Oct 7, 2011 at 17:49

Robᵩ · Accepted Answer · 2011-10-07 15:36:58Z

29

vector<char> buf;
buf.reserve(N);
int M = read(fd, &buf[0], N);

This code fragment invokes undefined behavior. You can't write beyond size() elements, even if you have reserved the space.

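The correct code looks like this:

vector<char> buf;
buf.resize(N);               // size() == N, so &buf[0] .. &buf[N-1] are valid
int M = read(fd, &buf[0], N);
buf.resize(M < 0 ? 0 : M);   // read() returns -1 on error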

PS. Your statement "With vectors, one can assume that elements are stored contiguously in memory, allowing the range [&vec[0], &vec[vec.capacity()) to be used as a normal array" isn't true. The allowable range is [&vec[0], &vec[vec.size()).
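Putting this resize/read/resize pattern into a complete function might look like the sketch below. It assumes POSIX `read()`; `read_chunk` is a hypothetical helper name, and mapping a read error to an empty vector is a simplification chosen here, not part of the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>
#include <vector>

// Hypothetical helper: read up to n bytes from fd into a vector whose size()
// ends up equal to the number of bytes actually read.
std::vector<char> read_chunk(int fd, std::size_t n) {
    std::vector<char> buf(n);                   // size() == n: n zeroed chars
    ssize_t m = 0;
    if (n != 0)
        m = read(fd, &buf[0], n);               // in bounds because size() == n
    buf.resize(m > 0 ? static_cast<std::size_t>(m) : 0); // keep only the bytes read
    return buf;
}
```

After the final `resize()`, `capacity()` is still `n`; in C++11, `shrink_to_fit()` can request that the spare memory be released, though the request is non-binding.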

edited Oct 7, 2011 at 15:36

answered Oct 7, 2011 at 15:31

Robᵩ

21 Comments

user984228 Over a year ago

Is there no way to avoid the unnecessary initialization that the first resize() causes?

Mark B Over a year ago

I'm 99% certain the extra initialization will be dwarfed by the cost of your I/O anyway.

David Rodríguez - dribeas Over a year ago

@user984228: The question is whether that is a problem. If you have measured and that initialization turns out to be a bottleneck (I would not expect that), then you might need to consider implementing your own data structure. Note: only if it is; I am not trying to push you toward implementing your own data type, but rather to point out that in most cases it will not be a performance bottleneck, i.e. wherever you are reading from is probably much slower than the cost of that initialization.

Matthieu M. Over a year ago

@MarkB: wouldn't you expect a good implementation of range insert to be specialized with a single reserve call for random-access iterators?

Mooing Duck Over a year ago

@user984228: "Then I'd rather just use the temporary buffer + insert. It should be at least as efficient." Incorrect. The temporary buffer avoids the zero initialization and reads into the buffer, but then requires a copy from the buffer to the vector. Vector resize does a zero initialization, then reads directly into the vector. Zero initialization is at least as fast as a copy, probably faster. Ergo, resize is still faster than a buffer.

2 revs · Accepted Answer · 2017-05-23 12:25:19Z

A newer question, a duplicate of this one, has an answer that looks like exactly what is asked here. Here is a copy of it (revision 3) for quick reference:

It is a known issue that initialization cannot be turned off, even explicitly, for std::vector with its default allocator.

People normally implement their own pod_vector<> that does not do any initialization of the elements.

Another way is to create a type which is layout-compatible with char, whose constructor does nothing:
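#include <vector>

struct NoInitChar
{
    char value;
    NoInitChar() {
        // do nothing: value is deliberately left uninitialized
    }
};

// The wrapper must have exactly the size and alignment of char
// (standard alignof instead of the compiler-specific __alignof)
static_assert(sizeof(NoInitChar) == sizeof(char), "invalid size");
static_assert(alignof(NoInitChar) == alignof(char), "invalid alignment");

int main() {
    std::vector<NoInitChar> v;
    v.resize(10); // calls NoInitChar(), which does not initialize value

    // Look ma, no reinterpret_cast<>!
    char* beg = &v.front().value;
    char* end = beg + v.size();
}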
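Applied to the original `read()` use case, the `NoInitChar` idea could look like this sketch. It assumes POSIX `read()`, and `read_no_init` is a hypothetical name. Like the `beg + v.size()` line above, it treats the elements' `value` members as one contiguous `char` array, which relies on the size and alignment checks holding.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>
#include <vector>

// Same idea as above: a char wrapper whose user-provided constructor
// leaves value uninitialized, so resize() performs no zero fill.
struct NoInitChar {
    char value;
    NoInitChar() {}
};
static_assert(sizeof(NoInitChar) == sizeof(char), "invalid size");
static_assert(alignof(NoInitChar) == alignof(char), "invalid alignment");

// Hypothetical helper: read up to n bytes from fd straight into the vector.
// Assumes POSIX read(); a read error yields an empty vector.
std::vector<NoInitChar> read_no_init(int fd, std::size_t n) {
    std::vector<NoInitChar> v;
    v.resize(n);                            // storage for n chars, contents indeterminate
    ssize_t m = 0;
    if (n != 0)
        m = read(fd, &v.front().value, n);  // treats the elements as one char array
    v.resize(m > 0 ? static_cast<std::size_t>(m) : 0); // keep only the bytes read
    return v;
}
```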

Derek Ledbetter · Accepted Answer · 2012-10-08 18:44:15Z

10

It looks like you can do what you want in C++11 (though I haven't tried this myself). You'll have to define a custom allocator for the vector, then use emplace_back().

First, define

struct do_not_initialize_tag {};

Then define your allocator with this member function:

struct my_allocator {
    // public, so that std::allocator_traits can find it
    void construct(char* c, do_not_initialize_tag) const {
        // do nothing
    }

    // value_type, allocate(), deallocate(), etc. omitted
    // ...
};

Now you can add elements to your array without initializing them:

std::vector<char, my_allocator> buf;
buf.reserve(N);
for (int i = 0; i != N; ++i)
    buf.emplace_back(do_not_initialize_tag()); // grows size() without writing the char
int M = read(fd, buf.data(), N);
buf.resize(M < 0 ? 0 : M); // read() returns -1 on error

The efficiency of this depends on the compiler's optimizer. For instance, the loop may update the vector's internal size N times rather than once.
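Since the allocator above omits its details, here is one way they might be filled in so that the example compiles. This is a sketch, not a production allocator: it is written as a class template (so `std::allocator_traits` can rebind it, unlike the non-template `my_allocator` above), forwards allocation to `std::allocator`, and leaves every other `construct()` call to the placement-new fallback in `std::allocator_traits`. The names `raw_buffer`, `fake_read` and `read_into_buffer` are hypothetical; `fake_read` stands in for `read()` so that no file descriptor is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

struct do_not_initialize_tag {};

// Minimal C++11 allocator sketch.
template <class T>
struct my_allocator {
    typedef T value_type;
    my_allocator() = default;
    template <class U> my_allocator(const my_allocator<U>&) {}

    T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
    void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }

    // Chosen by emplace_back(do_not_initialize_tag()): leaves the element untouched.
    // Every other construct() call falls back to placement new in allocator_traits.
    void construct(T*, do_not_initialize_tag) const {}
};

template <class T, class U>
bool operator==(const my_allocator<T>&, const my_allocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const my_allocator<T>&, const my_allocator<U>&) { return false; }

typedef std::vector<char, my_allocator<char>> raw_buffer;

// Hypothetical stand-in for read(): copies up to n bytes of src into dst.
std::size_t fake_read(const std::string& src, char* dst, std::size_t n) {
    std::size_t m = src.size() < n ? src.size() : n;
    if (m != 0)
        std::memcpy(dst, src.data(), m);
    return m;
}

raw_buffer read_into_buffer(const std::string& src, std::size_t n) {
    raw_buffer buf;
    buf.reserve(n);
    for (std::size_t i = 0; i != n; ++i)
        buf.emplace_back(do_not_initialize_tag()); // size grows, bytes not written
    std::size_t m = fake_read(src, buf.data(), n);
    buf.resize(m); // shrink to the bytes actually filled in
    return buf;
}
```

Shrinking with `resize(m)` never constructs new elements, so only the tag overload of `construct()` ever skips initialization.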

answered Oct 8, 2012 at 18:44

Derek Ledbetter

2 Comments

Gils Over a year ago

You cannot 'emplace_back' anything other than 'char's to your 'std::vector' of chars

Matthijs Over a year ago

@Gils emplace_back() forwards its arguments to the custom allocator's construct method if it has one, so with his custom allocator it will in fact accept do_not_initialize_tag() as argument.

Andy Finkenstadt · Accepted Answer · 2011-10-07 15:33:21Z

2

Your program fragment has entered the realm of undefined behavior.

When buf.empty() is true, buf[0] has undefined behavior, and therefore &buf[0] is also undefined.

This fragment probably does what you want.

vector<char> buf;
buf.resize(N); // allocate N chars (zero-initialized)
int M = read(fd, &buf[0], N);
buf.resize(M < 0 ? 0 : M); // disallow access to the remainder; read() returns -1 on error

answered Oct 7, 2011 at 15:33

Andy Finkenstadt

BЈовић · Accepted Answer · 2011-10-07 16:30:42Z

2

Writing to the element at index size() or beyond is undefined behavior.

The next example copies a whole file into a vector the C++ way (no need to know the file's size and no need to preallocate memory in the vector):

#include <algorithm>
#include <fstream>
#include <iterator>
#include <vector>

int main()
{
    typedef std::istream_iterator<char> istream_iterator;
    std::ifstream file("example.txt");
    std::vector<char> input;

    file >> std::noskipws;
    std::copy( istream_iterator(file), 
               istream_iterator(),
               std::back_inserter(input));
}

answered Oct 7, 2011 at 16:30

BЈовић

1 Comment

Matthieu M. Over a year ago

Of course, you can also call reserve() beforehand with the file size to avoid all the reallocations.
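Following that suggestion, a sketch that measures the file first and reserves once could look like this. It swaps in `std::istreambuf_iterator`, which reads raw characters so `std::noskipws` is unnecessary, and opens the file in binary mode; `read_file` is a hypothetical name, and returning an empty vector for a file that cannot be opened is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file into a vector with a single reserve().
std::vector<char> read_file(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    std::vector<char> input;
    if (!file)
        return input;

    file.seekg(0, std::ios::end);             // measure the file once
    std::streamoff size = file.tellg();       // -1 if the position is unknown
    file.seekg(0, std::ios::beg);
    if (size > 0)
        input.reserve(static_cast<std::size_t>(size));

    input.assign(std::istreambuf_iterator<char>(file),
                 std::istreambuf_iterator<char>());
    return input;
}
```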
