I wasn't able to find an existing question about this, and it's an actual problem I'm facing.

I have a file loading utility that returns a std::vector<unsigned char> containing the whole file's contents. However, the processing function requires a contiguous array of char (and that cannot be changed; it's a library function). Since the class that uses the processing function stores a copy of the data anyway, I want to store it as vector<char>. Here's some code that might be more illustrative.

std::vector<unsigned char> LoadFile (std::string const& path);

class Processor {
    std::vector<char> cache;
    void _dataOperation(std::vector<char> const& data);

public:
    void Process() {
        if (cache.empty())
            // here's the problem!
            cache = LoadFile("file.txt");

        _dataOperation(cache);
    }
};

This code doesn't compile, because (obviously) there's no appropriate conversion. We can be sure, however, that the temporary vector will occupy the same amount of memory (in other words, sizeof(char) == sizeof(unsigned char)).

The naive solution would be to iterate over the contents of the temporary and cast every character. I know that in the normal case, where the element types match, the move assignment operator (operator=(vector&&)) would be called.
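A minimal sketch of that naive copying approach, assuming the goal is simply a vector<char> holding the same values. ToCharVector is a hypothetical helper name, not part of the code above:

```cpp
#include <vector>

// Hypothetical helper: builds a vector<char> from a vector<unsigned char>
// by converting each element. This copies the data once and leaves the
// source vector untouched.
std::vector<char> ToCharVector(std::vector<unsigned char> const& bytes) {
    // The iterator-range constructor converts each unsigned char to char.
    // For ASCII input (values 0..127) the values are preserved exactly.
    return std::vector<char>(bytes.begin(), bytes.end());
}
```

With this, the failing line would become something like cache = ToCharVector(LoadFile("file.txt")); which compiles but pays for one copy.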

In my situation it's safe to do a reinterpreting conversion, because I am sure I am going to read ASCII characters only. Any other character would be caught in _dataOperation anyway.

So, my question is: how do I properly and safely convert the temporary vector in a way that involves no copying?

If it isn't possible, I would prefer the safe way of copying rather than unsafe noncopying. I could also change LoadFile to return either vector<char> or vector<unsigned char>.

  • If you control the code of _dataOperation, you will probably be happier in the long run if you make it take vector<unsigned char>. Commented Feb 6, 2013 at 0:39
  • @Zack unfortunately, I don't. It's a library function. I'll edit the question. Commented Feb 6, 2013 at 0:41
  • @BartekBanachewicz: Then I guess the template version is a good idea. If you're sure the content of the file will not have bytes > 127 (unlikely for simple text), then you should be fine instantiating it for unsigned char. Commented Feb 6, 2013 at 0:50
  • It should be fine to cast that type: reinterpret_cast<char *>(unsigned_vector.data()) etc. Commented Feb 6, 2013 at 0:54
  • @Bartek: Yes, but when you construct the temporary with the two iterators, it will copy the content, not move it... Am I mistaken? Commented Feb 6, 2013 at 1:08

1 Answer

In C++11, [basic.lval]p10 says,

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • ...
  • a char or unsigned char type.

(The exact location may differ in other versions of C++, but the meaning is the same.)

That means that you can take a vector<unsigned char> cache and access its contents using the range [reinterpret_cast<char*>(cache.data()), reinterpret_cast<char*>(cache.data()) + cache.size()). (@Kerrek SB mentioned this.)

If you store a vector<unsigned char> in Processor to match the return type of LoadFile, and _dataOperation() actually takes an array of char (meaning a const char* and a size), then you can cast when you're passing the argument to _dataOperation().
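As a rough sketch of that cast at the call site: ProcessChars below is a hypothetical stand-in for a library function taking a pointer and a size (the real signature of _dataOperation isn't known here), and ProcessCache is likewise an invented name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a library function that takes a contiguous
// array of char. It just sums the bytes so the result is observable.
long ProcessChars(char const* data, std::size_t size) {
    long sum = 0;
    for (std::size_t i = 0; i < size; ++i)
        sum += data[i];
    return sum;
}

// Views the unsigned char storage as char without copying anything.
// Reading the bytes through a char pointer is allowed by the aliasing
// rule quoted above; only the pointer is cast, never the vector itself.
long ProcessCache(std::vector<unsigned char> const& cache) {
    return ProcessChars(reinterpret_cast<char const*>(cache.data()),
                        cache.size());
}
```

The cache keeps its vector<unsigned char> type, so the move assignment from LoadFile still applies and no element is copied.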

However, if _dataOperation() takes a vector<char> specifically and you store a vector<unsigned char> cache, then you cannot pass it reinterpret_cast<vector<char>&>(cache). (i.e. @André Puel is totally wrong. Do not listen to him.) That violates the aliasing rules, and the compiler will attempt to anger your customers at 2am. (And if this version of your compiler doesn't manage it, the next version will keep trying.)

One option is, as you mentioned, to template LoadFile() and have it return (or fill in) a vector of the type you want. Another is to copy the result, for which the concise version is again the reinterpret_cast of the source vector's .data(). [basic.fundamental]p1 mentions that "For character types, all bits of the object representation participate in the value representation.", meaning that you're not going to lose data with that reinterpret_cast. I don't see a firm guarantee that no bit pattern of an unsigned char can cause a trap if reinterpret_cast'ed to char, but I don't know of any modern hardware or compilers that do it.

1 Comment

Thanks for a complete answer.
