28

I was attempting to read a binary file byte by byte using an ifstream. I've used istream methods like get() before to read entire chunks of a binary file at once without a problem. But my current task lends itself to going byte by byte and relying on the buffering in the I/O system to make it efficient. The problem is that I seemed to reach the end of the file several bytes sooner than I should have. So I wrote the following test program:

#include <iostream>
#include <fstream>

int main() {
    typedef unsigned char uint8;
    std::ifstream source("test.dat", std::ios_base::binary);
    while (source) {
        std::ios::pos_type before = source.tellg();
        uint8 x;
        source >> x;
        std::ios::pos_type after = source.tellg();
        std::cout << before << ' ' << static_cast<int>(x) << ' '
                  << after << std::endl;
    }
    return 0;
}

This dumps the contents of test.dat, one byte per line, showing the file position before and after.

Sure enough, if my file happens to have the two-byte sequence 0x0D-0x0A (which corresponds to carriage return and line feed), those bytes are skipped.

  • I've opened the stream in binary mode. Shouldn't that prevent it from interpreting line separators?
  • Do extraction operators always use text mode?
  • What's the right way to read byte by byte from a binary istream?

MSVC++ 2008 on Windows.

5 Answers

26

The >> extractors are for formatted input; they skip whitespace (by default). For single-character unformatted input, you can use istream::get() (returns an int: either EOF if the read fails, or a value in the range [0,UCHAR_MAX]) or istream::get(char&) (puts the character read in the argument and returns something that converts to bool: true if the read succeeds, false if it fails).


5 Comments

Wow, it boggles my mind that I can't read a byte from a binary file without a cast of some sort.
That's because streams are designed for text (even when opened in binary mode). Generally, when reading real binary data, I'll use the system-level routines (open/read/write/close under Unix), rather than bother with iostream.
One can still use std::skipws so that streams skip white space (and other formatting) even when used with stream operators.
@Ghita I think you mean std::noskipws.
Right, sorry, you don't want to skip spaces in that case.
6

Why are you using formatted extraction, rather than .read()?

4 Comments

Because source >> x; is easier to read than source.read(reinterpret_cast<char *>(&x), 1);, and I didn't expect the extraction operator for a single byte on a binary file to do any formatting.
The .get() may be more efficient than .read() for single bytes. But the implementation may have .read call .get anyway or vice versa.
source.read((char*)&x, 1) is shorter, and a C-style cast means the same as reinterpret_cast in this case.
@cubuspl42: Best to avoid getting into the habit of using any C-style casts, though.
6

There is a read() member function in which you can specify the number of bytes.


4
source.get()

will give you a single byte. It is an unformatted input function. operator>> is a formatted input function, which may skip whitespace characters.


2

As others mentioned, you should use istream::read(). But, if you must use formatted extraction, consider std::noskipws.

2 Comments

No, I actually meant noskipws.
I meant that you can still use formatted extraction (using stream operators); it's just that in that case one has to specify skipws.
