I need to read a binary file that was written on a little-endian machine. The extraction operator>> does not work on binary files. A simple-minded implementation along the lines of the code below seems to work on Mac OS X running on Intel chips, but I wonder how kosher it is. Would I just need to swap bytes on big-endian machines?

    #include <fstream>   // std::ifstream
    #include <cstdint>
    ...
    std::ifstream sfile(path, std::ios::binary);
    ...
    uint32_t iValue;
    // Copies the file's bytes straight into the object: correct only when host and file byte order match
    sfile.read(reinterpret_cast<char *>(&iValue), sizeof(uint32_t));
    double dValue;
    sfile.read(reinterpret_cast<char *>(&dValue), sizeof(double));
  • To make it independent of the machine architecture, you can use the functions from the hton<x>/ntoh<x> family to convert to and from big-endian. Commented May 26, 2014 at 14:04
  • The byte order fallacy Commented May 26, 2014 at 14:26

1 Answer

Would I just need to swap bytes on big-endian machines?

The machine doesn't matter. C++ integers are numbers, not sequences of bytes. Sequences of bytes, unsurprisingly, have the byte order (aka endianness) property. Numbers don't. Five is five is five is 5 is V is IIIII is 101 is 12.

You want to obtain a number from its representation as a sequence of bytes with the little-endian byte order. C++ has a simple way to do that:

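// data holds unsigned char bytes; the uint32_t casts keep the shifts out of signed int
i = (uint32_t(data[0])<<0) | (uint32_t(data[1])<<8) | (uint32_t(data[2])<<16) | (uint32_t(data[3])<<24);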

This works on any machine because C++ integers are numbers on any machine.
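As a minimal sketch, the formula can be wrapped in a small function. The name `load_le32` is made up for illustration, and the sketch assumes the four bytes are already in memory as `unsigned char`:

```cpp
#include <cstdint>

// Hypothetical helper: decode 4 little-endian bytes into a uint32_t.
inline std::uint32_t load_le32(const unsigned char* data)
{
    // Widen each byte first so every shift is done in unsigned
    // 32-bit arithmetic, never in (signed) int.
    return (static_cast<std::uint32_t>(data[0]) << 0)
         | (static_cast<std::uint32_t>(data[1]) << 8)
         | (static_cast<std::uint32_t>(data[2]) << 16)
         | (static_cast<std::uint32_t>(data[3]) << 24);
}
```

The result depends only on the order of the bytes in the buffer, not on the byte order of the machine running the code.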

For floating-point numbers, you need to know how they were encoded. The byte order property is not enough to describe that. In most mainstream implementations you can assume that they are encoded as specified in the IEEE 754 standard. To read one in those implementations, you can construct an integer from the bytes in the appropriate order and then bitwise copy it into a floating-point variable, as follows:

uint32_t i = (uint32_t(data[0])<<0) | (uint32_t(data[1])<<8) | (uint32_t(data[2])<<16) | (uint32_t(data[3])<<24);
float f; // assumes IEEE754 single-precision, so sizeof(f) == sizeof(i)
std::memcpy(&f, &i, sizeof(f)); // needs <cstring>
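The question reads a `double`, so here is a hedged sketch of the same procedure with a 64-bit integer. The helper name `load_le_double` is hypothetical. The sketch assumes the file stores IEEE 754 double-precision values in little-endian order and that the host's `double` uses that encoding with the same byte order as its integers, which is true of mainstream platforms but not guaranteed by the language:

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: decode 8 little-endian bytes holding an
// IEEE 754 double-precision value.
inline double load_le_double(const unsigned char* data)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "double must be 64 bits wide");
    static_assert(std::numeric_limits<double>::is_iec559,
                  "double must use the IEEE 754 encoding");

    // Assemble the bit pattern as a number, least significant byte first.
    std::uint64_t bits = 0;
    for (int k = 0; k < 8; ++k)
        bits |= static_cast<std::uint64_t>(data[k]) << (8 * k);

    // Bitwise copy of the pattern into the floating-point object.
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

The two `static_assert` lines turn the stated assumptions into compile-time checks, so the code refuses to build where they do not hold.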

5 Comments

Every C++ object is a sequence of bytes. This includes integers. Integer values are of course just numbers.
I hate this, but I wasn't the one who brought this silly pedantry in. C++ defines an object as a region of storage, not as a sequence of bytes. Related? Yes. The same? No. Objects usually occupy some bytes of storage (not all do), but those bytes don't even have to be in a sequence, as C++ only requires some specific objects to occupy contiguous regions of storage. A map is not the territory it describes. (This answer doesn't even mention the word "object".)
I started by using bit-shifting operators just as you wrote, but found that a direct read into memory also works in my environment. Thanks for pointing out that the formula for integers will work on big-endian machines as well. However, the question about floating-point values remains open.
Oh, and obviously for a double as in your example you'll need a 64-bit integer, but the procedure is the same.
Another fine point is that the stream's read method requires a char * pointer, whereas the formula for making an integer out of the data bytes works for unsigned chars. Therefore one needs to declare data as an array of 4 unsigned chars and use reinterpret_cast to pass it to the read method. It is probably worth adding this to the answer for completeness.
