A kosher way to read a binary file in C++

Question

I need to read a binary file written on a little-endian system. The extraction operator>> does not work on binary files. A simple-minded implementation along the lines of the code below seems to work on Mac OS X running on Intel chips. I just wonder how kosher it is. Would I just need to swap bytes on big-endian machines?

    #include <fstream>
    #include <cstdint>
    ...
    std::ifstream sfile(path, std::ios::binary);
    ...
    uint32_t iValue;
    sfile.read(reinterpret_cast<char *>(&iValue), sizeof(uint32_t));
    double dValue;
    sfile.read(reinterpret_cast<char *>(&dValue), sizeof(double));

To make it machine-architecture independent, you can use the functions from the hton<x>/ntoh<x> family to convert between host byte order and big-endian (network) byte order.
– πάντα ῥεῖ, Commented May 26, 2014 at 14:04

R. Martinho Fernandes · Accepted Answer · 2014-05-26 15:12:46Z

Would I just need to swap bytes on big-endian machines?

The machine doesn't matter. C++ integers are numbers, not sequences of bytes. Sequences of bytes, unsurprisingly, have the byte order (aka endianness) property. Numbers don't. Five is five is five is 5 is V is IIIII is 101 is 12.

You want to obtain a number from its representation as a sequence of bytes with the little-endian byte order. C++ has a simple way to do that:

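    // data holds the four bytes as unsigned char; the casts keep each shift in unsigned 32-bit arithmetic
    i = (uint32_t(data[0])<<0) | (uint32_t(data[1])<<8) | (uint32_t(data[2])<<16) | (uint32_t(data[3])<<24);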

This works on any machine because C++ integers are numbers on any machine.
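As a minimal sketch, the expression above can be wrapped in a small helper. The name decode_le32 is hypothetical, and the helper assumes the four bytes are already in memory as unsigned char. The casts to std::uint32_t are an addition here: without them each byte is promoted to int, and shifting a byte of 0x80 or more left by 24 would overflow a 32-bit int.

```cpp
#include <cstdint>

// Hypothetical helper: build a 32-bit unsigned value from four bytes
// stored in little-endian order (least significant byte first).
inline std::uint32_t decode_le32(const unsigned char* data) {
    // Each byte is widened to uint32_t before shifting, so the result
    // does not depend on the byte order of the machine running the code.
    return (static_cast<std::uint32_t>(data[0]) << 0)
         | (static_cast<std::uint32_t>(data[1]) << 8)
         | (static_cast<std::uint32_t>(data[2]) << 16)
         | (static_cast<std::uint32_t>(data[3]) << 24);
}
```

For example, the bytes 0x78 0x56 0x34 0x12 decode to 0x12345678 on both little-endian and big-endian hosts.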

For floating-point numbers, you need to know how they were encoded. The byte order property is not enough to describe that. In most mainstream implementations you can assume that they are encoded as specified in the IEEE 754 standard. To read one in those implementations, you can construct an integer of the same width from the bytes in the appropriate order and then bitwise copy it into a floating-point variable, as follows:

    uint32_t i = (uint32_t(data[0])<<0) | (uint32_t(data[1])<<8) | (uint32_t(data[2])<<16) | (uint32_t(data[3])<<24);
    float f; // assumes IEEE754 single-precision, so sizeof(f) == sizeof(i)
    std::memcpy(&f, &i, sizeof(f));
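For the double in the question, the same idea applies with a 64-bit integer. The sketch below is one possible way to write it, not a definitive implementation: decode_le_double is a hypothetical name, and the code assumes that double is an 8-byte IEEE 754 value and that the platform stores floating-point and integer values in the same byte order (true on mainstream hardware, but not guaranteed by the language).

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: decode eight little-endian bytes as an IEEE 754 double.
inline double decode_le_double(const unsigned char* data) {
    // Assumptions made explicit so the code fails to compile where they do not hold.
    static_assert(sizeof(double) == sizeof(std::uint64_t), "double must be 64 bits");
    static_assert(std::numeric_limits<double>::is_iec559, "double must be IEEE 754");

    // Assemble the 64-bit pattern, least significant byte first.
    std::uint64_t bits = 0;
    for (int k = 0; k < 8; ++k) {
        bits |= static_cast<std::uint64_t>(data[k]) << (8 * k);
    }

    // Copy the bit pattern into a double without violating aliasing rules.
    double d;
    std::memcpy(&d, &bits, sizeof(d));
    return d;
}
```

std::memcpy is used rather than a pointer cast because reading a uint64_t object through a double pointer is undefined behavior.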

edited May 26, 2014 at 15:12

answered May 26, 2014 at 14:52


5 Comments

n. m. could be an AI Over a year ago

Every C++ object is a sequence of bytes. This includes integers. Integer values are of course just numbers.

R. Martinho Fernandes Over a year ago

I hate this, but I wasn't the one bringing this silly pedantry in. C++ defines an object as a region of storage, not as a sequence of bytes. Related? Yes. The same? No. Objects usually occupy some bytes of storage (not all), but those bytes don't even have to be in a sequence, as C++ only requires some specific objects to occupy contiguous regions of storage. A map is not the territory it describes. (This answer doesn't even mention the word "object")

LRaiz Over a year ago

I started by using bit-shifting operators just as you wrote but found that direct read into memory also works in my environment. Thanks for pointing out that the formula for integers will work on big-endian machines as well. However the question about floating points remains open.

R. Martinho Fernandes Over a year ago

Oh, and obviously for a double as in your example you'll need a 64-bit integer, but the procedure is the same.

LRaiz Over a year ago

Another fine point is that the stream's read method requires a char * pointer, whereas the formula for making an integer out of data bytes works for unsigned chars. Therefore one needs to declare data as an array of 4 unsigned chars and use reinterpret_cast to pass it to the read method. It is probably worth adding this to the answer for completeness.
