C++, weird behavior while reading binary ifstream

Question

For my first question here, I'd like to talk about reading binary files in C++; I'm recoding an ID3 tag library.

I'm parsing the header of a binary file; its first 10 bytes are as follows:

ID3    = 3 bytes = constant identifier
0xXXXX = 2 bytes = version (MSB: major version, LSB: minor. eg: 0x0301 = v3.1)
0xXX   = 1 byte  = some flags
4*0xXX = 4 bytes = size

Here's the piece of code that processes it:

char          id[4];
uint16_t      version;
uint8_t       flags;
uint32_t      size;
std::ifstream _stream;

_stream = std::ifstream(_filename, std::fstream::binary);

_stream.read(id, 3);
id[3] = 0;
// process id
_stream.read((char *)&version, 2);
// process version
_stream.read((char *)&flags, 1);
// process flags
_stream.read((char *)&size, 4);
// process size
_stream.close();

Everything works fine except for version. Say the file is v3.0 (0x0300): the value stored in version is 0x03. I would understand this behavior in text mode, since 0x00 would be treated as the end of a string, but here I'm reading in binary mode and using numeric types.

Another strange thing: if I read it in two steps, it works, e.g.:

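uint16_t version = 0;
char     buff;

_stream.read(&buff, 1);
// cast to unsigned char first so a byte >= 0x80 is not sign-extended
version = static_cast<unsigned char>(buff) << 8;
_stream.read(&buff, 1);
version |= static_cast<unsigned char>(buff);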

In this case the value of version is 0x0300.

Do you have any idea why the first method doesn't work properly? Am I doing something wrong?

Anyways, thanks for your help,

Cheers !

Here's some google food for you: "little endian" and "big endian".
– Sam Varshavchik, Commented Oct 3, 2017 at 11:06
You first need to define precisely your file format (perhaps in EBNF notation)
– Basile Starynkevitch, Commented Oct 3, 2017 at 11:06
As an aside, if you're looking for platform independent code, then there's no guarantee that a byte is 8 bits (those same platforms where that may be the case probably also wouldn't support fixed width integer types either)
– AndyG, Commented Oct 3, 2017 at 11:50
If you are using Qt, I recommend QDataStream, which handles endianness issues for free.
– Marek R, Commented Oct 3, 2017 at 11:53
@SamVarshavchik you're right, I jumped to the conclusion that it was weird behaviour, but I had forgotten the classes I took in school. Thanks for the hint.
– Arnaud M, Commented Oct 3, 2017 at 19:04

ypnos · Accepted Answer · 2017-10-03 14:34:06Z

4

The version field consists not of an unsigned short but of two unsigned bytes (major version, minor version). You should read the two version numbers separately to avoid getting tangled up in endianness problems.

Endianness is platform-specific. If you insist on reading a single short that combines the major and minor versions, you can work around it, but you end up writing less clean, less understandable code to solve a problem you created yourself.
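As an illustration of reading the two version numbers separately, here is a minimal sketch. The names Id3Version, read_version and check_read_version are hypothetical and not part of the question's code; the sketch assumes the two version bytes are next in the stream.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical holder for the two version bytes (names are illustrative).
struct Id3Version {
    std::uint8_t major_version = 0;
    std::uint8_t minor_version = 0;
};

// Reads the two version bytes one field at a time, so the host's byte
// order never comes into play. Returns false if the stream cannot
// supply both bytes.
bool read_version(std::istream& in, Id3Version& out) {
    char bytes[2] = {0, 0};
    if (!in.read(bytes, sizeof bytes)) {
        return false;
    }
    out.major_version = static_cast<std::uint8_t>(bytes[0]);
    out.minor_version = static_cast<std::uint8_t>(bytes[1]);
    return true;
}

// Self-check on an in-memory stream holding the bytes 0x03 0x00 (v3.0).
void check_read_version() {
    std::istringstream in(std::string("\x03\x00", 2));
    Id3Version v;
    const bool ok = read_version(in, v);
    (void)ok;  // silences the unused-variable warning when NDEBUG is set
    assert(ok);
    assert(v.major_version == 3 && v.minor_version == 0);
}
```

With a std::ifstream opened in binary mode, the call would look like read_version(_stream, v), placed right after the 3-byte identifier has been read.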

edited Oct 3, 2017 at 14:34

answered Oct 3, 2017 at 11:15

ypnos


2 Comments

ypnos Over a year ago

@HWalters There are tools to convert numbers in a byte stream of a given endianness to the local platform's byte order, e.g. ntohs() and the like. So you could get a short that contains both the major and minor version in a platform-independent fashion. It is just not worth it in this example, compared to simply reading both numbers independently.
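For reference, the ntohs() route mentioned in this comment might look like the sketch below. It assumes a POSIX system, where ntohs is declared in <arpa/inet.h>, and it assumes the two bytes are stored most significant byte first, as in the question's example. The helper names are hypothetical.

```cpp
#include <arpa/inet.h>  // ntohs (POSIX); on Windows it is declared in <winsock2.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: treats two bytes as a big-endian ("network order")
// 16-bit value and converts it to the host's byte order.
std::uint16_t version_from_bytes(const unsigned char* bytes) {
    std::uint16_t raw = 0;
    std::memcpy(&raw, bytes, sizeof raw);  // same effect as the one-shot read
    return ntohs(raw);                     // big-endian -> host order
}

// Self-check: the bytes 0x03 0x00 give 0x0300 whatever the host order is.
void check_version_from_bytes() {
    const unsigned char file_bytes[2] = {0x03, 0x00};
    const std::uint16_t version = version_from_bytes(file_bytes);
    (void)version;  // silences the unused-variable warning when NDEBUG is set
    assert(version == 0x0300);
}
```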

Arnaud M Over a year ago

@ypnos, actually you're right, I ended up reading byte by byte; it's way simpler and easier to read. But from the way it is written in the specs, I didn't understand that it was two separate bytes; I thought it was one value.

Daniel Trugman · Accepted Answer · 2017-10-03 12:35:54Z

1

This seems like an endianness issue. So what is that? According to Wikipedia:

Endianness refers to the sequential order in which bytes are arranged into larger numerical values, when stored in computer memory or secondary storage

[Image: visual example of a byte layout in memory]

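When you read the value in one shot, the two bytes are copied into memory unchanged and then interpreted in your platform's byte order. The result looks re-arranged, most likely because the order in which the bytes were written differs from the order your platform expects when reading them.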
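To see what a one-shot read does with the bytes 0x03 0x00, here is a minimal sketch. The function names are hypothetical; std::memcpy stands in for _stream.read((char *)&version, 2), which copies the bytes in the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copies two raw bytes straight into a uint16_t, as a one-shot read does.
std::uint16_t load_raw_u16(const unsigned char* bytes) {
    std::uint16_t value = 0;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// True when the host stores the least significant byte first (e.g. x86).
bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Self-check: the file bytes 0x03 0x00 come out as 0x0003 on a
// little-endian host and as 0x0300 on a big-endian one.
void check_raw_load() {
    const unsigned char file_bytes[2] = {0x03, 0x00};
    const std::uint16_t expected = host_is_little_endian() ? 0x0003 : 0x0300;
    (void)expected;  // silences the unused-variable warning when NDEBUG is set
    assert(load_raw_u16(file_bytes) == expected);
}
```

On a little-endian machine this matches the 0x03 observed in the question.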

Since you know the order in which the bytes are laid out, you should do one of the following:

1. Read byte by byte.
2. Read the value and swap the bytes using _byteswap_ushort in VC++ or __builtin_bswap16 in GCC.
3. Read the value and swap the bytes using a custom implementation.
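The byte-by-byte and custom-swap options can be sketched in portable C++ as follows. The function names are hypothetical, and the sketch assumes the file stores multi-byte values most significant byte first, as in the question's 0x0300 example.

```cpp
#include <cassert>
#include <cstdint>

// Byte by byte: builds a 16-bit value from two bytes stored most
// significant byte first. The result does not depend on host byte order.
std::uint16_t decode_be16(const unsigned char* bytes) {
    return static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
}

// Same idea for a 4-byte field; each byte is widened before shifting.
std::uint32_t decode_be32(const unsigned char* bytes) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
           static_cast<std::uint32_t>(bytes[3]);
}

// Custom swap: exchanges the two bytes of an already-loaded 16-bit value.
std::uint16_t swap_u16(std::uint16_t value) {
    return static_cast<std::uint16_t>((value << 8) | (value >> 8));
}

// Self-check with the version bytes from the question.
void check_decoders() {
    const unsigned char version_bytes[2] = {0x03, 0x00};
    assert(decode_be16(version_bytes) == 0x0300);
    assert(swap_u16(0x0003) == 0x0300);
}
```

Note that swap_u16 only gives the right answer when the host order really differs from the file order, whereas the decode functions behave the same on every host, which is one reason to prefer them.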

edited Oct 3, 2017 at 12:35

answered Oct 3, 2017 at 11:50

Daniel Trugman


3 Comments

underscore_d Over a year ago

Don't do #2. There's absolutely no need to use a vendor-specific extension here and make your code non-portable.

Daniel Trugman Over a year ago

@underscore_d, added a reference to custom swap implementations

Arnaud M Over a year ago

@DanielTrugman It reminds me of classes back in the day at school, how did I miss that... I would have slapped myself for that mistake 5 years ago. I think I've spent too much time on high-level languages. Anyways, thanks for the answer, I ended up reading it byte by byte.
