Parsing a binary message in C++. Any lib with examples?

Question

I am looking for any library or example of parsing a binary message in C++. Most people ask about reading a binary file or data received on a socket, but I just have a set of binary messages I need to decode. Somebody mentioned boost::spirit, but I haven't been able to find a suitable example for my needs.

As an example: 9A690C12E077033811FFDFFEF07F042C1CE0B704381E00B1FEFFF78004A92440

where first 8 bits are a preamble, next 6 bits the msg ID (an integer from 0 to 63), next 212 bits are data, and final 24 bits are a CRC24.

So in this case, msg 26, I have to get this data from the 212 data bits:

4 bits integer value
4 bits integer value
A 9 bit float value from 0 to 63.875, where LSB is 0.125
4 bits integer value

EDIT: I need to operate at bit level, so a memcpy is not a good solution, since it copies a number of bytes. To get first 4-bit integer value I should get 2 bits from a byte, and another 2 bits from the next byte, shift each pair and compose. What I am asking for is a more elegant way of extracting the values, because I have about 20 different messages and wanted to reach a common solution to parse them at bit level.

And so on.

Do you know of any library which can easily achieve this?

I also found other Q/A where static_cast is being used. I googled it, and for each person recommending this approach, there is another one warning about endianness. Since I already have my message, I don't know if such a warning applies to me, or only to socket communications.

EDIT: boost::dynamic_bitset looks promising. Any help using it?

You have 212 bits, but you said that you need only 21 (4+4+9+4) of them. What do the other 191 bits mean?
– Denis Ermolin, Commented Oct 22, 2012 at 7:57
You can have a look at github.com/iso8859-1/BufferHandler. It's not complete yet, but it should do what you want.
– Tobias Langner, Commented Oct 22, 2012 at 7:57
@DenisErmolin I said "And so on". If somebody helps me parse those first values, I can parse the other 191 bits on my own. Anyway, if you want to know, basically the 9-bit float and the last 4-bit integer are repeated 14 more times.
– Roman Rdgz, Commented Oct 22, 2012 at 7:59

Community · Accepted Answer · 2017-05-23 12:18:31Z

6

If you can't find a generic library to parse your data, use bitfields to describe the layout and memcpy() the data into a variable of the struct type. See the link Bitfields. This will be more streamlined towards your application.

Don't forget to pack the structure.

Example:

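#pragma pack(1)   // byte packing; the exact pragma is compiler dependent

#include "order32.h"
// NOTE: O32_HOST_ORDER reads a union member, so the preprocessor cannot
// evaluate it in this #if; a real compile-time switch needs a
// compiler-provided macro instead.
// NOTE: a bit-field cannot be wider than its declared type, so data:212
// has to be split into several narrower fields in real code.
struct yourfields{
#if O32_HOST_ORDER == O32_BIG_ENDIAN
   unsigned int preamble:8;
   unsigned int msgid:6;
   unsigned data:212;
   unsigned crc:24;
#else
   unsigned crc:24;
   unsigned data:212;
   unsigned int msgid:6;
   unsigned int preamble:8;
#endif
}/*__attribute__((packed)) for gcc*/;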

A small header can detect whether your machine uses little-endian or big-endian format and expose the result through a preprocessor symbol:

//order32.h

#ifndef ORDER32_H
#define ORDER32_H

#include <limits.h>
#include <stdint.h>

#if CHAR_BIT != 8
#error "unsupported char size"
#endif

enum
{
    O32_LITTLE_ENDIAN = 0x03020100ul,
    O32_BIG_ENDIAN = 0x00010203ul,
    O32_PDP_ENDIAN = 0x01000302ul
};

static const union { unsigned char bytes[4]; uint32_t value; } o32_host_order =
    { { 0, 1, 2, 3 } };

#define O32_HOST_ORDER (o32_host_order.value)

#endif

Thanks to Christoph for the order32.h code.

Example program for using bitfields and their outputs:

#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cstring>   // memcpy
using namespace std;

struct bitfields{
  unsigned opcode:5;
  unsigned info:3;
}__attribute__((packed));

struct bitfields opcodes;

/* info: 3bits; opcode: 5bits;*/
/* 001 10001  => 0x31*/
/* 010 10010  => 0x52*/

void set_data(unsigned char data)
{
  memcpy(&opcodes,&data,sizeof(data));
}

void print_data()
{
  cout << opcodes.opcode << ' ' << opcodes.info << endl;
}

int main(int argc, char *argv[])
{
  set_data(0x31);
  print_data(); //must print 17 1 on my little-endian machine
  set_data(0x52); 
  print_data(); //must print 18 2
  cout << sizeof(opcodes); //must print 1
  return 0;
}

edited May 23, 2017 at 12:18

CommunityBot

answered Oct 22, 2012 at 8:05

Aniket Inge

19 Comments

Aniket Inge Over a year ago

@RomanRdgz structures are aligned at word boundaries by default (a word being 4 or 8 bytes). To prevent the compiler from padding the values (and thereby destroying the layout) we use #pragma pack, which is compiler dependent. For it to work on gcc, you need to remove the #pragma pack and use __attribute__((packed)).

Frerich Raabe Over a year ago

I believe this approach breaks as soon as you try to compile your code on a platform with a different byte-order.

Roman Rdgz Over a year ago

@PrototypeStark sorry, but I don't understand the endian issue. When I was at college, I thought all the big/little endian stuff was about socket transmissions, so if I already have a binary msg stored in a char buffer, I would think the data is correctly ordered without having to check whether the system is big or little endian. Have I been wrong about that all this time, or am I just making this assumption because most architectures use little endian?

Aniket Inge Over a year ago

@RomanRdgz pretty much all archs these days are little endian. But never assume. A simple test to determine endianness in C/C++: int isLittleEndian(){ unsigned int x = 1; return *(unsigned char*)&x == 1; }

Aniket Inge Over a year ago

You're already avoiding the shifting and composing because you're copying the data straight into a buffer whose length is the sizeof of the structure. See the explanation and images explaining bitfields: msdn.microsoft.com/en-us/library/ewwyfdbe(v=vs.71).aspx
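
On the endianness question raised in these comments, here is a minimal sketch, not taken from the answer. The helper names `host_is_little_endian` and `read_be16` are hypothetical. The first function checks the host byte order at run time. The second shows that composing a value from individual bytes with shifts gives the same result on any host, because it never reinterprets the buffer as a wider integer.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the host stores the least significant byte first.
inline bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // copy the lowest-addressed byte of probe
    return first == 1;
}

// Hypothetical helper: builds a 16-bit value from two bytes, first byte most
// significant. Only shifts on byte values are used, so host byte order is irrelevant.
inline std::uint16_t read_be16(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}
```

Byte order only starts to matter when a multi-byte object is copied or cast as a whole, which is what the memcpy-into-struct approach does.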

Denis Ermolin · Accepted Answer · 2012-10-22 10:14:12Z

1

You can manipulate the bits yourself. For example, to parse a 4-bit integer value:

char byte_data[64] = {};   // message bytes
size_t readPos = 3; //any byte
int value = 0; 
int bits_to_read = 4;
// ORs the byte with masks 0x01, 0x03, 0x07, 0x0F, so value ends up holding its low 4 bits
for (int i = 0; i < bits_to_read; ++i) {
    value |= static_cast<unsigned char>(byte_data[readPos]) & ( 255 >> (7-i) );
}

Floats are usually sent as string data:

std::string temp;
temp.assign(byte_data+readPos, 9);
float value = std::stof(temp);

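If your data uses a custom float format, just extract the bits and do the math yourself:

char byte_data[64] = {};   // message bytes
size_t readPos = 3; //any byte
float value = 0; 
int i = 0;          // bit index within the current byte, 0 = most significant bit
int bits_to_read = 9;
while (bits_to_read) {
    if (i > 7) {    // all 8 bits of this byte consumed: advance to the next byte
      ++readPos;
      i = 0;
    }
    // isolate a single bit, reading from the most significant bit down
    const int bit = ( static_cast<unsigned char>(byte_data[readPos]) >> (7-i) ) & 1;
    //here your code
    ++i;
    --bits_to_read;
}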
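
The bit-by-bit idea above can be wrapped in a small reusable reader so that all message types share one extraction routine. This is only a sketch: the class `BitReader` and its `read` method are hypothetical names, not part of any library, and it assumes fields are packed most significant bit first, which is consistent with the sample message decoding to ID 26.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reads unsigned fields MSB-first from a byte buffer,
// keeping a running bit position across calls.
class BitReader {
public:
    explicit BitReader(const std::vector<std::uint8_t>& bytes) : bytes_(bytes) {}

    // Read `count` bits (1..32) starting at the current bit position.
    std::uint32_t read(unsigned count) {
        if (count == 0 || count > 32 || pos_ + count > bytes_.size() * 8)
            throw std::out_of_range("BitReader::read");
        std::uint32_t value = 0;
        for (unsigned n = 0; n < count; ++n, ++pos_) {
            const std::uint8_t byte = bytes_[pos_ / 8];
            const unsigned bit = (byte >> (7 - pos_ % 8)) & 1u;  // bit 0 = MSB of the byte
            value = (value << 1) | bit;                          // accumulate MSB first
        }
        return value;
    }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t pos_ = 0;  // index of the next unread bit
};
```

Fed the first five bytes of the sample message (9A 69 0C 12 E0), successive calls read(8), read(6), read(4), read(4), read(9), read(4) yield 0x9A, 26, 4, 3, 9 and 7. The raw 9 corresponds to 1.125 with an LSB of 0.125.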

edited Oct 22, 2012 at 10:14

answered Oct 22, 2012 at 8:04

Denis Ermolin

5 Comments

Aniket Inge Over a year ago

4bit_int << wrong variable name. Variable names cannot start with a number.

Aniket Inge Over a year ago

you cannot create integers of < 8 bits(=1byte). Hence bitfields.

Roman Rdgz Over a year ago

@DenisErmolin why using '_data' instead of just 'data'? Anyway, you are recommending static cast, but how do I get the 9 bit float? Because it is not coming as a string

Denis Ermolin Over a year ago

Do you know how the float value was packed into the bytes?

Roman Rdgz Over a year ago

@DenisErmolin LSB is 0.125, so if I have 0 0000 0001 it is 0.125. If 0 0000 0010 it is 0.250... if 1 000 0001 it would be (0.125)*2^8 + 0.125 = 32.125. At least that is what I understand when I read that LSB has that value
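
The arithmetic in this comment amounts to a fixed-point conversion: treat the 9 bits as an unsigned integer and multiply by the LSB weight. A minimal sketch, where the function name `fixed9_to_value` is hypothetical and the field is assumed to be unsigned:

```cpp
#include <cstdint>

// Hypothetical helper: converts a raw 9-bit unsigned field to its real value,
// where one least significant bit is worth 0.125.
inline double fixed9_to_value(std::uint32_t raw) {
    return (raw & 0x1FFu) * 0.125;  // mask to 9 bits, then scale
}
```

With this, raw 1 gives 0.125, raw 257 (binary 1 0000 0001) gives 32.125, and the maximum raw value 511 gives 63.875, matching the range stated in the question.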

Vladimir Sinenko · Accepted Answer · 2012-10-22 08:29:17Z

0

Here is a good article that describes several solutions to the problem.

It even contains the reference to the ibstream class that the author created specifically for this purpose (the link seems dead, though). The only other mention of this class I could find is in the bit C++ library here - it might be what you need, though it's not popular and it's under GPL.

Anyway, the boost::dynamic_bitset might be the best choice as it's time-tested and community-proven. But I have no personal experience with it.

answered Oct 22, 2012 at 8:29

Vladimir Sinenko
