4

I am looking for any library or example for parsing a binary message in C++. Most people ask about reading a binary file, or data received from a socket, but I just have a set of binary messages I need to decode. Somebody mentioned boost::spirit, but I haven't been able to find a suitable example for my needs.

As an example: 9A690C12E077033811FFDFFEF07F042C1CE0B704381E00B1FEFFF78004A92440

where the first 8 bits are a preamble, the next 6 bits are the msg ID (an integer from 0 to 63), the next 212 bits are data, and the final 24 bits are a CRC24.

So in this case, msg 26, I have to get this data from the 212 data bits:

  • 4 bits integer value
  • 4 bits integer value
  • A 9 bit float value from 0 to 63.875, where LSB is 0.125
  • 4 bits integer value

EDIT: I need to operate at bit level, so a memcpy is not a good solution, since it copies whole bytes. To get the first 4-bit integer value I have to take 2 bits from one byte and another 2 bits from the next byte, shift each pair and combine them. What I am asking for is a more elegant way of extracting the values, because I have about 20 different messages and want a common solution to parse them at bit level.

And so on.

Do you know of any library which can easily achieve this?

I also found other Q/As where static_cast is used. I googled it, and for each person recommending this approach, there is another one warning about endianness. Since I already have my message, I don't know whether that warning applies to me or only to socket communications.
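On the endianness warning: it applies whenever a multi-byte object is reinterpreted through a cast or a memcpy, regardless of where the bytes came from. Shifts and masks operate on values rather than on memory layout, so a field assembled byte by byte gives the same result on any host. A minimal sketch, assuming the message stores its most significant byte first (read_be32 is a hypothetical helper name, not part of any library):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: assemble a 32-bit value from four consecutive
// buffer bytes, most significant byte first, using shifts only.
// The result does not depend on the byte order of the host CPU.
std::uint32_t read_be32(const unsigned char* buf, std::size_t offset) {
    assert(buf != nullptr);
    return (static_cast<std::uint32_t>(buf[offset]) << 24) |
           (static_cast<std::uint32_t>(buf[offset + 1]) << 16) |
           (static_cast<std::uint32_t>(buf[offset + 2]) << 8) |
            static_cast<std::uint32_t>(buf[offset + 3]);
}
```

Applied to the first four bytes of the sample message (9A 69 0C 12), this yields 0x9A690C12 on both little-endian and big-endian machines.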

EDIT: boost::dynamic_bitset looks promising. Any help using it?

3
  • You have 212 bits but you said that you need only 21 (4+4+9+4) bits. What is the meaning of the other 191 bits? Commented Oct 22, 2012 at 7:57
  • you can have a look at: github.com/iso8859-1/BufferHandler. It's not complete yet but it should do what you want. Commented Oct 22, 2012 at 7:57
  • @DenisErmolin I said "And so on". If somebody helps me parsing those first values, I can parse the other 191 on my own. Anyway, if you want to know, basically 9 bit float and last 4 bit integer are repeated 14 more times Commented Oct 22, 2012 at 7:59

3 Answers

6

If you can't find a generic library to parse your data, use bitfields to describe the layout and memcpy() the data into a variable of that struct. See the link Bitfields. This will be more tailored to your application.

Don't forget to pack the structure.

Example:

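#pragma pack(1)

#include "order32.h"
struct yourfields{
// NOTE: O32_HOST_ORDER is not a preprocessor constant, so this #if is only
// schematic; select the branch with a compiler-provided macro or a build flag.
#if O32_HOST_ORDER == O32_BIG_ENDIAN
   unsigned int preamble:8;
   unsigned int msgid:6;
   unsigned data:212;   // wider than unsigned: has to be split into smaller fields in practice
   unsigned crc:24;
#else
   unsigned crc:24;
   unsigned data:212;   // see note above
   unsigned int msgid:6;
   unsigned int preamble:8;
#endif
}/*__attribute__((packed)) for gcc*/;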

You can check whether your machine uses little-endian or big-endian byte order and expose the result through a preprocessor symbol:

//order32.h

#ifndef ORDER32_H
#define ORDER32_H

#include <limits.h>
#include <stdint.h>

#if CHAR_BIT != 8
#error "unsupported char size"
#endif

enum
{
    O32_LITTLE_ENDIAN = 0x03020100ul,
    O32_BIG_ENDIAN = 0x00010203ul,
    O32_PDP_ENDIAN = 0x01000302ul
};

static const union { unsigned char bytes[4]; uint32_t value; } o32_host_order =
    { { 0, 1, 2, 3 } };

#define O32_HOST_ORDER (o32_host_order.value)

#endif

Thanks to Christoph for the code above.

Example program for using bitfields and their outputs:

#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cstring>   // memcpy
using namespace std;

struct bitfields{
  unsigned opcode:5;
  unsigned info:3;
}__attribute__((packed));

struct bitfields opcodes;

/* info: 3bits; opcode: 5bits;*/
/* 001 10001  => 0x31*/
/* 010 10010  => 0x52*/

void set_data(unsigned char data)
{
  memcpy(&opcodes,&data,sizeof(data));
}

void print_data()
{
  cout << opcodes.opcode << ' ' << opcodes.info << endl;
}

int main(int argc, char *argv[])
{
  set_data(0x31);
  print_data(); //must print 17 1 on my little-endian machine
  set_data(0x52); 
  print_data(); //must print 18 2
  cout << sizeof(opcodes); //must print 1
  return 0;
}
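For comparison, the same two fields can be pulled out with shifts and masks, which avoids relying on how a particular compiler orders bitfields inside a byte. This is only a sketch: the layout (info in the top 3 bits, opcode in the low 5 bits) is taken from the comments in the program above, and the Opcode struct and decode_opcode name are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical result type for the two fields of the example byte.
struct Opcode {
    unsigned opcode;  // low 5 bits
    unsigned info;    // top 3 bits
};

// Split one byte into its fields using explicit bit positions.
Opcode decode_opcode(std::uint8_t byte) {
    Opcode r;
    r.opcode = byte & 0x1Fu;         // keep bits 0..4
    r.info = (byte >> 5) & 0x07u;    // move bits 5..7 down and keep them
    return r;
}
```

With this layout, decode_opcode(0x31) gives opcode 17 and info 1, and decode_opcode(0x52) gives opcode 18 and info 2, matching the bitfield version on a little-endian machine.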

19 Comments

@RomanRdgz structures are aligned at word boundaries by default (which is 4 or 8 bytes). To keep the compiler from padding the fields (and therefore breaking the layout) we use #pragma pack, which is compiler dependent (for it to work on gcc, you need to remove the pragma pack and use __attribute__((packed))).
I believe this approach breaks as soon as you try to compile your code on a platform with a different byte-order.
@PrototypeStark sorry but I don't understand the endian issue. When I was at college, I thought all the big/little endian stuff was about socket transmissions, so if I already have a binary msg stored in a char buffer, I would think the data is correctly ordered without having to check whether the system is big or little endian. Have I been wrong about that all this time, or am I just making this assumption because most architectures use little endian?
@RomanRdgz pretty much all archs these days are little endian. But never assume. A simple test to determine endianness in C/C++: int isLittleEndian(){ unsigned int x = 1; return *(unsigned char*)&x == 1; }
You're already avoiding the shifting and composing because you're copying the data straight into a buffer of length = sizeof of the structure. See the explanation and images explaining bitfields: msdn.microsoft.com/en-us/library/ewwyfdbe(v=vs.71).aspx
1

You can manipulate the bits yourself. For example, to parse a 4-bit integer value:

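char _data[64];                // message bytes
size_t readPos = 3;            // any byte
size_t bitPos = 0;             // bit offset inside that byte, 0 = most significant bit
int value = 0;
int bits_to_read = 4;
for (int i = 0; i < bits_to_read; ++i, ++bitPos) {
    // pick one bit, moving into the next byte once bitPos passes 7
    const int bit = (static_cast<unsigned char>(_data[readPos + bitPos / 8]) >> (7 - bitPos % 8)) & 1;
    value = (value << 1) | bit;  // append it as the new least significant bit
}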

Floats are usually sent as string data:

std::string temp;
temp.assign(_data + readPos, 9);
float value = std::stof(temp);

If your data contains a custom float format, then just extract the bits and do your math:

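char _data[64];                // message bytes
size_t readPos = 3;            // any byte
float value = 0;
int i = 0;                     // bit index inside the current byte, 0 = most significant bit
int bits_to_read = 9;
unsigned raw = 0;              // the 9 bits collected as an unsigned integer
while (bits_to_read) {
    if (i > 7) {               // all 8 bits of this byte consumed: move to the next byte
      ++readPos;
      i = 0;
    }
    const int bit = (static_cast<unsigned char>(_data[readPos]) >> (7 - i)) & 1;
    raw = (raw << 1) | bit;    // append the bit as the new least significant bit
    ++i;
    --bits_to_read;
}
value = raw * 0.125f;          // scale by the LSB weight given in the question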
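The bit-by-bit loops above can be wrapped in a small reusable reader so that each of the 20 message types becomes a sequence of read calls. The following is a minimal sketch, not an existing library: BitReader and hex_to_bytes are hypothetical names, and it assumes fields are packed most significant bit first, which is consistent with the sample message decoding to msg ID 26.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Convert a hex string such as "9A69..." into raw bytes (two digits per byte).
std::vector<std::uint8_t> hex_to_bytes(const std::string& hex) {
    assert(hex.size() % 2 == 0);
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < hex.size(); i += 2)
        out.push_back(static_cast<std::uint8_t>(
            std::stoul(hex.substr(i, 2), nullptr, 16)));
    return out;
}

// Hypothetical helper: reads unsigned fields MSB-first from a byte buffer.
class BitReader {
public:
    explicit BitReader(const std::vector<std::uint8_t>& bytes) : bytes_(bytes) {}

    // Read `count` bits (1..32) starting at the current bit position
    // and advance the position past them.
    std::uint32_t read(unsigned count) {
        assert(count >= 1 && count <= 32);
        assert(pos_ + count <= bytes_.size() * 8);
        std::uint32_t value = 0;
        for (unsigned i = 0; i < count; ++i, ++pos_) {
            // Bit 0 of a byte is taken to be its most significant bit.
            const unsigned bit = (bytes_[pos_ / 8] >> (7 - pos_ % 8)) & 1u;
            value = (value << 1) | bit;
        }
        return value;
    }

private:
    std::vector<std::uint8_t> bytes_;  // private copy of the message
    std::size_t pos_ = 0;              // current position, in bits
};
```

For the sample message from the question, calling read(8), read(6), read(4), read(4), read(9), read(4) in that order returns 0x9A, 26, 4, 3, 9 and 7; multiplying the 9-bit field by 0.125 gives 1.125.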

5 Comments

4bit_int << wrong variable name. Variable names cannot start with a number.
you cannot create integers of < 8 bits(=1byte). Hence bitfields.
@DenisErmolin why using '_data' instead of just 'data'? Anyway, you are recommending static cast, but how do I get the 9 bit float? Because it is not coming as a string
Do you know how the float value was packed into the bytes?
@DenisErmolin LSB is 0.125, so if I have 0 0000 0001 it is 0.125. If 0 0000 0010 it is 0.250... if 1 0000 0001 it would be (0.125)*2^8 + 0.125 = 32.125. At least that is what I understand when I read that the LSB has that value
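The scaling described in that last comment amounts to treating the 9 bits as an unsigned integer and multiplying by the LSB weight. A minimal sketch under that reading (decode_fixed9 is a hypothetical name, and the field is assumed to be unsigned):

```cpp
#include <cassert>
#include <cstdint>

// Convert a raw 9-bit field to its scaled value, assuming an unsigned
// field whose least significant bit is worth 0.125.
double decode_fixed9(std::uint32_t raw) {
    assert(raw < 512);   // only 9 bits are meaningful
    return raw * 0.125;
}
```

This maps 1 to 0.125, 257 to 32.125 and the maximum raw value 511 to 63.875, which is the range stated in the question.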
0

Here is a good article that describes several solutions to the problem.

It even contains the reference to the ibstream class that the author created specifically for this purpose (the link seems dead, though). The only other mention of this class I could find is in the bit C++ library here - it might be what you need, though it's not popular and it's under GPL.

Anyway, the boost::dynamic_bitset might be the best choice as it's time-tested and community-proven. But I have no personal experience with it.

Comments
