
I am looking for ideas on how to parse long binary data, for example "10100011111000111001", where bits 0-4 are the id bits, bits 5-15 are the data, and so on.

The binary data structure can change, so I need to build a kind of database that stores how to parse each string.

Illustration (the data could be around 200 bits long): [image not included]

Any ideas on how to implement it? Thanks.

Edit

What am I missing here?

#include <cstdint>
#include <iostream>
using namespace std;

struct Bitfield {
    uint16_t a : 10, b : 6;
};

void diag() {
    uint16_t t = 61455;
    struct Bitfield test = {t};

    cout << "a: " << test.a << endl;
    cout << "b: " << test.b << endl;
}

int main() { diag(); }

and the output is:

a: 15
b: 0
  • What's the best way to implement it? -- There is no "best way", as that is highly subjective. There are just "ways". Commented Jul 12, 2020 at 10:07
  • Maybe std::bitset, though it lacks some convenience functions. Commented Jul 12, 2020 at 10:27
  • You really called that a long binary sequence?!! It's just 16 bits. It can even fit in an int! Please tell us more about the constraints you want to put on that data, including the maximum length of the sequence. Commented Jul 12, 2020 at 10:32
  • Where does the data come from? Is it stored in an array of bytes or as an iostream of bytes? How are the bits numbered? How do bits span from one byte to another? You must give a more precise specification of the problem. Commented Jul 12, 2020 at 18:45
  • It's just a string of "1"s and "0"s; it isn't byte data, just a stream of bits. Commented Jul 13, 2020 at 0:06

2 Answers


Options available

To manage a large structured set of bits, you have the following options:

  • C++ bit-fields: you define a structure with bit-field members. You can have as many members as you want, provided that no single member is wider than an unsigned long long.
    It's super easy to use; the compiler manages access to individual bits or groups of bits for you. The major drawback is that the bit layout is implementation-defined, so this is not an option for writing portable code that exchanges data in a binary format.

  • Container of unsigned integral type: you define an array large enough to hold all the bits, and access bits or groups of bits using a combination of bitwise operations. It requires being at ease with binary operations and is not practical if groups of bits are split over consecutive elements. For exchanging data in binary format with the outside world in a portable way, you'd need to either take care of differences between big-endian and little-endian architectures or use arrays of uint8_t.

  • std::vector<bool>: gives you total flexibility to manage your bits. The main constraint is that you need to address each bit separately. Moreover, there's no data() member that could give direct access to the binary data.

  • std::bitset: is very similar to vector<bool> for accessing bits. Its size is fixed at compile time, but it offers useful features such as reading and writing binary as ASCII from strings or streams, converting from binary values of integral types, and logical operations on the full bitset.

  • A combination of these techniques

Make your choice

To communicate with the outside world in a portable way, the easiest approach is to use bitsets. Bitsets offer easy input/output/string conversion in a format using ASCII '0' or '1' (or any substitutes thereof).

bitset<msg_header_size> bh,bh2;
bitset<msg_body_size> bb,bb2;
cin>>bh>>bb;  // reads strings of ASCII '0' and '1'
cout<<bh<<"-"<<bb<<endl<<endl;  // writes strings of ASCII '0' and '1'

You can also convert from/to binary data, but only from/to a single integral value, which must be large enough for the bitset size:

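bitset<8> b(static_cast<uint8_t>(c));  // c is assumed to be a char holding one byte of raw data
cout<<b<<endl;             // the 8 bits as ASCII '0'/'1'
cout<<b.to_ulong()<<endl;  // back to an integral value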

For reading/writing large sets, you'd need to read small bitsets and use logical operators to aggregate them into a larger bitset. If this seems time-consuming, it's in fact very close to what you'd do with containers of integrals, but without having to care about byte boundaries.

In your case, with a fixed-size header and a maximum size, the bitset seems to be a good choice for exchanging binary data with the external world (be careful, however, because the variable part is right-justified).

For working with the data content, it's easy to access a specific bit, but you have to use some logical operations (shift, and) to access groups of bits. Moreover, if you want readable and maintainable code, it's better to abstract the bit layout.

Conclusion:

I would therefore strongly advise using a bit-field structure internally for working with the data, which keeps a memory footprint comparable to that of the original data, and at the same time using bitsets only to convert from/to this structure for external data exchanges.


8 Comments

First, thank you all. My data stream could contain ~200 bits, but each group of bits has a different meaning. So I think my main question is how to store that DB describing how to parse this long sequence, and how to store the data after parsing.
@Elior Maybe combine the approaches and, for every group m, use bitsets instead of bit-fields?
@Elior 1) Does the data come in binary format? 2) Do you have to export the data in binary format? 3) Is the size of the data fixed (200bits) or are there 34 bits that are fixed and a variable part?
@Elior OK. About 1: when you say "string", do you mean printable ASCII strings made of '0's and '1's, or do you mean strings of binary bytes, which result in garbage if printed without conversion? About 3: is there a maximum length?
@Elior OK, I think I get it. I've edited my answer to explain the choice, and I added a section about how to choose in view of your requirements. The conclusion presents the approach most suitable for your needs.

The "best way" depends on the details of the problem.

If the whole number fits into the largest integer type available (usually long long), convert the string into an integer first (for example with the stoi/stol/stoll functions, passing 2 as the base argument, assuming C++11 is available). Then use bit-shifting combined with bitwise AND (&) to extract the sections of the value you are interested in.

If the whole number does not fit into the largest integer type available, chop it up as a string (using the substr function) and then convert the substrings into integers one by one.
