Extracting continuous bits from a std::string bytewise with a bit offset

Question

I'm at a loss: I want to extract up to 64 bits, given a bit offset and a bit length, into an unsigned long long from a string (coming from the network).

The string can be of arbitrary length, so I need to be sure to access it only bytewise. (This also means I can't use the _bextr_u32 intrinsic.) I can't use the std::bitset class either, because it doesn't allow extraction of more than one bit at an offset, and it only allows extraction of a predefined number of bits.

So far I calculate the byte offset (within the string) and the bit offset (within the starting byte).

m_nByteOffset = nBitOffset / 8;
m_nBitOffset = nBitOffset % 8;

Now I can get the starting address

const char* sSource = str.c_str()+m_nByteOffset;

And the bitmask

unsigned long long nMask = 0xFFFFFFFFFFFFFFFFULL >> (64-nBitLen);
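One detail worth noting about the mask above: when nBitLen is 0, the expression shifts a 64-bit value by 64, which is undefined behavior in C and C++. A minimal sketch of a mask helper that covers the whole range 0..64 could look like this (the name LowBitsMask is hypothetical, not from the question):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the question): returns a mask with the
// low nBitLen bits set, valid for every nBitLen from 0 to 64.
inline std::uint64_t LowBitsMask(unsigned nBitLen)
{
    assert(nBitLen <= 64);                      // precondition
    if (nBitLen == 0)  return 0;                // avoid shifting by 64
    if (nBitLen >= 64) return ~std::uint64_t{0};
    return (std::uint64_t{1} << nBitLen) - 1;   // shift count is 1..63 here
}
```

For example, LowBitsMask(45) yields a value whose low 45 bits are set and whose upper 19 bits are zero.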

But now I just can't figure out how to extract up to 64 bits from this, as there are no 128-bit integers available.

unsigned long long nResult = ((*(unsigned long long*)sSource) >> m_nBitOffset) & nMask;

This only works for up to 64 - bitoffset bits. How can I extend it so that it really works for 64 bits independently of the bit offset? Also, since this is not a bytewise access, it could cause a memory read access violation.

So I'm really looking for a bytewise solution to this problem that works for up to 64 bits (preferably in C or with intrinsics).

Update: After searching and testing a lot I will probably use this function from RakNet: https://github.com/OculusVR/RakNet/blob/master/Source/BitStream.cpp#L551

Smeeheey · Accepted Answer · 2016-06-14 12:30:04Z

2

To do it bytewise, just read the string (which, by the way, is better interpreted as a sequence of uint8_t rather than char) one byte at a time, updating your result by shifting it left by 8 and ORing it with the current byte. The only complications are the first byte and the last byte, each of which may be only partly used. For the first byte, use a bit mask to keep the bits you need; for the last byte, shift it down by the required amount. Here is the code:

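// View the string as unsigned bytes so the shifts below never see negative values.
const uint8_t* sSource = reinterpret_cast<const uint8_t*>(str.c_str()+m_nByteOffset);

uint64_t result = 0;
const uint8_t FULL_MASK = 0xFF;

// Leading partial byte: keep only the bits after the bit offset (MSB first).
if(m_nBitOffset) {
    const unsigned nAvail = 8 - m_nBitOffset;
    result = (*sSource & (FULL_MASK >> m_nBitOffset));
    if(nBitLen < nAvail) {
        // The whole field lies inside this byte: drop the unused low bits.
        return result >> (nAvail - nBitLen);
    }
    nBitLen -= nAvail;
    sSource++;
}

// Whole bytes in the middle: append 8 bits at a time.
while(nBitLen > 8) {
    result <<= 8;
    result |= *sSource;
    nBitLen -= 8;
    ++sSource;
}

// Trailing 1..8 bits: take them from the top of the last byte.
if(nBitLen) {
    result <<= nBitLen;
    result |= (*sSource >> (8 - nBitLen));
}

return result;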
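The approach described in this answer can also be packaged as a self-contained function. The following is only a sketch under a few assumptions: the name ExtractBitsMsbFirst and its signature are hypothetical, bits are numbered MSB first within each byte (network order, as the answer assumes), and a bounds check is added so that no byte past the end of the string is read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (name and signature are illustrative only).
// Reads nBitLen bits (0..64) starting nBitOffset bits into str, numbering
// bits MSB first within each byte. Returns false if the field would run
// past the end of the string; otherwise stores the zero-extended value.
inline bool ExtractBitsMsbFirst(const std::string& str, std::size_t nBitOffset,
                                unsigned nBitLen, std::uint64_t& result)
{
    assert(nBitLen <= 64);
    result = 0;
    if (nBitLen == 0) return true;

    // Index of the last byte touched by the field.
    const std::size_t lastByte = (nBitOffset + nBitLen - 1) / 8;
    if (lastByte >= str.size()) return false;

    std::size_t byteIdx = nBitOffset / 8;
    unsigned bitInByte = static_cast<unsigned>(nBitOffset % 8); // 0 = MSB
    unsigned remaining = nBitLen;

    while (remaining > 0) {
        const std::uint8_t byte = static_cast<std::uint8_t>(str[byteIdx]);
        const unsigned avail = 8 - bitInByte;               // bits left in this byte
        const unsigned take = remaining < avail ? remaining : avail;
        // Select 'take' bits starting at bitInByte, counted from the MSB side.
        const unsigned chunk = (byte >> (avail - take)) & ((1u << take) - 1u);
        result = (result << take) | chunk;                  // take is at most 8
        remaining -= take;
        bitInByte = 0;                                      // later bytes start at their MSB
        ++byteIdx;
    }
    return true;
}
```

With the three bytes 0xAB 0xCD 0xEF as input, a bit offset of 4 and a bit length of 8 select the low nibble of the first byte followed by the high nibble of the second, giving 0xBC.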

answered Jun 14, 2016 at 12:30

Smeeheey

18 Comments

Hendrik Over a year ago

I like this division of the problem. This will work; I'll do some tests later today.

Peter Cordes Over a year ago

A static_cast should work, since const char* should be compatible with const uint8_t*. That would give you some safety against typos.

Peter Cordes Over a year ago

This might compile to some pretty bad asm with gcc, e.g. it might actually shift and load one byte at a time. :/ In asm, loading up to 64 bits that might not be byte-aligned should be doable with a byte load, an unaligned 64-bit load, and a couple of shifts. (And an OR if you don't use a 2-register shift like x86's shrd to take a window of the concatenation of 2 regs.) Branches are optional, to skip the byte load if one 64-bit load can include the entire desired bitstring. If you're lucky, though, you'll get asm like that from this source.

Hendrik Over a year ago

Hm, this code didn't work: it shuffles the bytes wrong, the endianness is destroyed.

uint64_t nMask = 0xFFFFFFFFFFFFFFFFULL;
uint64_t nPattern = 0xFFFEFDFCFBFAF9F8ULL;
uint64_t nPattern2 = 0xFAAAAAAAAAAAAAFFULL;
EXPECT_EQ( nPattern, ExtractField( 0, 64, (uint8_t*)&nPattern) );
EXPECT_EQ( nMask >> 1, ExtractField( 1, 63, (uint8_t*)&nMask) );
EXPECT_EQ( nPattern >> 1, ExtractField( 1, 63, (uint8_t*)&nPattern) );
EXPECT_EQ( nPattern2 >> 1, ExtractField( 1, 63, (uint8_t*)&nPattern2) );

Smeeheey Over a year ago

Well OK, but you didn't specify the endianness of your input (str.c_str()) and I assumed it was network byte order. I guess you're saying the data is little-endian instead?
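If the data is indeed little-endian, with bit 0 being the least significant bit of the first byte (which is what the tests in the comment above seem to expect), a bytewise variant might look like the sketch below. The name ExtractFieldLsbFirst, its parameter order, and the explicit size parameter used for bounds checking are assumptions made for illustration; they are not taken from the thread or from RakNet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (illustrative only). Reads nBitLen bits (0..64)
// starting at bit nBitOffset of data[0..size), where bit 0 is the least
// significant bit of data[0] (little-endian, LSB-first numbering).
// Returns false if the field would extend past the buffer.
inline bool ExtractFieldLsbFirst(const std::uint8_t* data, std::size_t size,
                                 std::size_t nBitOffset, unsigned nBitLen,
                                 std::uint64_t& value)
{
    assert(nBitLen <= 64);
    value = 0;
    if (nBitLen == 0) return true;

    // Index of the last byte touched by the field.
    const std::size_t lastByte = (nBitOffset + nBitLen - 1) / 8;
    if (lastByte >= size) return false;

    std::size_t byteIdx = nBitOffset / 8;
    unsigned bitInByte = static_cast<unsigned>(nBitOffset % 8); // 0 = LSB
    unsigned produced = 0;                                      // bits written so far

    while (produced < nBitLen) {
        const unsigned avail = 8 - bitInByte;
        const unsigned left = nBitLen - produced;
        const unsigned take = left < avail ? left : avail;
        // Drop the bits below bitInByte, then keep the next 'take' bits.
        const std::uint64_t chunk =
            (data[byteIdx] >> bitInByte) & ((1u << take) - 1u);
        value |= chunk << produced;   // produced is always below 64 here
        produced += take;
        bitInByte = 0;                // later bytes start at their LSB
        ++byteIdx;
    }
    return true;
}
```

Because the bytes are accessed one at a time and combined with shifts, the result does not depend on the byte order of the host. For a buffer holding 0xF8, 0xF9, ..., 0xFF, extracting 63 bits at offset 1 gives 0xFFFEFDFCFBFAF9F8 >> 1.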

Felix Dombek · Accepted Answer · 2016-06-15 23:49:06Z

1

This is how I would do it in modern C++ style. The bit length is determined by the size of the buffer extractedBits: instead of using an unsigned long long, you could also use any other data type (or even array type) with the desired size.

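// Requires <algorithm> (std::transform) and <climits> (CHAR_BIT).
// Assumes str holds at least m_nByteOffset + sizeof(extractedBits) + 1 bytes.
unsigned long long extractedBits;
char* extractedString = reinterpret_cast<char*>(&extractedBits);
std::transform(str.begin() + m_nByteOffset,
               str.begin() + m_nByteOffset + sizeof(extractedBits),
               str.begin() + m_nByteOffset + 1,
               extractedString,
               [=](char c, char d)
               {
                   // Work on unsigned values so both shifts are well defined.
                   unsigned bitsFromC = static_cast<unsigned char>(c) << m_nBitOffset;
                   unsigned bitsFromD =
                       static_cast<unsigned char>(d) >> (CHAR_BIT - m_nBitOffset);
                   // Each output byte combines the tail of c with the head of d.
                   return static_cast<char>(bitsFromC | bitsFromD);
               });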

edited Jun 15, 2016 at 23:49

answered Jun 14, 2016 at 12:29

Felix Dombek

4 Comments

Hendrik Over a year ago

Hm, I tried this code as well, but even basic test cases fail. Also, the bit length isn't used anywhere.

Felix Dombek Over a year ago

@Hendrik Updated! Now it works and I explained how to change the bit length.

Peter Cordes Over a year ago

I thought the OP's bitlength didn't have to be a power of 2, or even a multiple of 8, so e.g. the result could be 45 bits, stored in the low 45 of a 64bit integer (zero-extended to fill the upper 19 bits with 0).

Hendrik Over a year ago

Exactly, that's why it's missing. Anyway, I have now found the function I need in RakNet's bitstream processor.
