Extracting Values Across Byte Boundaries With Arbitrary Bit Positions and Lengths In C#

Question

I am currently working on a network tool that needs to decode/encode a particular protocol that packs fields into dense bit arrays at arbitrary positions. For example, one part of the protocol uses 3 bytes to represent a number of different fields:

Bit Position(s)  Length (In Bits)    Type
0                1                   bool
1-5              5                   int
6-13             8                   int
14-22            9                   uint
23               1                   bool

As you can see, several of the fields span multiple bytes. Many (most) are also shorter than the built-in type that might be used to represent them, such as the first int field which is only 5 bits long. In these cases, the most significant bits of the target type (such as an Int32 or Int16) should be padded with 0 to make up the difference.

My problem is that I am having a difficult time processing this kind of data. Specifically, I am having a hard time figuring out how to efficiently get arbitrary length bit arrays, populate them with the appropriate bits from the source buffer, pad them to match the target type, and convert the padded bit arrays to the target type. In an ideal world, I would be able to take the byte[3] in the example above and call a method like GetInt32(byte[] bytes, int startBit, int length).

The closest thing in the wild that I've found is a BitStream class, but it appears to want individual values to line up on byte/word boundaries (and the half-streaming/half-indexed access convention of the class makes it a little confusing).

My own first attempt was to use the BitArray class, but that proved somewhat unwieldy. It's easy enough to stuff all the bits from the buffer into a large BitArray, transfer only the ones you want from the source BitArray to a new temporary BitArray, and then convert that into the target value, but it seems wrong and very time-consuming.

I am now considering a class like the following that references (or creates) a source/target byte[] buffer along with an offset and provides get and set methods for certain target types. The tricky part is that getting/setting values may span multiple bytes.

class BitField
{
    private readonly byte[] _bytes;
    private readonly int _offset;

    public BitField(byte[] bytes)
        : this(bytes, 0)
    {
    }

    public BitField(byte[] bytes, int offset)
    {
        _bytes = bytes;
        _offset = offset;
    }

    public BitField(int size)
        : this(new byte[size], 0)
    {
    }

    public bool this[int bit]
    {
        get { return IsSet(bit); }
        set { if (value) Set(bit); else Clear(bit); }
    }

    public bool IsSet(int bit)
    {
        return (_bytes[_offset + (bit / 8)] & (1 << (bit % 8))) != 0;
    }

    public void Set(int bit)
    {
        _bytes[_offset + (bit / 8)] |= unchecked((byte)(1 << (bit % 8)));
    }

    public void Clear(int bit)
    {
        _bytes[_offset + (bit / 8)] &= unchecked((byte)~(1 << (bit % 8)));
    }

    //startIndex = the index of the bit at which to start fetching the value
    //length = the number of bits to include - may be less than 32 in which case
    //the most significant bits of the target type should be padded with 0
    public int GetInt32(int startIndex, int length)
    {
        //NEED CODE HERE
    }

    //startIndex = the index of the bit at which to start storing the value
    //length = the number of bits to use, if less than the number of bits required
    //for the source type, precision may be lost
    //value = the value to store
    public void SetValue(int startIndex, int length, int value)
    {
        //NEED CODE HERE
    }

    //Other Get.../Set... methods go here
}

I am looking for any guidance in this area such as third-party libraries, algorithms for getting/setting values at arbitrary bit positions that span multiple bytes, feedback on my approach, etc. I included the class above for clarification and am not necessarily looking for code to fill it in (though I won't argue if someone wants to work it out!).

A few folks have mentioned endianness, and that's certainly a concern. Because I'm dealing with network data, I'm assuming the original buffer is in big-endian (or network-order). I was planning on using the excellent EndianBitConverter from MiscUtil to convert everything to local order inside the Get... methods.
– daveaglick, Commented Jul 11, 2011 at 18:29
I am still working this problem. None of the answers below were quite right for this situation (particularly the aspects involving arbitrary positions and spanning byte boundaries). I have decided to fully implement the BitField class presented in the question, and have been having some luck. Because it may have a lot of uses, especially for network processing where bit fields are often densely packed, I will post the completed class in the next couple of days once it's done as an additional answer. I will continue to upvote other helpful answers that address the question.
– daveaglick, Commented Jul 13, 2011 at 14:38

daveaglick · Accepted Answer · 2011-07-20 23:07:59Z

As promised, here is the class I ended up creating for this purpose. It wraps an arbitrary byte array at an optionally specified index and allows reading/writing at the bit level. It provides methods for reading/writing arbitrary blocks of bits from other byte arrays or for reading/writing primitive values with user-defined offsets and lengths. It works very well for my situation and solves the exact question I asked above. However, it does have a couple of shortcomings. The first is that it is not well documented; I just haven't had the time. The second is that there are no bounds or other checks. It also currently requires the MiscUtil library to provide endian conversion. All that said, hopefully this can help solve or serve as a starting point for someone else with a similar use case.

using MiscUtil.Conversion; // EndianBitConverter comes from the MiscUtil library

internal class BitField
{
    private readonly byte[] _bytes;
    private readonly int _offset;
    private EndianBitConverter _bitConverter = EndianBitConverter.Big;

    public BitField(byte[] bytes)
        : this(bytes, 0)
    {
    }

    //offset = the offset (in bytes) into the wrapped byte array
    public BitField(byte[] bytes, int offset)
    {
        _bytes = bytes;
        _offset = offset;
    }

    public BitField(int size)
        : this(new byte[size], 0)
    {
    }

    //fill == true = initially set all bits to 1
    public BitField(int size, bool fill)
        : this(new byte[size], 0)
    {
        if (!fill) return;
        for(int i = 0 ; i < size ; i++)
        {
            _bytes[i] = 0xff;
        }
    }

    public byte[] Bytes
    {
        get { return _bytes; }
    }

    public int Offset
    {
        get { return _offset; }
    }

    public EndianBitConverter BitConverter
    {
        get { return _bitConverter; }
        set { _bitConverter = value; }
    }

    public bool this[int bit]
    {
        get { return IsBitSet(bit); }
        set { if (value) SetBit(bit); else ClearBit(bit); }
    }

    public bool IsBitSet(int bit)
    {
        return (_bytes[_offset + (bit / 8)] & (1 << (7 - (bit % 8)))) != 0;
    }

    public void SetBit(int bit)
    {
        _bytes[_offset + (bit / 8)] |= unchecked((byte)(1 << (7 - (bit % 8))));
    }

    public void ClearBit(int bit)
    {
        _bytes[_offset + (bit / 8)] &= unchecked((byte)~(1 << (7 - (bit % 8))));
    }

    //index = the index of the source BitField at which to start getting bits
    //length = the number of bits to get
    //size = the total number of bytes required (0 for arbitrary length return array)
    //fill == true = set all padding bits to 1
    public byte[] GetBytes(int index, int length, int size, bool fill)
    {
        if(size == 0) size = (length + 7) / 8;
        BitField bitField = new BitField(size, fill);
        for(int s = index, d = (size * 8) - length ; s < index + length && d < (size * 8) ; s++, d++)
        {
            bitField[d] = IsBitSet(s);
        }
        return bitField._bytes;
    }

    public byte[] GetBytes(int index, int length, int size)
    {
        return GetBytes(index, length, size, false);
    }

    public byte[] GetBytes(int index, int length)
    {
        return GetBytes(index, length, 0, false);
    }

    //bytesIndex = the index (in bits) into the bytes array at which to start copying
    //index = the index (in bits) in this BitField at which to put the value
    //length = the number of bits to copy from the bytes array
    public void SetBytes(byte[] bytes, int bytesIndex, int index, int length)
    {
        BitField bitField = new BitField(bytes);
        for (int i = 0; i < length; i++)
        {
            this[index + i] = bitField[bytesIndex + i];
        }
    }

    public void SetBytes(byte[] bytes, int index, int length)
    {
        SetBytes(bytes, 0, index, length);
    }

    public void SetBytes(byte[] bytes, int index)
    {
        SetBytes(bytes, 0, index, bytes.Length * 8);
    }

    //UInt16

    //index = the index (in bits) at which to start getting the value
    //length = the number of bits to use for the value, if less than required the value is padded with 0
    public ushort GetUInt16(int index, int length)
    {
        return _bitConverter.ToUInt16(GetBytes(index, length, 2), 0);
    }

    public ushort GetUInt16(int index)
    {
        return GetUInt16(index, 16);
    }

    //valueIndex = the index (in bits) of the value at which to start copying
    //index = the index (in bits) in this BitField at which to put the value
    //length = the number of bits to copy from the value
    public void Set(ushort value, int valueIndex, int index, int length)
    {
        SetBytes(_bitConverter.GetBytes(value), valueIndex, index, length);
    }

    public void Set(ushort value, int index)
    {
        Set(value, 0, index, 16);
    }

    //UInt32

    public uint GetUInt32(int index, int length)
    {
        return _bitConverter.ToUInt32(GetBytes(index, length, 4), 0);
    }

    public uint GetUInt32(int index)
    {
        return GetUInt32(index, 32);
    }

    public void Set(uint value, int valueIndex, int index, int length)
    {
        SetBytes(_bitConverter.GetBytes(value), valueIndex, index, length);
    }

    public void Set(uint value, int index)
    {
        Set(value, 0, index, 32);
    }

    //UInt64

    public ulong GetUInt64(int index, int length)
    {
        return _bitConverter.ToUInt64(GetBytes(index, length, 8), 0);
    }

    public ulong GetUInt64(int index)
    {
        return GetUInt64(index, 64);
    }

    public void Set(ulong value, int valueIndex, int index, int length)
    {
        SetBytes(_bitConverter.GetBytes(value), valueIndex, index, length);
    }

    public void Set(ulong value, int index)
    {
        Set(value, 0, index, 64);
    }

    //Int16

    public short GetInt16(int index, int length)
    {
        return _bitConverter.ToInt16(GetBytes(index, length, 2, IsBitSet(index)), 0);
    }

    public short GetInt16(int index)
    {
        return GetInt16(index, 16);
    }

    public void Set(short value, int valueIndex, int index, int length)
    {
        SetBytes(_bitConverter.GetBytes(value), valueIndex, index, length);
    }

    public void Set(short value, int index)
    {
        Set(value, 0, index, 16);
    }

    //Int32

    public int GetInt32(int index, int length)
    {
        return _bitConverter.ToInt32(GetBytes(index, length, 4, IsBitSet(index)), 0);
    }

    public int GetInt32(int index)
    {
        return GetInt32(index, 32);
    }

    public void Set(int value, int valueIndex, int index, int length)
    {
        SetBytes(_bitConverter.GetBytes(value), valueIndex, index, length);
    }

    public void Set(int value, int index)
    {
        Set(value, 0, index, 32);
    }

    //Int64

    public long GetInt64(int index, int length)
    {
        return _bitConverter.ToInt64(GetBytes(index, length, 8, IsBitSet(index)), 0);
    }

    public long GetInt64(int index)
    {
        return GetInt64(index, 64);
    }

    public void Set(long value, int valueIndex, int index, int length)
    {
        SetBytes(_bitConverter.GetBytes(value), valueIndex, index, length);
    }

    public void Set(long value, int index)
    {
        Set(value, 0, index, 64);
    }

    //Char

    public char GetChar(int index, int length)
    {
        return _bitConverter.ToChar(GetBytes(index, length, 2), 0);
    }

    public char GetChar(int index)
    {
        return GetChar(index, 16);
    }

    public void Set(char value, int valueIndex, int index, int length)
    {
        SetBytes(_bitConverter.GetBytes(value), valueIndex, index, length);
    }

    public void Set(char value, int index)
    {
        Set(value, 0, index, 16);
    }

    //Bool

    public bool GetBool(int index, int length)
    {
        return _bitConverter.ToBoolean(GetBytes(index, length, 1), 0);
    }

    public bool GetBool(int index)
    {
        return GetBool(index, 8);
    }

    public void Set(bool value, int valueIndex, int index, int length)
    {
        SetBytes(_bitConverter.GetBytes(value), valueIndex, index, length);
    }

    public void Set(bool value, int index)
    {
        Set(value, 0, index, 8);
    }

    //Single and double precision floating point values must always use the correct number of bits
    public float GetSingle(int index)
    {
        return _bitConverter.ToSingle(GetBytes(index, 32, 4), 0);
    }

    public void SetSingle(float value, int index)
    {
        SetBytes(_bitConverter.GetBytes(value), 0, index, 32);
    }

    public double GetDouble(int index)
    {
        return _bitConverter.ToDouble(GetBytes(index, 64, 8), 0);
    }

    public void SetDouble(double value, int index)
    {
        SetBytes(_bitConverter.GetBytes(value), 0, index, 64);
    }
}
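
The class above depends on MiscUtil and copies bits one at a time into a temporary array. For readers who only need unsigned fields of up to 32 bits, the same MSB-first bit numbering can be sketched without any dependency. The names BitPacking, ReadBits, and WriteBits are hypothetical and are not part of the class above; this is a minimal sketch under those assumptions, not a replacement for it.

```csharp
using System;

// Hypothetical helper, not part of the BitField class above.
// Bit 0 is the most significant bit of bytes[0], matching IsBitSet.
public static class BitPacking
{
    // Reads `length` bits (1..32) starting at `startBit`.
    // The result is zero-padded in its most significant bits.
    public static uint ReadBits(byte[] bytes, int startBit, int length)
    {
        CheckRange(bytes, startBit, length);
        uint result = 0;
        for (int i = 0; i < length; i++)
        {
            int bit = startBit + i;
            int set = (bytes[bit / 8] >> (7 - (bit % 8))) & 1;
            result = (result << 1) | (uint)set;
        }
        return result;
    }

    // Writes the low `length` bits of `value` starting at `startBit`.
    // Higher bits of `value` are ignored, so precision may be lost.
    public static void WriteBits(byte[] bytes, int startBit, int length, uint value)
    {
        CheckRange(bytes, startBit, length);
        for (int i = 0; i < length; i++)
        {
            int bit = startBit + i;
            byte mask = (byte)(1 << (7 - (bit % 8)));
            bool set = ((value >> (length - 1 - i)) & 1) != 0;
            if (set)
                bytes[bit / 8] |= mask;
            else
                bytes[bit / 8] &= unchecked((byte)~mask);
        }
    }

    private static void CheckRange(byte[] bytes, int startBit, int length)
    {
        if (bytes == null) throw new ArgumentNullException("bytes");
        if (length < 1 || length > 32) throw new ArgumentOutOfRangeException("length");
        if (startBit < 0 || startBit + length > bytes.Length * 8)
            throw new ArgumentOutOfRangeException("startBit");
    }
}
```

With the question's layout and a buffer of { 0xC0, 0x20, 0x21 }, ReadBits(buffer, 1, 5) gives 16 and ReadBits(buffer, 6, 8) gives 8. Signed fields would still need sign extension, which the class above handles by filling the padding with the field's first bit.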

Could you post this as an edit to your original question? Makes it easier to find. Also thank you!
@Benjamin answers should not be posted as edits to questions. If you want something to be easy to find, you can bookmark it.

Henk Holterman · Answer · 2011-07-11 18:17:07Z

If your packets are always 4 or 8 bytes or smaller, it would be easier to store each packet in an Int32 or Int64. The byte array only complicates things. You do have to pay attention to big-endian vs. little-endian storage.

And then, for a 3-byte packet:

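public static void SetValue(ref Int32 message, int startIndex, int length, int value)
{
   // mask of `length` one-bits (valid for 0 < length < 32)
   int mask = (1 << length) - 1;
   value = value & mask;  // or check and throw

   int offset = 24 - startIndex - length;   // 24 = 3 * 8
   // message is passed by ref so the caller sees the update;
   // clear the field's old bits before OR-ing in the new value
   message = (message & ~(mask << offset)) | (value << offset);
}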
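
The read direction follows from the same offset arithmetic: shift the field down to bit 0, then mask off everything above it. The sketch below assumes the same 24-bit message with bit 0 as the most significant bit; Packed24 and GetValue are hypothetical names chosen for illustration.

```csharp
using System;

// Hypothetical counterpart to SetValue for a 3-byte (24-bit) message.
public static class Packed24
{
    private const int TotalBits = 24; // 24 = 3 * 8

    // Returns the `length`-bit field that starts at `startIndex`,
    // counting from the most significant of the 24 bits.
    public static int GetValue(int message, int startIndex, int length)
    {
        int mask = (1 << length) - 1;                  // `length` one-bits
        int offset = TotalBits - startIndex - length;  // bits to the right of the field
        return (message >> offset) & mask;
    }
}
```

For the message 0xC02021, GetValue(0xC02021, 1, 5) returns 16 and GetValue(0xC02021, 6, 8) returns 8.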

daveaglick Over a year ago

Unfortunately the length of a packet varies widely and is usually a lot bigger than a single Int64 could hold. I've looked at the new BigInteger structure and wondered if it might work with the same concepts you present, but I'm stuck on 3.5 for now so I didn't go too far down that path. Perhaps I need to take another look. Would these algorithms work on a BigInteger (it does support bitwise operators)?

Henk Holterman Over a year ago

Well, BigInteger seems to have shift operators, it might save you some work. But don't expect blazing speed.

Rowan Smith Over a year ago

Just found this and tested it in .NET 5. It doesn't work, because left-shifting a BigInteger makes it grow in capacity rather than truncating at the original boundary. Right-shifting works as expected. Leading zero bytes are also truncated, so bit 8 becomes bit 0 if byte[0] = 0.
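
One possible way around both effects described in the comment above is to stop relying on the BigInteger's own width and pass the total bit count explicitly, so every shift is computed against a fixed boundary. The sketch below assumes System.Numerics is available (.NET 4.0 or later, so not the 3.5 runtime mentioned earlier); BigIntegerFields and its methods are hypothetical names, and speed is not a goal here.

```csharp
using System;
using System.Numerics;

// Hypothetical helpers: the message is an unsigned value that is
// `totalBits` wide, and bit 0 is its most significant bit.
public static class BigIntegerFields
{
    // Builds a non-negative BigInteger from network-order bytes.
    // BigInteger(byte[]) expects little-endian two's complement, so the
    // bytes are reversed and a zero byte is appended to keep the sign positive.
    public static BigInteger FromBigEndian(byte[] bytes)
    {
        byte[] littleEndian = new byte[bytes.Length + 1];
        for (int i = 0; i < bytes.Length; i++)
            littleEndian[i] = bytes[bytes.Length - 1 - i];
        return new BigInteger(littleEndian);
    }

    public static BigInteger GetValue(BigInteger message, int totalBits, int startIndex, int length)
    {
        BigInteger mask = (BigInteger.One << length) - 1;
        int offset = totalBits - startIndex - length;
        return (message >> offset) & mask;
    }

    // Returns a new message; the field never extends past totalBits,
    // so the left shift cannot push bits beyond the intended width.
    public static BigInteger SetValue(BigInteger message, int totalBits, int startIndex, int length, BigInteger value)
    {
        BigInteger mask = (BigInteger.One << length) - 1;
        int offset = totalBits - startIndex - length;
        BigInteger cleared = message & ~(mask << offset);
        return cleared | ((value & mask) << offset);
    }
}
```

Because the width is a parameter, a buffer such as { 0x00, 0x80 } still reports its bit 8 as set when totalBits is 16, even though the BigInteger itself stores only the value 128.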

Tejs · Answer · 2011-07-11 18:10:18Z

First up, it seems you have reinvented the wheel with the System.Collections.BitArray class. As for actually finding the value of a specific field of bits, I think that can easily be accomplished with a little math magic, as in the following pseudocode:

Start at the most distant digit in your selection (startIndex + length - 1).
If it is set, add 2^(distance from the most distant digit). For the first digit that distance is 0 (mostDistant - self = 0), so add 2^0 (1).
Move one bit to the left.
Repeat for each digit in the length you want.

In that situation, if you had a bit array like so:

10001010

And you want the value of digits 0-3, you would get something like:

[Index 3]   [Index 2]   [Index 1]   [Index 0]
(3 - 3)     (3 - 2)     (3 - 1)     (3 - 0)
=============================================
(0 * 2^0) + (0 * 2^1) + (0 * 2^2) + (1 * 2^3) = 8

Since 1000 (binary) == 8, the math works out.
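
The steps above can be sketched in C# as follows. This is a minimal illustration that assumes the bits are held in a bool[] with index 0 as the leftmost digit; BitSum, Parse, and ValueOf are hypothetical names, and the BitArray class is deliberately not used so the bit order stays explicit.

```csharp
using System;

public static class BitSum
{
    // Turns a string such as "10001010" into a bool[]; index 0 is the leftmost digit.
    public static bool[] Parse(string digits)
    {
        bool[] bits = new bool[digits.Length];
        for (int i = 0; i < digits.Length; i++)
            bits[i] = digits[i] == '1';
        return bits;
    }

    // Walks from the most distant digit back toward startIndex and adds
    // 2^(distance from the most distant digit) for every bit that is set.
    public static int ValueOf(bool[] bits, int startIndex, int length)
    {
        int mostDistant = startIndex + length - 1;
        int value = 0;
        for (int i = mostDistant; i >= startIndex; i--)
        {
            if (bits[i])
                value += 1 << (mostDistant - i);
        }
        return value;
    }
}
```

For the array 10001010, ValueOf(bits, 0, 4) returns 8, matching the worked example.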

Brandon Moretz · Answer · 2011-07-11 18:11:03Z

What's the problem with just using simple bit shifts to get your values out?

int data = Convert.ToInt32( "110000000010000000100001", 2 );

bool v1 = ( data & 1 ) == 1; // True
int v2 = ( data >> 1 ) & 0x1F; // 16
int v3 = ( data >> 6 ) & 0xFF; // 128
uint v4 = (uint )( data >> 14 ) & 0x1FF; // 256
bool v5 = ( data >> 23 ) == 1; // True

This is a pretty good article covering the subject. It's in C, but the same concepts still apply.

daveaglick Over a year ago

For my specific problem, there are a lot of these fields and I'm not sure I want to code accessors and setters for each one by hand. In fact, in some cases I may not know the patterns exactly until other portions are decoded. I think a general solution to the problem would be useful in many cases. I'm not opposed to using bit manipulation, masks, shifting, etc. - I'd just like algorithms that let me do it using arbitrary starting points and lengths.
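
One way to avoid hand-written accessors is to describe each field as data and run a single shift-and-mask loop over the descriptions, which also allows the layout to be built at runtime once earlier portions are decoded. The sketch below is an assumption-laden illustration: FieldSpec, FieldDecoder, and the field names are hypothetical, the message must fit in 32 bits, and bit 0 is the least significant bit as in the answer above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical description of one packed field.
public sealed class FieldSpec
{
    public readonly string Name;
    public readonly int Start;   // index of the field's lowest bit
    public readonly int Length;  // number of bits, 1..32

    public FieldSpec(string name, int start, int length)
    {
        Name = name;
        Start = start;
        Length = length;
    }
}

public static class FieldDecoder
{
    // Extracts every described field from `data` with one shift and one mask each.
    public static Dictionary<string, uint> Decode(uint data, IEnumerable<FieldSpec> layout)
    {
        Dictionary<string, uint> values = new Dictionary<string, uint>();
        foreach (FieldSpec field in layout)
        {
            // 1u << 32 is not a valid way to build a full mask, so handle it separately.
            uint mask = field.Length >= 32 ? uint.MaxValue : (1u << field.Length) - 1;
            values[field.Name] = (data >> field.Start) & mask;
        }
        return values;
    }
}
```

Given the five fields from the question as FieldSpec entries (starts 0, 1, 6, 14, 23 and lengths 1, 5, 8, 9, 1), decoding 0xC02021 yields 1, 16, 128, 256, and 1, the same values as the hand-written shifts.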
