How to do a bit-set/byte-array conversion in C

Question

Given an array, unsigned char q[32]="1100111...",

how can I generate a 4-byte bit-set, unsigned char p[4], such that each bit of the bit-set equals the corresponding value in the array? E.g., the first byte p[0] holds "q[0] ... q[7]", the second byte p[1] holds "q[8] ... q[15]", and so on.

And how can I do the opposite, i.e., given the bit-set, generate the array?

My own attempt at the first part:

unsigned char p[4]={0};
for (int j=0; j<N; j++)  /* N = number of digits in q, e.g. 32 */
{
    if (q[j] == '1')
    {
        p [j / 8] |= 1 << (7-(j % 8)); 
    }            
}

Is the above right? Are there any conditions to check? Is there a better way?

EDIT - 1

I wonder whether the above is efficient, as the array size could be up to 4096 or even more.

@pepero As Zeta Two said, it should be (q[j] == '1'), not (q[j] == 1): you need to compare q[j] to the character '1', not to the number 1.
– Omri Barel, Commented Oct 11, 2011 at 18:37

Community · Answer · 2020-06-20 09:12:55Z

First, use strtoul to get a 32-bit value. Then convert the byte order to big-endian with htonl. Finally, store the result in your array:

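#include <arpa/inet.h>  /* htonl (POSIX) */
#include <stdlib.h>     /* strtoul */

/* ... */
unsigned char q[33] = "1100111...";  /* 32 digits plus the terminating '\0' */
unsigned char result[4] = {0};
/* Assumes a 4-byte unsigned long and a suitably aligned `result`;
 * see the caveats below. */
*(unsigned long*)result = htonl(strtoul((const char *)q, NULL, 2));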

There are other ways as well.
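One of those other ways, sketched here under the assumption that the input holds exactly 32 '0'/'1' digits, avoids both htonl and the pointer cast: parse with strtoul, then pull the bytes out most-significant first with shifts. Shifts give the same result on any host byte order, and unsigned long is always at least 32 bits, so the value fits. The helper name PackWithStrtoul is hypothetical.

```c
#include <stdlib.h>  /* strtoul */

/* Hypothetical helper: parses 32 binary digits and stores the value
 * big-endian (most significant byte first) in out[0..3].
 * With fewer than 32 digits the value is right-aligned, which differs
 * from the question's left-aligned layout. */
void PackWithStrtoul(const char *digits, unsigned char out[4])
{
    unsigned long v = strtoul(digits, NULL, 2);
    for (int i = 0; i < 4; ++i)
        out[i] = (unsigned char)((v >> (8 * (3 - i))) & 0xFFu);
}
```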

But I lack `<arpa/inet.h>`!

Then you need to know your platform's byte order. If it's big-endian, htonl does nothing and can be omitted. If it's little-endian, htonl is just:

unsigned long htonl(unsigned long x)
{
    /* Swap adjacent bytes, then swap the two 16-bit halves. */
    x = ((x & 0xFF00FF00) >> 8) | ((x & 0x00FF00FF) << 8);
    x = ((x & 0xFFFF0000) >> 16) | ((x & 0x0000FFFF) << 16);
    return x;
}
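To check that the two mask-and-shift steps really reverse the byte order, here is the same swap written against uint32_t from <stdint.h>, so the width is exactly 32 bits no matter how big long is. The name Swap32 is hypothetical and chosen so it does not clash with a real htonl.

```c
#include <stdint.h>

/* Hypothetical name; same two-step swap as the htonl above. */
uint32_t Swap32(uint32_t x)
{
    /* Step 1: swap adjacent bytes:   0xAABBCCDD -> 0xBBAADDCC */
    x = ((x & 0xFF00FF00u) >> 8) | ((x & 0x00FF00FFu) << 8);
    /* Step 2: swap the 16-bit halves: 0xBBAADDCC -> 0xDDCCBBAA */
    x = ((x & 0xFFFF0000u) >> 16) | ((x & 0x0000FFFFu) << 16);
    return x;
}
```

Each step halves the remaining work, which is where the O(log N) below comes from: 2 steps for 4 bytes, 3 for 8.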

If you're lucky, your optimizer might recognize what you're doing and turn it into efficient code. If not, at least it can all be done in registers, in O(log N) steps for an N-byte word.

If you don't know your platform's byte order, you need to detect it:

typedef union {
    char c[sizeof(int) / sizeof(char)];
    int i;
} OrderTest;

unsigned long htonl(unsigned long x)
{
    OrderTest test;
    test.i = 1;
    if(!test.c[0])  /* first byte is 0: big-endian, nothing to do */
        return x;

    x = ((x & 0xFF00FF00) >> 8) | ((x & 0x00FF00FF) << 8);
    x = ((x & 0xFFFF0000) >> 16) | ((x & 0x0000FFFF) << 16);
    return x;
}
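The same runtime check can also be sketched without a union, by copying a known 32-bit value into a byte array with memcpy and looking at which byte comes first. The name HostIsBigEndian is hypothetical, and like the union version it assumes the host is either big- or little-endian (not a mixed-endian layout).

```c
#include <stdint.h>
#include <string.h>  /* memcpy */

/* Hypothetical helper: returns 1 on a big-endian host, 0 on little-endian. */
int HostIsBigEndian(void)
{
    const uint32_t probe = 1;
    unsigned char bytes[sizeof probe];
    memcpy(bytes, &probe, sizeof probe);
    /* Little-endian stores the low byte (0x01) first; big-endian stores 0x00 first. */
    return bytes[0] == 0;
}
```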

Maybe `long` is 8 bytes!

Well, the OP implied 4-byte inputs with their array size, but an 8-byte long is doable:

#include <limits.h>  /* ULONG_MAX */
#include <stdlib.h>  /* strtoul */

#define kCharsPerLong (sizeof(long) / sizeof(char))

unsigned long htonl(unsigned long x);  /* defined below */

/* ... */
unsigned char q[8 * kCharsPerLong + 1] = "1100111...";  /* +1 for the '\0' */
unsigned char result[kCharsPerLong] = {0};
*(unsigned long*)result = htonl(strtoul((const char *)q, NULL, 2));

unsigned long htonl(unsigned long x)
{
/* The preprocessor cannot evaluate sizeof, so select the branch by ULONG_MAX. */
#if ULONG_MAX == 0xFFFFFFFFUL
    x = ((x & 0xFF00FF00UL) >> 8) | ((x & 0x00FF00FFUL) << 8);
    x = ((x & 0xFFFF0000UL) >> 16) | ((x & 0x0000FFFFUL) << 16);
#elif ULONG_MAX == 0xFFFFFFFFFFFFFFFFUL
    x = ((x & 0xFF00FF00FF00FF00UL) >> 8) | ((x & 0x00FF00FF00FF00FFUL) << 8);
    x = ((x & 0xFFFF0000FFFF0000UL) >> 16) | ((x & 0x0000FFFF0000FFFFUL) << 16);
    x = ((x & 0xFFFFFFFF00000000UL) >> 32) | ((x & 0x00000000FFFFFFFFUL) << 32);
#else
#error Unsupported word size.
#endif
    return x;
}
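The three-step 64-bit swap from the #elif branch can be checked in isolation by writing it on uint64_t, which compiles whatever sizeof(long) is. Swap64 is a hypothetical name for this sketch.

```c
#include <stdint.h>

/* Hypothetical name: the 8-byte swap from the #elif branch above. */
uint64_t Swap64(uint64_t x)
{
    /* Swap adjacent bytes:          01 02 03 04 05 06 07 08 -> 02 01 04 03 06 05 08 07 */
    x = ((x & 0xFF00FF00FF00FF00ull) >> 8)  | ((x & 0x00FF00FF00FF00FFull) << 8);
    /* Swap adjacent 16-bit units:   -> 04 03 02 01 08 07 06 05 */
    x = ((x & 0xFFFF0000FFFF0000ull) >> 16) | ((x & 0x0000FFFF0000FFFFull) << 16);
    /* Swap the two 32-bit halves:   -> 08 07 06 05 04 03 02 01 */
    x = ((x & 0xFFFFFFFF00000000ull) >> 32) | ((x & 0x00000000FFFFFFFFull) << 32);
    return x;
}
```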

For a char that isn't 8 bits (DSPs like to do this), you're on your own. (This is why it was a Big Deal when the SHARC series of DSPs had 8-bit bytes; it made it a LOT easier to port existing code because, face it, C does a horrible job of portability support.)

What about arbitrary length buffers? No funny pointer typecasts, please.

The main thing that can be improved in the OP's version is the loop's internals. Instead of thinking of the output bytes as a fixed data register, think of them as a shift register, where each successive bit is shifted into the right (LSB) end. This saves you all those divisions and mods (which, hopefully, the compiler optimizes into shifts and masks anyway).

For sanity, I'm ditching unsigned char for uint8_t.

#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint8_t */

unsigned StringToBits(const char* inChars, uint8_t* outBytes, size_t numBytes,
    size_t* bytesRead)
/* Converts the string of '1' and '0' characters in `inChars` to a buffer of
 * bytes in `outBytes`. `numBytes` is the number of available bytes in the
 * `outBytes` buffer. On exit, if `bytesRead` is not NULL, the value it points
 * to is set to the number of bytes read (rounding up to the nearest full
 * byte). If a multiple of 8 bits is not read, the last byte written will be
 * padded with 0 bits to reach a multiple of 8 bits. This function returns the
 * number of padding bits that were added. For example, an input of 11 bits
 * will result in `bytesRead` being set to 2 and the function returning 5. This
 * means that if a nonzero value is returned, then a partial byte was read,
 * which may be an error.
 */
{   size_t bytes = 0;
    unsigned bits = 0;
    uint8_t x = 0;

    while(bytes < numBytes)
    {   /* Parse a character. */
        switch(*inChars++)
        {   case '0': x <<= 1; ++bits; break;
            case '1': x = (x << 1) | 1; ++bits; break;
            default: numBytes = 0;  /* Any other character ends the loop. */
        }

        /* See if we filled a byte. */
        if(bits == 8)
        {   outBytes[bytes++] = x;
            x = 0;
            bits = 0;
        }
    }

    /* Padding, if needed. */
    if(bits)
    {   bits = 8 - bits;
        outBytes[bytes++] = x << bits;
    }

    /* Finish up. */
    if(bytesRead)
        *bytesRead = bytes;
    return bits;
}

It's your responsibility to make sure inChars is null-terminated. The function returns on the first character that isn't '0' or '1', or when it runs out of output buffer space. Some example usage:

char q[33] = "1100111...";  /* 32 digits plus the terminating '\0' */
uint8_t buf[4];
size_t bytesRead = 0;
if(StringToBits(q, buf, 4, &bytesRead) || bytesRead != 4)
{
    /* Partial read; handle error here. */
}

This just reads 4 bytes, and traps the error if it can't.

char q[4097] = "1100111...";  /* up to 4096 digits plus '\0' */
uint8_t buf[512] = {0};        /* zero-filled so unconverted bytes stay 0 */
StringToBits(q, buf, 512, NULL);

This just converts what it can; because buf starts zero-filled, the rest stays 0 bits.

This function could be done better if C had the ability to break out of more than one level of loop or switch; as it stands, I'd have to add a flag value to get the same effect, which is clutter, or I'd have to add a goto, which I simply refuse.
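The question also asked for the reverse direction, which this answer does not cover. Here is a sketch of a counterpart in the same style; the name BitsToString and its signature are hypothetical. It treats each byte as a shift register again, this time shifting bits out of the MSB end, and it assumes outChars has room for 8 * numBytes + 1 characters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inverse of StringToBits: writes 8 * numBytes characters,
 * most significant bit of each byte first, followed by a '\0'. */
void BitsToString(const uint8_t* inBytes, size_t numBytes, char* outChars)
{
    for(size_t i = 0; i < numBytes; ++i)
    {   uint8_t x = inBytes[i];
        for(unsigned b = 0; b < 8; ++b)
        {   /* Emit the current MSB, then shift the next bit into its place. */
            *outChars++ = (x & 0x80) ? '1' : '0';
            x = (uint8_t)(x << 1);
        }
    }
    *outChars = '\0';
}
```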

That would work, unless you don't have <arpa/inet.h> on your system, or your processor is a bit sensitive to alignment, or your long is 8 bytes long. There are more portable solutions (including the one in the question).
@Omri: I'm not sure what you mean by "sensitive to alignment". Are you saying that result might not be aligned to a 4-byte boundary? The only architectures I know of that misalign their stack (where auto variables are put) are misalignment-tolerant, and this isn't going to be fast, optimized code anyway due to the parsing.
@Mike: The point is that your code causes undefined behaviour (conversion of incompatible pointers). If you don't care about speed, why use tricks instead of straight-forward readable code?
@Mike, I accept Chriszuma's answer, but I really like what you show here. I think that is more valuable than my question.

Chriszuma · Accepted Answer · 2011-10-11 18:45:15Z

2

I don't think that will quite work. You are comparing each "bit" to 1 when it should really be '1'. You can also make it a bit more efficient by getting rid of the if:

unsigned char p[4]={0};
for (int j=0; j<32; j++) 
{
    p [j / 8] |= (q[j] == '1') << (7-(j % 8));           
}

Going in reverse is pretty simple too. Just mask for each "bit" that you set earlier.

unsigned char q[33]={0};  /* 32 digits plus a terminating '\0' */
for (int j=0; j<32; j++) {
  q[j] = ((p[j / 8] & (1 << (7-(j % 8)))) != 0) + '0';
}

You'll notice the creative use of (boolean) + '0' to convert between 1/0 and '1'/'0'.

edited Oct 11, 2011 at 18:45

answered Oct 11, 2011 at 18:39

Chriszuma



Joe · Answer · 2011-10-11 18:58:11Z

1

According to your example, it does not look like you are going for readability, and after a (late) refresh my solution looks very similar to Chriszuma's, except that I leave out the parentheses (relying on operator precedence) and add !! to force a 0 or 1.

#include <stdio.h>

/* A #define, because a const variable is not a constant expression in C,
 * and an initialized array cannot be variable-length. */
#define N 32 //N must be a multiple of 8

int main(void)
{
    char q[N+1] = "11011101001001101001111110000111";
    unsigned char p[N/8] = {0};
    char r[N+1] = {0}; //reversed

    for(size_t i = 0; i < N; ++i)
        p[i / 8] |= (q[i] == '1') << 7 - i % 8;

    for(size_t i = 0; i < N; ++i)
        r[i] = '0' + !!(p[i / 8] & 1 << 7 - i % 8);

    printf("%x %x %x %x\n", p[0], p[1], p[2], p[3]);
    printf("%s\n%s\n", q, r);
    return 0;
}

answered Oct 11, 2011 at 18:58

Joe



anatolyg · Answer · 2011-10-11 19:12:33Z

1

If you are looking for extreme efficiency, try to use the following techniques:

Replace the if with a subtraction of '0' (it seems you can assume your input characters are only '0' or '1'). Also, process the input from lower indices to higher ones, shifting each new bit in at the bottom.

for (int c = 0; c < N; c += 8)
{
    int y = 0;
    for (int b = 0; b < 8; ++b)
        y = y * 2 + q[c + b] - '0';
    p[c / 8] = y;
}

Replace array indices by auto-incrementing pointers:

const unsigned char* qptr = q;
unsigned char* pptr = p;
for (int c = 0; c < N; c += 8)
{
    int y = 0;
    for (int b = 0; b < 8; ++b)
        y = y * 2 + *qptr++ - '0';
    *pptr++ = y;
}

Unroll the inner loop:

const unsigned char* qptr = q;
unsigned char* pptr = p;
for (int c = 0; c < N; c += 8)
{
    *pptr++ =
        qptr[0] - '0' << 7 |
        qptr[1] - '0' << 6 |
        qptr[2] - '0' << 5 |
        qptr[3] - '0' << 4 |
        qptr[4] - '0' << 3 |
        qptr[5] - '0' << 2 |
        qptr[6] - '0' << 1 |
        qptr[7] - '0' << 0;
    qptr += 8;
}

Process several input characters simultaneously (using bit twiddling hacks or MMX instructions) - this has great speedup potential!
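As one illustration of the bit-twiddling route (a SWAR sketch, not MMX), 8 characters can be packed with one mask and one multiply. It assumes the input contains only '0' and '1'; since '0' is 0x30 and '1' is 0x31, the low bit of each character is its digit, so this sketch masks instead of subtracting '0'. The multiplier 0x8040201008040201 moves byte i's low bit to bit 63 - i, and every partial product lands on a distinct bit, so there are no carries. The names Pack8 and PackSwar are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: packs 8 '0'/'1' chars into one byte, chars[0] -> MSB. */
static uint8_t Pack8(const char* chars)
{
    /* Assemble little-endian (chars[0] in the lowest byte), independent of
     * host byte order; compilers can often turn this into one 8-byte load. */
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | (uint8_t)chars[i];

    /* Keep only the low bit of each byte: 0 or 1 per character. */
    v &= 0x0101010101010101ull;

    /* Gather the 8 bits into the top byte, then shift it down. */
    return (uint8_t)((v * 0x8040201008040201ull) >> 56);
}

/* Packs numChars characters (a multiple of 8) into numChars / 8 bytes. */
void PackSwar(const char* q, uint8_t* p, size_t numChars)
{
    for (size_t c = 0; c < numChars; c += 8)
        p[c / 8] = Pack8(q + c);
}
```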

answered Oct 11, 2011 at 19:12

anatolyg


1 Comment

Omri Barel Over a year ago

And one more suggestion: instead of subtracting '0' from each character, you can compute the total to subtract for 8 characters and do it in one go. With y = y * 2 + c, the '0' offsets add up to '0' * (128 + 64 + ... + 1) = '0' * 255, which for ASCII is 48 * 255 = 12240.
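Applied to the first loop of the answer, the suggestion looks roughly like this. It is a sketch assuming only '0'/'1' input and a character count that is a multiple of 8; writing the offset as '0' * 255 rather than 12240 keeps it correct for any character set in which '1' == '0' + 1, which C guarantees. The name PackOneSubtract is hypothetical.

```c
#include <stddef.h>

/* Hypothetical: accumulates raw character codes and removes the combined
 * '0' offset once per output byte instead of once per character. */
void PackOneSubtract(const char* q, unsigned char* p, size_t numChars)
{
    for (size_t c = 0; c < numChars; c += 8)
    {
        int y = 0;
        for (int b = 0; b < 8; ++b)
            y = y * 2 + q[c + b];                 /* no per-character - '0' */
        p[c / 8] = (unsigned char)(y - '0' * 255); /* 12240 in ASCII */
    }
}
```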
