How to compress a 32 bit array elements into minimum required bit elements?

Question

Lets assume I have an input array like below

int input_arr[100] = {10,20,1255,1200,50,55,1,5,6,1000};

Here to store each elements of array it took 32 bits even though value of array elements is very small i.e 1255 is the maximum elements in array and to store that I need only 11 bit that means in 11 bit I can fit all other elements of array.

So my task to compress 32-bit elements of array into 11-bit array elements ? Expected compressed array looks like

int output_arr[] = {00000001010 00000010100 .... 10011111111 ... }
                        |             |               |
                   11 bits(1)    11 bits(2)     11 bits( 1255)

To do the above task what I did is here

find the maximum elements in given array
find the bits required to store maximum elements(previous step)
find bytes required to store no of bits for e.g to store 11 bits I need equivalent 2 bytes(in below code new_size contains this). Here is I need your help. Here is the memory wastage as told by my manager because to store 11 bits my new_size is 2 bytes i.e 5 bits are still extra or wastage. How can I avoid this.

Here is what I tried

int my_pow(int input_num,int p) {
        int temp = 1;
        for(int iter = 0;iter < p; iter++) {
                temp = temp * input_num;
        }
        return temp;
}
int main() {
        #if 0
        int input_array[53069] = {1,2,2,3,4,1,2,4,6,1255,1,2,5,1233};
        #endif
        int input_array[] = {1,2,3,4,6,1255,1,2,5,1233};

        int max = input_array[0], ele = sizeof(input_array)/sizeof(input_array[0]);
        /* finding max elements in a array */
        for(int i = 0;i < ele; i++) {
                if(input_array[i] > max) {
                        max = input_array[i];
                }
        }
        /* finding no of bits required to store highest elements of array */
        int bit_required = 0;
        while(1) {
                if(max < my_pow(2,bit_required))
                        break;
                bit_required+=1;
        }
        /* when above loop fails bit_required is nothing 
           but no of bit required to store the highest element of array */

        /* finding size of new/compressed array */
        int new_size = 0;
        if(bit_required % 8 == 0) {
                new_size = bit_required/8;
        }
        else {
                new_size = (bit_required/8) + 1;
        }
        /* construct the new array again */
        typedef struct array_task {
                unsigned char new_array[new_size];/* in each cmp_arr, can store new_size char
                                                     now for each B[] I'm not using 32 bits , its new_size bits */
        }cmp_arr;/* creating new array of ele elements */
        cmp_arr cmpressed[ele];
        /* store elements of input_array[] into output_array[] */
        for(int row = 0 ; row < ele ;row++) {
                for(int col = bit_required - 1; col >= 0; col-- ) {
                        cmpressed[row].new_array[col] = ((input_array[row] >> col & 1) + 48) ;
                        printf("%d",(cmpressed[row].new_array[col]) - 48);
                }
                printf("\n");
        }
        #if 0
        printf("Size of A before %d\n",sizeof(input_array)); /* 40 bytes */
        printf("size of compressed array %d\n",sizeof(cmp_arr));/* same task, it perform in 2 bytes, 
                                                                each elements won't take 32 bits  */
        #endif
        return 0;
}

Is there any other way to do the same task efficiently ? All suggestion are most welcome ?

Marian · Accepted Answer · 2018-03-25 16:41:50Z

3

To put values shifted by 11 bits instead of 8, 16 or 32 will require manipulations with bits. You will basically have to emulate an array of bits in an array of (say 32 bits) integers. In this case if a value is stored at a bit offset X it will be (possibly) stored in your array somewhere on indexes X/32 and X/32+1 (if it is crossing border of 32 bits). Each time when you have to set a value into the array you have to load those two values and "place" your number there. The implementation is a bit technical, try the following code:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define MASK32 ((uint64_t)0xffffffff)

void setValue(uint32_t *a, int bits, int i, int n) {
    int bitoffset = i * bits;
    int index = bitoffset / 32;
    int shift = bitoffset % 32;
    uint64_t maskbits = (~(uint64_t)0) >> (64-bits);
    uint64_t val = ((uint64_t)a[index+1]<<32) + a[index];
    val = val & ~(maskbits << shift) | ((n & maskbits) << shift);
    a[index] = (val & MASK32);
    a[index+1] = (val >> 32) & MASK32;
}

int getValue(const uint32_t *a, int bits, int i) {
    int bitoffset = i * bits;
    int index = bitoffset / 32;
    int shift = bitoffset % 32;
    uint64_t maskbits = (~(uint64_t)0) >> (64-bits);
    int val = ((((uint64_t)a[index+1]<<32) + a[index]) >> shift) & maskbits;
    return(val);
}

int input_arr[100] = {10,20,1255,1200,50,55,1,5,6,1000};

int main() {
    int        i, j;
    uint32_t   a[100*11/32+2];

    for(i=0; i<100; i++) setValue(a,11,i,input_arr[i]);
    for(j=0; j<100; j++) printf("a[%d/11] == %d\n", j, getValue(a,11,j));
}

edited Mar 25, 2018 at 16:41

answered Mar 24, 2018 at 10:49

Marian

7,4922 gold badges24 silver badges36 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

chqrlie Over a year ago

Good illustration. You might want to use uint32_t for bitoffset, index and shift to ensure optimal compilation of / 32 and % 32. Also mask n in setValue as val = val & ~(maskbits << shift) | ((uint64_t)(n & mask) << shift); to make it resistant to invalid values.

chqrlie Over a year ago

It is too bad you need the array to have one extra padding element at the end, but tricky to avoid it without tests in getValue and setValue. The computation should use 64 bits to avoid potential arithmetic overflow. You might want to hide this implementation detail with a macro: #define BITFIELD_ARRAY_SIZE(n, b) ((size_t)((uint64_t)(n)*(b)/32+2))

Marian Over a year ago

@chqrlie Thnks. Added const and masking of value. However I do not see why uint32_t bitoffset, ... makes compilation more optimal compared to int bitoffset, ....

chqrlie Over a year ago

Unsigned division and modulo by a power of 2 generate a simple right shift and mask. Signed division is not as simple as it must handle negative numbers and rounds toward 0. Look at the generated assembly code and see the extra instructions to adjust for potentially negative numbers.

JohnTortugo · Accepted Answer · 2018-03-24 07:42:34Z

1

Another approach that I find "interesting" is allocating an array of chars and doing a cast to an type that fits the maximum value. Something like this:

NumBytesMaxValue = ...;
void* pointers = malloc(NumBytesMaxValue * NumValues);

if (NumBytesMaxValue == 1)
  cast_pointer_to_char_and_fill_it();
else if (NumBytesMaxValue == 2)
  cast_pointer_to_short_and_fill_it();
...

answered Mar 24, 2018 at 7:42

JohnTortugo

6,6657 gold badges42 silver badges76 bronze badges

3 Comments

Achal Over a year ago

Thanks. In cast_pointer_to_char_and_fill_it() I can fill the values but what how many bits ? I think there may still some unused bits

Achal Over a year ago

why can't there is variable bit filed in C ? if its possible I can do like struct data { char arr : num_of_bit ; }

JohnTortugo Over a year ago

I don't fully grasp what you want. But if you really want full fledged compression you can always look for a library that implements it. For instance zlib.

chqrlie · Accepted Answer · 2018-03-24 08:37:40Z

Data compression is a vast subject, an active area of research... compressing your data can be done in so many different ways as to make it off topic.

Finding the smallest type for the array can however be done by a utility program or a preliminary phase:

#include <limits.h>
#include <stdio.h>

int main() {
    int input_array[] = { 1, 2, 2, 3, 4, 1, 2, 4, 6, 1255, 1, 2, 5, 1233 };
    size_t i, count = sizeof(input_array) / sizeof(input_array[0]);
    int min, max;
    int nc = 0;
    min = max = input_array[0];
    for (i = 1; i < count; i++) {
        if (min > input_array[i]) min = intput_array[i];
        if (max < input_array[i]) max = intput_array[i];
    }
    printf("min value is %d, max value is %d\n", min, max);
    if (min >= SCHAR_MIN && max <= SCHAR_MAX)
        nc += printf("type signed char is appropriate\n");
    if (min >= 0 && max <= UCHAR_MAX)
        nc += printf("type unsigned char is appropriate\n");
    if (min >= SHRT_MIN && max <= SHRT_MAX)
        nc += printf("type short is appropriate\n");
    if (min >= 0 && max <= USHRT_MAX)
        nc += printf("type unsigned short is appropriate\n");
    if (nc == 0)
        printf("no type smaller than int is appropriate\n");
    return 0;
}

You can use the same approach for a set of numbers with values unknown at compile time with these steps:

start with an allocated array of a small type such as signed char.
read the next value: if it fits in the current type, add it to the array and continue.
if not, allocate an array of a larger type such as short, copy the values parsed so far into it, free the previous array, store the new value and continue.
if the new value does not fit in a short, use a larger type such as int.
you could write code for even larger types such as long and long long, but you need specific code for each type.
at the end of the read phase, you have an array of the smallest type that handles all the values in the dataset. Handle this array with code for its specific type. This means you have to duplicate the processing code for each type, which can be tricky.

Thanks. you given some hints which is more than enough for me. I'll do it.

Collectives™ on Stack Overflow

How to compress a 32 bit array elements into minimum required bit elements?

3 Answers 3

4 Comments

3 Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

4 Comments

3 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related