2

Lets assume I have an input array like below

int input_arr[100] = {10,20,1255,1200,50,55,1,5,6,1000};

Here to store each elements of array it took 32 bits even though value of array elements is very small i.e 1255 is the maximum elements in array and to store that I need only 11 bit that means in 11 bit I can fit all other elements of array.

So my task to compress 32-bit elements of array into 11-bit array elements ? Expected compressed array looks like

int output_arr[] = {00000001010 00000010100 .... 10011111111 ... }
                        |             |               |
                   11 bits(1)    11 bits(2)     11 bits( 1255)

To do the above task what I did is here

  • find the maximum elements in given array
  • find the bits required to store maximum elements(previous step)
  • find bytes required to store no of bits for e.g to store 11 bits I need equivalent 2 bytes(in below code new_size contains this). Here is I need your help. Here is the memory wastage as told by my manager because to store 11 bits my new_size is 2 bytes i.e 5 bits are still extra or wastage. How can I avoid this.

Here is what I tried

int my_pow(int input_num,int p) {
        int temp = 1;
        for(int iter = 0;iter < p; iter++) {
                temp = temp * input_num;
        }
        return temp;
}
int main() {
        #if 0
        int input_array[53069] = {1,2,2,3,4,1,2,4,6,1255,1,2,5,1233};
        #endif
        int input_array[] = {1,2,3,4,6,1255,1,2,5,1233};

        int max = input_array[0], ele = sizeof(input_array)/sizeof(input_array[0]);
        /* finding max elements in a array */
        for(int i = 0;i < ele; i++) {
                if(input_array[i] > max) {
                        max = input_array[i];
                }
        }
        /* finding no of bits required to store highest elements of array */
        int bit_required = 0;
        while(1) {
                if(max < my_pow(2,bit_required))
                        break;
                bit_required+=1;
        }
        /* when above loop fails bit_required is nothing 
           but no of bit required to store the highest element of array */

        /* finding size of new/compressed array */
        int new_size = 0;
        if(bit_required % 8 == 0) {
                new_size = bit_required/8;
        }
        else {
                new_size = (bit_required/8) + 1;
        }
        /* construct the new array again */
        typedef struct array_task {
                unsigned char new_array[new_size];/* in each cmp_arr, can store new_size char
                                                     now for each B[] I'm not using 32 bits , its new_size bits */
        }cmp_arr;/* creating new array of ele elements */
        cmp_arr cmpressed[ele];
        /* store elements of input_array[] into output_array[] */
        for(int row = 0 ; row < ele ;row++) {
                for(int col = bit_required - 1; col >= 0; col-- ) {
                        cmpressed[row].new_array[col] = ((input_array[row] >> col & 1) + 48) ;
                        printf("%d",(cmpressed[row].new_array[col]) - 48);
                }
                printf("\n");
        }
        #if 0
        printf("Size of A before %d\n",sizeof(input_array)); /* 40 bytes */
        printf("size of compressed array %d\n",sizeof(cmp_arr));/* same task, it perform in 2 bytes, 
                                                                each elements won't take 32 bits  */
        #endif
        return 0;
}

Is there any other way to do the same task efficiently ? All suggestion are most welcome ?

3 Answers 3

3

To put values shifted by 11 bits instead of 8, 16 or 32 will require manipulations with bits. You will basically have to emulate an array of bits in an array of (say 32 bits) integers. In this case if a value is stored at a bit offset X it will be (possibly) stored in your array somewhere on indexes X/32 and X/32+1 (if it is crossing border of 32 bits). Each time when you have to set a value into the array you have to load those two values and "place" your number there. The implementation is a bit technical, try the following code:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define MASK32 ((uint64_t)0xffffffff)

void setValue(uint32_t *a, int bits, int i, int n) {
    int bitoffset = i * bits;
    int index = bitoffset / 32;
    int shift = bitoffset % 32;
    uint64_t maskbits = (~(uint64_t)0) >> (64-bits);
    uint64_t val = ((uint64_t)a[index+1]<<32) + a[index];
    val = val & ~(maskbits << shift) | ((n & maskbits) << shift);
    a[index] = (val & MASK32);
    a[index+1] = (val >> 32) & MASK32;
}

int getValue(const uint32_t *a, int bits, int i) {
    int bitoffset = i * bits;
    int index = bitoffset / 32;
    int shift = bitoffset % 32;
    uint64_t maskbits = (~(uint64_t)0) >> (64-bits);
    int val = ((((uint64_t)a[index+1]<<32) + a[index]) >> shift) & maskbits;
    return(val);
}

int input_arr[100] = {10,20,1255,1200,50,55,1,5,6,1000};

int main() {
    int        i, j;
    uint32_t   a[100*11/32+2];

    for(i=0; i<100; i++) setValue(a,11,i,input_arr[i]);
    for(j=0; j<100; j++) printf("a[%d/11] == %d\n", j, getValue(a,11,j));
}
Sign up to request clarification or add additional context in comments.

4 Comments

Good illustration. You might want to use uint32_t for bitoffset, index and shift to ensure optimal compilation of / 32 and % 32. Also mask n in setValue as val = val & ~(maskbits << shift) | ((uint64_t)(n & mask) << shift); to make it resistant to invalid values.
It is too bad you need the array to have one extra padding element at the end, but tricky to avoid it without tests in getValue and setValue. The computation should use 64 bits to avoid potential arithmetic overflow. You might want to hide this implementation detail with a macro: #define BITFIELD_ARRAY_SIZE(n, b) ((size_t)((uint64_t)(n)*(b)/32+2))
@chqrlie Thnks. Added const and masking of value. However I do not see why uint32_t bitoffset, ... makes compilation more optimal compared to int bitoffset, ....
Unsigned division and modulo by a power of 2 generate a simple right shift and mask. Signed division is not as simple as it must handle negative numbers and rounds toward 0. Look at the generated assembly code and see the extra instructions to adjust for potentially negative numbers.
1

Another approach that I find "interesting" is allocating an array of chars and doing a cast to an type that fits the maximum value. Something like this:

NumBytesMaxValue = ...;
void* pointers = malloc(NumBytesMaxValue * NumValues);

if (NumBytesMaxValue == 1)
  cast_pointer_to_char_and_fill_it();
else if (NumBytesMaxValue == 2)
  cast_pointer_to_short_and_fill_it();
...

3 Comments

Thanks. In cast_pointer_to_char_and_fill_it() I can fill the values but what how many bits ? I think there may still some unused bits
why can't there is variable bit filed in C ? if its possible I can do like struct data { char arr : num_of_bit ; }
I don't fully grasp what you want. But if you really want full fledged compression you can always look for a library that implements it. For instance zlib.
1

Data compression is a vast subject, an active area of research... compressing your data can be done in so many different ways as to make it off topic.

Finding the smallest type for the array can however be done by a utility program or a preliminary phase:

#include <limits.h>
#include <stdio.h>

int main() {
    int input_array[] = { 1, 2, 2, 3, 4, 1, 2, 4, 6, 1255, 1, 2, 5, 1233 };
    size_t i, count = sizeof(input_array) / sizeof(input_array[0]);
    int min, max;
    int nc = 0;
    min = max = input_array[0];
    for (i = 1; i < count; i++) {
        if (min > input_array[i]) min = intput_array[i];
        if (max < input_array[i]) max = intput_array[i];
    }
    printf("min value is %d, max value is %d\n", min, max);
    if (min >= SCHAR_MIN && max <= SCHAR_MAX)
        nc += printf("type signed char is appropriate\n");
    if (min >= 0 && max <= UCHAR_MAX)
        nc += printf("type unsigned char is appropriate\n");
    if (min >= SHRT_MIN && max <= SHRT_MAX)
        nc += printf("type short is appropriate\n");
    if (min >= 0 && max <= USHRT_MAX)
        nc += printf("type unsigned short is appropriate\n");
    if (nc == 0)
        printf("no type smaller than int is appropriate\n");
    return 0;
}

You can use the same approach for a set of numbers with values unknown at compile time with these steps:

  • start with an allocated array of a small type such as signed char.
  • read the next value: if it fits in the current type, add it to the array and continue.
  • if not, allocate an array of a larger type such as short, copy the values parsed so far into it, free the previous array, store the new value and continue.
  • if the new value does not fit in a short, use a larger type such as int.
  • you could write code for even larger types such as long and long long, but you need specific code for each type.
  • at the end of the read phase, you have an array of the smallest type that handles all the values in the dataset. Handle this array with code for its specific type. This means you have to duplicate the processing code for each type, which can be tricky.

1 Comment

Thanks. you given some hints which is more than enough for me. I'll do it.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.