
I am trying to make a small fixed-point math library. My fixed point numbers are 32-bit, with 16 bits each for the integral and fractional parts. The trouble comes with adding fixed-point numbers and then seeing the resulting value. The function fixed_from_parts below takes an integral and fractional part, and emits a fixed-point number, so fixed_from_parts(5, 2) would equal 0000000000000101.0000000000000010.

When adding two numbers, as seen in the main function below, it seems that the integral parts are added as one number, and the fractional part is added as another (5.2 + 3.9 incorrectly becomes 8.11, because 5 + 3 == 8 and 2 + 9 == 11). I think that I need to reverse the order of the bits stored in the fractional part, but I'm not quite sure how to do that. Am I overcomplicating this? How do I make addition work correctly?

#include <stdint.h>
#include <stdio.h>

typedef int16_t integral_t;
typedef int32_t fixed_t;

fixed_t int_to_fixed(const integral_t x) {
    return x << 16;
} 

integral_t fixed_to_int(const fixed_t x) {
    return x >> 16;
}

// shifts left (clears integral bits), and then shifts back right
integral_t get_fixed_fractional(const fixed_t x) {
    return (integral_t) x << 16 >> 16;
}

// fixed_from_parts(5, 2) == 5.2
fixed_t fixed_from_parts(const integral_t integral, const integral_t fractional) {
    return int_to_fixed(integral) + fractional;
}

void print_fixed_base_2(const fixed_t x) {
    for (int i = (sizeof(fixed_t) << 3) - 1; i >= 0; i--) {
        putchar((x & (1 << i)) ? '1' : '0');
        if (i == sizeof(fixed_t) << 2) putchar('.');
    }
    putchar('\n');
}

void print_fixed_base_10(const fixed_t x) {
    printf("%d.%d\n", fixed_to_int(x), get_fixed_fractional(x));
}

int main(void) {
    // 5.2 + 3.9 = 9.1
    const fixed_t a = fixed_from_parts(5, 2), b = fixed_from_parts(3, 9);

    print_fixed_base_2(a);
    print_fixed_base_2(b);

    const fixed_t result = a + b;

    print_fixed_base_2(result);
    print_fixed_base_10(result); // why is the result 8.11?
}
7
  • 2
    You need to understand how fixed-point works. Let's use fewer bits, say 8 bits with 4 bits for the integer and 4 bits for the fraction. Then the number 0101.0010 is not 5.2. It is 5 + 2/16 which in decimal is 5.125. The bit weights (assuming unsigned numbers) are 8, 4, 2, 1, 0.5, 0.25, 0.125, 0.0625. So 0101.0010 is 4 + 1 + 0.125 = 5.125. Commented Sep 29, 2021 at 21:08
  • 1
    fixed_from_parts(5, 2) is not 5.2, it is rather 5 + 2/(2^16) Commented Sep 29, 2021 at 21:08
  • 1
    You need to be more careful defining how your fixed-point format actually works. Storing 5.1 as 0x00050001 can probably be made to work, but let's think a little: If 5.1 is 0x00050001, does that mean 5.10 is 0x0005000a? But 5.1 should be the same as 5.10! What about 5.11? Is that 0x0005000b, or something else? And what about 5.01, or 5.001? Commented Sep 29, 2021 at 21:42
  • 1
    16 bits can comfortably store up to 9999, so you probably want to say that the fraction is in ten thousandths. So both 5.1 and 5.10 would be (5 << 16) | 1000, and you could represent 5.0001 as 0x50001. Commented Sep 29, 2021 at 21:44
  • 1
    But the other thing is that if you do it this way, you have to manually implement a carry from the fractional part to the integral part after every addition. It was the lack of a carry that gave you the wrong answer: 5.2 + 3.9 came out as 8.11 because, as you saw, 2+9=11. After adding, you need to look at the fractional sum and, if it corresponds to a fraction greater than or equal to 1, subtract 1 from it and carry into the integral part. Commented Sep 29, 2021 at 21:48

2 Answers

1

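Your representation is not actually fixed point. In a binary fixed-point format the whole value is scaled by a constant (here 2^16), so the fractional bits count multiples of 1/65536, not decimal digits.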

Example:

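#include <stdint.h>
#include <stdio.h>

#define MULT    (1 << 16)

// parenthesize the argument so that MAKE_FIXED(a + b) scales the whole expression
#define MAKE_FIXED(d)  ((int32_t)((d) * MULT))
#define MAKE_REAL(f)   (((double)(f)) / MULT)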

int32_t mulf(int32_t a, int32_t b)
{
    int64_t part = (int64_t)a * b;
    return part/MULT;
}

int32_t divf(int32_t a, int32_t b)
{
    int64_t part = ((int64_t)a * MULT) / b;
    return part;
}


int main(void)
{
    int32_t num1 = MAKE_FIXED(5.2);
    int32_t num2 = MAKE_FIXED(3.9);


    printf("%f\n", MAKE_REAL(num1 + num2));
    int32_t result = mulf(num1, num2);
    printf("%f\n", MAKE_REAL(result));
    result = divf(num1,num2);
    printf("%f\n", MAKE_REAL(result));
}

1

There are multiple problems in your code:

  • the function get_fixed_fractional has undefined behavior: the cast binds more tightly than the shifts, so x is first converted to integral_t and the result is then shifted with << 16, which is undefined when that value is negative. Furthermore, the type integral_t is signed whereas the fractional part should be unsigned. You should just mask the high bits and return a fixed_t:

    // clear the integral bits
    fixed_t get_fixed_fractional(fixed_t x) { return x & 0xFFFF; }
    
  • you print the fractional part with %d, but it produces misleading output: fixed_from_parts(5, 2) is printed as 5.2 but the value is 5.000030517578125, which you could round as 5.00003. The code to print a fixed_t should be:

    void print_fixed_base_10(const fixed_t x) {
        printf("%d.%05lld\n",
               fixed_to_int(x),
               (get_fixed_fractional(x) * 100000LL + 32768) / 65536);
    }
    

Here is a modified version:

#include <stdint.h>
#include <stdio.h>

typedef int16_t integral_t;
typedef int32_t fixed_t;

fixed_t int_to_fixed(integral_t x) {
    // multiply instead of shifting: left-shifting a negative value is undefined
    return (fixed_t)x * 65536;
}

integral_t fixed_to_int(fixed_t x) {
    return x >> 16;
}

// clear the integral bits; the result is in 0..65535, so it must not be narrowed to integral_t
fixed_t get_fixed_fractional(fixed_t x) {
    return x & 0xFFFF;
}

// fixed_from_parts(5, 2) == 5 + 2/65536 (about 5.00003), not 5.2
fixed_t fixed_from_parts(integral_t integral, integral_t fractional) {
    return int_to_fixed(integral) + fractional;
}

void print_fixed_base_2(fixed_t x) {
    for (int i = 32; i-- > 0;) {
        putchar((x & ((uint32_t)1 << i)) ? '1' : '0');
        if (i == 16)
            putchar('.');
    }
    putchar('\n');
}

void print_fixed_base_10(fixed_t x) {
    printf("%d.%05lld\n",
           fixed_to_int(x),
           (get_fixed_fractional(x) * 100000LL + 32768) / 65536);
}

int main(void) {
    // 5.2 + 3.9 = 9.1 (not really)
    const fixed_t a = fixed_from_parts(5, 2), b = fixed_from_parts(3, 9);
    const fixed_t result = a + b;

    print_fixed_base_2(a);
    print_fixed_base_2(b);
    print_fixed_base_2(result);

    print_fixed_base_10(a);
    print_fixed_base_10(b);
    print_fixed_base_10(result);
    return 0;
}

Output:

0000000000000101.0000000000000010
0000000000000011.0000000000001001
0000000000001000.0000000000001011
5.00003
3.00014
8.00017

You might want to pass a third argument to fixed_from_parts to specify the denominator:

// fixed_from_parts(5, 2, 10) == 5.2
fixed_t fixed_from_parts(integral_t integral, unsigned int fractional, unsigned int denominator) {
    return int_to_fixed(integral) + (fixed_t)((fractional * 65536LL + denominator / 2) / denominator);
}
