0

I typed the following C code:

typedef unsigned char* byte_pointer;

void show_bytes(byte_pointer start, size_t len) 
{
  int i;
  for (i = 0; i < len; i++)
  printf(" %.2x", start[i]);
  printf("\n");
}

 void show_float(float x) 
  {
       show_bytes((byte_pointer) &x, sizeof(float));
  }
 void main()
    {
         float f = 3.9;
         show_float(f);
    }

The output of this code is: Output: 0x4079999A

Manual Calculations 1.1111001100110011001100 x 2 power 1

M: 1111001100110011001100

E: 1 + 127 = 128(d) = 10000000(b)

Final Bits: 01000000011110011001100110011000

Hex: 0x40799998

Why this last A is displayed despite of 8.

7
  • 3
    What happens if you have float f = 3.9f; ? Your 3.9 is a double value. Note that 3.9 cannot be converted exactly to finite binary floating point representation, so the implementation details will cover whether the approximation is a round-up or a round-down. Commented Nov 4, 2021 at 10:42
  • 3
    Better not to typedef pointers. Commented Nov 4, 2021 at 10:51
  • @WeatherVane: it should not change much. The truncation to float of the double 3.9 value is likely to be the float value 3.9f. I would assume that the representation in multi-precision should be 0x4079999999999.... And that is normally rounded to the closer representable floating point value which is 0x4079999A. The question is which manual calculation could lead to 0x40799998... Commented Nov 4, 2021 at 10:53
  • @SergeBallesta yes in practice I found no difference. A deleted comment queried the basis of the manual calculation ;) Commented Nov 4, 2021 at 10:56
  • 1
    @WeatherVane: Just FYI, the shortest decimal numeral that differs when converted first to a double and then a float has seven significant digits, and there is only one of them, except for the sign. Commented Nov 4, 2021 at 11:53

1 Answer 1

1

As per manual calculations the answer in Hex should supposed to be: Output: 0x40799998

Those undisclosed manual calculations must be wrong. The correct result is 4079999A16.

In the format commonly used for float, IEEE-754 binary32 or “single precision,” numbers are represented as an integer with magnitude less than 224 multiplied by a power of two within certain limits. (The floating-point representation is often described in other forms, such as sign, a 24-digit binary significand with radix point after the first digit, and a power of two. These forms are mathematically equivalent.)

The two numbers in this form closest to 3.9 are 16,357,785•2−23 and 16,357,786•2−23. These are, respectively, 3.8999998569488525390625 and 3.900000095367431640625. Lining them up, we can see the latter is closer to 3.9:

3.8999998569488525390625
3.9000000000000000000000
3.9000000953674316406250

as the former differs by 1.5 at the seventh digit after the decimal point, whereas the latter differs by about 9.5 at the eighth digit after the decimal point.

Therefore, the best conversion of 3.9 to this float format produces 16,357,786•2−23. In hexadecimal, 16,357,786 is F9999A16. In the encoding of the representation into the bits of a float, the low 23 bits of the significand are put into the primary significand field. The low 23 bits are 79999A16, and that is what we should see in the primary significand field.

Also note we can easily see the binary for 3.9 is 11.11100110011001100110011001100110011001100110…2. The bold marks the 24 bits that fit in the float significand. Immediately after them is 1001…, which we can see ought to round up, since it exceeds half of the previous bit, and therefore the last four bits of the significand should be 1010.

(Also note that good C implementations convert numerals in source text to the nearest representable number, especially for numbers without many decimal digits, but the C standard does not require this. It says “For decimal floating constants, and also for hexadecimal floating constants when FLT_RADIX is not a power of 2, the result is either the nearest representable value, or the larger or smaller representable value immediately adjacent to the nearest representable value, chosen in an implementation-defined manner. However, the encoding shown in the question 4079999816, is not for either of the adjacent representable values, 4079999916 and 4079999A16. It is farther away than either.)

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.