35

I was helping a student with his assignment and ran into a very gnarly "bug" which I could not explain, and was wondering if I could get help in understanding what exactly is going on.

The question was to implement the following:

given a byte array (buf) of length N, calculate the 16-bit checksum given by the formula:

checksum = ~(buf[0]) + ~(buf[1]) + ... + ~(buf[N-2]) + ~(buf[N-1])

The implementation the student made was pretty straightforward:

uint16_t calculate_checksum(uint8_t *msg_buf, size_t msg_length)
{
    uint16_t checksum = 0;
    for (size_t i = 0; i < msg_length; i++)
    {
        checksum += ~(msg_buf[i]);
    }
    return checksum;
}

However, to my surprise, this implementation did not give the expected result.

I've tried looking at the checksum variable by printing its value in the loop, which yielded an unexpected pattern:

65471
65459
65449
65287
65276
65253
65166
65113
65094
...

The checksum starts at 0, but after the first addition the value sits somewhere in the upper range of the uint16_t and going down. I suspected the value underflowing due to msg_buf[i] being implicitly converted into a uint16_t before complementing it. But I don't understand why. I would expect that first msg_buf gets indexed with i, then the byte complement is calculated (which would yield a value within 0-255), only then is it (implicitly) promoted into a uint16_t.

I've tried to look at the assembly output of the statement checksum += ~(msg_buf[i]), which seems to support this theory (using ARM gcc 14.1.0). Note that [r7, #4] is the msg_buf pointer, [r7, #8] is i and [r7, #14] is checksum. The assembly does something weird with subtraction, which yields the same result if you would convert the msg_buf[i] byte to a uint16_t before complementing.

ldr     r2, [r7, #4]
ldr     r3, [r7, #8]
add     r3, r3, r2
ldrb    r3, [r3]        @ zero_extendqisi2
mov     r2, r3
ldrh    r3, [r7, #14]   @ movhi
subs    r3, r3, r2
uxth    r3, r3
subs    r3, r3, #1
strh    r3, [r7, #14]   @ movhi

Knowing this, the fix we came up with is pretty straightforward. We opted to just AND the results of the right-hand side with 0xFF to get rid of the higher bits, which yielded the correct checksum.

So, the issue is essentially fixed, but I still don't understand why this is an issue. Maybe this behaviour is expected and I don't know the proper operation orders, or maybe something else is happening. I really don't know.

Can someone please explain why this is happening?

6
  • Can you provide sample input & its expected output? Commented Aug 16, 2024 at 19:41
  • 12
    ~(buf[0]) promotes buf[0] to an int and then performs the bitwise complement. If the checksum is supposed to be calculated with eight-bit complements, use (uint8_t) ~ (unsigned) buf[0]. ((uint8_t) ~buf[0] usually suffices, but there is a hypothetical failure in it in a C implementation that uses one’s complement or sign-and-magnitude.) Commented Aug 16, 2024 at 19:41
  • see stackoverflow.com/q/46073295/1216776 Commented Aug 16, 2024 at 19:48
  • 15
    @elialm, Tip: when debugging bit related problems, print your debug info in hex. FFBF, not 65471. Commented Aug 16, 2024 at 20:48
  • @EricPostpischil: The mandated support for uint_least64_t in C99 made it unsupportable on sign-magnitude and ones'-complement hardware. A compiler that's almost C99 compatibile was produced for the ones'-complement Univac, but it lacks that type and is thus not a full C99 compiler. A more dangerous issue is that if one fails to specify the -fwrapv flag when using gcc, evaluation of uint16a*uint16b when uint16a exceeds INT_MAX/uint16b will sometimes disrupt the behavior of the surrounding code in ways that may arbitrarily corrupt memory. Commented Aug 17, 2024 at 19:50

3 Answers 3

32

What happened here is a result of integer promotions.

In most cases where a type smaller than int is used in an expression, it is promoted to type int. This is spelled out in section 6.3.1.1p2 of the C standard:

The following may be used in an expression wherever an int or unsigned int may be used:

  • An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
  • A bit-field of type _Bool, int, signed int, or unsigned int.

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.58) All other types are unchanged by the integer promotions

And section 6.5.3.3p4 regarding unary arithmetic operators states the following regarding the ~ operator:

The result of the ~ operator is the bitwise complement of its (promoted) operand (that is, each bit in the result is set if and only if the corresponding bit in the converted operand is not set). The integer promotions are performed on the operand, and the result has the promoted type. If the promoted type is an unsigned type, the expression ~E is equivalent to the maximum value representable in that type minus E.

So on this statement:

checksum += ~(msg_buf[i]);

The value of msg_buf[i] is promoted to type int before the ~ operator is applied. Assuming a 32-bit int, this int value will have all zeros for the upper 3 bytes. So when the operator is applied, those 3 bytes will have all bits set to 1. Then when that value is added to checksum which has type uint16_t, the lower 16 bits are kept where the upper 8 of those bits are all set to 1.

For example, if the value of msg_buf[i] was 0x33, it would first get promoted to the int value 0x00000033. Then after applying the ~ operator the result would be 0xffffff77. This value gets added to the current value of checksum as an int, then the result is truncated to a uint16_t before being assigned.

After applying the ~ operator, the result needs to first get reduced back to an 8 bit value with either a cast:

checksum += (uint8_t)(~msg_buf[i]);

Or a bitmask:

checksum += ~msg_buf[i] & 0xff;
Sign up to request clarification or add additional context in comments.

Comments

19

… I would expect that first msg_buf gets indexed with i, then the byte complement is calculated (which would yield a value within 0-255), only then is it (implicitly) promoted into a uint16_t.

C 2018 6.5.3.3 4 specifies, for the ~ operator, “The integer promotions are performed on the operand…” The integer promotions on a uint8_t promote it to int.

So ~(msg_buf[i]) performs the bitwise complement on an int. This will set the high bits to ones. To compute an eight-bit complement, you can use (uint8_t) ~msg_buf[i] or ~msg_buf[i] & 0xFF.

For ordinary situations, the code checksum += (uint8_t) ~msg_buf[i]; will suffice. There are largely theoretical situations allowed by the C standard in which the arithmetic could overflow or the conversion to uint8_t could produce a different value than desired, but they do not occur in ordinary C implementations.

Comments

13

As a supplement to the other answers, it can be mentioned that an alternative solution is the following:

    checksum += msg_buf[i] ^ 0xFF;

The reason is that XOR'ing with 0xFF only flips the lower 8 bits regardless of msg_buf[i] being promoted.

5 Comments

Nice for golfing, but obscures the intent of negation for normal development.
@Ruslan Thanks - in this case where the goal is formulated by negation, you may be right, but it is also worth to consider that this solution avoids the pitfall leading to this question. In general, I think whether a given construct is clear or obscure depends much on the situation and habits.
@Ruslan How about 0xFF - msg_buf[i]?
@Neil using anything except bitwise NOT will obscure the intent of NOTting. If you're trying to negate a number and cut off the higher bytes, the easiest to understand is to write exactly this: ~msg_buf[i] & 0xFF, or use a cast instead of ANDing in the end. The compiler will be able to optimize this to the same machine code. And if you just do your subtraction or nielsen's XOR without leaving a comment, you not only use a strange-looking construction for a simple operation, but also leave a room for mistake by a subsequent maintainer who doesn't notice the pitfall of sign extension.
something ^ 0xFF is very clear IMO. 0xFF - something on the other hand, well most people don't really understand subtraction enough to get what that does.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.