I was helping a student with his assignment and ran into a very gnarly "bug" which I could not explain, and was wondering if I could get help in understanding what exactly is going on.
The question was to implement the following:
given a byte array (
buf) of length N, calculate the 16-bit checksum given by the formula:checksum = ~(buf[0]) + ~(buf[1]) + ... + ~(buf[N-2]) + ~(buf[N-1])
The implementation the student made was pretty straightforward:
uint16_t calculate_checksum(uint8_t *msg_buf, size_t msg_length)
{
uint16_t checksum = 0;
for (size_t i = 0; i < msg_length; i++)
{
checksum += ~(msg_buf[i]);
}
return checksum;
}
However, to my surprise, this implementation did not give the expected result.
I've tried looking at the checksum variable by printing its value in the loop, which yielded an unexpected pattern:
65471
65459
65449
65287
65276
65253
65166
65113
65094
...
The checksum starts at 0, but after the first addition the value sits somewhere in the upper range of the uint16_t and going down. I suspected the value underflowing due to msg_buf[i] being implicitly converted into a uint16_t before complementing it. But I don't understand why. I would expect that first msg_buf gets indexed with i, then the byte complement is calculated (which would yield a value within 0-255), only then is it (implicitly) promoted into a uint16_t.
I've tried to look at the assembly output of the statement checksum += ~(msg_buf[i]), which seems to support this theory (using ARM gcc 14.1.0). Note that [r7, #4] is the msg_buf pointer, [r7, #8] is i and [r7, #14] is checksum. The assembly does something weird with subtraction, which yields the same result if you would convert the msg_buf[i] byte to a uint16_t before complementing.
ldr r2, [r7, #4]
ldr r3, [r7, #8]
add r3, r3, r2
ldrb r3, [r3] @ zero_extendqisi2
mov r2, r3
ldrh r3, [r7, #14] @ movhi
subs r3, r3, r2
uxth r3, r3
subs r3, r3, #1
strh r3, [r7, #14] @ movhi
Knowing this, the fix we came up with is pretty straightforward. We opted to just AND the results of the right-hand side with 0xFF to get rid of the higher bits, which yielded the correct checksum.
So, the issue is essentially fixed, but I still don't understand why this is an issue. Maybe this behaviour is expected and I don't know the proper operation orders, or maybe something else is happening. I really don't know.
Can someone please explain why this is happening?
~(buf[0])promotesbuf[0]to anintand then performs the bitwise complement. If the checksum is supposed to be calculated with eight-bit complements, use(uint8_t) ~ (unsigned) buf[0]. ((uint8_t) ~buf[0]usually suffices, but there is a hypothetical failure in it in a C implementation that uses one’s complement or sign-and-magnitude.)FFBF, not65471.uint_least64_tin C99 made it unsupportable on sign-magnitude and ones'-complement hardware. A compiler that's almost C99 compatibile was produced for the ones'-complement Univac, but it lacks that type and is thus not a full C99 compiler. A more dangerous issue is that if one fails to specify the-fwrapvflag when using gcc, evaluation ofuint16a*uint16bwhenuint16aexceedsINT_MAX/uint16bwill sometimes disrupt the behavior of the surrounding code in ways that may arbitrarily corrupt memory.