Why do `(char)~0` and `(unsigned char)~0` return values of different widths?

Question

I bumped into this while writing a program trying to print the constituent byte values of UTF-8 characters.

This is the program that I wrote to test the various ~0 operations:

#include <stdio.h>

int main()
{
    printf("%x\n", (char)~0); // ffffffff
    printf("%x\n", (unsigned char)~0); // ff
    printf("%d\n", sizeof(char) == sizeof(unsigned char)); // 1
    printf("%d\n", sizeof(char) == sizeof(unsigned int)); // 0
    printf("%d\n", (char)~0 == (unsigned int)~0); // 1
}

I'm struggling to understand why char would produce an int-sized value, when unsigned char produces a char-sized value.

%x expects an unsigned int. So when you pass -1, it gets converted to the largest unsigned int (on a 2's comp machine). I don't know if that's standard, or just what happens here. Using %hhx would do the right thing. But using an unsigned type would make more sense.
– ikegami, Commented Feb 25, 2022 at 18:12
If char is signed, (char)~0 is probably converted to (char)-1. By the default argument promotions, that (char)-1 is converted to (int)-1.
– Ian Abbott, Commented Feb 25, 2022 at 18:13
You cannot send a char through to printf(). It is automagically converted to int in the process of calling the function. When char is signed (such as in your implementation), (char)~0 is a negative value. When a negative value is re-interpreted as unsigned int (when printf() processes the "%x") it has a bunch of binary 1s at the most significant bits.
– pmg, Commented Feb 25, 2022 at 18:13
More accurate version of my earlier comment: %x expects an unsigned int. So the -1 you pass (as an int thanks to integer promotion) gets interpreted as an unsigned int, giving the largest unsigned int on a 2's comp machine. Using %hhx would do the right thing. But using an unsigned type (e.g. unsigned char) would make more sense.
– ikegami, Commented Feb 25, 2022 at 18:19
@EricPostpischil ~0 would produce (int)-1 (assuming 2's complement) so would be within the range of a signed char.
– Ian Abbott, Commented Feb 25, 2022 at 18:30

dbush · Accepted Answer · 2022-02-25 18:12:43Z

8

When a value of a type smaller than int is passed to a variadic function like printf, it gets promoted to type int.

In the first case, you're passing a char with value -1 whose representation (assuming 2's complement) is 0xff. This is promoted to an int with value -1 and representation 0xffffffff, so this is what is printed.

In the second case, you're passing an unsigned char with value 255 whose representation is 0xff. This is promoted to an int with value 255 and representation 0x000000ff, so this is what is printed (without the leading zeros).

answered Feb 25, 2022 at 18:12


1 Comment

Marcus Harrison Over a year ago

When explained like this it makes total sense, it's an arithmetic promotion, not bitwise. I hadn't considered that at all. The signed char -1 is converted to signed int -1 and treated as an unsigned int for printing.

Eric Postpischil · Accepted Answer · 2022-02-25 18:36:39Z

They do not produce values of different widths. They produce values with different numbers of set bits in them.

In your C implementation, it appears int is 32 bits and char is signed. I will use these in this answer, but readers should note the C standard allows other choices.

I will use hexadecimal to denote the bits that represent values.

In (char)~0, 0 is an int. ~0 then has bits FFFFFFFF. In a 32-bit two’s complement int, this represents −1. (char) converts this to a char.

At this point, we have a char with value −1, represented with bits FF. When that is passed as an argument to printf, it is automatically converted to an int. Since its value is −1, it is converted to an int with value −1. The bits representing that int are FFFFFFFF. You ask printf to format this with %x. Technically, that is a mistake; %x is for unsigned int, but your printf implementation formats the bits FFFFFFFF as if they were an unsigned int, producing output of “ffffffff”.

In (unsigned char)~0, ~0 again has value −1 represented with bits FFFFFFFF, but now the cast is to unsigned char. Conversion to an unsigned integer type wraps modulo M, where M is one more than the maximum value of the type, so 256 for an eight-bit unsigned char. Mathematically, the conversion is −1 + 1•256 = 255, which is the starting value plus the multiple of 256 needed to bring the value into the range of unsigned char. The result is 255. Practically, it is implemented by taking the low eight bits, so FFFFFFFF becomes FF. However, in unsigned char, the bits FF represent 255 instead of −1.

Now we have an unsigned char with value 255, represented with bits FF. Passing that to printf results in automatic conversion to an int. Since its unsigned char value is 255, the result of conversion to int is 255. When you ask printf to format this with %x (which is a mistake as above), printf formats it as if the bits were an unsigned int, producing output of “ff”.

Vlad from Moscow · Accepted Answer · 2022-02-25 18:28:00Z

1

In both of these calls

printf("%x\n", (char)~0); // ffffffff
printf("%x\n", (unsigned char)~0); // ff

the expressions (char)~0 and (unsigned char)~0 are converted to the type int due to the integer promotions.

On the system in use, the type char behaves as the type signed char. So the sign bit of the expression (char)~0 is propagated when the expression is promoted to the type int.

On the other hand, before the integer promotions the expression (unsigned char)~0 has the type unsigned char due to the cast to the unsigned type. So no sign bit is propagated when the expression is promoted to the type int.

Note that the conversion specifier x expects an argument of the type unsigned int. So the first call of printf should be written like

printf("%x\n", ( unsigned int )(char)~0);

edited Feb 25, 2022 at 18:28

answered Feb 25, 2022 at 18:17

