
I have a very simple C program where I am printing variables of different sizes.

#include <stdio.h>

unsigned int long long a;
unsigned int c;

int main() {
    a = 0x1111111122222222;
    c = 0x33333333;
    printf("Sizes: %zu %zu\n", sizeof(a), sizeof(c));
    printf("Seg: %llx %x\n", a, c);
    printf("Seg: %lx %x\n", a, c);
    printf("Seg: %x\n", c);
    return 0;
}

On a 64-bit machine, all works fine. On a 32-bit machine though, if I use an incorrect formatter for the first argument (second printf), I get incorrect output for second argument too. Is that because of how varargs are processed? What am I missing?

Output

Sizes: 8 4
Seg: 1111111122222222 33333333
Seg: 22222222 11111111
Seg: 33333333
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 3.2.0, not stripped

Command used

rm ./a.out ; g++ -m32 test.cpp ; ./a.out ; file a.out
  • Undefined behavior means that the behavior is undefined... Commented Oct 1 at 6:31
  • You don't say what compiler you are using, but GCC (for instance) will pick up such errors at compile time if you use -Wall to report all warnings. Commented Oct 2 at 10:00
  • The second printf looks right to me, do you mean the third printf? Commented Oct 2 at 12:42
  • @TonyK: The command line with "g++" would indicate GCC (or perhaps Clang in disguise) Commented Oct 2 at 23:04

3 Answers


The printf call whose conversion format does not match the type actually passed (%lx for an unsigned long long) has undefined behavior: the C Standard imposes no requirements, so anything you observe is possible, including the expected output, as on your 64-bit target, and the different output you are trying to analyse on your 32-bit target.

Here is a tentative explanation (assuming little-endian byte order and a stack-based ABI):

  • on your 32-bit target, vararg arguments are passed on the stack, 32-bit aligned
  • passing an unsigned long long uses 2 stack slots in little endian order
  • passing an unsigned int uses a single slot
  • on this target, unsigned long, which printf expects for %lx, is 32 bits (as on most 32-bit targets), so it uses a single slot
  • hence on this target, the first number is retrieved from the first slot, which holds the low 32 bits of the 64-bit value passed in little-endian order
  • the second number is retrieved from the second slot, where the caller stored the most significant 32 bits.
  • this explains the output: Seg: 22222222 11111111

Remember that undefined behavior is by definition not defined, so other side effects may happen and something completely different may be output, even on the same host with the same binary.

As commented by @GlennWillen, in particular, you cannot count on the damage being limited to a particular line of code, or function, or file; and you cannot count on it being limited to consequences which would actually be possible in any straightforward interpretation of the code; and you cannot count on it being limited to things happening chronologically after the UB in many cases. If the code could execute UB when some condition is true, the compiler may act as though that condition is false everywhere and for all purposes, even if this is absurd or contradictory.

Here are some take-aways:

  • always turn on extra warnings so this kind of bug is reported by the compiler (-Wall -Wextra or similar).
  • fix the code to compile without any warnings (adding -Werror will force this) and think twice before using casts to silence a warning.
  • use the formatting macros from <inttypes.h> for types defined in <stdint.h>

8 Comments

Re “anything can happen”: It is not possible for anything to happen simply because somebody passes an incorrect argument type. Correct phrasing is that the C standard imposes no requirements. There are other factors that impose requirements, so it is not true that anything can happen.
Technically, a purposely perverse environment hosted on a DS9K could detect this UB and do just about anything or at least try, while staying within the requirements of the C Standard. Some very desirable outcomes such as curing all illnesses and making everyone rich and happy are highly unlikely though. I shall rephrase this a bit.
When the C standard imposes no requirements, it is typical for C compilers (especially Clang) to consider this license for arbitrary behavior. I'm not sure what other factors you're referring to. Certainly the Clang developers will not actually reformat your hard drive or send threatening emails from your computer when encountering undefined behavior, since criminal law is independent of the C standard. But for example, undefined behavior anywhere in your C program may cause Clang to behave quite strangely when compiling any other code elsewhere in the same compilation unit.
Re “When the C standard imposes no requirements, it is typical for C compilers (especially Clang) to consider this license for arbitrary behavior”: No, it is not. The behavior of a program that contains #include <fcntl.h> and calls open is not defined by the C standard (the C standard imposes no requirements on it), but compilers do not consider that a license for arbitrary behavior. Re “… since criminal law need not respect the C standard”: So that is another factor imposing requirements. There are others. Therefore it is not true that anything can happen. The statement is simply false.
(In particular, you cannot count on the damage being limited to a particular line of code, or function, or file; and you cannot count on it being limited to consequences which would actually be possible in any straightforward interpretation of the code; and you cannot count on it being limited to things happening chronologically after the UB in many cases. If the code could execute UB when some condition is true, the compiler may act as though that condition is false everywhere and for all purposes, even if this is absurd or contradictory.)
When the Standard was written, the authors wanted to avoid imposing requirements which would be upheld by some implementations, or even the vast majority of them, but which could not practically be upheld by all. They used "Undefined Behavior" as a catch-all that included such constructs. The intention was to give programmers a "fighting chance" to write code that would be compatible with even unusual implementations, not to demean code that relied upon commonplace behaviors. It has become fashionable, however, for people to lie about the documented intentions of the Standard.
I personally make no claims whatsoever about the original intent of the standard. But if you're compiling code using compilers that exist in 2025, you will do yourself no favors if you ignore their interpretation of the standard, even if you think that interpretation is crazy.
"fix the code to compile without any warnings" << I find -Werror helpful for that!

By passing the incorrect formatter, you lied to printf.

printf does not know the types or sizes of the arguments you actually pass to it; it just sees an undifferentiated sludge of bytes on the call stack (or in whatever registers). After the format string, it doesn't know where one argument stops and the next one starts. It only knows what to expect based on the format string.

In the incorrect printf call, you told it that the first argument was sizeof (unsigned long int) bytes, so it read that many bytes and did the appropriate conversion. Then it read the next sizeof (unsigned long int) bytes and converted that for the next output. It doesn't know it was reading two halves of a single object.

3 Comments

Most calling conventions have a minimum arg-passing slot size equal to one register (so usually sizeof(int*) or sizeof(size_t), but ILP32 ABIs on 64-bit ISAs are an exception to that, still using 64-bit slots even though pointers and size_t are 32-bit). So passing a uint32_t arg to a variadic function expecting uint64_t is harmless in normal 64-bit calling conventions on most ISAs, except for high garbage in that arg itself. In the OP's case (32-bit ISA), both args are at least 1 full slot, and the 64-bit one is 2 slots, so your answer in terms of counting bytes works, too.
It's a shame the Standard didn't recognize a category of implementation where values of any integer type may be consumed by arguments of any integer type if they are within the argument range, and where matching-sized integer types are alias compatible. Such implementations could have facilitated compatibility of code with systems having various sets of integer sizes. If e.g. one has functions that are specified as accepting arguments of types unsigned*, and unsigned long long*, and one has arrays of types uint32_t[] and uint64_t[], one should be able to...
...pass the uint32_t[] to a function that accepts unsigned* if that's the right size, and pass the uint64_t[] to the one that takes unsigned long long* if that's the right size, but either of those arrays could be a type alias for unsigned long*, which at present would be simultaneously incompatible with both functions.

Whether you're compiling your simple C program with a C or C++ compiler, using the incorrect format specifier leads to undefined behavior. This may include apparently correct output, which is arguably the more dangerous outcome.

It's best not to spend too much time reasoning about the results you're seeing1 as they will depend on platform and compiler specific details and instead focus on using the correct specifiers to avoid this situation.

You may wish to include <stdint.h> and2 <inttypes.h> (or in C++ the wrapped <cstdint> and <cinttypes> headers), which provide typedefs for fixed-size integers, signed and unsigned. They also provide macros for the matching format specifiers. For instance, unsigned long long int a; becomes uint64_t a; and printf("Seg: %llx %x\n", a, c); becomes printf("Seg: %" PRIx64 " %x\n", a, c);

Alternatively, your question does indicate that you're using a C++ compiler, so if you can write a C++ program, you can use I/O streams and insertion operators, or std::format. Either is type-safe and sidesteps/solves this issue.


1 Unless you need to understand why a specific unexpected behavior is happening in a very specific operating environment as part of a debugging effort, especially with code that you cannot rewrite to be UB-free.

2 <inttypes.h> is documented to include <stdint.h>

Comments
