
I have the code below. When I enter ABCDEFGH and press Enter, the output is HGF. I stepped through it in a debugger to watch the variables: after the first scanf call, x='A'; after the next step, x='D' and y='C'; after the third step, x='H', y='G' and z='F'.

#include <stdio.h>

int main() {
    char x, y, z;
    scanf("%2c", &x);
    scanf("%3c", &y);
    scanf("%4c", &z);
    printf("%c%c%c", x, y, z);
    return 0;
}

I am currently very confused as to why this is happening. As far as I know, "%3c" means to read in 3 characters, but only store the first one and discard the last two. I'm not sure if there's a problem with this code. Can you explain why the output is like this, regardless of whether the code was written incorrectly or not?

  • Incorrect format specifier usage: in scanf("%2c", &x), %2c means read 2 characters, but x can only store 1 character. The same issue happens with %3c and %4c. This causes a buffer overflow (undefined behavior), because you're asking scanf() to write multiple characters into a single char variable. If you actually want to read 2, 3 and 4 characters respectively, you need to use character arrays (strings): char x[3], y[4], z[5]; // +1 for null terminator '\0' Commented Oct 11 at 16:02
  • @RajkumarPrajapati "+1 for null terminator" --> yet char x[3]; scanf("%2c", &x) does not assign a null terminator. Commented Oct 11 at 16:28

4 Answers

As explained by @Someprogrammerdude, the code has undefined behavior because you are attempting to store multiple bytes into memory through the addresses of single-byte variables. Don't take offense at this mistake: scanf is one of the most misunderstood functions in the C library, and many programmers, if not most, get it wrong.

Here is a modified version that does not have undefined behavior:

#include <stdio.h>

int main(void) {
    char x[2], y[3], z[4];
    if (scanf("%2c%3c%4c", x, y, z) != 3)
        return 1;
    printf("%c%c%c\n", *x, *y, *z);
    return 0;
}

Input: ABCDEFGH and Enter
Output: ACF

Here is an alternative with the same output that uses the assignment-suppressing * modifier, which makes scanf consume the input for a conversion without storing the result:

#include <stdio.h>

int main(void) {
    char x, y, z;
    if (scanf("%c%*c%c%*2c%c%*3c", &x, &y, &z) != 3)
        return 1;
    printf("%c%c%c\n", x, y, z);
    return 0;
}

2 Comments

I've got more than 30 years of experience with C and I still didn't see the mistake in OP's code. I'm adding this to my list of reasons why the scanf family shouldn't be used at all.

It seems that the compiler you used placed the local variables x, y and z in memory in the reverse order of their declarations: the variable z is followed in memory by y, which in turn is followed by x.

Now consider the first call of scanf

scanf("%2c",&x);

According to the description of the function fscanf in the C Standard (which also applies to scanf), the following occurs:

c Matches a sequence of characters of exactly the number specified by the field width (1 if no field width is present in the directive).

where c is the conversion specifier used in your call of scanf.

That is, the function tries to read exactly 2 characters from the input buffer and store them in memory starting at the address given by the expression &x.

So two characters are read from the input buffer. The first character 'A' is written to the variable x, and the second character 'B' overwrites the memory that follows x, which results in undefined behavior.

In the next step

scanf("%3c",&y);

the next three characters are read from the input buffer. The first character 'C' is stored in the variable y, the next character 'D' is stored in the memory occupied by the variable x, and the third character 'E' again overwrites the memory after x.

In the third and last step

scanf("%4c",&z);

four characters are read from the input buffer. The variable z gets 'F', the variable y, which follows z in memory, gets 'G', and the variable x, which follows y, gets 'H'. The fourth character, the newline '\n' produced by pressing Enter, again overwrites the memory after x.

The program would have well-defined behavior if you declared a character array large enough to hold four characters from the input buffer.

For example

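#include <stdio.h>

int main(void) {
    enum { N = 4 };
    char s[N];      /* room for the widest field (%4c); no terminator needed */
    scanf("%2c", s);
    scanf("%3c", s);
    scanf("%4c", s);

    printf( "%.*s", N, s );
    return 0;
}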

2 Comments

As "%.*s" matches an int and char *, is enum { N = 4 } always compatible with an int?
From the C24 Standard: "15 The enumeration member type for an enumerated type without fixed underlying type upon completion is: — int if all the values of the enumeration are representable as an int; or,..."

You tell scanf that it should read two or more characters from the input and then store them in a one-character variable. That leads to undefined behavior.

From this scanf (and family) reference regarding the c format:

If a width specifier is used, matches exactly width characters (the argument must be a pointer to an array with sufficient room).

[Emphasis mine]


As a somewhat related tip, you should always check what scanf returns.

As far as I know, "%3c" means to read in 3 characters, but only store the first one and discard the last two.

This statement is incorrect. According to the documentation of scanf, the %3c conversion format specification will read and store 3 characters at the address specified by the pointer argument.

If you want to read 3 characters, but only want to store the first character and discard the next two characters, then you should be using "%c%*2c" instead.
