2

I'm very new to C and am a bit confused as to when we need to manually add the terminating '\0' character to strings. Given this function to calculate string length (for clarity's sake):

int stringLength(char string[])
{
    int i = 0;
    while (string[i] != '\0') {
        i++;
    }

    return i;
}

which calculates the string's length based on the null terminating character. So, using the following cases, what is the role of the '\0' character, if any?

Case 1:

char * stack1 = "stack";
printf("WORD %s\n", stack1);
printf("Length %d\n", stringLength(stack1));

Prints:

WORD stack
Length 5

Case 2:

char stack2[5] = "stack";
printf("WORD %s\n", stack2);
printf("Length %d\n", stringLength(stack2));

Prints:

WORD stack���
Length 8

(These results vary each time, but they are never correct).

Case 3:

char stack3[6] = "stack";
printf("WORD %s\n", stack3);
printf("Length %d\n", stringLength(stack3));

Prints:

WORD stack
Length 5

Case 4:

char stack4[6] = "stack";
stack4[5] = '\0';
printf("WORD %s\n", stack4);
printf("Length %d\n", stringLength(stack4));

Prints:

WORD stack
Length 5

Case 5:

char * stack5 = malloc(sizeof(char) * 5);
if (stack5 != NULL) {
    stack5[0] = 's';
    stack5[1] = 't';
    stack5[2] = 'a';
    stack5[3] = 'c';
    stack5[4] = 'k';
    printf("WORD %s\n", stack5);
    printf("Length %d\n", stringLength(stack5));
}
free(stack5);

Prints:

WORD stack
Length 5

Case 6:

char * stack6 = malloc(sizeof(char) * 6);
if (stack6 != NULL) {
    stack6[0] = 's';
    stack6[1] = 't';
    stack6[2] = 'a';
    stack6[3] = 'c';
    stack6[4] = 'k';
    stack6[5] = '\0';
    printf("WORD %s\n", stack6);
    printf("Length %d\n", stringLength(stack6));
}
free(stack6);

Prints:

WORD stack
Length 5

Namely, I would like to know the difference between cases 1, 2, 3, and 4. Why is case 2 erratic, why is there no need to specify the null-terminating character in cases 1 and 3, and why do cases 3 and 4 behave the same? Also, how do cases 5 and 6 print the same thing even though case 5 does not allocate enough memory for the null-terminating character? Since only 5 char slots are allocated, one for each letter in "stack", how does it detect a '\0' character, i.e., the 6th character?

I'm so sorry for this absurdly long question. It's just I couldn't find a good didactic explanation on these specific instances anywhere else.

3
  • Very broadly, if you have a character string stored in an array, then you must have some way of knowing where the string ends. The two most obvious ways are (1) to keep a separate character count, or (2) to terminate the string with some unique character (e.g. '\0'). Option 2 seems to be the most common method today, and C automatically terminates string constants with '\0'. The standard C libraries also expect '\0'-terminated strings. Commented Sep 17, 2017 at 6:04
  • @TomKarzes: "C automatically terminates string constants with '\0'" well, just string-literals. Commented Sep 17, 2017 at 13:55
  • Do not try to learn C by trial and error, as this is known to cause depression. Commented Sep 17, 2017 at 13:57

3 Answers

7

The storage for a string must always leave room for the terminating null character. In some of your examples you don't do this, explicitly giving a length of 5. In those cases you will get undefined behavior.

String literals always get the null terminator automatically. Even though strlen returns a length of 5 for "stack", the literal really occupies 6 bytes.
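The 5-versus-6 distinction can be checked directly. This is only a small sketch; the function names literal_storage_size and literal_length are made up for illustration and are not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes occupied by the literal "stack",
   including the '\0' the compiler appends automatically. */
size_t literal_storage_size(void)
{
    return sizeof "stack";   /* five letters plus the terminator */
}

/* Hypothetical helper: number of characters before the terminator. */
size_t literal_length(void)
{
    return strlen("stack");  /* the terminator is not counted */
}
```

sizeof reports the storage, strlen reports the characters in front of the terminator, which is why storage for a string needs one more byte than its length.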

Your case 5 only appears to work because undefined behavior sometimes looks like it worked. There is probably a zero byte following the string in memory, but you can't rely on that.


6

In case 1, you are using a string literal (a constant that is typically stored in read-only memory), which has the \0 implicitly appended to it.

Since the position of \0 is what marks the end of the string, your stringLength() function returns 5.

In case 2, you are trying to initialise a character array of size 5 with a string of 5 characters leaving no space for the \0 delimiter. The memory adjacent to the string can be anything and might have a \0 somewhere. This \0 is considered the end of string here which explains those weird characters that you get. It seems that for the output you gave, this \0 was found only after 3 more characters which were also taken into account while calculating the string length. Since the contents of the memory change over time, the output may not always be the same.
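If you know the capacity of a buffer but are not sure it is terminated, you can search for the \0 without ever reading past the end. The following is a sketch only; bounded_length is a hypothetical helper name, and it assumes the caller passes the real capacity of the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the string length if a '\0' occurs within
   the first `cap` bytes of `buf`. Returns `cap` itself if no terminator
   was found, which tells the caller the buffer is not a valid string. */
size_t bounded_length(const char *buf, size_t cap)
{
    const char *end = memchr(buf, '\0', cap);  /* never looks beyond cap bytes */
    return end ? (size_t)(end - buf) : cap;
}
```

For an array like stack2, calling bounded_length(stack2, sizeof stack2) stays inside the array, unlike the loop in stringLength(), which keeps going until it happens to find a zero byte.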

In case 3, you are initialising a character array of size 6 with a string of size 5 leaving enough space to store the \0 which will be implicitly stored. Hence, it will work properly.
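The implicit \0 in an array initialised from a string literal can be observed with a small sketch like the one below. The function names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Hypothetical check: returns 1 if every element after the five copied
   letters is '\0'. When the array is larger than the literal, the
   remaining elements are zero-initialised. */
int tail_is_zeroed(void)
{
    char word[8] = "stack";
    for (size_t i = 5; i < sizeof word; i++) {
        if (word[i] != '\0') {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical check: the size the compiler picks when the bound is
   omitted, which includes room for the terminator. */
size_t deduced_size(void)
{
    char word[] = "stack";
    return sizeof word;
}
```

Leaving the bound out, as in char word[] = "stack";, is often the simplest way to avoid miscounting the space needed for the terminator.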

Case 4 is similar to case 3. No modification is made by

stack4[5] = '\0';

because the size of stack4 is 6, so its last index is 5, and stack4[5] already held \0 before you overwrote it. You are overwriting the element with the value it already had.

In case 5, you have completely filled the character array with characters without leaving space for \0. Yet when you print the string, it prints right. I think it is because the memory adjacent to the memory allocated by malloc() merely happened to be zero which is the value of \0. But this is undefined behavior and should not be relied upon. What really happens depends on the implementation.
It should be noted that malloc() will not initialise the memory that it allocates unlike calloc().

Both

str[2] = '\0';

and

str[2] = 0;

are just the same, since '\0' is a character constant with the value 0.

But you cannot rely upon it being zero. Dynamically allocated memory may happen to contain zeros because of how the operating system works and for security reasons. See here and here for more about this.

If you need the default value of dynamically allocated memory to be zero, you can use calloc().
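A minimal sketch of the calloc() approach, assuming a hypothetical helper named copy_into_zeroed (not a standard function) and assuming the caller guarantees that src holds at least n bytes:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a zero-filled buffer of `cap` bytes holding
   the first `n` bytes of `src`. Returns NULL if allocation fails or if
   there would be no room left for the terminator. The caller frees it. */
char *copy_into_zeroed(const char *src, size_t n, size_t cap)
{
    if (n >= cap) {
        return NULL;                    /* no byte left over for '\0' */
    }
    char *buf = calloc(cap, sizeof *buf);
    if (buf == NULL) {
        return NULL;
    }
    memcpy(buf, src, n);                /* bytes n..cap-1 remain '\0' */
    return buf;
}
```

Because calloc() zero-fills the block, the byte after the copied characters is already a terminator. Note that the buffer still has to be at least one byte larger than the text, so a capacity of 5 for "stack" is rejected here.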

Case 6 has the \0 at the end and characters in the other positions, so the string is printed correctly.
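One common way to get the case 6 layout without counting bytes by hand is to allocate strlen(src) + 1 bytes and copy the terminator along with the characters. This is a sketch, and duplicate_string is a hypothetical name chosen for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of the null-terminated string
   `src`, or NULL if allocation fails. The caller must free() the result. */
char *duplicate_string(const char *src)
{
    size_t len = strlen(src);        /* characters, terminator not counted */
    char *copy = malloc(len + 1);    /* one extra byte for '\0' */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, len + 1);      /* copies the terminator as well */
    return copy;
}
```

Since malloc() does not initialise the block, the terminator exists only because it is copied explicitly. Dropping the + 1 would recreate case 5.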

3 Comments

"In case 5...I think it is because the memory adjacent to the memory allocated by malloc() was zero" -- arguably could/could not be true. It is because you have invoked Undefined Behavior -- anything can happen, if the next byte in memory just happens to be 0, then it works, but there is no guarantee what will happen -- and malloc does not initialize memory. But all considered, certainly worth a vote, will be better if you fix that statement.
@DavidC.Rankin Thanks for correcting me. I edited it.
This helped immensely in my understanding, thank you so much @J...S! What would happen in case 6 if I didn't explicitly add the line stack6[5] = '\0'; since it's not a string literal would it add the '\0' character? I assume it wouldn't because I'm explicitly adding characters. So if I use malloc and not calloc, would this result in undefined behavior for the terminating character (i.e. rely on the cruft from the previous memory usage)?
0

To be clear, case #2 is undefined behavior (UB).

char stack2[5] = "stack";
printf("WORD %s\n", stack2);

"%s" means "If no l length modifier is present, the argument shall be a pointer to storage of character type. Characters from the storage are written up to (but not including) the terminating null character" or in other words: "%s" needs a matching pointer to a string. stack2 is an array 5 of char that does not contain a null character. printf("WORD %s\n", stack2); converts the array stack2 to a pointer to its first char, yet since that array lacks a null character, the result is undefined behavior.

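If you really do need to print a character array that has no terminator, a precision on the conversion limits how many characters are read, and the array is then not required to contain a null character. The sketch below uses snprintf so the result can be inspected; format_word is a hypothetical helper name, and it assumes n is not larger than the number of valid characters in chars.

```c
#include <stdio.h>

/* Hypothetical helper: writes "WORD " followed by at most `n` characters
   from `chars` into `out`. With an explicit precision, `chars` does not
   need to be null-terminated. Returns what snprintf returns. */
int format_word(char *out, size_t out_size, const char *chars, int n)
{
    return snprintf(out, out_size, "WORD %.*s", n, chars);
}
```

The same "%.*s" conversion works with printf, for example printf("WORD %.*s\n", 5, stack2); for the five-element array from case 2.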