I am reading K&R 2nd Edition and I am having trouble understanding exercise 1-13. The answer is this code:

#include <stdio.h>

#define MAXHIST 15  
#define MAXWORD 11  
#define IN 1        
#define OUT 0      


main()
{

    int c, i, nc, state;
    int len;
    int maxvalue;
    int ovflow;
    int wl[MAXWORD];

    state = OUT;
    nc = 0;         
    ovflow = 0;

    for (i = 0; i < MAXWORD; i++)
        wl[i] = 0;  

    while ((c = getchar()) != EOF)
    {
        if(c == ' ' || c == '\n' || c == '\t')
        {
            state = OUT;            
            if (nc > 0)
            {
                if (nc < MAXWORD)   
                    ++wl[nc];       
                else
                    ++ovflow;       
            }                       
            nc = 0;                 
        }
        else if (state == OUT)
        {
            state = IN;             
            nc = 1;                 
        }
        else
            ++nc;                   
    }

    maxvalue = 0;
    for (i = 1; i < MAXWORD; ++i)
    {
        if(wl[i] > maxvalue)
            maxvalue = wl[i];       
    }

    for(i = 1; i < MAXWORD; ++i)
    {
        printf("%5d - %5d : ", i, wl[i]);
        if(wl[i] > 0)
        {
            if((len = wl[i] * MAXHIST / maxvalue) <= 0)
                len = 1;
        }
        else
            len = 0;

        while(len > 0)
        {
            putchar('*');
            --len;
        }
        putchar('\n');
    }

    if (ovflow > 0)
        printf("There are %d words >= %d\n", ovflow, MAXWORD);

    return 0;

}

At the top, wl is declared and then initialized. What I don't understand is why the code loops through it and sets everything to zero if it just counts the length of words. It doesn't keep track of how many words there are, it just keeps track of the word length, so why is everything set to 0?

I know this is unclear; it's just been stressing me out for the past 20 minutes and I don't know why.

  • In the title you ask why initialization is done in an odd way. In the question body you ask why it is initialized at all. Which one is your real question? The answers are different. Commented Dec 6, 2018 at 7:32
  • There's probably no deeper meaning why. They just typed down some crap in 5 minutes and published it. Overall this is quite ugly code and, like the rest of K&R, is not something that should be studied. Commented Dec 6, 2018 at 9:46
  • @Lundin -- agree that K&R is not a good book to learn C from, but OP's code is from a solution manual that follows K&R style and pacing. The exercise is from the introductory chapter before any details have been discussed. Also, it seems that OP is not asking about why int wl[MAXWORD] = { 0 }; is not used, but "why is everything set to 0?" given the mistaken understanding that "it doesn't keep track of how many words there are." Commented Dec 6, 2018 at 16:00
  • @DavidBowling This is however based on the trashy original code from K&R chapter 1.6. Commented Dec 6, 2018 at 16:05
  • @Lundin -- I added some comments to my answer about why code style from the introductory chapter of an ancient book should not be taken to heart. While the code itself is based on K&R 1.6, it is a verbatim copy (minus the comments) of the solution found in Tondo and Gimpel, which was itself published in 1989. Commented Dec 6, 2018 at 16:10

3 Answers

3

The ith element of the array wl[] is the number of words of length i that have been found in an input file. The wl[] array needs to be zero-initialized first so that ++wl[nc]; does not cause undefined behavior by attempting to use an uninitialized variable, and so that array elements that represent word lengths that are not present reflect that no such word lengths were found.

Note that ++wl[nc] increments the value wl[nc] when a word of length nc is encountered. If the array were not initialized, the first time the code attempts to increment an array element, it would be attempting to increment an indeterminate value. This attempt would cause undefined behavior.

Further, array indices that represent counts of word lengths that are not found in the input should hold values of zero, but without the zero-initialization, these values would be indeterminate. Even attempting to print these indeterminate values would cause undefined behavior.

The moral: initialize variables to sensible values, or store values in them, before attempting to use them.

It would be simpler and clearer to use an array initializer to zero-initialize the wl[] array:

int wl[MAXWORD] = { 0 };

After this, there is no need for the loop that sets the array values to zero (unless the array is used again for another file). But the posted code is from The C Answer Book by Tondo and Gimpel. This book provides solutions to the exercises found in the second edition of K&R, in the style of K&R, using only ideas that have been introduced in the book before each exercise. This exercise, 1-13, occurs in "Chapter 1 - A Tutorial Introduction", a brief tour of the language that lacks many details found later in the book. At this point, assignment and arrays have been introduced, but array initializers have not (this has to wait until Chapter 4), and the K&R code that uses arrays has so far initialized them with loops. Don't read too much into code style from the introductory chapter of a book that is 30+ years old.

Much has changed in C since K&R was published. For example, main() with no declared return type is no longer valid: it relied on the implicit int rule, which was removed in C99. The signature should be one of int main(void) or int main(int argc, char *argv[]) (or equivalently int main(int argc, char **argv)), with a caveat for implementation-defined signatures for main().


7 Comments

"attempting to use an uninitialized variable" (nit "value"); Other than that - bullseye. (both variable and value work -- uninitialized variable -> indeterminate value)
@DavidC.Rankin -- I considered the language "... an uninitialized value", but then thought that values are not initialized, but variables are initialized (or not) and hold values (or indeterminate values).
I guess the awkwardness is that the variable is the array, the uninitialized value is the element (all elements actually). That's the only thing that struck me... all of it being correct but somewhat obtuse.
Maybe I overlooked this, but what I'm asking is why they're looping over MAXWORD to initialize the array instead of just using {0}. If the histogram can display data for over 12 words, why is it looping over MAXWORD? Wouldn't that make an array [0,0,0,0,0,0,0,0,0,0,0] even though there can be over 11 words?
@JordanBaron -- as I said in my answer, { 0 } was not used because array initializers have not been introduced in the introductory chapter of K&R. I'm not sure I understand the rest of what you said: MAXWORD is the maximum word length (really MAXWORD-1 is the maximum word length), not the maximum number of words. The array holds a count for each word length up to MAXWORD-1. ovflow keeps a count of words that are longer than MAXWORD-1.
1

Everything is set to 0 because, if you don't initialize the array, its elements hold indeterminate values (in practice, whatever happened to be in that memory). Incrementing those values would give wrong counts in your program. Instead of looping over every position of the array, you could write int wl[MAXWORD] = {0}; in place of int wl[MAXWORD]; this puts 0 at every position in the array, so you don't have to write the loop.

1 Comment

I think the OP is asking why K&R didn't use the {0} initialization.
-1

I edited your code and put some comments in as I was working through it, to explain what's going on. I also changed some of your histogram calculations because they didn't seem to make sense to me.

Bottom line: It's using a primitive "state machine" to count up the characters in each group of characters that isn't white space. It stores this in wl[] such that wl[i] contains an integer that tells you how many groups of characters (sometimes called "tokens") have a length of i. Because this is done by incrementing the appropriate element of wl[], each element must be initialized to zero. Failing to do so would lead to undefined behavior, and in practice would probably result in nonsensical and absurdly large counts in each element of wl[].

Additionally, any token with a length that can't be reflected in wl[] will be tallied in the ovflow variable, so at the end there will be an accounting of every token.

#include <stdio.h>

#define MAXHIST 15  
#define MAXWORD 11  
#define IN 1        
#define OUT 0      


int main(void) {
  int c, i, nc, state;
  int len;
  int maxvalue;
  int ovflow;
  int wl[MAXWORD];

  // Initializations
  state = OUT;  //Start off not assuming we're IN a word
  nc = 0;       //Start off with a character count of 0 for current word
  ovflow = 0;   //Start off with no words of length >= MAXWORD

  // Start off with our counters of words at each length at zero
  for (i = 0; i < MAXWORD; i++) {
    wl[i] = 0;  
  }

  // Main loop to count characters in each 'word'
  // state keeps track of whether we are IN a word or OUTside of one
  // For each character in the input stream...
  //   - If it's whitespace, set our state to being OUTside of a word
  //     and, if we have a character count in nc (meaning we've just left
  //     a word), increment the counter in the wl (word length) array.
  //     For example, if we've just counted five characters, increment
  //     wl[5], to reflect that we now know there is one more word with 
  //     a length of five.  If we've exceeded the maximum word length,
  //     then increment our overflow counter.  Either way, since we're
  //     currently looking at a whitespace character, reset the character
  //     counter so that we can start counting characters with our next
  //     word. 
  //   - If we encounter something other than whitespace, and we were 
  //     until now OUTside of a word, change our state to being IN a word
  //     and start the character counter off at 1.
  //   - If we encounter something other than whitespace, and we are
  //     still in a word (not OUTside of a word), then just increment
  //     the character counter.
  while ((c = getchar()) != EOF) {
    if (c == ' ' || c == '\n' || c == '\t') {
      state = OUT;            
      if (nc > 0) {
        if (nc < MAXWORD) ++wl[nc];
        else ++ovflow;       
      }                       
      nc = 0;                 
    } else if (state == OUT) {
      state = IN;             
      nc = 1;                 
    } else {
      ++nc;
    }
  }

  // Find out which length has the most words by looping
  // through the word length array. 
  maxvalue = 0;
  for (i = 1; i < MAXWORD; ++i) {
    if(wl[i] > maxvalue) maxvalue = wl[i];       
  }

  // Print out our histogram
  for (i = 1; i < MAXWORD; ++i) {
    // Print the word length - then the number of words with that length
    printf("%5d - %5d : ", i, wl[i]);

    // This is confusing and unnecessary.  It's integer division, with no
    // negative numbers.  What we want to have happen is that the length
    // of the bar will be 0 if wl[i] is zero; that the bar will have length
    // 1 if the bar is otherwise too small to represent; and that it will be
    // expressed as some fraction of MAXHIST otherwise. 
    //if(wl[i] > 0)
    //    {
    //        if((len = wl[i] * MAXHIST / maxvalue) <= 0)
    //            len = 1;
    //    }
    //    else
    //        len = 0;

    // Multiply MAXHIST (our histogram maximum length) times the relative 
    // fraction, i.e., we're using a histogram bar length of MAXHIST for
    // our statistical mode, and interpolating everything else. 
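    // maxvalue is 0 when the input had no countable words, so guard the division.
    len = (maxvalue > 0) ? (int)(((double)wl[i] / maxvalue) * MAXHIST) : 0;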

    // Our one special case might be if maxvalue is huge, a word length
    // with just one occurrence might be rounded down to zero.  We can fix
    // that manually instead of using a weird logic structure.
    if ((len == 0) && (wl[i] > 0)) len = 1;

    while (len > 0) {
      putchar('*');
      --len;
    }

    putchar('\n');
  }

  // If any words exceeded the maximum word length, say how many there were.
  if (ovflow > 0) printf("There are %d words >= %d\n", ovflow, MAXWORD);

  return 0;
}

5 Comments

I didn't post images of plain text, I posted a screenshot of a terminal window. The code is in plain text. Additionally, I answered exactly what the question was - indeed, my answer is the only one that does. Don't comment on proposed answers without reading them. Flagged.
The terminal window only contains text that could easily be added as text in your answer.
I admit that you have provided the answer.... Hidden in a single comment amongst loads of other unrelated comments.
@DavidBowling You did a better job than I did answering the first part of the question (the initialization), which is why I upvoted you. However, OP also asked about his understanding that the total number of words wasn't being kept track of (for the histogram). It is, which I addressed in my comments about the main loop and the ovflow variable - though perhaps not clearly enough.
Interesting. Any thoughts on why one would compare an integer assignment with no subtraction and no negative operand using <= 0 ?
