I am attempting to write a very basic lexer in C. The following code is supposed to do something like this:

Input: "12 142 123"

Output:

NUMBER -- 12
NUMBER -- 142
NUMBER -- 123

However, if I do not include an initial printf("") call before looping over the input, I get output like this:

NUMBER --
NUMBER -- 14
NUMBER -- 123

where the first number is simply blank. I am really confused as to why this is happening and would really appreciate some help with this!

I have the following code (with a number of irrelevant functions omitted):

#define MAX_LEN 400

char* input;
char* ptr;

char curr_type;
char curr;

enum token_type {
  END,
  NUMBER,
  UNEXPECTED
};

typedef struct {
  enum token_type type;
  char* str;
} Token;
  
void print_tok(Token t) {
  printf("%s -- %s\n", token_types[t.type], t.str);
}

char get(void) {
  return *ptr++;
}

char peek(void) {
  return *ptr;
}

Token number(void) {
  char arr[MAX_LEN];
  arr[0] = peek();
  get();
  int i = 1;
  while (is_digit(peek())) {
    arr[i] = get();
    ++i;
  }
  arr[++i] = '\0';
  Token ret = {NUMBER, (char*)arr};
  return ret;
}

Token unexpected(void) {
  // omitted
}

Token next(void) {
  while (is_space(peek())) get();

  char c = peek();
  switch (peek()) {
    case '0':
    // omitted
    case '9':
      return number();
    default: 
      return unexpected();
  }
}

int main(int argc, char **argv) {
  printf(""); // works fine with this line

  input = argv[1];
  ptr = input;

  Token tokens[MAX_LEN];
  Token t;
  int i = 0;
  do {
    t = next();
    print_tok(t);
    
    tokens[i++] = t;

  } while (t.type != END && t.type != UNEXPECTED);

  return 0;
}

  • Token ret = {NUMBER, (char*)arr}; You are using a pointer to a local variable there. You need to malloc space for the string plus its terminator and copy it. Commented Sep 24, 2020 at 19:52
  • What you describe has undefined behavior written all over it. Commented Sep 24, 2020 at 19:53

1 Answer

In number, arr is a local variable. Its lifetime ends when number returns, so the pointer stored in the Token struct is dangling and reading through it is undefined behavior. Nonetheless, your program then prints the string through that pointer.

The value that is printed is unpredictable. The extra printf("") call may change how the compiler arranges the code or how the stack gets used, so that the old contents of arr happen not to be overwritten before they are printed, or something like that. You cannot rely on it.

You have several options for giving each token's string storage that outlives the call to number:

  • Change str in Token so it's an array of chars instead of a pointer. Then each token has its own space to store the string.
  • Allocate the string with malloc. Then it stays allocated until you free it.
  • Create the array in main so it's valid for both next and print_tok. You'd have to give next a pointer to the array, so it knows where it should store the string. This would only store one token's string at a time.
  • Basically any way of creating an array other than making it a local variable in number.
  • Make the pointer point to where the token is in the original string. Add another variable in Token which stores how long the token is.
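As a rough illustration of the first option, here is a minimal sketch, assuming Token is changed to hold a char array and that ptr is the same global read position as in the question. The bounds check against MAX_LEN and the use of isdigit from <ctype.h> in place of your is_digit are my additions. Note that it writes the terminator at str[i] rather than str[++i]; the pre-increment in the question's code skips one slot and leaves an uninitialized char in front of the terminator.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_LEN 400

enum token_type { END, NUMBER, UNEXPECTED };

typedef struct {
  enum token_type type;
  char str[MAX_LEN]; /* the text is stored inside the token itself */
} Token;

const char *ptr; /* current read position, as in the question */

Token number(void) {
  Token ret;
  size_t i = 0;

  assert(ptr != NULL);
  ret.type = NUMBER;

  /* Copy digits, leaving room for the terminating '\0'. */
  while (isdigit((unsigned char)*ptr) && i < MAX_LEN - 1) {
    ret.str[i++] = *ptr++;
  }
  ret.str[i] = '\0'; /* i already indexes the first free slot */

  return ret; /* the whole struct, including str, is copied to the caller */
}
```

Because the characters travel with the struct, print_tok can keep using %s on t.str unchanged. The cost is that every Token is now a bit over MAX_LEN bytes, so an array of MAX_LEN tokens in main takes roughly 160 KB of stack.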

I think the first option is easiest and the last option uses the least memory, but I included some other options for completeness.
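The last option might look something like the sketch below. The field names start and len and the token_names table are made up for illustration (your token_types array was omitted from the question), and isdigit stands in for your is_digit. Since the token text is not NUL-terminated here, print_tok uses the %.*s conversion, which takes an int precision followed by the pointer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

enum token_type { END, NUMBER, UNEXPECTED };

/* Hypothetical name table, standing in for the omitted token_types. */
static const char *const token_names[] = {"END", "NUMBER", "UNEXPECTED"};

typedef struct {
  enum token_type type;
  const char *start; /* points into the input string; not NUL-terminated */
  size_t len;        /* number of characters in the token */
} Token;

const char *ptr; /* current read position, as in the question */

Token number(void) {
  Token ret;

  assert(ptr != NULL);
  ret.type = NUMBER;
  ret.start = ptr;

  /* Advance past the digits; nothing is copied. */
  while (isdigit((unsigned char)*ptr)) {
    ++ptr;
  }
  ret.len = (size_t)(ptr - ret.start);

  return ret;
}

void print_tok(Token t) {
  /* %.*s prints at most (int)t.len characters starting at t.start. */
  printf("%s -- %.*s\n", token_names[t.type], (int)t.len, t.start);
}
```

The tokens stay valid only as long as the input string does. That is fine here, because argv[1] lives for the whole run of the program.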

4 Comments

Thank you, this makes sense. Would the solution be to allocate non-local memory for the struct and then return a pointer? Or is there another preferred method in this type of situation?
@mlz7 you can: (1) malloc the char array (and free it somewhere else), (2) declare the array locally in main (or as a global variable...) and pass it as a parameter to number, or (3) maybe the fastest solution, just give the array static storage in number so that its lifetime is that of the whole program.
@mlz7 the first solution is more "general", as it applies to multithreaded environments too. Anyway, if your application is single-threaded, solutions 2 and 3 are OK, the third being the fastest one.
@mlz7 you're welcome. PS: this answer would be great and would gain my upvote if it also contained some suggestion on how to fix the issue (the code section seems identical to the OP's code, instead). You can adapt my comments if you want.
