I have written the following function in C to try to tokenize a string. The function takes in a string to be tokenized (char * string), as well as a string of delimiting characters used to separate tokens from one another (char * delimiters).

char ** tokenize(char * string, char * delimiters)
{
    int num_of_tokens = 0;
    int itr = 0;

    char ** tokens = NULL;

    while (string[itr] != '\0')
    {
        if (!isDelimiter(string[itr], delimiters))
        {
            num_of_tokens++; /*if char is not a delimiter, we have found a new token*/

            int temp_token_count = num_of_tokens - 1;

            tokens = realloc(tokens, num_of_tokens);
            tokens[temp_token_count] = malloc(STRING_SIZE * sizeof(char));

            while(!isDelimiter(string[itr], delimiters) && string[itr] != '\0')
            {
                appendChar(tokens[temp_token_count], string[itr]);
                itr++;
            }
        }

        itr++;
    }
    return tokens;
}

From the main function, the call to the tokenize function looks like this:

int main()
{
    char * string = "This would,,,,be";
    char * delim = ",.:;*& ";

    char ** tokens = tokenize(string, delim);

    int x = 0;

    while(x<3)
    {
        printf("%s\n", tokens[x]);
        x++;
    }

    return 0;
}

I would expect the output from this call to produce:

This
would
be

However, this is what is being output:

 L@?
would
be

This seems especially odd because if I call the tokenize function with "This," as the input string, I get back exactly what I would expect:

This

I can't figure out what's going on, and any help would be greatly appreciated. Thanks for your time!

Edit: This is the isDelimiter function

int isDelimiter(char test_char, char * delimiters)
{
    int itr = 0;

    while (delimiters[itr] != '\0')
    {
        if (test_char == delimiters[itr]) return 1;
        itr++;
    } 

    return 0;
}
  • Show your isDelimiter function. Commented Jul 19, 2015 at 17:31
  • tokens = realloc(tokens, num_of_tokens); should be tokens = realloc(tokens, num_of_tokens*sizeof(char*)); Commented Jul 19, 2015 at 17:32
  • Regarding calls to malloc, calloc, and realloc: 1) Always check the returned value (!= NULL) to make sure the operation succeeded. 2) When calling realloc, do not assign the returned value directly to the target pointer. If realloc fails, the original pointer is lost (overwritten with NULL), resulting in a memory leak. That is, always save the returned pointer in a temporary pointer, check it for NULL, and only then assign it to the target pointer. 3) When calling malloc, the argument never needs 'sizeof(char)', since that is always 1; it has no effect and clutters the code. Commented Jul 19, 2015 at 18:51
  • Regarding the line 'tokens = realloc(tokens, num_of_tokens);': the number of bytes to allocate is actually 'num_of_tokens times sizeof (char *)', so the posted code is not allocating enough memory. Commented Jul 19, 2015 at 18:54
  • The variable 'itr' is being incremented once too often, so the test at the top of the loop misses checking a character that is either the end of the string or a delimiter. I strongly suggest using a debugger, like gdb, to step through the code. The tokenize() function might be better written as a two-state machine: 1) not in a token, 2) in a token. Then the logic would be clearer and problems easier to catch. Commented Jul 19, 2015 at 19:00

1 Answer

This is incorrect:

tokens = realloc(tokens, num_of_tokens);

Since tokens is being used as an array of pointers, you need to allocate space for num_of_tokens pointers:

tokens = realloc(tokens, num_of_tokens * sizeof(char *));
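
As one of the comments points out, assigning the result of realloc straight back to tokens loses the original block if the call fails. A minimal sketch of a safer way to grow the array follows; push_token is a hypothetical helper name, not something from the original code, and it only stores the pointer it is given rather than copying the string.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of the original post): appends one
   pointer to a growable array of char *. Returns 0 on success and
   -1 on allocation failure. */
int push_token(char ***tokens, size_t *count, char *token)
{
    /* Ask for (*count + 1) pointer-sized slots, not (*count + 1) bytes. */
    char **tmp = realloc(*tokens, (*count + 1) * sizeof **tokens);
    if (tmp == NULL)
        return -1;          /* *tokens is unchanged and still valid */

    tmp[*count] = token;    /* store the pointer; the string is not copied */
    *tokens = tmp;
    (*count)++;
    return 0;
}
```

Writing the size as sizeof **tokens instead of sizeof(char *) keeps the allocation correct even if the element type changes later.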

Also, when you find a token, you iterate through the string in another while loop until you reach a delimiter or the null terminator ('\0'). That's fine, but you then increment itr again at the bottom of the outer while loop. If the inner loop stopped at the null terminator, this extra increment moves the index past the end of the string, so the next read in the outer loop's condition is out of bounds, which is undefined behavior.

You should only increment in the outer loop when the current character is a delimiter:

while (string[itr] != '\0')
{
    if (!isDelimiter(string[itr], delimiters))
    {
        ...
    }
    else
    {
        itr++;
    }
}
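
Putting both fixes together, here is a rough sketch of how the whole function could look. It is not the original code: the names tokenize_sketch and free_tokens are made up for illustration, strchr stands in for isDelimiter, and each token is copied with memcpy into an exactly sized buffer because appendChar and STRING_SIZE are not shown in the question. It also assumes the caller wants a NULL-terminated array, so the number of tokens does not have to be known in advance.

```c
#include <stdlib.h>
#include <string.h>

/* Frees a NULL-terminated token array produced by tokenize_sketch. */
void free_tokens(char **tokens)
{
    if (tokens == NULL)
        return;
    for (size_t i = 0; tokens[i] != NULL; i++)
        free(tokens[i]);
    free(tokens);
}

/* Sketch only: splits string on any character in delimiters.
   Returns a NULL-terminated array of heap-allocated tokens,
   or NULL if an allocation fails. */
char **tokenize_sketch(const char *string, const char *delimiters)
{
    size_t count = 0;
    size_t itr = 0;

    /* Start with room for just the terminating NULL entry. */
    char **tokens = malloc(sizeof *tokens);
    if (tokens == NULL)
        return NULL;
    tokens[0] = NULL;

    while (string[itr] != '\0')
    {
        if (strchr(delimiters, string[itr]) != NULL)
        {
            itr++;          /* delimiter: skip it, and only here */
            continue;
        }

        /* Start of a token: scan to the next delimiter or the terminator. */
        size_t start = itr;
        while (string[itr] != '\0' && strchr(delimiters, string[itr]) == NULL)
            itr++;
        size_t len = itr - start;

        /* Room for count + 1 tokens plus the NULL entry. */
        char **tmp = realloc(tokens, (count + 2) * sizeof *tokens);
        if (tmp == NULL)
        {
            free_tokens(tokens);
            return NULL;
        }
        tokens = tmp;

        tokens[count] = malloc(len + 1);
        if (tokens[count] == NULL)
        {
            free_tokens(tokens);
            return NULL;
        }
        memcpy(tokens[count], string + start, len);
        tokens[count][len] = '\0';

        count++;
        tokens[count] = NULL;   /* keep the array NULL-terminated */
    }
    return tokens;
}
```

With this version the caller can loop until tokens[x] is NULL instead of hard-coding the token count, and should call free_tokens when done.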

1 Comment

thanks for the answer and help, everything is working perfectly now!
