0

In a program I am writing I made a Tokenize struct that says:

TokenizerT *Tokenize(TokenizerT *str) {

    TokenizerT *tok;
    *tok->array = malloc(sizeof(TokenizerT));
    char * arr = malloc(sizeof(50));
    const char *s = str->input_strng;
    int i = 0;

    char *ds = malloc(strlen(s) + 1);
    strcpy(ds, s);

    *tok->array[i] = strtok(ds, " ");

    while(*tok->array[i]) {
        *tok->array[++i] = strtok(NULL, " ");
    }
    free(ds);
    return tok;
}

where TokenizeT is defined as:

struct TokenizerT_ {
    char * input_strng;
    int count;
    char **array[];
};

So what I am trying to do is create smaller tokens out of a large token that I already created. I had issues returning an array so I made array part of the TokenizerT struct so I can access it by doing tok->array. I am getting no errors when I build the program, but when I try to print the tokens I get issues.

TokenizerT *ans;
TokenizerT *a = Tokenize(tkstr);
char ** ab = a->array;
ans = TKCreate(ab[0]);
printf("%s", ans->input_strng);

TKCreate works because I use it to print argv but when i try to print ab it does not work. I figured it would be like argv so work as well. If someone can help me it would be greatl appreciated. Thank you.

4
  • 1
    you don't allocate for tok and tok->array Commented Jul 2, 2015 at 22:19
  • We a more complete example to help you. Commented Jul 3, 2015 at 1:10
  • I'm curious what you believe char * arr = malloc(sizeof(50)); does. I suspect you meant malloc(50); as it happens, sizeof(50), sizeof(42) and sizeof(32767) all evaluate to the same number (4 on my system, but it can be different). sizeof is not part of the magic of malloc; it is a compile-time expression which produces the size in bytes required for a type, where the type is either named or, as in your code, represented by an (unevaluated) expression that type. So 50 is standing in for the type int. Commented Jul 3, 2015 at 3:50
  • I know this is what you meant but still,.. it is char * array, or a char **.. if you are trying make something like the argv that is normally used in main( ). Commented Jul 3, 2015 at 6:58

1 Answer 1

3

Creating the Tokenizer

I'm going to go out on a limb, and guess that the intent of:

TokenizerT *tok;
*tok->array = malloc(sizeof(TokenizerT));
char * arr = malloc(sizeof(50));

was to dynamically allocate a single TokenizerT with the capacity to contain 49 strings and a NULL endmarker. arr is not used anywhere in the code, and tok is never given a value; it seems to make more sense if the values are each shifted one statement up, and corrected:

// Note: I use 'sizeof *tok' instead of naming the type because that's
//       my style; it allows me to easily change the type of the variable
//       being assigned to. I leave out the parentheses because
//       that makes sure that I don't provide a type.
//       Not everyone likes this convention, but it has worked pretty
//       well for me over the years. If you prefer, you could just as
//       well use sizeof(TokenizerT).
TokenizerT *tok = malloc(sizeof *tok);
// (See the third section of the answer for why this is not *tok->array)
tok->array = malloc(50 * sizeof *tok->array);

(tok->array is not a great name. I would have used tok->argv since you are apparently trying to produce an argument vector, and that's the conventional name for one. In that case, tok->count would probably be tok->argc, but I don't know what your intention for that member is since you never use it.)

Filling in the argument vector

strtok will overwrite (some) bytes in the character string it is given, so it is entirely correct to create a copy (here ds), and your code to do so is correct. But note that all of the pointers returned by strtok are pointers to character in the copy. So when you call free(ds), you free the storage occupied by all of those tokens, which means that your new freshly-created TokenizerT, which you are just about to return to an unsuspecting caller, is full of dangling pointers. So that will never do; you need to avoid freeing those strings until the argument vector is no longer needed.

But that leads to another problem: how will the string be freed? You don't save the value of ds, and it is possible that the first token returned by strtok does not start at the beginning of ds. (That will happen if the first character in the string is a space character.) And if you don't have a pointer to the very beginning of the allocated storage, you cannot free the storage.

The TokenizerT struct

char is a character (usually a byte). char* is a pointer to a character, which is usually (but not necessarily) a pointer to the beginning of a NUL-terminated string. char** is a pointer to a character pointer, which is usually (but not necessarily) the first character pointer in an array of character pointers.

So what is char** array[]? (Note the trailing []). "Obviously", it's an array of unspecified length of char**. Because the length of the array is not specified, it is an "incomplete type". Using an incomplete array type as the last element in a struct is allowed by modern C, but it requires you to know what you're doing. If you use sizeof(TokenizerT), you'll end up with the size of the struct without the incomplete type; that is, as though the size of the array had been 0 (although that's technically illegal).

At any rate, that wasn't what you wanted. What you wanted was a simple char**, which is the type of an argument vector. (It's not the same as char*[] but both of those pointers can be indexed by an integer i to return the ith string in the vector, so it's probably good enough.)


That's not all that's wrong with this code, but it's a good start at fixing it. Good luck.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.