
I am trying to split a char array in C using strtok(). I have this working at the moment, but I have now realised that when there are two consecutive delimiters, the whole concept gets offset.

I am parsing the char array into a structure based on each field's index (I cannot post the exact code because it is for an assignment, but I will post similar code with the assignment specifics changed), e.g.

struct test_struct{

     int index_1;
     int index_2;
     int index_3;
     int index_4;
     int index_5;

}test_struct;

I use a counter to populate the structure: every time a delimiter is reached, I increment the counter and assign the token to the corresponding field, e.g.:

char c_array[50] = "hello,this,is,an,example";

counter = 0;

token = strtok(c_array, ",");

while (token != NULL) {
    switch (counter) {
    case 0:
        test_struct.index_1 = token;
        break;
    case 1:
        test_struct.index_2 = token;
        break;

    // repeat this step for the other indexes

    }
    counter++;
    token = strtok(NULL, ",");
}

I know a switch statement is probably a poor design choice in this situation, but aside from that, can somebody help me find a solution to this problem?

The problem is that when a char array (basically a C string) contains consecutive delimiters, strtok() "skips" that index, which throws everything out of line. Take the above example:

If the char array is formatted properly, each token lands in its intended field: for the above example, the fifth split string, "example", ends up in test_struct.index_5 (when counter == 4, since counter starts at 0).

Now, with the above code, if c_array[50] = "hello,this,,an,example", there is a missing field, and that messes up the indexing: ",," has no string in between, so that field is skipped and every later token shifts down one index. Instead of the intended behaviour I get this:

test_struct.index_1 = "hello"
test_struct.index_2 = "this"
test_struct.index_3 = "an"
test_struct.index_4 = "example"
test_struct.index_5 = "example"

So is there a way to say: if the token is "", set it to a default value, e.g. "missing data"? Then at least I can handle that case separately after I have read my data into the correct indexes.

I hope you understand what I mean.

Cheers, Chris.

  • You say: I am trying to split a char array in C using strtok. This is where your problems start. Look at the specification of strtok(), then repent of your ways. It is not the right tool to be using if you care about empty tokens. Also, doesn't the use of the names index_1 … index_5 scream 'array' (as in array!, or even ¡array!) at you? It should! Commented Mar 10, 2014 at 1:38
  • These variable names are just abstract examples from my code; when I copied my code in, I changed them because this is for an assignment, so I don't get in trouble for copying. Could you recommend which tool(s) I should research in order to handle empty tokens in this way? I haven't done much C. Commented Mar 10, 2014 at 1:49
  • Look up the functions strspn(), strcspn(), strpbrk(). You can use one of the latter two, most simply, in conjunction with … oh, that's interesting; I've just spotted that you are trying to assign pointers to integers, which is not a good idea either. OK; you can use strcspn() or strpbrk() to find the next delimiter, and then arrange to do whatever is appropriate. Commented Mar 10, 2014 at 1:53
  • I see, I'll have a look into those tomorrow. I come from a 95% Java background, so I just use String.split(regex) there; this has made me think about what actually goes on and how to tackle such problems. Thanks for your help. If I struggle using these, you'll hear from me again soon! Commented Mar 10, 2014 at 1:57
  • Use strsep

1 Answer


Working code

NB: this code still modifies the input string, but recognizes empty tokens quite happily.

#include <stdio.h>
#include <string.h>

static void split(char *string)
{
    enum { MAX_STRINGS = 5 };
    struct test_struct
    {
        char *index[MAX_STRINGS];
    } test_struct;

    printf("Splitting: [%s]\n", string);

    int i = 0;
    char *bgn = string;
    char *end;
    while (i < MAX_STRINGS && (end = strpbrk(bgn, ",")) != 0)
    {
        test_struct.index[i++] = bgn;
        *end = '\0';
        bgn = end + 1;
    }
    if (i >= MAX_STRINGS)
        fprintf(stderr, "Too many strings!\n");
    else
        test_struct.index[i++] = bgn;

    for (int j = 0; j < i; j++)
        printf("index[%d] = [%s]\n", j, test_struct.index[j]);
}

int main(void)
{
    char c_array[][30] =
    {
        "hello,this,is,an,example",
        "hello,this,,an,example",
        "hello,,bad,,example,input",
        "hello,world",
        ",,,,",
        ",,",
        "",
    };
    enum { C_SIZE = sizeof(c_array) / sizeof(c_array[0]) };
    for (int i = 0; i < C_SIZE; i++)
        split(c_array[i]);
    return 0;
}

Example output

Splitting: [hello,this,is,an,example]
index[0] = [hello]
index[1] = [this]
index[2] = [is]
index[3] = [an]
index[4] = [example]
Splitting: [hello,this,,an,example]
index[0] = [hello]
index[1] = [this]
index[2] = []
index[3] = [an]
index[4] = [example]
Splitting: [hello,,bad,,example,input]
Too many strings!
index[0] = [hello]
index[1] = []
index[2] = [bad]
index[3] = []
index[4] = [example]
Splitting: [hello,world]
index[0] = [hello]
index[1] = [world]
Splitting: [,,,,]
index[0] = []
index[1] = []
index[2] = []
index[3] = []
index[4] = []
Splitting: [,,]
index[0] = []
index[1] = []
index[2] = []
Splitting: []
index[0] = []