2

I am trying to find the most efficient way to find tags in a given char array. These "tags" are a sequence of chars located randomly within a char array.

Here is an example: given a char array: {'a','s','s','1','m','s','g','e','x','x','r','s','1',...}. the tag "ss1" indicates the beginning of a message which contains every char until a sequence of "exx" is found, which a tag for the end of the message, and it keeps searching the array for the next sequence of "s1". In this example, the message here is "msg".

my initial design was (pseudo code)

while(array[i] != '\0')
    if(array[i] == 's' && array[i+1] == 's' && array[i+2] == '1'  )
        int j = i+3;
            if(array[j] != '\0' && array[j] == 'e' && array[j+1] == 'x' && array[j+2] == 'x' )
                i += 3;
            else
                print(array[j]);
    else i++; //next char

may be a little flawed, but you get the idea. Is there a better way? i thought about strstr but since I'm dealing with a char array here and still looping even after deciphering a message, I thought it might be difficult to implement.

18
  • can tags be nested ? something like this is acceptable ? <ss1>msg_start_.....<ss1>innerloopmessage...<exx>msg_eng_....<exx> Commented May 28, 2016 at 23:54
  • no @RaviSankarRaju Commented May 28, 2016 at 23:54
  • Well for starters, you're indexing past the end of the array. What if there's a single character left? So array[i] is 'x' for instance, and array[i+1] is '\0'. So you check to see if array[i] is '\0', which it isn't. You then proceed to examine three entries, when you only know that one is available. That is probably an access violation. It will probably work, but it is not guaranteed, and it is very sloppy coding. Never, ever try to read past the end of an array. Even worse, you then go even further with j. Commented May 28, 2016 at 23:55
  • thanks for the catch @TomKarzes I intend to safeguard against those kind of access, I just wanted to show what i was thinking of designing and am looking for a better solution Commented May 28, 2016 at 23:57
  • 2
    Is there a better way? Construct a state machine. Commented May 29, 2016 at 10:28

1 Answer 1

1

Try to maintain a state denoting how much of the tag start and end you have found. Something like this: (This code will work even if the message within the tag is of arbitrary length)

int state = 0;
int found = 0;
int i = 0,j;
int msgStartIndex;
int msgEndIndex;
while(array[i]){
    if((array[i] == 's' && state == 0) || (array[i] == 's' && state == 1) || (array[i] == '1' && state == 2) ){
        state++;
        if(!found && state == 3){
            msgStartIndex = i+1;
            found = 1;
        }
    }
    else if(!found && (array[i] = 's' && state == 2))
        state = 2;
    else if(!found)
        state = 0;
    if((array[i] == 'e' && state == 3) || (array[i] == 'x' && state == 2) || (array[i] == 'x' && state == 1) ){
        state--;
        if(found && state == 0){
            found = 0;
            msgEndIndex = i-3;
            for(j=msgStartIndex; j < msgEndIndex+1; j++)
                printf("%c",array[j]);
            printf("\n");
        }
    }
    else if(found && (array[i] == 'e') && (state == 2 || state == 1))
        state = 2;
    else if(found)
        state = 3;
    i++;
}

Updated answer for start tag st1 and end tag ex1

int state = 0;
int found = 0;
int i=0,j;
int msgStartIndex;
int msgEndIndex;
while(array[i]){
    if((array[i] == 's' && state == 0) || (array[i] == 't' && state == 1) || (array[i] == '1' && state == 2) ){
        state++;
        if(!found && state == 3){
            msgStartIndex = i+1;
            found = 1;
        }
    }
    else if(!found && (array[i] = 's' && (state == 1 || state == 2)))
        state = 1;
    else if(!found)
        state = 0;
    if((array[i] == 'e' && state == 3) || (array[i] == 'x' && state == 2) || (array[i] == '1' && state == 1) ){
        state--;
        if(found && state == 0){
            found = 0;
            msgEndIndex = i-3;
            for(j=msgStartIndex; j < msgEndIndex+1; j++)
                printf("%c",array[j]);
            printf("\n");
        }
    }
    else if(found && (array[i] == 'e') && (state == 2 || state == 1))
        state = 2;
    else if(found)
        state = 3;
    i++;
Sign up to request clarification or add additional context in comments.

5 Comments

what happens when the closing tag is ex1? how does that affect the code?
@i_use_the_internet just change if((array[i] == 'e' && state == 3) || (array[i] == 'x' && state == 2) || (array[i] == 'x' && state == 1) ) to if((array[i] == 'e' && state == 3) || (array[i] == 'x' && state == 2) || (array[i] == '1' && state == 1) )
it causes a segmentation fault for array {'a','s','t','1','m','s','g','e','x','1','r','s','t','1','s','t','1','s','s','g','e','x','1','z'}; starting tag st1 ending tag ex1
I am out on the weekend. Will get back to you on maonday IST
@i_use_the_internet Sorry, I forgot to initialize i, hence the segmentation fault. Try it now. Also it seems like in the example the start tag is st1 instead of ss1. You will have to change the logic if you want start tag as st1

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.