Removing a substring from a string in C

Question

I already have code that removes a substring from a string (word) in C, but I don't understand it. Can someone explain it to me? It doesn't use functions from the standard library. I tried to analyze it myself, but there are certain parts I still don't understand; I marked them in the comments. I just need to figure out how this all works.

Thanks!

#include <stdio.h>
#include <stdlib.h>
void remove(char *s1, char *s2);

int main()
{
   char s1[101], s2[101];
   printf("First word: ");
   scanf("%100s", s1);   // width limit keeps the input inside the 101-byte buffer
   printf("Second word: ");
   scanf("%100s", s2);
   remove(s1, s2);
   printf("The first word after removing is '%s'.", s1);

   return 0;
}
void remove(char *s1, char *s2)
{
   int i = 0, j, k;
   while (s1[i])       // ITERATES THROUGH THE FIRST STRING s1?
   {
       for (j = 0; s2[j] && s2[j] == s1[i + j]; j++);   // WHAT DOES THIS LINE DO?
          if (!s2[j])           // IF WE'RE AT THE END OF STRING s2? 
             {
                 for (k = i; s1[k + j]; k++)   //WHAT DOES THIS ENTIRE BLOCK DO?
                    s1[k] = s1[k + j];
                    s1[k] = 0;
              }
          else
              i++;    // ???
    }
}

The for loop with j checks whether the word s2 occurs in s1 at position i. If it does, the for loop with k shifts all the letters up to the end of s1 to the left to remove s2. This function removes every occurrence of the word s2 from s1.
– Ôrel, Commented Jan 11, 2016 at 12:55
Whenever you are faced with a bit of pointer code you do not understand, take out a piece of paper and a pencil, write the contents of the array or string out on the paper and then write 0123456... out under each character. Put a mark at the index currently pointed to by the pointer (usually the beginning), and then work through the code, moving the mark each time the pointer changes until you are comfortable with what is being done. No magic.
– David C. Rankin, Commented Jan 11, 2016 at 13:30
I agree wholeheartedly with @DavidC.Rankin, this is the kind of algorithm that more words won't necessarily help explain if you don't get it immediately. Someone kind enough might draw some diagrams or animations and post them here, but I wouldn't count on it (which is why taking out a piece of paper, or a diagram tool you're proficient with, and doing it yourself is likely your best bet).
– tne, Commented Jan 11, 2016 at 14:06
... and stepping through it with a debugger, using a short sample string and a watch on each of these variables.
– Jongware, Commented Jan 11, 2016 at 14:13
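
The pencil-and-paper trace suggested in the comments can also be approximated with a few printf calls. The following is only a sketch of the question's algorithm with tracing added; the name trace_remove, the return value (number of removals), and the guard against an empty s2 are assumptions made for illustration and are not part of the original code.

```c
#include <stdio.h>

/* Hypothetical helper: the same algorithm as in the question, but it prints
 * one trace line per pass of the outer loop and returns how many times s2
 * was removed. */
int trace_remove(char *s1, const char *s2)
{
    int i = 0, j, k, removed = 0;

    if (!s2[0])          /* assumption: treat an empty s2 as "nothing to remove" */
        return 0;

    while (s1[i]) {
        /* count how many leading characters of s2 match s1 starting at i */
        for (j = 0; s2[j] && s2[j] == s1[i + j]; j++)
            ;

        printf("i=%d j=%d s1=\"%s\" -> %s\n", i, j, s1,
               s2[j] ? "no match, i++" : "match, shift left");

        if (!s2[j]) {
            /* all of s2 matched: copy the tail of s1 over the match */
            for (k = i; s1[k + j]; k++)
                s1[k] = s1[k + j];
            s1[k] = 0;
            removed++;
        } else {
            i++;
        }
    }
    return removed;
}
```

Calling trace_remove on a writable buffer holding ITERATE with s2 set to RAT prints one line per value of i that is tried and leaves ITEE in the buffer.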

Kamaldeep singh Bhatia · Accepted Answer · 2016-01-11 13:02:59Z

The function works roughly like this:

- Skip over the part of s1 that matches s2, then overwrite it with the rest of s1.

while (s1[i])       // Yes, this iterates through the first string s1
{
    for (j = 0; s2[j] && s2[j] == s1[i + j]; j++);   // Advances j over the characters of s2 that match s1 starting at i

This loop only advances the index j over the matching part; it stores nothing in s1.

    if (!s2[j])           // We reached the end of s2, so all of s2 matched at position i
    {
        for (k = i; s1[k + j]; k++)   // Copy the rest of s1 left over the matched part
            s1[k] = s1[k + j];
        s1[k] = 0;
    }
    else
        i++;    // No match at position i, so move on to the next character of s1
}
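
To look at the matching step in isolation, it may help to pull it out into a function of its own. This is a minimal sketch; match_length is a hypothetical name, and the code assumes i is no greater than the length of s1.

```c
/* Hypothetical helper: returns how many leading characters of s2 match s1
 * starting at index i. The comparison stops at the first mismatch, and the
 * terminating zero of s1 always counts as a mismatch, so the loop never
 * reads past the end of s1 as long as i is within the string. */
int match_length(const char *s1, int i, const char *s2)
{
    int j = 0;

    while (s2[j] && s2[j] == s1[i + j])
        j++;
    return j;
}
```

If the returned value equals the length of s2 (that is, s2[j] is the null character), all of s2 was found at position i.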

phoenix · Accepted Answer · 2016-01-11 13:03:13Z

The first while (s1[i]) iterates through s1. Yes, you are right.

for (j = 0; s2[j] && s2[j] == s1[i + j]; j++);

The above for loop checks whether the substring s2 is present in s1 starting from s1[i]. If it matches, the loop runs through all of s2 and ends with s2[j] equal to the null character. If not, s2[j] is not the null character when the loop ends. Example: if s1 = ITERATE and s2 = RAT, the loop runs to completion only when i = 3.
So if the condition if (!s2[j]) holds, we have found the substring, and i is its starting position in s1.

         for (k = i; s1[k + j]; k++)   //WHAT DOES THIS ENTIRE BLOCK DO?
            s1[k] = s1[k + j];
         s1[k] = 0;

The above block removes the substring. For the ITERATE and RAT example, this is done by copying E and the null character to the positions where R and A were. The for loop copies the E, and the statement after the loop writes the null character. If s2[j] is not null after the first for loop, i is incremented to check for the substring from the next position of s1.
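
The removal step on its own can be sketched as follows. The name shift_left and its parameters are hypothetical; the code assumes that i + j does not exceed the length of s, which holds in the original function because j characters were just matched starting at i.

```c
/* Hypothetical helper: removes j characters from s starting at index i by
 * copying the tail of the string to the left, then terminating it. */
void shift_left(char *s, int i, int j)
{
    int k;

    for (k = i; s[k + j]; k++)   /* copy each remaining character j places left */
        s[k] = s[k + j];
    s[k] = 0;                    /* the loop stops before the terminator, so write it here */
}
```

With a buffer holding ITERATE, i = 3 and j = 3, the loop copies the final E to index 3 and the last statement writes the terminator at index 4, leaving ITEE.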

kazbeel · Accepted Answer · 2016-01-11 13:08:56Z

Here is the function with an explanation of what it does condensed into the comments:

void remove(char *s1, char *s2)
{
   int i = 0, j, k;
   while (s1[i])       // Iterates through s1 (until it finds a zero)
   {
       // Advances j while s2 has not ended and each character of s2 matches
       // s1 starting at i (on a full match, j ends at the terminating zero of s2)
       for (j = 0; s2[j] && s2[j] == s1[i + j]; j++);
       if (!s2[j])           // j points to the end of s2 => we've found a match
       {
           for (k = i; s1[k + j]; k++)   // Remove the matching substring
               s1[k] = s1[k + j];
           s1[k] = 0;
       }
       else
           i++;    // No match here, so we continue with the next character of s1
   }
}

Note: I have also noticed that this may be easily exploited, since it iterates out of the range of s1.
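
One robustness problem that can be checked directly is an empty s2: it matches at every position, i never advances, and the outer loop never ends. Below is a hedged sketch of a guarded variant. The name remove_substring is hypothetical; it is used here because remove is already declared in <stdio.h> as a function that deletes a file, so reusing that name with a different signature conflicts with the standard declaration.

```c
/* Hypothetical guarded variant of the function from the question.
 * Assumes both arguments are properly terminated strings and that s1
 * points to writable memory. */
void remove_substring(char *s1, const char *s2)
{
    int i = 0, j, k;

    if (!s2[0])              /* empty s2: nothing to remove, avoid looping forever */
        return;

    while (s1[i]) {
        /* advance j over the characters of s2 that match s1 starting at i */
        for (j = 0; s2[j] && s2[j] == s1[i + j]; j++)
            ;

        if (!s2[j]) {
            /* full match: shift the tail of s1 left over it */
            for (k = i; s1[k + j]; k++)
                s1[k] = s1[k + j];
            s1[k] = 0;
        } else {
            i++;             /* no match here, try the next position */
        }
    }
}
```

For example, removing an from a buffer holding banana leaves ba, and calling the function with an empty s2 returns immediately without touching s1.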

Daniel Underwood · Accepted Answer · 2016-01-11 13:21:42Z

Let's break it down. You have

while (s1[i])
{
    // Code
}

This iterates through s1. Once you get to the end of the string, you have \0, which is the null terminator. When evaluated in a condition, it will evaluate to 0. It may have been better to use a for here.

You then have

for (j = 0; s2[j] && s2[j] == s1[i + j]; j++);

This does nothing but increment j. Note that this statement has no braces and is terminated with a semicolon, so the code after it is not executed as part of the loop body. If braces did enclose the following lines, it would loop over the if/else while s2[j] was not the null character and s2[j] == s1[i+j]. I don't really have an explanation for the second condition other than that the character at index j in s2 is compared with the character in s1 offset by an additional amount i. This part could likely be improved to remove unnecessary iterations.

Then there's

if (!s2[j])
{
}
else
{
}

This checks whether j has reached the end of s2 (s2[j] is the null terminator), which means that all of s2 matched. If so, it executes the removal of the string; otherwise it increments i. It could be improved by returning in the else when s2 can no longer fit in the remainder of s1.

for (k = i; s1[k + j]; k++)
    s1[k] = s1[k + j];
    s1[k] = 0;

This is another somewhat strange-looking loop: because there are no braces, s1[k] = 0 is executed after the loop, not inside it. The loop compacts the string by removing s2, shifting each character at k+j down to k. After the loop, s1[k] = 0 writes the null terminator so that the string ends properly.

If you want a deeper understanding, it may be worth trying to write your own code to do the same thing and then comparing afterwards. I have found that this generally helps more than reading a bunch of texts.
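
As one possible starting point for such a rewrite, here is a sketch that uses braces throughout and stops as soon as s2 can no longer fit in the rest of s1, as suggested above. The function name and the hand-computed lengths n1 and n2 are assumptions made for this example; like the original, it avoids standard library string functions.

```c
/* Hypothetical rewrite with an early exit. Assumes s1 is writable and both
 * strings are properly terminated. */
void remove_substring_early_exit(char *s1, const char *s2)
{
    int n1 = 0, n2 = 0, i = 0, j, k;

    while (s1[n1])           /* length of s1, counted by hand */
        n1++;
    while (s2[n2])           /* length of s2 */
        n2++;
    if (n2 == 0)             /* an empty s2 would otherwise match forever */
        return;

    while (i + n2 <= n1) {   /* stop once s2 cannot fit in what is left */
        for (j = 0; j < n2 && s2[j] == s1[i + j]; j++) {
            /* empty body: only j advances */
        }
        if (j == n2) {
            /* shift the tail left; the last copy moves the terminator too */
            for (k = i; k + n2 <= n1; k++) {
                s1[k] = s1[k + n2];
            }
            n1 -= n2;
        } else {
            i++;
        }
    }
}
```

Tracking the length of s1 explicitly means the terminator is moved inside the shift loop, so no separate s1[k] = 0 statement is needed in this version.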
