Inplace string replacement in C

Question

Write a function

void inplace(char *str, 
             const char pattern, 
             const char* replacement, 
             size_t mlen)

Input:
str: a string ending with \0. the input indicates that we need an inplace algorithm.

pattern: a letter.

replacement: a string.

mlen: the size of the memory holds the string str starts from the beginning of the memory and that mlen should be larger than strlen(str)

The final result is still pointed by str.

Note that all occurrence of pattern should be replaced.

For example,

helelo\0...........

Here "helelo" is the string to replace with '\0' at the end. After '\0' there are still L valid bytes. We want to replace "e" by "123".

A simple approach works like this, we go through str, when a pattern is matched, we shift all the rest with the place to fill the replacement string, then replace the pattern by the replacement.

If the original string is with length n and contains only e, we need (n-1) + (n-2) + ... + 1 shifts.

Is there an algorithm that scans the string with only one pass and constant memory cost?

"If the original string is with length n and contains only e, we need 2(n-1) + 2(n-2) + ... + 2 shifts". No, that's not correct. Each letter is only shifted once. Example: "abcdef". Shift right one letter means, Copy 'f' one char down, copy 'e' one char down, etc. You are working from the end of the string. It does not mean continuously scanning from the front of the string as you have implied. — kaylum
– kaylum, Commented Sep 1, 2015 at 0:59
If the original string is eeeec, the new string should be 123123123123c. If we know the length of the new string, we can move c directly to the final position, then add 123 in front one by one. W/o knowing the length, when the first e is matched, we shift all the rest eeec 2 bytes right, this needs 4 moves. When we meet the second e, we need another 3 moves.. — Joe C
– Joe C, Commented Sep 1, 2015 at 1:07
Ok, you are right. But that's because your question is still not clearly specified (even though you have started a new question). Well, at least not clear to me that you wanted to replace all occurences (yes, I probably should have assumed that - but that's why requirements should always be clearly and explicitly specified and not leave room for assumptions). — kaylum
– kaylum, Commented Sep 1, 2015 at 1:10

user3386109 · Accepted Answer · 2015-09-01 17:17:09Z

4

I think two passes is the minimum. On the first pass, count the number of characters that will be replaced. Given that count and the length of the replacement string, you can compute the length of the final string. (And you should verify that it's going to fit into the buffer.)

On the second pass, you scan the string backwards (starting at the last character), copying characters to their final location. When you encounter the search character, copy the replacement string to that location.

In your example, the increase in length would be 2. So you would

copy str[5] which is '\0' to str[7]
copy str[4] which is 'o' to str[6]
copy str[3] which is 'l' to str[5]
copy str[2] which is 'l' to str[4]
at str[1] you find the 'e' so str[3]='3' str[2]='2' str[1]='1'

At this point the output index is the same as the input index, so you can break the loop.

As @chux pointed out in the comments, the cases where the replacement string is either empty, or has exactly one character, can be handled with a single forward pass through the string. So the code should handle those cases separately.

edited Sep 1, 2015 at 17:17

answered Sep 1, 2015 at 1:11

user3386109

35k7 gold badges52 silver badges69 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

Sumit Trehan Over a year ago

I have a question here. You are assuming while scanning the string backwards, we would need to shift the characters by the length of replacement string. However, if consecutive 'e' are present then this would not be the case, then we would need to shift by the consecutive 'e' times the length of replacement string. Since you are scanning backwards, there is no way of knowing that the 'e' are consecutive or not. We are just aware of the count of 'e'.

user3386109 Over a year ago

@SumitTrehan It doesn't matter whether the 'e' are consecutive or not. In the example where the replacement string has length 3, each 'e' adds 2 to the length of the string, since one character 'e' is being replaced by three "123".

Sumit Trehan Over a year ago

I assume you mean to say that we can compute the length of the final string, then we should move the last character to the computed length-1, then carry on the shifting for every other character!

user3386109 Over a year ago

@SumitTrehan Yes, "compute the length of the final string", "copy characters to their final location" is what I said.

chux Over a year ago

Disagree with "two passes is the minimum" as demonstrated with this answer. Unless I was obliged to use a single pass solution, this answer does outline a good approach. Corner weakness: the order of copying depends on the length of the pattern replacement. If strlen(replacement) >= 1 this answer works. But is *replacement == 0, on the second pass, code should scan/replace the string forwards.

|

chux · Accepted Answer · 2015-09-01 02:57:26Z

1

A candidate single pass solution.

For each character in str, recurse. After the recursion, do the replacement.

It does recurse heavily.

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

// return 0:success else 1:fail
static int inplace_help(char *dest, const char *src, int pattern,
        const char* replacement, size_t rlen, size_t mlen) {
  printf("'%p' '%s' %c\n", dest, src, pattern);
  if (*src == pattern) {
    if (rlen > mlen) return 1;
    if (inplace_help(dest + rlen, src + 1, pattern, replacement, rlen,
            mlen - rlen)) return 1;
    memcpy(dest, replacement, rlen);
    return 0;
  }
  if (mlen == 0) return 1;
  int replace1 = *src;
  if (*src) {
    if (inplace_help(dest + 1, src + 1, pattern, replacement, rlen, mlen - 1)) {
      return 1;
    }
  }
  *dest = replace1;
  return 0;
}

void inplace(char *str, const char pattern, const char* replacement,
        size_t mlen) {
  if (pattern == 0) return;
  if (mlen == 0) return;
  if (*replacement == 0) return;  // Insure str does not shrink.
  inplace_help(str, str, pattern, replacement, strlen(replacement), mlen - 1);
}

int main(void) {
  char str[1000] = "eeeeec";
  inplace(str, 'e', "1234", sizeof str);
  printf("'%s'\n", str);  // --> '12341234123412341234c'
  return 0;
}

answered Sep 1, 2015 at 2:57

chux

158k17 gold badges160 silver badges311 bronze badges

7 Comments

Joe C Over a year ago

Interesting design. It uses one pass indeed, although its real memory cost is non constant because of recursion.

Joe C Over a year ago

Actually the strlen called in the very beginning needs one pass to return value. Can we do it without calling strlen?

chux Over a year ago

@Joe C 1) The memory need is proportional to the strlen(str). 2) It is still one-pass. 1 pass through str with inplace_help() and 1 pass though replacement with strlen(replacement). It is not 2-passes though the same string.

chux Over a year ago

@Joe 3) This is an example where 1-pass can be done, yet a 2-pass as suggested by others is likely more practical. It really is not the number of passes that is important, but the order of complexity, O(), of the algorithm in execution and memory usage. I did this to show how 1-pass could occur.

chux Over a year ago

@user3386109 Code like for(i=0;i<n;i++) { foo1()} for(i=0;i<n;i++) { foo2()} is certainly 2-pass. This answer is f(i) { foo3(); if (i) f(i-1); foo4(); } which I assert is single pass as it is O(n) and does not begin a 2nd pass once the 1st one finishes. In any case: real coding goals promote not # of passes, but low order of execution complexity of which your answer and mine are O(n). Also low order of memory usage; which this is O(n) and yours I would deem a superior O(1). Again, my goal here was to demo a 1-pass solution, not a production quality solution.

|

rici · Accepted Answer · 2015-09-01 05:47:33Z

0

The following assumes that the memory allocated to the string has been initialized to something at some point in time, since standard C does not seem to allow access to uninitialized memory. In practice, it will work fine.

It does precisely two scans: the first one is over the entire allocated space, and moves the string to the right-hand edge of the space. The second scan is over the string itself, which it moves back to the left-hand edge while it does replacements.

I changed the prototype to return 0 on success; -1 on failure. I also allow the pattern to be a string. (Maybe a single character was intentional? Easy to change, anyway.) (As written, pattern must not be length zero. Should be checked.)

int inplace(char *str, 
            const char* pattern, 
            const char* replacement, 
            size_t mlen) {
  /* We don't know how long the string is, but we know that it ends
     with a NUL byte, so every time we hit a NUL byte, we reset
     the output pointer.
   */
  char* left = str + mlen;
  char* right = left;
  while (left > str) {
    if (!*--left) right = str + mlen;
    *--right = *left;
  }

  /* Naive left-to-right scan. KMP or BM would be more efficient. */

  size_t patlen = strlen(pattern);
  size_t replen = strlen(replacement);
  for (;;) {
    if (0 == strncmp(pattern, right, patlen)) {
      right += patlen;
      if (right - left < replen) return -1;
      memcpy(left, replacement, replen);
      left += replen;
    } else {
      if (!(*left++ = *right++)) break;
    }
  }
  return 0;
}

answered Sep 1, 2015 at 5:47

rici

243k30 gold badges263 silver badges364 bronze badges

2 Comments

Joe C Over a year ago

I think c only does not allow reading from uninitialized value. It should be fine if we write before read.

rici Over a year ago

@JoeC: That's correct. However, the above code does read before it writes, since it reads the entire allocated region for the given string, and not just the bytes which form part of the string. That would be fine if the region had been originally allocated with calloc, but it might not be fine if the region had been originally allocated with malloc. You would only notice the issue if you ran the code with a tool like valgrind which detects uninitialized reads.

Collectives™ on Stack Overflow

Inplace string replacement in C

3 Answers 3

6 Comments

7 Comments

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

6 Comments

7 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related