I have a bytes.Buffer type variable which I filled with Unicode characters:

var mbuff bytes.Buffer
unicodeSource := "کیا حال ھے؟" // string literals need double quotes
for _, r := range unicodeSource {
    mbuff.WriteRune(r)
}

Note: I iterated over a Unicode literal here, but really the source is an infinite loop of user input characters.

Now, I want to remove a Unicode character from any position in the buffer mbuff. The problem is that characters may be of variable byte sizes. So I cannot just pick out the ith byte from mbuff.String() as it might be the beginning, middle, or end of a character. This is my trivial (and horrendous) solution:

// removing Unicode character at position n
var tempString string
currChar := 0
for _, ch := range mbuff.String() {  // iterate over Unicode chars
    if currChar != n {               // skip concatenating nth char
        tempString += string(ch)     // ch is a rune, so convert before appending
    }
    currChar++
}
mbuff.Reset()                        // empty buffer
mbuff.WriteString(tempString)        // write new string

This is bad in many ways. For one, I convert buffer to string, remove ith element, and write a new string back into the buffer. Too many operations. Second, I use the += operator in the loop to concatenate Unicode characters into a new string. I am using buffers in the first place exactly to avoid concatenation using += which is slow as this answer points out.

What is an efficient method to remove the ith Unicode character in a bytes.Buffer?
Also what is an efficient way to insert a Unicode character after i-1 Unicode characters (i.e. in the ith place)?

3 Answers

To remove the ith rune from a slice of bytes, loop through the slice counting runes. When the ith rune is found, copy the bytes following the rune down to the position of the ith rune:

func removeAtBytes(p []byte, i int) []byte {
    j := 0
    k := 0
    for k < len(p) {
        _, n := utf8.DecodeRune(p[k:])
        if i == j {
            // Shift the tail down over the ith rune and stop scanning.
            return p[:k+copy(p[k:], p[k+n:])]
        }
        j++
        k += n
    }
    return p
}

This function modifies the backing array of the argument slice, but it does not allocate memory.

Use this function to remove a rune from a bytes.Buffer.

p := removeAtBytes(mbuff.Bytes(), i)
mbuff.Truncate(len(p)) // backing bytes were updated, adjust length

playground example
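
For reference, here is a minimal self-contained sketch of the buffer approach described above. The names removeRuneFromBuffer and demo are hypothetical helpers added for illustration, and this removeAtBytes returns as soon as the rune has been removed. Positions are 0-based, and an out-of-range position leaves the buffer unchanged.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// removeAtBytes deletes the ith rune (0-based) from p in place and returns
// the shortened slice. p is returned unchanged if i is out of range.
func removeAtBytes(p []byte, i int) []byte {
	j := 0 // count of runes seen so far
	k := 0 // byte offset of the current rune
	for k < len(p) {
		_, n := utf8.DecodeRune(p[k:])
		if i == j {
			// Copy the tail down over the rune and trim the slice.
			return p[:k+copy(p[k:], p[k+n:])]
		}
		j++
		k += n
	}
	return p
}

// removeRuneFromBuffer (hypothetical helper) removes the ith rune from buf.
// Bytes() aliases the unread part of the buffer, so the edit happens in
// place and only the length needs adjusting afterwards.
func removeRuneFromBuffer(buf *bytes.Buffer, i int) {
	p := removeAtBytes(buf.Bytes(), i)
	buf.Truncate(len(p))
}

// demo (hypothetical helper) fills a buffer with s, removes rune i, and
// returns the resulting contents.
func demo(s string, i int) string {
	var buf bytes.Buffer
	buf.WriteString(s)
	removeRuneFromBuffer(&buf, i)
	return buf.String()
}

func main() {
	fmt.Println(demo("hello", 1))       // single-byte runes
	fmt.Println(demo("héllo", 1))       // removes the two-byte rune é
	fmt.Println(demo("کیا حال ھے؟", 0)) // removes the first rune
}
```

The first two calls should print hllo, since in both cases the rune at position 1 is dropped regardless of its byte width.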

To remove the ith rune from a string, loop through the string counting runes. When the ith rune is found, create a string by concatenating the segment of the string before the rune with the segment of the string after the rune.

func removeAt(s string, i int) string {
    j := 0  // count of runes
    k := 0  // index in string of current rune
    for k < len(s) {
        _, n := utf8.DecodeRuneInString(s[k:])
        if i == j {
            return s[:k] + s[k+n:]
        }
        j++
        k += n
    }
    return s
}

This function allocates a single string, the result. DecodeRuneInString is a function in the standard library unicode/utf8 package.
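
The question also asks about inserting a rune at the ith position, which the functions above do not cover. A rough sketch along the same lines follows; insertAt and insertRuneInBuffer are hypothetical names, and the clamping behaviour (a negative i inserts at the start, an i past the end appends) is an assumption rather than something the question specifies.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// insertAt returns s with r inserted so that it becomes the ith rune
// (0-based). Assumption: i <= 0 inserts at the start, and an i larger than
// the number of runes appends at the end.
func insertAt(s string, i int, r rune) string {
	j := 0 // count of runes skipped so far
	k := 0 // byte offset where r will be inserted
	for k < len(s) && j < i {
		_, n := utf8.DecodeRuneInString(s[k:])
		j++
		k += n
	}
	return s[:k] + string(r) + s[k:]
}

// insertRuneInBuffer (hypothetical helper) rewrites buf with r inserted at
// rune position i. Unlike the removal case this is not in place: it builds
// one new string and writes it back.
func insertRuneInBuffer(buf *bytes.Buffer, i int, r rune) {
	s := insertAt(buf.String(), i, r)
	buf.Reset()
	buf.WriteString(s)
}

func main() {
	fmt.Println(insertAt("hllo", 1, 'e'))

	var buf bytes.Buffer
	buf.WriteString("h世")
	insertRuneInBuffer(&buf, 1, 'é')
	fmt.Println(buf.String())
}
```

Insertion grows the data, so it cannot reuse the backing array the way removal does when the buffer is already full; the sketch accepts one allocation for the new string to keep the code short.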

1 Comment

Accepted because it explicitly shows how to work with buffers. Amd's answer is also worth looking at as it explores variations of my problem.
Taking a step back, Go often works on Readers and Writers, so an alternative solution would be to use the golang.org/x/text/transform package. You create a Transformer, attach it to a Reader, and use the new Reader to produce a transformed string. For example, here's a skipper:

func main() {
    src := strings.NewReader("کیا حال ھے؟")
    skipped := transform.NewReader(src, NewSkipper(5))
    var buf bytes.Buffer
    io.Copy(&buf, skipped)
    fmt.Println("RESULT:", buf.String())
}

And here's the implementation:

package main

import (
    "bytes"
    "fmt"
    "io"
    "strings"
    "unicode/utf8"

    "golang.org/x/text/transform"
)

type skipper struct {
    pos int
    cnt int
}

// NewSkipper creates a text transformer which will remove the rune at pos
func NewSkipper(pos int) transform.Transformer {
    return &skipper{pos: pos}
}

func (s *skipper) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
    for utf8.FullRune(src) {
        _, sz := utf8.DecodeRune(src)
        // not enough space in the dst
        if len(dst) < sz {
            return nDst, nSrc, transform.ErrShortDst
        }
        if s.pos != s.cnt {
            copy(dst[:sz], src[:sz])
            // track that we stored in dst
            dst = dst[sz:]
            nDst += sz
        }
        // track that we read from src
        src = src[sz:]
        nSrc += sz
        // on to the next rune
        s.cnt++
    }
    if len(src) > 0 && !atEOF {
        return nDst, nSrc, transform.ErrShortSrc
    }
    return nDst, nSrc, nil
}

func (s *skipper) Reset() {
    s.cnt = 0
}

There may be bugs with this code, but hopefully you can see the idea.

The benefit of this approach is it could work on a potentially infinite amount of data without having to store all of it in memory. For example you could transform a file this way.

Edit:

Remove the ith rune in the buffer:
A: Shift all runes one location to the left (Here A is faster than B), try it on The Go Playground:

func removeRuneAt(s string, runePosition int) string {
    if runePosition < 0 {
        return s
    }
    r := []rune(s)
    if runePosition >= len(r) {
        return s
    }
    copy(r[runePosition:], r[runePosition+1:])
    return string(r[:len(r)-1])
}

B: Copy to new buffer, try it on The Go Playground

func removeRuneAt(s string, runePosition int) string {
    if runePosition < 0 {
        return s // avoid allocation
    }
    r := []rune(s)
    if runePosition >= len(r) {
        return s // avoid allocation
    }
    t := make([]rune, len(r)-1) // Apply replacements to buffer.
    w := copy(t, r[:runePosition])
    w += copy(t[w:], r[runePosition+1:])
    return string(t[:w])
}

C: Try it on The Go Playground:

package main

import (
    "bytes"
    "fmt"
)

func main() {
    str := "hello"
    fmt.Println(str)
    fmt.Println(removeRuneAt(str, 1))

    buf := bytes.NewBuffer([]byte(str))
    fmt.Println(buf.Bytes())

    buf = bytes.NewBuffer([]byte(removeRuneAt(buf.String(), 1)))
    fmt.Println(buf.Bytes())
}
func removeRuneAt(s string, runePosition int) string {
    if runePosition < 0 {
        return s // avoid allocation
    }
    r := []rune(s)
    if runePosition >= len(r) {
        return s // avoid allocation
    }

    t := make([]rune, len(r)-1) // Apply replacements to buffer.
    w := copy(t, r[0:runePosition])
    w += copy(t[w:], r[runePosition+1:])
    return string(t[0:w])
}

D: Benchmark:
A: 745.0426ms
B: 1.0160581s
for 2000000 iterations
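
The benchmark harness itself is not shown above. One way such a comparison might be reproduced is with testing.Benchmark from the standard library, sketched below. The names removeRuneShift and removeRuneCopy are hypothetical stand-ins for variants A and B, the input string and position are arbitrary choices, and timings will differ by machine and Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// removeRuneShift is variant A: shift the tail left inside the rune slice.
func removeRuneShift(s string, pos int) string {
	r := []rune(s)
	if pos < 0 || pos >= len(r) {
		return s
	}
	copy(r[pos:], r[pos+1:])
	return string(r[:len(r)-1])
}

// removeRuneCopy is variant B: copy both halves into a second rune slice.
func removeRuneCopy(s string, pos int) string {
	r := []rune(s)
	if pos < 0 || pos >= len(r) {
		return s
	}
	t := make([]rune, len(r)-1)
	w := copy(t, r[:pos])
	copy(t[w:], r[pos+1:])
	return string(t)
}

// sink keeps the results alive so the compiler cannot discard the calls.
var sink string

func main() {
	const input = "کیا حال ھے؟"

	a := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			sink = removeRuneShift(input, 3)
		}
	})
	b := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			sink = removeRuneCopy(input, 3)
		}
	})

	fmt.Println("A (shift):", a)
	fmt.Println("B (copy): ", b)
}
```

Variant B allocates a second rune slice on every call, which is consistent with it being the slower of the two in the figures above.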


1- Short Answer: to replace all (n) instances of a character (or even a string):

n := -1
newR := ""
old := "µ"
buf = bytes.NewBuffer([]byte(strings.Replace(buf.String(), old, newR, n)))

2- To replace the ith instance of a character (or string) in the buffer, you may use:

buf = bytes.NewBuffer([]byte(Replace(buf.String(), oldString, newOrEmptyString, ith)))

See:

// Replace returns a copy of the string s with the ith
// non-overlapping instance of old replaced by new.
func Replace(s, old, new string, ith int) string {
    if len(old) == 0 || old == new || ith < 0 {
        return s // avoid allocation
    }
    i, j := 0, 0
    for ; ith >= 0; ith-- {
        j = strings.Index(s[i:], old)
        if j < 0 {
            return s // avoid allocation
        }
        j += i
        i = j + len(old)
    }
    t := make([]byte, len(s)+(len(new)-len(old))) // Apply replacements to buffer.
    w := copy(t, s[0:j])
    w += copy(t[w:], new)
    w += copy(t[w:], s[j+len(old):])
    return string(t[0:w])
}

Try it on The Go Playground:

package main

import (
    "bytes"
    "fmt"
    "strings"
)

func main() {
    str := `How are you?µ`
    fmt.Println(str)
    fmt.Println(Replace(str, "µ", "", 0))

    buf := bytes.NewBuffer([]byte(str))
    fmt.Println(buf.Bytes())

    buf = bytes.NewBuffer([]byte(Replace(buf.String(), "µ", "", 0)))

    fmt.Println(buf.Bytes())
}
func Replace(s, old, new string, ith int) string {
    if len(old) == 0 || old == new || ith < 0 {
        return s // avoid allocation
    }
    i, j := 0, 0
    for ; ith >= 0; ith-- {
        j = strings.Index(s[i:], old)
        if j < 0 {
            return s // avoid allocation
        }
        j += i
        i = j + len(old)
    }
    t := make([]byte, len(s)+(len(new)-len(old))) // Apply replacements to buffer.
    w := copy(t, s[0:j])
    w += copy(t[w:], new)
    w += copy(t[w:], s[j+len(old):])
    return string(t[0:w])
}

3- If you want to remove all instances of a Unicode character (the old string) from any position in the string, you may use:

strings.Replace(str, old, "", -1)

4- This also works fine for removing from a bytes.Buffer:

strings.Replace(buf.String(), old, newR, -1)

Like so:

buf = bytes.NewBuffer([]byte(strings.Replace(buf.String(), old, newR, -1)))

Here is the complete working code (try it on The Go Playground):

package main

import (
    "bytes"
    "fmt"
    "strings"
)

func main() {
    str := `کیا حال ھے؟` //How are you?
    old := `ک`
    newR := ""
    fmt.Println(strings.Replace(str, old, newR, -1))

    buf := bytes.NewBuffer([]byte(str))
    //  for _, r := range str {
    //      buf.WriteRune(r)
    //  }
    fmt.Println(buf.Bytes())

    bs := []byte(strings.Replace(buf.String(), old, newR, -1))
    buf = bytes.NewBuffer(bs)

    fmt.Println("       ", buf.Bytes())
}

output:

یا حال ھے؟
[218 169 219 140 216 167 32 216 173 216 167 217 132 32 218 190 219 146 216 159]
        [219 140 216 167 32 216 173 216 167 217 132 32 218 190 219 146 216 159]

5- strings.Replace is very efficient, see inside:

// Replace returns a copy of the string s with the first n
// non-overlapping instances of old replaced by new.
// If old is empty, it matches at the beginning of the string
// and after each UTF-8 sequence, yielding up to k+1 replacements
// for a k-rune string.
// If n < 0, there is no limit on the number of replacements.
func Replace(s, old, new string, n int) string {
  if old == new || n == 0 {
      return s // avoid allocation
  }

  // Compute number of replacements.
  if m := Count(s, old); m == 0 {
      return s // avoid allocation
  } else if n < 0 || m < n {
      n = m
  }

  // Apply replacements to buffer.
  t := make([]byte, len(s)+n*(len(new)-len(old)))
  w := 0
  start := 0
  for i := 0; i < n; i++ {
      j := start
      if len(old) == 0 {
          if i > 0 {
              _, wid := utf8.DecodeRuneInString(s[start:])
              j += wid
          }
      } else {
          j += Index(s[start:], old)
      }
      w += copy(t[w:], s[start:j])
      w += copy(t[w:], new)
      start = j + len(old)
  }
  w += copy(t[w:], s[start:])
  return string(t[0:w])
}

6 Comments

Thanks. But I think this will replace all instances of a character and not the character in the ith position in the buffer.
Thanks again, but I think you misunderstood: I want to remove whatever character that is at the ith position in the buffer. Not the ith instance of some character. Essentially, a solution should only need the index in the buffer to remove, nothing else. So I think using replace is inherently unsuitable here because replace asks for what character to substitute, not what position to remove. So if I have a buffer b containing hello, the function removeAt(b,1) should modify b so it contains hllo.
@hazrmard : remove at byte position or rune position?
Remove the ith rune in the buffer.
See the new Edit at the top; I hope this helps.