Replace whole word without using regex

Question

I am using string.Replace to replace substring A

func removeIP(text string) string {
    text = strings.Replace(text, "someWord", "**NewWord**", -1)
    return text
}

func removeIPUsingRegex(text string) string {
    var re = regexp.MustCompile(`\b` + "someWord" + `\b`) // I want to replace whole word only
    text = re.ReplaceAllString(text, "**NewWord**")
}

The Issue I am facing here is, I want to replace a whole word only if is not supported by string replace.
As I have to replace for very very huge strings may be in GBs. Regex is very slow compare to string replace.
eg: text: "abcdef defgh /def/ .def/ =def= def xxxy" -> Replace def with DEF
output: "abcdef defgh /DEF/ .DEF/ =DEF= DEF xxxy" //Notice only whole words have been replaced.

Regex bumps the time by almost 100 times (https://medium.com/codezillas/golang-replace-vs-regexp-de4e48482f53). Any Ideas will be much appreciated.

The reason regex takes longer is that it's more complex, which is also why it actually works. Simple string replace is too simplistic to handle what you're drying to do. You could write you own replace function, iterating the string character by character, which would likely have performance somewhere between strings.Replace and regexp.ReplaceAllString, but it would also require more effort in development and testing. — Adrian
– Adrian, Commented Feb 28, 2020 at 17:29
If your text is several GBs big, and you access it using an io.Reader, you could also look at this question about replacing characters while streaming, it however doesn't answer the "whole word" thing — xarantolus
– xarantolus, Commented Feb 28, 2020 at 19:01
You can use very fast string search with Index() to find the places where your substring matches. Then check the character before and after the match. If they are not letters then replace this occurence of a substring. This will be just slightly slower than Replace. — Alex Sveshnikov
– Alex Sveshnikov, Commented Feb 28, 2020 at 19:31
An addition to my previous comment: to find a position of a substring starting at certain index inside the string use Slice() and then Index(): stackoverflow.com/questions/54090467/… — Alex Sveshnikov
– Alex Sveshnikov, Commented Feb 28, 2020 at 19:44
Yea, stream reading is the most likely answer. Go's re2 is already one of the most efficient regex engines but still not good for (really) huge files. — wp78de
– wp78de, Commented Feb 29, 2020 at 0:00

Amit · Accepted Answer · 2020-02-29 06:49:02Z

KMP ALGorithm used
// ReplaceWholeWord ...

 func ReplaceWholeWord(text string, oldWord string, newWord string) string {
        var patternLength = len(oldWord)
        var textLength = len(text) 

        var copyIndex = 0
        var textIndex = 0
        var patternIndex = 0
        var newString strings.Builder
        var lps = computeLPSArray(oldWord)

        for textIndex < textLength {
            if oldWord[patternIndex] == text[textIndex] {
                patternIndex++
                textIndex++
            }
            if patternIndex == patternLength {
                startIndex := textIndex - patternIndex
                endIndex := textIndex - patternIndex + patternLength - 1

                if checkIfWholeWord(text, startIndex, endIndex) {
                    if copyIndex != startIndex {
                        newString.WriteString(text[copyIndex:startIndex])
                    }
                    newString.WriteString(newWord)
                    copyIndex = endIndex + 1
                }

                patternIndex = 0
                textIndex = endIndex + 1

            } else if textIndex < textLength && oldWord[patternIndex] != text[textIndex] {

                if patternIndex != 0 {
                    patternIndex = lps[patternIndex-1]

                } else {
                    textIndex = textIndex + 1
                }

            }
        }
        newString.WriteString(text[copyIndex:])

        return newString.String()
    }

    func computeLPSArray(pattern string) []int {
        var length = 0
        var i = 1
        var patternLength = len(pattern)

        var lps = make([]int, patternLength)

        lps[0] = 0

        for i = 1; i < patternLength; {
            if pattern[i] == pattern[length] {
                length++
                lps[i] = length
                i++

            } else {

                if length != 0 {
                    length = lps[length-1]

                } else {
                    lps[i] = length
                    i++
                }
            }
        }
        return lps
    }

    func checkIfWholeWord(text string, startIndex int, endIndex int) bool {
        startIndex = startIndex - 1
        endIndex = endIndex + 1

        if (startIndex < 0 && endIndex >= len(text)) ||
            (startIndex < 0 && endIndex < len(text) && isNonWord(text[endIndex])) ||
            (startIndex >= 0 && endIndex >= len(text) && isNonWord(text[startIndex])) ||
            (startIndex >= 0 && endIndex < len(text) && isNonWord(text[startIndex]) && isNonWord(text[endIndex])) {
            return true
        }

        return false
    }

    func isNonWord(c byte) bool {
        return !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c == '_'))
    }

This is same as general string replace algorithm. I have used extara spaces but space is not an issue for me

Collectives™ on Stack Overflow

Replace whole word without using regex

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related