Is the program correct? Not if the input is imperfect. For example, switch the script start and end tags,
html := `aaaa<scriptxxx</script>bbb`
fmt.Println(html)
fmt.Println(removeScripts(html))
html = `aaaa</script>xxx<script>bbb`
fmt.Println(html)
fmt.Println(removeScripts(html))
Output:
aaaa<scriptxxx</script>bbb
aaaabbb
aaaa</script>xxx<script>bbb
panic: runtime error: slice bounds out of range
The Go programming language was designed to operate at Google scale. Go programs are usually written to be reasonably efficient. For example, assume that your function is used by Google's search web crawler on billions of HTML pages (with multiple scripts per page) per day.
I see opportunities to make your function more efficient by using CPU time and memory only when necessary. To estimate how much your function might be improved, I ran some Go benchmarks.
old.txt (Lansana):
goos: linux
goarch: amd64
pkg: cr/script
BenchmarkLansana-4 2000 604549 ns/op 79872 B/op 16 allocs/op
new.txt (PeterSO):
goos: linux
goarch: amd64
pkg: cr/script
BenchmarkPeterSO-4 100000 11039 ns/op 10240 B/op 2 allocs/op
old.txt (Lansana) versus new.txt (PeterSO):
benchmark old ns/op new ns/op delta
BenchmarkScripts-4 604549 11039 -98.17%
benchmark old allocs new allocs delta
BenchmarkScripts-4 16 2 -87.50%
benchmark old bytes new bytes delta
BenchmarkScripts-4 79872 10240 -87.18%
You should try to make your function more efficient. Don't restart at the beginning, don't make unnecessary string and other allocations, don't copy unnecessarily, and so on. Benchmark critical functions and methods.
script_test.go:
package main
import (
"strings"
"testing"
)
// benchmark
var (
scriptElement = `<script type="text/javascript">` + strings.Repeat(` JavaScript `, 8) + `</script>`
htmlElement = ` ` + scriptElement + strings.Repeat(`X`, 1024) + scriptElement + ` `
html = strings.Repeat(htmlElement, 4)
)
// removeScripts removes all HTML script elements.
func removeScriptsPeterSO(s string) string {
const (
startTag = "<script"
endTag = "</script>"
)
start := strings.Index(s, startTag)
if start < 0 {
return s
}
b := make([]byte, start, len(s))
copy(b, s)
for {
end := strings.Index(s[start+len(startTag):], endTag)
if end < 0 {
b = append(b, s[start:]...)
break
}
end += (start + len(startTag)) + len(endTag)
start = strings.Index(s[end:], startTag)
if start < 0 {
b = append(b, s[end:]...)
break
}
start += end
b = append(b, s[end:start]...)
}
return string(b)
}
func BenchmarkPeterSO(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
removeScriptsPeterSO(html)
}
}
func removeScriptsLansana(s string) string {
startingScriptTag := "<script"
endingScriptTag := "</script>"
var script string
for {
startingScriptTagIndex := strings.Index(s, startingScriptTag)
endingScriptTagIndex := strings.Index(s, endingScriptTag)
if startingScriptTagIndex > -1 && endingScriptTagIndex > -1 {
script = s[startingScriptTagIndex : endingScriptTagIndex+len(endingScriptTag)]
s = strings.Replace(s, script, "", 1)
continue
}
break
}
return s
}
func BenchmarkLansana(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
removeScriptsLansana(html)
}
}
For a program to be correct, maintainable, and reasonably efficient it must be readable. This is not readable:
func removeScripts(s string) string {
script = s[startingScriptTagIndex : endingScriptTagIndex+len(endingScriptTag)]
}
This is more readable:
func removeScripts(s string) string {
script = s[start : end+len(endTag)]
}
< scriptand probably a number of other variations that will work in a browser (peterSO's answer has the same problem). Also wouldn't be surprised if there are ways to get browsers to run code even when there's no end</script>. If security is your aim, I recommend goquery or, even better, bluemonday. \$\endgroup\$