Changing text in HTML with Golang without affecting HTML tags

Question

Here is some HTML:

<h2>Relative URLs</h2>
<p><a href="html_images.asp">HTML Images</a></p>
<p><a href="/css/default.asp">CSS Tutorial</a></p>

How can I replace text, change its case, or otherwise modify it without affecting any HTML tags, using Golang? For example:

<h2>RELATIVE URLS</h2>
<p><a href="html_images.asp">HTML IMAGES</a></p>
<p><a href="/css/default.asp">CSS TUTORIAL</a></p>

I am new to Go. Could you please show me in more detail? Thanks a lot!
– Seomat, commented Oct 8, 2022 at 13:46

score 1 · Accepted Answer · 2022-10-08 15:42:04Z

You can try an XPath-based parser such as htmlquery:

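// Imports assumed: fmt, log, strings, github.com/antchfx/htmlquery, golang.org/x/net/html
s := `<html><head></head><body><h2>Relative URLs</h2>
<p><a href="html_images.asp">HTML Images</a></p></body></html>`

doc, err := htmlquery.Parse(strings.NewReader(s))
if err != nil {
  log.Fatal(err)
}
fmt.Printf("Before update \n%s\n", htmlquery.OutputHTML(doc, true))

// "//*" selects every element below <body>, at any depth.
nodes := htmlquery.Find(doc, "/html/body//*")

for _, node := range nodes {
  // Visit every child, not only FirstChild (which is nil for empty elements).
  for c := node.FirstChild; c != nil; c = c.NextSibling {
    // Type is a stricter test than DataAtom == 0, which is also true
    // for comments and for tag names the parser does not know.
    if c.Type == html.TextNode {
      c.Data = strings.ToUpper(c.Data)
    }
  }
}
fmt.Printf("After update \n%s\n", htmlquery.OutputHTML(doc, true))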

Output

Before update 
<html><head></head><body><h2>Relative URLs</h2>
<p><a href="html_images.asp">HTML Images</a></p></body></html>
After update 
<html><head></head><body><h2>RELATIVE URLS</h2>
<p><a href="html_images.asp">HTML IMAGES</a></p></body></html>
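
For small, simple fragments like the one in the question, the same idea can be sketched with only the standard library. This is not the htmlquery approach above but a naive character scan: it upper-cases everything outside `<...>`. The function name upperOutsideTags is made up for illustration, and the sketch assumes the input has no comments, no script or style blocks, no `>` inside attribute values, and no character references such as `&nbsp;` (which it would corrupt). For anything beyond that, a real parser is the safer choice.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// upperOutsideTags (hypothetical helper) upper-cases every rune that is
// not between a '<' and the next '>'. It is a plain scan, not an HTML parser.
func upperOutsideTags(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	inTag := false
	for _, r := range s {
		switch {
		case r == '<':
			inTag = true // start of a tag: copy verbatim from here
		case r == '>':
			inTag = false // end of a tag: text may follow
		case !inTag:
			r = unicode.ToUpper(r) // plain text between tags
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	in := `<h2>Relative URLs</h2>
<p><a href="html_images.asp">HTML Images</a></p>
<p><a href="/css/default.asp">CSS Tutorial</a></p>`
	fmt.Println(upperOutsideTags(in))
}
```

Run on the markup from the question, this leaves the tags and href values untouched and upper-cases only the text between them.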

edited Oct 8, 2022 at 15:42

answered Oct 8, 2022 at 8:29

user19812413

7 Comments

Seomat Over a year ago

Thank you very much for your example! But as you can see, "Relative URLs" is still not in upper case. How can I make ALL the text upper case, apart from the tags?

Seomat Over a year ago

With node := htmlquery.FindOne(doc, "/html/body/*"), only "Relative URLs" changes.

user19812413 Over a year ago

@Seomat You need to use "/html/body//*" to match all nodes in the body, and htmlquery.Find, which returns a list of the matched nodes (all of them in this case). The update can be done based on the DataAtom of the node.

Zach Young Over a year ago

@Seomat, if this answer solved your problem, please accept it (click the ✔️ near the top left of the answer)

user19812413 Over a year ago

Check node.FirstChild for nil in the loop.
