
I have an HTML document:

<value>1,2,3</value>
 <value>,1,3,5</value>

I want to extract the text with the code below, but it only prints the tag name ("value") rather than the content. How can I print the text between the tags instead, using the Go html package (golang.org/x/net/html)?

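package main

import (
    "fmt"
    "strings"

    "golang.org/x/net/html"
)

func main() {
    b := strings.NewReader("<value>1,2,3</value>\n<value>,1,3,5</value>")
    z := html.NewTokenizer(b)
    for {
        tt := z.Next()
        switch {
        case tt == html.ErrorToken:
            return // end of input (io.EOF) or a read error
        case tt == html.StartTagToken:
            t := z.Token()
            isAnchor := t.Data == "value"
            if isAnchor {
                // For a start tag, t.Data is the tag name, so this prints "value".
                fmt.Println(t.Data)
            }
        }
    }
}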
  • Is the Text() method what you're looking for? godoc.org/golang.org/x/net/html#Tokenizer.Text Commented Nov 22, 2016 at 13:31
  • Yeah, but I'm not sure how to use it here. t.Text? Commented Nov 22, 2016 at 13:35
  • I think a StartTagToken's Data will always contain the tag's name (in this case "value"). You should advance the tokenizer once more to get the TextToken; its Data should be the text itself (i.e. "1,2,3"). Commented Nov 22, 2016 at 13:42

2 Answers


This seems to work for me:

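package main

import (
    "fmt"
    "log"
    "strings"

    "golang.org/x/net/html"
)

func main() {
    r := strings.NewReader("<value>1,2,3</value><value>,1,3,5</value>")
    doc, err := html.Parse(r)
    if err != nil {
        log.Fatal(err)
    }
    var f func(*html.Node)
    f = func(n *html.Node) {
        // The text between the tags is a TextNode child of the "value" element;
        // the nil check avoids a panic on an empty <value></value>.
        if n.Type == html.ElementNode && n.Data == "value" &&
            n.FirstChild != nil && n.FirstChild.Type == html.TextNode {
            fmt.Println(n.FirstChild.Data)
        }
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            f(c)
        }
    }
    f(doc)
}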

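I think the key is grabbing the FirstChild after finding the "value" node: the text between the tags is not stored on the element node itself but in its first child, a TextNode whose Data is the text.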


You have to call the Text() method after advancing the tokenizer to the next token.

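if isAnchor := t.Data == "value"; isAnchor {
    // Advance to the token that follows <value>; Text only has data for text tokens.
    if z.Next() == html.TextToken {
        // Text returns a []byte, so convert it before printing.
        fmt.Println(string(z.Text()))
    }
}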

Comments

  • z.Next().Text undefined (type html.TokenType has no field or method Text)
  • Edited. Sorry for the mistake.
  • Now: z.Next().Token undefined (type html.TokenType has no field or method Token)
  • Edited again. Hope I don't have to do it again ;)
  • For some reason it prints empty brackets [] as the text. I think I had that before. I tried changing it to <value>1</value>, with a single character, but it makes no difference.