xmlRoot is not returning a valid element when parsing an html document using XML package in R

Question

I want to parse the HTML and pull out specific pieces with xpathSApply, but the xmlRoot call returns an element whose name is the text of the entire document:

> url <- "http://www.achaea.com/game/who"
> doc <- htmlParse(url)
> top <- xmlRoot(doc)
> xmlName(top)

This displays the entire HTML document as the 'name' instead of the name of the root element. What is causing this behavior? I want to pull out the individual names in the 'honors' hrefs.
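For comparison, in a fresh R session the root of a document returned by htmlParse should be the `<html>` element, because libxml2's HTML parser always builds an `<html>` root. Below is a minimal offline check. It parses a small inline HTML string (hypothetical content, not the real page) instead of fetching the URL, so it does not depend on the network or on the live site:

```r
library(XML)

# Hypothetical stand-in for the downloaded page; parsing a string avoids the network.
html <- "<html><body><div id='content'><a href='/game/honors/Alice'>Alice</a></div></body></html>"

doc <- htmlParse(html, asText = TRUE)
top <- xmlRoot(doc)

# For a document produced by htmlParse, the root element is <html>.
print(xmlName(top))
```

If this prints `"html"` but your own script does not, a stale object left in the workspace from earlier code is a likely suspect.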

Thanks Randy, you are right. I exited RStudio, started it back up, and got your results. It looks like some interaction with previously executed code; I should have used rm on doc to start with a clean slate. This resolves my problem. – gregbowman, Apr 13, 2014 at 1:52
I thought I had misunderstood your question and removed the comment. Anyway, see if my answer produces what you want. – Randy Lai, Apr 13, 2014 at 1:54
Please consider removing the question, since it was not a problem. – hrbrmstr, Apr 13, 2014 at 1:55

Randy Lai · Accepted Answer · 2014-04-13 01:52:05Z

Try

xpathSApply(top, "//div[@id='content']//a", xmlValue)

If you want the links:

xpathSApply(top, "//div[@id='content']//a", xmlGetAttr, "href")
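Here is a self-contained sketch of the same approach. It assumes the player links sit inside `<div id="content">` and that their hrefs contain "honors". Both the HTML snippet and that URL pattern are hypothetical stand-ins for the real page. Extra arguments to xpathSApply are passed on to the function, so `xmlGetAttr, "href"` returns the href of each matched node:

```r
library(XML)

# Hypothetical snippet mimicking a "who" page: player links inside <div id="content">.
html <- paste0(
  "<html><body>",
  "<div id='content'>",
  "<a href='/game/honors/Alice'>Alice</a>",
  "<a href='/game/honors/Bob'>Bob</a>",
  "</div>",
  "<div id='footer'><a href='/about'>About</a></div>",
  "</body></html>"
)

doc <- htmlParse(html, asText = TRUE)

# Link text (the player names) of every <a> under the content div.
players <- xpathSApply(doc, "//div[@id='content']//a", xmlValue)

# The href attribute of the same nodes: one string per matched <a>.
links <- xpathSApply(doc, "//div[@id='content']//a", xmlGetAttr, "href")

# Keep only links whose path contains "honors" (hypothetical URL pattern).
honors <- links[grepl("honors", links, fixed = TRUE)]

print(players)
print(honors)
```

The footer link is excluded because the XPath is restricted to descendants of the content div.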