
I am trying to scrape text values from a website. I have been able to parse the URL, but I am new to XPath in R, so I am not sure how to pull out all the text values that sit inside tags like this one:

<p class="MsoNormal" align="justify"> text </p>

How do I specify the path to this specific tag and get its text value? This is what I am trying right now:

pizzaraw<-xpathSApply(pizzadoc, "//p[@class='MsoNormal']", xmlValue)

Is this the right approach? R does not seem to respond to the code.

  • Quick summary of XPath: //p will give you all p elements (ignoring nesting). //p[1] will return every p that is the first p child of its parent; use (//p)[1] for the first p in the whole document. (//p)[1]/text() will return its text nodes. (//p)[1]/@class will return the contents of its class attribute, and so on. Commented Apr 17, 2014 at 20:43
  • It might be helpful to look at the selectr package as well. It lets you use CSS selectors rather than XPath expressions in tandem with the XML package. It also makes namespaces easy to handle, which may be the problem you are having here. Commented Apr 17, 2014 at 20:59
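
The XPath forms summarised in the first comment can be tried on a small page. This is only a sketch: the markup and the variable names (`html`, `all_p`, `first_p`, and so on) are made up for illustration, and it assumes the XML package is installed.

```r
library(XML)

# Hypothetical sample page, used only to illustrate the XPath forms above
html <- '<html><body>
<div><p class="MsoNormal">first</p></div>
<div><p class="MsoNormal">second</p><p class="other">third</p></div>
</body></html>'
doc <- htmlParse(html, asText = TRUE)

# Every p element, regardless of nesting
all_p <- xpathSApply(doc, "//p", xmlValue)

# The first p in the whole document
first_p <- xpathSApply(doc, "(//p)[1]", xmlValue)

# Every p that is the first p child of its own parent
first_in_parent <- xpathSApply(doc, "//p[1]", xmlValue)

# The text node(s) of the first p
p_text <- xpathSApply(doc, "(//p)[1]/text()", xmlValue)

# The class attribute of the first p (names dropped for readability)
p_class <- unname(xpathSApply(doc, "(//p)[1]/@class"))
```

On this sample, `//p[1]` matches both "first" and "second" because each is the first p inside its own div, while `(//p)[1]` matches only "first".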

1 Answer

It's difficult to know what is wrong, given that your example is not self-contained, but here is a self-contained one that works:

Lines <- '<html>
<p class="MsoNormal" align="justify"> text </p>
</html>
'

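library(XML)
# Parse the HTML string into an internal document so that XPath can be applied
root <- htmlTreeParse(Lines, asText = TRUE, useInternalNodes = TRUE)
doc <- xmlRoot(root)
# Return the trimmed text of every p whose class attribute is exactly "MsoNormal"
xpathSApply(doc, '//p[@class="MsoNormal"]', xmlValue, trim = TRUE)
## [1] "text"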
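
One thing worth checking on the real page, offered as a guess since the original document is not shown: the comparison `@class="MsoNormal"` only matches when the attribute is exactly that string. If some paragraphs carry a second class, a token-based test picks them up as well. The markup below is hypothetical, and `exact`, `token` and `token_xpath` are illustrative names.

```r
library(XML)

# Hypothetical markup: the first paragraph has a second class token
Lines <- '<html>
<p class="MsoNormal intro" align="justify"> text </p>
<p class="MsoNormal" align="justify"> more text </p>
</html>'
doc <- htmlParse(Lines, asText = TRUE)

# Exact string comparison: only matches class="MsoNormal"
exact <- xpathSApply(doc, '//p[@class="MsoNormal"]', xmlValue, trim = TRUE)

# Token comparison: matches any p whose class list contains MsoNormal
token_xpath <- "//p[contains(concat(' ', normalize-space(@class), ' '), ' MsoNormal ')]"
token <- xpathSApply(doc, token_xpath, xmlValue, trim = TRUE)
```

Here `exact` holds only the second paragraph, while `token` holds both. When an expression matches nothing, `xpathSApply` returns an empty list, so checking `length()` of the result is a quick way to tell a non-matching path from a parsing problem.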