I need to extract a block of text from each of a set of Google results obtained using
require(XML)
require(RCurl)
# "+" encodes a space in a query string; a bare "%" is not valid URL encoding
input <- "R+statistical+Software"
url <- paste("https://www.google.com/search?q=\"",
input, "\"", sep = "")
CAINFO = paste(system.file(package="RCurl"), "/CurlSSL/ca-bundle.crt", sep = "")
script <- getURL(url, followlocation = TRUE, cainfo = CAINFO)
doc <- htmlParse(script)
in the R package XML.
An extract of the parsed HTML document is as follows:
</ul></div>
</div>
</div>
<span class="st">R, also called GNU S, is a strongly functional language and environment to <br>
statistically explore data sets, make many graphical displays of data from custom<br>
 <b>...</b></span><br>
</div>
<table class="slk" cellpadding="0" cellspacing="0" style="border-collapse:collapse;margin-top:1px">
<tr class="mslg">
<td style="padding-left:23px;vertical-align:top"><div class="sld">
In this example I need to extract the following text, and the equivalent text for each result returned:
"R, also called GNU S, is a strongly functional language and environment to
statistically explore data sets, make many graphical displays of data from custom
"
I have had a go with some of the functions in the XML package for R, but I don't think I understand enough about HTML and XML. The text will vary for each result returned, so it's actually the contents of the
<span class="st">
element (if that is the right term) that I need to extract. As you have probably guessed, I am not familiar with HTML or XML, so any recommendations for a good tutorial or book that would give me enough of an overview to solve this kind of problem would be most welcome. Thanks.
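For reference, the kind of extraction being asked about might look roughly like the sketch below. It is only a sketch under assumptions: it parses a trimmed copy of the fragment shown above from a string (the `html` variable is a stand-in, not the live Google page), and it assumes the snippet text always sits in a span whose class attribute is exactly "st", which may not hold for every result or for future page layouts.

```r
library(XML)

# Stand-in for the downloaded page: a trimmed copy of the fragment shown above.
html <- '<div><span class="st">R, also called GNU S, is a strongly functional language and environment to <br>
statistically explore data sets, make many graphical displays of data from custom<br>
 <b>...</b></span><br></div>'

# asText = TRUE tells htmlParse that the argument is HTML content, not a file name or URL.
doc <- htmlParse(html, asText = TRUE)

# XPath: select every <span> whose class attribute equals "st".
# xmlValue returns the text of each matched node, including text in child nodes.
snippets <- xpathSApply(doc, "//span[@class='st']", xmlValue)

# Collapse the line breaks and repeated whitespace left over from the markup.
snippets <- trimws(gsub("\\s+", " ", snippets))

print(snippets)
```

With the real page, the same `xpathSApply` call would be applied to the `doc` object produced by `htmlParse(script)` and should return one character string per matching span.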