I'm working with XML files from clinicaltrials.gov, which have a structure like this:
<clinical_study>
...
<brief_title>
...
<location>
<facility>
<name>
<address>
<city>
<state>
<zip>
<country>
</facility>
</location>
<location>
...
</location>
...
</clinical_study>
I'm gathering information from multiple XML files, so the number of locations in each file is unknown and could even be zero. I need to extract all the information about each location and save it to an SQL table. I've had some success using functions from the XML package to extract information from single nodes, for example:
library(XML)
nct_url <- "http://clinicaltrials.gov/ct2/show/NCT00112281?resultsxml=true"
xml_doc <- xmlParse(nct_url, useInternalNodes=TRUE)
title_path <- "/clinical_study/brief_title"
title_text <- xpathSApply(xml_doc, title_path, xmlValue)
I'm experimenting with getNodeSet, and this gives me a set of the right length:
doc <- xmlParse("NCT00007501.xml")
locations <- getNodeSet(doc, "/clinical_study/location")
length(locations)
# [1] 22
class(locations)
# [1] "XMLNodeSet"
but my attempts to extract information from this set have been mostly fruitless. Any suggestions?
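To make the goal concrete, here is a minimal sketch of the kind of result I'm after: one row per location, with NA where a field is missing. It runs relative XPath queries against each node in the set. The inline XML is made up for illustration, the helper `first_value` and the `fields` vector are hypothetical names of my own, and the `.//` paths assume each tag appears at most once somewhere under a location.

```r
library(XML)

# Made-up sample document; the real files have many more elements.
xml_text <- '<clinical_study>
  <brief_title>Example study</brief_title>
  <location><facility><name>Site A</name><address>
    <city>Boston</city><state>Massachusetts</state>
    <zip>02115</zip><country>United States</country>
  </address></facility></location>
  <location><facility><name>Site B</name><address>
    <city>Lyon</city><country>France</country>
  </address></facility></location>
</clinical_study>'

doc <- xmlParse(xml_text, asText = TRUE)
locations <- getNodeSet(doc, "/clinical_study/location")

# Hypothetical helper: text of the first match for a relative XPath, or NA if absent.
first_value <- function(node, path) {
  hits <- xpathSApply(node, path, xmlValue)
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# Column name -> XPath relative to each <location> node (assumed layout).
fields <- c(name = ".//name", city = ".//city", state = ".//state",
            zip = ".//zip", country = ".//country")

# Build one single-row data frame per location, then stack them.
rows <- lapply(locations, function(loc) {
  as.data.frame(lapply(fields, function(p) first_value(loc, p)),
                stringsAsFactors = FALSE)
})
location_df <- do.call(rbind, rows)  # NULL when there are no <location> nodes
```

If something like this is a reasonable direction, `location_df` could then be handed to a database writer such as `DBI::dbWriteTable`, but I'd welcome a more idiomatic way to get there.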