Given the following XML file:

<XML>
  <A>
    <B>
      <ID>1</ID>
    </B>
    <C>
      <D>10</D>
      <D>20</D>
    </C>
  </A>
  <A>
    <B>
      <ID>2</ID>
    </B>
    <C>
      <D>30</D>
      <D>50</D>
    </C>
  </A>
</XML>

With the following R code I can read in the XML file:

library(XML)
xmlobj <- xmlTreeParse("my_file.xml", useInternalNodes = TRUE)

First, I would like to get a list of the XML nodes "A". I can do this with

node_a <- xpathSApply(doc = xmlobj, path = "//A", xmlChildren)

and the result (node_a) looks like this:

  [,1] [,2]
B ?    ?   
C ?    ?   

In a second step, I would like to call a function on each of the XML nodes in the list extracted in step 1, returning a list of the XML nodes "D". I tried to get the children of "C" for the first "A" element in the list from step 1:

xmlChildren(asXMLNode(node_a["C",1]))

But the result is:

named list()
attr(,"class")
[1] "XMLNodeList"

Finally, I would like to have the values of D separately for each A (one list of D values for A with ID 1 and one list of D values for A with ID 2).

Or in other words, I want to get a list with the values of all D elements which are part of element A with ID 1 and another list with the values of all D elements which are part of element A with ID 2.

2 Answers

Calling the xml text at the beginning of your question xmlText,

library(XML)
xml <- xmlParse(xmlText, asText = TRUE)
lapply(xml["//A//C"], function(node) sapply(xmlElementsByTagName(node, "D"), xmlValue))
# [[1]]
#    D    D 
# "10" "20" 
#
# [[2]]
#    D    D 
# "30" "50" 

If you want integers instead of character and you don't want the names,

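get.D <- function(node) {
  # Collect the direct <D> children of the node and convert their text to integers
  d.nodes <- xmlElementsByTagName(node, "D")
  unname(sapply(d.nodes, function(n) as.integer(xmlValue(n))))
}
lapply(xml["//A//C"], get.D)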
# [[1]]
# [1] 10 20
#
# [[2]]
# [1] 30 50
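
If the XML lives in a file rather than in a string, a minimal sketch along these lines should work. The temporary file and the names xml_path, xml_from_file, xml_from_text and get_d are assumptions made for illustration only; in practice you would pass your own path (for example "my_file.xml") to xmlParse.

```r
library(XML)

# Write the question's XML to a temporary file so the sketch is self-contained
# (xml_path is a hypothetical stand-in for "my_file.xml")
xml_path <- tempfile(fileext = ".xml")
writeLines(c(
  "<XML>",
  "  <A><B><ID>1</ID></B><C><D>10</D><D>20</D></C></A>",
  "  <A><B><ID>2</ID></B><C><D>30</D><D>50</D></C></A>",
  "</XML>"
), xml_path)

# Option 1: let xmlParse read the file directly
xml_from_file <- xmlParse(xml_path)

# Option 2: read the file into a single string, then parse it as text
xmlText <- paste(readLines(xml_path), collapse = "\n")
xml_from_text <- xmlParse(xmlText, asText = TRUE)

# Extract the D values below each C node, one vector per A
get_d <- function(doc) {
  lapply(getNodeSet(doc, "//A//C"), function(node) {
    as.integer(xpathSApply(node, "./D", xmlValue))
  })
}

d_from_file <- get_d(xml_from_file)
d_from_text <- get_d(xml_from_text)
```

Both routes should give the same parsed document, so the string variable is only needed if you want the raw text for some other reason.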

1 Comment

Thanks a lot, that looks very nice! Maybe it's a stupid question but could you also tell me how to get the XML text from the xml file to the variable "xmlText"?

I'm not sure of the intermediate steps you want, but to get the values of D,

node_a <- xpathSApply(doc = xmlobj, path = "//D", xmlValue, trim = TRUE)

> node_a
[1] "10" "20" "30" "50"

2 Comments

The problem is that I want the values of D split by the A in which they are contained. For example, I only want the values of D for the first or the second A element (with ID 1 or with ID 2). So I tried the following: node_a <- xpathSApply(doc = xmlobj, path = "//A", xmlChildren); node_c <- xpathSApply(doc = node_a[[1]], path = "//C", xmlChildren). But the result is the same whether doc = node_a[[1]] or doc = xmlobj. How can I execute xpathSApply only on a subnode?
Use getNodeSet to return subnodes, then apply functions like sapply(getNodeSet(xmlobj, "//A"), xpathSApply, ".//D", xmlValue)
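
To spell out the suggestion above: a path that starts with "//" is absolute and is evaluated from the document root even when a subnode is passed in, whereas ".//" is relative to that subnode. The following is a rough sketch under that reading; the names xml_text, doc, a_nodes, d_by_a and all_d are introduced here for illustration and are not from the original posts.

```r
library(XML)

# The question's XML as a string, so the sketch runs on its own
xml_text <- paste(
  "<XML>",
  "  <A><B><ID>1</ID></B><C><D>10</D><D>20</D></C></A>",
  "  <A><B><ID>2</ID></B><C><D>30</D><D>50</D></C></A>",
  "</XML>",
  sep = "\n"
)
doc <- xmlParse(xml_text, asText = TRUE)

# One internal node per A element
a_nodes <- getNodeSet(doc, "//A")

# ".//D" only looks below the current A node
d_by_a <- lapply(a_nodes, function(a) {
  as.integer(xpathSApply(a, ".//D", xmlValue))
})

# Label each result with the ID found under that A
names(d_by_a) <- sapply(a_nodes, function(a) xpathSApply(a, "./B/ID", xmlValue))

# For comparison: "//D" on a subnode still searches the whole document
all_d <- xpathSApply(a_nodes[[1]], "//D", xmlValue)
```

With the names set this way, d_by_a[["1"]] holds the D values for the A with ID 1 and d_by_a[["2"]] those for ID 2, while all_d contains every D in the document despite being called on the first A node.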
