I want to parse an XML file with the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<TopLevel FileFormat = "Config">
    <ObjectList ObjectType = "Type1">
        <o><a>value111</a><b>value121</b><c>value131</c></o>
        <o><a>value112</a><b>value122</b><c>value132</c></o>
        ...
    </ObjectList>
    <ObjectList ObjectType = "Type2">
        <o><a>value21</a><b>value22</b><c>value23</c></o>
        ...
    </ObjectList>
    <ObjectList ObjectType = "Type3">
        <o><a>value31</a><b>value32</b><c>value33</c></o>
        ...
    </ObjectList>
    ...
    <ObjectList ObjectType = "TypeN">
        <o><a>valueN1</a><b>valueN2</b><c>valueN3</c></o>
        ...
    </ObjectList>
</TopLevel>

I only need the data from one node, e.g. 'ObjectList ObjectType = "Type3"'. It may not be the node in the 3rd position, so I have to select it by its ObjectType attribute. Finally, the children of this node (a, b, c) should be stored in a data frame.

  • How can I retrieve this node?
  • How can I extract the child data into a data frame?

Any ideas? Thanks in advance!

2 Answers

Use the XML package to parse the XML:

library(XML)
### load the XML
d <- xmlTreeParse("test.xml")
top <- xmlRoot(d)

Use XPath to query what you need. Here, select all ObjectList nodes whose ObjectType attribute is 'Type3':

n <- getNodeSet(top, "//ObjectList[@ObjectType='Type3']")

> n
[[1]]
<ObjectList ObjectType="Type3">
 <o>
  <a>value31</a>
  <b>value32</b>
  <c>value33</c>
 </o>
</ObjectList>
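
To check that the XPath selects by attribute rather than by position, here is a minimal sketch on an inline document. The XML string is made up for illustration (the Type3 list is deliberately placed first), and `xml_text`, `doc` and `nodes` are just names picked for this example:

```r
library(XML)

# Hypothetical sample document: the Type3 list is NOT in third position.
xml_text <- '<TopLevel FileFormat="Config">
  <ObjectList ObjectType="Type3"><o><a>value31</a><b>value32</b><c>value33</c></o></ObjectList>
  <ObjectList ObjectType="Type1"><o><a>value111</a><b>value121</b><c>value131</c></o></ObjectList>
</TopLevel>'

# Parse the string (asText = TRUE) into an internal document.
doc <- xmlParse(xml_text, asText = TRUE)

# The predicate [@ObjectType='Type3'] filters on the attribute value.
nodes <- getNodeSet(doc, "//ObjectList[@ObjectType='Type3']")

length(nodes)                          # number of matching ObjectList nodes
xmlGetAttr(nodes[[1]], "ObjectType")   # attribute of the first match
```

Because the predicate tests the attribute, the same expression should work wherever the node sits in the file.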

Convert the structure inside each matching node into a matrix:

m <- lapply(n, function(o)
       t(sapply(xmlChildren(o),
         function(x) xmlSApply(x, xmlValue))))

> m
[[1]]
  a         b         c        
o "value31" "value32" "value33"

You can combine all of them (i.e. if you have multiple matching ObjectList objects) into a data frame:

df <- as.data.frame(do.call("rbind", m))

> df
        a       b       c
o value31 value32 value33

1 Comment

This answer helped me to better understand XML parsing. Thank you!

Try xmlToDataFrame from the XML package, which turns a set of nodes directly into a data frame:

library(XML)
doc <- xmlParse("test.xml")
xmlToDataFrame(doc["//ObjectList[@ObjectType='Type3']/o"])
        a       b       c
1 value31 value32 value33
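
For a self-contained variant with more than one row, something like the following sketch should work. The inline XML and its values are invented for the example, the names `xml_text`, `rows` and `df` are arbitrary, and passing the node set through the `nodes` argument is just one way to call it:

```r
library(XML)

# Hypothetical sample data: two <o> records under the Type3 list.
xml_text <- '<TopLevel FileFormat="Config">
  <ObjectList ObjectType="Type1">
    <o><a>value111</a><b>value121</b><c>value131</c></o>
  </ObjectList>
  <ObjectList ObjectType="Type3">
    <o><a>value311</a><b>value321</b><c>value331</c></o>
    <o><a>value312</a><b>value322</b><c>value332</c></o>
  </ObjectList>
</TopLevel>'

doc <- xmlParse(xml_text, asText = TRUE)

# Select every <o> record below the ObjectList with ObjectType 'Type3'.
rows <- getNodeSet(doc, "//ObjectList[@ObjectType='Type3']/o")

# One row per <o>, one column per child element (a, b, c).
df <- xmlToDataFrame(nodes = rows, stringsAsFactors = FALSE)
df
```

Setting `stringsAsFactors = FALSE` keeps the columns as plain character vectors, which is usually easier to convert afterwards.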

1 Comment

This is a really elegant way! Thank you!
