I have an XML dataset that looks like this:
<protocol ID='.'>
<HEAD></HEAD>
<block ID='...'>
<HEAD></HEAD>
<trial ID='.....'>
<HEAD></HEAD>
<seq ID=''>
<HEAD></HEAD>
<calibration CLASS='affine-calibration' ID='New Calibration'>
<AX>.........</AX>
<BX>-........</BX>
<AY>.........</AY>
<BY>.........</BY>
<type>'por'</type>
</calibration>
<POR TIME='......'>
<PUPIL>.</PUPIL>
<BLINK>.</BLINK>
<V>...</V>
<H>...</H>
<PLANEINTRWV>...</PLANEINTRWV>
<PLANEINTRWH>...</PLANEINTRWH>
<PLANE>.</PLANE>
</POR>
<POR TIME='......'>
<PUPIL>.</PUPIL>
<BLINK>.</BLINK>
<V>...</V>
<H>...</H>
<PLANEINTRWV>...</PLANEINTRWV>
<PLANEINTRWH>...</PLANEINTRWH>
<PLANE>.</PLANE>
</POR>
<POR TIME='......'>
<PUPIL>.</PUPIL>
<BLINK>.</BLINK>
<V>...</V>
<H>...</H>
<PLANEINTRWV>...</PLANEINTRWV>
<PLANEINTRWH>...</PLANEINTRWH>
<PLANE>.</PLANE>
</POR>
</seq>
</trial>
<trial ID='.....'>
<HEAD></HEAD>
<seq ID=''>
<HEAD></HEAD>
<calibration CLASS='affine-calibration' ID='New Calibration'>
<AX>.........</AX>
<BX>-........</BX>
<AY>.........</AY>
<BY>.........</BY>
<type>'por'</type>
</calibration>
<POR TIME='......'>
<PUPIL>.</PUPIL>
<BLINK>.</BLINK>
<V>...</V>
<H>...</H>
<PLANEINTRWV>...</PLANEINTRWV>
<PLANEINTRWH>...</PLANEINTRWH>
<PLANE>.</PLANE>
</POR>
<POR TIME='......'>
<PUPIL>.</PUPIL>
<BLINK>.</BLINK>
<V>...</V>
<H>...</H>
<PLANEINTRWV>...</PLANEINTRWV>
<PLANEINTRWH>...</PLANEINTRWH>
<PLANE>.</PLANE>
</POR>
</seq>
</trial>
</block>
</protocol>
Using the XML package, what is the cleanest way to extract the POR tag's children and the tag's attributes?
I threw together this kludge that works, but it's slow (due to the xpathSApply call most likely) and is hardly readable.
trackToDataFrame = function(file) {
doc2=xmlParse(file)
timeStamps = t(xpathSApply(doc2, '//*[@TIME]', function(x) c(name=xmlName(x), xmlAttrs(x))))
dd2 = xmlToDataFrame(getNodeSet(doc2, "//POR"), colClasses=c(rep("integer", 7)))
dd2 = cbind(dd2, timeStamps)
dd2
}
Calling on the dataset returns:
PUPIL BLINK V H PLANEINTRWV PLANEINTRWH PLANE name TIME
1 NA NA NA NA NA NA NA POR ......
2 NA NA NA NA NA NA NA POR ......
3 NA NA NA NA NA NA NA POR ......
4 NA NA NA NA NA NA NA POR ......
5 NA NA NA NA NA NA NA POR ......
I'm figuring that the whole thing can be done with a single xmlToDataFrame call, but I'm not familiar enough with the XML package to get it to work.
What I'm really interested in is the 'TIME' column along with all of the columns extracted form the xmlToDataFrame call.