I am trying to extract certain XML values from a (fairly large) document. Because I am only interested in some of the nodes, I created subsets.
library(XML)
data.raw <- xmlParse(file="in/data.xml", encoding="UTF-8")
data.top <- xmlRoot(data.raw)
subset.wkr67 <- getNodeSet(doc=data.top, "//wahl[@jahr='13']/gebiet[@schluessel='67']/wvt")
The last object looks like this (for context, these are election results with absolute vote counts for certain districts):
[[1]]
<wvt kurz="CDU" lang="Christlich Demokratische Union Deutschlands in Niedersachsen" button="CDU">
<ergebnis kurz="STWVT" lang="Zweitstimmen">
<stimmen>21478</stimmen>
<farbe>#0033CC</farbe>
<prozent>57.6</prozent>
</ergebnis>
<ergebnis kurz="STKAND" lang="Erststimmen">
<stimmen>25835</stimmen>
<farbe>#0033CC</farbe>
<prozent>69.4</prozent>
</ergebnis>
</wvt>
[[2]]
...
attr(,"class")
[1] "XMLNodeSet"
I want to extract the absolute vote counts for the different tiers and save them in separate objects. As far as I understand, this should be possible with xmlValue and sapply.
In order to extract the value of the "stimmen" element that is a child of the "ergebnis" element with the attribute kurz="STWVT" (in my example: 21478), I tried this:
sapply(subset.wkr67, xmlValue, '/wvt/ergebnis[@kurz="STWVT"]/stimmen')
[1] "21478#0033CC57.625835#0033CC69.4" "6640#FFDFDF17.86308#FFDFDF17.0" "4682#99990012.61410#FFFF993.8" "2663#CCFFCC7.11888#CCFFCC5.1"
[5] "708#C979E31.9848#B953EC2.3" "220.1" "3731.0" "830.2"
[9] "2140.6" "1520.4" "1220.3" "542#F5A5541.5541#F5A5541.5"
[13] "593#ECF0EC1.6373#ECF0EC1.0"
Somehow I extract far too much information. (Each element is basically the values of ALL child elements pasted together. The length of 13 is correct and fits the data.) (If I additionally pass the option "recursive=FALSE" to the R command, the result is a vector of the same length that contains only characters.)
How can I extract only the first value of the "stimmen" element? (21478 in my case) Thanks for your help!
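For reference, here is a minimal sketch of one possible approach. As far as the XML package documentation suggests, xmlValue does not take an XPath expression, so the string passed through sapply above is not used as a path. The sketch instead runs a relative XPath query (starting with "./") on each node via xpathSApply. The inline XML is a cut-down, hypothetical stand-in for the real file, and the helper name get.stimmen is made up for illustration.

```r
library(XML)

# Hypothetical miniature of the real document, for illustration only
xml.text <- '<wahl jahr="13"><gebiet schluessel="67">
  <wvt kurz="CDU">
    <ergebnis kurz="STWVT"><stimmen>21478</stimmen><farbe>#0033CC</farbe><prozent>57.6</prozent></ergebnis>
    <ergebnis kurz="STKAND"><stimmen>25835</stimmen><farbe>#0033CC</farbe><prozent>69.4</prozent></ergebnis>
  </wvt>
</gebiet></wahl>'

doc <- xmlParse(xml.text, asText = TRUE)
nodes <- getNodeSet(doc, "//wahl[@jahr='13']/gebiet[@schluessel='67']/wvt")

# Illustrative helper (not part of the XML package): returns the vote count
# of one tier for a single <wvt> node, or NA if that tier is missing.
get.stimmen <- function(node, tier) {
  # "./" makes the path relative to the current node instead of the document root
  path <- sprintf("./ergebnis[@kurz='%s']/stimmen", tier)
  val <- xpathSApply(node, path, xmlValue)
  if (length(val) == 0) NA_integer_ else as.integer(val[1])
}

# One vector per tier, one entry per <wvt> node
zweitstimmen <- sapply(nodes, get.stimmen, tier = "STWVT")
erststimmen  <- sapply(nodes, get.stimmen, tier = "STKAND")
```

Returning NA for a missing tier keeps both vectors the same length as the node set, which seems useful here because some of the 13 entries apparently contain only one "ergebnis" element.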