I am trying to extract certain XML values from a (pretty large) document. Because I am only interested in some nodes, I created subsets.

library(XML)
data.raw <- xmlParse(file="in/data.xml", encoding="UTF-8")
data.top <- xmlRoot(data.raw)
subset.wkr67 <-  getNodeSet(doc=data.top, "//wahl[@jahr='13']/gebiet[@schluessel='67']/wvt")

The last object looks like this (fyi, these are election results with absolute vote counts for certain districts):

[[1]]
<wvt kurz="CDU" lang="Christlich Demokratische Union Deutschlands in Niedersachsen" button="CDU">
    <ergebnis kurz="STWVT" lang="Zweitstimmen">
        <stimmen>21478</stimmen>
        <farbe>#0033CC</farbe>
        <prozent>57.6</prozent>
    </ergebnis>
    <ergebnis kurz="STKAND" lang="Erststimmen">
        <stimmen>25835</stimmen>
        <farbe>#0033CC</farbe>
        <prozent>69.4</prozent>
    </ergebnis>
</wvt>

[[2]]
...   

attr(,"class")
[1] "XMLNodeSet"

I want to extract the absolute vote counts for the different tiers; they should be saved in separate objects. As far as I understand, this should be possible with xmlValue and sapply.

In order to extract the value of the "stimmen" element that is a child of the "ergebnis" element with the attribute "kurz"="STWVT" (in my example: 21478), I tried this:

sapply(subset.wkr67, xmlValue, '/wvt/ergebnis[@kurz="STWVT"]/stimmen') 
[1] "21478#0033CC57.625835#0033CC69.4" "6640#FFDFDF17.86308#FFDFDF17.0"   "4682#99990012.61410#FFFF993.8"    "2663#CCFFCC7.11888#CCFFCC5.1"    
[5] "708#C979E31.9848#B953EC2.3"       "220.1"                            "3731.0"                           "830.2"                           
[9] "2140.6"                           "1520.4"                           "1220.3"                           "542#F5A5541.5541#F5A5541.5"      
[13] "593#ECF0EC1.6373#ECF0EC1.0" 

This extracts far too much information: each element of the result is basically the values of ALL child elements pasted together. The length of 13 is correct and fits the data. (If I additionally pass the option "recursive=FALSE" to xmlValue, the result is a vector of the same length that contains only characters.)

How can I extract only the first value of the "stimmen" element? (21478 in my case) Thanks for your help!

1 Answer

Assuming the XML file contains only the data shown above (plus the XML header), so that "wvt" is the root element, try this:

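library(XML)
# Parse the file; the absolute paths below assume <wvt> is the root element
doc = xmlParseDoc("wahl.xml")
# Attributes (kurz, lang) of every ergebnis element
xpathSApply(doc,"/wvt/ergebnis",xmlAttrs)
# Text content of every stimmen element, returned as character strings
xpathSApply(doc,"/wvt/ergebnis/stimmen",xmlValue)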
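
If only the "STWVT" counts are wanted from the node set built in the question, one option is to evaluate a relative XPath against each "wvt" node instead of passing the path to xmlValue, which does not take an XPath argument. The following is only a sketch: the inline XML is a cut-down stand-in for the real file, and the second party label "XYZ" is a made-up placeholder.

```r
library(XML)

# Miniature stand-in for the real document (assumption: same nesting as in
# the question; "XYZ" is a hypothetical party label).
xml_text <- '<wahl jahr="13"><gebiet schluessel="67">
  <wvt kurz="CDU">
    <ergebnis kurz="STWVT"><stimmen>21478</stimmen><prozent>57.6</prozent></ergebnis>
    <ergebnis kurz="STKAND"><stimmen>25835</stimmen><prozent>69.4</prozent></ergebnis>
  </wvt>
  <wvt kurz="XYZ">
    <ergebnis kurz="STWVT"><stimmen>6640</stimmen><prozent>17.8</prozent></ergebnis>
    <ergebnis kurz="STKAND"><stimmen>6308</stimmen><prozent>17.0</prozent></ergebnis>
  </wvt>
</gebiet></wahl>'
doc <- xmlParse(xml_text, asText = TRUE)

# Same subset as in the question
wvt_nodes <- getNodeSet(doc, "//wahl[@jahr='13']/gebiet[@schluessel='67']/wvt")

# "./" makes the path relative to the current wvt node, and the predicate
# keeps only the ergebnis element whose kurz attribute is "STWVT".
zweit <- sapply(wvt_nodes, function(node)
  xpathSApply(node, "./ergebnis[@kurz='STWVT']/stimmen", xmlValue))

# xmlValue returns character strings; convert them to numbers
zweit <- as.numeric(zweit)
```

A "wvt" node without a matching "ergebnis" child would yield an empty result here, so the real data may need a check for that case before the conversion to numeric.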

The results should then be combined into a data frame so that each vote count keeps its descriptors (party and vote type).
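
One possible shape for that data frame is sketched below, with one row per "ergebnis" element. The column names partei, typ and stimmen are illustrative choices, the inline XML is again a cut-down stand-in for the real file, and "XYZ" is a made-up party label.

```r
library(XML)

# Miniature stand-in for the real document (assumption: same nesting as in
# the question; "XYZ" is a hypothetical party label).
xml_text <- '<wahl jahr="13"><gebiet schluessel="67">
  <wvt kurz="CDU">
    <ergebnis kurz="STWVT"><stimmen>21478</stimmen></ergebnis>
    <ergebnis kurz="STKAND"><stimmen>25835</stimmen></ergebnis>
  </wvt>
  <wvt kurz="XYZ">
    <ergebnis kurz="STWVT"><stimmen>6640</stimmen></ergebnis>
    <ergebnis kurz="STKAND"><stimmen>6308</stimmen></ergebnis>
  </wvt>
</gebiet></wahl>'
doc <- xmlParse(xml_text, asText = TRUE)

# One node per ergebnis element, across all parties
ergebnis <- getNodeSet(doc, "//wvt/ergebnis")

votes <- data.frame(
  # kurz attribute of the enclosing wvt element (the party)
  partei  = sapply(ergebnis, function(n) xmlGetAttr(xmlParent(n), "kurz")),
  # kurz attribute of the ergebnis element itself (STWVT or STKAND)
  typ     = sapply(ergebnis, xmlGetAttr, "kurz"),
  # text of the stimmen child, converted from character to numeric
  stimmen = as.numeric(sapply(ergebnis, function(n) xmlValue(n[["stimmen"]]))),
  stringsAsFactors = FALSE
)

# Separate objects per tier, as asked for in the question
zweitstimmen <- votes[votes$typ == "STWVT", ]
erststimmen  <- votes[votes$typ == "STKAND", ]
```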
