I have an XML document like this:
<Trait ID="4711" Type="Disease">
<!-- each phenotype -->
<Name>
<ElementValue Type="Preferred">Breast-ovarian cancer, familial 1</ElementValue>
<XRef ID="Breast-ovarian+cancer%2C+familial+1/7865" DB="Genetic Alliance"/>
</Name>
<Name>
<ElementValue Type="Alternate">BREAST-OVARIAN CANCER, FAMILIAL, SUSCEPTIBILITY TO, 1</ElementValue>
<XRef Type="MIM" ID="604370" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0001" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0002" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0003" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0004" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0005" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0006" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0007" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0008" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0009" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0010" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0011" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0012" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0013" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0014" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0015" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0016" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0017" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0018" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0019" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0020" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0021" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0022" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0023" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0024" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0025" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0026" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0027" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0028" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0029" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0030" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0031" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0032" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0033" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0034" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0035" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0036" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0037" DB="OMIM"/>
</Name>
<Name>
<ElementValue Type="Alternate">OVARIAN CANCER, SUSCEPTIBILITY TO</ElementValue>
<XRef Type="Allelic variant" ID="602667.0001" DB="OMIM"/>
</Name>
<Name>
<ElementValue Type="Alternate">BREAST CANCER, FAMILIAL, SUSCEPTIBILITY TO, 1</ElementValue>
<XRef ID="604370" DB="OMIM"/>
</Name>
<Name>
<ElementValue Type="Alternate">Breast cancer, familial 1</ElementValue>
</Name>
<Name>
<ElementValue Type="Alternate">Breast-ovarian cancer, familial 1 and 2</ElementValue>
<XRef ID="GTR000310494" DB="Laboratory of Genetics,HUSLAB"/>
</Name>
<Name>
<ElementValue Type="Alternate">BRCA1 Gene Mutation</ElementValue>
<XRef ID="GTR000501743" DB="Myriad Genetic Laboratories,Myriad Genetic Laboratories, Inc."/>
</Name>
</Trait>
And I want to parse the XML to a data frame like:
I'm trying to use the R package XML, but the problem is that I have more XRef attribute values than trait names. I can solve this with a for loop, but that is often not the "R way". I was wondering whether there is a simpler solution (e.g. with an XPath query).
I'm trying something like this:
x <- do.call(rbind, xpathApply(xml_1, "//TraitSet/Trait[@ID='4711']/Name", function(node) {
  trait <- xmlValue(node[["ElementValue"]])
  xp <- "//TraitSet/Trait[@ID='4711']/Name/XRef"
  DB <- sapply(c("ID","DB"), function(x) xpathSApply(xmltop, '//TraitSet/Trait/Name/XRef', xmlGetAttr, x))
  if (is.null(DB)) DB <- NA
  data.frame(trait, DB, stringsAsFactors = FALSE)
}))
but the records are multiplied incorrectly.
I would appreciate any help. Thanks!
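One direction that might work, as a rough sketch rather than a confirmed fix: the attempt above queries every XRef in the document from inside each Name callback, which is probably why the rows multiply. Querying relative to the current node (`./XRef`) should keep each XRef attached to its own Name. The sketch assumes the desired shape is one row per trait name/XRef pair, with a single NA row for names that have no XRef. The helper `name_to_rows` and the trimmed `xml_text` document are made up for illustration, and the `<TraitSet>` root is inferred from the XPath in the attempt.

```r
library(XML)

# Trimmed stand-in for the real file; the <TraitSet> root is an assumption
# taken from the XPath used in the question.
xml_text <- '<TraitSet>
<Trait ID="4711" Type="Disease">
<Name>
<ElementValue Type="Preferred">Breast-ovarian cancer, familial 1</ElementValue>
<XRef ID="Breast-ovarian+cancer%2C+familial+1/7865" DB="Genetic Alliance"/>
</Name>
<Name>
<ElementValue Type="Alternate">BREAST-OVARIAN CANCER, FAMILIAL, SUSCEPTIBILITY TO, 1</ElementValue>
<XRef Type="MIM" ID="604370" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0001" DB="OMIM"/>
<XRef Type="Allelic variant" ID="113705.0002" DB="OMIM"/>
</Name>
<Name>
<ElementValue Type="Alternate">Breast cancer, familial 1</ElementValue>
</Name>
</Trait>
</TraitSet>'
doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: turn one <Name> node into one row per child <XRef>.
name_to_rows <- function(node) {
  trait <- xmlValue(node[["ElementValue"]])
  # Relative XPath: only the XRef children of this particular <Name>.
  xrefs <- getNodeSet(node, "./XRef")
  if (length(xrefs) == 0) {
    # No cross-references: keep the trait name with NA placeholders.
    return(data.frame(trait = trait, ID = NA_character_, DB = NA_character_,
                      stringsAsFactors = FALSE))
  }
  data.frame(
    trait = trait,  # recycled once per XRef
    ID = vapply(xrefs, xmlGetAttr, character(1), name = "ID", default = NA_character_),
    DB = vapply(xrefs, xmlGetAttr, character(1), name = "DB", default = NA_character_),
    stringsAsFactors = FALSE
  )
}

name_nodes <- getNodeSet(doc, "//TraitSet/Trait[@ID='4711']/Name")
x <- do.call(rbind, lapply(name_nodes, name_to_rows))
print(x)
```

This still iterates over the Name nodes (via `lapply`), but each iteration only touches its own children, so the trait name is repeated exactly as many times as that Name has XRef entries.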