I am working with an XML file with the structure below, and I am trying to put each unique item into a data frame. I know I can retrieve each child element with the `xpathApply` function, but the number of `//channel//item//category[@domain='tag']` nodes differs from item to item. How can I put all of an item's tag categories in one cell, separated by commas? Should I loop over each child element?
Here is a sample document, stored as a string in `test.xml`:
```r
library(XML)

test.xml <- "<channel>
<item>
<title>Article Name 1</title>
<creator>User1</creator>
<post_id>1000</post_id>
<category domain='tag' nicename='red'>Red</category>
<category domain='store' nicename='clothes'>Clothes</category>
</item>
<item>
<title>Article Name 2</title>
<creator>User3</creator>
<post_id>232</post_id>
<category domain='tag' nicename='blue'>Blue</category>
<category domain='tag' nicename='green'>Green</category>
<category domain='tag' nicename='yellow'>Yellow</category>
<category domain='store' nicename='clothes'>Other</category>
</item>
<item>
<title>Article Name 3</title>
<creator>User4</creator>
<post_id>4532</post_id>
<category domain='tag' nicename='red'>Red</category>
<category domain='tag' nicename='blue'>Blue</category>
<category domain='store' nicename='clothes'>Food</category>
</item>
</channel>"

# asText = TRUE tells xmlParse to treat the string as XML content, not a file name
xml <- xmlParse(test.xml, asText = TRUE)
```
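To illustrate why the counts don't line up, here is a small sketch that runs absolute XPath queries over the whole sample with `xpathSApply` from the XML package. The names `doc`, `titles` and `tags` are illustrative only. The tag query returns one value per `<category>` node across all items, so the result can no longer be matched back to the three titles.

```r
library(XML)

# Same data as the sample above
test.xml <- "<channel>
<item>
<title>Article Name 1</title><creator>User1</creator><post_id>1000</post_id>
<category domain='tag' nicename='red'>Red</category>
<category domain='store' nicename='clothes'>Clothes</category>
</item>
<item>
<title>Article Name 2</title><creator>User3</creator><post_id>232</post_id>
<category domain='tag' nicename='blue'>Blue</category>
<category domain='tag' nicename='green'>Green</category>
<category domain='tag' nicename='yellow'>Yellow</category>
<category domain='store' nicename='clothes'>Other</category>
</item>
<item>
<title>Article Name 3</title><creator>User4</creator><post_id>4532</post_id>
<category domain='tag' nicename='red'>Red</category>
<category domain='tag' nicename='blue'>Blue</category>
<category domain='store' nicename='clothes'>Food</category>
</item>
</channel>"

doc <- xmlParse(test.xml, asText = TRUE)

# Absolute XPath queries flatten matches from every <item> into one vector
titles <- xpathSApply(doc, "//channel//item/title", xmlValue)
tags   <- xpathSApply(doc, "//channel//item//category[@domain='tag']", xmlValue)

length(titles)  # 3
length(tags)    # 6, so tags no longer line up one-to-one with items
```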
The end goal should look like this:
| title | creator | tag | store |
|---|---|---|---|
| Article Name 1 | User1 | Red | Clothes |
| Article Name 2 | User3 | Blue, Green, Yellow | Other |
| Article Name 3 | User4 | Red, Blue | Food |
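One possible approach, sketched with the XML package as used in the question: loop over the `<item>` nodes rather than over each child, run relative XPath queries (`./...`) against each item, and collapse whatever matches with `paste(..., collapse = ", ")`. The helper `join_values()` and the names `items`, `rows` and `df` are hypothetical, chosen for illustration.

```r
library(XML)

# Same data as the sample above
test.xml <- "<channel>
<item>
<title>Article Name 1</title><creator>User1</creator><post_id>1000</post_id>
<category domain='tag' nicename='red'>Red</category>
<category domain='store' nicename='clothes'>Clothes</category>
</item>
<item>
<title>Article Name 2</title><creator>User3</creator><post_id>232</post_id>
<category domain='tag' nicename='blue'>Blue</category>
<category domain='tag' nicename='green'>Green</category>
<category domain='tag' nicename='yellow'>Yellow</category>
<category domain='store' nicename='clothes'>Other</category>
</item>
<item>
<title>Article Name 3</title><creator>User4</creator><post_id>4532</post_id>
<category domain='tag' nicename='red'>Red</category>
<category domain='tag' nicename='blue'>Blue</category>
<category domain='store' nicename='clothes'>Food</category>
</item>
</channel>"

doc <- xmlParse(test.xml, asText = TRUE)

# One node per <item>; each becomes one row of the data frame
items <- getNodeSet(doc, "//channel/item")

rows <- lapply(items, function(item) {
  # Hypothetical helper: evaluate a path relative to this item only and
  # join any number of matches (including zero) into a single string
  join_values <- function(path) {
    paste(xpathSApply(item, path, xmlValue), collapse = ", ")
  }
  data.frame(
    title   = join_values("./title"),
    creator = join_values("./creator"),
    tag     = join_values("./category[@domain='tag']"),
    store   = join_values("./category[@domain='store']"),
    stringsAsFactors = FALSE
  )
})

# Stack the one-row data frames into the final result
df <- do.call(rbind, rows)
df
```

Because every query starts with `./`, it only sees the children of the current `<item>`; an item with no tag categories should simply get an empty string in `tag` instead of shifting values between rows.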