I have an XML file that contains some HTML content such as bold text, paragraphs and tables. I have written a shell script that parses all the HTML tags except tables. I'm using the XML package for R to parse the data.

<Root>
    <Title> This is dummy xml file </Title>
    <Content> This table summarises data in BMC format.
        <div class="abctable">
            <table border="1" cellspacing="0" cellpadding="0" width="100%"   class="coder">
                <tbody>
                    <tr>
                        <th width="50%">ABC</th>
                        <th width="50%">Weight status</th>
                    </tr>
                    <tr>
                        <td>are 18.5</td>
                        <td>arew</td>
                    </tr>
                    <tr>
                        <td>18.5 &amp;mdash; 24.9</td>
                        <td>rweq</td>
                    </tr>
                    <tr>
                        <td>25.0 &amp;mdash; 29.9</td>
                        <td>qewrte</td>
                    </tr>
                    <tr>
                        <td>30.0 and hwerqer</td>
                        <td>rwqe</td>
                    </tr>
                    <tr>
                        <td>40.0 rweq rweq</td>
                        <td>rqwe reqw</td>
                    </tr>
                </tbody>
            </table>
        </div>
    </Content>
    <Section>blah blah blah</Section>
</Root>

How can I parse the content of this table, which is embedded in the XML?

2 Answers

There is a function called readHTMLTable in the XML package that seems to do just what you need.

Here is a way to do it with the following XML file:

<Root>
    <Title> This is dummy xml file </Title>
    <Content>
      This table summarises data in BMC format.

     <div class="abctable">
     <table border="1" cellspacing="0" cellpadding="0" width="100%"   class="coder">
   <tbody>
   <tr>
       <th width="50%">ABC</th><th width="50%">Weight status</th>
   </tr>
   <tr>
       <td>are 18.5</td>
       <td>arew</td>
   </tr>
   <tr>
       <td>18.5 &amp;mdash; 24.9</td>
       <td>rweq</td>
   </tr>
   <tr>
       <td>25.0 &amp;mdash; 29.9</td>
       <td>qewrte</td>
   </tr>
   <tr>
       <td>30.0 and hwerqer</td>
       <td>rwqe</td>
   </tr>
   <tr>
       <td>40.0 rweq rweq</td>
       <td>rqwe reqw</td>
   </tr>
   </tbody>
  </table>
   </div>
   </Content>
 <Section>blah blah blah</Section>
 </Root>

If this is saved in a file called /tmp/data.xml, you can use the following code:

library(XML)
doc <- htmlParse("/tmp/data.xml")          # lenient HTML parser
tableNodes <- getNodeSet(doc, "//table")   # every <table> node in the document
tb <- readHTMLTable(tableNodes[[1]])       # convert the first table

Which gives:

R> tb
                 V1            V2
1               ABC Weight status
2          are 18.5          arew
3 18.5 &mdash; 24.9          rweq
4 25.0 &mdash; 29.9        qewrte
5  30.0 and hwerqer          rwqe
6    40.0 rweq rweq     rqwe reqw

5 Comments

If you look at the command's help page and its examples (?readHTMLTable), it seems that you just have to parse your XML, then select one <table> element and use readHTMLTable on it to get the values. All of this is done with functions of the XML package.
I made an attempt to parse the above XML file (data.xml): doc = xmlTreeParse("data.xml", useInternal = TRUE, encoding="UTF-8"); top = xmlRoot(doc); table<-top[[2]]; readHTMLTable[table] but I get the error message: Error in readHTMLTable[table] : object of type 'closure' is not subsettable
Updated my answer with a working example (almost copy/pasted from the help page, by the way).
I have uploaded an XML file at textuploader.com/?p=6&id=ZBwog. With this file I cannot parse the two tables using your code. Can you please help me see where I am going wrong?
First, your XML file is not well-formed: most of your tags have been converted to entities. Second, please read the help page of readHTMLTable to understand how to parse several tables in a file.
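
For the several-tables case raised in the comments, here is a minimal sketch rather than a definitive answer. The inline document `txt` is a made-up stand-in with two small tables (it is not the uploaded file), and it assumes the markup is real tags rather than escaped entities. It also shows the call written with parentheses, readHTMLTable(node), since readHTMLTable[node] is what triggers the "closure is not subsettable" error.

```r
library(XML)

# Hypothetical inline document with two tables (an assumption for illustration,
# not the file uploaded by the asker).
txt <- '<Root><Content>
  <table><tr><th>ABC</th><th>Weight status</th></tr>
         <tr><td>are 18.5</td><td>arew</td></tr></table>
  <table><tr><th>X</th><th>Y</th></tr>
         <tr><td>1</td><td>2</td></tr>
         <tr><td>3</td><td>4</td></tr></table>
</Content></Root>'

# Parse the string with the lenient HTML parser.
doc <- htmlParse(txt, asText = TRUE)

# Select every <table> node, then convert each node to a data frame.
# readHTMLTable is a function, so it is called with (), not indexed with [].
tableNodes <- getNodeSet(doc, "//table")
tables <- lapply(tableNodes, readHTMLTable, stringsAsFactors = FALSE)

length(tables)   # number of tables found
tables[[2]]      # the second table
```

Each element of `tables` can then be handled like any other data frame.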

The best method for XML parsing would be to use XPath expressions.

XPath Tutorial

XPath and R

How to use XPath and R (Stack Overflow)
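
As a rough illustration of the XPath approach with the XML package, the sketch below pulls cell text straight out of a table. The inline string `txt` is a shortened, hypothetical version of the table from the question, and the two XPath expressions are just examples of what one might select.

```r
library(XML)

# Shortened, hypothetical version of the table from the question.
txt <- '<Root><Content><table><tbody>
  <tr><th>ABC</th><th>Weight status</th></tr>
  <tr><td>are 18.5</td><td>arew</td></tr>
  <tr><td>30.0 and hwerqer</td><td>rwqe</td></tr>
</tbody></table></Content></Root>'

# This string is well-formed, so the strict XML parser can be used.
doc <- xmlParse(txt, asText = TRUE)

# Text of every data cell (<td>), in document order.
cells <- xpathSApply(doc, "//table//td", xmlValue)

# Text of the first column only: the first <td> of each row.
firstCol <- xpathSApply(doc, "//table//tr/td[1]", xmlValue)
```

If a whole table is wanted as a data frame, readHTMLTable is usually less work; XPath is handy when only particular rows, columns or attributes are needed.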
