
I have an input file with HTML tags embedded in XML, for example:

 <Root>
   <Section1>
     <p>some text</p>
     <br>
     <table>
       <th></th>
       <tr>
         <td></td>
       </tr>
     </table>
   </Section1>
   <Section2>
     <ol>
       <li>1</li>
       <li>2</li>
       <li>3</li>
     </ol>
   </Section2>
 </Root>

Is there any way to parse HTML embedded in an XML document in R?

  • Actually, your question is a bit vague. This isn't an XHTML document. It looks like a snippet of XML, some of whose tags happen to be HTML tags, and it has errors (<br> with no </br> or <br/>). So, what precisely do you want to do? Commented Jan 11, 2013 at 8:57
  • I have just edited my question as suggested. Commented Jan 11, 2013 at 9:10

1 Answer


If it's XHTML then it should be well-formed XML, so you can use the standard XML parsers. You can find plenty about those elsewhere.

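Note that your <Section1> element doesn't close properly: the bare <br> inside it is never closed, so an XML parser reports a tag mismatch when it reaches </Section1>. If this is a file you've pasted in, then there's a problem with it.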
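A minimal sketch of that approach with the XML package the question already uses (assumed to be installed). It is not a general solution: it assumes the bare `<br>` is the only unclosed tag, self-closes it with a regular expression so the text becomes well-formed XML, and then picks out the HTML elements with XPath. The sample is inlined as a string rather than read from the input file; `src`, `fixed` and the file name in the comment are illustrative only.

```r
library(XML)  # CRAN package "XML"; assumed to be installed

# The sample from the question, inlined as a string; a real file could be read
# with paste(readLines("input.xml"), collapse = "\n") (hypothetical file name).
src <- '<Root>
  <Section1>
    <p>some text</p>
    <br>
    <table>
      <th></th>
      <tr>
        <td></td>
      </tr>
    </table>
  </Section1>
  <Section2>
    <ol>
      <li>1</li>
      <li>2</li>
      <li>3</li>
    </ol>
  </Section2>
</Root>'

# As written, the bare <br> makes the text malformed XML, so xmlParse() signals
# an error (libxml2 reports the mismatch at </Section1>).
parsed_ok <- tryCatch({
  xmlParse(src, asText = TRUE)
  TRUE
}, error = function(e) FALSE)

# Assumption: <br> is the only unclosed HTML tag. Rewrite <br> (and <br >) as
# the self-closing <br/> so the text becomes well-formed XML.
fixed <- gsub("<br\\s*>", "<br/>", src, perl = TRUE)
doc <- xmlParse(fixed, asText = TRUE)

# Once parsed, the HTML tags are ordinary XML nodes that XPath can select.
items <- xpathSApply(doc, "//Section2/ol/li", xmlValue)
print(parsed_ok)  # FALSE
print(items)      # "1" "2" "3"
```

If the file contains other unclosed HTML tags (`<img>`, `<hr>`, an unclosed `<p>`), the regular expression has to be extended for each of them, which is exactly why the input needs to be a tightly controlled subset.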


4 Comments

For XML, I'm using the XML package in R. I can parse the XML successfully, but how do I parse the HTML?
What do you mean by parse, exactly? If you say you are successful in parsing it, you should be able to pick out the HTML tags using your parsed object... what EXACTLY are you trying to do?
I need to parse the HTML data and reformat it into LaTeX.
So you want to turn <table> tags into LaTeX tabular environments, <li> into LaTeX itemize, etc.? In a completely general way? Give up. If your HTML is a tightly controlled subset of all HTML then it might be possible, but you will need to tell us exactly what that subset is. And you need to do some research.
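For a tightly controlled subset like the one described in the last comment, the conversion can be sketched as a recursive walk over the parsed tree. `to_latex` is a hypothetical helper, not part of any package; it assumes the XML package is installed, that the input is already well-formed, and that only `p`, `br`, `ol`, `ul`, `li`, `table`, `tr`, `th` and `td` need handling (`ol` becomes `enumerate`, `ul` becomes `itemize`). Every other element is treated as a plain container, and LaTeX special characters are not escaped.

```r
library(XML)  # CRAN package "XML"; assumed to be installed

# Hypothetical converter for a small, fixed subset of HTML. Any element not
# listed in the switch() (e.g. Root, Section1) is a transparent container.
# LaTeX special characters (&, %, _, ...) are NOT escaped in this sketch.
to_latex <- function(node) {
  if (inherits(node, "XMLInternalTextNode")) {
    return(trimws(xmlValue(node)))
  }
  # Convert a list of child nodes and drop empty results (e.g. whitespace).
  inner <- function(nodes) {
    parts <- vapply(nodes, to_latex, character(1))
    parts[nzchar(parts)]
  }
  kids <- xmlChildren(node)
  switch(xmlName(node),
    p  = paste(inner(kids), collapse = " "),
    br = "\\newline",
    ol = paste(c("\\begin{enumerate}", inner(kids), "\\end{enumerate}"), collapse = "\n"),
    ul = paste(c("\\begin{itemize}", inner(kids), "\\end{itemize}"), collapse = "\n"),
    li = paste("\\item", paste(inner(kids), collapse = " ")),
    table = {
      # Only <tr> children become rows; each <td>/<th> inside a row is a cell.
      rows <- Filter(function(n) xmlName(n) == "tr", kids)
      cells <- lapply(rows, function(r) {
        cs <- Filter(function(n) xmlName(n) %in% c("td", "th"), xmlChildren(r))
        vapply(cs, function(cell) paste(inner(xmlChildren(cell)), collapse = " "),
               character(1))
      })
      n_col <- max(1, lengths(cells))
      row_lines <- vapply(cells,
                          function(cs) paste0(paste(cs, collapse = " & "), " \\\\"),
                          character(1))
      paste(c(sprintf("\\begin{tabular}{%s}", strrep("l", n_col)), row_lines,
              "\\end{tabular}"), collapse = "\n")
    },
    # Default: any other element just concatenates its converted children.
    paste(inner(kids), collapse = "\n")
  )
}

list_doc <- xmlParse("<Section2><ol><li>1</li><li>2</li><li>3</li></ol></Section2>",
                     asText = TRUE)
cat(to_latex(xmlRoot(list_doc)), "\n")
```

Each tag in the subset costs one branch of the `switch()`, which is why this stays manageable only while the subset stays small; note that the question's `<th>` placed directly under `<table>` (outside any `<tr>`) is simply ignored here.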
