I have pieces of HTML that I need to convert into rows of a data frame.
For example, this piece of HTML:
<div class="header">
<h3>title 1</h3>
</div>
<div class="content">
<ul>
<li>info1</li>
<li>info2
</li>
<li>info3
</li>
</ul>
</div>
<div class="header">
<h2>title 2</h2>
</div>
<div class="content">
<ul>
<li>info4</li>
<li>info5
</li>
<li>info6
</li>
</ul>
</div>
I want to convert it into a data frame like this:
    Title  Info
1 title 1 info1
2 title 1 info2
3 title 1 info3
4 title 2 info4
5 title 2 info5
6 title 2 info6
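The desired table pairs each `<li>` with the title of the nearest preceding header `<div>`. Because the header and content `<div>`s are siblings rather than nested, the pairing has to follow document order. Here is a minimal sketch of that logic in Python using only the standard-library `html.parser`. It is not R, and it does not use the XML or tm.plugin.webmining packages. It assumes every `<li>` belongs to the most recent heading (any of `h1` to `h6`, since the example mixes `h3` and `h2`). The class name `HeaderContentParser` is hypothetical.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeaderContentParser(HTMLParser):
    """Hypothetical helper: each <li> inherits the most recent heading's text."""

    def __init__(self):
        super().__init__()
        self.rows = []            # collected (title, info) tuples
        self.current_title = None
        self._in_heading = False
        self._in_li = False
        self._buf = []            # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._in_heading, self._buf = True, []
        elif tag == "li":
            self._in_li, self._buf = True, []

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._in_heading:
            # remember the title for all following <li> elements
            self.current_title = "".join(self._buf).strip()
            self._in_heading = False
        elif tag == "li" and self._in_li:
            # strip() removes the stray newlines inside "info2\n</li>"
            self.rows.append((self.current_title, "".join(self._buf).strip()))
            self._in_li = False

    def handle_data(self, data):
        # only keep text that sits inside a heading or a list item
        if self._in_heading or self._in_li:
            self._buf.append(data)


# The example snippet from the question, compacted onto fewer lines.
SNIPPET = """
<div class="header"><h3>title 1</h3></div>
<div class="content"><ul><li>info1</li><li>info2
</li><li>info3
</li></ul></div>
<div class="header"><h2>title 2</h2></div>
<div class="content"><ul><li>info4</li><li>info5
</li><li>info6
</li></ul></div>
"""

parser = HeaderContentParser()
parser.feed(SNIPPET)
parser.close()
rows = parser.rows

for i, (title, info) in enumerate(rows, start=1):
    print(i, title, info)
```

The resulting `(title, info)` pairs map directly onto the two columns of the desired table. In R, the equivalent two character vectors could be passed to `data.frame(Title = ..., Info = ...)`.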
I have tried functions from the XML package and the tm.plugin.webmining package, as well as the code from this post (http://tonybreyal.wordpress.com/2011/11/18/htmltotext-extracting-text-from-html-via-xpath/), but so far I haven't found anything that does what I want. Does anyone have an idea how to approach this problem?
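Another approach, closer in spirit to the XPath method in the linked post, is to walk the top-level `<div>`s as a tree and attach each content list to the header that precedes it. The sketch below uses Python's `xml.etree.ElementTree`. It assumes the snippet becomes well-formed XML once it is wrapped in a single root element. That holds for this example but not for arbitrary HTML. `header_content_rows` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def header_content_rows(snippet):
    """Hypothetical helper: pair each content <li> with the preceding header's text."""
    # Assumption: the snippet is well-formed XML once wrapped in one root element.
    root = ET.fromstring("<root>" + snippet + "</root>")
    rows = []
    title = None
    # findall("div") returns the direct child <div>s in document order
    for div in root.findall("div"):
        cls = div.get("class")
        if cls == "header":
            # the header holds a single heading (h3 or h2 in the example)
            title = "".join(div.itertext()).strip()
        elif cls == "content":
            for li in div.iter("li"):
                rows.append((title, "".join(li.itertext()).strip()))
    return rows


SNIPPET = """
<div class="header"><h3>title 1</h3></div>
<div class="content"><ul><li>info1</li><li>info2
</li><li>info3
</li></ul></div>
<div class="header"><h2>title 2</h2></div>
<div class="content"><ul><li>info4</li><li>info5
</li><li>info6
</li></ul></div>
"""

rows = header_content_rows(SNIPPET)
for i, (title, info) in enumerate(rows, start=1):
    print(i, title, info)
```

In R's XML package, this pairing could probably be expressed with an XPath such as `following-sibling::div[@class='content'][1]//li`, evaluated relative to each header `div`. That idea has not been tested here.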