I want to extract one div, together with its inner HTML, from the following page using XPath. Here is part of the page's HTML:

    <html>
    <head>
    <meta charset="utf-8">
    <title>Items 1 to 20 -- Example Page 1</title>
    <script type="text/javascript">
     var _gaq = _gaq || [];
    _gaq.push(['_setAccount', 'UA-23648880-1']);
   _gaq.push(['_trackPageview']);
   _gaq.push(['_setDomainName', 'econpy.org']);
   </script>
   </head>
   <body>
  <div align="center">1, <a    
       href="http://econpy.pythonanywhere.com/ex/002.html">[<font    
   color="green">2</font>]</a>, <a    
   href="http://econpy.pythonanywhere.com/ex/003.html">[<font 
   color="green">3</font>]</a>, <a   
   href="http://econpy.pythonanywhere.com/ex/004.html">[<font 
   color="green">4</font>]</a>, <a 
  href="http://econpy.pythonanywhere.com/ex/005.html">[<font 
 color="green">5</font>]</a></div>
<!-- I just want to get this part of the HTML. -->
<div class ="item-body" 
 <div title="item1">
 <div title="buyer-name">Carson Busses</div>
 <span class="item-price">$29.95</span><br>
</div>
</div  
.......
<div title="buyer-info">
<div title="buyer-name">Earl E. Byrd</div>
<span class="item-price">$8.37</span><br>
</div>
<div title="buyer-info">
<div title="buyer-name">Patty Cakes</div>
<span class="item-price">$15.26</span><br>
</div>
<div title="buyer-info">
<div title="buyer-name">Derri Anne Connecticut</div>
<span class="item-price">$19.25</span><br>
</div>
<div title="buyer-info">
<div title="buyer-name">Moe Dess</div>
<span class="item-price">$19.25</span><br>
</div>
<div title="buyer-info">
<div title="buyer-name">Leda Doggslife</div>
<span class="item-price">$13.99</span><br>
</div>
.........
.........
<div title="buyer-info">
<div title="buyer-name">Rose Tattoo</div>
<span class="item-price">$114.07</span><br>
</div>
<div title="buyer-info">
<div title="buyer-name">Moe Tell</div>
<span class="item-price">$10.09</span><br>
</div>
<script type="text/javascript">  (function() {
var ga = document.createElement('script');     ga.type =     
'text/javascript'; ga.async = true;
ga.src = ('https:'   == document.location.protocol ? 'https://ssl'     
: 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; 
s.parentNode.insertBefore(ga, s);
})();
</script>
</body>
</html>

I want to extract only the following part of the page's HTML:

 <div class ="item-body" 
 <div title="item1">
  <div title="buyer-name">Carson Busses</div>
 <span class="item-price">$29.95</span><br>
</div>
</div  

My code is:

    from lxml import html
    from lxml import etree
    import requests

    page = requests.get('001.html')
    tree = html.fromstring(page.content)
    buy_info2 = tree.xpath('//div[contains(@title, "item-body")]')
    print("buy-info2: ", buy_info2)

I expected to get a list of HTML elements, but the result is []. Please help, and please use XPath rather than another method. Thanks!

  • Maybe your solution is here: use XPath with BeautifulSoup? Try this one. Commented Jun 7, 2016 at 6:46
  • I just want to use the XPath method. Commented Jun 7, 2016 at 7:23

2 Answers


You can pull the div using the class name:

In [2]: from lxml import html

In [3]: xml = html.fromstring(h)

In [4]: div = xml.xpath("//div[@class='item-body']")[0]

In [5]: print(html.tostring(div))
<div class="item-body" title="item1">
 <div title="buyer-name">Carson Busses</div>
 <span class="item-price">$29.95</span><br>
</div>
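
For comparison, here is a minimal self-contained sketch of why the query in the question returns an empty list while the class-based one matches. It assumes a well-formed stand-in for the posted snippet (the `SAMPLE` string is hypothetical; the real page may be structured differently):

```python
from lxml import html

# Hypothetical, well-formed stand-in for the snippet in the question.
SAMPLE = """
<html><body>
<div class="item-body">
  <div title="item1">
    <div title="buyer-name">Carson Busses</div>
    <span class="item-price">$29.95</span><br>
  </div>
</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# "item-body" is the value of @class, so a test on @title matches nothing.
by_title = tree.xpath('//div[contains(@title, "item-body")]')

# Testing @class finds the wrapper div.
by_class = tree.xpath('//div[@class="item-body"]')

print(len(by_title), len(by_class))

# tostring() returns bytes by default; encoding="unicode" gives a str.
print(html.tostring(by_class[0], encoding="unicode"))
```

Note that `@class='item-body'` is an exact string comparison, so it only matches when `item-body` is the whole class attribute.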

1 Comment

Thanks for your answer.

I solved this issue by appending //* to the XPath expression in my code, and it works:

 buy_info2 = tree.xpath('//div[contains(@title, "item-body")]//*')

Then I get all the HTML elements that I want. Thanks!
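
A small sketch of what the trailing `//*` changes: it selects every descendant element of the matched div instead of the div itself. This is only an illustration under assumptions: `SAMPLE` is a hypothetical, well-formed stand-in for the posted snippet, and the predicate tests `@class`, because that is the attribute carrying `item-body` in the HTML as posted.

```python
from lxml import html

# Hypothetical, well-formed stand-in for the snippet in the question.
SAMPLE = """
<html><body>
<div class="item-body">
  <div title="item1">
    <div title="buyer-name">Carson Busses</div>
    <span class="item-price">$29.95</span><br>
  </div>
</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Without //*: the wrapper div itself.
wrapper = tree.xpath('//div[@class="item-body"]')

# With //*: all elements nested inside the wrapper, in document order.
descendants = tree.xpath('//div[@class="item-body"]//*')

print([el.tag for el in wrapper])      # ['div']
print([el.tag for el in descendants])  # ['div', 'div', 'span', 'br']

# Text of each descendant that has a title attribute.
for el in descendants:
    if el.get("title") == "buyer-name":
        print(el.text)
```

If the goal is the wrapper's markup as a single string, serializing the wrapper element with `html.tostring` is usually simpler than collecting its descendants.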
