scrapy scrape html source code

Question

I'm using Scrapy to crawl and scrape a website. I need the whole HTML of an element rather than its individual components. Extracting components with XPath selectors is easy, but is there a way to extract the whole HTML block for a given class? For example, in the HTML below, I need the exact source of the entire div block prod-basic-info. Is there any way I can do this?

<div class="block prod-basic-info">
 <h2>Product information</h2>
 <p class="product-info-label">Category</p>
  <p>
   <a href="xyz.html"></a>
 </p>
</div>

alecxe · Accepted Answer · 2015-02-09 05:35:09Z

4

Point your XPath expression or CSS selector at the element itself and call extract() on it. The result is the element's serialized HTML, including its own opening and closing tags:

response.xpath('//div[contains(@class, "prod-basic-info")]').extract()[0]
response.css('div.prod-basic-info').extract()[0]

