
Is there a Python crawler that pulls all the data out of a web page, for example: http://www.bestbuy.com/site/HTC+-+One+S+4G+Mobile+Phone+-+Gradient+Blue+%28T-Mobile%29/4980512.p?id=1218587135819&skuId=4980512&contract_desc= On this page the customer reviews span two pages, 1 and 2. I want to crawl this URL and get the content of both pages. Is this possible with a Python crawler?

Also, do Python crawlers support all the modern GET/POST techniques?

Instead, you could check whether Best Buy has an API that would work for you. Commented May 6, 2014 at 21:40

2 Answers


You could use Scrapy:

Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
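Scrapy takes care of fetching, link following and de-duplication for you. To show roughly what such a crawler does for the two review pages in the question, here is a minimal standard-library sketch (not Scrapy's API). It assumes the review pages are linked to each other with ordinary `<a href>` links; `crawl_pages`, `is_page_link` and `LinkAndTextParser` are hypothetical names, and any content the site loads with JavaScript would not be seen by it.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkAndTextParser(HTMLParser):
    """Collects <a href> targets and the non-blank text chunks of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def fetch(url):
    """Download a page and decode it using the charset the server declares."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl_pages(start_url, is_page_link, fetch_page=fetch, max_pages=10):
    """Breadth-first crawl from start_url, following only links for which
    is_page_link(absolute_url) is true (hypothetical filter, e.g. review pages).
    Returns {url: [text chunks]} in the order the pages were visited."""
    seen = {start_url}
    queue = deque([start_url])
    results = {}
    while queue and len(results) < max_pages:
        url = queue.popleft()
        parser = LinkAndTextParser()
        parser.feed(fetch_page(url))
        parser.close()  # flush any text still buffered at the end of the page
        results[url] = parser.text
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links like "?page=2"
            if absolute not in seen and is_page_link(absolute):
                seen.add(absolute)
                queue.append(absolute)
    return results
```

For example, `crawl_pages(product_url, lambda u: "page=" in u)` would visit the start page and every linked URL containing `page=`; that filter is only a guess, since the real pagination URLs depend on the site. Passing `fetch_page` as a parameter keeps the network access swappable, so the crawl logic can be tried on local HTML first.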



If you want to crawl a whole site, see this post. If you only want to process some pages and analyze their content (meaning you already know the URLs you want to process), try BeautifulSoup, which lets you do things like:

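from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen
from bs4 import BeautifulSoup

url = "http://example.com/"  # the page you want to process
page = urlopen(url)
soup = BeautifulSoup(page.read(), "html.parser")
for f in soup.find_all('form'):
    target_url = f.get('action')  # None if the form has no action attribute
    # do something with each one of the forms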
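The question also asks about GET and POST. Once you have a form's `action` (and its `method`, e.g. `f.get('method', 'get')`), the standard library can build either kind of request. This is a minimal sketch: `build_form_request` is a hypothetical helper that only builds a `urllib.request.Request` without sending it, and it does not collect the form's input fields (hidden ones included) or handle cookies for you. Pass the result to `urlopen()` to actually send it.

```python
from urllib.parse import urlencode, urljoin, urlsplit, urlunsplit
from urllib.request import Request


def build_form_request(page_url, action, method, fields):
    """Build (but do not send) the request a browser would make when a form
    on page_url is submitted with the given field values (hypothetical helper)."""
    # A relative or missing action is resolved against the page's own URL.
    target = urljoin(page_url, action or "")
    encoded = urlencode(fields)
    if (method or "get").upper() == "POST":
        # POST: fields go in the request body, URL-encoded.
        return Request(
            target,
            data=encoded.encode("ascii"),
            method="POST",
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
    # GET: fields replace the query string of the target URL.
    parts = urlsplit(target)._replace(query=encoded, fragment="")
    return Request(urlunsplit(parts), method="GET")
```

For instance, a form with `action="/search"` and `method="get"` on `http://shop.test/p/item.html` becomes a GET to `http://shop.test/search?...`, while a POST form sends the same encoded fields in the request body instead.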
