Python web crawler [closed]

Question

Is there a Python crawler that can pull all the data out of a web page? For example: http://www.bestbuy.com/site/HTC+-+One+S+4G+Mobile+Phone+-+Gradient+Blue+%28T-Mobile%29/4980512.p?id=1218587135819&skuId=4980512&contract_desc= On this page the customer reviews are split across two pages, 1 and 2. I want to crawl this URL and get the content of both pages. Is this possible with a Python crawler?

Also, does a Python crawler support all the modern GET/POST technologies?

Instead, you could check whether Best Buy has an API that would work for you.
– kyle k, Commented May 6, 2014 at 21:40

user647772 · Accepted Answer · 2012-07-26 13:32:00Z

12

You could use Scrapy:

Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.


Community · Accepted Answer · 2017-05-23 12:15:25Z

3

If you want to crawl a whole site, see this post. If you only want to process a few pages and analyze their content (meaning you already know the URLs you want to process), try BeautifulSoup. It lets you do things like:

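from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup  # pip install beautifulsoup4

page = urlopen(url)  # url is the page you want to process
soup = BeautifulSoup(page.read(), 'html.parser')
for f in soup.find_all('form'):
    target_url = f.get('action')  # None if the form has no action attribute
    # do something with each one of the forms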
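
For the two review pages in the question, the same idea extends to pagination: parse a page, keep its content, then follow the link to the next page until there is none. Below is a minimal sketch that uses the standard library's `html.parser` instead of BeautifulSoup so it has no dependencies. The `rel="next"` link markup, the `crawl_pages` helper, and the injectable `fetch` function are assumptions made for illustration; the real page may mark its pagination differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class NextLinkParser(HTMLParser):
    """Records the href of the first <a rel="next"> link (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("rel") == "next" and self.next_href is None:
            self.next_href = attrs.get("href")


def fetch(url):
    """Download a page and return it as text (assumes UTF-8)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_pages(start_url, fetch=fetch, max_pages=10):
    """Return [(url, html), ...] for start_url and each page reached via 'next' links."""
    pages, url, seen = [], start_url, set()
    while url and url not in seen and len(pages) < max_pages:
        seen.add(url)
        html = fetch(url)
        pages.append((url, html))
        parser = NextLinkParser()
        parser.feed(html)
        # Resolve a relative href against the current page's URL.
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return pages
```

Each returned `html` string can then be handed to BeautifulSoup as in the snippet above. The `seen` set and `max_pages` cap keep the loop from running forever if the pages link to each other in a cycle.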

edited May 23, 2017 at 12:15

CommunityBot


answered Jul 26, 2012 at 14:47

gutes

