I'm following a Python tutorial on YouTube and got up to the part where we make a basic web crawler. I tried making my own to do a very simple task: go to my city's car section on Craigslist, print the title and link of every entry, then jump to the next page and repeat if needed. It works for the first page, but it never moves on to the following pages to get their data. Can someone help explain what's wrong?
import requests
from bs4 import BeautifulSoup


def widow(max_pages):
    page = 0  # craigslist starts at page 0
    while page <= max_pages:
        url = 'http://orlando.craigslist.org/search/cto?s=' + str(page)  # craigslist search url + current page number
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'lxml')  # my computer yelled at me if 'lxml' wasn't included. your mileage may vary
        for link in soup.find_all('a', {'class': 'hdrlnk'}):
            href = 'http://orlando.craigslist.org' + link.get('href')  # href = /cto/'number'.html
            title = link.string
            print(title)
            print(href)
        page += 100  # craigslist pages go 0, 100, 200, etc


widow(0)  # 0 gets the first page, replace with multiples of 100 for extra pages
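
To make the paging explicit, here is a minimal sketch of the URLs I expect the loop to request for a given max_pages value. The names BASE_URL and page_urls are just made up for illustration and aren't from the tutorial; the helper only builds the URLs and doesn't make any requests.

```python
# Same Craigslist search URL used in the crawler, without the offset.
BASE_URL = 'http://orlando.craigslist.org/search/cto?s='


def page_urls(max_pages, step=100):
    """Hypothetical helper: build the URLs for offsets 0, step, 2*step, ... up to max_pages."""
    # range() stops before its end value, so add 1 to include max_pages itself,
    # matching the `while page <= max_pages` condition in the crawler.
    return [BASE_URL + str(offset) for offset in range(0, max_pages + 1, step)]


for url in page_urls(200):
    print(url)
```

If I'm reading my own loop right, a max_pages of 0 should only ever produce the s=0 URL, and a max_pages of 200 should produce three URLs (s=0, s=100, s=200).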