I have a Python scraper that can currently only scrape one website at a time.
Each day I have a list of 600-700 websites, all with the same layout, in an Excel sheet. I'm trying to change the scraper from handling a single website to handling multiple websites held in a single column of a .xlsm file.
I previously wrote code that opens the URLs in browser tabs 50 at a time (see Example 1), and I would like to incorporate that code, or a version of it, into my scraper if possible.

Example 1:
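```python
import webbrowser
import xlrd

file_location = r"C:\Python27\REAScraper\ScrapeFile.xlsm"
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_name("Sheet1")
url_column = 3  # zero-based, so this is column D

# Start at row 1 to skip the header and stop at the last used row
for row in range(1, sheet.nrows):
    # Pause every 50 rows so only 50 tabs open per batch
    if row % 50 == 0:
        raw_input("Paused. Press Enter to continue")
    url = sheet.cell_value(row, url_column)
    webbrowser.open_new_tab(url)
```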
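One way to stop the tab-opening code depending on a fixed row count is to read the whole URL column into a list first, then work through it in fixed-size batches. The sketch below assumes the URLs sit in one column under a header row. `clean_urls` and `batches` are hypothetical helper names and are not part of xlrd.

```python
def clean_urls(values):
    """Return the usable URLs from a column of raw cell values.

    Empty cells and non-text cells (numbers, None) are skipped, stray
    whitespace is stripped, and repeated URLs are dropped while the
    original order is kept.
    """
    seen = set()
    urls = []
    for value in values:
        # Only text cells have .strip(); this works for str and unicode alike
        url = value.strip() if hasattr(value, "strip") else ""
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def batches(items, size):
    """Yield consecutive slices of `items` holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

With xlrd, `urls = clean_urls(sheet.col_values(url_column, start_rowx=1))` reads every cell below the header in that column. To get the 50-tabs-at-a-time behaviour, loop with `for batch in batches(urls, 50):`, open each URL in the batch with `webbrowser.open_new_tab(url)`, and pause once after the batch. The same `urls` list can then feed the scraper. Note that xlrd 2.0 and later only read legacy .xls files. On a current install, openpyxl's `load_workbook` can open the .xlsm instead, and the column's cell values can be passed to `clean_urls` unchanged.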
Below is the scraper:
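```python
import urllib2
import csv
import lxml  # the 'lxml' parser used below must be installed
import xlrd
from bs4 import BeautifulSoup

# quote_page holds the single URL being scraped (its definition is not shown)
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, 'lxml')

# Page title
titleTag = soup.html.head.title.text.strip()
# First <p> on the page
p_class = soup.find('p').text.strip()
# Price and agent; get_text() also works when the tag has nested children,
# where .string would return None
d_class = soup.find('div', class_="property-value__price").get_text().strip()
e_class = soup.find('p', class_="property-value__agent").get_text().strip()

print titleTag, p_class, d_class, e_class

# Binary append mode ('ab') stops the csv module writing blank rows on Windows under Python 2
with open('index2.csv', 'ab') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow([titleTag, p_class, d_class, e_class])
```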
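A common way to go from one page to many is to move the single-page code into a function that takes a URL and returns the four values, then call that function once per URL. The following is a minimal sketch of that loop, assuming such a function exists. `scrape_all` and its `scrape_one` argument are hypothetical names. The loop records failures instead of letting one missing element or HTTP error stop a 700-URL run. It is written for Python 3, though it uses nothing version-specific.

```python
import csv


def scrape_all(urls, scrape_one, out_file):
    """Call scrape_one(url) for each URL and write one CSV row per page.

    scrape_one is any function that takes a URL and returns a list of
    column values (for example title, first paragraph, price, agent).
    Each row is the URL followed by those values. Pages that raise an
    exception are skipped and returned as (url, error) pairs so they
    can be retried later.
    """
    writer = csv.writer(out_file)
    failed = []
    for url in urls:
        try:
            values = scrape_one(url)
        except Exception as exc:  # e.g. HTTP error, or find() returned None
            failed.append((url, repr(exc)))
            continue
        writer.writerow([url] + list(values))
    return failed
```

In Python 3, open the output file with `open('index2.csv', 'w', newline='')` (in Python 2.7, use `'wb'`), then call `failed = scrape_all(urls, scrape_one, csv_file)` inside that `with` block. Any URLs left in `failed` can be checked by hand or passed back in for a second run.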
As stated, I can get the scraper working on a single website, but not over a range of websites or from an Excel sheet. I've tried Automate the Boring Stuff, Learn Python the Hard Way, and hundreds of Reddit and Google searches; I'm just looking for some assistance if possible.
Cheers :)