I'm writing a web scraper in Python using urllib2 and BeautifulSoup, and I'm looking for a way to have Python click a button on a page whose HTML source it has read.

The following snippet of my script reads URLs from a CSV file and is meant to scrape data from the pages they point to. An intermediate step is to click a "submit" button on each of those pages.

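import urllib2
from bs4 import BeautifulSoup  # assuming BeautifulSoup 4 (bs4)

# triplines and KCString1..KCString3 are defined earlier in the script.
for line in triplines:
    # Each CSV line holds an origin and a destination.
    FromTo = line.split(",")
    From = FromTo[0].strip()
    print(From)
    To = FromTo[1].strip()
    print(To)
    # Build the page URL from the fixed string parts and the two values.
    url = KCString1 + From + KCString2 + To + KCString3
    print(url)
    # Fetch the page and parse its HTML.
    page = urllib2.urlopen(url)
    page_source = page.read()
    soup = BeautifulSoup(page_source, "html.parser")
    print(soup.prettify())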

Is there a way to use urllib2 to say "follow the URL that is obtained from clicking this button"? I imagine I may first need to look through the page's JavaScript source to find the button's identifiers.

  • Why not use Scrapy (scrapy.org)? Commented Jul 2, 2014 at 19:16
  • I'm not sure you want to use urllib2 for this. Have you looked at Selenium? Commented Jul 2, 2014 at 19:17

1 Answer

Buttons do not typically have URLs attached to them. They normally trigger JavaScript, which a plain HTTP client such as urllib2 cannot run, so the click has to be emulated. If you want to click a button, use a browser emulator such as Ghost rather than a parser such as BeautifulSoup.
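
One case may not need a browser emulator: when the "submit" button is an ordinary submit control inside a `<form>` and no JavaScript is involved, the click only sends the form's fields to the form's `action` URL. The sketch below is a rough illustration of rebuilding that request from the page source. It is written for Python 3 using only the standard library (in Python 2 the same helpers live in `urlparse` and `urllib`). The names `FormParser` and `submit_target` are made up for this example, and it assumes the first form on the page is the right one and ignores details such as unchecked checkboxes and `<select>` fields.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormParser(HTMLParser):
    """Collect action, method, and named <input> values of the first <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.found = False
        self.action = ""
        self.method = "get"
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self.found:
            self.in_form = True
            self.found = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Hidden fields and the named submit button are sent with the form.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def submit_target(page_url, page_source):
    """Return (method, url, body) for a plain form submit, or None if no form."""
    parser = FormParser()
    parser.feed(page_source)
    if not parser.found:
        return None
    # An empty or relative action is resolved against the page's own URL.
    url = urljoin(page_url, parser.action)
    query = urlencode(parser.fields)
    if parser.method == "post":
        return "post", url, query.encode("ascii")
    # For GET, the form data replaces any query string on the action URL.
    base = url.split("?", 1)[0]
    return "get", (base + "?" + query) if query else base, None
```

For a GET form, the returned URL can be opened the same way as the original page. For a POST form, passing the returned body as the second argument to `urlopen` sends it as a POST. If the button's behavior is attached by JavaScript instead, this approach will not find anything useful and a tool that drives a real browser is the safer route.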
